- Implemented an alternative to pidfd_open for kernels older than 5.3.
mldr now sends a "lifetime pipe" to darlingserver during process start.
When the process dies, darlingserver receives a POLLHUP event on that pipe.
- Set increased_limit.rlim_cur to default_limit.rlim_max on systems without
/proc/sys/fs/nr_open. On WSL1, this greatly increases the number of open file
descriptors available.
- For systems without NSpid in /proc/self/status, implemented a way to manage
thread IDs in darlingserver during checkin. darlingserver receives a hint
address on the thread's stack and compares it with the stack pointer retrieved
using PTRACE_GETREGS.
- Avoided sending socket messages with an empty msg_hdr.msg_name->sun_path.
A null msg_name is used instead; otherwise, on some systems, sending would fail with EINVAL.
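The lifetime pipe from the first item above can be sketched as follows. This is only a minimal illustration, not the actual mldr or darlingserver code: the helper names `spawn_with_lifetime_pipe` and `wait_for_hangup` are hypothetical, and the monitoring side simply keeps the read end after `fork` instead of receiving it over a socket. The child holds the write end for as long as it lives, so the read end reports POLLHUP once the child is gone.

```c
#include <assert.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: forks a child that lives for `lifetime_ms` milliseconds
 * while holding the write end of a pipe. Returns the read end (the monitoring
 * side of the "lifetime pipe"), or -1 on error. */
int spawn_with_lifetime_pipe(int lifetime_ms, pid_t *child_pid) {
    int fds[2];
    assert(child_pid != NULL);
    if (pipe(fds) != 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: keep only the write end; it is never written to. */
        close(fds[0]);
        poll(NULL, 0, lifetime_ms); /* sleep without needing extra headers */
        _exit(0);                   /* the kernel closes the write end here */
    }
    /* Monitor: it must not keep the write end, or POLLHUP would never fire. */
    close(fds[1]);
    *child_pid = pid;
    return fds[0];
}

/* Hypothetical helper: returns 1 if the write end was closed within
 * `timeout_ms`, 0 on timeout, and -1 on error. */
int wait_for_hangup(int lifetime_fd, int timeout_ms) {
    /* POLLHUP is always reported, so no events need to be requested. */
    struct pollfd pfd = { .fd = lifetime_fd, .events = 0, .revents = 0 };
    int ret = poll(&pfd, 1, timeout_ms);
    if (ret < 0)
        return -1;
    if (ret == 0)
        return 0;
    return (pfd.revents & POLLHUP) ? 1 : -1;
}
```

Unlike pidfd_open, this needs nothing newer than pipes and poll, which is presumably why it works as a fallback on older kernels.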
Unmanaged calls are those that can come from unmanaged processes,
i.e. processes that the server does not control. They can also come from
managed processes, but they don't have to.
This commit does not introduce any unmanaged calls itself: during local
development, I created one and later decided to discard it.
Support for them does seem like a useful feature, though, so it's being
added with this commit.
When a microthread went to sleep with a continuation, we discarded its
call. This would lead to the call being disposed before we had a chance
to reply to it. Instead, now we keep a reference to it in the thread
until we send a reply for it.
This RPC call gives the caller a socket that it can write to in order to log
to the server's log stream.
This is used to give userspace a place to put messages for "/dev/console".
Most notably, launchd tries to log to this device for important log
messages. This allows us to capture those messages.
The most important change here is the ability to perform `mmap` and `munmap` in managed Darling processes. This is enabled via the new S2C call system.
Other notable changes:
* Move the server socket to the prefix root because launchctl clears `var/run` on startup
* Create an IPC importance structure for each duct-taped task; this is required by `ipc_importance_send`
* Initialize the MPSC thread deallocation daemon; this is also used by turnstiles
* Clean up a thread's timers and waits when destroying it
* Check whether we should actually block in `thread_block_parameter` before doing so; this helps avoid missed wakeups
* Support creating kernel threads without immediately starting them
* Update a thread's address when receiving a message from it; this fixes an issue with keeping an outdated thread address when a process performs an exec (since we re-use its main thread)
Largely just ported over from the current LKM code.
Also, set XNU_TARGET_OS_OSX=1 to fix an incorrect default setting that was causing Mach messaging to fail trying to send a task control port (the task self port).
Additionally, with regard to RPC:
* Send architecture information along with RPC calls
* Log replies sent on the server side
* Allow replies expecting FDs to handle the case when no valid FDs were sent back
In our previous in-kernel kqueue implementation, we followed suit with newer macOS versions and dropped support for NOTE_TRACK and NOTE_CHILD. This implementation, however, reintroduces support for those flags to allow for backwards compatibility with older software that makes use of these features.
Additionally, the RPC wrappers have been fixed to allow either side (both the client and the server) to specify a negative value for an FD parameter (in order to leave it absent and avoid actually sending an FD).
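The "negative FD means absent" convention can be sketched with plain SCM_RIGHTS messaging. This is a rough illustration under assumptions, not the generated RPC wrappers: `send_with_optional_fd` and `recv_with_optional_fd` are hypothetical names, and only a single optional descriptor is handled.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: sends `len` bytes and, only if `fd` is non-negative,
 * attaches it as an SCM_RIGHTS control message. */
ssize_t send_with_optional_fd(int sock, const void *data, size_t len, int fd) {
    union {
        struct cmsghdr align; /* forces correct alignment of the buffer */
        char buf[CMSG_SPACE(sizeof(int))];
    } control;
    struct iovec iov = { .iov_base = (void *)data, .iov_len = len };
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };

    if (fd >= 0) {
        memset(&control, 0, sizeof(control));
        msg.msg_control = control.buf;
        msg.msg_controllen = sizeof(control.buf);
        struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    }
    /* With a negative fd, no control data is attached at all. */
    return sendmsg(sock, &msg, 0);
}

/* Hypothetical helper: receives a message; `*fd_out` is the received
 * descriptor, or -1 if the sender did not attach one. */
ssize_t recv_with_optional_fd(int sock, void *data, size_t len, int *fd_out) {
    union {
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } control;
    struct iovec iov = { .iov_base = data, .iov_len = len };
    struct msghdr msg = {
        .msg_iov = &iov,
        .msg_iovlen = 1,
        .msg_control = control.buf,
        .msg_controllen = sizeof(control.buf),
    };
    assert(fd_out != NULL);
    *fd_out = -1;

    ssize_t received = recvmsg(sock, &msg, 0);
    if (received < 0)
        return received;
    for (struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL;
         cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_RIGHTS) {
            memcpy(fd_out, CMSG_DATA(cmsg), sizeof(int));
            break;
        }
    }
    return received;
}
```

The receiving side reports -1 when nothing was attached, so both directions share the same "negative means absent" meaning.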
This is almost a direct port of what we were previously doing in the LKM, except that we need to use duct-taped semaphores in order to put the calling microthread to sleep (rather than a real semaphore that would put the worker thread to sleep).
Mach port kqchannels allow libkqueue to listen in on Mach port events that happen on the server side. The implementation consists of a socket pair used to communicate between the client and the server for that particular channel.
When the server receives an event, it sends a notification message on the socket, which makes the socket readable to the client and thus wakes up epoll. When the client is ready to read the event, it sends a message to the server asking it to read the event and send back the necessary data.
It is done this way (rather than proactively sending the event data over the socket) to closely mimic what kevent actually does when reading events. This is even more important when the client specifies MACH_MSG_RECEIVE, which asks the server to try to receive the message directly into a buffer (if there is enough space) when reading the event. In that case, we would *definitely* not want to read the Mach message before the client is actually ready to do so, as it could starve others from reading the Mach message while the client hasn't even acknowledged the event yet. Worse, the client could have died before reading our event, and that message would then be lost forever.
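The notify-then-request flow of a kqchannel can be sketched as below. This is a simplified model under assumptions, not the real implementation: all names (`channel_sketch`, `SKETCH_NOTIFY`, and the functions) are hypothetical, both ends live in one process, the event is a plain integer kept in a struct rather than a Mach message, and `poll` stands in for epoll.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

enum { SKETCH_NOTIFY = 1, SKETCH_READ_REQUEST = 2 };

struct channel_sketch {
    int server_fd;
    int client_fd;
    uint32_t pending_event; /* stays on the server until it is requested */
    int has_pending;
};

int channel_open(struct channel_sketch *ch) {
    int fds[2];
    assert(ch != NULL);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
        return -1;
    ch->server_fd = fds[0];
    ch->client_fd = fds[1];
    ch->pending_event = 0;
    ch->has_pending = 0;
    return 0;
}

/* Server: record the event, but send only a 1-byte notification. */
int server_post_event(struct channel_sketch *ch, uint32_t event) {
    unsigned char note = SKETCH_NOTIFY;
    ch->pending_event = event;
    ch->has_pending = 1;
    return send(ch->server_fd, &note, 1, 0) == 1 ? 0 : -1;
}

/* Client: the notification makes the socket readable. */
int client_is_readable(struct channel_sketch *ch, int timeout_ms) {
    struct pollfd pfd = { .fd = ch->client_fd, .events = POLLIN, .revents = 0 };
    int ret = poll(&pfd, 1, timeout_ms);
    if (ret < 0)
        return -1;
    return (ret > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}

/* Client: consume the notification and ask the server to read the event. */
int client_request_read(struct channel_sketch *ch) {
    unsigned char msg = 0;
    if (recv(ch->client_fd, &msg, 1, 0) != 1 || msg != SKETCH_NOTIFY)
        return -1;
    msg = SKETCH_READ_REQUEST;
    return send(ch->client_fd, &msg, 1, 0) == 1 ? 0 : -1;
}

/* Server: only now is the event actually read and sent back. */
int server_handle_request(struct channel_sketch *ch) {
    unsigned char msg = 0;
    if (recv(ch->server_fd, &msg, 1, 0) != 1 || msg != SKETCH_READ_REQUEST)
        return -1;
    if (!ch->has_pending)
        return -1;
    uint32_t event = ch->pending_event;
    ch->has_pending = 0;
    ssize_t sent = send(ch->server_fd, &event, sizeof(event), 0);
    return sent == (ssize_t)sizeof(event) ? 0 : -1;
}

/* Client: receive the event data produced for its request. */
int client_receive_event(struct channel_sketch *ch, uint32_t *event_out) {
    ssize_t got = recv(ch->client_fd, event_out, sizeof(*event_out), 0);
    return got == (ssize_t)sizeof(*event_out) ? 0 : -1;
}
```

Until `client_request_read` is called, the event stays on the server side, which mirrors the reasoning above: a client that never acknowledges the notification never consumes the event.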
In other news, many different parts of the code have been updated and now function properly.
For example, all of the direct Mach traps can call thread_syscall_return now. This allows things like semaphores to work.
Timers (with timer_call) are also working now and have been tested in conjunction with timed semaphore waits.
Threads are now able to impersonate other threads for the purposes of running duct-taped code. The primary use case for this is running code in a kernel microthread while pretending to be a user microthread (e.g. kqchan does this). This makes current_map() and friends return the information for the thread we're impersonating (useful for e.g. copyout).
This commit adds a bunch of RPC calls, mostly XNU trap calls (calls that go directly to duct-taped XNU Mach trap calls).
The wrapper generator can now automatically generate server-side wrapper/boilerplate code for these XNU trap calls.
These calls have not yet been tested and some (most of the non-IPC calls) probably require functions that haven't been implemented yet.
Implement some general RPC calls (corresponding to calls from the LKM): mach_port_deallocate, thread_set_handles, uidgid (a combination of get_uidgid and set_uidgid), and vchroot.
Additionally, we now have some RPC calls that do pass descriptors. Surprisingly, the code I had previously written was *almost* functional (just 2 minor generation and compilation errors). However, that code has only been tested for sending FDs from clients to the server, not vice versa, so the other direction might have issues.
Additionally, a few fixes have been made in the duct-tape code. For example, tasks now handle audit and security tokens like we used to do in the LKM. They also properly initialize and destroy their semaphore queues. Both threads and tasks now properly free their allocated structures.
More importantly, threads and tasks are now properly destroyed. In order to do this, a "kernel" microthread had to be introduced to perform "kernel" work from the managing code (since certain duct-tape destruction operations expect to be running in a microthread context). Additionally, this had to be an additional microthread because the managing code can't use thread calls, since those already expect a microthread context.
The Server can now easily monitor arbitrary descriptors using Monitors. Process monitoring has been converted to this system as well.
Most importantly, however, we can now detect `execve`s. libsystem_kernel opens a close-on-exec pipe and sends the read end to the server. When `execve` succeeds, the pipe is simply closed. When `execve` fails, libsystem_kernel writes a single byte to the pipe and then closes it. On the server side, we listen for a hang-up (this indicates the write end of the pipe has been closed). If we are able to read a byte, we know the `execve` failed; otherwise, if we read nothing (EOF), then we know it succeeded.
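The `execve` detection scheme can be sketched in isolation like this. It is a minimal illustration under assumptions, not the libsystem_kernel or darlingserver code: `spawn_and_detect_exec` is a hypothetical name, the "server" is simply the parent process (which keeps the read end after `fork` rather than receiving it over a socket), and the environment passed to `execve` is empty.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: forks a child that tries to execve argv[0].
 * Returns 1 if the execve succeeded, 0 if it failed, and -1 on error. */
int spawn_and_detect_exec(char *const argv[], pid_t *child_pid) {
    int fds[2];
    assert(argv != NULL && argv[0] != NULL && child_pid != NULL);
    if (pipe(fds) != 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Client side: the write end is close-on-exec, so a successful
         * execve closes it without anything being written. */
        char *const envp[] = { NULL };
        char failure = 1;
        close(fds[0]);
        fcntl(fds[1], F_SETFD, FD_CLOEXEC);
        execve(argv[0], argv, envp);
        /* Only reached if execve failed: report it with a single byte. */
        if (write(fds[1], &failure, 1) != 1) {
            /* nothing useful can be done here */
        }
        close(fds[1]);
        _exit(127);
    }

    /* Server side: wait until the pipe is readable or hung up, then read. */
    close(fds[1]);
    *child_pid = pid;
    struct pollfd pfd = { .fd = fds[0], .events = POLLIN, .revents = 0 };
    int result = -1;
    if (poll(&pfd, 1, -1) > 0) {
        char byte = 0;
        ssize_t n = read(fds[0], &byte, 1);
        if (n == 1)
            result = 0; /* a byte arrived: execve failed */
        else if (n == 0)
            result = 1; /* EOF with no data: execve succeeded */
    }
    close(fds[0]);
    return result;
}
```

For example, calling it with `{ "/bin/sh", "-c", "exit 0", NULL }` should report success on a typical Linux system, while a path that does not exist should report failure.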
Together with changes in libsystem_kernel, this commit allows startup to progress to vchroot doing its thing (vchrooting) and then executing launchd. launchd then proceeds to die when trying to open a kqueue (as this still uses the LKM API).
Most of the newly added functions are just stubs for MIG calls. However, we now properly initialize IPC and related subsystems and we now have copyin/copyout that allows basic `mach_msg_overwrite_trap` usage.
dyld now progresses to `getHostInfo` and successfully retrieves `host_info` with a kernel MIG call (and then proceeds to die on `mach_port_deallocate`, since it hasn't been updated yet).