The most important change here is the ability to perform `mmap` and `munmap` in managed Darling processes, which is enabled by the new S2C (server-to-client) call system.
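A Linux process generally cannot create mappings in another process's address space. An S2C call therefore lets the server ask the client to perform the `mmap` itself and report the result. The minimal sketch below shows that round trip over a Unix socket pair, running the server and client sides in one process for brevity. The message layout and every function name (`s2c_mmap_request`, `client_handle_s2c_mmap`, `server_send_mmap_request`, and so on) are hypothetical. They are not darlingserver's actual S2C protocol.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical S2C request: "please map this much anonymous memory". */
struct s2c_mmap_request {
    uint64_t length;
    int32_t protection;
    int32_t flags;
};

/* Hypothetical reply: the address the client got, or the errno it hit. */
struct s2c_mmap_reply {
    uint64_t address;
    int32_t error;
};

/* Server side, part 1: send the request to the client. */
int server_send_mmap_request(int sock, uint64_t length) {
    struct s2c_mmap_request req = {length, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS};
    return write(sock, &req, sizeof(req)) == (ssize_t)sizeof(req) ? 0 : -1;
}

/* Client side: perform the mmap in our own address space and reply. */
int client_handle_s2c_mmap(int sock) {
    struct s2c_mmap_request req;
    if (read(sock, &req, sizeof(req)) != (ssize_t)sizeof(req))
        return -1;
    struct s2c_mmap_reply reply = {0, 0};
    void *addr = mmap(NULL, (size_t)req.length, req.protection, req.flags, -1, 0);
    if (addr == MAP_FAILED)
        reply.error = errno;
    else
        reply.address = (uint64_t)(uintptr_t)addr;
    return write(sock, &reply, sizeof(reply)) == (ssize_t)sizeof(reply) ? 0 : -1;
}

/* Server side, part 2: collect the client's answer. */
int server_receive_mmap_reply(int sock, struct s2c_mmap_reply *out) {
    return read(sock, out, sizeof(*out)) == (ssize_t)sizeof(*out) ? 0 : -1;
}

/* Both ends in one process: returns 0 if the client mapped usable memory. */
int s2c_mmap_demo(void) {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    struct s2c_mmap_reply reply = {0, 0};
    int ok = server_send_mmap_request(sv[0], 4096) == 0 &&
             client_handle_s2c_mmap(sv[1]) == 0 &&
             server_receive_mmap_reply(sv[0], &reply) == 0 &&
             reply.error == 0 && reply.address != 0;
    if (ok) {
        /* The "client" is this same process, so we can touch the mapping directly. */
        char *p = (char *)(uintptr_t)reply.address;
        memset(p, 0x5a, 4096);
        ok = p[4095] == 0x5a && munmap(p, 4096) == 0;
    }
    close(sv[0]);
    close(sv[1]);
    return ok ? 0 : -1;
}
```

`s2c_mmap_demo()` returns 0 when the client-side mapping succeeds and is writable. An `munmap` request would presumably follow the same request/reply shape.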
Other notable changes:
* Move the server socket to the prefix root because launchctl clears `var/run` on startup
* Create an IPC importance structure for each duct-taped task; this is required by `ipc_importance_send`
* Initialize the MPSC thread deallocation daemon; this is also used by turnstiles
* Clean up a thread's timers and waits when destroying it
* Check whether we should actually block in `thread_block_parameter` before doing so; this helps avoid missed wakeups
* Support creating kernel threads without immediately starting them
* Update a thread's address when receiving a message from it; this fixes an issue where we kept an outdated thread address after a process performed an exec (since we reuse its main thread)
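The missed-wakeup fix in `thread_block_parameter` follows a familiar pattern. Declaring the intent to wait and actually blocking are separate steps, so a wakeup can land in between. The blocking step therefore has to re-check the wait state before sleeping. Below is a generic pthreads sketch of that pattern. `sim_thread`, `sim_assert_wait`, `sim_wakeup` and `sim_block` are hypothetical names loosely modeled on XNU's `assert_wait`/`thread_block`, not darlingserver's microthread code.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical wait states, loosely modeled on assert_wait/thread_block. */
enum wait_state { STATE_RUNNING, STATE_WAITING };

struct sim_thread {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    enum wait_state state;
};

void sim_thread_init(struct sim_thread *t) {
    pthread_mutex_init(&t->lock, NULL);
    pthread_cond_init(&t->cond, NULL);
    t->state = STATE_RUNNING;
}

/* Step 1: declare the intent to wait (like assert_wait). */
void sim_assert_wait(struct sim_thread *t) {
    pthread_mutex_lock(&t->lock);
    t->state = STATE_WAITING;
    pthread_mutex_unlock(&t->lock);
}

/* A waker clears the wait, possibly before the waiter has actually blocked. */
void sim_wakeup(struct sim_thread *t) {
    pthread_mutex_lock(&t->lock);
    t->state = STATE_RUNNING;
    pthread_cond_signal(&t->cond);
    pthread_mutex_unlock(&t->lock);
}

/* Step 2: block only if nobody has woken us since step 1.
 * Returns true if we actually had to sleep. */
bool sim_block(struct sim_thread *t) {
    bool slept = false;
    pthread_mutex_lock(&t->lock);
    while (t->state == STATE_WAITING) {
        slept = true;
        pthread_cond_wait(&t->cond, &t->lock);
    }
    pthread_mutex_unlock(&t->lock);
    return slept;
}

/* The wakeup arrives between assert_wait and block; returns whether we slept. */
bool early_wakeup_demo(void) {
    struct sim_thread t;
    sim_thread_init(&t);
    sim_assert_wait(&t);
    sim_wakeup(&t);
    bool slept = sim_block(&t);
    pthread_cond_destroy(&t.cond);
    pthread_mutex_destroy(&t.lock);
    return slept;
}
```

In `early_wakeup_demo()`, `sim_block` returns without sleeping. Calling `pthread_cond_wait` unconditionally there would wait forever, because the signal has already been sent.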
Mach port kqchannels allow libkqueue to listen for Mach port events that happen on the server side. Each channel is implemented as a socket pair that the client and the server use to communicate.
When the server receives an event, it sends a notification message on the socket, which makes the socket readable to the client and thus wakes up epoll. When the client is ready to read the event, it sends a message to the server asking it to read the event and send back the necessary data.
We do it this way (rather than proactively sending the event data over the socket) to closely mimic what kevent actually does when reading events. This matters even more when the client specifies `MACH_MSG_RECEIVE`, which asks the server to try to receive the message directly into a buffer (if there is enough space) when reading the event. In that case, we would *definitely* not want to read the Mach message before the client is actually ready for it. Doing so could starve others from reading the Mach message while the client hasn't even acknowledged the event yet. Worse, the client could die before reading our event, and that message would then be lost forever.
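The notify/request/reply exchange can be sketched with a `SOCK_SEQPACKET` socket pair and epoll, again running both ends in one process. The message types and their layout (`kqchan_msg`, `KQCHAN_NOTIFY`, and so on) are hypothetical. The point is the ordering. The server only announces the event and does not touch the underlying data until the client explicitly asks for it.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical message types for one kqchannel's socket pair. */
enum kqchan_msg_type { KQCHAN_NOTIFY = 1, KQCHAN_READ_REQUEST = 2, KQCHAN_EVENT_DATA = 3 };

struct kqchan_msg {
    uint32_t type;
    uint32_t payload; /* stand-in for the real event data */
};

static int send_msg(int fd, uint32_t type, uint32_t payload) {
    struct kqchan_msg m = {type, payload};
    return send(fd, &m, sizeof(m), 0) == (ssize_t)sizeof(m) ? 0 : -1;
}

static int recv_msg(int fd, struct kqchan_msg *m) {
    return recv(fd, m, sizeof(*m), 0) == (ssize_t)sizeof(*m) ? 0 : -1;
}

/* One full round trip; returns the payload the client received, or 0 on error. */
uint32_t kqchan_demo(uint32_t pending_event) {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) != 0)
        return 0;
    int server = sv[0], client = sv[1];
    uint32_t result = 0;
    struct kqchan_msg m;
    struct epoll_event ev = {.events = EPOLLIN, .data.fd = client};
    struct epoll_event out;
    int ep = epoll_create1(0);
    if (ep < 0 || epoll_ctl(ep, EPOLL_CTL_ADD, client, &ev) != 0)
        goto done;

    /* Server: an event arrived; only announce it, don't consume it yet. */
    if (send_msg(server, KQCHAN_NOTIFY, 0) != 0)
        goto done;

    /* Client: epoll wakes up because the socket became readable. */
    if (epoll_wait(ep, &out, 1, 1000) != 1 || out.data.fd != client)
        goto done;
    if (recv_msg(client, &m) != 0 || m.type != KQCHAN_NOTIFY)
        goto done;

    /* Client: now ready to read, so ask the server for the data. */
    if (send_msg(client, KQCHAN_READ_REQUEST, 0) != 0)
        goto done;

    /* Server: only now actually read the event and send it back. */
    if (recv_msg(server, &m) != 0 || m.type != KQCHAN_READ_REQUEST)
        goto done;
    if (send_msg(server, KQCHAN_EVENT_DATA, pending_event) != 0)
        goto done;

    if (recv_msg(client, &m) == 0 && m.type == KQCHAN_EVENT_DATA)
        result = m.payload;
done:
    if (ep >= 0)
        close(ep);
    close(server);
    close(client);
    return result;
}
```

`kqchan_demo(42)` returns 42 after one complete notify, request and reply cycle.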
In other news, many parts of the code have been updated so that they now function properly.
For example, all of the direct Mach traps can now call `thread_syscall_return`, which allows things like semaphores to work.
Timers (via `timer_call`) also work now and have been tested in conjunction with timed semaphore waits.
Threads can now impersonate other threads for the purpose of running duct-taped code. The primary use case is running code in a kernel microthread while pretending to be a user microthread (kqchan does this, for example). Impersonation makes `current_map()` and friends return the information for the impersonated thread (useful for, e.g., `copyout`).
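The following is a minimal sketch of impersonation. It assumes a per-OS-thread pointer to the running microthread plus an optional impersonation target. All types and functions here (`dt_thread`, `dt_current_map`, `dt_thread_impersonate`, and so on) are simplified, hypothetical stand-ins for the duct-tape structures, not the real XNU definitions.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for duct-tape structures. */
struct dt_vm_map { int owner_pid; };
struct dt_task { struct dt_vm_map *map; };
struct dt_thread {
    struct dt_task *task;
    struct dt_thread *impersonating; /* NULL when not impersonating */
};

/* The microthread actually running on this OS thread. */
static _Thread_local struct dt_thread *running_thread;

void dt_set_running_thread(struct dt_thread *t) { running_thread = t; }

/* Like current_thread(): report the impersonated thread if there is one. */
struct dt_thread *dt_current_thread(void) {
    struct dt_thread *t = running_thread;
    return (t && t->impersonating) ? t->impersonating : t;
}

/* Like current_map(): follows impersonation through dt_current_thread(). */
struct dt_vm_map *dt_current_map(void) {
    struct dt_thread *t = dt_current_thread();
    return t ? t->task->map : NULL;
}

/* Start impersonating `target` (NULL stops); returns the previous target. */
struct dt_thread *dt_thread_impersonate(struct dt_thread *target) {
    struct dt_thread *prev = running_thread->impersonating;
    running_thread->impersonating = target;
    return prev;
}

/* A "kernel" thread impersonates a user thread; returns the pid seen meanwhile. */
int impersonation_demo(void) {
    struct dt_vm_map kernel_map = {0}, user_map = {1234};
    struct dt_task kernel_task = {&kernel_map}, user_task = {&user_map};
    struct dt_thread kernel_thread = {&kernel_task, NULL};
    struct dt_thread user_thread = {&user_task, NULL};

    dt_set_running_thread(&kernel_thread);
    struct dt_thread *prev = dt_thread_impersonate(&user_thread);
    int seen = dt_current_map()->owner_pid; /* e.g. copyout would use this map */
    dt_thread_impersonate(prev);
    int restored = dt_current_map() == &kernel_map;
    dt_set_running_thread(NULL);
    return restored ? seen : -1;
}
```

While impersonating, `dt_current_map()` yields the user task's map, so a `copyout`-style helper would target the user's address space. Restoring the previous target switches back to the kernel map.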
Implement some general RPC calls (corresponding to calls from the LKM): `mach_port_deallocate`, `thread_set_handles`, `uidgid` (a combination of `get_uidgid` and `set_uidgid`), and `vchroot`.
We now also have some RPC calls that pass descriptors. Surprisingly, the code I had previously written for this was *almost* functional (just 2 minor generation and compilation errors). However, that code has only been tested for sending FDs from clients to the server, not vice versa, so the other direction might still have issues.
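On Linux, descriptors are normally passed between processes as an `SCM_RIGHTS` control message on a Unix domain socket, and the generated RPC code presumably builds on that mechanism. The helpers below (`send_fd`, `recv_fd`) are an illustrative sketch, not darlingserver's generated code. The demo passes a pipe's write end from a "client" socket to a "server" socket.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one descriptor over a Unix socket, alongside a 1-byte payload. */
int send_fd(int sock, int fd) {
    char byte = 0;
    struct iovec iov = {.iov_base = &byte, .iov_len = 1};
    union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } u;
    memset(&u, 0, sizeof(u));
    struct msghdr msg = {0};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor; returns the new fd (installed by the kernel) or -1. */
int recv_fd(int sock) {
    char byte;
    struct iovec iov = {.iov_base = &byte, .iov_len = 1};
    union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } u;
    struct msghdr msg = {0};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET || cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/* Client passes a pipe's write end to the "server", which writes through it. */
int fd_passing_demo(void) {
    int sv[2], p[2];
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) != 0)
        return -1;
    if (pipe(p) != 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    int received = send_fd(sv[0], p[1]) == 0 ? recv_fd(sv[1]) : -1;
    char c = 0;
    int ok = received >= 0 && write(received, "x", 1) == 1 &&
             read(p[0], &c, 1) == 1 && c == 'x';
    if (received >= 0)
        close(received);
    close(p[0]);
    close(p[1]);
    close(sv[0]);
    close(sv[1]);
    return ok ? 0 : -1;
}
```

`fd_passing_demo()` returns 0 when a byte written through the received descriptor comes out of the original pipe.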
A few fixes have also been made in the duct-tape code. For example, tasks now handle audit and security tokens as we used to in the LKM, and they properly initialize and destroy their semaphore queues. Both threads and tasks now properly free their allocated structures.
More importantly, threads and tasks are now properly destroyed. To do this, a "kernel" microthread had to be introduced to perform "kernel" work on behalf of the managing code, since certain duct-tape destruction operations expect to be running in a microthread context. This had to be a dedicated microthread because the managing code can't use thread calls either: those already expect a microthread context.
The server can now easily monitor arbitrary descriptors using Monitors, and process monitoring has been converted to this system as well.
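One way to build such Monitors is to register each descriptor with epoll and store a pointer to the monitor object in `epoll_event.data.ptr`, so the event loop can dispatch straight to a callback. The `monitor` struct and helpers below are a hypothetical sketch of that design, not darlingserver's actual Monitor class.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical Monitor: a descriptor plus a callback to run when it becomes ready. */
struct monitor {
    int fd;
    void (*on_event)(struct monitor *self, uint32_t events);
    void *context;
};

/* Register a monitor; epoll hands the pointer back to us in data.ptr. */
int monitor_add(int epfd, struct monitor *m, uint32_t events) {
    struct epoll_event ev = {.events = events, .data.ptr = m};
    return epoll_ctl(epfd, EPOLL_CTL_ADD, m->fd, &ev);
}

/* One iteration of the event loop: dispatch every monitor that is ready. */
int monitor_poll_once(int epfd, int timeout_ms) {
    struct epoll_event evs[16];
    int n = epoll_wait(epfd, evs, 16, timeout_ms);
    for (int i = 0; i < n; i++) {
        struct monitor *m = evs[i].data.ptr;
        m->on_event(m, evs[i].events);
    }
    return n;
}

/* Example callback: count how often the descriptor became readable. */
static void count_readable(struct monitor *self, uint32_t events) {
    if (events & EPOLLIN)
        ++*(int *)self->context;
}

/* Watches a pipe, writes one byte into it, and returns the callback count. */
int monitor_demo(void) {
    int p[2], hits = 0, result = -1;
    int epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;
    if (pipe(p) != 0) {
        close(epfd);
        return -1;
    }
    struct monitor m = {p[0], count_readable, &hits};
    if (monitor_add(epfd, &m, EPOLLIN) == 0 && write(p[1], "!", 1) == 1 &&
        monitor_poll_once(epfd, 1000) == 1)
        result = hits;
    close(p[0]);
    close(p[1]);
    close(epfd);
    return result;
}
```

Processes fit the same model. On Linux 5.3 and later, `pidfd_open` returns a descriptor that becomes readable when the process exits, so a process could be registered like any other Monitor. Whether darlingserver uses pidfds this way is an assumption here.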
Most importantly, however, we can now detect `execve`s. libsystem_kernel opens a close-on-exec pipe and sends the read end to the server. When `execve` succeeds, the write end is simply closed (because it is close-on-exec). When `execve` fails, libsystem_kernel writes a single byte to the pipe and then closes it. On the server side, we listen for a hang-up, which indicates that the write end of the pipe has been closed. If we can then read a byte, we know the `execve` failed; if we read nothing (EOF), we know it succeeded.
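The pipe protocol can be demonstrated end to end with `fork`. The parent plays the server and the child plays libsystem_kernel. Inheriting the read end across `fork` stands in for sending it over the socket. This is a sketch under those assumptions, and `exec_with_report`, `exec_succeeded` and `run_and_check` are hypothetical names. Polling with an empty event mask means `poll` only wakes for the hang-up, which is always reported.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child ("libsystem_kernel") side: exec, and leave a byte behind if it fails.
 * report_fd was created with O_CLOEXEC, so a successful exec closes it. */
static _Noreturn void exec_with_report(int report_fd, const char *path, char *const argv[]) {
    execv(path, argv);
    char failed = 1;
    if (write(report_fd, &failed, 1) < 0) {
        /* nothing else we can report from here */
    }
    close(report_fd);
    _exit(127);
}

/* Server side: wait for the hang-up, then check what is left in the pipe.
 * Returns 1 if the exec succeeded, 0 if it failed, -1 on error. */
int exec_succeeded(int read_fd) {
    struct pollfd pfd = {.fd = read_fd, .events = 0}; /* POLLHUP is reported regardless */
    while (poll(&pfd, 1, -1) < 0) {
        if (errno != EINTR)
            return -1;
    }
    if (!(pfd.revents & POLLHUP))
        return -1;
    char byte;
    ssize_t n = read(read_fd, &byte, 1);
    if (n < 0)
        return -1;
    return n == 0 ? 1 : 0; /* EOF: exec succeeded; one byte: exec failed */
}

/* Demo driver: runs `path` in a child and reports what the server side saw. */
int run_and_check(const char *path) {
    int p[2];
    if (pipe2(p, O_CLOEXEC) != 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    if (pid == 0) {
        close(p[0]);
        char *const argv[] = {(char *)path, (char *)"-c", (char *)"exit 0", NULL};
        exec_with_report(p[1], path, argv);
    }
    close(p[1]); /* drop our copy of the write end so the hang-up can happen */
    int result = exec_succeeded(p[0]);
    close(p[0]);
    waitpid(pid, NULL, 0);
    return result;
}
```

`run_and_check("/bin/sh")` reports success (1), while a nonexistent path reports failure (0). In this fork-based demo, the parent must close its inherited copy of the write end. Otherwise the hang-up would never arrive.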
Together with changes in libsystem_kernel, this commit allows startup to progress far enough for vchroot to do its thing (vchrooting) and then execute launchd. launchd then dies when trying to open a kqueue, since that still uses the LKM API.