This is actually a valid state for `thread_block_parameter` to enter.
If the caller gave us a continuation but we were unable to wait, we
should simply invoke the continuation with the wait result, much like
we would if we were returning the result.
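
As a rough illustration of that rule (every type and name here is a hypothetical stand-in, not the actual duct-tape or XNU definitions), the "unable to wait" path might look something like this:

```cpp
#include <functional>

// Hypothetical stand-ins; only the control flow is the point of this sketch.
enum class WaitResult { Awakened, Interrupted, NotWaiting };
using Continuation = std::function<void(WaitResult)>;

struct Thread {
    bool waitPending = false;                       // was a wait actually set up?
    WaitResult waitResult = WaitResult::NotWaiting; // result to report if not
};

// Placeholder for really suspending the thread (assumption: returns once woken).
inline WaitResult suspendThread(Thread&) { return WaitResult::Awakened; }

WaitResult blockWithContinuation(Thread& thread, const Continuation& continuation) {
    // If we cannot wait, the stored wait result is what the caller should see.
    WaitResult result = thread.waitPending ? suspendThread(thread)
                                           : thread.waitResult;
    if (continuation) {
        // Caller asked for a continuation: hand it the result instead of
        // treating "could not wait" as an invalid state.
        continuation(result);
    }
    return result;
}
```

In real XNU a continuation never returns to its caller; the sketch returns the result afterwards only so that it stays testable.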
The main debugging code added is for keeping track of port names and
their associated IPC objects, as well as keeping track of the members of
port sets.
Additionally, when extended debugging is enabled, the server can now
wait for a debugger with the new env var `DSERVER_WAIT4DEBUGGER`.
The AsyncWriter class was originally written for some additional
debugging code I wrote but later decided wouldn't really be useful.
I kept the AsyncWriter class, however, as it seems it might be useful
for future code (it's basically fire-and-forget asynchronous writing).
Note that it has not been tested at all.
We now handle the sigexc calls as normal calls, with the exception that
it's okay for them to become active while another call is active.
We also set the thread's wait result to THREAD_INTERRUPTED and handle
syscall returns in interrupted continuations by jumping back to the
sigexc_enter code.
It doesn't support memory sharing or copying to a map other than the
current task yet. However, the LKM didn't support the latter case either,
so the only thing we're really missing is the ability to create a shared
region from a previously private one.
This function works properly now.
Additionally, add some more debug code to Mach port kqchannels,
but remove some debug code from `misc.c` (it causes issues with ASAN).
So, when I discovered the mistake in `knote_post`, at first,
I was baffled at why we were even getting events if we were checking for
them and returning if there *were* events.
As it turns out, because the event checks for single ports and portsets
were switched, `dtape_kqchan_mach_port_has_events` was always returning `false`.
I fixed the issue there before I found the issue in `knote_post`,
so when I did that, suddenly, `knote_post` wasn't generating any
events. "WTF," I wondered. I spent *hours* trying to figure out why the
hell `knote_post` wouldn't return any events. Turns out that I was
missing an exclamation mark.
In the end, I decided to remove that check altogether, since the worst
that can happen is a spurious event: we notify userspace, userspace asks
for an event we don't have, and we simply tell it so.
And with this commit, we can now reliably and consistently enter a shell
using darlingserver! 🎉
Thanks to the recently enabled assertions, it turns out that we actually need to disable IPC in tasks and threads before terminating them.
Additionally, let's lock threads when we should and not do so when we shouldn't. This should fix some thread synchronization issues I was running into.
* Delete duct-tape/Makefile (it's a copy of the LKM's Makefile)
* Add a slash between the prefix directory and the socket filename so the socket is actually *in* the prefix and not next to it.
* Disable console logging by default and introduce a `DSERVER_LOG_STDERR` env var to optionally enable it
* EPIPE means the peer of the first message in a message queue died, so just drop the message; the main event loop will soon notice (via pidfd) that the peer died.
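
To illustrate the socket path fix from the list above, here is a minimal sketch of joining the prefix and the socket filename with exactly one slash. The function name and the filename used in the usage note are made up for illustration; they are not darlingserver's actual identifiers.

```cpp
#include <string>

// Hypothetical helper: build "<prefix>/<filename>" without doubling the slash
// when the prefix already ends in one.
std::string socketPathInPrefix(const std::string& prefix, const std::string& filename) {
    if (!prefix.empty() && prefix.back() == '/') {
        return prefix + filename;
    }
    return prefix + "/" + filename;
}
```

Without the separator, a prefix of `/home/user/.darling` and a filename of `server.sock` would produce `/home/user/.darlingserver.sock`, which sits next to the prefix rather than inside it.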
We use a kernel waiter thread to wait on the port set waitq. This is probably too much just for waiting on a port set; however, this is the best non-invasive solution. The other way to do this would be to modify the mqueue code to perform KNOTE on a port's port set (which is not easy to find; you'd have to walk through the port's waitq's waitq_sets looking for something that looks like a portset waitq_set).
Additionally, when a peer asks to read a Mach port kqchan and there are no events available, report it like we do for process kqchannels.
S2C calls were always failing because `_s2cPerform` was moving `_s2cReply` into a local variable (as it should) but then using `_s2cReply` (which is invalidated by the move) for error checking and returning that value instead of the moved local variable.
Also, copyinmap/copyout had the order of the arguments to memmove mixed up for the kernel_map case.
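
The `_s2cPerform` bug is a classic use-after-move. The following sketch reproduces its shape with a `std::unique_ptr`, which is guaranteed to be null after being moved from; the `Reply` and `Call` types and their members are hypothetical and only loosely modelled on the description above.

```cpp
#include <memory>
#include <utility>

struct Reply { int status; };  // hypothetical reply payload

struct Call {
    std::unique_ptr<Reply> pendingReply;  // stand-in for `_s2cReply`

    // Buggy shape: takes the reply into a local, then inspects the member,
    // which is now empty, so the call always looks like it failed.
    int performBuggy() {
        std::unique_ptr<Reply> reply = std::move(pendingReply);
        if (!pendingReply) return -1;
        return pendingReply->status;
    }

    // Fixed shape: every check and the return value use the moved-into local.
    int performFixed() {
        std::unique_ptr<Reply> reply = std::move(pendingReply);
        if (!reply) return -1;
        return reply->status;
    }
};
```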
Kernel threads can now be created and started in two separate actions (and it actually works now).
Also, this means we can remove the stupid hack we had in dtape_thread_enter (that didn't even work); we should always clear TH_WAIT when the thread is going to run.
The most important change here is the ability to perform `mmap` and `munmap` in managed Darling processes. This is enabled via the new S2C call system.
Other notable changes:
* Move the server socket to the prefix root because launchctl clears `var/run` on startup
* Create an IPC importance structure for each duct-taped task; this is required by `ipc_importance_send`
* Initialize the MPSC thread deallocation daemon; this is also used by turnstiles
* Clean up a thread's timers and waits when destroying it
* Check whether we should actually block in `thread_block_parameter` before doing so; this helps avoid missed wakeups
* Support creating kernel threads without immediately starting them
* Update a thread's address when receiving a message from it; this fixes an issue with keeping an outdated thread address when a process performs an exec (since we re-use its main thread)
Largely just ported over from the current LKM code.
Also, set XNU_TARGET_OS_OSX=1 to fix an incorrect default setting that was causing Mach messaging to fail trying to send a task control port (the task self port).
Additionally, with regard to RPC:
* Send architecture information along with RPC calls
* Log replies sent on the server side
* Allow replies expecting FDs to handle the case when no valid FDs were sent back
This is almost a direct port of what we were previously doing in the LKM, except that we need to use duct-taped semaphores in order to put the calling microthread to sleep (rather than a real semaphore that would put the worker thread to sleep).
Mach port kqchannels allow libkqueue to listen in on Mach port events that happen on the server side. The implementation consists of a socket pair used to communicate between the client and the server for that particular channel.
When the server receives an event, it sends a notification message on the socket, which makes the socket readable to the client and thus wakes up epoll. When the client is ready to read the event, it sends a message to the server asking it to read the event and send back the necessary data.
It works this way (rather than proactively sending the event data over the socket) to closely mimic what kevent actually does when reading events. This is even more important when the client specifies MACH_MSG_RECEIVE, which asks the server to try to receive the message directly into a buffer (if there is enough space) when reading the event. In that case, we would *definitely* not want to read the Mach message before the client is actually ready to do so, as it could starve others from reading the Mach message while the client hasn't even acknowledged the event yet--or worse, the client could have died before reading our event and that message is now lost forever.
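
The notify-then-read handshake can be sketched with a plain socket pair. This is only an illustration: the message types, the `ReadReply` layout, the use of `SOCK_SEQPACKET`, and all function names are assumptions, not darlingserver's real wire format, and both ends live in one process here for simplicity.

```cpp
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cstdint>
#include <deque>

// Hypothetical message types and reply layout.
enum MessageType : uint8_t { Notify = 1, ReadRequest = 2 };
struct ReadReply { uint8_t hasEvent; uint32_t event; };

struct Channel {
    int clientFd = -1;
    int serverFd = -1;
    std::deque<uint32_t> pending;  // events stay on the server until requested
};

bool openChannel(Channel& ch) {
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, fds) != 0) return false;
    ch.clientFd = fds[0];
    ch.serverFd = fds[1];
    return true;
}

void closeChannel(Channel& ch) { close(ch.clientFd); close(ch.serverFd); }

// Server: queue the event and send only a wakeup, never the event data.
bool serverPostEvent(Channel& ch, uint32_t event) {
    ch.pending.push_back(event);
    uint8_t msg = Notify;
    return send(ch.serverFd, &msg, sizeof(msg), 0) == 1;
}

// Client: wait until the socket is readable (what epoll would report),
// then consume the wakeup message.
bool clientWaitForNotification(Channel& ch, int timeoutMs) {
    pollfd pfd{};
    pfd.fd = ch.clientFd;
    pfd.events = POLLIN;
    if (poll(&pfd, 1, timeoutMs) != 1) return false;
    uint8_t msg = 0;
    return recv(ch.clientFd, &msg, sizeof(msg), 0) == 1 && msg == Notify;
}

// Client: tell the server we are now ready to read the event.
bool clientRequestRead(Channel& ch) {
    uint8_t msg = ReadRequest;
    return send(ch.clientFd, &msg, sizeof(msg), 0) == 1;
}

// Server: only now dequeue the event; report "no event" if nothing is queued.
bool serverHandleReadRequest(Channel& ch) {
    uint8_t msg = 0;
    if (recv(ch.serverFd, &msg, sizeof(msg), 0) != 1 || msg != ReadRequest) return false;
    ReadReply reply{0, 0};
    if (!ch.pending.empty()) {
        reply.hasEvent = 1;
        reply.event = ch.pending.front();
        ch.pending.pop_front();
    }
    return send(ch.serverFd, &reply, sizeof(reply), 0) == static_cast<ssize_t>(sizeof(reply));
}

// Client: returns false when the server reported that no event was available.
bool clientReceiveEvent(Channel& ch, uint32_t& out) {
    ReadReply reply{0, 0};
    if (recv(ch.clientFd, &reply, sizeof(reply), 0) != static_cast<ssize_t>(sizeof(reply))) return false;
    if (!reply.hasEvent) return false;
    out = reply.event;
    return true;
}
```

Because the event is dequeued only in `serverHandleReadRequest`, a client that dies after the wakeup but before asking to read leaves the event on the server side, which is the property the paragraph above is after.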
In other news, many different parts of the code have been updated to function properly now.
For example, all of the direct Mach traps can call thread_syscall_return now. This allows things like semaphores to work.
Timers (with timer_call) are also working now and have been tested in conjunction with timed semaphore waits.
Threads are now able to impersonate other threads for the purposes of running duct-taped code. The primary use case for this is running code in a kernel microthread while pretending to be a user microthread (e.g. kqchan does this). This makes current_map() and friends return the information for the thread we're impersonating (useful for e.g. copyout).
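
One way to picture impersonation is a scoped guard that redirects "current thread" lookups. The sketch below is a simplified model with hypothetical types (`Map`, `Thread`, `ImpersonationGuard`, `g_currentThread`); it does not reflect how the duct-tape code actually stores this state.

```cpp
// Hypothetical, heavily simplified stand-ins for the real structures.
struct Map { int id; };

struct Thread {
    Map* map = nullptr;
    Thread* impersonating = nullptr;  // thread we currently pretend to be, if any
};

// The thread actually running on this worker (assumption: tracked per worker).
thread_local Thread* g_currentThread = nullptr;

// Lookups resolve through the impersonated thread when one is set.
Thread* effectiveThread() {
    Thread* t = g_currentThread;
    return (t && t->impersonating) ? t->impersonating : t;
}

Map* currentMap() {
    Thread* t = effectiveThread();
    return t ? t->map : nullptr;
}

// RAII guard: impersonate `target` for the lifetime of the guard and restore
// the previous state afterwards, even on early return.
class ImpersonationGuard {
    Thread& self_;
    Thread* previous_;
public:
    ImpersonationGuard(Thread& self, Thread& target)
        : self_(self), previous_(self.impersonating) {
        self_.impersonating = &target;
    }
    ~ImpersonationGuard() { self_.impersonating = previous_; }
    ImpersonationGuard(const ImpersonationGuard&) = delete;
    ImpersonationGuard& operator=(const ImpersonationGuard&) = delete;
};
```

With a guard in scope, a copyout-style helper that calls `currentMap()` would operate on the user thread's map even though a kernel microthread is doing the work.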
This commit adds a bunch of RPC calls, mostly XNU trap calls (calls that go directly to duct-taped XNU Mach trap calls).
The wrapper generator can now automatically generate server-side wrapper/boilerplate code for these XNU trap calls.
These calls have not yet been tested and some (most of the non-IPC calls) probably require functions that haven't been implemented yet.
Implement some general RPC calls (corresponding to calls from the LKM): mach_port_deallocate, thread_set_handles, uidgid (a combination of get_uidgid and set_uidgid), and vchroot.
Additionally, we now have some RPC calls that do pass descriptors. Surprisingly, the code I had previously written was *almost* functional (just 2 minor generation and compilation errors). However, that code has only been tested for sending FDs from clients to the server, not vice versa, so the other direction might have issues.
Additionally, a few fixes have been made in the duct-tape code. For example, tasks now handle audit and security tokens like we used to do in the LKM. They also properly initialize and destroy their semaphore queues. Both threads and tasks now properly free their allocated structures.
More importantly, threads and tasks are now properly destroyed. In order to do this, a "kernel" microthread had to be introduced to perform "kernel" work from the managing code (since certain duct-tape destruction operations expect to be running in a microthread context). Additionally, this had to be an additional microthread because the managing code can't use thread calls, since those already expect a microthread context.
The Server can now easily monitor arbitrary descriptors using Monitors. Process monitoring has been converted to this system as well.
Most importantly, however, we can now detect `execve`s. libsystem_kernel opens a close-on-exec pipe and sends the read end to the server. When `execve` succeeds, the pipe is simply closed. When `execve` fails, libsystem_kernel writes a single byte to the pipe and then closes it. On the server side, we listen for a hang-up (this indicates the write end of the pipe has been closed). If we are able to read a byte, we know the execve failed; otherwise, if we read nothing (EOF), then we know it succeeded.
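
The close-on-exec pipe trick can be demonstrated in a single program. In this sketch the parent process plays the server's role and a forked child plays libsystem_kernel's; in darlingserver the read end is instead sent to the server over its socket. The function name is hypothetical, and `execv` stands in for `execve`.

```cpp
#include <fcntl.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cerrno>

// Returns true if the child's exec succeeded, false if it failed
// (or if the pipe or fork could not be set up).
bool execSucceeded(char* const argv[]) {
    int fds[2];
    if (pipe(fds) != 0) return false;
    // Close-on-exec: a successful exec closes the write end automatically.
    fcntl(fds[0], F_SETFD, FD_CLOEXEC);
    fcntl(fds[1], F_SETFD, FD_CLOEXEC);

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return false;
    }
    if (pid == 0) {
        close(fds[0]);
        execv(argv[0], argv);
        // Only reached when exec failed: report it with a single byte.
        char failed = 1;
        if (write(fds[1], &failed, 1) != 1) { /* nothing more we can do */ }
        _exit(127);
    }

    // "Server" side: drop our copy of the write end, then wait for either a
    // byte (exec failed) or EOF (write end closed by a successful exec).
    close(fds[1]);
    char byte = 0;
    ssize_t n;
    do {
        n = read(fds[0], &byte, 1);
    } while (n < 0 && errno == EINTR);
    close(fds[0]);

    int status = 0;
    waitpid(pid, &status, 0);
    return n == 0;
}
```

The blocking `read` here replaces the hang-up notification that an event loop would wait for; the byte-versus-EOF decision is the same.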
Together with changes in libsystem_kernel, this commit allows startup to progress to vchroot doing its thing (vchrooting) and then executing launchd. launchd then proceeds to die when trying to open a kqueue (as this still uses the LKM API).
Most of the newly added functions are just stubs for MIG calls. However, we now properly initialize IPC and related subsystems and we now have copyin/copyout that allows basic `mach_msg_overwrite_trap` usage.
dyld now progresses to `getHostInfo` and successfully retrieves `host_info` with a kernel MIG call (and then proceeds to die on `mach_port_deallocate`, since it hasn't been updated yet).