Fixes out-of-bounds issues when running Darling on a device/virtual machine that reports more than 64 cores. Fixes #7
Co-Authored-By: Janrupf <business.janrupf@gmail.com>
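As a rough illustration of the class of bug fixed by the core-count
change above, here is a minimal sketch that clamps the reported core
count to a fixed table capacity. The names (`kMaxCpus`, `PerCpuTable`,
`clamp_cpu_count`, `init_per_cpu`) are hypothetical, and the actual fix
may use a different strategy (e.g. dynamic allocation):

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// Hypothetical fixed capacity; the real limit and data layout may differ.
constexpr std::size_t kMaxCpus = 64;

// Hypothetical per-CPU table with a fixed number of slots.
struct PerCpuTable {
    std::array<int, kMaxCpus> slots{};
    std::size_t active = 0; // number of slots actually in use
};

// Clamp the host-reported core count so later indexing stays in bounds.
inline std::size_t clamp_cpu_count(std::size_t reported) {
    return std::min(reported, kMaxCpus);
}

// Initialize only as many slots as the table can hold.
inline void init_per_cpu(PerCpuTable& table, std::size_t reported) {
    table.active = clamp_cpu_count(reported);
    for (std::size_t i = 0; i < table.active; ++i) {
        table.slots[i] = static_cast<int>(i); // placeholder per-CPU state
    }
}
```

With this sketch, a host reporting 128 cores would only ever touch the
first 64 slots instead of writing past the end of the table.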
This new tool (`dserverdbg`) runs on the host but connects to
darlingserver and makes unmanaged calls to retrieve debugging
information.
The initial set of subcommands available in this tool are `ps`,
`lsport`, `lspset`, and `lsmsg`:
* `ps` lists processes currently registered with the server and how
many Mach ports they have
* `lsport` lists the ports of a given process (via PID) and their
rights and message counts (for receive rights)
* `lspset` lists the members of a given portset (via PID and port
name) and provides the same information about each port as `lsport`
* `lsmsg` lists the messages of a given port (via PID and port name),
providing sender PID (if available) and size
This tool may be expanded later to allow e.g. modifying logging settings
while darlingserver is running or perhaps searching through and
filtering the logs.
Code was copied over from xnu/osfmk/vm/vm_user.c. I'll admit that I'm not sure if this is the right approach, but it seems to allow me to progress with debugging notifyd.
One significant change made here is that lck_mtx structures now directly
contain the internals of dtape_mutex structures. This was changed
because the old way of storing them in a malloc'ed object led to memory
leaks.
The problem is that there's a lot of XNU code that uses simple locks and
does not destroy them (because it doesn't need to in the XNU
implementation). Since the only structure that really cares about the
lock size is the waitq structure, we just patch that up. Besides, we
had modified the waitq structure in the LKM before and nothing blew up,
so this should be fine.
We were previously always updating the timer deadline. This meant that,
when a later deadline than the current one came along, we would update
the deadline to the later one. In effect, we were scheduling a timer for
the latest deadline available rather than the earliest.
The fix involves keeping track of the current deadline and not updating
it if the new deadline is later than the current one. There is an option
to override this behavior, however, because sometimes the timer_call code
changes the deadline on us to a later time and we *do* want to update it
when it tells us to do so explicitly. For example, the deadline returned
by timer_queue_expire is definitive: that's definitely the next deadline
we want. The deadline passed to timer_queue_assign, on the other hand,
is merely a suggestion.
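The deadline logic described above can be sketched roughly as follows.
This is only an illustration under assumptions: `HostTimer`,
`arm_timer`, and the `force` parameter are hypothetical names, not the
identifiers used in the actual code.

```cpp
#include <cstdint>

// Hypothetical stand-in for the state behind the host timer.
struct HostTimer {
    bool armed = false;         // whether a deadline is currently scheduled
    std::uint64_t deadline = 0; // only meaningful when `armed` is true
};

// Returns true if the timer was (re)armed with `new_deadline`.
//
// Without `force`, a deadline that is not earlier than the current one is
// treated as a suggestion and ignored. With `force`, the new deadline is
// definitive and always replaces the current one, even if it is later.
inline bool arm_timer(HostTimer& timer, std::uint64_t new_deadline, bool force) {
    if (!force && timer.armed && new_deadline >= timer.deadline) {
        return false; // keep the earlier deadline that is already scheduled
    }
    timer.armed = true;
    timer.deadline = new_deadline;
    return true;
}
```

In terms of the text above, a caller on the timer_queue_assign path
would pass `force = false`, while a caller acting on the result of
timer_queue_expire would pass `force = true`.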
This fixes some crashes with syslogd because the mqueue was vanishing
and calling knote_vanish, indicating its klist was going to be emptied.
However, since we weren't storing this flag in the knote,
filt_machportdetach thought the knote was still attached and tried to
detach it, causing a NULL pointer access.
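A simplified model of the vanish/detach interaction described above
might look like this. All types and functions here (`Knote`, `KList`,
`attach`, `vanish_all`, `detach`) are hypothetical stand-ins for
illustration and do not mirror the real XNU structures:

```cpp
#include <algorithm>
#include <vector>

struct Knote;

// Hypothetical list of knotes attached to an event source.
struct KList {
    std::vector<Knote*> notes;
};

// Hypothetical knote; `vanished` models the flag that was not being stored.
struct Knote {
    KList* list = nullptr;
    bool vanished = false;
};

inline void attach(KList& list, Knote& kn) {
    kn.list = &list;
    kn.vanished = false;
    list.notes.push_back(&kn);
}

// The source is going away: mark every knote as vanished, then empty the list.
inline void vanish_all(KList& list) {
    for (Knote* kn : list.notes) {
        kn->vanished = true;
        kn->list = nullptr;
    }
    list.notes.clear();
}

// Returns true if the knote was actually removed from a list.
inline bool detach(Knote& kn) {
    if (kn.vanished || kn.list == nullptr) {
        return false; // source already vanished (or never attached)
    }
    auto& notes = kn.list->notes;
    notes.erase(std::remove(notes.begin(), notes.end(), &kn), notes.end());
    kn.list = nullptr;
    return true;
}
```

The key point is the early return in `detach`: once the flag is
recorded on the knote, the detach path no longer touches a list that
has already gone away.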
What this means is that we no longer release and destroy Thread and
Process instances when the threads and processes they manage die.
Instead, we keep them alive to perform some cleanup (like finishing
active calls).
This should fix the duct-tape panic where threads and tasks are still
referenced at death.
Best of all, there don't seem to be any leaks with this approach: for
each `process dying` or `thread dying` message in the log, there's a
`process being destroyed` or `thread being destroyed` message later
on. This means we're not leaking any processes or threads.
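The lifetime behavior described above can be illustrated with a small
sketch, assuming reference-counted ownership via `std::shared_ptr`. The
`Process` class, `Log` struct, and `count_lines` helper below are
simplified, hypothetical stand-ins rather than the server's real types:

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Stand-in for the server log so the lifetime messages can be inspected.
struct Log {
    std::vector<std::string> lines;
};

// Hypothetical simplification of a server-side Process object.
class Process {
public:
    explicit Process(Log& log) : log_(log) {}
    ~Process() { log_.lines.push_back("process being destroyed"); }

    // Called when the managed process dies; the object itself stays alive
    // for as long as anything (e.g. an active call) still references it.
    void notifyDead() { log_.lines.push_back("process dying"); }

private:
    Log& log_;
};

// Counts how many log lines match `message`.
inline std::size_t count_lines(const Log& log, const std::string& message) {
    std::size_t n = 0;
    for (const auto& line : log.lines) {
        if (line == message) {
            ++n;
        }
    }
    return n;
}
```

In this model, if an active call holds its own `std::shared_ptr<Process>`
while the registry drops its reference at death, the `process being
destroyed` line only appears once the call releases its reference, and
the counts of the two messages match afterwards.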
This commit allows Darling processes to convert private memory in other
Darling processes into shared memory that they can access. This is
necessary, e.g. for LLDB.
They always used the current task, but that's not always the right one.
LLDB, for example, calls mach_vm_region_recurse with the map of the task
it's debugging.
This is only a subset of its actual behavior, but this is all that the
LKM supported and everything (read: LLDB) seemed to run fine with that,
so that should be enough for us as well.
This is actually a valid state for `thread_block_parameter` to enter.
If the caller gave us a continuation but we were unable to wait, we
should simply invoke the continuation with the wait result, much like
we would if we were returning the result.
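A minimal sketch of that "could not wait" path, under assumptions:
`WaitResult`, `Continuation`, and `finish_without_waiting` are
hypothetical names for illustration. Unlike a real XNU continuation,
which never returns to its caller, the continuation in this sketch is
an ordinary function that returns:

```cpp
// Hypothetical subset of wait results.
enum class WaitResult { Awakened, TimedOut, Interrupted };

// Hypothetical continuation type: receives the caller's parameter and the
// wait result, mirroring how the result would otherwise be returned.
using Continuation = void (*)(void* parameter, WaitResult result);

// Models only the case where the thread was unable to wait.
// If a continuation was supplied, it receives the wait result; otherwise
// the result is simply returned to the caller.
inline WaitResult finish_without_waiting(Continuation continuation,
                                         void* parameter,
                                         WaitResult result) {
    if (continuation != nullptr) {
        continuation(parameter, result);
    }
    // In this sketch the result is returned either way so that both paths
    // can be observed by the caller.
    return result;
}
```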
The main debugging code added is for keeping track of port names and
their associated IPC objects, as well as keeping track of the members of
port sets.
Additionally, when extended debugging is enabled, the server can now
wait for a debugger with the new env var `DSERVER_WAIT4DEBUGGER`.
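As an illustration only, checking for such an environment variable
could look like the sketch below. The helper name
`wait_for_debugger_requested` and the accepted values (any non-empty
value other than `0`) are assumptions; the server's actual parsing and
the way it then waits for the debugger are not shown here:

```cpp
#include <cstdlib>

// Hypothetical helper: returns true if DSERVER_WAIT4DEBUGGER is set to a
// non-empty value other than "0". The real accepted values may differ.
inline bool wait_for_debugger_requested() {
    const char* value = std::getenv("DSERVER_WAIT4DEBUGGER");
    if (value == nullptr || value[0] == '\0') {
        return false; // unset or empty
    }
    if (value[0] == '0' && value[1] == '\0') {
        return false; // explicitly disabled
    }
    return true;
}
```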
The AsyncWriter class was originally written for some additional
debugging code I wrote but later decided wouldn't really be useful.
I kept the AsyncWriter class, however, as it seems it might be useful
for future code (it's basically fire-and-forget asynchronous writing).
Note that it has not been tested at all.
We now handle the sigexc calls as normal calls, with the exception that
it's okay for them to become active while another call is active.
We also set the thread's wait result to THREAD_INTERRUPTED and handle
syscall returns in interrupted continuations by jumping back to the
sigexc_enter code.