Seccomp is a relatively complex feature that was added to Linux back in 2005 and extended in 2012 (Linux 3.5) to support BPF-based filters. Once seccomp is enabled it can no longer be disabled, but additional filters can be stacked on top of the existing ones. Seccomp filters are also inherited by child processes, which ensures the process tree can't escape the secure computing environment by spawning children.

The basis of this feature is a shim that lives between userspace and the kernel at the syscall entrypoint. In "strict" mode, seccomp only allows read, write, exit (but not exit_group), and {rt_,}sigreturn to function. In "filter" mode, a BPF filter runs at the syscall entrypoint and returns whether the syscall should be allowed or not. Multiple filters can be installed in this mode, and all of them get executed. The most restrictive result is the action that is taken at the end.

Filter programs are subject to some significant limitations, which make running them inside kernel space a non-issue and effectively bound how much CPU time is spent in them. Filters are free to do basically anything with the provided data; they just can't loop.

FEX needs to implement seccomp because multiple applications use it. The primary one is Chromium, which some games embed without disabling the sandbox. WINE also uses seccomp to capture raw Windows system calls made by games. Apparently Red Dead Redemption is one of the games that requires this.

While FEX implements seccomp, the implementation is not yet all-encompassing, which is one of the reasons why it isn't enabled by default and requires a config option.

**seccomp_unotify is not implemented**

This is a relatively new seccomp feature that lets a filter hand a syscall off to a userspace supervisor through a notification FD. Luckily Chromium and WINE don't use this. It will be tricky to implement under FEX since it requires ioctl trapping and some other behaviour.

**ptrace isn't supported**

One feature of seccomp is that it can raise ptrace events. Since FEX doesn't support ptrace at all, this isn't handled. Again, Chromium and WINE don't use this.

**kill-thread not quite correct**

This isn't directly related to seccomp but more about how we do thread shutdown in FEX. Fully supporting it will require some more changes around thread state tracking. Chromium and WINE don't use this. kill-process also falls under this.

Features that are supported:

- Strict mode and seccomp-bpf mode
- All BPF instructions that seccomp-bpf understands
- Inheriting seccomp through execve
  - This means we serialize and deserialize the calling thread's seccomp filters
  - An execve that escapes FEX will also escape seccomp. Not much we can do about it
- TSync
  - Allows post-mortem seccomp insertion, which lets threads synchronize seccomp filters after the fact

Features that are not supported:

- Different arch qualifiers depending on syscall entrypoint
  - Just like our syscall handler, we are hardcoded to the arch that the application starts with
- user_notif
- ptrace
- Runtime code cache invalidation when seccomp is installed
  - Currently we must ensure all syscalls go through the frontend syscall handler
  - Runtime invalidation of code cache with inline syscalls will get fixed in the future

This currently isn't enabled by default because of the minor feature problems that haven't been resolved. Currently the Linux kernel's test application works for the features that FEX supports, and WINE's usage can be handled by FEX. Chromium's sandbox doesn't yet work with this PR, but it only fails due to features unrelated to seccomp.

Having this open for merging now so we can work to resolve the remaining issues without this bitrotting.
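For readers unfamiliar with filter mode, the behaviour described above (every installed filter runs, the most restrictive result wins, and filters can't loop) can be sketched roughly as follows. This is an illustrative Python model, not FEX's implementation. The constant values come from the Linux UAPI headers; `pack_seccomp_data`, `run_filter`, `run_filters`, and the three example filters are hypothetical names made up for this sketch. It models only three classic BPF opcodes and assumes a little-endian host.

```python
import struct

# Constant values from the Linux UAPI headers (linux/seccomp.h, linux/audit.h).
SECCOMP_RET_KILL_PROCESS = 0x80000000
SECCOMP_RET_TRAP = 0x00030000
SECCOMP_RET_ERRNO = 0x00050000
SECCOMP_RET_ALLOW = 0x7FFF0000
SECCOMP_RET_ACTION_FULL = 0xFFFF0000
AUDIT_ARCH_X86_64 = 0xC000003E
AUDIT_ARCH_I386 = 0x40000003

# Classic BPF opcodes (linux/bpf_common.h); only these three are modelled here.
BPF_LD_W_ABS = 0x20  # A = 32-bit word at byte offset k of seccomp_data
BPF_JEQ_K = 0x15     # skip jt instructions if A == k, otherwise skip jf
BPF_RET_K = 0x06     # return the constant k


def pack_seccomp_data(nr, arch, ip=0, args=(0,) * 6):
    """Lay out struct seccomp_data: int nr, u32 arch, u64 ip, u64 args[6]."""
    return struct.pack("<iIQ6Q", nr, arch, ip, *args)


def run_filter(prog, data):
    """Run one filter, given as a list of (code, jt, jf, k) tuples."""
    acc = 0
    pc = 0
    while pc < len(prog):
        code, jt, jf, k = prog[pc]
        pc += 1
        if code == BPF_LD_W_ABS:
            acc = struct.unpack_from("<I", data, k)[0]
        elif code == BPF_JEQ_K:
            # jt and jf are unsigned, so pc only moves forward: no loops.
            pc += jt if acc == k else jf
        elif code == BPF_RET_K:
            return k
        else:
            raise ValueError(f"opcode {code:#x} is not modelled in this sketch")
    raise ValueError("filter ended without a return instruction")


def action_only(ret):
    """Action bits as a signed 32-bit value, so KILL_PROCESS sorts lowest."""
    value = ret & SECCOMP_RET_ACTION_FULL
    return value - (1 << 32) if value & 0x80000000 else value


def run_filters(progs, data):
    """Run every filter and keep the most restrictive (lowest) action."""
    final = SECCOMP_RET_ALLOW
    for prog in progs:
        ret = run_filter(prog, data)
        if action_only(ret) < action_only(final):
            final = ret
    return final


# Hypothetical filters. 101 is ptrace on x86-64; offset 0 is nr, offset 4 is arch.
DENY_PTRACE = [
    (BPF_LD_W_ABS, 0, 0, 0),
    (BPF_JEQ_K, 0, 1, 101),
    (BPF_RET_K, 0, 0, SECCOMP_RET_ERRNO | 1),  # fail with EPERM
    (BPF_RET_K, 0, 0, SECCOMP_RET_ALLOW),
]
TRAP_PTRACE = [
    (BPF_LD_W_ABS, 0, 0, 0),
    (BPF_JEQ_K, 0, 1, 101),
    (BPF_RET_K, 0, 0, SECCOMP_RET_TRAP),
    (BPF_RET_K, 0, 0, SECCOMP_RET_ALLOW),
]
REQUIRE_X86_64 = [
    (BPF_LD_W_ABS, 0, 0, 4),
    (BPF_JEQ_K, 1, 0, AUDIT_ARCH_X86_64),
    (BPF_RET_K, 0, 0, SECCOMP_RET_KILL_PROCESS),
    (BPF_RET_K, 0, 0, SECCOMP_RET_ALLOW),
]

if __name__ == "__main__":
    stack = [DENY_PTRACE, REQUIRE_X86_64, TRAP_PTRACE]
    result = run_filters(stack, pack_seccomp_data(101, AUDIT_ARCH_X86_64))
    print(hex(result))  # TRAP outranks ERRNO, so the trap action is taken
```

Comparing the action bits as a signed value mirrors how the kernel orders actions, which is why `SECCOMP_RET_KILL_PROCESS` (top bit set) beats everything else. The arch check in `REQUIRE_X86_64` is also the kind of filter affected by the "different arch qualifiers" limitation listed above.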
FEX - Fast x86 emulation frontend
FEX allows you to run x86 and x86-64 binaries on an AArch64 host, similar to qemu-user and box86. It has native support for a rootfs overlay, so you don't need to chroot, as well as some thunklibs so it can forward things like GL to the host. FEX presents a Linux 5.0+ interface to the guest, and supports only AArch64 as a host. FEX is very much a work in progress, so expect things to change.
Quick start guide
For Ubuntu 20.04, 21.04, 21.10, 22.04
Execute the following command in the terminal to install FEX through a PPA.
curl --silent https://raw.githubusercontent.com/FEX-Emu/FEX/main/Scripts/InstallFEX.py --output /tmp/InstallFEX.py && python3 /tmp/InstallFEX.py && rm /tmp/InstallFEX.py
This command will walk you through installing FEX through a PPA, and downloading a RootFS for use with FEX.
The Ubuntu PPA is updated with our monthly releases.
For everyone else
Please see Building FEX.
Getting Started
FEX has been tested to build and run on ARMv8.0+ hardware. ARMv7 hardware will not work. The expected operating system is Linux. FEX has been tested with Ubuntu 20.04, 20.10, and 21.04, as well as Arch Linux.
On AArch64 hosts the user MUST have an x86-64 RootFS. See Creating a RootFS.
Navigating the Source
See the Source Outline for more information.
Building FEX
Follow the guide on the official FEX-Emu Wiki here.
RootFS generation
AArch64 hosts require a rootfs for running applications. To set up a rootfs from scratch, follow the guide on the wiki: https://wiki.fex-emu.com/index.php/Development:Setting_up_RootFS