felix86 26.04

This month we reworked signal handling, made optimizations, and fixed bugs!

Signal rewrite

A lot of programs use signals internally: runtimes may use them for garbage collection or thread preemption, Wine uses them to emulate asynchronous procedure calls, and just-in-time recompilers use them for self-modifying code detection. Even something like a child process exiting may send a signal to the parent. Programs often install dedicated signal handlers to handle these signals. However, these signal handlers are simply pointers to x86 code, which needs to be recompiled.

The problem mostly arises with asynchronous signals, which may happen at any point in time. That could be when you are midway through executing RISC-V instructions that emulate an x86 instruction. Not all translations are 1-to-1, so if a signal happens in the middle of a sequence of RISC-V instructions that emulates a single x86 instruction, you get torn state. Signals may also happen when you’re not executing translated code, such as when you are recompiling. This isn’t good, because at that point in time locks may be held, and if you were to handle the guest signal immediately you may never return to the C++ code that was running at the time of the signal, so the lock would never be unlocked. On top of that, you have to worry about keeping your signal handlers async-signal-safe. Masking signals may be a solution to this. However, you need to be careful to mask on every JIT exit, it may be expensive, and it still doesn’t solve the torn state problem.

With this new version of felix86, we defer signals to an instruction called a safepoint. A safepoint instruction is simply a memory store to a dedicated page. When there are unmasked deferred signals, safepoints will fault, because we protect the page the safepoint writes to. In the safepoint’s signal handler, we prepare the guest signal frame in the guest stack, set the arguments, and return from the host signal handler to the dispatcher to execute the guest signal handler as if it’s just another piece of code to recompile. When the guest signal handler returns, it will fetch the return address from the frame and all the register state will be correct and not torn. When there are no unmasked deferred signals, the safepoint write just goes through. Safepoints exist at the start of each basic block, and right after a syscall for cases like sigsuspend which may need to handle a signal immediately.
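To make the mechanism more concrete, here is a minimal sketch of the page-protection trick on Linux. It is not felix86's actual code: the names `safepoint_init`, `defer_signal`, `safepoint` and `delivered_count` are hypothetical, and instead of building a guest signal frame the fault handler simply counts how many deferred signals it "delivered".

```cpp
#include <atomic>
#include <csignal>
#include <cstddef>
#include <cstdint>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical names throughout; this only models the idea described above.
static void* g_safepoint_page = nullptr;   // dedicated page that safepoints store to
static size_t g_page_size = 0;
static std::atomic<int> g_pending{0};      // deferred signals waiting for a safepoint
static std::atomic<int> g_delivered{0};    // signals handled at a safepoint

// Host SIGSEGV handler: runs when a safepoint store hits the protected page.
static void on_safepoint_fault(int, siginfo_t* info, void*) {
    auto addr = reinterpret_cast<uintptr_t>(info->si_addr);
    auto base = reinterpret_cast<uintptr_t>(g_safepoint_page);
    if (addr < base || addr >= base + g_page_size) {
        // Not a safepoint fault: restore the default action and let it re-fault.
        signal(SIGSEGV, SIG_DFL);
        return;
    }
    // A real emulator would build the guest signal frame here and redirect
    // execution to the guest handler. The sketch only records the delivery.
    g_delivered.fetch_add(g_pending.exchange(0));
    // Unprotect the page so the faulting store succeeds when it is retried.
    mprotect(g_safepoint_page, g_page_size, PROT_READ | PROT_WRITE);
}

bool safepoint_init() {
    g_page_size = static_cast<size_t>(sysconf(_SC_PAGESIZE));
    void* p = mmap(nullptr, g_page_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return false;
    g_safepoint_page = p;
    struct sigaction sa{};
    sa.sa_sigaction = on_safepoint_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, nullptr) == 0;
}

// Called when an asynchronous guest signal arrives: remember it and
// write-protect the page so that the next safepoint faults.
void defer_signal() {
    g_pending.fetch_add(1);
    mprotect(g_safepoint_page, g_page_size, PROT_READ);
}

// The safepoint itself: a single store to the dedicated page.
void safepoint() {
    *static_cast<volatile uint8_t*>(g_safepoint_page) = 0;
}

int delivered_count() { return g_delivered.load(); }
```

With nothing pending, `safepoint()` is just a store. After `defer_signal()`, the next `safepoint()` faults once, the handler runs at a point where the emulated state is consistent, and later safepoints go back to being plain stores.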

This improves signal stability. There were cases where an asynchronous signal could mess up our program if we were unlucky enough for it to arrive at an unfortunate point. Additionally, it allows us to run Golang apps without disabling asynchronous thread preemption. Golang apps send a SIGURG signal very frequently, which would trip up our previous signal handling implementation.

More reading on signal emulation: https://felix86.com/docs/devs/signals/

Fix inaccurate flag optimization

We omit flag calculations for instructions that calculate unused flags. However, if an instruction that calculates flags is followed by an instruction that may calculate flags or may leave them untouched, and that instruction is followed by an instruction that uses those flags, the first instruction needs to calculate the flags because we don’t know at compile time whether they will be used or overwritten.

An example is an add followed by a shl. A shl will calculate some flags and overwrite the ones calculated by add, but only when shifting by a non-zero amount. If it’s a shift by register we can’t know at compile-time whether it will calculate flags, so the add still needs to emit the flag calculations.
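The rule can be sketched as a backward liveness pass over a basic block. This is only an illustrative model with hypothetical names (`Insn`, `needed_flags`, `kShlByCl` and so on), not felix86's implementation. The key point is that only flags an instruction always writes remove earlier flags from the live set. Flags it may write do not.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Bitmask of x86 status flags (bit values are arbitrary for this sketch).
enum Flag : uint8_t { CF = 1, PF = 2, AF = 4, ZF = 8, SF = 16, OF = 32 };
constexpr uint8_t kAllFlags = 63;

struct Insn {
    uint8_t reads;       // flags this instruction consumes
    uint8_t must_write;  // flags it always overwrites
    uint8_t may_write;   // flags it overwrites only under a runtime condition
};

// Simplified models of a few instructions (assumptions for illustration).
constexpr Insn kAdd      {0, kAllFlags, 0};
constexpr Insn kShlByImm {0, kAllFlags, 0};  // non-zero immediate shift count
constexpr Insn kShlByCl  {0, 0, kAllFlags};  // count only known at runtime
constexpr Insn kJz       {ZF, 0, 0};

// For each instruction, returns the flags it actually has to compute.
// live_out is the set of flags assumed to be needed after the block.
std::vector<uint8_t> needed_flags(const std::vector<Insn>& block, uint8_t live_out) {
    std::vector<uint8_t> needed(block.size());
    uint8_t live = live_out;
    for (size_t i = block.size(); i-- > 0;) {
        const Insn& in = block[i];
        // Compute every flag we might write that someone later may read.
        needed[i] = static_cast<uint8_t>((in.must_write | in.may_write) & live);
        // Only guaranteed writes kill liveness; a "may write" leaves the
        // earlier value potentially visible, so it stays live.
        live = static_cast<uint8_t>((live & ~in.must_write) | in.reads);
    }
    return needed;
}
```

For the sequence add, shl by register, jz, this model keeps ZF live across the shift, so the add still computes ZF. If the shift count is a non-zero immediate, the shift always overwrites the flags and the add can skip them.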

This was not handled properly, and this pattern was found in strstr-sse2-unaligned. This assembly path is only enabled when AVX support is enabled, since it relaxes alignment requirements, so it went under the radar for a long while. It is now fixed.

MPSADBW optimization

The MPSADBW and VMPSADBW instructions formerly called out to functions rather than using RVV assembly. They have now been optimized to use RVV assembly. For programs that make heavy use of these instructions, such as ffmpeg with svt-av1, there’s a 30% performance improvement from the RVV assembly version!
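For readers unfamiliar with the instruction, here is a scalar reference model of what the 128-bit MPSADBW computes: eight sums of absolute differences between a fixed 4-byte block of the source and sliding 4-byte windows of the destination. The function name `mpsadbw_ref` is hypothetical, and this is a plain illustration of the instruction's semantics, not felix86's old helper function or its new RVV code.

```cpp
#include <array>
#include <cstdint>
#include <cstdlib>

// Scalar model of 128-bit MPSADBW (hypothetical helper for illustration).
// imm8[1:0] selects which 4-byte block of src is used,
// imm8[2] selects whether the dst windows start at byte 0 or byte 4.
std::array<uint16_t, 8> mpsadbw_ref(const std::array<uint8_t, 16>& dst,
                                    const std::array<uint8_t, 16>& src,
                                    uint8_t imm8) {
    const int src_off = (imm8 & 0x3) * 4;
    const int dst_off = ((imm8 >> 2) & 0x1) * 4;
    std::array<uint16_t, 8> out{};
    for (int i = 0; i < 8; ++i) {
        unsigned sum = 0;
        // Window i covers dst bytes [dst_off + i, dst_off + i + 3].
        for (int j = 0; j < 4; ++j) {
            sum += static_cast<unsigned>(
                std::abs(int(dst[dst_off + i + j]) - int(src[src_off + j])));
        }
        out[i] = static_cast<uint16_t>(sum);
    }
    return out;
}
```

Each output word is independent of the others, so the work can be spread across vector lanes, which is plausibly why a vectorized version pays off.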

Disable TSO emulation for stack accesses

Since each thread has its own stack, in most well-behaved programs a thread won’t access another thread’s stack. For this reason, and because stack accesses are some of the most common memory accesses, we don’t emit fence instructions for their TSO emulation by default.
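As a rough sketch of the decision, assume a stack access is one whose base register is RSP. That assumption, and the names `BaseReg`, `TsoConfig` and `needs_tso_fence`, are hypothetical and not taken from felix86's source.

```cpp
// Hypothetical model of when to emit fences for a guest memory access.
enum class BaseReg { RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI, Other };

struct TsoConfig {
    bool tso_enabled = true;    // emulate x86 TSO ordering at all
    bool tso_on_stack = false;  // also fence stack accesses (off by default)
};

// Returns true if the translated access should be surrounded by fences.
bool needs_tso_fence(BaseReg base, const TsoConfig& cfg) {
    if (!cfg.tso_enabled) return false;
    // Assumption: an RSP-based access is treated as a stack access.
    const bool is_stack_access = (base == BaseReg::RSP);
    return !is_stack_access || cfg.tso_on_stack;
}
```

A program that does share stack memory between threads would need the stack option turned back on in a model like this.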

Thanks for reading this post.

If you like this project, please give us a star on GitHub: https://github.com/OFFTKP/felix86

Written on April 1, 2026