felix86 25.11

This month, felix86 was presented at the RISC-V Summit North America 2025. We also added support for SSE 4.2 and more flag optimizations!

RISC-V Summit North America

The presentation “RISC-V for gaming: Emulating x86 on RISC-V” showcased the internals of the felix86 emulator, particularly the translation process. If you missed it, a recording is available!

SSE 4.2

The remaining instructions needed for SSE 4.2 support were implemented. These include horizontal instructions such as PHADDW/PHADDD/... from SSSE3. Applications that use these instructions when they are available may see a performance improvement.

The x86 extension PCLMUL was also implemented. This extension adds a single instruction, PCLMULQDQ, which performs a carryless multiplication of two 64-bit operands into a 128-bit result. In RISC-V land we have CLMUL and CLMULH, which produce the low and high halves of the same product. Support for this extension is currently disabled by default but can be enabled with the environment variable FELIX86_PCLMULQDQ.
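To make the mapping concrete, here is a minimal Python model of how a PCLMULQDQ result can be assembled from the two RISC-V halves. This is only a sketch of the instruction semantics, not felix86's actual code: the function names are ours, and the 128-bit registers are modeled as plain Python integers.

```python
MASK64 = (1 << 64) - 1

def clmul_full(a, b):
    """Full 128-bit carryless product of two 64-bit values (XOR instead of add)."""
    result = 0
    for i in range(64):
        if (b >> i) & 1:
            result ^= a << i
    return result

def clmul(rs1, rs2):
    """Models RISC-V CLMUL on RV64: low 64 bits of the carryless product."""
    return clmul_full(rs1, rs2) & MASK64

def clmulh(rs1, rs2):
    """Models RISC-V CLMULH on RV64: high 64 bits of the carryless product."""
    return clmul_full(rs1, rs2) >> 64

def pclmulqdq(xmm_dst, xmm_src, imm8):
    """Models PCLMULQDQ: imm8 bit 0 and bit 4 pick the quadword of each operand."""
    if imm8 & 0x01:
        a = (xmm_dst >> 64) & MASK64
    else:
        a = xmm_dst & MASK64
    if imm8 & 0x10:
        b = (xmm_src >> 64) & MASK64
    else:
        b = xmm_src & MASK64
    # One CLMULH and one CLMUL give the two halves of the 128-bit result.
    return (clmulh(a, b) << 64) | clmul(a, b)
```

For example, `clmul(0b11, 0b11)` gives `0b101`, because (x + 1) squared is x^2 + 1 when the carries are dropped.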

Additionally, the CRC32 instruction from SSE 4.2 was implemented. RISC-V doesn’t have a dedicated CRC32 instruction (and even if it did, it’s unlikely it would use the same polynomial), but it’s possible to perform a CRC32 using the carryless multiplication instructions. There’s an Intel white paper describing how PCLMULQDQ can be used to calculate a CRC32 with any polynomial, and there’s also a blog post by merryhime for further reading. This lets us implement the CRC32 instruction on top of the carryless multiplication instructions, which greatly reduces the instruction count.
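As a rough illustration of the idea, the sketch below models one CRC32 instruction step using a Barrett reduction built from carryless multiplications. It is not the code felix86 emits: the helper names are ours, Python's big integers stand in for the CLMUL/CLMULH pairs a real implementation would need, and only the 8, 16 and 32-bit operand sizes are handled. The polynomial is the CRC-32C (Castagnoli) one that the x86 instruction uses.

```python
POLY = 0x11EDC6F41  # CRC-32C polynomial, including the x^32 term

def clmul(a, b):
    """Carryless multiply on arbitrary-width integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def poly_div(num, den):
    """Quotient of a polynomial division over GF(2)."""
    quotient = 0
    den_len = den.bit_length()
    while num.bit_length() >= den_len:
        shift = num.bit_length() - den_len
        quotient |= 1 << shift
        num ^= den << shift
    return quotient

MU = poly_div(1 << 64, POLY)  # Barrett constant: floor(x^64 / P)

def reflect(value, width):
    """Reverse the order of the low `width` bits."""
    return int(format(value, f"0{width}b")[::-1], 2)

def crc32c_step(crc, data, size=8):
    """One x86 CRC32 step for an 8, 16 or 32-bit source operand."""
    assert size in (8, 16, 32)
    # Build the polynomial to reduce, following the instruction's definition.
    a = (reflect(data, size) << 32) ^ (reflect(crc, 32) << size)
    # Barrett reduction: two carryless multiplies replace the bit-by-bit loop.
    t1 = clmul(a >> 32, MU)
    t2 = clmul(t1 >> 32, POLY)
    return reflect((a ^ t2) & 0xFFFFFFFF, 32)

def crc32c(data):
    """Full CRC-32C of a bytes object, with the usual initial and final XOR."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = crc32c_step(crc, byte)
    return crc ^ 0xFFFFFFFF
```

Feeding the standard check string through it, `crc32c(b"123456789")` gives `0xE3069283`, the published CRC-32C check value.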

More flag optimizations

Previously, felix86 could only omit flag calculations within the current block. This meant that whenever a block ended, the flags set by the last flag-modifying instruction had to be calculated in software. However, it is often the case that none of a branch’s targets make use of those flags. A new optimization checks the branch targets for flag usage and skips the calculation when the flags are not needed. This works especially well in hot loops, since it shaves many instructions off the end of the block.

This optimization is also disabled by default and can be enabled with the environment variable FELIX86_SCAN_AHEAD_MULTI.
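The check can be pictured with a small Python sketch. Everything in it is a simplification made up for illustration: the `Insn` type, the block labels, the scan limit, and the treatment of the flags as a single unit are assumptions, not how felix86 represents or scans code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Insn:
    # Hypothetical instruction model: flags are treated as one unit.
    name: str
    reads_flags: bool = False
    writes_all_flags: bool = False
    targets: tuple = ()  # successor block labels, set on block terminators

def flags_dead_at(label, blocks, max_insns=16):
    """True if the block at `label` overwrites the flags before reading them."""
    for insn in blocks[label][:max_insns]:
        if insn.reads_flags:
            return False
        if insn.writes_all_flags:
            return True
    return False  # nothing conclusive found: conservatively assume live

def can_skip_flags(block, blocks):
    """True if no known branch target needs the flags left by `block`."""
    terminator = block[-1]
    if not terminator.targets:  # e.g. an indirect jump with unknown targets
        return False
    return all(flags_dead_at(t, blocks) for t in terminator.targets)

blocks = {
    "loop": [
        Insn("add", writes_all_flags=True),
        Insn("cmp", writes_all_flags=True),
        Insn("jne", reads_flags=True, targets=("loop", "exit")),
    ],
    "exit": [Insn("xor", writes_all_flags=True), Insn("ret")],
    "carry": [Insn("adc", reads_flags=True, writes_all_flags=True),
              Insn("jmp", targets=("exit",))],
    "to_carry": [Insn("sub", writes_all_flags=True),
                 Insn("jmp", targets=("carry",))],
}
```

Here `can_skip_flags(blocks["loop"], blocks)` is true, because both targets overwrite the flags before reading them, while `can_skip_flags(blocks["to_carry"], blocks)` is false, because the `adc` at the target consumes the carry flag.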


Thanks for reading this post.

If you like this project, please give us a star on GitHub: https://github.com/OFFTKP/felix86

Written on November 2, 2025