<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://felix86.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://felix86.com/" rel="alternate" type="text/html" /><updated>2026-03-04T19:58:20+00:00</updated><id>https://felix86.com/feed.xml</id><title type="html">felix86</title><subtitle>Run x86 and x86-64 games on RISC-V</subtitle><entry><title type="html">felix86 26.03</title><link href="https://felix86.com/felix86-26-03/" rel="alternate" type="text/html" title="felix86 26.03" /><published>2026-03-01T00:00:00+00:00</published><updated>2026-03-01T00:00:00+00:00</updated><id>https://felix86.com/felix86-26-03</id><content type="html" xml:base="https://felix86.com/felix86-26-03/"><![CDATA[<p>This month we added AVX, AVX2, BMI1 and F16C support!</p>

<h2 id="avxavx2-support-is-here">AVX/AVX2 support is here!</h2>

<p>This version of felix86 emulates all AVX and AVX2 instructions using RVV 1.0! Thanks to the <a href="https://github.com/FEX-Emu/FEX">FEX-Emu</a> test suite, each instruction is extensively tested and all tests pass.</p>

<p>AVX is an x86 SIMD extension that adds 256-bit vectors and new instructions. Additionally, it introduces a new version for most of the SSE instructions that can use the 256-bit vectors and perform unaligned access, amongst other benefits.</p>

<p>Overall, <strong>288 new instruction handlers</strong> were added, and <strong>333 single-instruction tests</strong> pass. Binary tests can now detect AVX support and pass.</p>

<h3 id="fun-parts">Fun parts</h3>

<p>Here are some fun parts of AVX emulation.</p>

<h4 id="vzeroall-vzeroupper">VZEROALL, VZEROUPPER</h4>

<p>The RVV 1.0 statically allocated registers for YMM0-YMM15 were changed to v16-v31. This means we can use LMUL=8 to perform <code class="language-plaintext highlighter-rouge">VZEROALL</code> as two grouped <code class="language-plaintext highlighter-rouge">VXOR</code> instructions, and <code class="language-plaintext highlighter-rouge">VZEROUPPER</code> as two grouped and masked <code class="language-plaintext highlighter-rouge">VXOR</code> instructions.</p>
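<p>As a rough model of the grouping described above, the Python sketch below treats the vector register file as 32 registers of four 64-bit lanes and emulates a grouped, optionally masked XOR. VLEN=256 and SEW=64 are assumptions made for the illustration, and the helper names are made up; this is not felix86 code.</p>

```python
VLEN_BITS = 256                        # assumed vector register width
LANES = VLEN_BITS // 64                # 64-bit lanes per register (assumed SEW=64)
LMUL = 8                               # registers per group
UPPER_LANES = range(LANES // 2, LANES)  # lanes holding the upper 128 bits

def grouped_xor_self(regs, base, lane_mask=None):
    """Model a VXOR of a register group with itself: every active lane becomes 0.

    lane_mask(i) decides whether element i of the group is active;
    inactive lanes are left undisturbed.
    """
    for i in range(LMUL * LANES):
        if lane_mask is None or lane_mask(i):
            regs[base + i // LANES][i % LANES] = 0

def vzeroall(regs):
    # YMM0-YMM15 live in v16-v31, which is exactly two LMUL=8 groups.
    for base in (16, 24):
        grouped_xor_self(regs, base)

def vzeroupper(regs):
    # Only the upper half of each register is cleared.
    for base in (16, 24):
        grouped_xor_self(regs, base, lambda i: i % LANES in UPPER_LANES)

regs = [[0xAAAA] * LANES for _ in range(32)]
vzeroupper(regs)
print(regs[16])  # lower lanes kept, upper lanes cleared
vzeroall(regs)
print(regs[31])  # whole register cleared
```

<p>Registers v0-v15 are never touched in this model, which mirrors why moving the YMM state to v16-v31 makes the two-group trick possible.</p>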

<h4 id="group-writebackrestore">Group writeback/restore</h4>

<p>Since the <code class="language-plaintext highlighter-rouge">XmmReg</code> struct is now 256 bits, we can writeback and restore the XMM state in two LMUL=8 stores/loads, in VLEN=256 systems. This may improve performance in some applications that end up exiting to the dispatcher too often.</p>

<h4 id="vgather">VGATHER</h4>

<p>The <code class="language-plaintext highlighter-rouge">VGATHER</code> set of instructions in x86 (not to be confused with RISC-V’s <code class="language-plaintext highlighter-rouge">VRGATHER</code>) are basically indexed loads that work slightly differently from their RISC-V counterparts. The indices may optionally be scaled and may need to be sign-extended before being passed to the equivalent RISC-V instruction.</p>
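<p>To make the index handling concrete, here is a minimal Python sketch of the address arithmetic for a gather with 32-bit indices. It assumes the RISC-V indexed load consumes unsigned byte offsets, so the x86 indices are sign-extended and multiplied by the scale first. The function names and the dictionary standing in for memory are hypothetical.</p>

```python
def sign_extend_32(value):
    """Interpret the low 32 bits of value as a signed integer."""
    value %= 2 ** 32
    return value - 2 ** 32 if value // 2 ** 31 else value

def gather_byte_offsets(indices, scale):
    """Turn x86 gather indices into 64-bit byte offsets for an indexed load."""
    assert scale in (1, 2, 4, 8)  # the scales an x86 SIB byte can encode
    return [(sign_extend_32(i) * scale) % 2 ** 64 for i in indices]

def gather(memory, base, indices, scale):
    """Model the indexed load: one element per offset, wrapping at 64 bits."""
    offsets = gather_byte_offsets(indices, scale)
    return [memory[(base + off) % 2 ** 64] for off in offsets]

# Toy memory: the 4-byte element at 0x1000 + 4*n holds the value n.
memory = {0x1000 + 4 * n: n for n in range(-4, 5)}
print(gather(memory, 0x1000, [0, 1, 0xFFFFFFFF, 3], scale=4))
```

<p>The third index is -1 once sign-extended, so it reads the element just below the base address. Zero-extending it instead would point roughly 16 GiB above the base.</p>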

<p>With the <code class="language-plaintext highlighter-rouge">Zicclsm</code> extension, which is mandatory in RVA23, indices that end up performing a misaligned load are supported, without needing to resort to a potentially slower instruction sequence.</p>

<p>Currently, there’s no support for <code class="language-plaintext highlighter-rouge">VGATHER</code> instructions that may fault during one of the element loads. We’ll implement this feature when there’s a program that relies on it.</p>

<h3 id="annoying-parts">Annoying parts</h3>

<p>Below are a couple of annoyances with emulating AVX on RVV.</p>

<h4 id="no-zeroing-of-agnostic-bits">No zeroing of agnostic bits</h4>

<p>RVV 1.0 has two modes of dealing with tail and mask bits: Undisturbed and agnostic.</p>

<p>Undisturbed mode doesn’t affect the tail and mask bits. This is helpful for SSE emulation, since the legacy SSE instructions don’t modify the top 128 bits of the YMM registers.</p>

<p>Agnostic mode does something weird. Depending on the implementation, it either behaves the same as undisturbed or <em>replaces the bits with ones</em>. In RVV 0.7.1, the bits were replaced with zeroes. RVV 1.0 changed this to ones, with the justification:</p>

<blockquote>
  <p>The value of all 1s instead of all 0s was chosen for the overwrite value to discourage software developers from depending on the value written.</p>
</blockquote>

<p>This is unfortunate, because it means that there’s no instruction-free way of zeroing the tail or mask bits. Thus, performing the zeroing requires 1-3 extra instructions.</p>

<p>This zeroing of tail and mask bits is important for emulating AVX instructions that deal with XMMs, as they need to zero the upper 128 bits. It is also important for instructions like <code class="language-plaintext highlighter-rouge">VMASKMOVPD</code>, which zero the masked elements. It will also be important when/if we emulate AVX-512, as every instruction can perform similar zeroing masking.</p>
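<p>The difference between the policies can be sketched in a few lines of Python. The model below uses eight 32-bit lanes for a 256-bit register and a made-up <code class="language-plaintext highlighter-rouge">vadd</code> helper. It is only meant to show where the extra zeroing step comes from, not how felix86 emits it.</p>

```python
ALL_ONES = 0xFFFFFFFF

def vadd(dest, a, b, vl, policy):
    """Add the first vl 32-bit lanes; treat the tail according to policy."""
    out = []
    for i, old in enumerate(dest):
        if i in range(vl):
            out.append((a[i] + b[i]) % 2 ** 32)
        elif policy == "undisturbed":
            out.append(old)       # legacy SSE behaviour: upper lanes survive
        elif policy == "agnostic-ones":
            out.append(ALL_ONES)  # one of the outcomes RVV 1.0 permits
        else:
            raise ValueError(policy)
    return out

def avx_xmm_add(dest, a, b):
    """VEX-encoded 128-bit add: the upper 128 bits must end up zero."""
    out = vadd(dest, a, b, vl=4, policy="undisturbed")
    # Extra work the emulator has to pay for: clear lanes 4..7 explicitly.
    return out[:4] + [0] * 4

old = [9] * 8
print(vadd(old, [1] * 8, [2] * 8, 4, "undisturbed"))
print(vadd(old, [1] * 8, [2] * 8, 4, "agnostic-ones"))
print(avx_xmm_add(old, [1] * 8, [2] * 8))
```

<p>Neither policy produces the zeroed tail that the VEX-encoded form needs, which is why the explicit clearing step cannot be folded into the operation itself.</p>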

<p>Unfortunately, since RVV 1.0 is ratified, changing the agnostic bits to be zeroes instead of ones is not possible for software compatibility reasons. We hope there’s a mode that allows for zeroing agnostic bits in a future version, perhaps via a bit in vtype.</p>

<h4 id="cumbersome-shuffles">Cumbersome shuffles</h4>

<p><code class="language-plaintext highlighter-rouge">VRGATHER</code> is a powerful instruction in RISC-V, however it has some limitations. The iota cannot be an immediate, except in the case of <code class="language-plaintext highlighter-rouge">VRGATHER.VI</code>, which can only be used to broadcast a single element. The iota also cannot be picked from bits of a GPR. This means the iota needs to be loaded from a literal pool in memory in most cases.</p>

<p>An optimal instruction would be able to perform shuffles between two sources, using an immediate iota, similar to x86 shuffles. This would almost certainly need to be a &gt;32-bit instruction, which RISC-V is capable of.</p>

<p>Until then, we need to use literal pools and two <code class="language-plaintext highlighter-rouge">VRGATHER</code> instructions to emulate most <code class="language-plaintext highlighter-rouge">VSHUFPS</code> instructions.</p>
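<p>As an illustration of the two-gather approach, the sketch below models the 128-bit form of <code class="language-plaintext highlighter-rouge">VSHUFPS</code> in Python. The <code class="language-plaintext highlighter-rouge">vrgather</code> helper follows the RVV rule that out-of-range indices produce zero. The way the two gathers are combined (an unmasked gather followed by a masked one) is an assumption for the example, not necessarily the sequence felix86 emits.</p>

```python
def vrgather(src, index, mask=None, old=None):
    """out[i] = src[index[i]], or 0 when the index is out of range.

    With a mask, inactive lanes keep the value from old (mask-undisturbed).
    """
    out = []
    for i, idx in enumerate(index):
        if mask is not None and not mask[i]:
            out.append(old[i])
        else:
            out.append(src[idx] if idx in range(len(src)) else 0)
    return out

def shufps_index(imm8):
    """Index vector that would sit in the literal pool for one immediate."""
    return [(imm8 // 4 ** lane) % 4 for lane in range(4)]

def shufps(a, b, imm8):
    index = shufps_index(imm8)
    low = vrgather(a, index)                               # lanes 0-1 come from a
    return vrgather(b, index, mask=[0, 0, 1, 1], old=low)  # lanes 2-3 come from b

print(shufps([10, 11, 12, 13], [20, 21, 22, 23], 0x1B))
```

<p>Each distinct immediate needs its own index vector, which is where the literal pool comes in.</p>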

<h2 id="bmi1-support">BMI1 support</h2>

<p>BMI1 is a bit manipulation extension. Most of its instructions don’t match one-to-one with RISC-V equivalents, but are relatively simple to emulate.</p>

<h2 id="f16c-support">F16C support</h2>

<p>F16C is an extension that adds conversions between 16-bit and 32-bit floats. It is equivalent to the <code class="language-plaintext highlighter-rouge">Zvfhmin</code> extension, which is mandatory in RVA23, and is now supported when that extension is present.</p>

<h2 id="flatpaks">Flatpaks</h2>

<p>This month we also added initial support for Flatpaks!</p>

<p><img src="/images/hytale.webp" width="800" style="display: block; margin: 10px auto" />
<em>The Hytale launcher, which is a Flatpak, running with felix86</em></p>

<p>Unfortunately there may be unrelated bugs with Hytale, causing it to not work. Nevertheless, installing Flatpaks should work now.</p>

<p>In <code class="language-plaintext highlighter-rouge">felix86 --shell</code>, use <code class="language-plaintext highlighter-rouge">flatpak install --user /path/to/my/game.flatpak</code> to install and <code class="language-plaintext highlighter-rouge">flatpak run com.MyOrg.MyGame</code> to run your Flatpak app. System-wide installations are not tested.</p>

<h3 id="preliminary-seccomp-support">Preliminary seccomp support</h3>

<p>Flatpaks need seccomp to run, particularly its filter mode. BPF filters are now supported: they are recompiled to RISC-V and validate each syscall. Support is limited to what Flatpaks require, and not all filters are supported.</p>

<h2 id="aes-support">AES support</h2>

<p>The AES extension in RISC-V adds hardware acceleration for AES encryption and decryption. It is now supported in felix86!</p>

<p>The encryption instructions <code class="language-plaintext highlighter-rouge">AESENC</code> and <code class="language-plaintext highlighter-rouge">AESENCLAST</code> match up perfectly with the RISC-V equivalents. The decryption instruction <code class="language-plaintext highlighter-rouge">AESDECLAST</code> also matches up 1-to-1.</p>

<p>The <code class="language-plaintext highlighter-rouge">AESDEC</code> instruction performs <code class="language-plaintext highlighter-rouge">InvMixColumns</code> and <code class="language-plaintext highlighter-rouge">AddRoundKey</code> in the opposite order from <code class="language-plaintext highlighter-rouge">VAESDM.VV</code>. My initial solution was to apply <code class="language-plaintext highlighter-rouge">MixColumns</code> to the key (by applying <code class="language-plaintext highlighter-rouge">InvMixColumns</code> via <code class="language-plaintext highlighter-rouge">AES64IM</code> 3 times in a row for each 64-bit word in the key) and then use <code class="language-plaintext highlighter-rouge">VAESDM.VV</code>.</p>
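<p>The algebra behind that first approach can be checked with a small Python sketch. It implements only the column-mixing step over GF(2^8), skipping the S-box and row shifts, so <code class="language-plaintext highlighter-rouge">state</code> stands for the value after <code class="language-plaintext highlighter-rouge">InvShiftRows</code> and <code class="language-plaintext highlighter-rouge">InvSubBytes</code>. The helper names and test vectors are made up.</p>

```python
def xtime(a):
    """Multiply a byte by x in GF(2^8) using the AES polynomial 0x11B."""
    a *= 2
    return a ^ 0x11B if a // 256 else a

def gmul(a, b):
    """Multiply two bytes in GF(2^8)."""
    result = 0
    for _ in range(8):
        if b % 2:
            result ^= a
        a = xtime(a)
        b //= 2
    return result

MIX = [2, 3, 1, 1]         # first row of the MixColumns matrix
INV_MIX = [14, 11, 13, 9]  # first row of the InvMixColumns matrix

def mix(state, row):
    """Apply a circulant 4x4 matrix to each 4-byte column of a 16-byte state."""
    out = []
    for col in range(0, 16, 4):
        for r in range(4):
            acc = 0
            for c in range(4):
                acc ^= gmul(row[(c - r) % 4], state[col + c])
            out.append(acc)
    return out

def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]

state = list(range(16))
key = [(7 * i + 3) % 256 for i in range(16)]

# InvMixColumns applied three times to the key acts like MixColumns.
key_mc = key
for _ in range(3):
    key_mc = mix(key_mc, INV_MIX)
print(key_mc == mix(key, MIX))

# x86 order (InvMixColumns, then AddRoundKey) against the RISC-V order
# (AddRoundKey, then InvMixColumns) fed with the adjusted key.
x86_order = xor(mix(state, INV_MIX), key)
riscv_order = mix(xor(state, key_mc), INV_MIX)
print(x86_order == riscv_order)
```

<p>The trick works because column mixing is linear over XOR: mixing the sum of state and key equals the sum of the two mixed values, so pre-mixing the key cancels the extra <code class="language-plaintext highlighter-rouge">InvMixColumns</code> it would otherwise receive.</p>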

<p>As <a href="https://github.com/camel-cdr">camel-cdr</a> pointed out, there’s a much better solution, which is to use <code class="language-plaintext highlighter-rouge">VAESDF.VV</code> with a key of zeroes to apply the <code class="language-plaintext highlighter-rouge">InvShiftRows</code> and <code class="language-plaintext highlighter-rouge">InvSubBytes</code> transformations without <code class="language-plaintext highlighter-rouge">AddRoundKey</code>. Then, using a specific constant in <code class="language-plaintext highlighter-rouge">VAESDM.VV</code> can isolate the <code class="language-plaintext highlighter-rouge">InvMixColumns</code> transformation, and finally <code class="language-plaintext highlighter-rouge">AddRoundKey</code> can be done manually with a <code class="language-plaintext highlighter-rouge">XOR</code>. Overall, a great improvement over my initial solution.</p>

<p><code class="language-plaintext highlighter-rouge">VAESDM.VV</code> can also be used in this way to emulate <code class="language-plaintext highlighter-rouge">AESIMC</code>, thus not needing the scalar <code class="language-plaintext highlighter-rouge">AES64IM</code> instruction.</p>

<p><code class="language-plaintext highlighter-rouge">AESKEYGENASSIST</code> doesn’t match 1-to-1 with any instruction. It is possible to emulate some of its functionality with <code class="language-plaintext highlighter-rouge">VAESKF1.VI</code>, but until we encounter it in a game it is emulated in software.</p>

<h2 id="optimizations">Optimizations</h2>

<p>Felix86 is &gt;1.5 years old now, and some of the older instruction handlers were quite bad. One such example was the <code class="language-plaintext highlighter-rouge">PHMINPOSUW</code> instruction. While it would use <code class="language-plaintext highlighter-rouge">VREDMINU</code> to find the minimum element, a slow loop was used to find its index. When thinking about implementing <code class="language-plaintext highlighter-rouge">VPHMINPOSUW</code> and looking at my <code class="language-plaintext highlighter-rouge">PHMINPOSUW</code> implementation, I realized I can simply use <code class="language-plaintext highlighter-rouge">VFIRST</code> to find the position of the minimum element, vastly improving the instruction performance.</p>
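<p>In plain Python, the improved approach looks roughly like the sketch below. The comparison step that builds the mask is an assumption (the post only names <code class="language-plaintext highlighter-rouge">VREDMINU</code> and <code class="language-plaintext highlighter-rouge">VFIRST</code>), and the function is a behavioural model rather than the actual handler.</p>

```python
def phminposuw(words):
    """Return (minimum, index of the first minimum) for eight unsigned 16-bit words."""
    assert len(words) == 8
    minimum = min(words)                        # role of VREDMINU
    equal_mask = [w == minimum for w in words]  # assumed compare-equal step
    index = equal_mask.index(True)              # role of VFIRST: first set mask bit
    return minimum, index

print(phminposuw([9, 4, 7, 4, 8, 6, 5, 10]))
```

<p>Taking the first set bit also gives the x86 tie-breaking rule for free, since the instruction reports the lowest index when the minimum occurs more than once.</p>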

<p>Another example is <code class="language-plaintext highlighter-rouge">UNPCKHPS</code>, which would use an ugly sequence of <code class="language-plaintext highlighter-rouge">VRGATHER</code> instructions, while widening adds and slides suffice.</p>

<hr />

<p>Thanks for reading this post.</p>

<p>If you like this project, please give us a star on Github: <a href="https://github.com/OFFTKP/felix86">https://github.com/OFFTKP/felix86</a></p>]]></content><author><name></name></author><summary type="html"><![CDATA[This month we added AVX, AVX2, BMI1 and F16C support!]]></summary></entry><entry><title type="html">felix86 26.02</title><link href="https://felix86.com/felix86-26-02/" rel="alternate" type="text/html" title="felix86 26.02" /><published>2026-02-01T00:00:00+00:00</published><updated>2026-02-01T00:00:00+00:00</updated><id>https://felix86.com/felix86-26-02</id><content type="html" xml:base="https://felix86.com/felix86-26-02/"><![CDATA[<p>We have some exciting news this month!</p>

<h2 id="vulkan-x11-thunking-support">Vulkan X11 thunking support</h2>

<p>The native Vulkan userspace driver can now be used on X11 with thunking. We also implemented the signatures for some missing extensions used by DXVK, which means DXVK on Wine can now use Vulkan thunking. This can be enabled on 64-bit games with <code class="language-plaintext highlighter-rouge">FELIX86_ENABLED_THUNKS=vk felix86 --shell</code>, as it is currently disabled by default.</p>

<p>Thunking improves performance and compatibility and reduces time spent on recompilation.</p>

<p><img src="/images/witcher3-dxvk.png" width="800" style="display: block; margin: 10px auto" />
<em>Witcher 3 with DXVK and Vulkan thunking on Milk-V Jupiter</em></p>

<h3 id="zink-support">Zink support</h3>
<p>The Vulkan extension signatures necessary for Zink were also added. For ease of use, there’s now a <code class="language-plaintext highlighter-rouge">zink</code> profile that enables thunking and Zink. Test with <code class="language-plaintext highlighter-rouge">FELIX86_PROFILE=zink felix86 --shell</code>.</p>

<h2 id="spacemit-k3">SpacemiT K3</h2>
<p>SpacemiT was kind enough to give us early SSH access to a Linux environment with SpacemiT K3 to test felix86 and run benchmarks!</p>

<h3 id="hardware">Hardware</h3>
<p>SpacemiT K3 is RVA23 compliant, which means we get access to vector crypto (important for x86 AES &amp; others), unaligned non-atomic accesses supported in hardware (necessary, see next month’s post as to why), Zvfhmin which we’ll eventually use for F16C, Zvbb which allows for some optimizations in our SIMD code, and Zfa which also allows for some optimizations.</p>

<p>Additionally, RVA23 covers the extensions necessary for felix86 operation, which are RVV 1.0 and Zba/Zbb/Zbs.</p>

<h3 id="benchmarks">Benchmarks</h3>

<p>The benchmarks we tried ran a lot faster on the SpacemiT K3 than on the K1!</p>

<p>First, the x86-64 7z benchmark run through felix86 26.02 on RISC-V hardware:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// SpacemiT K1, x86-64 7z
                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:       2081   567    357   2024  |      41069   746    469   3502
23:       2040   579    359   2079  |      42298   780    469   3659
24:       2042   589    373   2196  |      41572   779    468   3648
25:       2049   606    386   2340  |      40578   779    464   3611
----------------------------------  | ------------------------------
Avr:      2053   585    369   2160  |      41379   771    468   3605
Tot:             678    418   2882
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// SpacemiT K3, x86-64 7z
                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:       7392   579   1243   7192  |     123233   778   1351  10508
23:       7199   590   1244   7336  |     121907   784   1345  10545
24:       7248   609   1280   7794  |     119814   783   1343  10513
25:       7232   622   1328   8258  |     117826   787   1331  10484
----------------------------------  | ------------------------------
Avr:      7268   600   1274   7645  |     120695   783   1343  10513
Tot:             691   1308   9079
</code></pre></div></div>

<p>As you can see, on SpacemiT K3 the 7z benchmark run through felix86 gets a <strong>3.54x</strong> speedup on average in compression and a <strong>2.92x</strong> speedup in decompression compared to the same benchmark through felix86 on SpacemiT K1.</p>
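<p>For anyone who wants to reproduce the ratios, the short Python snippet below divides the "Avr" speed columns of the two tables above. The dictionary keys are just labels chosen for this example.</p>

```python
# Average speeds (KiB/s) copied from the "Avr" rows of the two 7z runs.
k1 = {"compress_kib_s": 2053, "decompress_kib_s": 41379}
k3 = {"compress_kib_s": 7268, "decompress_kib_s": 120695}

speedup = {name: k3[name] / k1[name] for name in k1}
for name, ratio in speedup.items():
    print(f"{name}: {ratio:.2f}x")
```

<p>Using the "Rating" columns instead of the raw speeds gives essentially the same ratios.</p>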

<p>Next, the x86-64 Stockfish benchmark (run as <code class="language-plaintext highlighter-rouge">stockfish bench</code> inside <code class="language-plaintext highlighter-rouge">felix86 --shell</code>):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// SpacemiT K1, x86-64 Stockfish 17.1, felix86 26.02
Total time (ms) : 99364
Nodes searched  : 2030154
Nodes/second    : 20431
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// SpacemiT K3, x86-64 Stockfish 17.1, felix86 26.02
Total time (ms) : 24425
Nodes searched  : 2030154
Nodes/second    : 83117
</code></pre></div></div>

<p>With SpacemiT K3 the Stockfish benchmark is <strong>4.07x faster</strong>.</p>

<p>An interesting observation is that the native RISC-V Stockfish runs slower than the x86-64 Stockfish through felix86:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// SpacemiT K3, RISC-V Stockfish 17.1
Total time (ms) : 42895
Nodes searched  : 2030154
Nodes/second    : 47328
</code></pre></div></div>

<p>This isn’t too unexpected, as the x86-64 Stockfish is significantly more optimized with assembly routines, and the RISC-V version should catch up in the future, but it shows how felix86 can translate the optimized x86-64 routines of Stockfish to improve performance.</p>

<p>Next, a run of <code class="language-plaintext highlighter-rouge">node -e "console.log('hello')"</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// SpacemiT K1, x86-64 Node v24.13.0
2.72s user 0.70s system 110% cpu 3.100 total
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// SpacemiT K3, x86-64 Node v24.13.0
real    0m1.408s
user    0m1.156s
sys     0m0.400s
</code></pre></div></div>

<p>Finally, a run of <code class="language-plaintext highlighter-rouge">ffmpeg -f lavfi -i testsrc=duration=10:size=1920x1080:rate=30 -c:v libx264 -benchmark -preset medium -f null -</code></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// SpacemiT K1, x86-64 ffmpeg N-122467-gc3d3377fe1-20260116
bench: utime=612.988s stime=6.089s rtime=99.349s
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// SpacemiT K3, x86-64 ffmpeg N-122467-gc3d3377fe1-20260116
bench: utime=175.554s stime=1.746s rtime=32.661s
</code></pre></div></div>

<p>The benchmark completes three times faster.</p>

<div class="alert alert-info" role="alert"><i class="fa fa-info-circle"></i> <b>Note:</b> These benchmarks use the <code class="language-plaintext highlighter-rouge">FELIX86_UNSAFE_FLAGS</code> option, which is safe in ABI-conforming programs.</div>

<h3 id="a100-cores">A100 cores</h3>

<p>The SpacemiT K3 has 8 X100 RVA23 cores with VLEN=256 and 8 A100 non-RVA23 cores with VLEN=1024. Due to differences in VLEN, the same process can’t be scheduled on X100 and A100 cores. Nevertheless, felix86 can work on the VLEN=1024 cores.</p>

<p>To make any child processes get sent to the A100 cores, you need to write the PID of the current process to <code class="language-plaintext highlighter-rouge">/proc/set_ai_thread</code>. In many shell programs, such as bash or zsh, this can be done with <code class="language-plaintext highlighter-rouge">echo $$ &gt; /proc/set_ai_thread</code>.</p>

<p>Running a felix86 instance will now show us we have a VLEN of 1024:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Extensions enabled for the recompiler: g,v1024,c,b,zicond,zfa,zvbb,zvkned
</code></pre></div></div>

<p>This means it should be possible to start some x86 processes on these cores through felix86 to squeeze some extra compute power out of the system if the X100 cores are fully occupied.</p>

<h3 id="final-thoughts">Final thoughts</h3>

<p>Performance has significantly improved with the SpacemiT K3, so we’re very excited to see it in boards and test games with it.</p>

<p>It seems that the SpacemiT K3 has a 39-bit address space, same as the SpacemiT K1. A 48-bit address space would be nice in future SoCs, as it enables emulation of some programs that rely on the larger address space that x86 offers. One example is PS4 emulators like <a href="https://github.com/shadPS4-emu/shadPS4">shadPS4</a>.</p>

<p>The <code class="language-plaintext highlighter-rouge">Zacas</code> extension is missing, and it is important for x86-64 emulation on RISC-V. It gives us a way to do 128-bit CAS, which is heavily used by Unity. When it is absent, we use a global lock for all <code class="language-plaintext highlighter-rouge">CMPXCHG16B</code> operations, which is not actually atomic with regard to other memory operations. As we get faster cores, the lack of <code class="language-plaintext highlighter-rouge">Zacas</code> is likely to lead to crashes in x86-64 games that use <code class="language-plaintext highlighter-rouge">CMPXCHG16B</code>. It is a relatively new extension so it’s understandable for it to not be implemented yet.</p>
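<p>A toy Python model shows why the lock-based fallback is weaker than a real 128-bit CAS. The lock object, the function name and the dictionary standing in for memory are hypothetical. Only the compare-and-swap semantics are taken from the x86 instruction.</p>

```python
import threading

# Hypothetical single lock shared by every emulated 128-bit CAS.
_global_cas_lock = threading.Lock()

def cmpxchg16b(memory, addr, expected, desired):
    """Model CMPXCHG16B: returns (success, value seen in memory).

    expected plays the role of RDX:RAX and desired the role of RCX:RBX.
    """
    with _global_cas_lock:
        current = memory[addr]
        if current == expected:
            memory[addr] = desired
            return True, current
        return False, current

memory = {0x1000: 5}
print(cmpxchg16b(memory, 0x1000, expected=5, desired=6))
print(cmpxchg16b(memory, 0x1000, expected=5, desired=7))
# A plain store such as memory[0x1000] = 9 never takes the lock, so on real
# hardware it could land between the load and the store above.
```

<p>Two emulated <code class="language-plaintext highlighter-rouge">CMPXCHG16B</code> operations exclude each other, but ordinary loads and stores to the same location are not serialized against them. A hardware CAS closes that gap.</p>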

<p>Some unaligned atomic support would be nice, such as <code class="language-plaintext highlighter-rouge">Zama16b</code>.</p>

<p>Both <code class="language-plaintext highlighter-rouge">Zacas</code> and <code class="language-plaintext highlighter-rouge">Zama16b</code> are optional extensions for RVA23, so we hope to see them in a future core.</p>

<p>Overall, a great improvement in performance. Looking forward to seeing how games perform under felix86!</p>

<h2 id="compatible-with-rva23">Compatible with RVA23</h2>

<p>After a few bug fixes, felix86 now officially supports RVA23 hardware with VLEN=256. We also ran tests on QEMU with <code class="language-plaintext highlighter-rouge">cpu=max</code>, and after some more bug fixes there’s support for VLEN=128 with all the extensions that QEMU supports.</p>

<h2 id="miscallaneous-niceties">Miscellaneous niceties</h2>

<h3 id="install-script-improvements">Install script improvements</h3>

<p>The felix86 install script (<code class="language-plaintext highlighter-rouge">bash &lt;(curl -s https://install.felix86.com)</code>) now supports installing a specific commit that was made in the last 90 days.</p>

<p>Additionally, some oversights in rootfs installation were addressed. The rootfs is now correctly owned by root, and only the <code class="language-plaintext highlighter-rouge">$HOME</code> folder inside the rootfs is owned by the user.</p>

<h3 id="documentation">Documentation</h3>

<p>We now have documentation at <a href="https://felix86.com/docs">https://felix86.com/docs</a>, for developers and users alike. Check it out!</p>

<h3 id="rss-feed">RSS feed</h3>

<p>There’s now an RSS feed available at <a href="https://felix86.com/feed.xml">https://felix86.com/feed.xml</a>.</p>

<h3 id="bluesky">Bluesky</h3>

<p>There’s now a Bluesky account for felix86 at <a href="https://bsky.app/profile/felix86-emu.bsky.social">@felix86-emu</a>.</p>

<hr />

<p>Thanks for reading this post.</p>

<p>Big things are coming next month. Stay tuned!</p>

<p>If you like this project, please give us a star on Github: <a href="https://github.com/OFFTKP/felix86">https://github.com/OFFTKP/felix86</a></p>]]></content><author><name></name></author><summary type="html"><![CDATA[We have some exciting news this month!]]></summary></entry><entry><title type="html">felix86 26.01</title><link href="https://felix86.com/felix86-26-01/" rel="alternate" type="text/html" title="felix86 26.01" /><published>2026-01-01T00:00:00+00:00</published><updated>2026-01-01T00:00:00+00:00</updated><id>https://felix86.com/felix86-26-01</id><content type="html" xml:base="https://felix86.com/felix86-26-01/"><![CDATA[<p>Happy new year! This month we addressed a few nasty bugs.</p>

<h2 id="new-logo">New logo</h2>

<p>Felix86 has a new logo!</p>

<p><img src="https://cdn.felix86.com/images/felix86.png" width="400" style="display: block; margin: 10px auto" /></p>

<p>For more info about the logo and the artist, visit this page:  <br />
<a href="https://github.com/felix86-emu/site/wiki/Logo">https://github.com/felix86-emu/site/wiki/Logo</a></p>

<h2 id="apt-fixes">apt fixes</h2>

<p>For a long time <code class="language-plaintext highlighter-rouge">apt</code> would be flaky on felix86. Packages would install, but often you would see a segmentation fault after they finished installing, which was unpleasant and confusing.</p>

<p>Since <a href="https://github.com/OFFTKP/felix86/issues/371">an issue was opened</a> for this, we decided to take a closer look. It turns out it was an issue with how we handle the exit syscall when it is called from inside a signal handler. This would also cause issues in the <code class="language-plaintext highlighter-rouge">dash</code> shell, which will longjmp out of the signal handler without ever calling <code class="language-plaintext highlighter-rouge">sigreturn</code>, and when it was time to exit the same bug would appear. This is now fixed and <code class="language-plaintext highlighter-rouge">apt</code> no longer crashes on exit!</p>

<p>Another bug the issue report mentions is <code class="language-plaintext highlighter-rouge">apt-key</code> exiting unexpectedly. The reason for this bug has to do with how we handled config files in felix86. Each child instance would try to access the <code class="language-plaintext highlighter-rouge">config.json</code> file to read the configuration. However, if a child process doesn’t have access to the user’s home directory then felix86 will exit during startup. This is fixed by passing configurations to child processes via environment variables.</p>

<p>With these fixes, <code class="language-plaintext highlighter-rouge">apt update</code> and <code class="language-plaintext highlighter-rouge">apt install</code> can now cleanly install packages on your rootfs:
<img src="/images/apt.webp" width="700" style="display: block; margin: 10px auto" />
<em>Installing <code class="language-plaintext highlighter-rouge">cowsay</code> in the rootfs through felix86</em></p>

<h2 id="game-fixes">Game fixes</h2>

<h3 id="genshin-impact">Genshin Impact</h3>

<p>Genshin Impact had a couple of issues that prevented it from loading. It is not currently playable, as the 2FA messagebox is now a Chromium-based browser, which seems to have some issue in felix86 that needs further investigation.</p>

<h4 id="16-bit-bswap">16-bit bswap</h4>

<p>Genshin Impact would use the bswap instruction with an operand size override prefix. This is not defined in the Intel manual, but is not an illegal instruction on real hardware. This instruction is now implemented.</p>

<h4 id="memory-usage-reduction">Memory usage reduction</h4>

<p>Each thread would use significant amounts of memory which did not have a positive impact on performance. One such area was a 2^20 entry address cache, allowing for lookups of recently translated guest addresses using their bottom 20 bits. This was reduced to 2^16 entries, using the bottom 16 bits. Each entry is 16 bytes, containing a guest and host address pair. Thus, in total, this reduces memory usage by 15 MiB per thread. Additionally, the maximum code cache size was changed from 64 MiB to 32 MiB. The initial code cache size was also decreased from 8 MiB to 4 MiB. This means that applications that spawn a large number of threads will use significantly less memory, thus starting up faster and not leading to OOM problems.</p>
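<p>The 15 MiB figure follows directly from the entry count and entry size, as the Python arithmetic below shows. The <code class="language-plaintext highlighter-rouge">cache_slot</code> helper is a guess at how such a cache is indexed (direct-mapped on the low address bits) and is not taken from the felix86 source.</p>

```python
ENTRY_BYTES = 16  # one guest address plus one host address per entry
MIB = 2 ** 20

old_cache = 2 ** 20 * ENTRY_BYTES
new_cache = 2 ** 16 * ENTRY_BYTES
saved_mib = (old_cache - new_cache) // MIB
print(old_cache // MIB, new_cache // MIB, saved_mib)

def cache_slot(guest_address, index_bits=16):
    """Slot of a guest address in a cache keyed on its low index_bits bits."""
    return guest_address % 2 ** index_bits

print(hex(cache_slot(0x12345678)))
```

<p>The cache drops from 16 MiB to 1 MiB per thread, so a process with a hundred threads saves about 1.5 GiB from this change alone.</p>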

<p>This change shaves off 2 GiB of memory usage from Genshin Impact.</p>

<h3 id="deponia">Deponia</h3>

<p>We have a custom allocator for the 32-bit address space that is used for mmap/mremap/munmap/… syscalls. This is necessary even for 64-bit applications, because of the x86-only mmap flag <code class="language-plaintext highlighter-rouge">MAP_32BIT</code>. We would previously not correctly mark the executable itself as allocated if it was loaded in the 32-bit address space, which would cause problems because the kernel and the emulator would disagree on allocatable regions.</p>

<p>With this bug fixed, the game Deponia now works.</p>

<h2 id="execution-profiles">Execution profiles</h2>

<p>felix86 can now load different “execution profiles”, specified by the <code class="language-plaintext highlighter-rouge">FELIX86_PROFILE</code> environment variable. Each profile is a partial or full configuration file, which will override the default configuration file. This allows for easily selecting a different array of configurations to do different things. You can have a profile that disables unsafe optimizations, or a profile that enables debugging options.</p>

<p>Additionally, this version introduces a configuration that will pass any specified environment variables to the guest program, separated by a semicolon. For example, <code class="language-plaintext highlighter-rouge">FELIX86_ENVIRONMENT=WINEDEBUG=+seh;GALLIUM_HUD=fps</code> will pass <code class="language-plaintext highlighter-rouge">WINEDEBUG=+seh</code> and <code class="language-plaintext highlighter-rouge">GALLIUM_HUD=fps</code> to the guest program, without the emulator being affected by these environment variables. This feature allows execution profiles to define certain environment variables.</p>
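<p>The format is simple enough to sketch in Python. The parser below is only an illustration of the semicolon-separated syntax described above. The function name is made up and felix86’s own parsing may differ in edge cases such as empty items.</p>

```python
def parse_guest_environment(value):
    """Split "A=1;B=2" into a dict; only the first "=" separates name from value."""
    env = {}
    for item in value.split(";"):
        if not item:
            continue  # tolerate a trailing or doubled semicolon
        name, _, val = item.partition("=")
        env[name] = val
    return env

print(parse_guest_environment("WINEDEBUG=+seh;GALLIUM_HUD=fps"))
```

<p>Splitting on the first <code class="language-plaintext highlighter-rouge">=</code> only matters for values that themselves contain an equals sign. When setting the variable from a shell, the value needs quoting so the semicolon is not treated as a command separator.</p>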

<p>This feature can be used to automatically apply a bunch of different environment variables and felix86 configurations to games.</p>

<h2 id="repl-color-coding">REPL color coding</h2>

<p>The felix86 REPL can now color code each translation for easily understanding which RISC-V instruction corresponds to which x86 instruction. This can be enabled with the <code class="language-plaintext highlighter-rouge">color</code> command.</p>

<p><img src="/images/replColor.webp" width="500" style="display: block; margin: 10px auto" />
<em>Each translation is a different color</em></p>

<hr />

<p>Thanks for reading this post.</p>

<p>If you like this project, please give us a star on Github: <a href="https://github.com/OFFTKP/felix86">https://github.com/OFFTKP/felix86</a></p>]]></content><author><name></name></author><summary type="html"><![CDATA[Happy new year! This month we addressed a few nasty bugs.]]></summary></entry><entry><title type="html">felix86 25.12</title><link href="https://felix86.com/felix86-25-12/" rel="alternate" type="text/html" title="felix86 25.12" /><published>2025-12-01T00:00:00+00:00</published><updated>2025-12-01T00:00:00+00:00</updated><id>https://felix86.com/felix86-25-12</id><content type="html" xml:base="https://felix86.com/felix86-25-12/"><![CDATA[<p>This month we got a lot of work done. Performance improvements, quality of life changes, and a bunch of testing.</p>

<h2 id="quality-of-life-changes">Quality of life changes</h2>

<h3 id="trusted-directories">Trusted directories</h3>

<p>In the past, felix86 would only allow you to run applications that exist inside the rootfs. This is because filesystem syscalls are containerized to only allow access inside the rootfs, which is where the x86 libraries and binaries are. This isn’t very user friendly, because it requires copying or moving everything you want to run inside the rootfs. With felix86 25.12, you can run x86 programs outside the rootfs. The first time you do so, felix86 will prompt you whether you want to add that directory to the <strong>trusted directories</strong>. This will allow felix86 to access this directory and its subdirectories and files without it being inside the rootfs.</p>

<p><img src="/images/guiPrompt.png" width="500" style="display: block; margin: 10px auto" />
<em>Running an x86 executable from outside the rootfs for the first time</em></p>

<p>For more info and technical details, <a href="https://github.com/OFFTKP/felix86/pull/345">refer to the pull request</a>.</p>

<h3 id="remove-felix86-mounter">Remove <code class="language-plaintext highlighter-rouge">felix86-mounter</code></h3>

<p>Previously, an executable with higher privileges called <code class="language-plaintext highlighter-rouge">felix86-mounter</code> would be responsible for mounting <code class="language-plaintext highlighter-rouge">/dev</code>, <code class="language-plaintext highlighter-rouge">/proc</code> and other directories inside the rootfs. Since we introduced fake mounts with trusted directories, these directories are now fake mounted inside the rootfs so there’s no dependency on <code class="language-plaintext highlighter-rouge">felix86-mounter</code>. This means that <strong>felix86 can now run without any root privileges!</strong></p>

<h3 id="--shell-argument"><code class="language-plaintext highlighter-rouge">--shell</code> argument</h3>

<p>You can now use <code class="language-plaintext highlighter-rouge">felix86 --shell</code> to enter the rootfs, similarly to doing <code class="language-plaintext highlighter-rouge">felix86 /rootfs/bin/bash</code> but with <code class="language-plaintext highlighter-rouge">FELIX86_QUIET</code> set and a nice prompt.</p>

<h3 id="repl-environment">REPL environment</h3>

<p>A lot of felix86 work is optimizing the various x86 instructions into an optimal RISC-V instruction pattern. This version of felix86 introduces a REPL environment that allows easily viewing what each x86 instruction translates to.</p>

<p><img src="/images/repl.png" width="400" style="display: block; margin: 10px auto" />
<em>You can also compile instruction sequences by separating them with semicolons!</em></p>

<p>The REPL shows how an x86 instruction compiles in 32-bit and 64-bit programs, and you can also disable flag generation. Additionally, you can compile a sequence of instructions to see how they interact with each other.</p>

<p><img src="/images/replCmov.webp" width="300" style="display: block; margin: 10px auto" />
<em>Opcode fusing at work, with <code class="language-plaintext highlighter-rouge">FELIX86_FUSE_OPCODES=1</code></em></p>

<div class="alert alert-info" role="alert"><i class="fa fa-info-circle"></i> <b>Note:</b> The REPL is disabled by default; compile with <code class="language-plaintext highlighter-rouge">BUILD_REPL=1</code> to enable it.</div>

<h2 id="even-more-optimizations">Even more optimizations</h2>

<h3 id="scan-ahead-multiple-blocks">Scan ahead multiple blocks</h3>

<p>The <code class="language-plaintext highlighter-rouge">FELIX86_SCAN_AHEAD_MULTI</code> option is now enabled by default. For a refresher on what it does, check <a href="https://felix86.com/felix86-25-11/">the felix86 25.11 post</a>.</p>

<p>It can now work with the <code class="language-plaintext highlighter-rouge">FELIX86_UNSAFE_FLAGS</code> option, which makes the recompiler not calculate flags for blocks ending in <code class="language-plaintext highlighter-rouge">call</code> or <code class="language-plaintext highlighter-rouge">ret</code>. This helps with blocks that have a conditional jump to a <code class="language-plaintext highlighter-rouge">ret</code> instruction and a block that overwrites flags. In these cases, the flags don’t need to be calculated as long as the program is ABI-conforming. This option is disabled by default for now, as it may break programs that are not.</p>

<h3 id="code-cache-is-placed-near-the-executable">Code cache is placed near the executable</h3>

<p>Constructing big immediates is a pain in RISC-V and it can take a hard-to-guess number of instructions. Most compilers have multiple paths and may even make multiple attempts at generating the best instruction sequence when loading a big immediate.</p>

<p>One source of big immediates in our case is x86 rip-relative access. We now place the recompiled code cache near the executable, so that most rip-relative addresses can be constructed with an <code class="language-plaintext highlighter-rouge">AUIPC</code>+<code class="language-plaintext highlighter-rouge">ADDI</code> combo. This only needs to be done for 64-bit programs, as 32-bit programs have smaller addresses that can already be constructed in two instructions.</p>
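
<p>As a rough illustration of why the distance matters, here is a minimal Python sketch of splitting a PC-relative offset into the two immediates of an <code class="language-plaintext highlighter-rouge">AUIPC</code>+<code class="language-plaintext highlighter-rouge">ADDI</code> pair. The function names are made up for this example and this is not felix86’s actual code; it only shows that the pair works when the target is within roughly ±2 GiB of the code cache.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def split_pcrel(pc, target):
    """Split target - pc into (hi20, lo12) for an AUIPC+ADDI pair, or return None."""
    offset = target - pc
    # ADDI sign-extends its 12-bit immediate, so round the upper part to nearest.
    hi20 = (offset + 2**11) // 2**12
    lo12 = offset - hi20 * 2**12
    # AUIPC only has room for a signed 20-bit upper immediate.
    if hi20 not in range(-2**19, 2**19):
        return None
    return hi20, lo12


def apply_pair(pc, hi20, lo12):
    """What the pair computes: AUIPC adds hi20 * 4096 to pc, ADDI adds lo12."""
    return pc + hi20 * 2**12 + lo12
</code></pre></div></div>

<p>When <code class="language-plaintext highlighter-rouge">split_pcrel</code> returns <code class="language-plaintext highlighter-rouge">None</code>, the address is too far away and a longer immediate-building sequence would be needed, which is the case that placing the code cache near the executable tries to avoid.</p>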

<h3 id="instruction-optimizations">Instruction optimizations</h3>

<p>This version comes with more instruction optimizations, such as optimizing the <code class="language-plaintext highlighter-rouge">PMADDWD</code> instruction, <code class="language-plaintext highlighter-rouge">BSWAP</code>, 32-bit effective address generation, and more.</p>

<h4 id="tzcnt-and-lzcnt">TZCNT and LZCNT</h4>

<p>The <code class="language-plaintext highlighter-rouge">TZCNT</code> behavior matches nicely with RISC-V’s <code class="language-plaintext highlighter-rouge">CTZ</code> instruction. When the source operand is 0, <code class="language-plaintext highlighter-rouge">TZCNT</code> will set the destination operand to the operand width. RISC-V does the same with <code class="language-plaintext highlighter-rouge">CTZ</code> and <code class="language-plaintext highlighter-rouge">CTZW</code>, so translation is now 1-to-1 (excluding flags). Previously we would use a branch to check if the source operand is zero, but this is unnecessary.</p>

<p><code class="language-plaintext highlighter-rouge">TZCNT</code> can also operate on 16-bit operands. There’s no <code class="language-plaintext highlighter-rouge">CTZH</code> instruction, but only two instructions are needed to emulate it: a <code class="language-plaintext highlighter-rouge">BSETI</code> to set bit 16 (the first bit above the 16-bit operand), followed by a <code class="language-plaintext highlighter-rouge">CTZW</code>. This way, if the low 16 bits of the source are zero, the <code class="language-plaintext highlighter-rouge">CTZW</code> will return 16 because bit 16 is set.</p>
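
<p>The 16-bit trick can be modelled in a few lines of Python. This is only a sketch of the instruction semantics, with helper names invented for the example; it is not the code the recompiler emits.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def ctzw(x):
    """Model of RISC-V CTZW: trailing zeros of the low 32 bits, 32 if they are all zero."""
    x %= 2**32
    if x == 0:
        return 32
    count = 0
    while x % 2 == 0:
        x //= 2
        count += 1
    return count


def tzcnt16(src):
    """16-bit TZCNT as the two-instruction sequence: BSETI bit 16, then CTZW."""
    with_guard = src | 2**16   # BSETI rd, rs, 16
    return ctzw(with_guard)    # CTZW rd, rd
</code></pre></div></div>

<p>Because the guard bit sits just above the operand, any garbage in the upper bits of the source register is never reached: the count stops at bit 16 at the latest.</p>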

<p><code class="language-plaintext highlighter-rouge">LZCNT</code> was optimized in a similar fashion using <code class="language-plaintext highlighter-rouge">CLZ</code> and <code class="language-plaintext highlighter-rouge">CLZW</code>.</p>

<h2 id="requirement-upgrade">Requirement upgrade</h2>

<p>The felix86 requirements used to be <code class="language-plaintext highlighter-rouge">rv64gv</code>, with <code class="language-plaintext highlighter-rouge">RVV 1.0</code>. As we get closer to RVA23 compatible chips hitting the market, this requirement is upped to <code class="language-plaintext highlighter-rouge">rv64gvb</code>, where <code class="language-plaintext highlighter-rouge">b</code> is <code class="language-plaintext highlighter-rouge">Zba/Zbb/Zbc/Zbs</code>. It is highly unlikely we’ll get any new high-performance cores that have RVV 1.0 but no bit-manipulation instructions, so this is a pretty easy requirement.</p>

<h2 id="other-improvements">Other improvements</h2>

<p>The felix86 install script has been improved. It is now hosted at <code class="language-plaintext highlighter-rouge">install.felix86.com</code> and its source code can be found at <a href="https://github.com/felix86-emu/install">https://github.com/felix86-emu/install</a>.</p>

<h3 id="rootfs">Rootfs</h3>

<p>New rootfs options are now available. You can choose the <code class="language-plaintext highlighter-rouge">No Wine</code> rootfs, which saves ~1.6 GiB of space by not having wine installed, in case you don’t need to run Windows apps. There’s also the <code class="language-plaintext highlighter-rouge">Tiny</code> rootfs, which has just the bare minimum, weighing in at ~150 MiB uncompressed.</p>

<p>There are plans for a future version of felix86 that allows running without a rootfs. This would require you to install the x86 libraries yourself and set <code class="language-plaintext highlighter-rouge">LD_LIBRARY_PATH</code> accordingly. It would also give the emulated executable unrestricted access to your filesystem.</p>

<h2 id="testing">Testing</h2>

<p>Our CI was extended with <a href="https://github.com/felix86-emu/binary_tests">a bunch of new tests</a>. These will verify correctness and massively help with catching regressions.</p>

<p>Currently we use prebuilt tests from GCC, Valgrind and libuv. In the future, we want to expand the testing infrastructure with tests from Gvisor, POSIX, Node, Chromium, and more.</p>

<h2 id="benchmarks">Benchmarks</h2>

<p>We ran GeekBench 6 with <strong>felix86 25.11</strong> and <strong>felix86 25.12</strong>. This month’s version is around 6% faster overall, with some benchmarks being up to ~20% faster. <a href="https://browser.geekbench.com/v6/cpu/compare/15058133?baseline=14829077">Here is the full benchmark comparison</a>. Keep in mind that benchmarks are not really indicative of gaming performance; however, it is nice to see that our performance is improving each month.</p>

<hr />

<p>Thanks for reading this post. See you in 2026!</p>

<p>If you want to support this project, please consider donating: <a href="https://ko-fi.com/felix86">https://ko-fi.com/felix86</a></p>

<p>If you like this project, please give us a star on Github: <a href="https://github.com/OFFTKP/felix86">https://github.com/OFFTKP/felix86</a></p>]]></content><author><name></name></author><summary type="html"><![CDATA[This month we got a lot of work done. Performance improvements, quality of life changes, and a bunch of testing.]]></summary></entry><entry><title type="html">felix86 25.11</title><link href="https://felix86.com/felix86-25-11/" rel="alternate" type="text/html" title="felix86 25.11" /><published>2025-11-02T00:00:00+00:00</published><updated>2025-11-02T00:00:00+00:00</updated><id>https://felix86.com/felix86-25-11</id><content type="html" xml:base="https://felix86.com/felix86-25-11/"><![CDATA[<p>This month felix86 was presented at the RISC-V Summit 2025 North America, and we added support for SSE 4.2 and made optimizations!</p>

<h2 id="risc-v-summit-north-america">RISC-V Summit North America</h2>

<p>The presentation “RISC-V for gaming: Emulating x86 on RISC-V” showcased the internals of the felix86 emulator, particularly the translation process. If you missed it, you can watch a recording below!</p>

<div class="video-container">
    <iframe src="https://www.youtube.com/embed/S_NkVBgOcoQ" height="315" width="560" allowfullscreen="" frameborder="0">
    </iframe>
</div>

<h2 id="sse-42">SSE 4.2</h2>

<p>The instructions missing for SSE 4.2 were implemented. This includes horizontal instructions like <code class="language-plaintext highlighter-rouge">PHADDW/PHADDD/...</code> from SSSE3. Applications that optionally use these instructions may see a performance improvement.</p>

<p>The x86 extension <code class="language-plaintext highlighter-rouge">PCLMUL</code> was also implemented. This extension adds a single instruction, <code class="language-plaintext highlighter-rouge">PCLMULQDQ</code>, which performs carryless multiplication. In RISC-V land we have <code class="language-plaintext highlighter-rouge">CLMUL</code> and <code class="language-plaintext highlighter-rouge">CLMULH</code> to achieve the same effect. Currently, support for this extension is disabled by default but can be enabled with the environment variable <code class="language-plaintext highlighter-rouge">FELIX86_PCLMULQDQ</code>.</p>
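
<p>For readers unfamiliar with carryless multiplication, here is a small Python model of how the two RISC-V instructions combine into the 128-bit result that <code class="language-plaintext highlighter-rouge">PCLMULQDQ</code> produces for one pair of 64-bit operands. The function names are invented for this sketch and the inputs are assumed to be unsigned 64-bit values; the real handler also has to select which halves of the vector registers to multiply.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def clmul128(a, b):
    """Carryless (XOR instead of add) product of two 64-bit values, up to 127 bits wide."""
    product = 0
    for i in range(64):
        if (b // 2**i) % 2 == 1:
            product ^= a * 2**i
    return product


def clmul(a, b):
    """Model of RISC-V CLMUL: low 64 bits of the carryless product."""
    return clmul128(a, b) % 2**64


def clmulh(a, b):
    """Model of RISC-V CLMULH: high 64 bits of the carryless product."""
    return clmul128(a, b) // 2**64


def pclmulqdq_lane(a, b):
    """One 64x64 carryless multiply as PCLMULQDQ produces it: a 128-bit result."""
    return clmulh(a, b) * 2**64 + clmul(a, b)
</code></pre></div></div>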

<p>Additionally, the <code class="language-plaintext highlighter-rouge">CRC32</code> instruction from SSE 4.2 was implemented. RISC-V doesn’t have a dedicated CRC32 instruction (and even if it did, it’s unlikely it would use the same polynomial), but it’s possible to perform a CRC32 using the carryless multiplication instructions. There’s an Intel white paper describing how <code class="language-plaintext highlighter-rouge">PCLMULQDQ</code> can be used to calculate a CRC32 with any polynomial, and there’s also <a href="https://web.archive.org/web/20230606170122/https://mary.rs/lab/crc32/">a blog post</a> by <a href="https://github.com/merryhime">merryhime</a> for further reading. This allows us to implement the CRC32 instruction while taking advantage of the carryless multiplication instructions, thus greatly reducing instruction count.</p>

<h2 id="more-flag-optimizations">More flag optimizations</h2>

<p>It used to be the case that felix86 would omit flag calculations only in the current block. This would mean that whenever a block ends, the flags of the latest instruction that changed them need to be calculated in software. However, it is often the case that all targets of a branch don’t make use of the flags. A new optimization was made which allows checking the branch targets for flag usage. This works especially well in hot loops, since it shaves many instructions off the end of the block.</p>

<p>This optimization is also disabled by default and can be enabled with the environment variable <code class="language-plaintext highlighter-rouge">FELIX86_SCAN_AHEAD_MULTI</code>.</p>
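
<p>To make the idea concrete, here is a toy Python sketch of the branch-target check. The block representation (a <code class="language-plaintext highlighter-rouge">first_use</code> field and a list of <code class="language-plaintext highlighter-rouge">targets</code>) and the depth limit are assumptions made up for this example, not how felix86 stores its blocks.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def flags_needed(blocks, name, depth=4):
    """Return True if the block called `name` may read flags before overwriting them."""
    block = blocks.get(name)
    if block is None or depth == 0:
        return True                      # unknown target or scan budget spent: be conservative
    if block["first_use"] == "read":
        return True
    if block["first_use"] == "write":
        return False
    # The block never touches flags, so the answer depends on where it goes next.
    if not block["targets"]:
        return True
    return any(flags_needed(blocks, t, depth - 1) for t in block["targets"])


def can_skip_flags(blocks, name):
    """Flags at the end of `name` can be skipped if no branch target needs them."""
    targets = blocks[name]["targets"]
    return bool(targets) and not any(flags_needed(blocks, t) for t in targets)
</code></pre></div></div>

<p>A hot loop whose body starts by overwriting the flags is the ideal case: both the loop header and the exit block report that they don’t need the incoming flags, so the calculation at the end of the loop body can be dropped.</p>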

<hr />

<p>Thanks for reading this post.</p>

<p>If you like this project, please give us a star on Github: <a href="https://github.com/OFFTKP/felix86">https://github.com/OFFTKP/felix86</a></p>]]></content><author><name></name></author><summary type="html"><![CDATA[This month felix86 was presented at the RISC-V Summit 2025 North America, and we added support for SSE 4.2 and made optimizations!]]></summary></entry><entry><title type="html">felix86 25.10</title><link href="https://felix86.com/felix86-25-10/" rel="alternate" type="text/html" title="felix86 25.10" /><published>2025-10-04T00:00:00+00:00</published><updated>2025-10-04T00:00:00+00:00</updated><id>https://felix86.com/felix86-25-10</id><content type="html" xml:base="https://felix86.com/felix86-25-10/"><![CDATA[<p>Welcome to another monthly blog post! This month we made optimizations and fixed a bunch of 32-bit games!</p>

<p>But first, we have an important announcement to make!</p>

<h2 id="risc-v-summit-north-america">RISC-V Summit North America</h2>

<p>We’re going to be presenting at the RISC-V Summit North America on October 23, 2025. The presentation will be about how we emulate x86 on RISC-V, focusing on the low-level translation aspects.</p>

<p><a href="https://riscvsummit2025.sched.com/event/28OUJ/risc-v-for-gaming-emulating-x86-on-risc-v-paris-oplopoios-felix86">Check it out!</a></p>

<h2 id="x87-performance-and-compatibility-improvements">x87 performance and compatibility improvements</h2>

<p>As promised last month, we worked hard on improving x87 performance. We also fixed a few bugs along the way!</p>

<h3 id="stack-optimizations">Stack optimizations</h3>

<p>The x87 registers are accessed like a stack: a “stack pointer” points to one of eight registers. Whenever an instruction refers to <code class="language-plaintext highlighter-rouge">st0</code>, it refers to the register at that pointer, <code class="language-plaintext highlighter-rouge">st1</code> is the one right after, and so on. In RISC-V, we have to model that with memory loads and stores. We tried allocating the stack on registers and moving them around on push and pop operations, but it was bad for performance as push and pop are very frequent.</p>

<p>The new solution involves a couple of optimizations. First, if we already loaded a register, we don’t load it again on further uses. If we use <code class="language-plaintext highlighter-rouge">st1</code>, it will get loaded from memory, then if two new values are pushed and we use <code class="language-plaintext highlighter-rouge">st3</code>, the register that was previously allocated for <code class="language-plaintext highlighter-rouge">st1</code> will be used. The second optimization is that if we push values to the stack, we don’t actually modify the top, tag word or stack until the end of the block. We keep track of what was pushed and do all the modifications at the end, saving on redundant stores.</p>
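
<p>The <code class="language-plaintext highlighter-rouge">st1</code> to <code class="language-plaintext highlighter-rouge">st3</code> renaming follows directly from how the stack pointer moves. Below is a toy Python model of the x87 stack addressing; the class is invented for illustration and leaves out the tag word and the deferred stores.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>class X87Stack:
    """Toy model of the x87 register stack: 8 physical slots and a top pointer."""

    def __init__(self):
        self.slots = [0.0] * 8
        self.top = 0

    def physical(self, i):
        """Physical slot that st(i) currently names."""
        return (self.top + i) % 8

    def push(self, value):
        # A push moves top down by one (mod 8) and writes the new st0.
        self.top = (self.top - 1) % 8
        self.slots[self.top] = value

    def st(self, i):
        return self.slots[self.physical(i)]
</code></pre></div></div>

<p>The physical slot behind <code class="language-plaintext highlighter-rouge">st1</code> is the same slot that <code class="language-plaintext highlighter-rouge">st3</code> names after two more pushes, so a value cached for a physical slot stays valid across pushes within a block.</p>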

<h3 id="bug-fixes">Bug fixes</h3>

<p>We had a few bugs with our FPU tag word handling. After fixing them and verifying the correct behavior with tests, a bunch of older titles can now run on felix86. Titles include Far Cry, DOOM 3, Assassin’s Creed and others.</p>

<p><img src="/images/hitman.png" width="800" style="display: block; margin: 10px auto" />
<em>Hitman: Blood Money is now playable on felix86 25.10</em></p>

<h3 id="future-work">Future work</h3>

<p>The x87 registers support 80-bit floats and some games rely on this extra precision to function. Felix86 uses normal 64-bit floats to emulate them, which is fine for most but not all games. We plan to emulate this extra precision mode eventually.</p>

<h2 id="32-bit-signals">32-bit signals</h2>

<p>Felix86 now has some preliminary support for 32-bit signals, allowing games that use trivial signals to work. More testing is necessary, as well as supporting legacy x86 signal frames that some games use.</p>

<hr />

<p>Thanks for reading this post.</p>

<p>If you like this project, please give us a star on Github: <a href="https://github.com/OFFTKP/felix86">https://github.com/OFFTKP/felix86</a></p>]]></content><author><name></name></author><summary type="html"><![CDATA[Welcome to another monthly blog post! This month we made optimizations and fixed a bunch of 32-bit games!]]></summary></entry><entry><title type="html">felix86 25.09</title><link href="https://felix86.com/Summer-Slumber/" rel="alternate" type="text/html" title="felix86 25.09" /><published>2025-09-03T00:00:00+00:00</published><updated>2025-09-03T00:00:00+00:00</updated><id>https://felix86.com/Summer-Slumber</id><content type="html" xml:base="https://felix86.com/Summer-Slumber/"><![CDATA[<p>Welcome to the monthly felix86 blog post. Summer slowed us down a bit, but we are preparing some big changes for next month.</p>

<h2 id="optimizations">Optimizations</h2>

<p>We managed to squeeze in a few optimizations. We now automatically compress RISC-V instructions whenever possible, using a new feature implemented in <a href="https://github.com/lioncash/biscuit">Biscuit</a>. Additionally, we optimized the implementation of some x86 instructions.</p>

<h2 id="new-playable-games">New playable games</h2>

<p>After some bug fixes in the latest version, the compatibility list has been updated with new working titles, notably Sleeping Dogs and Counter-Strike: Source.</p>

<h2 id="what-we-are-working-on">What we are working on</h2>

<h3 id="32-bit-signal-support">32-bit signal support</h3>

<p>Some 32-bit games require signals to function. This is usually the case with Windows games run through <code class="language-plaintext highlighter-rouge">wine</code>.
Currently, felix86 only supports signals for 64-bit applications, but we are working on supporting 32-bit signals.
This may allow several 32-bit Windows games to run.</p>

<p>Not all Windows games require signals. Signals are primarily used in Wine for emulating the <a href="https://learn.microsoft.com/en-us/windows/win32/sync/asynchronous-procedure-calls">Asynchronous Procedure Call</a> mechanism. Some 32-bit Windows games operate fine without signal support, such as Fallout 2 or Trackmania Nations Forever.</p>

<h3 id="reducing-vector-pipeline-stalls">Reducing vector pipeline stalls</h3>

<p>After a chat with the folks at <a href="https://github.com/FEX-Emu/FEX">FEX-Emu</a>, it came to our attention that instructions that change the vector configuration may carry significant overhead on current hardware.</p>

<p>While modern hardware may utilize register renaming to avoid a pipeline stall whenever the configuration is changed, current RISC-V hardware likely lacks this optimization.</p>

<p>felix86 already avoids emitting <code class="language-plaintext highlighter-rouge">vsetivli</code> instructions within a basic block when the configuration would not change. However, most SSE instruction handlers emit an additional <code class="language-plaintext highlighter-rouge">vsetivli</code> whenever the instruction has a memory operand. We had falsely assumed that all SSE instructions can do unaligned memory access (this is not true, it’s something that only exists in AVX land), so when loading the memory operands we would set the vector configuration to sixteen 8-bit elements, allowing for unaligned access. Since this is not necessary, removing it can reduce the number of <code class="language-plaintext highlighter-rouge">vsetivli</code> instructions when a memory operand is involved, potentially improving performance. We should also look into instruction handlers that use more vector configuration changes than necessary!</p>

<hr />

<p>Thanks for reading the August post.</p>

<p>If you like this project, please give us a star on Github: <a href="https://github.com/OFFTKP/felix86">https://github.com/OFFTKP/felix86</a></p>]]></content><author><name></name></author><summary type="html"><![CDATA[Welcome to the monthly felix86 blog post. Summer slowed us down a bit, but we are preparing some big changes for next month.]]></summary></entry><entry><title type="html">felix86 25.08</title><link href="https://felix86.com/Browsers/" rel="alternate" type="text/html" title="felix86 25.08" /><published>2025-08-01T00:00:00+00:00</published><updated>2025-08-01T00:00:00+00:00</updated><id>https://felix86.com/Browsers</id><content type="html" xml:base="https://felix86.com/Browsers/"><![CDATA[<p>A major milestone for felix86 was reached this month!</p>

<h1 id="browsers">Browsers</h1>

<p>We were able to run the Chromium browser under felix86!</p>

<p>You may be wondering: why would anyone want to do that?</p>

<p><img src="/images/steam.png" width="800" style="display: block; margin: 10px auto" />
<em>The Linux version of Steam, running on RISC-V with felix86</em></p>

<p>A lot of modern apps use some kind of embedded version of Chromium. This ranges from the Windows Start Menu to Discord to Steam.</p>

<p>But Steam was always a huge goal for felix86. It has become the de facto launcher for games. Being able to launch games that use Steam DRM is great and allows us to test a wide variety of titles.</p>

<p>Steam itself is a 32-bit app. It uses a separate 64-bit tool called the <code class="language-plaintext highlighter-rouge">steamwebhelper</code> to render the GUI using the Chromium Embedded Framework. It also uses a tool called <code class="language-plaintext highlighter-rouge">pressure-vessel</code> for containerization. It does some tricky stuff with mounts and <code class="language-plaintext highlighter-rouge">pivot_root</code>.</p>

<p>If you’d like to try Steam, check out the <a href="https://github.com/OFFTKP/felix86/blob/master/docs/steam.md">Steam setup guide</a>.</p>

<h4 id="installation-guide">Installation guide</h4>

<p>There’s also a quick installation guide for felix86 and Steam:</p>

<div class="video-container">
    <iframe src="https://www.youtube.com/embed/SDTbd76VWws" height="315" width="560" allowfullscreen="" frameborder="0">
    </iframe>
</div>

<h2 id="optimizations">Optimizations</h2>

<p>Most of the work this month was on getting Steam to work. However, we also optimized a few instructions.</p>

<p>One of the more interesting optimizations we made was the fusing of <code class="language-plaintext highlighter-rouge">cmp</code> instructions.</p>

<p>On RISC-V, if you want to check if, for example, register A is less than register B, you use the <code class="language-plaintext highlighter-rouge">slt</code> instruction.</p>

<p>On x86, you use the <code class="language-plaintext highlighter-rouge">cmp</code> instruction. The <code class="language-plaintext highlighter-rouge">cmp</code> instruction does a generic comparison between two operands and sets the flags accordingly. If you wanted to then set a register if A is <em>less than</em> B, you’d use the <code class="language-plaintext highlighter-rouge">setl</code> instruction after the <code class="language-plaintext highlighter-rouge">cmp</code>. What the <code class="language-plaintext highlighter-rouge">setl</code> instruction really does is check whether the overflow flag is not equal to the sign flag.</p>

<p>RISC-V doesn’t have flags, so felix86 computes them in software. This is expensive, particularly for the overflow flag, which takes numerous instructions to calculate. Luckily, <code class="language-plaintext highlighter-rouge">cmp</code> has no side-effects other than flags. If we know that the flags are only used for the <code class="language-plaintext highlighter-rouge">setl</code> and then overwritten, we could emit the <code class="language-plaintext highlighter-rouge">cmp</code> and <code class="language-plaintext highlighter-rouge">setl</code> combo as a single <code class="language-plaintext highlighter-rouge">slt</code> RISC-V instruction.</p>
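
<p>The equivalence this fusion relies on can be checked exhaustively at a small operand width. The Python sketch below models both views using 8-bit operands; the helper names are invented for the example, and the same reasoning applies at 32 and 64 bits.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from operator import lt

BITS = 8  # small width so every operand pair can be checked; the logic is width-independent


def sign_bit(x):
    return (x // 2**(BITS - 1)) % 2


def to_signed(x):
    return x - sign_bit(x) * 2**BITS


def cmp_then_setl(a, b):
    """x86 view: cmp computes a - b and sets flags, setl returns SF != OF."""
    result = (a - b) % 2**BITS
    sf = sign_bit(result)
    # Signed overflow: operands have different signs and the result's sign differs from a's.
    of = int(sign_bit(a) != sign_bit(b) and sign_bit(result) != sign_bit(a))
    return int(sf != of)


def slt(a, b):
    """RISC-V view: one signed less-than comparison."""
    return int(lt(to_signed(a), to_signed(b)))
</code></pre></div></div>

<p>Looping over all 65536 operand pairs and comparing <code class="language-plaintext highlighter-rouge">cmp_then_setl(a, b)</code> with <code class="language-plaintext highlighter-rouge">slt(a, b)</code> finds no mismatch, which is why the flag calculation can be dropped when nothing else reads the flags.</p>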

<p>This is currently only done for <code class="language-plaintext highlighter-rouge">cmp</code> and <code class="language-plaintext highlighter-rouge">cmovcc</code>, but in the future we will expand it to work with <code class="language-plaintext highlighter-rouge">setcc</code>. The most used combo would probably be <code class="language-plaintext highlighter-rouge">cmp</code> and <code class="language-plaintext highlighter-rouge">jcc</code>, but supporting this would require analyzing more than one block at once, which we currently don’t do.</p>

<p>It should be noted that this optimization only works if the <code class="language-plaintext highlighter-rouge">cmp</code> is immediately followed by the instruction that uses the resulting flags. This is usually the case, but it’s possible for a compiler to emit instructions that don’t modify the flags between the two instructions. If deemed necessary, we could fuse <code class="language-plaintext highlighter-rouge">cmp</code> instructions even if the instruction that uses the flags is further down the line.</p>

<div class="alert alert-info" role="alert"><i class="fa fa-info-circle"></i> <b>Note:</b> Theoretically, a program can rely on correct flag emulation even for unused flags by using signals. This would make the aforementioned optimization much trickier to do. In practice, signal handlers rarely rely on correct register state other than the current RIP (although felix86 will try to reconstruct the state at the time of the signal), and I’ve never seen them rely on correct flags. A well-crafted piece of DRM software could detect emulation when this optimization is enabled. If you are a DRM author consider forgetting about this.</div>

<p>This optimization is currently disabled by default as it needs more testing. To enable it, run <code class="language-plaintext highlighter-rouge">export FELIX86_FUSE_OPCODES=1</code>.</p>

<hr />

<p>Thanks for reading this post.</p>

<p>If you like this project, please give us a star on Github: <a href="https://github.com/OFFTKP/felix86">https://github.com/OFFTKP/felix86</a></p>]]></content><author><name></name></author><summary type="html"><![CDATA[A major milestone for felix86 was reached this month!]]></summary></entry><entry><title type="html">felix86 25.07</title><link href="https://felix86.com/AAA-Games/" rel="alternate" type="text/html" title="felix86 25.07" /><published>2025-07-01T00:00:00+00:00</published><updated>2025-07-01T00:00:00+00:00</updated><id>https://felix86.com/AAA-Games</id><content type="html" xml:base="https://felix86.com/AAA-Games/"><![CDATA[<p>Another month has passed, and felix86 development continues!</p>

<h1 id="aaa-games">AAA games</h1>

<p>Some AAA games are now playable! Titles include Linux games like Witcher 2, but also Windows games like Witcher 3 and Crysis!</p>

<p><img src="/images/crysis.png" width="800" style="display: block; margin: 10px auto" />
<em>Non-trivial Windows games would not work with the previous version</em></p>

<p>These just started working, so we haven’t had enough time to profile or optimize them yet. So don’t expect great performance yet!</p>

<p><img src="/images/witcher3.png" width="800" style="display: block; margin: 10px auto" />
<em>Witcher 3 on Milk-V Jupiter</em></p>

<h2 id="user-friendliness">User-friendliness</h2>

<p>The recommended way of using felix86 now is through the emulated bash.</p>

<p>To do so, install felix86 using the installation script (or register it with binfmt_misc using <code class="language-plaintext highlighter-rouge">sudo -E felix86 -b</code>), then run the x86-64 version of bash:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>FELIX86_QUIET=1 felix86 /bin/bash
</code></pre></div></div>

<p>This should start an instance of the emulated bash that is now inside the rootfs. You can use it to run your games.</p>

<p>To learn more, <a href="https://github.com/OFFTKP/felix86/blob/master/docs/how-to-use.md">check out the updated usage guide</a>.</p>

<h2 id="appimage-support">AppImage support</h2>

<p>AppImages had a permission issue - basically they would try to run a privileged app like <code class="language-plaintext highlighter-rouge">mount</code>, but when this was run through the emulator, the emulator would not inherit the privileges.</p>

<p>Since we now install felix86 in binfmt_misc, the emulator should inherit the setuid bit of these apps, and AppImage executables should now work!</p>

<h2 id="better-filesystem-emulation">Better filesystem emulation</h2>

<div class="alert alert-danger" role="alert"><i class="fa fa-exclamation-circle"></i> <b>Warning:</b> The following has since been reworked - we now do our own path resolution</div>

<p>We try to restrict all guest file accesses inside the rootfs. In recent PRs we improved a few things.</p>

<p><code class="language-plaintext highlighter-rouge">/dev</code>, <code class="language-plaintext highlighter-rouge">/proc</code>, <code class="language-plaintext highlighter-rouge">/sys</code>, <code class="language-plaintext highlighter-rouge">/run</code> and <code class="language-plaintext highlighter-rouge">/tmp</code> were symlinks. The reason for this is that mounting requires superuser privileges. Using namespaces doesn’t cut it, because then we wouldn’t be able to run apps like <code class="language-plaintext highlighter-rouge">sudo</code> or <code class="language-plaintext highlighter-rouge">mount</code>. Additionally, using namespaces to get mount privileges means that other instances of felix86 won’t see the same mounts as that instance.</p>

<p>But symlinks have their own set of problems. For example, <code class="language-plaintext highlighter-rouge">readlink(/proc)</code> would be equal to <code class="language-plaintext highlighter-rouge">/proc</code>, causing recursion in the guest. This was previously solved by detecting accesses to <code class="language-plaintext highlighter-rouge">/proc</code> (and the others) and redirecting them to the host <code class="language-plaintext highlighter-rouge">/proc</code>. All in all, it was hacky and horrible.</p>

<p>We now use a separate executable, called <code class="language-plaintext highlighter-rouge">felix86-mounter</code>. The installation script will install it and give it special privileges. Then, felix86 will call <code class="language-plaintext highlighter-rouge">felix86-mounter</code> with the rootfs path as the argument. <code class="language-plaintext highlighter-rouge">felix86-mounter</code> will then do a few things:</p>
<ul>
  <li>Create a temporary dir called <code class="language-plaintext highlighter-rouge">/run/felix86/mounts/mount-XXXXXX</code> (where the X’s are random using <code class="language-plaintext highlighter-rouge">mkdtemp</code>)</li>
  <li>Create a file that has the actual rootfs path in <code class="language-plaintext highlighter-rouge">/run/felix86/mounts/mount-XXXXXX/path.txt</code></li>
  <li>Create a directory <code class="language-plaintext highlighter-rouge">/run/felix86/mounts/mount-XXXXXX/rootfs</code></li>
  <li>Mount the rootfs there (this is good because now the rootfs itself is a mount, just like <code class="language-plaintext highlighter-rouge">/</code> is)</li>
  <li>Mount the necessary dirs (<code class="language-plaintext highlighter-rouge">/dev</code>, <code class="language-plaintext highlighter-rouge">/proc</code>, …) inside the rootfs</li>
  <li>Return the <code class="language-plaintext highlighter-rouge">/run/felix86/mounts/mount-XXXXXX/rootfs</code> path</li>
</ul>

<p>Further runs of <code class="language-plaintext highlighter-rouge">felix86-mounter</code> will detect that a mounted path already exists using the <code class="language-plaintext highlighter-rouge">path.txt</code> file.</p>
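<p>The bookkeeping part of these steps can be sketched in a few lines of C++. This is only an illustration under stated assumptions, not the actual <code class="language-plaintext highlighter-rouge">felix86-mounter</code> source: the helper names <code class="language-plaintext highlighter-rouge">find_existing_mount</code> and <code class="language-plaintext highlighter-rouge">prepare_mount_dir</code> are made up, the base directory is a parameter so the sketch can run without privileges, and the privileged mount calls are left out entirely.</p>

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <optional>
#include <stdexcept>
#include <string>
#include <stdlib.h>  // mkdtemp (POSIX)

namespace fs = std::filesystem;

// Hypothetical helper: return the existing "<base>/mount-XXXXXX/rootfs" whose
// path.txt records `rootfs`, if there is one.
std::optional<fs::path> find_existing_mount(const fs::path& base, const std::string& rootfs) {
    if (!fs::is_directory(base)) return std::nullopt;
    for (const auto& entry : fs::directory_iterator(base)) {
        std::ifstream in(entry.path() / "path.txt");
        std::string recorded;
        if (in && std::getline(in, recorded) && recorded == rootfs) {
            return entry.path() / "rootfs";
        }
    }
    return std::nullopt;
}

// Hypothetical helper: reuse an existing mount directory or create a new one.
// In the post, `base` would be /run/felix86/mounts.
fs::path prepare_mount_dir(const fs::path& base, const std::string& rootfs) {
    assert(!rootfs.empty());
    if (auto existing = find_existing_mount(base, rootfs)) return *existing;

    fs::create_directories(base);
    std::string tmpl = (base / "mount-XXXXXX").string();
    if (mkdtemp(tmpl.data()) == nullptr) {  // fills in the X's in place
        throw std::runtime_error("mkdtemp failed");
    }
    const fs::path dir = tmpl;

    // Record which rootfs this directory belongs to.
    std::ofstream path_file(dir / "path.txt");
    path_file << rootfs << '\n';
    path_file.close();

    fs::create_directory(dir / "rootfs");
    // The privileged part (mounting the rootfs, then /dev, /proc, ... inside it)
    // would happen here and is deliberately left out of this sketch.
    return dir / "rootfs";
}
```

<p>Calling <code class="language-plaintext highlighter-rouge">prepare_mount_dir</code> twice with the same rootfs path returns the same directory, which is the behaviour the <code class="language-plaintext highlighter-rouge">path.txt</code> lookup is there to provide.</p>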

<p>This is better than doing it in the felix86 executable:</p>
<ul>
  <li>If we did this inside the felix86 executable itself, it would need special permissions, which is definitely scarier if we don’t properly drop them after mounting</li>
  <li>The felix86 executable changes frequently and would need to be given permissions again each time, while felix86-mounter will not</li>
  <li>Having to run sudo every time you want to run felix86 would be clunky</li>
  <li>Having to run sudo on anything you attach to felix86 (gdb, strace, …) would also suck</li>
</ul>

<h2 id="symlink-resolving">Symlink resolving</h2>

<p>We had a bug in our previous implementation, which happened to go undetected:</p>

<p>The symlink syscall would resolve the old path to be inside the rootfs. For example, if you ran <code class="language-plaintext highlighter-rouge">symlink(/oldpath, /linktarget)</code>, <code class="language-plaintext highlighter-rouge">/linktarget</code> would now resolve to <code class="language-plaintext highlighter-rouge">/path/to/rootfs/oldpath</code>.</p>

<p>This would work with <em>most applications</em>. However, as we start to worry about programs calling <code class="language-plaintext highlighter-rouge">chroot</code> and changing our root, this is no longer a viable solution.</p>

<p>Symlink shouldn’t resolve the old path to be inside the rootfs. But if we don’t do that, then we need to resolve paths ourselves using the <a href="https://man7.org/linux/man-pages/man7/path_resolution.7.html">same algorithm the kernel uses</a>, while placing results in the rootfs. This is difficult to implement while avoiding potential bugs.</p>

<p>Luckily, the <code class="language-plaintext highlighter-rouge">openat2</code> syscall has a flag called <code class="language-plaintext highlighter-rouge">RESOLVE_IN_ROOT</code>. This is perfect for our use case, as it opens a file while resolving symlinks as if a given directory file descriptor were the root – in our case, the rootfs file descriptor. We can then call <code class="language-plaintext highlighter-rouge">readlink</code> on <code class="language-plaintext highlighter-rouge">/proc/self/fd/&lt;fd&gt;</code> to get the absolute resolved path that is inside the rootfs.</p>

<p>This also helps with containerization, as it makes it harder for programs to escape the rootfs. felix86 isn’t a security application and should only be used with trusted programs, but some programs would do stuff like <code class="language-plaintext highlighter-rouge">fd = open("/")</code>, <code class="language-plaintext highlighter-rouge">openat(fd, "..")</code> and depend on the file descriptors being the same, which we previously had to hack around.</p>
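<p>The <code class="language-plaintext highlighter-rouge">RESOLVE_IN_ROOT</code> lookup described above can be sketched roughly as follows. This is a minimal illustration, not the felix86 implementation: the function name <code class="language-plaintext highlighter-rouge">resolve_in_root</code> is hypothetical, opening with <code class="language-plaintext highlighter-rouge">O_PATH</code> is our assumption, and it needs Linux 5.6 or newer plus kernel headers that ship <code class="language-plaintext highlighter-rouge">linux/openat2.h</code>. The syscall is invoked through <code class="language-plaintext highlighter-rouge">syscall()</code>.</p>

```cpp
#include <cassert>
#include <string>
#include <fcntl.h>
#include <limits.h>
#include <linux/openat2.h>  // struct open_how, RESOLVE_IN_ROOT
#include <sys/syscall.h>
#include <unistd.h>

// Hypothetical helper: resolve `guest_path` as if `rootfs_fd` were "/", and
// return the resulting host path. Returns an empty string on any failure
// (for example when the kernel or sandbox does not allow openat2).
std::string resolve_in_root(int rootfs_fd, const char* guest_path) {
    assert(guest_path != nullptr);

    struct open_how how = {};
    how.flags = O_PATH | O_CLOEXEC;  // we only need a handle, not file access
    how.resolve = RESOLVE_IN_ROOT;   // absolute paths and symlinks stay under rootfs_fd

    long fd = syscall(SYS_openat2, rootfs_fd, guest_path, &how, sizeof(how));
    if (fd < 0) return {};

    // /proc/self/fd/<fd> is a symlink to the host path the kernel ended up at.
    const std::string link = "/proc/self/fd/" + std::to_string(fd);
    char buf[PATH_MAX];
    ssize_t len = readlink(link.c_str(), buf, sizeof(buf));
    close(static_cast<int>(fd));

    if (len < 0) return {};
    return std::string(buf, static_cast<size_t>(len));
}
```

<p>With this, a guest symlink pointing at an absolute path such as <code class="language-plaintext highlighter-rouge">/</code> resolves to the rootfs directory rather than the host root, which is the property that makes the manual path-walking code unnecessary.</p>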

<hr />

<p>Thanks for reading the July post!</p>

<p>It was an exciting month for felix86 progress. See you in a month!</p>]]></content><author><name></name></author><summary type="html"><![CDATA[Another month has passed, and felix86 development continues!]]></summary></entry><entry><title type="html">felix86 25.06</title><link href="https://felix86.com/Native-OpenGL/" rel="alternate" type="text/html" title="felix86 25.06" /><published>2025-05-31T00:00:00+00:00</published><updated>2025-05-31T00:00:00+00:00</updated><id>https://felix86.com/Native-OpenGL</id><content type="html" xml:base="https://felix86.com/Native-OpenGL/"><![CDATA[<p>Welcome to another felix86 blog post. This month we got Unity and 32-bit games working and made many performance improvements!</p>

<h2 id="easier-installation">Easier installation</h2>
<p>On Ubuntu/Debian/Bianbu and possibly other distros, you can now install felix86 and a rootfs with a single command:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl -s https://raw.githubusercontent.com/OFFTKP/felix86/master/src/felix86/tools/install.sh -o /tmp/felix86_install.sh &amp;&amp; bash /tmp/felix86_install.sh &amp;&amp; rm /tmp/felix86_install.sh
</code></pre></div></div>

<p>So go ahead and try it out!</p>

<h2 id="unity-ue3-and-32-bit-games">Unity, UE3, and 32-bit games</h2>
<p>There were many bug fixes during the month of May! One of the features we miss the most in RISC-V for x86-64 emulation is 128-bit compare exchange (CAS). This is necessary for proper emulation of the <code class="language-plaintext highlighter-rouge">cmpxchg16b</code> instruction. Unfortunately, no hardware currently supports the <code class="language-plaintext highlighter-rouge">Zacas</code> extension, which adds 128-bit CAS. The <code class="language-plaintext highlighter-rouge">cmpxchg16b</code> instruction is used extensively by Unity games, and improper emulation leads to frequent crashes and bugs.</p>

<p>Until we get <code class="language-plaintext highlighter-rouge">Zacas</code> in hardware, we have implemented a half-solution of using a global lock on every <code class="language-plaintext highlighter-rouge">cmpxchg16b</code> instruction. This makes it atomic with respect to other <code class="language-plaintext highlighter-rouge">cmpxchg16b</code> instructions, but not with other memory reads/writes. Turns out this is good enough to fix Unity games!</p>
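<p>To make the idea concrete, here is a small C++ model of the global-lock approach. It is a sketch of the semantics only: felix86 emits RISC-V code rather than calling a function like this, and the names <code class="language-plaintext highlighter-rouge">U128</code>, <code class="language-plaintext highlighter-rouge">g_cmpxchg16b_lock</code> and <code class="language-plaintext highlighter-rouge">emulate_cmpxchg16b</code> are invented for the example.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// 128-bit value as two halves; on x86-64 the instruction requires its memory
// operand to be 16-byte aligned.
struct alignas(16) U128 {
    uint64_t lo;
    uint64_t hi;
};

// One lock shared by every emulated cmpxchg16b (hypothetical name).
static std::mutex g_cmpxchg16b_lock;

// Models cmpxchg16b: `expected` plays the role of RDX:RAX, `desired` of RCX:RBX.
// Returns the resulting ZF: true if the swap happened.
bool emulate_cmpxchg16b(U128* mem, U128& expected, U128 desired) {
    assert(reinterpret_cast<uintptr_t>(mem) % 16 == 0);

    // Only other cmpxchg16b emulations take this lock, so plain loads and
    // stores to the same memory can still race with us.
    std::lock_guard<std::mutex> guard(g_cmpxchg16b_lock);
    if (mem->lo == expected.lo && mem->hi == expected.hi) {
        *mem = desired;
        return true;
    }
    expected = *mem;  // on failure the current value is loaded into RDX:RAX
    return false;
}
```

<p>The comment inside the function is the whole caveat: two threads doing <code class="language-plaintext highlighter-rouge">cmpxchg16b</code> on the same address serialize correctly, but a thread writing one half with an ordinary store is not excluded.</p>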

<p><img src="/images/cuphead.png" width="800" style="display: block; margin: 10px auto" />
<em>Cuphead, a Unity game, now playable on RISC-V with felix86</em></p>

<p>Other bug fixes have made at least one Unreal Engine 3 game playable, and maybe more that I haven’t tested.</p>

<p><img src="/images/outlast.webp" width="800" style="display: block; margin: 10px auto" />
<em>Not many Unreal Engine 3 games for Linux out there</em></p>

<p>Additionally, after even more bug fixes, 32-bit ioctl marshalling support for radeon, and new x87 instructions, some 32-bit games are playable!</p>

<p><img src="/images/portal2.png" width="800" style="display: block; margin: 10px auto" />
<em>Now you’re thinking with portals</em></p>

<h2 id="thunking-glx">Thunking GLX</h2>

<p>OpenGL thunk libraries now work! Games can load an overlay <code class="language-plaintext highlighter-rouge">libGL.so</code>/<code class="language-plaintext highlighter-rouge">libGLX.so</code> which will make calls into host code.</p>

<p>If you installed felix86 using the script, you can enable thunking using <code class="language-plaintext highlighter-rouge">export FELIX86_ENABLED_THUNKS=glx</code>. It is currently disabled by default, as it needs more testing.</p>

<p>If you compiled felix86 yourself, make sure to set <code class="language-plaintext highlighter-rouge">FELIX86_THUNKS</code> to <code class="language-plaintext highlighter-rouge">/path/to/felix86/source/src/felix86/hle/guest_libs</code>.</p>

<h3 id="technical-ramble">Technical ramble</h3>
<p>One benefit of userspace emulators is that they are allowed to employ some tricks for performance gains. For example, many libraries have a stable API across architectures. This means that if a program calls <code class="language-plaintext highlighter-rouge">glDrawArrays</code>, the host <code class="language-plaintext highlighter-rouge">glDrawArrays</code> <em>could</em> be used instead.</p>

<h4 id="quick-brief-on-opengl-inner-workings">Quick brief on OpenGL inner workings</h4>
<p>Things aren’t so simple, however. OpenGL works with a global context. The functions are loaded through an implementation-defined function that returns function pointers. On Linux, this will be either <code class="language-plaintext highlighter-rouge">glXGetProcAddress</code> or <code class="language-plaintext highlighter-rouge">eglGetProcAddress</code>. This means that thunking OpenGL essentially requires thunking GLX and/or EGL.</p>

<p>Thunking GLX brings other problems. GLX interacts with X11 – it uses structs from X11 like the <code class="language-plaintext highlighter-rouge">Display</code> struct. The inner layout of this struct <em>is</em> architecture dependent. This not only means that you’d need to thunk <code class="language-plaintext highlighter-rouge">libX11</code>, but that it’s also difficult to do so as some structs differ in their layout.</p>

<h4 id="fex-to-the-rescue">FEX to the rescue</h4>
<p>Thankfully there’s a way to get around needing to thunk <code class="language-plaintext highlighter-rouge">libX11</code>. You can create a host-side <code class="language-plaintext highlighter-rouge">Display</code> object and convert between the host and guest <code class="language-plaintext highlighter-rouge">Display</code> pointers when calling GLX functions that use <code class="language-plaintext highlighter-rouge">Display</code>. This idea is from <a href="https://github.com/FEX-Emu/FEX/blob/main/ThunkLibs/include/common/X11Manager.h">FEX-Emu</a> and it saves us a lot of frustration.</p>
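<p>The pointer conversion can be pictured as a pair of lookup tables. The sketch below is our own simplification and does not mirror the FEX or felix86 code: <code class="language-plaintext highlighter-rouge">GuestDisplay</code>, <code class="language-plaintext highlighter-rouge">HostDisplay</code> and <code class="language-plaintext highlighter-rouge">DisplayMap</code> are hypothetical names, and the callback that opens the host-side connection is passed in so that the example does not depend on X11. In a real thunk that callback would presumably open a host connection to the same X server, for instance with <code class="language-plaintext highlighter-rouge">XOpenDisplay</code>.</p>

```cpp
#include <cassert>
#include <functional>
#include <unordered_map>
#include <utility>

// Opaque stand-ins for the guest (x86-64 layout) and host (RISC-V layout)
// Display structs. They are never dereferenced here.
struct GuestDisplay;
struct HostDisplay;

// Hypothetical class: remembers which host Display belongs to which guest Display.
class DisplayMap {
public:
    using Opener = std::function<HostDisplay*(GuestDisplay*)>;

    explicit DisplayMap(Opener open_host) : open_host_(std::move(open_host)) {}

    // Used on the way in: a guest calls a GLX function taking a Display*.
    HostDisplay* to_host(GuestDisplay* guest) {
        auto it = guest_to_host_.find(guest);
        if (it != guest_to_host_.end()) return it->second;

        HostDisplay* host = open_host_(guest);  // first use: create the host object
        assert(host != nullptr);
        guest_to_host_.emplace(guest, host);
        host_to_guest_.emplace(host, guest);
        return host;
    }

    // Used on the way out: a host function returned a Display* to the guest.
    GuestDisplay* to_guest(HostDisplay* host) const {
        auto it = host_to_guest_.find(host);
        return it == host_to_guest_.end() ? nullptr : it->second;
    }

private:
    Opener open_host_;
    std::unordered_map<GuestDisplay*, HostDisplay*> guest_to_host_;
    std::unordered_map<HostDisplay*, GuestDisplay*> host_to_guest_;
};
```

<p>Because only pointers are translated, neither side ever has to understand the other side’s struct layout, which is what lets us skip thunking <code class="language-plaintext highlighter-rouge">libX11</code>.</p>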

<h2 id="thunking-luajit">Thunking LuaJIT</h2>

<p>Some games use Lua scripts. Other games like Balatro are entirely made in Lua. A popular speedy implementation of Lua is LuaJIT, which recompiles Lua code to machine code. However, LuaJIT does some things which invalidate our recompiled RISC-V code. This would previously cause significant slowdown in Balatro.</p>

<p>We now thunk the <code class="language-plaintext highlighter-rouge">libluajit.so</code> library, running the RISC-V LuaJIT in place of the x86-64 LuaJIT! This has caused significant performance improvements in Balatro and potentially other LÖVE engine games.</p>

<p><img src="/images/balatro.png" width="800" style="display: block; margin: 10px auto" />
<em>Balatro went from 2 FPS to 30 FPS and 100% GPU usage, so a GPU upgrade could push it even further!</em></p>

<p>LuaJIT can call C code from quite a few places, which means our host code needs to wrap the callbacks. You can see this happening in <code class="language-plaintext highlighter-rouge">thunks_luajit.cpp</code>. Whenever a C function is registered to be callable from Lua, we wrap it in code that will invoke the recompiler and fix up the arguments from the x86-64 ABI to the RISC-V ABI.</p>

<p>To enable this, you need to compile <a href="https://github.com/plctlab/LuaJIT/commits/riscv64-v2.1-branch/">LuaJIT for RISC-V</a>, install it, and run <code class="language-plaintext highlighter-rouge">export FELIX86_ENABLED_THUNKS=lua</code>.</p>

<p>You can also thunk multiple libraries like we did for the Balatro image: <code class="language-plaintext highlighter-rouge">export FELIX86_ENABLED_THUNKS=glx,lua</code></p>

<hr />

<p>Thanks for reading the June post!</p>

<p><a href="https://felix86.com/contrib/">Contributions are welcome!</a> Anybody interested in RISC-V or x86 can help!</p>

<p>If you find this project interesting, please <a href="https://github.com/OFFTKP/felix86">star the repository</a>.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[Welcome to another felix86 blog post. This month we got Unity and 32-bit games working and made many performance improvements!]]></summary></entry></feed>