Prologue
Debugging a segmentation fault is always a nightmare: the program that triggers it is killed immediately by a signal and may not leave any useful information behind. It becomes even worse when you are debugging code generated by a JIT engine, because that code is generated at runtime and usually lacks the information needed to reconstruct a call stack (such as an .eh_frame section or DWARF debug data), which stops gdb or lldb from showing a source mapping or a stack trace.
Unfortunately, in the LLVM ecosystem the JIT engine is far buggier than the ahead-of-time backend, and it is very common to see segmentation faults when running seemingly correct LLVM IR, even when the same code runs perfectly through the ahead-of-time backend. What’s even more painful is that a problem may occur only on specific platforms while everything works perfectly on others. Sometimes it’s really hard to tell whether the problem is in your code or in the LLVM JIT engine.
Since the birth of the Pivot Lang JIT engine, I have been struggling with this kind of problem. In most cases, unfortunately, I either failed to find the root cause or could not reproduce the problem on my local machine.
Of course, I’m not writing this article to complain about the LLVM JIT engine. I’m writing it because, despite many failures and disappointments, I finally managed to solve one of these segfaults. I would like to share the experience in the hope that it helps you when you encounter similar problems.
The Segfault
The segfault happened not on my computer, but on a Linux GitHub Actions CI runner. The same code works well on all platforms when it is AOT-compiled, but the JIT-compiled version fails every time on Linux, sometimes on Windows, and never on macOS.
Pre-Analyzing the Problem
The log of the failed CI run barely indicated where the exception occurred, and I didn’t have a Linux machine available at the time, so I started by analyzing the other information. The exception has one very significant attribute: it never happens on the Mac runner. So why is the Mac special? The first thing that came to mind was architecture: both the Linux and Windows runners are x86_64, but the Mac runner is arm64.
Every CI job runs the same IR, so if only the x86_64 runners fail, there are two possibilities:
- The x86_64 JIT backend has a bug and produces wrong instructions.
- The instructions produced on x86_64 have stricter requirements than those produced on arm64.
Narrowing Down the Problem
After I got home, I launched WSL2 and started reproducing the problem on my local machine. I bisected the code and built a minimal reproducer like the following:
Note that I didn’t know on which line the exception happened: if I added more println calls or removed any of the lines, the exception might disappear, which is pretty weird.
I immediately suspected that the i128 type might be the key: it is not supported natively by the hardware (no general-purpose register is 128 bits wide), which sets it apart from the other primitive types. So I replaced i128 with i64, and the exception disappeared.
Then I used lldb to find the exact instruction that caused the exception. Oh, great: lldb could not show any instructions before the faulting one, and the call stack was empty, just as expected.
In the disassembly view, the exception clearly happened at a movaps instruction, which is an SSE (SIMD) instruction. SIMD instructions operate on many bytes in parallel, so if anything in the code above maps to SIMD instructions, it is most likely the i128 values: at 16 bytes they are exactly the width of an XMM register, while the other data types are too small to benefit.
On line 5 of the disassembly view there is a cmpq instruction comparing a variable with 1, which matches the logic on line 22 of the source code. This pinpoints the exact location of the exception in the source.
Analyzing the Root Cause
Now the problem is clear: the JIT engine emits the SIMD instruction movaps to move i128 variables, and that instruction causes a segmentation fault on x86_64. The next step was to find out why the instruction failed.
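After some research, I found that when movaps accesses memory, the address must be aligned to 16 bytes; otherwise the CPU raises a general-protection exception, which Linux delivers to the process as SIGSEGV. In our case, the destination address is rsp+0x18, which is 0x00007fffffffc40 + 0x18 = 0x7fffffffc58. Since rsp itself is 16-byte aligned, adding 0x18 (24) leaves the address 8 bytes past a 16-byte boundary. This never happens on the Mac because movaps is an x86 instruction: on arm64 the JIT emits ordinary load/store instructions, which tolerate unaligned addresses in normal memory. x86_64 also tolerates most unaligned accesses; it is aligned SSE instructions such as movaps that do not.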
Fix
The fix is simple: make sure the alloca instructions generated for i128 variables are aligned to 16 bytes, like this:
%ptr = alloca i128, align 16
Conclusion
In primary school, my mom sent me to swimming lessons. I learned freestyle, two hours every weekend.
Whenever I was having trouble swimming, my coach would give me the same diagnosis: I was trying to do everything at once. I thought I had learned the theory well, and I tried to swim just like the athletes, but I couldn’t. My body didn’t know how to swim yet. Instead of trying to swim right away, I needed to practice kicking with my legs, stroking with my arms, and turning to breathe. After all that practice, I could swim.
I never got good at swimming, but I did take these lessons to heart in my programming. When dealing with problems that seem impossible to solve, it’s important to break them down into smaller, more manageable pieces. Take it step by step and make sure you’re moving forward, even if only a little.