Debugging a Segfault in an LLVM JIT Engine

Prologue

Debugging a segmentation fault is always a nightmare: the program that triggered it is killed immediately by a signal and may not leave any useful information behind. It gets even worse when you are debugging code generated by a JIT engine, because that code is produced at runtime and usually carries no call stack information (such as an eh_frame section or DWARF data), which stops gdb or lldb from showing source mappings or a stack trace.

Unfortunately, in my experience with the LLVM ecosystem, the JIT engine is far more buggy than the ahead-of-time backend, and it is very common to see a segmentation fault when running seemingly correct LLVM IR, even when the same code runs perfectly through the ahead-of-time backend. What’s even more painful is that the problem may occur only on specific platforms and work perfectly on others. Sometimes it’s really hard to tell whether the problem is in your code or in the LLVM JIT engine.

Since the birth of the Pivot Lang JIT engine, I have been struggling with this kind of problem. In most cases, unfortunately, I either failed to find the root cause or could not reproduce the problem on my local machine.

Of course, I’m not writing this article to complain about the LLVM JIT engine. I’m writing because, despite so many failures and disappointments, I finally managed to solve a segfault. I would like to share my experience and hope it helps when you encounter similar problems.

The Segfault

The segfault happened not on my computer, but in a Linux GitHub Actions CI job. The same code works on all platforms when it is AOT-compiled, while the JIT-compiled code fails every time on Linux and sometimes on Windows, but never on macOS.

Pre-Analyzing the Problem

The log of the failed CI run barely showed where the exception occurred, and I didn’t have a Linux machine available when the CI failed, so I started by analyzing what else I knew. The exception has one very distinctive property: it never happens on the Mac runner. So why is the Mac special? The first thing that came to mind was architecture: both the Linux and Windows runners are x86_64, but the Mac runner is arm64.

Every CI runner executes the same IR, so the JIT-produced instructions should be similar across platforms. That leaves two possibilities:

The x86_64 JIT engine has a bug and produces wrong instructions.
The instructions produced on x86_64 have stricter requirements than their arm64 counterparts.

Narrowing Down the Problem

After I got home and launched WSL2, I reproduced the problem on my local machine. I bisected the code and built a minimal reproducer like the following:

example

Note that I didn’t yet know on which line the exception happened. If I added more println calls or removed any of the lines, the exception might disappear, which is pretty weird.

I immediately noticed that the i128 type might be the key: it is not natively supported by the hardware, which makes it special compared to the other primitive types. So I replaced i128 with i64, and the exception disappeared.

Then I tried to use lldb to find out the exact instruction that caused the exception.

exception

Oh, great, lldb cannot load any instructions before the exception and the call stack is empty, just as expected.

In the disassembly view, the exception clearly happened at a movaps instruction, which is a SIMD (SSE) instruction. SIMD instructions operate on multiple bytes in parallel, so if anything in the code above involves SIMD, it is most likely the i128 values, since the other data types are not large enough to benefit.

In line 5 of the disassembly view, there’s a cmpq instruction, comparing a variable with 1, which matches the logic in source code line 22. This gives me the exact exception point in source code.

Analyzing the Root Cause

Now the problem is clear: the JIT engine is emitting the SIMD instruction movaps for i128 variables, and that instruction is causing a segmentation fault on x86_64. The next step is to find out why the instruction failed.

After doing some research, I found that the movaps instruction requires its memory operand to be aligned to 16 bytes. In our case, the destination address is rsp+0x18, which is 0x00007fffffffc40 + 0x18 = 0x7FFFFFFFC58. Its last hex digit is 8 rather than 0, so it is not aligned to 16 bytes. It never happens on Mac because aarch64 generally tolerates unaligned accesses, whereas movaps on x86_64 faults on a misaligned address.
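The alignment arithmetic can be checked with a few lines of Python. This is only a sketch: the helper name `misalignment` is made up for illustration, and the register value and offset are the ones quoted in the text.

```python
# movaps requires its memory operand to sit on a 16-byte boundary.
MOVAPS_ALIGNMENT = 16


def misalignment(address: int, alignment: int = MOVAPS_ALIGNMENT) -> int:
    """Return how many bytes `address` lies past the previous aligned boundary.

    Hypothetical helper for illustration; 0 means the address is aligned.
    """
    return address % alignment


rsp = 0x00007fffffffc40  # stack pointer value as quoted in the text
offset = 0x18            # displacement used by the faulting movaps
target = rsp + offset    # address the instruction tried to access

print(hex(target))
print("rsp misalignment:", misalignment(rsp))
print("target misalignment:", misalignment(target))
```

The stack pointer itself is a multiple of 16, so the misalignment comes entirely from the 0x18 offset chosen for the stack slot.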

Fix

The fix is simple: just make sure the alloca instruction generated for every i128 variable is aligned to 16 bytes, like:

%ptr = alloca i128, align 16
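To illustrate what the `align 16` guarantee means for the stack slot, here is a small Python sketch. It does not reproduce LLVM’s actual frame layout, which the article does not show. The `align_up` helper and the resulting offset are assumptions used only to show that a slot offset rounded up to a multiple of 16 yields an address movaps can accept.

```python
def align_up(offset: int, alignment: int) -> int:
    """Round `offset` up to the next multiple of `alignment`.

    Hypothetical helper; `alignment` must be a power of two.
    """
    return (offset + alignment - 1) & ~(alignment - 1)


rsp = 0x00007fffffffc40  # 16-byte-aligned stack pointer quoted in the text
old_offset = 0x18        # slot offset that caused the fault

# Assumed offset for illustration; the real one is chosen by the backend.
new_offset = align_up(old_offset, 16)

print(hex(new_offset))
print("aligned:", (rsp + new_offset) % 16 == 0)
```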

Conclusion

In primary school, my mom sent me to learn swimming. I practiced freestyle for two hours every weekend.

Whenever I was having trouble swimming, my coach would give me the same diagnosis: I was trying to do everything at once. I thought I had learned the theory well, and I tried to swim just like the athletes, but I couldn’t. My body didn’t know how to swim yet. Instead of trying to swim immediately, I needed to practice kicking with my legs, stroking with my arms, and turning to breathe. After all that practice, I could swim.

I never got good at swimming, but I did take these lessons to heart in my programming. When dealing with problems that seem impossible to solve, it’s important to break them down into smaller, more manageable pieces. Go step by step and make sure you’re moving forward, even if it’s just a little bit.

Li Boxiu

A Developer