GC And Async

We’ve talked a lot about our Immix GC before. Although it’s absolutely a state-of-the-art GC algorithm, some of its features are not so friendly to compilers.

One of the most annoying, in my opinion, is evacuation. Evacuation moves objects around to make the memory layout more compact. This is supposed to improve memory locality, and thus performance. However, moving objects means the GC needs to know every pointer that points to an evacuated object and update it accordingly. That knowledge must be carefully maintained by the compiler, which is not an easy job.
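To make the cost of evacuation concrete, here is a minimal sketch in Rust. It is a toy model, not our Immix implementation: the `Heap` type, the `evacuate` and `update_pointers` functions, and the use of array indices as addresses are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// A toy heap (hypothetical): an object's "address" is its index in `slots`.
struct Heap {
    slots: Vec<Option<&'static str>>,
}

impl Heap {
    /// Compact all live objects to the front of the heap and return a
    /// forwarding table that maps each old address to its new address.
    fn evacuate(&mut self) -> HashMap<usize, usize> {
        let mut forwarding = HashMap::new();
        let mut compacted = Vec::new();
        for (old_addr, slot) in self.slots.iter().enumerate() {
            if let Some(obj) = slot {
                forwarding.insert(old_addr, compacted.len());
                compacted.push(Some(*obj));
            }
        }
        self.slots = compacted;
        forwarding
    }
}

/// Rewrite every pointer the collector knows about.
/// A pointer that is not in `known` keeps its stale address.
fn update_pointers(known: &mut [usize], forwarding: &HashMap<usize, usize>) {
    for ptr in known.iter_mut() {
        if let Some(new_addr) = forwarding.get(ptr) {
            *ptr = *new_addr;
        }
    }
}
```

In this model, a pointer passed to `update_pointers` still finds its object after `evacuate`, while a copy of the same address held somewhere the collector cannot see now points at the wrong slot or past the end of the heap.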

Our async implementation uses the well-known library libuv as its backend. libuv is a C library, so it is not aware of our GC, and vice versa. As you may already know, during collection our GC walks the stack from the sp register using a binary stackmap generated by the compiler. The stackmap records the size of each function frame and the offsets of all pointers within it. However, the stackmap is generated at compile time, which means it has no frame information for FFI functions. In most cases this doesn’t matter: FFI calls are mostly leaf functions, so we can just ignore them during GC. But with libuv, we sometimes need to pass in a callback function that libuv will call later. When the callback is called, the stack looks like this:

----
caller frame
----
libuv frame
----
callback frame

If the callback frame triggers a GC, the GC cannot unwind through the libuv frame because it has no stackmap for it. The GC therefore misses the pointers held in the caller frame and every frame beyond it, which leads to undefined behavior.
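The failure can be sketched with a toy stack walker in Rust. This is a simplified model under stated assumptions, not our real stackmap format: the stack is a flat array of words listed innermost frame first, the first word of each frame is a made-up function id used as the stackmap key, and `FrameInfo` and `collect_roots` are hypothetical names.

```rust
use std::collections::HashMap;

/// What the compiler records for one function in this toy model.
struct FrameInfo {
    /// Number of words the frame occupies on the stack.
    frame_size: usize,
    /// Offsets (in words, from the start of the frame) of pointer slots.
    pointer_offsets: Vec<usize>,
}

/// Walk the stack from the innermost frame outward and collect the roots.
/// The walk has to stop at the first frame that has no stackmap entry,
/// because without the frame size the next frame cannot be located.
fn collect_roots(stack: &[usize], stackmap: &HashMap<usize, FrameInfo>) -> Vec<usize> {
    let mut roots = Vec::new();
    let mut sp = 0;
    while sp < stack.len() {
        let info = match stackmap.get(&stack[sp]) {
            Some(info) => info,
            None => break, // foreign frame: nothing more can be found
        };
        for off in &info.pointer_offsets {
            roots.push(stack[sp + *off]);
        }
        sp += info.frame_size;
    }
    roots
}
```

With a callback frame, then a frame whose id is missing from the stackmap, then a caller frame, `collect_roots` returns only the callback's pointers. The caller's pointers are never reported, which is the situation described above.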

Solution: Pinning and Keepalive

To solve this problem, we introduce two new concepts: pinning and keepalive.

Pinning is a mechanism to tell the GC that a certain object must not be moved. We can pin an object by calling the pin function on it.

Keepalive is a mechanism to tell the GC that a certain object must not be collected, even if it’s not reachable from any root. We can keep an object alive by calling the keepalive function on it.

With these two mechanisms, we can make sure the GC behaves correctly by following one rule:

Pin and keepalive all objects that are directly referenced by the libuv data.
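As a rough illustration of what the two mechanisms guarantee, here is a toy collector in Rust. It is only a sketch under assumptions: `ToyGc` and its methods are hypothetical names, objects are plain integer ids, and nothing here mirrors the real `gc::` functions beyond the general idea.

```rust
use std::collections::HashSet;

/// A toy collector (hypothetical) that tracks objects by integer id.
#[derive(Default)]
struct ToyGc {
    live: HashSet<usize>,       // currently allocated objects
    pinned: HashSet<usize>,     // objects that must not be moved
    kept_alive: HashSet<usize>, // objects treated as extra roots
}

impl ToyGc {
    fn alloc(&mut self, id: usize) {
        self.live.insert(id);
    }

    fn pin(&mut self, id: usize) {
        self.pinned.insert(id);
    }

    fn keep_alive(&mut self, id: usize) {
        self.kept_alive.insert(id);
    }

    fn remove_keep_alive(&mut self, id: usize) {
        self.kept_alive.remove(&id);
    }

    /// One collection cycle. Objects that are neither in `roots` nor kept
    /// alive are freed. Returns the survivors the collector may move,
    /// sorted by id; pinned survivors are never in that list.
    fn collect(&mut self, roots: &[usize]) -> Vec<usize> {
        let reachable: HashSet<usize> = roots
            .iter()
            .copied()
            .chain(self.kept_alive.iter().copied())
            .collect();
        self.live.retain(|id| reachable.contains(id));
        let mut movable: Vec<usize> = self
            .live
            .iter()
            .copied()
            .filter(|id| !self.pinned.contains(id))
            .collect();
        movable.sort();
        movable
    }
}
```

In this model, an object that is both pinned and kept alive survives a cycle in which no root mentions it, and it is never offered for moving. Once `remove_keep_alive` is called, the next cycle frees it like any other unreachable object.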

In libuv, we usually pass custom data to a callback by associating it with the corresponding libuv handle. In that case the data is referenced directly by the libuv handle, so we should pin it and keep it alive.

/// # set_data_for_handle
/// 
/// This function is used to set the data of the handle.
/// The function will remove the previous data from the GC,
/// tag the new data as pinned, and keep it alive.
/// 
/// It is intended to be used with the `get_data_for_handle` function,
/// which will undo the pinning and keep-alive when the data is retrieved.
/// 
/// ## Safety
/// 
/// This function should be used in pair with the `get_data_for_handle` function.
/// Both functions should be called in the same thread.
pub fn set_data_for_handle<H|T> (handle:*H, data:*T) void {
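    // Unregister whatever data was previously attached to this handle, if any.
    let v = unsafe_cast<()>(handle);
    let re;re = uv_handle_get_data(v);
    gc::rm_alive_pinned(re);
    // Pin the new data so evacuation cannot move it, then keep it alive
    // so it survives while only libuv holds a reference to it.
    gc::pin(data);
    gc::keep_alive_pinned(data);
    // Store the raw pointer in the handle so the callback can read it back.
    uv_handle_set_data(unsafe_cast<()>(handle), unsafe_cast<()>(data));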
    return;
}

Also, any libuv handle that is allocated by our GC should be pinned before it’s passed to libuv FFI functions.

pub fn new_uv_async_t() *uv_async_t {
    let re = gc::malloc_pinned(uv_handle_size(UV_ASYNC) as i64);
    return unsafe_cast<uv_async_t>(re);
}

Some may ask: if the libuv handle itself is allocated and managed by the GC, is it still necessary to pin and keepalive the data associated with it? The answer is yes, because we allocate the handle as an atomic object, which means the GC has no knowledge of the handle’s internal structure. It cannot see the data pointer stored inside the handle, so it can neither treat the data as reachable through the handle nor update that pointer if the data is moved.
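The effect of an atomic allocation on tracing can be sketched in Rust as follows. This is a toy marking pass under assumptions, not our collector: `Object`, its `atomic` flag, and `mark` are hypothetical names, and objects are identified by integer ids.

```rust
use std::collections::{HashMap, HashSet};

/// A toy heap object (hypothetical). `fields` holds the ids it points to.
struct Object {
    /// Atomic objects are treated as opaque bytes and are never scanned.
    atomic: bool,
    fields: Vec<usize>,
}

/// Mark everything reachable from `roots`, skipping the contents of
/// atomic objects because the collector cannot interpret them.
fn mark(heap: &HashMap<usize, Object>, roots: &[usize]) -> HashSet<usize> {
    let mut marked = HashSet::new();
    let mut worklist: Vec<usize> = roots.to_vec();
    while let Some(id) = worklist.pop() {
        if !marked.insert(id) {
            continue; // already visited
        }
        if let Some(obj) = heap.get(&id) {
            if !obj.atomic {
                worklist.extend(obj.fields.iter().copied());
            }
        }
    }
    marked
}
```

If object 1 stands for a handle allocated as atomic and object 2 for its data, marking from the handle reaches the handle but not the data. In this model that gap is what an explicit keepalive fills, and because the hidden pointer can never be rewritten, the data has to be pinned as well.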

Li Boxiu

A Developer
