Generational Garbage Collector


Robbie Fan (Fan Decheng) 2012-01-24

Yesterday evening before I fell asleep, I thought about the generational Garbage Collector (GC). How can it be implemented? Previously I didn’t think of a realistic solution. Yesterday’s thought has brought this one step further. This article is not a design. It’s just a discussion.

For a simple garbage collector, being naturally the stop-the-world type, to collect all garbage, the approach can be “mark-and-sweep”. To do mark and sweep, essentially it is traversing the reference graph (the graph connected by object references), marking every referenced object, and collecting all unmarked objects. The traversal can be breadth-first or depth-first. The mark bit must be present in every object. To do depth-first traversal, the current traversal stack of references must be maintained. To do breadth-first traversal, the list of current “surface” objects must be maintainted.

For a generational garbage collector, generations are used to speed up garbage collection. An example is the Microsoft .NET Framework. There can be three generations. Generation 0 is the newest generation. Every garbage collection cycle includes Generation 0. Generation 1 contains objects that are not reclaimed in previous Generation 0 cycles, and are collected less frequently. Generation 2 objects are those not reclaimed in the Generation 1 cycle, and are collected even less frequently.

Below I’ll discuss two methods. One method is my incomplete idea in yesterday evening. Another method is the .NET Framework approach in my understanding.

My yesterday evening’s idea about the collection cycle implementation, as only an imagination, is this: after the code runs, we record addresses of all newly-allocated objects. During running, we keep a trace of all outgoing references from generations 1 and 2 objects and roots (roots include call stack references and static references; keeping all references can be expensive if not properly implemented, but it speeds up the collection cycles afterwards). Before we do generation 0 collection, we have below four collections: three collections of references to objects in generations 0, 1 and 2; a collection of outgoing references from generations 1, 2 and roots (it may be divided into three sub-collections respectively for generations 1, 2 and roots). When objects are moved from one generation to an older generation, we mean references to those objects are copied to the older generation, and the references to those objects are deleted from the newer generation. We use the following wording interchangeably when the below-mentioned reference references the object: move an object from generation 0 to generation 1 / move a reference from generation 0 to generation 1.

When we do generation 0 collection, we walk through all newly-allocated objects since last collection (or since the start of the program), and check if their references exist in outgoing references in generations 1, 2 and roots (this is implied in the following discussion wherever appropriate). We repeat the following step until no references in generation 0 are in the outgoing references of older generations: we put the references into generation 1, update the collection of outgoing references in older generations (we know all outgoing references in all objects added to generation 1, so we just adjust the outgoing reference table), and scan the remaining generation 0 references again.

([Robbie 2012-05-30] The above-mentioned algorithm would have very bad performance when there is a deep singly-linked list from generation 1 to generation 0 outwards. One way to optimize is: in the first pass, use the above algorithm. During that pass, record the “surface area”, that is, the generation 0 objects that will be added to generation 1. Because after the first pass, only those can reference other generation 0 nodes. Then we do breadth-first or depth-first traversal, and we’ll find all generation 0 nodes.)

When no references in generation 0 are in the outgoing references of older generations, we execute all finalizers on all objects in generation 0 (each object has a “finalizer executed” bit, so that the finalizer won’t be executed twice–this is exactly the behavior I saw in C# programs). As the finalizer may cause changes in references, the outgoing references of older generations may increase or decrease, and we need to re-scan the generation 0 references. There are two possible cases of changes in references: 1. a new outgoing reference of older generations to a previous generation 0 object; 2. an outgoing reference of older generations disappears, possibly leaving a previously-referenced object unreferenced. In case 1, we need to move those new references from the generation 0 references into generation 1. In case 2, we can ignore the case in favor of efficiency assuming this happens rarely–otherwise we need to run the finalizer on the unreferenced object and do the previous step again. The object will be collected later in a generation 1 collection cycle.

When there are enough cycles run for generation 0 collection, we can run a generation 1 collection. To do this, we walk through all references in generations 0 and 1, and see if they exist in generation 2. It’s a similar process to generation 0 collection except that it is made to generations 0 and 1 together, and references are moved from generations 0 and 1 to generation 2.

([Robbie 2012-05-30] We may wish to prevent generation 0 objects from directly becoming generation 2. During the traversal, we may wish to add previous generation 1 objects to generation 2, and add previous generation 0 objects to generation 1. It is similar for a generation 2 collection.)

For a generation 2 collection, the check is made against the outgoing references from the roots, and objects referenced are merged into generation 2–apparently they can’t be moved into the roots.

For how to trace changes in outgoing references between the runs, one simple way is: don’t trace and compute before the collection is run. Since all outgoing references are always one level deeper from references in generations 1, 2 and roots, we can compute it fast. However this still defeats the purpose of generational GC, because we don’t want to deal with all those data when we do a collection of generation 0. So a better way is to still trace them.

To trace objects via software, following my yesterday’s idea, is to maintain a collection of all references, and record an address (of the referenced object) and a reference count (of the referenced object). Whenever an assignment is made to a reference in a generation 1 object or a generation 0 object becomes a generation 1 object, we modify the outgoing reference table for generation 1. Similar is done for outgoing reference tables for generation 2 and roots.

To trace them with hardware support, Microsoft .NET Framework is an example. See:

From my understanding, there is some differences between how the .NET Framework garbage collector is implemented and what’s in my yesterday’s idea. Firstly, for the .NET GC, one generation of objects are grouped together into a memory pool. For example, all generation 0 objects are grouped together. This group, as I see, may not be strict. The key point is not that all objects are in a single range in the memory space. Instead, it is to not let objects in different generations share the same memory page (a memory page is a unit used in paged virtual memory implementation; on x86/x64 architectures, it is typically 4KB). By using a range mapping table, memory ranges of all objects in a generation can be identified. Every page is mapped to a bit in the “card table” of the generation.

Whenever a reference assignment is being made, a piece of .NET Framework code is run to identify the generation and update the corresponding dirty bit of the page in the “card table”. If hardware support exists, for example on Windows running on x86 machines, it is possible to use the “GetWriteWatch” API to check which pages are modified between the last collection and the current collection, so the piece of .NET Framework code needn’t be run upon each assignment. The JITter automatically decides whether to use software support or hardware support.

The generation 0 collection is performed unlike in my yesterday’s idea. Firstly we clear the “marked” bits in generation 0 objects. Then we do a mark-and-sweep from objects intersecting with changed pages in generations 1, 2 and roots. There are two cases when outgoing references in such an object are checked: 1. It references an object in generations 1 and 2; 2. It references an object in generation 0. For situation one, the trace is over. We only go forward in situation two. Then, marked objects should be moved into generation 1. After the move, dirty page flags are reset and finalizers are called (just like what I described in my idea, earlier in this article). There may come new dirty pages, and we need to mark and sweep again in this case, but only from the new dirty pages.

Note that objects in generation 0 are moved into generation 1 memory area before we call the finalizers because the finalizers may change references, including those in the new generation 1 objects.

For a generation 1 collection, it can be performend without requiring a immediately preceding generation 0 collection. Objects intersecting with the dirty pages in generation 2 and roots are used to do a mark-and-sweep. Marked objects are moved into generation 2.

For a generation 2 collection, all objects in the roots are used to do a mark-and-sweep. Marked objects are merged into generation 2.

For all the above talk, we may use aids from various data structures to speed up the operation. B-Trees and hash tables always come handy.

Compared to my idea, the .NET Framework approach has the advantage of fast sweep. It has the disadvantage of having to move objects between memory pages. It can be solved by using a “large memory pool” that stores only large objects (those objects can only be collected in a generation 2 collection cycle). As my idea requires a lot of table look up and updating, it may be slower than the .NET Framework approach. On the other hand, the .NET Framework looks up which references exist in dirty pages, which may also involve a table look up for the first object in the page.