Garbage Collection

Current GC implementations ammeliorate large amounts of time spent doing GC, by doing it infrequently, ie, after a certain amount of memory has been allocated. This is wasteful of memory, which slows down performance. To my understanding, most JVM’s at the moment have a nursary where objects are “born” into. When the nursary is full, then the objects are garbage collected and moved into another region of memory. This means that the applications are going to use at least the same amount of memory as the nursary. If every application did this then we’d run out of memory long before we would normally.

Now I think garbage collection is a great idea. Programmers suck at memory management, at least I know I do. But I don’t think that the current methods, and approaches are the way forward.

Before garbage collection we had… what?

Before garbage collection is used, and even today in lower level languages, programmers explicitly free(3) memory themselves. In general this mostly works well, occasionally people have huge memory leaks in their program, but programmers are skillful at creating and using tools such as valgrind to help find and fix these problems. It’s not perfect, but it is possible.

Most objects die young

Most objects are destroyed just after they are created. It’s very rare for an object to last a long amount of time. There is a lot of established literature about this.

Most objecs cannot form cycles

Most objects can’t form cycles so long as type safety is enforced. An object such as a string cannot point to another object, and thus can be refcounted. In fact, std::string in C++ is often implemented using refcounting, so only a pointer is passed around from place to place.

I hypothosise that most objects that do form cycles are ones that implement collections of some description. Doubly linked lists have a prev pointer, and tree’s often have a parent pointer. If a good standard collection class library is provided that is smart enough to take care of it’s own memory management outside that of the compiler or even inside the compiler, but providing hints such as “weak”, then potentially circular objects become very rare.

The gripping conclusion

So, therefore I suggest that the best way for doing garbage collection is at compile time. Java and .NET can’t do this, as at compile time they are only compiling to a virtual machine which is untrusted. However, it should be possible for a compiler to recognise:

  • If an object has a lifetime of a block of code and no pointers leak out of that code, then the object can be easily allocated on the stack.
  • If an object must be allocated on the heap, often based on the PC, the refcount can be implicit. In particular, if a block of code doesn’t “leak” any references to the object, then the refcount need not be updated. If it does need to be updated, then it needs to be updated by the sum of the leaked references
  • If an object can form a circular loop, then provide a warning to the user that this can happen and fall back on a full garbage collection system, perhaps with hints from the programmer. (eg, I would treat all blocks that can potentially form loops as “long lived” and bypass a nursary)

The memory manager should also be able to do a much better job as it knows more about what the program is doing. For instance, if you are malloc(3)ing a fixed sized object, then flag it to be freelisted later, as it’s more likely that you are going to allocate objects of the same size again later

One Response to “Garbage Collection”

  1. Remosi's Rants » Why an interpreted language isn’t for us. Says:

    rd - Discusses why trying to be compatible with other people will never win you the war. Garbage collection - Dynamic Garbage collection vs static garbage col […]