August 2014

Reference Counting, Finalization and Resurrection

I've been enamoured of reference counting since devouring Smalltalk-80: The Language and its Implementation in 1987, and actively using it since reading Coplien's Advanced C++ in 1993.

For compatibility with the Object Pascal memory model, which ameliorated the lack of physical memory management, C++ on the Mac generally used handle-based heap objects. This had serious structural issues because they just weren't firmly located in memory.

Ultimately regular pointer-based heap objects won out even on Mac and reference counting became usable. I encapsulated the necessary activities in the smart pointer ZRef and the object base class ZCounted, which stabilized in the late 1990s after some struggles with the advent of preemptive multithreading.


All of which is to justify why I haven't simply embraced the C++11 standard library's smart_ptr — by implementing reference counting for myself I think I stumbled over every possible mistake, but also found a really useful mechanism I haven't encountered elsewhere. In the literature reference counting is considered to have two fundamental problems — reference cycles and object resurrection. In my experience a cycle (ultimately) indicates a broken data model.

However, inadvertent resurrection is a significant problem.


The issue is that we may want to take additional action when an entity's count transitions from one to zero, but if that action itself needs to treat the entity as counted, then the count will necessarily increment from zero, must ultimately transition down again, and we're into recursive execution of finalization.

Another situation is when we don't want simply to release the entity when it's no longer in general circulation, but want to change its treatment, perhaps by placing it in a cache. So we actually still have a physical reference to the entity, but managed with some different mechanism. This kind of changeover is very susceptible to thread issues. We could acquire a lock every time we manipulate any reference count, but that utterly kills performance and creates a pervasive and non-obvious source of deadlocks.


The solution is in the statement — we know to take action when the reference count changes from one to zero. What if we don't change the count in that case? Remember we're tracking the number of extant physical references. Seeing the one to zero transition tells us there's no other reference to the object than the one in our hands at that moment. So any further reference originates from the one we're considering finalizing. Rather than the typical atomic-decrement-and-test, releasing a reference uses a get, test and CAS sequence. The finalization code is now free to pass the entity to other code and no (legal) operation can again take the count from one to zero. Ultimately execution returns to the finalization. If the count has returned to (or remained) one, then we release the entity, otherwise do a simple atomic decrement and exit.

Projects by Category

Recent Projects

Project Archives