Getting to Know the Ruby Standard Library – WeakRef

This article has been republished on Monkey and Crow.

Once again, we shall dive into the depths of Ruby’s standard library and fish WeakRef out. First we’ll talk about garbage collection, then try WeakRef out, and then investigate how it works.

In the course of a normal script’s execution, tons of objects are created, used, and then thrown away. When an object is not referenced by any other objects, the garbage collector can safely toss it. Occasionally we may want to keep track of an object, but not insist on it staying in memory. I can think of two cases when you want to have a reference to an object, but don’t mind if it gets collected. The first is if you are maintaining a cache, but its contents are not essential:

  require 'weakref'

  class TransientCache < Hash
    class AmbivalentRef < WeakRef
      def __getobj__
        super rescue nil
      end
    end
  
    def []= key, object
      super(key, AmbivalentRef.new(object))
    end
  
    def [] key
      ref = super(key)
      return nil if ref.nil?   # key was never stored
      self.delete(key) if !ref.weakref_alive?
      ref
    end
  
  end

This cache is just a normal Hash, but any time we put an object into it, we store a weak reference to that object. Ignore AmbivalentRef for the moment; I’ll get to it when we look at how WeakRef is implemented. Grab the source for TransientCache, and let’s try it out:

  c = TransientCache.new

  x = "important"
  c[:a] = "Hello"
  c[:b] = Object.new
  c[:x] = x

  # Let's see what is in the cache
  c[:a].inspect #=> "Hello"
  c[:b].inspect #=> #<Object:0x000001009fc780>
  c[:x].inspect #=> "important"

  ObjectSpace.garbage_collect
  
  # Now what's left?
  c[:a].inspect #=> nil
  c[:b].inspect #=> #<Object:0x000001009fc780>
  c[:x].inspect #=> "important"

We manually forced the garbage collector to run, and as you can see, it removed the value stored in :a. We expect the value in :x to stay, since the variable x still references it. What about :b, though? A garbage collection pass is not guaranteed to reclaim every unreachable object; in this case it looks like it didn’t feel like tossing the value stored in :b just yet.
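Outside of a cache, the same behaviour can be explored with a bare WeakRef. The following is a minimal sketch: describe is a hypothetical helper name, not part of the library, and it only relies on WeakRef#weakref_alive? and on method calls being forwarded to the wrapped object.

```ruby
require 'weakref'

# Hypothetical helper: report on a weak reference without raising
# when its target has already been collected.
def describe(ref)
  if ref.weakref_alive?
    "alive: #{ref.inspect}"   # inspect is forwarded to the target
  else
    "collected"
  end
end

text = "important"            # a strong reference keeps the String alive
ref  = WeakRef.new(text)

puts describe(ref)            # the target is alive while `text` is in scope
puts ref.length               # other methods are forwarded as well
```

Whether a target with no remaining strong references is reported as collected after a given garbage collection run depends on the collector, so the sketch makes no promise about when "collected" appears.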

Weak references are also useful when implementing listeners, observers, or the Pub/Sub pattern. You may wish to read Dan Schultz’s summary; even though it pertains to ActionScript, the same principles hold true for Ruby.
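As a rough illustration of the listener case, here is a sketch of a publisher that holds its subscribers weakly. WeakPublisher, PrintListener, and the on_event method are all made-up names for this example; only WeakRef and WeakRef::RefError come from the standard library.

```ruby
require 'weakref'

# Hypothetical publisher that holds its listeners weakly, so subscribing
# does not by itself keep a listener in memory.
class WeakPublisher
  def initialize
    @listeners = []
  end

  def subscribe(listener)
    @listeners << WeakRef.new(listener)
    self
  end

  # Calls on_event (an assumed listener method) on every live listener and
  # forgets the ones that have been collected. Returns the number notified.
  def publish(event)
    notified = 0
    @listeners.delete_if do |ref|
      begin
        ref.on_event(event)
        notified += 1
        false                     # still alive, keep it
      rescue WeakRef::RefError
        true                      # target was collected, drop the stale ref
      end
    end
    notified
  end
end

# Hypothetical listener used only for the demonstration.
class PrintListener
  def on_event(event)
    puts "received #{event.inspect}"
  end
end

listener  = PrintListener.new     # the caller owns the strong reference
publisher = WeakPublisher.new.subscribe(listener)
publisher.publish(:saved)
```

The design choice here is that the subscriber's owner decides its lifetime: once the owner lets go of the listener, the publisher stops notifying it at some point after the next collection, without an explicit unsubscribe.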

Now that you know a bit about what WeakRef is good for, let's take a look at how it works. If you have Qwandry installed, follow along with qw weakref.

class WeakRef < Delegator
  ...
  def initialize(orig)
    @__id = orig.object_id
    ObjectSpace.define_finalizer orig, @@final
    ObjectSpace.define_finalizer self, @@final
    ...

We see that WeakRef inherits from Delegator, which sets up a pattern for one object to pass method calls on to another object without the invoker needing to know. A Delegator takes an object to wrap as a parameter, which we see is orig in this case. WeakRef keeps the id of the original object so that it can identify it later. The next two calls register finalizers, which you may not have come across in Ruby before. Finalizers are called when the garbage collector frees an object. In this case @@final will be called when either orig or the WeakRef is collected. We’ll come back to the finalizers after we see what else WeakRef is keeping track of:
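If finalizers are new to you, a small standalone sketch may help. TempResource, RELEASED, and finalizer_for are hypothetical names; the only library call used is ObjectSpace.define_finalizer, which passes the collected object's id to the proc.

```ruby
# Finalizer procs receive the object_id of the object that was collected.
# Building the proc in a class method keeps it from capturing the instance,
# which would otherwise prevent the instance from ever being collected.
class TempResource
  RELEASED = []   # [name, id] pairs recorded by finalizers that have run

  def self.finalizer_for(name)
    proc { |id| RELEASED << [name, id] }
  end

  def initialize(name)
    @name = name
    ObjectSpace.define_finalizer(self, self.class.finalizer_for(name))
  end
end

TempResource.new("scratch")   # unreachable as soon as it is created
GC.start                      # a request; this object may or may not be freed now
puts "finalized so far: #{TempResource::RELEASED.inspect}"
```

The output is deliberately not stated: the finalizer runs whenever the collector actually frees the object, which may be during GC.start or only when the process exits.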

    @@mutex.synchronize {
      @@id_map[@__id] = [] unless @@id_map[@__id]
    }
    @@id_map[@__id].push self.object_id
    @@id_rev_map[self.object_id] = @__id
    super
  end

A mutex is used to make sure that multiple threads do not modify @@id_map at the same time. @@id_map maps the id of each original object to an array of the ids of the WeakRef instances that wrap it. Meanwhile, @@id_rev_map maps each WeakRef’s id back to the original object’s id. You might be wondering what all this record keeping is for by now. If WeakRef just contained a reference to the wrapped object directly, then the WeakRef would prevent the original object from getting collected. However, @@id_map and @@id_rev_map only store references as integers that can be used to look up the original. This indirection is at the core of how weak references are implemented in Ruby. Since we are thinking about this indirection, let’s see what happens when you access a WeakRef:
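To make the shape of the two maps concrete, here is a simplified model that uses plain Integers in place of real object ids. IdBookkeeping and register are hypothetical names; only the roles of id_map and id_rev_map mirror the excerpt, and the mutex is left out for brevity.

```ruby
# A simplified, single-threaded model of WeakRef's record keeping.
class IdBookkeeping
  attr_reader :id_map, :id_rev_map

  def initialize
    @id_map     = {}   # original id => [ids of the refs pointing at it]
    @id_rev_map = {}   # ref id      => original id
  end

  # Record that the reference `ref_id` wraps the object `orig_id`.
  def register(ref_id, orig_id)
    (@id_map[orig_id] ||= []) << ref_id
    @id_rev_map[ref_id] = orig_id
  end
end

book = IdBookkeeping.new
book.register(1, 100)   # two references to object 100
book.register(2, 100)
book.register(3, 200)   # one reference to object 200

p book.id_map       # {100=>[1, 2], 200=>[3]} (exact formatting varies by Ruby version)
p book.id_rev_map   # {1=>100, 2=>100, 3=>200}
```

Nothing in either hash is the object itself, which is the point: integers do not keep anything alive.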

  def __getobj__
    unless @@id_rev_map[self.object_id] == @__id
      Kernel::raise RefError, "Invalid Reference - probably recycled", Kernel::caller(2)
    end
    begin
      ObjectSpace._id2ref(@__id)
    ...

__getobj__ is called by Delegator whenever it wants to obtain the wrapped object. We can see here that WeakRef double-checks its internal mapping of ids with @@id_rev_map[self.object_id] == @__id. If the resulting id isn’t what the WeakRef expected, it raises a RefError; otherwise it uses ObjectSpace._id2ref to fetch the original object. You can try this yourself in irb:

"Cats".object_id                    #=> 2152560480 
ObjectSpace._id2ref 2152560480      #=> "Cats"

So now we know how WeakRef fetches its original object back for you without keeping a direct reference that would prevent it from being garbage collected.
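The same "store a key, resolve it on every call" idea can be sketched with a small Delegator subclass of your own. RegistryRef and its REGISTRY hash are hypothetical stand-ins for WeakRef and the object id lookup; the only library pieces are Delegator and its __getobj__ / __setobj__ hooks.

```ruby
require 'delegate'

# Hypothetical Delegator that stores only a key and resolves its target
# through a registry on every call, similar in spirit to resolving an id.
class RegistryRef < Delegator
  REGISTRY = {}

  def initialize(key)
    @key = key
    super(key)   # Delegator#initialize hands its argument to __setobj__
  end

  # Called by Delegator each time a method needs to be forwarded.
  def __getobj__
    REGISTRY.fetch(@key)   # raises KeyError once the entry is gone
  end

  def __setobj__(_obj)
    # Intentionally empty: the target is looked up, never stored.
  end
end

RegistryRef::REGISTRY[:greeting] = "Hello"
ref = RegistryRef.new(:greeting)
puts ref.upcase                          # resolved via the registry, then forwarded

RegistryRef::REGISTRY.delete(:greeting)
# ref.upcase would now raise KeyError, much as a stale WeakRef raises RefError.
```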

Now let’s take a look at that finalizer referenced in initialize. There are two possible situations: either an original object was destroyed, or a WeakRef was destroyed. Let’s look at the first case:

      ...
      rids = @@id_map[id]
      if rids
        for rid in rids
          @@id_rev_map.delete(rid)
        end
        @@id_map.delete(id)
      end
      ...

In this case, the id is in @@id_map, and we get back a list of WeakRef ids that mapped to that original object. Each of those WeakRef ids is removed from @@id_rev_map, and finally the array itself is removed from @@id_map. Now for the second case, where a WeakRef was collected:

      rid = @@id_rev_map[id]
      if rid
        @@id_rev_map.delete(id)
        @@id_map[rid].delete(id)
        @@id_map.delete(rid) if @@id_map[rid].empty?
      end

In this situation, the WeakRef’s entry in @@id_rev_map is removed, along with its id in the original object’s array in @@id_map. If this was the last WeakRef pointing to the original object, then that array is removed as well.
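Both branches can be modelled on plain hashes to see how the maps shrink. This is a sketch under assumptions: finalize_id is a hypothetical function, the hashes are passed in rather than held in class variables, and Integers stand in for object ids.

```ruby
# Hypothetical model of the finalizer's two branches. `id_map` maps an
# original id to its ref ids; `id_rev_map` maps a ref id to its original id.
def finalize_id(id, id_map, id_rev_map)
  # Case 1: id belonged to an original object, so drop every ref entry for it.
  if (rids = id_map.delete(id))
    rids.each { |rid| id_rev_map.delete(rid) }
  end

  # Case 2: id belonged to a reference, so unlink it from its original.
  if (orig_id = id_rev_map.delete(id))
    refs = id_map[orig_id]
    if refs
      refs.delete(id)
      id_map.delete(orig_id) if refs.empty?
    end
  end
end

id_map     = { 100 => [1, 2], 200 => [3] }
id_rev_map = { 1 => 100, 2 => 100, 3 => 200 }

finalize_id(1, id_map, id_rev_map)     # a reference to object 100 was collected
finalize_id(200, id_map, id_rev_map)   # object 200 itself was collected
p id_map                               # only object 100 and ref 2 remain
p id_rev_map
```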

So now that you know the ins and outs of WeakRef, let’s look back at the original example. We defined an AmbivalentRef which just returned nil instead of raising an exception if the original object had been collected. The TransientCache sample didn’t really care whether the original object had been collected; if it has, we just treat it as a cache miss. If you use a WeakRef in your code, you should be aware that an exception will be raised if you try to access a collected object.
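If you would rather not subclass WeakRef, one alternative is to rescue the exception at the point of use. The sketch below assumes a hypothetical helper named deref; WeakRef::RefError is the exception class raised by the library.

```ruby
require 'weakref'

# Hypothetical helper: return the object behind `ref`, or `fallback`
# if the target has already been collected.
def deref(ref, fallback = nil)
  ref.__getobj__
rescue WeakRef::RefError
  fallback
end

name = "cached value"          # strong reference keeps the target alive
ref  = WeakRef.new(name)

value = deref(ref, :miss)      # the String while alive, :miss once collected
puts value.inspect
```

Taking a strong reference once through deref and then working with that local variable also avoids the target disappearing between two consecutive calls on the WeakRef.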

Filed under ruby, stdlib

9 responses to “Getting to Know the Ruby Standard Library – WeakRef”

  1. “I can think of two cases when you want to have a reference to an object, but don’t mind if it gets collected. The first is if you are maintaining a cache, but its contents are not essential”

    And the second?

  2. Adam Sanderson

    Yeah, I wasn’t terribly obvious about that. The second is “Weak references are also useful when implementing listeners, observers, or the Pub/Sub pattern”.

  3. Great article Adam. I can think of a third case for using weak references: when your intention is to write code that can inspect which objects are in use at run time, and can find out when an object is garbage collected.

  4. asotin

    But where is “end of life”?

  5. asotin

    It’s wonderful!

  6. Brian Durand

    A word of warning about WeakRef..

    In Ruby 1.8 the Delegator class is extraordinarily heavy as it redefines all the methods of the target object. Wrapping a String with a Delegator ends up allocating ~2800 objects and using 90K of memory. This can make WeakRef unusable if you need a lot of them as they are very slow to create and can end up using more memory than the referenced objects.

    Ruby 1.9 fixes Delegator, but has a bug in WeakRef which can return a different object than the one originally referenced.

    You can use the ref gem to get around these issues.

    • Adam Sanderson

      Thanks for mentioning that Brian. Just out of curiosity, what is the use case for ref’s Ref::StrongReference?

      • Brian Durand

        By mixing it with weak references, you can have a consistent interface for referencing an object. I don’t anticipate it will get much use.

        The specific use case I had in developing this gem was that I needed to keep track of a list of objects and eventually invoke a cleanup callback on them. In some cases the callback had side effects and needed to be called every time. In other cases, the callback was only restoring the internal state of an object. The strong reference allowed me to treat these cases the same, since I’d retain strong references to the objects with mandatory callbacks and weak references to the objects whose callbacks wouldn’t matter once the objects were no longer referenceable.
