This article has been republished on Monkey and Crow.
TSort
is an interesting part of the ruby standard library that performs topological sorting. Topological sorting is used in package management, analyzing source code, and evaluating prerequisites. RubyGems uses TSort
to determine what order to install gems in when there are multiple dependencies. Let’s take a look at TSort
in action.
Here is an example of a JobRunner
which will take a bunch of tasks with prerequisites, and then tell you which order they should be performed in:
require 'tsort' class JobRunner include TSort Job = Struct.new(:name, :dependencies) def initialize() @jobs = Hash.new{|h,k| h[k] = []} end alias_method :execute, :tsort def add(name, dependencies=[]) @jobs[name] = dependencies end def tsort_each_node(&block) @jobs.each_key(&block) end def tsort_each_child(node, &block) @jobs[node].each(&block) end end if __FILE__ == $0 runner = JobRunner.new runner.add('breakfast', ['serve']) runner.add('serve', ['cook']) runner.add('cook', ['buy eggs','buy bacon']) puts runner.execute end
Running this file will show that you need to buy the eggs and bacon before you can cook and then serve breakfast.
buy eggs
buy bacon
cook
serve
breakfast
Hardly an achievement of complex reasoning, but as the number of prerequisites grow, this becomes vastly more useful.
TSort
Let’s take a look at the source (qw tsort
if you have qwandry). The first thing to notice is that TSort is a module instead of a class. This means that instead of extending it, or calling it directly we will include it in another class. Notice that in the JobRunner
above we called include TSort
at the very beginning. This means our class will now include TSort
’s functionality, but in order for it to work we need to implement a few methods. From the TSort rdoc:
TSort requires two methods to interpret an object as a graph, tsort_each_node and tsort_each_child. * tsort_each_node is used to iterate for all nodes over a graph. * tsort_each_child is used to iterate for child nodes of a given node.
Our implementation of tsort_each_node
iterates over each job, while tsort_each_child
will iterate over each prerequisite for the job. These methods allow TSort
to provide two useful methods. The first is strongly_connected_components
, which will return each node or sets of nodes forming a circular dependency. The second is tsort
which we used above to return the nodes in a sorted order so that each of their prerequisites are satisfied. Lets take a look at what each of these does:
def tsort result = [] tsort_each {|element| result << element} result end
This is simple enough, it just collects all the results from tsort_each
and returns them. Following this thread we see that tsort_each
then iterates over the results of each_strongly_connected_component
, so lets take a closer look at that, and perhaps we can figure out how TSort
works.
def each_strongly_connected_component # :yields: nodes id_map = {} stack = [] tsort_each_node {|node| unless id_map.include? node each_strongly_connected_component_from(node, id_map, stack) {|c| yield c ...
From this snippet of code we can see that TSort
is going to iterate over each of the nodes (jobs in our case), while doing this it will keep track of the position each node has in the stack with the id_map
hash. Next let’s see what each_strongly_connected_component_from
does with the node.
def each_strongly_connected_component_from(node, id_map={}, stack=[]) minimum_id = node_id = id_map[node] = id_map.size stack_length = stack.length stack << node ...
We can see that id_map
and stack
are passed around to keep track of what has been seen already. Jumping to the end of the method, we see that when we finish, TSort
is going to yield back an array of nodes:
if node_id == minimum_id component = stack.slice!(stack_length .. -1) component.each {|n| id_map[n] = nil} yield component end
We know from our example above that this eventually returns all the nodes in their proper order, and tsort_each
expects the arrays to have a length of 1 so we can assume that if everything goes correctly stack.slice!(stack_length .. -1)
should return an array with a length of 1. After it returns this array, it clears the entries from the id_map
. We can deduce that TSort
sorts the nodes by pushing items onto the stack, and then returning the top of it when there are no remaining prerequisites for the node.
Now let’s look at the main part of this method:
... tsort_each_child(node) {|child| if id_map.include? child child_id = id_map[child] minimum_id = child_id if child_id && child_id < minimum_id else sub_minimum_id = each_strongly_connected_component_from(child, id_map, stack) {|c| yield c } minimum_id = sub_minimum_id if sub_minimum_id < minimum_id end } ...
For each child we see that if we have already evaluated the node. If its index in the stack is less than our current minimum_id
we treat it as the new minimum. Otherwise we recursively call each_strongly_connected_component_from
for this node, which will push the child onto the stack and push its prerequisites onto the stack as well. Once there are no more nodes left to push on, the loop exits and the node will get popped off. If there is a cycle (nodes that depend on each other), all the nodes in the cycle will be returned. If this is starting to look familiar to you, this essentially amounts to a depth first search of all the nodes.
Recap
Our exploration of TSort
has revealed a useful library, and shown us one way to perform a depth first search on a data structure that may include cycles, which is pretty neat. Curious about other applications of TSort
? The strongly_connected_components
that TSort
returns also happens to be useful for detecting tightly coupled methods and classes which depend on each other which is useful when doing refactoring and static analysis of source code. This is just one of the handy things we can do with ruby’s standard library.
Pingback: Getting to Know the Ruby Standard Library – TSort | Ruby Here Blog
Thanks for the article! I hadn’t heard of this library before.
A quick note: In Ruby 1.9.2, the code as written chokes with
can't add a new key into hash during iteration (RuntimeError)
. Fortunately, it’s an easy fix. The body oftsort_each_child
should look like this:@jobs[node].each(&block) if @jobs.has_key?(node)
Hmm, interesting, that must have changed since ruby 1.9.1. Thanks for the comment and fix.
In 1.9.2 you can pass a default value when instantiating a Hash, this should keep the original sense: @jobs = Hash.new []
I have to say, your “Getting to Know the Ruby Standard Library” series is excellently written and I look forward to each post. Great job!
Another superb post, thanks! Your series is a real public service.
Thanks for the great article.
What is the purpose of this line?
Job = Struct.new(:name, :dependencies)
The purpose of that line is to make me look like an idiot. Initially in the example I was going to store the jobs as objects, and a struct seemed useful, but I never did.
Pingback: 098 RR DRb with Davy Stevenson