If you have enjoyed any articles on this blog, then come visit me on Monkey and Crow where I am once again actively posting about Ruby, Javascript, and development in general.
A Quick Look at CodeMirror 2
I recently wrote a brief overview of three interesting browser based editors. Marijn Haverbeke, the author of CodeMirror, mentioned that CodeMirror 2 would be much faster, and I have to say I’m impressed. CodeMirror 2 is using virtual rendering to display and format code much more quickly.
This trick is achieved by rendering rows only as they are needed. As rows scroll off the top, they are removed, and new ones are added to the bottom of the scrolling element. This is becoming more common as web applications tackle larger amounts of data. I have also seen this technique used for displaying large tables of data in libraries such as SlickGrid and UkiJS, and we use a variant of it at LiquidPlanner for displaying large and complex project schedules.
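The arithmetic behind this kind of windowing can be sketched in a few lines of Ruby. This is only an illustration, not CodeMirror's actual code: the method name visible_rows and its parameters are made up for this example, and it assumes every row has the same height.

```ruby
# Hypothetical helper: which rows need to be rendered for a given scroll position?
# Assumes fixed-height rows; all names here are invented for illustration.
def visible_rows(scroll_top, viewport_height, row_height, total_rows)
  first = scroll_top / row_height
  last  = (scroll_top + viewport_height - 1) / row_height
  first..[last, total_rows - 1].min
end

visible_rows(30, 100, 20, 1000) # rows 1 through 6
```

Only the rows in that range need to exist in the page; everything else can be represented by empty space above and below.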
CodeMirror 2 also has some interesting demos showing off code completion, search and replace, and parsing embedded languages. If any of this sounds interesting to you, I suggest you skim the manual and watch Marijn’s CodeMirror 2 repository on GitHub. It looks like it’s shaping up to be a great project.
Filed under Uncategorized
3 Browser Based Editors to Watch
In the last couple of weeks, there has been a fair amount of talk about three web based editors. Google has started embedding CodeMirror into Google Project Hosting, IBM is developing Orion, and Mozilla is merging Bespin/SkyWriter into Ace, Cloud9’s editor. Everything from office suites to music players has been migrating online, so it seems likely that these editors will be the next category of apps to live in your browser. Will they replace vi, emacs, or TextMate? Probably not any time soon, but they do offer some interesting opportunities. Let’s take a look at each of them.
CodeMirror
CodeMirror seems to be one of the oldest browser based editors. If you’re interested in how it’s implemented, there is a great article covering CodeMirror’s creation. The focus so far seems to be on making a decent editor you can embed in existing projects. Of the three editors, it seemed to be the slowest, but also the most versatile, well documented, and mature. It is also worth noting that Google has been using it for a while in their KML playground, patch submission for Google Project, and the Google API Playground.
Orion
Little has been said so far about Orion, and it’s obviously in an early stage, but it sounds like some of the Eclipse team is working on it, which is interesting news. I couldn’t find an online demo; however, it’s quite easy to download and run the Orion server. The editor itself is pretty snappy, and already has some unique features such as code annotation. Whereas CodeMirror is aimed at embedding in existing pages, Orion is setting out to be a full IDE which handles loading and saving files as well as managing user accounts. The general interface surrounding the editor is quite rudimentary at this point, but what is working so far looks good.
Ace / Cloud9
As I mentioned above, Mozilla’s Bespin/SkyWriter is being merged into Ace, the editor for the Cloud9 IDE. Bespin was a very interesting project, and I think Ace shows some great potential. The editor itself is quite fast, and seems natural to use. There isn’t much information out there concerning the full IDE, so I tried signing up for the hosted beta, but found that you can already download and install the Cloud9 IDE yourself. The Cloud9 IDE lets you easily serve up any directory and edit it through your browser. The interface is surprisingly polished, and seems like it could really offer up a good experience in the future.
So where does all this leave us? None of these editors have the additional functionality that we expect in our day to day editors, but the promise of being able to edit files quickly within your browser is interesting. These editors could decrease some of the barriers to web applications, perhaps becoming as useful as tools like Firebug. I’m not about to throw out TextMate, but as a web developer, I think these editors will fill a useful niche in the future.
Filed under development, Web Development
The Strange Ruby Splat
This article has been republished on Monkey and Crow.
As of ruby 1.9, you can do some pretty odd things with array destructuring and splatting. Putting the star before an object invokes the splat operator, which has a variety of effects. First we’ll start with some very useful examples, then we will poke around the dark corners of ruby’s arrays and the splat operator.
Method Definitions
You can use a splat in a method definition to gather up any remaining arguments:
def say(what, *people)
  people.each{|person| puts "#{person}: #{what}"}
end

say "Hello!", "Alice", "Bob", "Carl"
# Alice: Hello!
# Bob: Hello!
# Carl: Hello!
In the example above, what
will get the first argument, then *people
will capture however many other arguments you pass into say
. A real world example of this can be found in the definition of Delegator#method_missing
. A common ruby idiom is to pass a hash in as the last argument to a method. Rails defines an array helper Array#extract_options!
to make this idiom easier to handle with variable argument methods, but you can actually get a similar behavior using a splat at the beginning of the argument list:
def arguments_and_opts(*args, opts)
  puts "arguments: #{args} options: #{opts}"
end

arguments_and_opts 1,2,3, :a=>5
# arguments: [1, 2, 3] options: {:a=>5}
Now this example only works if you are guaranteed to pass in a hash at the end, but it illustrates that the splat does not always need to come at the end of a method’s parameters. There are also some other odd uses for the splat in method definitions, for instance this is valid:
def print_pair(a,b,*)
  puts "#{a} and #{b}"
end

print_pair 1,2,3,:cake,7
# 1 and 2
Outside of letting you mimic javascript calling conventions, I’m not sure what the practical use is:
function print_pair(a,b){
  console.log(a + " and " + b);
}

print_pair(1,2,3, "cake", 7);
//=> 1 and 2
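Coming back to the trailing options hash: when the hash is optional, a leading splat alone is not enough, because the last positional argument would be swallowed as the options. One possible way to handle that, sketched here with a made-up helper named split_options (this is not how Rails implements extract_options!), is to check the last argument yourself:

```ruby
# Hypothetical helper: separate positional arguments from an optional trailing Hash.
def split_options(*args)
  opts = args.last.is_a?(Hash) ? args.pop : {}
  [args, opts]
end

split_options 1, 2, 3, :a => 5  #=> [[1, 2, 3], {:a=>5}]
split_options 1, 2, 3           #=> [[1, 2, 3], {}]
```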
Calling Methods
The splat can also be used when calling a method, not just when defining one. If you wanted to use the say
method from above, but you have your list of people in an array, the splat can help you out:
people = ["Rudy", "Sarah", "Thomas"]
say "Howdy!", *people
# Rudy: Howdy!
# Sarah: Howdy!
# Thomas: Howdy!
In this case, the splat converted the array into method arguments. It doesn’t have to be used with methods that take a variable number of arguments though, you can use it in all kinds of other creative ways:
def add(a,b)
  a + b
end

pair = [3,7]
add *pair
# 10
Array Destructuring
First of all, let’s quickly cover a few things you can do without splatting:
a,b = 1,2 # Assign 2 values at once
a,b = b,a # Assign values in parallel
puts "#{a} and #{b}"
# 2 and 1
With the above samples in mind, let’s try some fancier stuff. You can use splats with multiple assignment to extract various elements from a list:
first, *list = [1,2,3,4]          # first= 1, list= [2,3,4]
*list, last = [1,2,3,4]           # list= [1,2,3], last= 4
first, *center, last = [1,2,3,4]  # first= 1, center= [2,3], last=4

# Unquote a String (don't do this)
_, *unquoted, _ = '"quoted"'.split(//)
puts unquoted.join
# quoted
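For a slightly more practical flavor, here is a small sketch of the first/rest idiom in a recursive method, plus nested destructuring with parentheses. The method and variable names are invented for this example.

```ruby
# Hypothetical example: recursive sum using first/rest destructuring.
def sum(list)
  first, *rest = list
  return 0 if first.nil?   # an empty list leaves first as nil
  first + sum(rest)
end

sum [1, 2, 3, 4] #=> 10

# Parentheses destructure nested arrays, in assignments and in block parameters.
name, (x, y) = "origin", [0, 0]
{ "a" => [1, 2] }.each { |key, (low, high)| puts "#{key}: #{low}..#{high}" } # a: 1..2
```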
Array Coercion
If for some reason the previous examples seemed like great ideas to you, you’ll be thrilled to know that the splat can also be used to coerce values into arrays:
a = *"Hello"  #=> ["Hello"]
"Hello".to_a  #=> NoMethodError: undefined method `to_a' for "Hello":String
a = *(1..3)   #=> [1, 2, 3]
a = *[1,2,3]  #=> [1, 2, 3]
This can be a nice way to make sure that a value is always an array, especially since it will handle objects that do not implement to_a.
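A small sketch of how this might be used to accept one value, many values, or nothing at all. The helper name each_item is made up for this example; note the last line, since hashes are turned into key/value pairs, which may not be what you want.

```ruby
# Hypothetical helper: treat one value, many values, or nothing uniformly.
def each_item(value)
  [*value]
end

each_item("Hello")  #=> ["Hello"]
each_item([1, 2])   #=> [1, 2]
each_item(1..3)     #=> [1, 2, 3]
each_item(nil)      #=> []
each_item(:a => 1)  #=> [[:a, 1]]
```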
The splat is a wily beast popping up in odd corners of ruby. I rarely actually use it outside of in method definitions and method calls, but it’s interesting to know that it is available. Have you found any useful idioms that make use of the splat? I would love to hear about them.
Filed under development, ruby
Getting to Know the Ruby Standard Library – Delegator
This article has been republished on Monkey and Crow.
Today we will examine ruby’s implementation of the Proxy pattern, with the Delegator
class. We have already seen an example of it in use with WeakRef
. A Delegator
can be used when you want to intercept calls to some object without the caller needing to know. For example we can use a Delegator
to hide the latency from http calls or other slow code:
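require 'delegate'

class Future < SimpleDelegator
  def initialize(&block)
    @_thread = Thread.start(&block)
  end

  def __getobj__
    # Thread#value waits for the thread to finish and returns the block's result,
    # so this also works when the thread completed before the first access.
    __setobj__(@_thread.value)
    super
  end
end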
The Future
will invoke whatever is passed to it without blocking the rest of your code until you try to access the result. You could use it to issue several http requests and then process them later:
require 'net/http'

# These will each execute immediately
google = Future.new{ Net::HTTP.get_response(URI('http://www.google.com')).body }
yahoo = Future.new{ Net::HTTP.get_response(URI('http://www.yahoo.com')).body }

# These will block until their requests have loaded
puts google
puts yahoo
In this example, google
and yahoo
will both spawn threads, however when we try to print them, the Future
instance will block until the thread is done, and then pass on method calls to the result of our http call. You can grab the code from github and give it a try yourself. Let’s take a look at how Delegator
works. Open up the source, and follow along, if you have Qwandry installed qw delegate
will do the trick.
The file delegate.rb
defines two classes, Delegator
, an abstract class, and SimpleDelegator
which implements the missing methods in Delegator
. Let’s look at the first few lines of Delegator
:
class Delegator
  [:to_s,:inspect,:=~,:!~,:===].each do |m|
    undef_method m
  end
  ...
You will notice that a block of code is being executed as part of the ruby class definition. This is entirely valid, and will be executed in the scope of the Delegator
class. undef_method
is called to remove the default implementations of some common methods defined by Object
. We will see why in a little bit. Next up is the initializer:
def initialize(obj)
  __setobj__(obj)
end
The method __setobj__
might look strange, but it is just a normal method with an obscure name. When a Delegator
is instantiated, the object it is delegating to is stored away with __setobj__
. Next look at how Delegator
implements method_missing
, and you’ll see what all the prep work was for:
def method_missing(m, *args, &block)
  ...
  target = self.__getobj__
  unless target.respond_to?(m)
    super(m, *args, &block)
  else
    target.__send__(m, *args, &block)
  end
  ...
method_missing
is defined on Object
, and will be called any time that you try to call a method that is not defined. This is the key to how Delegator
works, any methods not defined on Delegator
are handled by this method. Before we dive into this, we should be aware of what the arguments to method_missing
are. The m
is the missing method’s name as a symbol. The args
are zero or more arguments that would have been passed to that method. The &block
is the block that the method was called with, or nil
if no block was given.
The first thing method_missing
does here is call __getobj__
. We’ve already seen __setobj__
, it sets the object that Delegator
wraps, so we can reason that __getobj__
gets it. Once the wrapped object has been obtained, it checks to see if that object implements the method we want to call. If not, then we call Object#method_missing
, which is going to raise an exception. If the wrapped object does implement our method, then it passes it on. The methods that were undefined earlier are guaranteed to be passed on to the wrapped object, and the odd looking __getobj__
and __setobj__
are unlikely to collide with any other object’s methods. Ruby’s flexibility really shines in this example, in just a few lines of code, we get a very useful class that can be used to implement advanced behavior.
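To see the forwarding step in isolation, here is a stripped-down proxy written from scratch. It is a sketch only, not the standard library's code: the class name TinyProxy is invented, and it builds on BasicObject so that almost every call falls through to method_missing.

```ruby
# Hypothetical minimal proxy: forwards anything the target understands.
class TinyProxy < BasicObject
  def initialize(target)
    @target = target
  end

  def method_missing(name, *args, &block)
    if @target.respond_to?(name)
      @target.__send__(name, *args, &block)
    else
      super # falls back to the default behavior, which raises NoMethodError
    end
  end
end

proxy = TinyProxy.new("hello")
proxy.upcase #=> "HELLO"
proxy.length #=> 5
```

Delegator does more than this, for example handling equality, but the core of it is the same respond_to? check followed by __send__.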
Now let’s figure out why there are two classes defined here. If you look down at Delegator#__setobj__
and Delegator#__getobj__
you’ll see something interesting:
def __getobj__
  raise NotImplementedError, "need to define `__getobj__'"
end
...
def __setobj__(obj)
  raise NotImplementedError, "need to define `__setobj__'"
end
Neither of these methods is implemented, effectively making Delegator
an abstract class. For convenience, SimpleDelegator
implements them in a reasonable manner. There are a few other special methods defined on Delegator
as well:
def ==(obj)
  return true if obj.equal?(self)
  self.__getobj__ == obj
end
First Delegator
checks for equality against itself, then it checks it against the wrapped object. This way ==
will return true if you pass in either the wrapped object, or the delegate itself. In this same manner you can intercept calls to specific methods, or override Delegator#method_missing
to intercept all calls.
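As a rough illustration of intercepting all calls, the sketch below counts every call that gets forwarded to the wrapped object. The class name CallCounter and its calls reader are made up for this example, and it only deals with plain positional arguments.

```ruby
require 'delegate'

# Hypothetical wrapper: counts every call that is forwarded to the wrapped object.
class CallCounter < SimpleDelegator
  attr_reader :calls

  def initialize(obj)
    super                   # stores obj via __setobj__
    @calls = ::Hash.new(0)  # counting idiom: missing keys start at 0
  end

  def method_missing(name, *args, &block)
    @calls[name] += 1
    super                   # let Delegator forward the call as usual
  end
end

list = CallCounter.new([3, 1, 2])
list.sort  #=> [1, 2, 3]
list.size  #=> 3
list.calls #=> {:sort=>1, :size=>1}
```

Methods that are defined on the delegator itself, such as calls, never reach method_missing, so they are not counted.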
We have learned about a powerful design pattern that is easily implemented in ruby. We also saw a very good use for ruby’s method_missing
. How have you used Delegator
or found similar proxy patterns in ruby?
Getting to Know the Ruby Standard Library – WeakRef
This article has been republished on Monkey and Crow.
Once again, we shall dive into the depths of ruby’s standard library and fish WeakRef
out. First we’ll talk about garbage collection, then try WeakRef
out, and then investigate how it works.
In the course of a normal script’s execution, tons of objects are created, used, and then thrown away. When an object is not referenced by any other objects, the garbage collector can safely toss it. Occasionally we may want to keep track of an object, but not insist on it staying in memory. I can think of two cases when you want to have a reference to an object, but don’t mind if it gets collected. The first is if you are maintaining a cache, but its contents are not essential:
require 'weakref'

class TransientCache < Hash
  class AmbivalentRef < WeakRef
    def __getobj__
      super rescue nil
    end
  end

  def []= key, object
    super(key, AmbivalentRef.new(object))
  end

  def [] key
    ref = super(key)
    # A missing key gives us nil, so only check liveness when there is a reference
    self.delete(key) if ref && !ref.weakref_alive?
    ref
  end
end
This cache is just a normal Hash, but any time we put an object into it, we store a weak reference to that object. For the moment ignore the AmbivalentRef
, I’ll get to that when we look at how WeakRef
is implemented. Grab the source for TransientCache, and let’s try it out:
c = TransientCache.new
x = "important"
c[:a] = "Hello"
c[:b] = Object.new
c[:x] = x

# Let's see what was in the cache
c[:a].inspect #=> "Hello"
c[:b].inspect #=> #<Object:0x000001009fc780>
c[:x].inspect #=> "important"

ObjectSpace.garbage_collect

# Now what's left?
c[:a].inspect #=> nil
c[:b].inspect #=> #<Object:0x000001009fc780>
c[:x].inspect #=> "important"
We manually forced the garbage collector to run, and as you can see it removed the value stored in :a
. We expect the value in :x
to stay since the variable x
is still in scope. What about :b
though? The garbage collector may or may not throw out old objects; in this case it looks like it didn’t feel like tossing the value stored in :b
just yet.
Weak references are also useful when implementing listeners, observers, or the Pub/Sub pattern. You may wish to read Dan Schultz’s summary even though it pertains to ActionScript, the same principles hold true for ruby.
Now that you know a bit about what WeakRef
is good for, let’s take a look at how it works. If you have Qwandry installed, follow along with qw weakref
.
class WeakRef < Delegator
  ...
  def initialize(orig)
    @__id = orig.object_id
    ObjectSpace.define_finalizer orig, @@final
    ObjectSpace.define_finalizer self, @@final
    ...
We see that WeakRef
inherits from Delegator
, which sets up a pattern for one object to pass method calls on to another object without the invoker needing to know. A Delegator
takes an object to wrap as a parameter, which we see is orig
in this case. WeakRef
keeps the id of the original object so that it can identify it later. The next two calls register finalizers, which you may not have come across in ruby before. Finalizers are called when the garbage collector frees an object. In this case @@final
will be called when either orig
or the WeakRef
is collected. We’ll come back to the finalizers after we see what else WeakRef is keeping track of:
    @@mutex.synchronize {
      @@id_map[@__id] = [] unless @@id_map[@__id]
    }
    @@id_map[@__id].push self.object_id
    @@id_rev_map[self.object_id] = @__id
    super
  end
A mutex is used to make sure that multiple threads do not initialize the same entry in @@id_map
at the same time. @@id_map
is going to store an array of references to WeakRef
instances by id. Meanwhile, @@id_rev_map
stores a reference back to the original object’s id. You might be wondering what all this record keeping is for by now. If WeakRef
just contained a reference to the wrapped object directly, then the WeakRef
would prevent the original object from getting collected, however @@id_map
and @@id_rev_map
only store references as integers that can be used to look up the original. This indirection is at the core of how weak references are implemented in ruby. Since we are thinking about this indirection, let’s see what happens when you access a WeakRef
:
def __getobj__
  unless @@id_rev_map[self.object_id] == @__id
    Kernel::raise RefError, "Invalid Reference - probably recycled", Kernel::caller(2)
  end
  begin
    ObjectSpace._id2ref(@__id)
  ...
__getobj__
is called by Delegator
whenever it wants to obtain the wrapped object, we can see here that WeakRef
double checks its internal mapping of ids with @@id_rev_map[self.object_id] == @__id
. If the resulting id isn’t what the WeakRef
expected, it will raise an exception; otherwise it uses ObjectSpace._id2ref
to fetch the original object. You can try this yourself in irb
:
"Cats".object_id #=> 2152560480 ObjectSpace._id2ref 2152560480 #=> "Cats"
So now we know how WeakRef
fetches its original object back for you without keeping a direct reference that would prevent it from being garbage collected.
Now let’s take a look at that finalizer referenced in the initializer. There are two possible situations: either an original object was destroyed, or a WeakRef
was destroyed. Let’s look at the first case:
...
rids = @@id_map[id]
if rids
  for rid in rids
    @@id_rev_map.delete(rid)
  end
  @@id_map.delete(id)
end
...
In this case, the id is in @@id_map
, and we get back a list of WeakRef
ids that mapped to that original object. Each of those reference ids will be removed from the list of WeakRef
instances (@@id_rev_map
), and then finally their array is removed. Now for the second case where a WeakRef
was collected:
rid = @@id_rev_map[id]
if rid
  @@id_rev_map.delete(id)
  @@id_map[rid].delete(id)
  @@id_map.delete(rid) if @@id_map[rid].empty?
end
If this was the situation, then its entry in the list of WeakRef
instances is removed. If this was the last WeakRef
pointing to the original object, then the mapping is removed.
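The bookkeeping is easier to follow when it is pulled out of the garbage collector's hands. The sketch below imitates the two maps with plain integers standing in for object ids; the class RefBook and its method names are invented for this illustration and are not part of weakref.rb.

```ruby
# Hypothetical stand-in for WeakRef's class-level bookkeeping, using plain integers as ids.
class RefBook
  attr_reader :id_map, :id_rev_map

  def initialize
    @id_map     = {}  # original id => ids of the weak references wrapping it
    @id_rev_map = {}  # weak reference id => original id
  end

  def register(orig_id, ref_id)
    (@id_map[orig_id] ||= []) << ref_id
    @id_rev_map[ref_id] = orig_id
  end

  # Mirrors the check at the top of __getobj__.
  def valid?(ref_id, orig_id)
    @id_rev_map[ref_id] == orig_id
  end

  # Plays the role of the finalizer: id may belong to an original or to a reference.
  def finalize(id)
    if (rids = @id_map.delete(id))          # an original object was collected
      rids.each { |rid| @id_rev_map.delete(rid) }
    end
    if (orig_id = @id_rev_map.delete(id))   # a weak reference was collected
      @id_map[orig_id].delete(id)
      @id_map.delete(orig_id) if @id_map[orig_id].empty?
    end
  end
end

book = RefBook.new
book.register(1, 100)  # two references to object 1
book.register(1, 101)
book.finalize(100)     # one reference goes away
book.id_map            #=> {1=>[101]}
book.finalize(1)       # the original goes away
book.valid?(101, 1)    #=> false
```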
So now that you know the ins and outs of WeakRef
, let’s look back at the original example. We defined an AmbivalentRef
which just returned nil instead of raising an exception if the original object was gone. The TransientCache
sample didn’t really care if the original object had been collected; if it has, then we just return a cache miss. If you use a WeakRef
in your code, you should be aware that an exception will be raised if you try to access a collected object.
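If you would rather not subclass, one option is a small helper that turns a dead reference into nil. This is a sketch with a made-up method name, deref; it relies on weakref_alive? and WeakRef::RefError from the standard library.

```ruby
require 'weakref'

# Hypothetical helper: returns the referenced object, or nil once it has been collected.
def deref(ref)
  ref.weakref_alive? ? ref.__getobj__ : nil
rescue WeakRef::RefError
  nil # the object was collected between the check and the access
end

greeting = String.new("Hello")
ref = WeakRef.new(greeting)
deref(ref) #=> "Hello" (greeting is still referenced, so it cannot be collected)
```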
Getting to Know the Ruby Standard Library – Timeout
This article has been republished on Monkey and Crow.
I asked for suggestions about what to cover next, and postmodern suggested the Timeout
library among others. Timeout
lets you run a block of code, and ensure it takes no longer than a specified amount of time. The most common use case is for operations that rely on a third party, for instance net/http
uses it to make sure that your script does not wait forever while trying to connect to a server:
def connect
  ...
  timeout(@open_timeout) { TCPSocket.open(conn_address(), conn_port()) }
  ...
You could also use Timeout
to ensure that processing a file uploaded by a user does not take too long. For instance if you allow people to upload files to your server, you might want to reject any files that take more than 2 seconds to parse:
require 'csv'
require 'timeout'

def read_csv(path)
  begin
    # Timeout.timeout is the module-level form of the timeout method
    Timeout.timeout(2){ CSV.read(path) }
  rescue Timeout::Error => ex
    puts "File '#{path}' took too long to parse."
    return nil
  end
end
Let’s take a look at how it works. Open up the Timeout
library, you can use qw timeout
if you have Qwandry installed. Peek at the timeout
method, it is surprisingly short.
def timeout(sec, klass = nil) #:yield: +sec+
  return yield(sec) if sec == nil or sec.zero?
  ...
First of all, we can see that if sec
is either 0
or nil
it just executes the block you passed in, and then returns the result. Next let’s look at the part of Timeout
that actually does the timing out:
...
x = Thread.current
y = Thread.start {
  sleep sec
  x.raise exception, "execution expired" if x.alive?
}
return yield(sec)
...
We quickly see the secret here is in ruby’s threads. If you’re not familiar with threading, it is more or less one way to make the computer do two things at once. First Timeout
stashes the current thread in x
. Next it starts up a new thread that will sleep for your timeout period. The sleeping thread is stored in y
. While that thread is sleeping, timeout calls the block you passed in. As soon as that block completes, the result is returned. So what about that sleeping thread? When it wakes up it will raise an exception in the original thread, which explains how timeout
stops code from running forever, but there is one last piece to the puzzle.
...
ensure
  if y and y.alive?
    y.kill
    y.join # make sure y is dead.
  end
end
...
At the end of timeout
there is an ensure
. If you haven’t come across this yet, it is an interesting feature in ruby. ensure
will always be called after a method completes, even if there is an exception. In timeout
the ensure
kills thread y
, the sleeping thread, which means that it won’t raise an exception if the block returns or raises an exception of its own before the thread wakes up.
It turns out that Timeout
is a useful little library, and it contains some interesting examples of threading and ensure
blocks. If there is any part of the standard library you are curious about or think is worthy of some more coverage, let me know!
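Putting the pieces together, a stripped-down version of the same idea might look like the sketch below. It is not the library's implementation: the names simple_timeout and TooSlow are made up, and it skips the zero/nil check and the custom exception class handling.

```ruby
# Hypothetical error class and method, named for this illustration only.
class TooSlow < StandardError; end

def simple_timeout(seconds)
  caller_thread = Thread.current
  watchdog = Thread.start do
    sleep seconds
    # Interrupt the waiting thread once the time is up
    caller_thread.raise TooSlow, "execution expired" if caller_thread.alive?
  end
  yield
ensure
  # Runs whether the block returned or raised, so the watchdog never fires late
  if watchdog
    watchdog.kill
    watchdog.join
  end
end

simple_timeout(1) { :done } #=> :done
```

A block that takes longer than the limit, such as simple_timeout(0.05) { sleep 1 }, is interrupted with TooSlow.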
Qwandry 0.1.0 – Now Supporting More Languages
I just finished updating Qwandry so that it can support any number of other languages or packaging systems. Want to use perl, python, or node with Qwandry? No problem:
qw -r python numpy # opens python's numpy library
qw -r perl URI # open perl's URI library
qw -r node express # open express if it is installed for node
Qwandry will probe these dynamic languages and detect their load paths. This is just the first step towards making code more accessible to people. I would love to hear what you think of it, and if you have any suggestions.
Go ahead and install it with ruby’s package manager:
gem install qwandry
Warning
If you had customized Qwandry before, this release will break your custom init.rb
file. Configuration commands looked like this:
add 'projects', '~/toys'
add 'projects', '~/samples'
Now they look slightly different:
register 'projects' do
add '~/toys'
add '~/samples'
end
Awesome
By wrapping the commands that actually add paths to Qwandry’s search path in a block, we can defer slow operations like probing. Furthermore, we now only need to build up the paths for what you are looking for. By deferring configuration until it is needed, we can add support for any language or package scheme we like without slowing Qwandry down.
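The general shape of this deferral can be sketched as follows. This is a hypothetical illustration rather than Qwandry's real code: the class LazyRegistry, its methods, and the block's paths argument are all invented here, and Qwandry's own DSL differs.

```ruby
# Hypothetical sketch of deferred configuration; not Qwandry's real classes.
class LazyRegistry
  def initialize
    @builders = {}  # name => block that knows how to find paths
    @paths    = {}  # name => paths, filled in on first use
  end

  # Remember the block, but do not run it yet.
  def register(name, &block)
    @builders[name] = block
  end

  # Run the block the first time a name is asked for, then reuse the result.
  def paths_for(name)
    @paths[name] ||= begin
      found = []
      builder = @builders[name]
      builder.call(found) if builder
      found
    end
  end
end

registry = LazyRegistry.new
registry.register('projects') { |paths| paths << '~/toys' << '~/samples' }
registry.paths_for('projects') #=> ["~/toys", "~/samples"]
```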
So what would you like to see Qwandry support next?
Filed under development, ruby
Getting to Know the Ruby Standard Library – Pathname
This article has been republished on Monkey and Crow.
Pathname
is a useful library that demonstrates a good refactoring: “Replace Data Value With Object”. In this case the data value is a String
representing a path. Pathname
wraps that String
and provides a wide variety of methods for manipulating paths that would normally require you to call the File
, FileTest
, Dir
, and IO
modules. You might even be using it already without knowing as it shows up in Rails’ paths. First we will see a short example of Pathname
in action, and then we will look at some of the patterns it employs.
Example of Pathname
require 'pathname'

path = Pathname.new('.') # current directory
path += 'tests' # ./tests
path += 'functional' # ./tests/functional
path = path.parent # ./tests
path += 'config.yaml' # ./tests/config.yaml
path.read # contents of ./tests/config.yaml
path.open('w'){|io| io << "env: test"}
path.read # "env: test"
# children returns an array, and must be called on the directory rather than the file
path.parent.children.each{|p| puts p.inspect} # prints all the files/directories in ./tests
Pathname
provides a nicer interface for interacting with the filesystem. Now let’s take a look at how it works. As usual, I suggest opening up the file for yourself and following along; if you have Qwandry installed you can type qw pathname
.
Pathname
We will start with how a Pathname
gets created:
def initialize(path)
  path = path.__send__(TO_PATH) if path.respond_to? TO_PATH
  @path = path.dup
  ...
The main thing Pathname#initialize
does is store a copy of the path
argument, while optionally calling TO_PATH
on it; we’ll come back to this in a moment. Since strings are mutable in ruby, dup
is called on the path
argument. This ensures that if you later call path.gsub!('-','_')
, or any other method that mutates the string, Pathname
’s copy will remain the same. This is a good practice whenever you are dealing with mutable data. Now let’s take a look at TO_PATH
:
if RUBY_VERSION < "1.9"
  TO_PATH = :to_str
else
  # to_path is implemented so Pathname objects are usable with File.open, etc.
  TO_PATH = :to_path
end
This code invokes special behavior based on the current RUBY_VERSION
. Ruby 1.9 will set TO_PATH
to :to_path
, and call that in the initializer above if the object being passed in implements to_path
. A quick look at the RDocs shows that File
implements to_path
, so we can pass files directly into Pathname
. Now let’s take a look at how Pathname
makes use of the rest of ruby’s file libraries.
def read(*args)
  IO.read(@path, *args)
end
The definition of Pathname#read
is quite simple, it just takes the path you passed in and uses it to call IO
, so where you might have done IO.read(path)
with Pathname
you can just do path.read
. This pattern is repeated in Pathname
for many of the common filesystem operations, for instance take a look at mtime
:
def mtime()
  File.mtime(@path)
end
We see the same pattern has been repeated, but this time it delegates to File
. Since a Pathname
may reference a file or a directory, some of the methods will delegate to either Dir
or File
:
def unlink()
  begin
    Dir.unlink @path
  rescue Errno::ENOTDIR
    File.unlink @path
  end
end
First it tries to delete the path as a directory, then as a file. Perhaps a simpler formulation would be directory? ? Dir.unlink @path : File.unlink @path
, but the result is the same. This pattern encapsulates knowledge that the caller no longer needs to deal with.
Pathname
also overrides operators where they make sense, which lets you concatenate paths. Let’s look at how Pathname
does this.
def +(other)
  other = Pathname.new(other) unless Pathname === other
  Pathname.new(plus(@path, other.to_s))
end
The plus operator is just a method like any other method in ruby, so overriding it is pretty simple. First, the other
path being added to this one is converted to a Pathname
if it isn’t one already. After that, the paths are combined with plus(@path, other.to_s)
. This might look rather odd since we just converted other
to a Pathname
, but remember that Pathname
treats anything responding to to_path
specially.
Here are some examples of its behavior:
p = Pathname.new('/usr/local/lib') #=> #<Pathname:/usr/local/lib>
p + '/usr/'      #=> #<Pathname:/usr/>
p + 'usr/'       #=> #<Pathname:/usr/local/lib/usr/>
p + '../include' #=> #<Pathname:/usr/local/include>
Adding an absolute path to an existing path behaves differently from a relative path or a path referencing the parent directory. This obviously has some logic beyond our typical string operators. For the sake of brevity, we can skip the details of how plus
is implemented, though if anyone is interested, we can dissect it later. I suggest skimming the rest of pathname.rb
, look at how public and private methods are defined, and how they are used to simplify methods.
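To make the patterns above concrete, here is a toy version of the same refactoring. It is a sketch under simplifying assumptions: the class name TinyPath is invented, and its + simply joins with File.join, so it does not reproduce Pathname's handling of absolute paths or parent references.

```ruby
# Hypothetical miniature of the "Replace Data Value With Object" idea.
class TinyPath
  def initialize(path)
    path = path.to_path if path.respond_to?(:to_path)
    @path = path.dup   # keep our own copy, since the caller's String is mutable
  end

  def to_s
    @path.dup
  end
  alias to_path to_s   # lets File.read and friends accept a TinyPath directly

  # Thin wrappers that delegate to the class that actually does the work
  def read(*args)
    File.read(@path, *args)
  end

  def directory?
    File.directory?(@path)
  end

  def unlink
    directory? ? Dir.unlink(@path) : File.unlink(@path)
  end

  # Naive concatenation; the real Pathname#+ is considerably smarter
  def +(other)
    TinyPath.new(File.join(@path, other.to_s))
  end
end

name = "my-notes.txt".dup
path = TinyPath.new(name)
name.gsub!('-', '_')                      # mutating the original String...
path.to_s                                 #=> "my-notes.txt" (...does not affect the copy)
(TinyPath.new('/usr/local') + 'lib').to_s #=> "/usr/local/lib"
```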
Overview
Pathname
wraps up a lot of functionality that is scattered across multiple libraries by encapsulating that information. Hopefully you have seen how Pathname
can be useful, and have also learned a few patterns that will make your code more usable.
Getting to Know the Ruby Standard Library – Abbrev
This article has been republished on Monkey and Crow.
We’re going to take a look at another little piece of ruby’s standard library, this time it is Abbrev
, a tiny library that generates abbreviations for a set of words. We will expand ever so slightly on the one-liner from my last post to show an example of Abbrev
in action:
require 'abbrev'
commands = Dir[*ENV['PATH'].split(':').map{|p| p+"/*"}].select{|f| File.executable? f}.map{|f| File.basename f}.uniq.abbrev
commands['ls'] #=> 'ls'
commands['spli'] #=> 'split'
commands['spl'] #=> nil
This will match any command on your path, or any prefix that matches exactly one command. Combine this with Shellwords, and you have the pieces for an auto completing console. It could also be used for matching rake tasks, tests, or giving suggestions for mistyped methods.
How it Works
So that is what Abbrev
does, but how does it work? If you open up the library (qw abbrev
if you have Qwandry), you will see that it is pretty small: there’s just one method, and then a helper that extends Array.
Starting from the beginning, we see that it takes an array of words and an optional pattern. It stores the abbreviations in table
, and keeps track of the occurrences of each abbreviation in seen
using the counting idiom I mentioned in Hash Tricks.
def abbrev(words, pattern = nil)
  table = {}
  seen = Hash.new(0)
  ...
The pattern can be a regular expression or a String:
  if pattern.is_a?(String)
    pattern = /^#{Regexp.quote(pattern)}/ # regard as a prefix
  end
If it’s a String, it is converted to a regular expression. Notice that Regexp.quote(pattern)
is used so that any characters that have special meanings in regular expressions will get escaped. If this pattern is present, it is used to ignore any abbreviations that don’t match it. Next we see how the abbreviations are generated for each word:
  ...
  words.each do |word|
    next if (abbrev = word).empty?
    while (len = abbrev.rindex(/[\w\W]\z/)) > 0
      abbrev = word[0,len]
      next if pattern && pattern !~ abbrev
      case seen[abbrev] += 1
      when 1
        table[abbrev] = word
      when 2
        table.delete(abbrev)
      else
        break
      end
    end
  end
  ...
The first part of this sets the current word to abbrev
, but skips the word if it is blank. The next part of the loop is a little more confusing, what does abbrev.rindex(/[\w\W]\z/)
do? It gives you the index of the last character in the String, as far as I can tell in ruby 1.9 this is equivalent to String#length - 1
. So the inner while
loop is going to use abbrev = word[0,len]
to chop off a character each time until only a single character is left. The hash seen
is incremented by 1 for this abbreviation. If this is the first time the abbreviation has been seen, then it is recorded as pointing to the word. If this is the second time the abbreviation has been seen, it is removed because it is not unique. If the abbreviation has been seen more than twice, then we know that all the shorter abbreviations of this word have also been seen and removed, so the loop exits.
  words.each do |word|
    next if pattern && pattern !~ word
    table[word] = word
  end
  table
end
Finally Abbrev
loops through the original words and inserts them. This means that if the array contained “look” and “lookout” they both get added as matches for themselves even though “look” is a substring of “lookout”.
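If you want to experiment with the idea without the library, the same result can be reached with a simpler two pass approach. This is a sketch, not Abbrev's algorithm: the method name unique_prefixes is made up, it ignores the optional pattern, and it counts every prefix up front instead of bailing out early.

```ruby
# Hypothetical re-implementation: map every unambiguous prefix to its word.
def unique_prefixes(words)
  words = words.uniq
  counts = Hash.new(0) # counting idiom: missing keys start at 0

  # First pass: count how many words share each proper prefix
  words.each do |word|
    (1...word.length).each { |len| counts[word[0, len]] += 1 }
  end

  # Second pass: keep only the prefixes that belong to exactly one word
  table = {}
  words.each do |word|
    (1...word.length).each do |len|
      prefix = word[0, len]
      table[prefix] = word if counts[prefix] == 1
    end
  end

  # Finally, every word always matches itself
  words.each { |word| table[word] = word }
  table
end

table = unique_prefixes(%w[look lookout list])
table["li"]    #=> "list"
table["looko"] #=> "lookout"
table["look"]  #=> "look"
table["lo"]    #=> nil
```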
So there you have it, ruby’s Abbrev
library explained, go forth and shorten words.