Benchmarking Ruby Dispatch Strategies

Let’s say we’re dispatching events to listener objects. Events look like this:

Event = Struct.new(:name, :source, :args)

Events should be routed to different handler methods based on the name of the event. For instance, when an object is sent the #call message with an :accepted event:

listener.call(Event[:accepted, obj])

…then it should be dispatched to the #handle_accepted method, as if the listener had been sent this message:

listener.handle_accepted(event)

There are a few different ways we could handle this dispatching logic. To test them out, let’s first set up a little base class for testbed objects.

class TestBed
  attr_reader :event_log

  def initialize
    @event_log = []
  end
end

This gives us a place to record handled events, so we can verify that our test classes have the correct semantics.

Now let’s kick things off with a hardcoded version of the dispatching semantics. This version uses a hand-coded case statement that switches on the event name and dispatches to the appropriate method.

class HardcodeTestbed < TestBed
  def call(event)
    case event.name
    when :foo
      handle_foo(event)
    when :bar
      handle_bar(event)
    when :baz
      handle_baz(event)
    end
  end

  def handle_foo(event)
    event_log << event
  end

  def handle_bar(event)
    event_log << event
  end

  def handle_baz(event)
    event_log << event
  end
end

If all we needed were a single listener, this would be sufficient. But the purpose of this exercise is to come up with a way to automatically dispatch to the correct handler method, without actually hand-coding a specialized #call method in every listener class. Ideally, we'd like to come up with a strategy we can bundle up into a reusable module.
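To make that goal concrete, here's a hypothetical sketch of what such a reusable module might look like, using the simplest possible strategy (the names EventDispatch and Listener are made up for illustration, and aren't part of the benchmark below):

```ruby
Event = Struct.new(:name, :source, :args)

# Hypothetical reusable mixin: routes #call to handle_<name> methods.
module EventDispatch
  def call(event)
    handler_name = "handle_#{event.name}"
    __send__(handler_name, event) if respond_to?(handler_name)
  end
end

class Listener
  include EventDispatch

  def handled
    @handled ||= []
  end

  def handle_accepted(event)
    handled << event
  end
end
```

With this in place, `listener.call(Event[:accepted, obj])` routes to #handle_accepted, and events with no matching handler are silently ignored.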

Let's start out simple. Here's a version that does some string interpolation in order to infer the correct handler method, and then uses #__send__ (otherwise known as #send) to dispatch to that method. The send is only performed if the handler method exists.

class SendTestbed < TestBed
  def call(event)
    handler_name = "handle_#{event.name}"
    __send__(handler_name, event) if respond_to?(handler_name)
  end

  def handle_foo(event)
    event_log << event
  end

  def handle_bar(event)
    event_log << event
  end

  def handle_baz(event)
    event_log << event
  end
end

We theorize that this strategy will be on the slow side. Every call will have to perform string interpolation, check if the method exists, and then do a dynamic message send. And dynamic sends can't be optimized as tightly as hardcoded sends.

Let's now turn our minds to how we might optimize this process.

Here's one approach. In this version, we catch #method_added events at the class level. For each method added, we look to see if it's an event handler method. If so, we add it to a class-wide lookup table that maps events to handler method names.

Then when an event is sent to an instance with #call, it simply looks up the corresponding method and does a dynamic send to it.

class SendTableTestbed < TestBed
  def self.method_added(method_name)
    if method_name.to_s =~ /\Ahandle_(.+)\z/
      handler_methods[$1.to_sym] = method_name.to_sym
    end
    super
  end

  def self.handler_methods
    @handler_methods ||= {}
  end

  def call(event)
    if (handler_method = self.class.handler_methods[event.name])
      __send__(handler_method, event)
    end
  end

  def handle_foo(event)
    event_log << event
  end

  def handle_bar(event)
    event_log << event
  end

  def handle_baz(event)
    event_log << event
  end
end

This approach avoids the need for string interpolation on each dispatch. It also avoids the need to explicitly check for the existence of a handler method.

Now let's go for a more exotic approach. Here's a version that, like the last one, builds a lookup table as methods are added to the class. But in this case, instead of storing method names, it stores UnboundMethod objects obtained via instance_method.

Then when sent #call, the object looks up the matching method, binds it to self, and calls it.

class BindTableTestbed < TestBed
  def self.method_added(method_name)
    if method_name.to_s =~ /\Ahandle_(.+)\z/
      handler_methods[$1.to_sym] = instance_method(method_name)
    end
    super
  end

  def self.handler_methods
    @handler_methods ||= {}
  end

  def call(event)
    if (handler_method = self.class.handler_methods[event.name])
      handler_method.bind(self).call(event)
    end
  end

  def handle_foo(event)
    event_log << event
  end

  def handle_bar(event)
    event_log << event
  end

  def handle_baz(event)
    event_log << event
  end
end

In theory this could be even faster than the last strategy. Since an UnboundMethod is a direct reference to the chunk of compiled code that makes up a method, there is no #__send__ lookup overhead.
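For context, here are the UnboundMethod mechanics in isolation (a toy class for illustration, not part of the benchmark):

```ruby
class Greeter
  def hello(name)
    "Hello, #{name}!"
  end
end

unbound = Greeter.instance_method(:hello) # an UnboundMethod
bound   = unbound.bind(Greeter.new)       # a Method, bound to an instance
bound.call("world")                       # => "Hello, world!"
```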

It's time to get aggressive in our optimizations. Here's a version that uses class_eval to dynamically regenerate the code of the #call method every time a new handler method is added. The code it generates is similar to our original, hardcoded example.

class CodeGenTestbed < TestBed
  def self.method_added(method_name)
    if method_name.to_s =~ /\Ahandle_(.+)\z/
      handler_methods << $1
      regenerate_dispatch_method
    end
    super
  end

  def self.handler_methods
    @handler_methods ||= []
  end

  def self.regenerate_dispatch_method
    dispatch_table = handler_methods.map { |event_name|
      "when :#{event_name} then handle_#{event_name}(event)"
    }.join("\n")
    class_eval %{
      def call(event)
        case event.name
        #{dispatch_table}
        end
      end
    }
  end

  def handle_foo(event)
    event_log << event
  end

  def handle_bar(event)
    event_log << event
  end

  def handle_baz(event)
    event_log << event
  end
end

The final generated #call method will look something like this:

def call(event)
  case event.name
  when :foo then handle_foo(event)
  when :bar then handle_bar(event)
  when :baz then handle_baz(event)
  end
end

One last variation. We wonder if a case statement is the fastest way to write this generated dispatch code. After all, case statements do fancy "case matching" using the === operator. What if we instead built an if statement which compared symbols by identity?
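As a refresher on why case statements might cost more: each when clause asks `value === subject`, and different classes define #=== differently. A few examples:

```ruby
# How case/when matching works: each `when` value is sent #===.
:foo === :foo      # Symbol#=== is plain equality
(1..10) === 5      # Range#=== tests inclusion
/ba./ === "bar"    # Regexp#=== tests matching
Integer === 42     # Module#=== tests is_a?
```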

class IfCodeGenTestbed < TestBed
  def self.method_added(method_name)
    if method_name.to_s =~ /\Ahandle_(.+)\z/
      handler_methods << $1
      regenerate_dispatch_method
    end
    super
  end

  def self.handler_methods
    @handler_methods ||= []
  end

  def self.regenerate_dispatch_method
    dispatch_table = handler_methods.map { |event_name|
      "event.name.equal?(:#{event_name}) then handle_#{event_name}(event)"
    }.join("\nelsif ")
    class_eval %{
      def call(event)
        if #{dispatch_table}
        end
      end
    }
  end

  def handle_foo(event)
    event_log << event
  end

  def handle_bar(event)
    event_log << event
  end

  def handle_baz(event)
    event_log << event
  end
end
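For comparison with the case version, the final generated #call here will look something like this (written out by hand):

```ruby
def call(event)
  if event.name.equal?(:foo) then handle_foo(event)
  elsif event.name.equal?(:bar) then handle_bar(event)
  elsif event.name.equal?(:baz) then handle_baz(event)
  end
end
```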

Perhaps that will be a little faster.

OK, let's put all these different versions to the test.

We write a helper method that will create a new instance of one of the testbed classes, dispatch three events to it, dispatch a fourth, unhandled event to it, and then verify that the three handled events are logged.

def do_test(klass)
  testbed = klass.new
  testbed.call(e1 = Event[:foo])
  testbed.call(e2 = Event[:bar])
  testbed.call(e3 = Event[:baz])
  testbed.call(Event[:buz])
  unless testbed.event_log == [e1, e2, e3]
    raise "#{klass}: #{testbed.event_log.inspect}"
  end
end

Then we stick all of our testbeds in an array, and benchmark them against each other using benchmark-ips.

classes = [
    HardcodeTestbed,
    SendTestbed,
    SendTableTestbed,
    BindTableTestbed,
    CodeGenTestbed,
    IfCodeGenTestbed,
]

require "benchmark/ips"

Benchmark.ips do |x|
  classes.each do |klass|
    x.report(klass.name) { do_test(klass) }
  end
  x.compare!
end

Let's take a look at the results, shall we? Here are the results on my system with Ruby 2.2:

Calculating -------------------------------------
     HardcodeTestbed    37.559k i/100ms
         SendTestbed    19.806k i/100ms
    SendTableTestbed    28.959k i/100ms
    BindTableTestbed    21.637k i/100ms
      CodeGenTestbed    39.608k i/100ms
    IfCodeGenTestbed    37.599k i/100ms
-------------------------------------------------
     HardcodeTestbed    562.715k (± 4.8%) i/s -      2.817M
         SendTestbed    268.601k (± 2.5%) i/s -      1.347M
    SendTableTestbed    399.429k (± 5.0%) i/s -      1.998M
    BindTableTestbed    266.442k (± 2.6%) i/s -      1.341M
      CodeGenTestbed    559.012k (± 3.5%) i/s -      2.812M
    IfCodeGenTestbed    525.445k (± 2.0%) i/s -      2.632M

Comparison:
     HardcodeTestbed:   562714.6 i/s
      CodeGenTestbed:   559011.6 i/s - 1.01x slower
    IfCodeGenTestbed:   525445.0 i/s - 1.07x slower
    SendTableTestbed:   399429.1 i/s - 1.41x slower
         SendTestbed:   268601.1 i/s - 2.09x slower
    BindTableTestbed:   266442.2 i/s - 2.11x slower

Not surprisingly, our hardcoded version is fastest. Neck and neck with it is our first code-generation version, the one that generates a case statement. In my experiments the code-generated version comes out ahead about half the time; I think any difference between the two is just statistical noise.

Interestingly, the generated code using if statements comes out a bit slower. Apparently Ruby case statements are quite well optimized.

Next up in the rankings is the "send table" version. We can see that we were right about it being faster to assemble a lookup table in advance than to string-interpolate a method name, check for its existence, and then #__send__ to it with each call.

Our naive #__send__-based version, by contrast, is over two times slower than the hardcoded and generated case-statement versions.

The big surprise (for me, anyway), is how slow the "bind table" version turns out to be. Evidently the overhead of binding a method object and then calling it is greater than the overhead of doing dynamic sends.

Overall, while 2x is a big enough difference to matter in some programs, it's interesting that there are no order-of-magnitude gains or losses to be had here.

In case you're curious, here are the results when running under Ruby 1.9 instead of 2.2.

Calculating -------------------------------------
     HardcodeTestbed    26.692k i/100ms
         SendTestbed    16.202k i/100ms
    SendTableTestbed    24.147k i/100ms
    BindTableTestbed    17.961k i/100ms
      CodeGenTestbed    28.238k i/100ms
    IfCodeGenTestbed    27.217k i/100ms
-------------------------------------------------
     HardcodeTestbed    377.921k (± 3.0%) i/s -      1.895M
         SendTestbed    195.392k (± 2.9%) i/s -    988.322k
    SendTableTestbed    306.741k (± 2.1%) i/s -      1.545M
    BindTableTestbed    213.615k (± 3.3%) i/s -      1.078M
      CodeGenTestbed    379.196k (± 2.3%) i/s -      1.920M
    IfCodeGenTestbed    357.106k (± 2.5%) i/s -      1.796M

Comparison:
      CodeGenTestbed:   379196.1 i/s
     HardcodeTestbed:   377920.8 i/s - 1.00x slower
    IfCodeGenTestbed:   357105.9 i/s - 1.06x slower
    SendTableTestbed:   306741.4 i/s - 1.24x slower
    BindTableTestbed:   213614.9 i/s - 1.78x slower
         SendTestbed:   195392.4 i/s - 1.94x slower

The numbers are slower across the board. The relative speeds are mostly similar, although the "bind table" and "send" versions have switched places in the rankings.

As you might guess, I performed these benchmarks in order to get a concrete idea as to the most performant way to implement part of a library I'm working on. As usual with benchmarking, there were a few surprises, and I'm glad I took the trouble to test my hypotheses.

If you have different results on different versions of Ruby; or suggestions on how to improve the benchmarks; or new variations on method dispatching; I'd love to see them in the comments.

15 comments

    1. I’d been working on something that would eventually go in a gem and need to dispatch events to arbitrary client objects. When working with someone else’s objects, it’s best to use the underscore versions, on the off chance they’ve done something weird.

      1. Ah, got it – and I guess Ruby makes it impossible to overwrite the __send__ version in BasicObject? Or, rather, it’s possible just like everything else in Ruby, except much less likely that someone will overload the __send__ form of it…?

  1. You missed one – encapsulate your handlers as static methods on an embedded module:

    class ModuleTestbed < TestBed
      def call(event)
        Handlers.send(event.name, self, event)
      end

      module Handlers
        def foo(testbed, event)
          testbed.event_log << event
        end

        def bar(testbed, event)
          testbed.event_log << event
        end

        def baz(testbed, event)
          testbed.event_log << event
        end

        def method_missing(handler, *args)
        end

        extend self
      end
    end

    On my machine under MRI it benchmarks about the same as the IfCodeGenTestbed:

    Comparison:
    CodeGenTestbed: 715915.7 i/s
    HardcodeTestbed: 710498.7 i/s – 1.01x slower
    ModuleTestbed: 646322.0 i/s – 1.11x slower
    IfCodeGenTestbed: 635730.0 i/s – 1.13x slower
    SendTableTestbed: 557663.6 i/s – 1.28x slower
    BindTableTestbed: 365025.2 i/s – 1.96x slower
    SendTestbed: 348274.4 i/s – 2.06x slower

    Under JRuby it’s actually faster than CodeGenTestbed as well:

    Comparison:
    HardcodeTestbed: 1274959.5 i/s
    ModuleTestbed: 1224070.4 i/s – 1.04x slower
    CodeGenTestbed: 1200955.0 i/s – 1.06x slower
    IfCodeGenTestbed: 1161910.5 i/s – 1.10x slower
    BindTableTestbed: 979396.4 i/s – 1.30x slower
    SendTableTestbed: 976497.5 i/s – 1.31x slower
    SendTestbed: 463007.8 i/s – 2.75x slower

    One difference in my implementation is that I use method_missing to provide a handler for unknown event types (so you could send them to a misc log, perhaps, or raise an exception to the calling method). That does mean an additional performance hit in the test though. If you remove that unknown event, the performance pretty much goes away, even under MRI:

    Comparison:
    CodeGenTestbed: 832402.9 i/s
    ModuleTestbed: 820658.5 i/s – 1.01x slower
    HardcodeTestbed: 805808.8 i/s – 1.03x slower
    IfCodeGenTestbed: 768468.4 i/s – 1.08x slower
    SendTableTestbed: 664580.1 i/s – 1.25x slower
    BindTableTestbed: 404148.6 i/s – 2.06x slower
    SendTestbed: 401068.8 i/s – 2.08x slower

    So new handlers can be added with no additional overhead (simply define a method), logic can easily be added for unhandled events, there is no string interpolation or explicit check for a handler, and the overall performance is roughly equivalent to the fastest alternatives.

    1. Interesting! Although, it kinda breaks the spirit of the example, in that calling to the user-provided handle_* methods was supposed to be an invariant 😉

      1. True, though there is no need to embed the Handlers module in the class, so it’s just a matter of the user defining Handlers#fnord instead of Testbed#handle_fnord.

        If naming is inflexible, you could do automatic aliasing. I’m pretty sure that wouldn’t negatively impact dispatch speed.

        I use this model pretty often for tools where you pass a command as an argument.

  2. How many handlers are we talking? (Does a case statement scale like a hash lookup?)

    How does a lambda.call compare to a object.send?

    [code lang=text]
    def self.handler_methods
      @handler_methods ||= Hash.new { |table, event|
        table[event] = eval("->(instance, args) { instance.handle_#{event}(*args) }")
      }
    end

    def call(event)
      self.class.handler_methods[event.name].call(self, event)
    end
    [/code]

      1. Surprisingly it isn’t. Actually, it’s just a little slower than code generation:

        [code lang=text]
        class LambdaTableTestbed < TestBed
          @@handler_methods = Hash.new

          def self.method_added(method_name)
            if method_name.to_s =~ /\Ahandle_(.+)\z/
              @@handler_methods[$1.to_sym] = eval %{
                ->(instance, args) { instance.#{method_name}(args) }
              }
            end
            super
          end

          def call(event)
            (@@handler_methods[event.name] or return).call(self, event)
          end

          def handle_foo(event)
            event_log << event
          end

          def handle_bar(event)
            event_log << event
          end

          def handle_baz(event)
            event_log << event
          end
        end
        [/code]

        Results:

        [code lang=text]
        HardcodeTestbed: 412328.3 i/s
        CodeGenTestbed: 407221.1 i/s – 1.01x slower
        IfCodeGenTestbed: 334375.5 i/s – 1.23x slower
        LambdaTableTestbed: 312580.5 i/s – 1.32x slower
        SendTableTestbed: 285952.9 i/s – 1.44x slower
        BindTableTestbed: 163040.3 i/s – 2.53x slower
        SendTestbed: 160361.9 i/s – 2.57x slower
        [/code]

        This is getting to the point where subtle things like self.class.accessor vs. @@class_variable, or declaring a local variable really start to matter. I found I could improve your BindTable implementation with that too.

  3. I’ve decided to play with the benchmark to see if it could be “improved”. Your version measures the performance of a rather specific scenario: create a handler, dispatch four events (which are created every time), and compare the resulting array with the golden one. It is a good scenario for functional testing (someone may not agree, though 😉) but I’m not sure it should be used to compare dispatching performance.
    Thus I’ve changed the test beds to just store the last event (so there is no no-op, and we don’t collect so many events that Array performance becomes an issue), and changed do_test to:
    def do_test(testbed)
      testbed.call(EVENT_1)
      testbed.call(EVENT_2)
      testbed.call(EVENT_3)
      testbed.call(EVENT_4)
    end

    So events and test beds are created only once for the whole measurement.
    And here are the results I’ve got.
    Comparison:
    HardcodedTestbed: 1617937.6 i/s
    CodeGenTestbed: 1616651.8 i/s – 1.00x slower
    IfCodeGenTestbed: 1050036.9 i/s – 1.54x slower
    SendTableTestbed: 839765.5 i/s – 1.93x slower
    SendTestbed: 358247.2 i/s – 4.52x slower
    BindTableTestbed: 270160.6 i/s – 5.99x slower

    Note that your scenario shows the worst case as just 2 times slower than the best one, while my scenario shows a best-to-worst ratio of 6.
    It would be interesting to hear your opinion on what kind of benchmark is better for your purpose and why.

    Another interesting area to investigate is how the provided approaches scale with the number of event types we need to handle 😉

    BTW thank you for very interesting topic.
