Let’s say we’re dispatching events to listener objects. Events look like this:
Event = Struct.new(:name, :source, :args)
Events should be routed to different handler methods based on the name of the event. For instance, when an object is sent the #call message with an :accepted event:
listener.call(Event[:accepted, obj])
…then it should be dispatched to the #handle_accepted method, as if the listener had been sent this message:
listener.handle_accepted(event)
There are a few different ways we could handle this dispatching logic. To test them out, let’s first set up a little base class for testbed objects.
class TestBed
  attr_reader :event_log

  def initialize
    @event_log = []
  end
end
This gives us a place to record handled events, so we can verify that our test classes have the correct semantics.
Now let’s kick things off with a hardcoded version of the dispatching semantics. This version has a hand-coded case statement which simply switches on the event type and dispatches to the appropriate method.
class HardcodeTestbed < TestBed
  def call(event)
    case event.name
    when :foo
      handle_foo(event)
    when :bar
      handle_bar(event)
    when :baz
      handle_baz(event)
    end
  end

  def handle_foo(event)
    event_log << event
  end

  def handle_bar(event)
    event_log << event
  end

  def handle_baz(event)
    event_log << event
  end
end
If all we needed were a single listener, this would be sufficient. But the purpose of this exercise is to come up with a way to automatically dispatch to the correct handler method, without hand-coding a specialized #call method in every listener class. Ideally, we'd like a strategy we can bundle up into a reusable module.
Let's start out simple. Here's a version that does some string interpolation in order to infer the correct handler method, and then uses #__send__ (otherwise known as #send) to dispatch to that method. The send is only performed if the handler method exists.
class SendTestbed < TestBed
  def call(event)
    handler_name = "handle_#{event.name}"
    __send__(handler_name, event) if respond_to?(handler_name)
  end

  def handle_foo(event)
    event_log << event
  end

  def handle_bar(event)
    event_log << event
  end

  def handle_baz(event)
    event_log << event
  end
end
We theorize that this strategy will be on the slow side. Every call will have to perform string interpolation, check if the method exists, and then do a dynamic message send. And dynamic sends can't be optimized as tightly as hardcoded sends.
Let's now turn our minds to how we might optimize this process.
Here's one approach. In this version, we catch #method_added events at the class level. For each method added, we look to see if it's an event handler method. If so, we add it to a class-wide lookup table that maps events to handler method names.
Then when an event is sent to the object with #call, it simply looks up the corresponding method and does a dynamic send to it.
class SendTableTestbed < TestBed
  def self.method_added(method_name)
    if method_name.to_s =~ /\Ahandle_(.+)\z/
      handler_methods[$1.to_sym] = method_name.to_sym
    end
    super
  end

  def self.handler_methods
    @handler_methods ||= {}
  end

  def call(event)
    if (handler_method = self.class.handler_methods[event.name])
      __send__(handler_method, event)
    end
  end

  def handle_foo(event)
    event_log << event
  end

  def handle_bar(event)
    event_log << event
  end

  def handle_baz(event)
    event_log << event
  end
end
This approach avoids the need for string interpolation on each dispatch. It also avoids the need to explicitly check for the existence of a handler method.
Now let's go for a more exotic approach. Here's a version that, similarly to the last one, builds a lookup table as methods are added to the class. But in this case, instead of storing method names, it stores UnboundMethod objects derived from instance_method.
Then when sent #call, it looks up the matching method, binds it to self, and calls it.
class BindTableTestbed < TestBed
  def self.method_added(method_name)
    if method_name.to_s =~ /\Ahandle_(.+)\z/
      handler_methods[$1.to_sym] = instance_method(method_name)
    end
    super
  end

  def self.handler_methods
    @handler_methods ||= {}
  end

  def call(event)
    if (handler_method = self.class.handler_methods[event.name])
      handler_method.bind(self).call(event)
    end
  end

  def handle_foo(event)
    event_log << event
  end

  def handle_bar(event)
    event_log << event
  end

  def handle_baz(event)
    event_log << event
  end
end
In theory this could be even faster than the last strategy. Since an UnboundMethod is a direct reference to the chunk of compiled code that makes up a method, there is no #__send__ lookup overhead.
It's time to get aggressive in our optimizations. Here's a version that uses class_eval to dynamically regenerate the code of the #call method every time a new handler method is added. The code it generates is similar to our original, hardcoded example.
class CodeGenTestbed < TestBed
  def self.method_added(method_name)
    if method_name.to_s =~ /\Ahandle_(.+)\z/
      handler_methods << $1
      regenerate_dispatch_method
    end
    super
  end

  def self.handler_methods
    @handler_methods ||= []
  end

  def self.regenerate_dispatch_method
    dispatch_table = handler_methods.map { |event_name|
      "when :#{event_name} then handle_#{event_name}(event)"
    }.join("\n")
    class_eval %{
      def call(event)
        case event.name
        #{dispatch_table}
        end
      end
    }
  end

  def handle_foo(event)
    event_log << event
  end

  def handle_bar(event)
    event_log << event
  end

  def handle_baz(event)
    event_log << event
  end
end
The final generated #call method will look something like this:
def call(event)
  case event.name
  when :foo then handle_foo(event)
  when :bar then handle_bar(event)
  when :baz then handle_baz(event)
  end
end
One last variation. We wonder if a case statement is the fastest way to write this generated dispatch code. After all, case statements do fancy "case matching" using the === operator. What if we instead built an if statement which compared symbols by identity?
class IfCodeGenTestbed < TestBed
  def self.method_added(method_name)
    if method_name.to_s =~ /\Ahandle_(.+)\z/
      handler_methods << $1
      regenerate_dispatch_method
    end
    super
  end

  def self.handler_methods
    @handler_methods ||= []
  end

  def self.regenerate_dispatch_method
    dispatch_table = handler_methods.map { |event_name|
      "event.name.equal?(:#{event_name}) then handle_#{event_name}(event)"
    }.join("\nelsif ")
    class_eval %{
      def call(event)
        if #{dispatch_table}
        end
      end
    }
  end

  def handle_foo(event)
    event_log << event
  end

  def handle_bar(event)
    event_log << event
  end

  def handle_baz(event)
    event_log << event
  end
end
Perhaps that will be a little faster.
OK, let's put all these different versions to the test.
We write a helper method that will create a new instance of one of the testbed classes, dispatch three events to it, dispatch a fourth, unhandled event to it, and then verify that the three handled events are logged.
def do_test(klass)
  testbed = klass.new
  testbed.call(e1 = Event[:foo])
  testbed.call(e2 = Event[:bar])
  testbed.call(e3 = Event[:baz])
  testbed.call(Event[:buz])
  unless testbed.event_log == [e1, e2, e3]
    raise "#{klass}: #{testbed.event_log.inspect}"
  end
end
Then we stick all of our testbeds in an array, and benchmark them against each other using benchmark-ips.
require "benchmark/ips"

classes = [
  HardcodeTestbed,
  SendTestbed,
  SendTableTestbed,
  BindTableTestbed,
  CodeGenTestbed,
  IfCodeGenTestbed,
]

Benchmark.ips do |x|
  classes.each do |klass|
    x.report(klass.name) { do_test(klass) }
  end
  x.compare!
end
Let's take a look at the results, shall we? Here are the results on my system with Ruby 2.2:
Calculating -------------------------------------
     HardcodeTestbed    37.559k i/100ms
         SendTestbed    19.806k i/100ms
    SendTableTestbed    28.959k i/100ms
    BindTableTestbed    21.637k i/100ms
      CodeGenTestbed    39.608k i/100ms
    IfCodeGenTestbed    37.599k i/100ms
-------------------------------------------------
     HardcodeTestbed    562.715k (± 4.8%) i/s - 2.817M
         SendTestbed    268.601k (± 2.5%) i/s - 1.347M
    SendTableTestbed    399.429k (± 5.0%) i/s - 1.998M
    BindTableTestbed    266.442k (± 2.6%) i/s - 1.341M
      CodeGenTestbed    559.012k (± 3.5%) i/s - 2.812M
    IfCodeGenTestbed    525.445k (± 2.0%) i/s - 2.632M

Comparison:
     HardcodeTestbed:   562714.6 i/s
      CodeGenTestbed:   559011.6 i/s - 1.01x slower
    IfCodeGenTestbed:   525445.0 i/s - 1.07x slower
    SendTableTestbed:   399429.1 i/s - 1.41x slower
         SendTestbed:   268601.1 i/s - 2.09x slower
    BindTableTestbed:   266442.2 i/s - 2.11x slower
Not surprisingly, our hardcoded version is fastest. Neck and neck with it is our first code generation version, the one that generates a case statement. In my experiments the code-generated version comes out ahead about half the time; I think any difference between these two is just statistical noise.
Interestingly, the generated code using if statements comes out a bit slower. Apparently Ruby case statements are quite well optimized.
Next up in the rankings is the "send table" version. We can see that we were right about it being faster to assemble a lookup table in advance than to string-interpolate a method name, check for its existence, and then #__send__ to it with each call.
That naive #__send__-based version is over two times slower than the hardcoded and generated case-statement versions.
The big surprise (for me, anyway), is how slow the "bind table" version turns out to be. Evidently the overhead of binding a method object and then calling it is greater than the overhead of doing dynamic sends.
Overall, while 2x is a big enough difference to matter in some programs, it's interesting that there are no order-of-magnitude gains or losses to be had here.
In case you're curious, here are the results when running under Ruby 1.9 instead of 2.2.
Calculating -------------------------------------
     HardcodeTestbed    26.692k i/100ms
         SendTestbed    16.202k i/100ms
    SendTableTestbed    24.147k i/100ms
    BindTableTestbed    17.961k i/100ms
      CodeGenTestbed    28.238k i/100ms
    IfCodeGenTestbed    27.217k i/100ms
-------------------------------------------------
     HardcodeTestbed    377.921k (± 3.0%) i/s - 1.895M
         SendTestbed    195.392k (± 2.9%) i/s - 988.322k
    SendTableTestbed    306.741k (± 2.1%) i/s - 1.545M
    BindTableTestbed    213.615k (± 3.3%) i/s - 1.078M
      CodeGenTestbed    379.196k (± 2.3%) i/s - 1.920M
    IfCodeGenTestbed    357.106k (± 2.5%) i/s - 1.796M

Comparison:
      CodeGenTestbed:   379196.1 i/s
     HardcodeTestbed:   377920.8 i/s - 1.00x slower
    IfCodeGenTestbed:   357105.9 i/s - 1.06x slower
    SendTableTestbed:   306741.4 i/s - 1.24x slower
    BindTableTestbed:   213614.9 i/s - 1.78x slower
         SendTestbed:   195392.4 i/s - 1.94x slower
The numbers are slower across the board. The relative speeds are mostly similar, although the "bind table" and "send" versions have switched places in the rankings.
As you might guess, I performed these benchmarks in order to get a concrete idea as to the most performant way to implement part of a library I'm working on. As usual with benchmarking, there were a few surprises, and I'm glad I took the trouble to test my hypotheses.
If you have different results on other versions of Ruby, suggestions on how to improve the benchmarks, or new variations on method dispatching, I'd love to see them in the comments.
Any reason you went with __send__ when writing that method, as opposed to the (arguably more typical/more common) send? Just curious.

I'd been working on something that would eventually go in a gem and need to dispatch events to arbitrary client objects. When working with someone else's objects, it's best to use the underscore versions, on the off chance they've done something weird.

Ah, got it – and I guess Ruby makes it impossible to overwrite the __send__ version in BasicObject? Or, rather, it's possible just like everything else in Ruby, except much less likely that someone will overload the __send__ form of it…?

I believe the latter is the case 🙂
You missed one – encapsulate your handlers as static methods on an embedded module:
class ModuleTestbed < TestBed
  def call(event)
    Handlers.send(event.name, self, event)
  end

  module Handlers
    def foo(testbed, event)
      testbed.event_log << event
    end

    def bar(testbed, event)
      testbed.event_log << event
    end

    def baz(testbed, event)
      testbed.event_log << event
    end

    def method_missing(handler, *args)
    end

    extend self
  end
end
On my machine under MRI it benchmarks about the same as the IfCodeGenTestbed:
Comparison:
CodeGenTestbed: 715915.7 i/s
HardcodeTestbed: 710498.7 i/s – 1.01x slower
ModuleTestbed: 646322.0 i/s – 1.11x slower
IfCodeGenTestbed: 635730.0 i/s – 1.13x slower
SendTableTestbed: 557663.6 i/s – 1.28x slower
BindTableTestbed: 365025.2 i/s – 1.96x slower
SendTestbed: 348274.4 i/s – 2.06x slower
Under JRuby it’s actually faster than CodeGenTestbed as well:
Comparison:
HardcodeTestbed: 1274959.5 i/s
ModuleTestbed: 1224070.4 i/s – 1.04x slower
CodeGenTestbed: 1200955.0 i/s – 1.06x slower
IfCodeGenTestbed: 1161910.5 i/s – 1.10x slower
BindTableTestbed: 979396.4 i/s – 1.30x slower
SendTableTestbed: 976497.5 i/s – 1.31x slower
SendTestbed: 463007.8 i/s – 2.75x slower
One difference in my implementation is that I use method_missing to provide a handler for unknown event types (so you could send them to a misc log, perhaps, or raise an exception to the calling method). That does mean an additional performance hit in the test, though. If you remove that unknown event, the performance hit pretty much goes away, even under MRI:
Comparison:
CodeGenTestbed: 832402.9 i/s
ModuleTestbed: 820658.5 i/s – 1.01x slower
HardcodeTestbed: 805808.8 i/s – 1.03x slower
IfCodeGenTestbed: 768468.4 i/s – 1.08x slower
SendTableTestbed: 664580.1 i/s – 1.25x slower
BindTableTestbed: 404148.6 i/s – 2.06x slower
SendTestbed: 401068.8 i/s – 2.08x slower
So new handlers can be added with no additional overhead (simply define a method), logic can easily be added for unhandled events, there is no string interpolation or explicit check for a handler, and the overall performance is roughly equivalent to the fastest alternatives.
Interesting! Although, it kinda breaks the spirit of the example, in that calling to the user-provided handle_* methods was supposed to be an invariant 😉
True, though there is no need to embed the Handlers module in the class, so it's just a matter of the user defining Handlers#fnord instead of Testbed#handle_fnord.
If naming is inflexible, you could do automatic aliasing. I'm pretty sure that wouldn't negatively impact dispatch speed.
I use this model pretty often for tools where you pass a command as an argument.
How many handlers are we talking? (Does a case statement scale like a hash lookup?)
How does a lambda.call compare to an object.send?

[code lang=text]
def self.handler_methods
  @handler_methods ||= Hash.new { |table, event|
    table[event] = eval("->(instance, args) { instance.handle_#{event}(*args) }")
  }
end

def call(event)
  self.class.handler_methods[event.name].call(event)
end
[/code]
Correction:

handler_methods[event.name].call(self, event)

I think that's functionally identical to the UnboundMethod version.

Surprisingly it isn't. Actually, it's just a little slower than code generation:
[code lang=text]
class LambdaTableTestbed < TestBed
  @@handler_methods = Hash.new

  def self.method_added(method_name)
    if method_name.to_s =~ /\Ahandle_(.+)\z/
      @@handler_methods[$1.to_sym] = eval %{
        ->(instance, args) { instance.#{method_name}(args) }
      }
    end
    super
  end

  def call(event)
    (@@handler_methods[event.name] or return).call(self, event)
  end

  def handle_foo(event)
    event_log << event
  end

  def handle_bar(event)
    event_log << event
  end

  def handle_baz(event)
    event_log << event
  end
end
[/code]
Results:
[code lang=text]
HardcodeTestbed: 412328.3 i/s
CodeGenTestbed: 407221.1 i/s – 1.01x slower
IfCodeGenTestbed: 334375.5 i/s – 1.23x slower
LambdaTableTestbed: 312580.5 i/s – 1.32x slower
SendTableTestbed: 285952.9 i/s – 1.44x slower
BindTableTestbed: 163040.3 i/s – 2.53x slower
SendTestbed: 160361.9 i/s – 2.57x slower
[/code]
This is getting to the point where subtle things like self.class.accessor vs. @@class_variable, or declaring a local variable, really start to matter. I found I could improve your BindTable implementation with that too.

I've decided to play with the benchmark to see if it can be "improved". Your version measures the performance of a rather specific scenario: create a handler, dispatch four events (which are created every time), and compare the resulting array with the golden one. It is a good scenario for functional testing (someone may disagree though 😉), but I'm not sure it should be used to compare dispatching performance.
Thus I’ve changed test beds to just store the last event (to not have noop and to not collect too many events so performance of Array would be an issue), and changed do_test to:
def do_test(testbed)
  testbed.call(EVENT_1)
  testbed.call(EVENT_2)
  testbed.call(EVENT_3)
  testbed.call(EVENT_4)
end
So events and test beds are created only once for the whole measurement.

And here are the results I've got.
Comparison:
HardcodedTestbed: 1617937.6 i/s
CodeGenTestbed: 1616651.8 i/s – 1.00x slower
IfCodeGenTestbed: 1050036.9 i/s – 1.54x slower
SendTableTestbed: 839765.5 i/s – 1.93x slower
SendTestbed: 358247.2 i/s – 4.52x slower
BindTableTestbed: 270160.6 i/s – 5.99x slower
Note that your case shows the worst performer being just 2 times slower than the best one, while my case shows a best-to-worst ratio of 6.

It would be interesting to hear your opinion on what kind of benchmark is better for your purpose and why.
Another interesting area to investigate how the provided approaches scale with number of event types we need to handle 😉
BTW thank you for very interesting topic.
Thanks for the article Avdi, and thanks for diving deeper into this Oleg.
Put some of this together and made some changes to improve performance of the various implementations. If someone has already done this, just let me know and I can put a PR over there.
https://github.com/kbrock/dispatch_benchmarking
Please check my gem for benchmarking ruby methods: https://github.com/igorkasyanchuk/benchmark_methods