Let’s say we’re dispatching events to listener objects. Events look like this:
Event = Struct.new(:name, :source, :args)
Events should be routed to different handler methods based on the name of the event. For instance, when an object is sent the #call message with an :accepted event:
listener.call(Event[:accepted, obj])
…then it should be dispatched to the #handle_accepted method, as if the listener had been sent this message:
listener.handle_accepted(event)
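(A side note on the Event[...] syntax: Struct-generated classes support [] as an alternate constructor, and trailing members default to nil, which is why Event[:accepted, obj] works without an args value. A quick standalone check:)

```ruby
# Struct classes can be instantiated with .new or the [] shorthand;
# members you leave off default to nil.
Event = Struct.new(:name, :source, :args)

e = Event[:accepted, "a source"]
e.name    # => :accepted
e.source  # => "a source"
e.args    # => nil
```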
There are a few different ways we could handle this dispatching logic. To test them out, let’s first set up a little base class for testbed objects.
class TestBed
attr_reader :event_log
def initialize
@event_log = []
end
end
This gives us a place to record handled events, so we can verify that our test classes have the correct semantics.
Now let’s kick things off with a hardcoded version of the dispatching semantics. This version has a hand-coded case statement which simply switches on the event type and dispatches to the appropriate method.
class HardcodeTestbed < TestBed
def call(event)
case event.name
when :foo
handle_foo(event)
when :bar
handle_bar(event)
when :baz
handle_baz(event)
end
end
def handle_foo(event)
event_log << event
end
def handle_bar(event)
event_log << event
end
def handle_baz(event)
event_log << event
end
end
If all we needed were a single listener, this would be sufficient. But the purpose of this exercise is to come up with a way to automatically dispatch to the correct handler method, without actually hand-coding a specialized #call method in every listener class. Ideally, we'd like to come up with a strategy we can bundle up into a reusable module.
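To make that goal concrete, here is a rough sketch of what such a reusable module might look like, using the simple interpolate-and-send strategy (EventDispatch and Listener are hypothetical names, not from the benchmarks below):

```ruby
# Hypothetical sketch: bundling string-interpolation + __send__ dispatch
# into a reusable module that any listener class can include.
Event = Struct.new(:name, :source, :args)

module EventDispatch
  def call(event)
    handler_name = "handle_#{event.name}"
    __send__(handler_name, event) if respond_to?(handler_name)
  end
end

class Listener
  include EventDispatch

  attr_reader :log

  def initialize
    @log = []
  end

  def handle_ping(event)
    log << event
  end
end

listener = Listener.new
listener.call(Event[:ping])   # dispatched to #handle_ping
listener.call(Event[:quux])   # no handler defined; silently ignored
listener.log.map(&:name)      # => [:ping]
```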
Let's start out simple. Here's a version that does some string interpolation in order to infer the correct handler method, and then uses #__send__ (otherwise known as #send) to dispatch to that method. The send is only performed if the handler method exists.
class SendTestbed < TestBed
def call(event)
handler_name = "handle_#{event.name}"
__send__(handler_name, event) if respond_to?(handler_name)
end
def handle_foo(event)
event_log << event
end
def handle_bar(event)
event_log << event
end
def handle_baz(event)
event_log << event
end
end
We theorize that this strategy will be on the slow side. Every call will have to perform string interpolation, check if the method exists, and then do a dynamic message send. And dynamic sends can't be optimized as tightly as hardcoded sends.
Let's now turn our minds to how we might optimize this process.
Here's one approach. In this version, we catch #method_added events at the class level. For each method added, we look to see if it's an event handler method. If so, we add it to a class-wide lookup table that maps events to handler method names.
Then when an event is sent to an instance with #call, it simply looks up the corresponding method and does a dynamic send to it.
class SendTableTestbed < TestBed
def self.method_added(method_name)
if method_name.to_s =~ /\Ahandle_(.+)\z/
handler_methods[$1.to_sym] = method_name.to_sym
end
super
end
def self.handler_methods
@handler_methods ||= {}
end
def call(event)
if (handler_method = self.class.handler_methods[event.name])
__send__(handler_method, event)
end
end
def handle_foo(event)
event_log << event
end
def handle_bar(event)
event_log << event
end
def handle_baz(event)
event_log << event
end
end
This approach avoids the need for string interpolation on each dispatch. It also avoids the need to explicitly check for the existence of a handler method.
Now let's go for a more exotic approach. Here's a version that, similarly to the last one, builds a lookup table as methods are added to the class. But in this case, instead of storing method names, it stores UnboundMethod objects derived from instance_method.
Then when sent #call, it looks up the matching method, binds it to self, and calls it.
class BindTableTestbed < TestBed
def self.method_added(method_name)
if method_name.to_s =~ /\Ahandle_(.+)\z/
handler_methods[$1.to_sym] = instance_method(method_name)
end
super
end
def self.handler_methods
@handler_methods ||= {}
end
def call(event)
if (handler_method = self.class.handler_methods[event.name])
handler_method.bind(self).call(event)
end
end
def handle_foo(event)
event_log << event
end
def handle_bar(event)
event_log << event
end
def handle_baz(event)
event_log << event
end
end
In theory this could be even faster than the last strategy. Since an UnboundMethod is a direct reference to the chunk of compiled code that makes up a method, there is no #__send__ lookup overhead.
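For reference, the UnboundMethod round trip can be tried out on its own; this standalone sketch (Greeter is a made-up example) shows the instance_method/bind/call sequence used above:

```ruby
# Standalone illustration of UnboundMethod: grab a method object from a
# class, bind it to an instance, and call it.
class Greeter
  def hello(name)
    "hi, #{name}"
  end
end

unbound = Greeter.instance_method(:hello) # an UnboundMethod
bound   = unbound.bind(Greeter.new)       # now a callable Method
bound.call("world")                       # => "hi, world"
```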
It's time to get aggressive in our optimizations. Here's a version that uses class_eval to dynamically regenerate the code of the #call method every time a new handler method is added. The code it generates is similar to our original, hardcoded example.
class CodeGenTestbed < TestBed
def self.method_added(method_name)
if method_name.to_s =~ /\Ahandle_(.+)\z/
handler_methods << $1
regenerate_dispatch_method
end
super
end
def self.handler_methods
@handler_methods ||= []
end
def self.regenerate_dispatch_method
dispatch_table = handler_methods.map { |event_name|
"when :#{event_name} then handle_#{event_name}(event)"
}.join("\n")
class_eval %{
def call(event)
case event.name
#{dispatch_table}
end
end
}
end
def handle_foo(event)
event_log << event
end
def handle_bar(event)
event_log << event
end
def handle_baz(event)
event_log << event
end
end
The final generated #call method will look something like this:
def call(event)
case event.name
when :foo then handle_foo(event)
when :bar then handle_bar(event)
when :baz then handle_baz(event)
end
end
One last variation. We wonder if a case statement is the fastest way to write this generated dispatch code. After all, case statements do fancy "case matching" using the === operator. What if we instead built an if statement which compared symbols by identity?
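As a sanity check on that reasoning, both comparisons do give the same answer for symbols; this snippet (standalone, not part of the benchmark code) shows === and equal? agreeing:

```ruby
# Symbols are interned: every occurrence of :foo is the same object.
# case/when dispatches via ===, which for symbols is plain equality,
# so an identity comparison with equal? gives the same result.
name = :foo
via_case_match = (:foo === name)   # what a `when :foo` clause evaluates
via_identity   = name.equal?(:foo) # the identity-comparison variant
[via_case_match, via_identity]     # => [true, true]
```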
class IfCodeGenTestbed < TestBed
def self.method_added(method_name)
if method_name.to_s =~ /\Ahandle_(.+)\z/
handler_methods << $1
regenerate_dispatch_method
end
super
end
def self.handler_methods
@handler_methods ||= []
end
def self.regenerate_dispatch_method
dispatch_table = handler_methods.map { |event_name|
"event.name.equal?(:#{event_name}) then handle_#{event_name}(event)"
}.join("\nelsif ")
class_eval %{
def call(event)
if #{dispatch_table}
end
end
}
end
def handle_foo(event)
event_log << event
end
def handle_bar(event)
event_log << event
end
def handle_baz(event)
event_log << event
end
end
Perhaps that will be a little faster.
OK, let's put all these different versions to the test.
We write a helper method that will create a new instance of one of the testbed classes, dispatch three events to it, dispatch a fourth, unhandled event to it, and then verify that the three handled events are logged.
def do_test(klass)
testbed = klass.new
testbed.call(e1 = Event[:foo])
testbed.call(e2 = Event[:bar])
testbed.call(e3 = Event[:baz])
testbed.call(Event[:buz])
unless testbed.event_log == [e1, e2, e3]
raise "#{klass}: #{testbed.event_log.inspect}"
end
end
Then we stick all of our testbeds in an array, and benchmark them against each other using benchmark-ips.
classes = [
HardcodeTestbed,
SendTestbed,
SendTableTestbed,
BindTableTestbed,
CodeGenTestbed,
IfCodeGenTestbed,
]
Benchmark.ips do |x|
classes.each do |klass|
x.report(klass.name) { do_test(klass) }
end
x.compare!
end
Let's take a look at the results, shall we? Here are the results on my system with Ruby 2.2:
Calculating -------------------------------------
HardcodeTestbed 37.559k i/100ms
SendTestbed 19.806k i/100ms
SendTableTestbed 28.959k i/100ms
BindTableTestbed 21.637k i/100ms
CodeGenTestbed 39.608k i/100ms
IfCodeGenTestbed 37.599k i/100ms
-------------------------------------------------
HardcodeTestbed 562.715k (± 4.8%) i/s - 2.817M
SendTestbed 268.601k (± 2.5%) i/s - 1.347M
SendTableTestbed 399.429k (± 5.0%) i/s - 1.998M
BindTableTestbed 266.442k (± 2.6%) i/s - 1.341M
CodeGenTestbed 559.012k (± 3.5%) i/s - 2.812M
IfCodeGenTestbed 525.445k (± 2.0%) i/s - 2.632M
Comparison:
HardcodeTestbed: 562714.6 i/s
CodeGenTestbed: 559011.6 i/s - 1.01x slower
IfCodeGenTestbed: 525445.0 i/s - 1.07x slower
SendTableTestbed: 399429.1 i/s - 1.41x slower
SendTestbed: 268601.1 i/s - 2.09x slower
BindTableTestbed: 266442.2 i/s - 2.11x slower
Not surprisingly, our hardcoded version is fastest. Neck and neck with it is our first code generation version, the one that generates a case statement. In my experiments the code-generated version comes out ahead about half the time; I think any difference between these two is just statistical noise.
Interestingly, the generated code using if statements comes out a bit slower. Apparently Ruby case statements are quite well optimized.
Next up in the rankings is the "send table" version. We can see that we were right about it being faster to assemble a lookup table in advance than to string-interpolate a method name, check for its existence, and then #__send__ to it with each call.
The latter approach, our naive #__send__-based version, is over two times slower than the hardcoded and generated case-statement versions.
The big surprise (for me, anyway), is how slow the "bind table" version turns out to be. Evidently the overhead of binding a method object and then calling it is greater than the overhead of doing dynamic sends.
Overall, while 2x is a big enough difference to matter in some programs, it's interesting that there are no order-of-magnitude gains or losses to be had here.
In case you're curious, here are the results when running under Ruby 1.9 instead of 2.2.
Calculating -------------------------------------
HardcodeTestbed 26.692k i/100ms
SendTestbed 16.202k i/100ms
SendTableTestbed 24.147k i/100ms
BindTableTestbed 17.961k i/100ms
CodeGenTestbed 28.238k i/100ms
IfCodeGenTestbed 27.217k i/100ms
-------------------------------------------------
HardcodeTestbed 377.921k (± 3.0%) i/s - 1.895M
SendTestbed 195.392k (± 2.9%) i/s - 988.322k
SendTableTestbed 306.741k (± 2.1%) i/s - 1.545M
BindTableTestbed 213.615k (± 3.3%) i/s - 1.078M
CodeGenTestbed 379.196k (± 2.3%) i/s - 1.920M
IfCodeGenTestbed 357.106k (± 2.5%) i/s - 1.796M
Comparison:
CodeGenTestbed: 379196.1 i/s
HardcodeTestbed: 377920.8 i/s - 1.00x slower
IfCodeGenTestbed: 357105.9 i/s - 1.06x slower
SendTableTestbed: 306741.4 i/s - 1.24x slower
BindTableTestbed: 213614.9 i/s - 1.78x slower
SendTestbed: 195392.4 i/s - 1.94x slower
The numbers are slower across the board. The relative speeds are mostly similar, although the "bind table" and "send" versions have switched places in the rankings.
As you might guess, I performed these benchmarks in order to get a concrete idea as to the most performant way to implement part of a library I'm working on. As usual with benchmarking, there were a few surprises, and I'm glad I took the trouble to test my hypotheses.
If you have different results on different versions of Ruby; or suggestions on how to improve the benchmarks; or new variations on method dispatching; I'd love to see them in the comments.
Any reason you went with __send__ when writing that method, as opposed to (the arguably more typical/more common) send? Just curious.

I'd been working on something that would eventually go in a gem and need to dispatch events to arbitrary client objects. When working with someone else's objects, it's best to use the underscore versions, on the off chance they've done something weird.
Ah, got it – and I guess Ruby makes it impossible to overwrite the __send__ version in BasicObject? Or, rather, it's possible just like everything else in Ruby, except much less likely that someone will overload the __send__ form of it…?

I believe the latter is the case 🙂
You missed one – encapsulate your handlers as static methods on an embedded module:
class ModuleTestbed < TestBed
def call(event)
Handlers.send(event.name, self, event)
end
module Handlers
def foo(testbed, event)
testbed.event_log << event
end
def bar(testbed, event)
testbed.event_log << event
end
def baz(testbed, event)
testbed.event_log << event
end
def method_missing(handler, *args)
end
extend self
end
end
On my machine under MRI it benchmarks about the same as the IfCodeGenTestbed:
Comparison:
CodeGenTestbed: 715915.7 i/s
HardcodeTestbed: 710498.7 i/s – 1.01x slower
ModuleTestbed: 646322.0 i/s – 1.11x slower
IfCodeGenTestbed: 635730.0 i/s – 1.13x slower
SendTableTestbed: 557663.6 i/s – 1.28x slower
BindTableTestbed: 365025.2 i/s – 1.96x slower
SendTestbed: 348274.4 i/s – 2.06x slower
Under JRuby it’s actually faster than CodeGenTestbed as well:
Comparison:
HardcodeTestbed: 1274959.5 i/s
ModuleTestbed: 1224070.4 i/s – 1.04x slower
CodeGenTestbed: 1200955.0 i/s – 1.06x slower
IfCodeGenTestbed: 1161910.5 i/s – 1.10x slower
BindTableTestbed: 979396.4 i/s – 1.30x slower
SendTableTestbed: 976497.5 i/s – 1.31x slower
SendTestbed: 463007.8 i/s – 2.75x slower
One difference in my implementation is that I use method_missing to provide a handler for unknown event types (so you could send them to a misc log, perhaps, or raise an exception to the calling method). That does mean an additional performance hit in the test though. If you remove that unknown event, the performance difference pretty much goes away, even under MRI:
Comparison:
CodeGenTestbed: 832402.9 i/s
ModuleTestbed: 820658.5 i/s – 1.01x slower
HardcodeTestbed: 805808.8 i/s – 1.03x slower
IfCodeGenTestbed: 768468.4 i/s – 1.08x slower
SendTableTestbed: 664580.1 i/s – 1.25x slower
BindTableTestbed: 404148.6 i/s – 2.06x slower
SendTestbed: 401068.8 i/s – 2.08x slower
So new handlers can be added with no additional overhead (simply define a method), logic can easily be added for unhandled events, there is no string interpolation or explicit check for a handler, and the overall performance is roughly equivalent to the fastest alternatives.
Interesting! Although, it kinda breaks the spirit of the example, in that calling to the user-provided handle_* methods was supposed to be an invariant 😉
True, though there is no need to embed the Handlers module in the class, so it's just a matter of the user defining Handlers#fnord instead of Testbed#handle_fnord.
If naming is inflexible, you could do automatic aliasing. I'm pretty sure that wouldn't negatively impact dispatch speed.
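The aliasing idea might be sketched like this (hypothetical names; not the commenter's actual code): a method_added hook that defines a handle_-prefixed alias for every bare handler name, so both naming conventions work.

```ruby
# Sketch of automatic aliasing: whenever a bare handler method is defined,
# also create a handle_-prefixed alias. Handlers and Listener are made-up names.
module Handlers
  def self.method_added(name)
    # Guard so the alias we create doesn't re-trigger this hook forever.
    alias_method :"handle_#{name}", name unless name.to_s.start_with?("handle_")
    super
  end

  def foo(event)
    "handled #{event.inspect}"
  end
end

class Listener
  include Handlers
end

Listener.new.handle_foo(:ping) # => "handled :ping"
```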
I use this model pretty often for tools where you pass a command as an argument.
How many handlers are we talking? (Does a case statement scale like a hash lookup?)
How does a lambda.call compare to an object.send?

[code lang=text]
def self.handler_methods
@handler_methods ||= Hash.new { |table, event|
table[event] = eval("->(instance, args) { instance.handle_#{event}(*args) }")
}
end
def call(event)
self.class.handler_methods[event.name].call(event)
end
[/code]
Correction:

handler_methods[event.name].call(self, event)

I think that's functionally identical to the UnboundMethod version.

Surprisingly it isn't. Actually, it's just a little slower than code generation:
[code lang=text]
class LambdaTableTestbed < TestBed
@@handler_methods = Hash.new
def self.method_added(method_name)
if method_name.to_s =~ /\Ahandle_(.+)\z/
@@handler_methods[$1.to_sym] = eval %{
->(instance, args) { instance.#{method_name}(args) }
}
end
super
end
def call(event)
(@@handler_methods[event.name] or return).call(self, event)
end
def handle_foo(event)
event_log << event
end
def handle_bar(event)
event_log << event
end
def handle_baz(event)
event_log << event
end
end
[/code]
Results:
[code lang=text]
HardcodeTestbed: 412328.3 i/s
CodeGenTestbed: 407221.1 i/s – 1.01x slower
IfCodeGenTestbed: 334375.5 i/s – 1.23x slower
LambdaTableTestbed: 312580.5 i/s – 1.32x slower
SendTableTestbed: 285952.9 i/s – 1.44x slower
BindTableTestbed: 163040.3 i/s – 2.53x slower
SendTestbed: 160361.9 i/s – 2.57x slower
[/code]
This is getting to the point where subtle things like self.class.accessor vs. @@class_variable, or declaring a local variable, really start to matter. I found I could improve your BindTable implementation with that too.

I've decided to play with the benchmark to see if it can be "improved". Your version measures the performance of a rather specific scenario: create a handler, dispatch four events (which are created every time), and compare the resulting array with the golden one. It is a good scenario for functional testing (someone may not agree though 😉) but I'm not sure it should be used to compare dispatching performance.
Thus I’ve changed test beds to just store the last event (to not have noop and to not collect too many events so performance of Array would be an issue), and changed do_test to:
def do_test(testbed)
testbed.call(EVENT_1)
testbed.call(EVENT_2)
testbed.call(EVENT_3)
testbed.call(EVENT_4)
end
So events and test beds are created only once for the whole measurement.
And here are the results I've got.
Comparison:
HardcodedTestbed: 1617937.6 i/s
CodeGenTestbed: 1616651.8 i/s – 1.00x slower
IfCodeGenTestbed: 1050036.9 i/s – 1.54x slower
SendTableTestbed: 839765.5 i/s – 1.93x slower
SendTestbed: 358247.2 i/s – 4.52x slower
BindTableTestbed: 270160.6 i/s – 5.99x slower
Note that your case shows the worst performer as just 2 times slower than the best one, while my case shows a best-to-worst ratio of 6.
It would be interesting to hear your opinion on what kind of benchmark is better for your purpose and why.
Another interesting area to investigate how the provided approaches scale with number of event types we need to handle 😉
BTW thank you for very interesting topic.
Thanks for the article Avdi, and thanks for diving deeper into this Oleg.
I put some of this together and made some changes to improve the performance of the various implementations. If someone has already done this, just let me know and I can open a PR over there.
https://github.com/kbrock/dispatch_benchmarking
Please check my gem for benchmarking ruby methods: https://github.com/igorkasyanchuk/benchmark_methods