Array Set Operations in Ruby

Do you ever find yourself doing this?

tags = %w[foo bar baz]
tags << 'buz' unless tags.include?('buz')

Or:

tags << 'baz'
tags.uniq!

In both cases, we have an Array we want to use as a set, containing only unique elements.

One way to tackle this more cleanly is to simply use a Set.

require 'set'
tags = Set.new(%w[foo bar baz])
tags.add('foo')
tags.add('buz')
tags # => #<Set: {"foo", "bar", "baz", "buz"}>

But the Set and Array interfaces differ (a Set has no positional indexing with [], for example), so if other code already expects the collection to be an Array, that solution may not be practical.
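To make that mismatch concrete, here is a minimal sketch. The `first_tag` method is a hypothetical stand-in for existing code that expects an Array; it is not from any library. One workaround is to convert with `Set#to_a` at the boundary, at the cost of an extra copy.

```ruby
require 'set'

tags = Set.new(%w[foo bar baz])
tags.add('foo') # already present, so the set is unchanged
tags.add('buz')

# Hypothetical consumer that indexes by position, which a Set does not support.
def first_tag(list)
  list[0]
end

# Convert at the boundary; Set keeps insertion order, so the order is preserved.
tag_list = tags.to_a
first = first_tag(tag_list)
```

If most of the surrounding code wants an Array anyway, converting back and forth like this may cost more clarity than it buys.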

As it happens, Array supports several basic set operations natively. You may already know about these, but in case you don’t, here are some examples.

Set union:

tags = %w[foo bar]
tags |= %w[foo buz] # => ["foo", "bar", "buz"]

Set difference:

tags = %w[foo bar]
tags - %w[bar baz] # => ["foo"]

Set intersection:

tags = %w[foo bar]
tags & %w[bar baz] # => ["bar"]
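Putting these together, here is a short sketch that revisits the opening example using only the operators shown above. The variable names are illustrative. Note that `|`, `-`, and `&` each return a new Array, so `tags |= ...` reassigns the variable rather than mutating the original object.

```ruby
tags = %w[foo bar baz]

# Union-assign adds only what is missing, replacing both opening idioms.
tags |= ['buz'] # appended, since 'buz' was absent
tags |= ['baz'] # no change, since 'baz' was already present
with_additions = tags

# Difference and intersection leave their operands untouched.
without_bar = with_additions - ['bar']
allowed = without_bar & %w[foo buz qux]

# |= reassigns, so another reference to the old array does not see the change.
original = %w[foo bar]
other_reference = original
original |= ['buz']
```

If other code holds a reference to the same array and needs to see the update, the `<<` with `include?` form from the start of the article still mutates in place.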

It’s a small thing, but perhaps it will save you a few lines of code.

UPDATE: My WordPress “related posts” feature points out that I have officially begun to repeat myself. Ah well. If nothing else this article has a bit more explanation than the one from 2010.

14 comments

    1. In RSpec, a matcher =~ is provided for comparing arrays without regard to order. It’s quite helpful. Not saying you do or should use RSpec; it just seems to be a lesser-known matcher, so I’m sharing it in case anybody finds it useful.

      The Set trick is neat too, I will keep that in mind!

  1. Good stuff. I use sets all the time, not simply because I want set semantics, but also because Set#include? is O(1) and Array#include? is O(N). When you’re checking membership in a collection in a tight loop, using a set rather than an array can make a big difference. Last week I optimized a method that was taking over 20 minutes to run (w/o doing any IO) down to 16 seconds by changing an array to a set.

  2. “My WordPress “related posts” feature points out that I have officially begun to repeat myself.”

    Well, you need to store your posts in a Set, not an Array!

  3. Indeed a very good post. I wasn’t familiar with the union functionality, and I can really see use cases where it might come in handy.

    It’s interesting that I haven’t seen much code using Set. I wonder why that is. Is it just because people aren’t familiar with it?
