Copying Amazon SimpleDB Domains

Just a quickie today. Amazon SimpleDB has no built-in way to copy a domain – for the purposes of backing it up, for instance, or for populating a development domain with data from the production domain. Using the RightAws tools it’s pretty straightforward to implement domain copy, though:

  require 'progressbar'
  require 'right_aws'

  def sdb_copy(source_domain, destination_domain)
    sdb = RightAws::SdbInterface.new
    sdb.list_domains do |results|
      domains = results[:domains]
      raise "Destination #{destination_domain} already exists" if domains.include?(destination_domain)
      # the block has to return true for RightAws to fetch the next page of domains
      true
    end
    sdb.create_domain(destination_domain)
    count = sdb.select("select count(*) from #{source_domain}")[:items].first["Domain"]["Count"].first.to_i
    progress = ProgressBar.new('copy', count)
    sdb.select("select * from #{source_domain}") do |results|
      results[:items].each do |record|
        record.each_pair do |key, data|
          sdb.put_attributes(destination_domain, key, data)
          progress.inc
        end
      end
    end
    progress.finish
  end

This method has a few bells and whistles: it checks to see that the destination domain doesn’t already exist, and it displays a progress bar to let you know how far along the copy process is.
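If you want to poke at the copy logic without touching AWS, something like the following rough sketch should do. FakeSdb and copy_domain are names I made up for illustration: FakeSdb is an in-memory stand-in that only mimics the handful of calls the method uses (it is not part of RightAws), and copy_domain is the same flow with the connection passed in and the progress bar left out.

```ruby
# Hypothetical in-memory stand-in for the few SdbInterface calls used by
# the copy method. This is NOT part of RightAws.
class FakeSdb
  attr_reader :domains

  # domains: { domain_name => { item_name => attributes } }
  def initialize(domains = {})
    @domains = domains
  end

  # Yields a single page containing every domain name
  def list_domains
    yield(:domains => @domains.keys)
  end

  def create_domain(name)
    @domains[name] ||= {}
  end

  # Only understands "select * from <domain>"; yields one page of items,
  # each item shaped as { item_name => attributes }
  def select(query)
    domain = query[/from (\S+)/, 1]
    yield(:items => @domains.fetch(domain).map { |name, attrs| { name => attrs } })
  end

  def put_attributes(domain, item_name, attributes)
    @domains.fetch(domain)[item_name] = attributes
  end
end

# Same flow as the copy method in the post, minus the progress bar.
# Returns the number of items copied.
def copy_domain(sdb, source_domain, destination_domain)
  sdb.list_domains do |results|
    if results[:domains].include?(destination_domain)
      raise "Destination #{destination_domain} already exists"
    end
  end
  sdb.create_domain(destination_domain)
  copied = 0
  sdb.select("select * from #{source_domain}") do |results|
    results[:items].each do |record|
      record.each_pair do |key, data|
        sdb.put_attributes(destination_domain, key, data)
        copied += 1
      end
    end
  end
  copied
end

fake = FakeSdb.new('prod' => { 'item1' => { 'color' => ['red'] },
                               'item2' => { 'color' => ['blue'] } })
puts copy_domain(fake, 'prod', 'dev')  # number of items copied
```

Calling copy_domain a second time with the same destination raises, which is the "destination already exists" check at work.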

Unfortunately, since SimpleDB doesn’t support any kind of bulk import, the copies have to be made one by one, which is slooooow. A copy of about 45,000 items takes over 2 hours.
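To put those numbers in perspective, here is a quick back-of-the-envelope calculation. It treats "over 2 hours" as exactly 2 hours, so the rate is a best case, and it assumes throughput stays constant as the domain grows, which I haven't verified. copy_rate and estimated_copy_hours are just illustrative helper names.

```ruby
# Items copied per second, given a total item count and elapsed seconds
def copy_rate(items, seconds)
  items / seconds.to_f
end

# Estimated hours to copy item_count items at a constant rate (items/second)
def estimated_copy_hours(item_count, rate)
  item_count / rate / 3600.0
end

rate = copy_rate(45_000, 2 * 60 * 60)
puts rate                                          # 6.25 items per second
puts 1000.0 / rate                                 # 160.0 ms per item
puts estimated_copy_hours(100_000, rate).round(1)  # 4.4 hours
```

So each item costs roughly 160 ms, most of which is presumably the round trip for the put_attributes request.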

I’ve posted a Gist with this method and a few other utilities for interacting with SimpleDB. It’s still in a pretty rough state, but it’s usable. If you use Boson you should be able to install it as a boson library with the following command:

$ boson install http://gist.github.com/raw/220299/171c2b3c8aae955afa1dc38a78c77c0531e24d1c/simpledb.rb

6 comments

  1. Cool! The calculation of the count on line 13 isn't 100% correct though, as it won't work right on very large domains. This is because 'select count(*)' operations will return before they are complete if the operation is going to take longer than 5 seconds. I find this usually happens when the domain has >75k items.

    Perhaps there's a more elegant way to solve this, but in my code, I do as follows to calculate the count of a domain:

    next_token = nil
    count = 0
    begin
      response = sdb.select("select count(*) from #{domain_name}", next_token)
      count += response[:items][0]['Domain']['Count'][0].to_i
      next_token = response[:next_token]
    end while next_token
