Is Bigger Better?

by joost on October 26, 2009

Much like with beards, when it comes to databases, bigger isn’t always better. But at openplaces, we’re stuck between a rock and a hard place. A big hard place. With terabytes of data. With millions of records. Some painful truths:

  • Distributed computing is messy.  Nodes go out of sync, data gets lost.
  • There is no mature distributed database option.  HBase, Hypertable, Cassandra, CouchDB, MongoDB, Voldemort… They all have sharp edges.  You are going to get cut and you are going to bleed.
  • It takes about 12 hours to upload our database to EC2.  But our data is growing.  Soon we’ll have to actually ship hard drives to Seattle: http://aws.amazon.com/importexport/.

On that upbeat note, Ruby coders, you are finally invited to start playing with BigRecord, your window to the world of Bigtable-esque data. Which, for now, means HBase. Click here to get started.
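To make "Bigtable-esque data" a bit more concrete: HBase, like Google's Bigtable, stores a table as a sparse map indexed by row key, column (written `family:qualifier`) and timestamp, with rows kept in sorted key order. Here is a rough plain-Ruby model of that idea. It only illustrates the data model; `SparseTable` and its `put`/`get`/`scan` methods are hypothetical names, not BigRecord's or HBase's API.

```ruby
# Hypothetical in-memory model of a Bigtable/HBase-style table:
#   row key => "family:qualifier" => timestamp => value
class SparseTable
  def initialize
    # Nested hashes create missing rows and columns on first write.
    @rows = Hash.new { |rows, key| rows[key] = Hash.new { |cols, col| cols[col] = {} } }
  end

  # Each write adds a new timestamped version instead of overwriting.
  def put(row, column, value, timestamp)
    @rows[row][column][timestamp] = value
  end

  # Return the newest version of a cell, or nil if it was never written.
  # The key? checks keep the default procs from creating empty rows on reads.
  def get(row, column)
    return nil unless @rows.key?(row) && @rows[row].key?(column)
    versions = @rows[row][column]
    versions[versions.keys.max]
  end

  # Row keys in sorted order, from start_row (inclusive) to stop_row (exclusive).
  def scan(start_row, stop_row)
    @rows.keys.sort.select { |key| key >= start_row && key < stop_row }
  end
end

table = SparseTable.new
table.put("montreal", "attribute:name", "Montreal", 1)
table.put("montreal", "attribute:name", "Montreal, QC", 2)
table.put("paris", "attribute:name", "Paris", 1)
table.put("tokyo", "attribute:name", "Tokyo", 1)
p table.get("montreal", "attribute:name") # => "Montreal, QC"
p table.scan("m", "q")                    # => ["montreal", "paris"]
```

Writes add versions rather than replacing cells, and reads return the newest one. The scan follows HBase's convention of an inclusive start row and an exclusive stop row.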

7 comments

Scott Sayles October 27, 2009 at 2:21 am

Awesome. It would be nice not to have a Java dependency. Wouldn’t it be possible to integrate HBase with pure Ruby via Thrift and remove that dependency? In any case, thanks for putting this out. I’ll be checking it out.

joost October 27, 2009 at 12:17 pm

A Thrift implementation actually wouldn’t be difficult. It would simply be a new implementation of AbstractAdapter. When we started working with HBase, there was no Thrift API, so we created the driver, which plays the same role by wrapping the HBase Java client code. We haven’t implemented a Thrift adapter because the JRuby + DRb solution works well and because we have some (possibly unfounded) concerns about Thrift performance.
Note that the Java dependency is isolated to hbase-driver; your Rails app does not have to run in JRuby.
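A minimal sketch of the layering described in this reply, assuming a simple get/put interface: an abstract adapter defines the operations, and a concrete adapter forwards them to a driver object. In the setup described here, that driver would be a DRb proxy to a JRuby process wrapping the HBase Java client, which is why the Rails process itself need not run on JRuby. The method names and the `DriverAdapter` and `FakeDriver` classes are hypothetical, not BigRecord's actual `AbstractAdapter` interface. The fake driver stands in for the remote one so the example runs without HBase.

```ruby
# Hypothetical adapter interface; method names are illustrative only,
# not BigRecord's real AbstractAdapter API.
class AbstractAdapter
  def get(table, row, column)
    raise NotImplementedError, "#{self.class} must implement get"
  end

  def put(table, row, column, value)
    raise NotImplementedError, "#{self.class} must implement put"
  end
end

# Forwards each call to a driver object. In a real deployment the driver
# could be a DRb proxy, e.g. DRbObject.new_with_uri("druby://localhost:40000")
# (hypothetical address), served by a JRuby process wrapping the HBase Java client.
class DriverAdapter < AbstractAdapter
  def initialize(driver)
    @driver = driver
  end

  def get(table, row, column)
    @driver.get(table, row, column)
  end

  def put(table, row, column, value)
    @driver.put(table, row, column, value)
  end
end

# Local stand-in for the remote driver so the sketch runs without HBase or DRb.
class FakeDriver
  def initialize
    @cells = {}
  end

  def get(table, row, column)
    @cells[[table, row, column]]
  end

  def put(table, row, column, value)
    @cells[[table, row, column]] = value
  end
end

adapter = DriverAdapter.new(FakeDriver.new)
adapter.put("places", "montreal", "attribute:name", "Montreal")
p adapter.get("places", "montreal", "attribute:name") # => "Montreal"
```

Under this arrangement, a Thrift-backed adapter would be another `AbstractAdapter` subclass implementing the same methods. Nothing above the adapter would need to change.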

Scott Sayles October 27, 2009 at 12:57 pm

joost, thanks for the reply. Yeah, I didn’t realize that the Java dependency was only at the DRb layer. This sounds like a great boon for the Rails community.

Jonathan Ellis October 27, 2009 at 9:56 pm

Did you see CassandraObject? Looks like the goals are pretty similar. http://github.com/NZKoz/cassandra_object

joost October 28, 2009 at 11:29 am

There’s also http://code.google.com/p/hypertable/wiki/HyperRecord. In both cases, the projects are specific to one database. While we started with HBase, the goal is to separate the object-data mapping layer from any specific database implementation, much as ActiveRecord does for SQL databases. It would be great to eventually integrate these and other efforts (like http://github.com/sishen/hbase-ruby), though. There is Power In a Union.
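As a rough illustration of keeping the object-data mapping separate from the storage engine, here is a toy mapping layer that talks only to whatever adapter object it is given. Everything in it (`Model`, `MemoryAdapter`, the `attributes` macro, and storing each attribute in an `attribute:` column family) is a hypothetical sketch, not BigRecord's API. In principle the in-memory adapter could be replaced by an HBase- or Cassandra-backed one exposing the same two methods.

```ruby
# Hypothetical storage adapter; any object with put/get_row would work.
class MemoryAdapter
  def initialize
    @rows = {}
  end

  def put(table, row, column, value)
    (@rows[[table, row]] ||= {})[column] = value
  end

  # Returns { "family:qualifier" => value } for a row, or nil if absent.
  def get_row(table, row)
    @rows[[table, row]]
  end
end

# Hypothetical mapping layer: it knows models and columns, never the database.
class Model
  @@adapter = nil # shared by all subclasses

  def self.adapter=(adapter)
    @@adapter = adapter
  end

  def self.adapter
    @@adapter
  end

  def self.table_name
    name.downcase
  end

  # Declares attributes and generates reader/writer methods for them.
  def self.attributes(*names)
    @attribute_names = names
    attr_accessor(*names)
  end

  def self.attribute_names
    @attribute_names || []
  end

  attr_reader :id

  def initialize(id, values = {})
    @id = id
    values.each { |attr, value| public_send("#{attr}=", value) }
  end

  # Each attribute is stored in its own column of the "attribute" family.
  def save
    self.class.attribute_names.each do |attr|
      self.class.adapter.put(self.class.table_name, id, "attribute:#{attr}", public_send(attr))
    end
    self
  end

  def self.find(id)
    row = adapter.get_row(table_name, id)
    return nil unless row
    values = attribute_names.map { |attr| [attr, row["attribute:#{attr}"]] }.to_h
    new(id, values)
  end
end

class Place < Model
  attributes :name, :country
end

Model.adapter = MemoryAdapter.new
Place.new("montreal", name: "Montreal", country: "Canada").save
p Place.find("montreal").country # => "Canada"
```

Switching databases then means assigning a different object to `Model.adapter`; model classes such as `Place` stay unchanged.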

Trent November 19, 2009 at 8:22 pm

Any idea when a Cassandra adapter will be available?

Greg November 25, 2009 at 8:58 pm

@Trent A Cassandra adapter was partially working in BigRecord a few months ago. Unfortunately, both the API and Cassandra itself seemed rather unstable, so development on it halted. Depending on how stable Cassandra is now, an adapter could be available within a few months.



BigRecord is released under the MIT license and is sponsored by openplaces.