Much like with beards, when it comes to databases, bigger isn’t always better. But at openplaces, we’re stuck between a rock and a hard place. A big hard place. With terabytes of data. With millions of records. Some painful truths:
- Distributed computing is messy. Nodes go out of sync, data gets lost.
- There is no mature distributed database option. HBase, Hypertable, Cassandra, CouchDB, MongoDB, Voldemort… They all have sharp edges. You are going to get cut and you are going to bleed.
- It takes about 12 hours to upload our database to EC2. But our data is growing. Soon we’ll have to actually ship hard drives to Seattle: http://aws.amazon.com/importexport/.
On that upbeat note, Ruby coders, you are finally invited to start playing with BigRecord, your window to the world of Bigtable-esque data. Which, for now, means HBase. Click here to get started.
{ 10 comments… read them below or add one }
Awesome. Would be nice to have a non-Java dependency. Wouldn’t it be possible to integrate HBase with pure Ruby via Thrift and remove that dependency? In any case, thanks for putting this out. I’ll be checking it out.
A Thrift implementation actually wouldn’t be difficult. It would simply be a new implementation of AbstractAdapter. When we started working with HBase, there was no Thrift API, so we created the driver, which plays the same role by wrapping the HBase Java client code. We haven’t implemented the Thrift adapter because the JRuby+DRb solution works well and because we have some (possibly unfounded) concerns about Thrift performance.
Note that the Java dependency is isolated to the hbase-driver; your Rails app does not have to run in JRuby.
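To make the adapter idea concrete, here is a rough sketch of how a swappable adapter could look. It is not BigRecord's actual code: the method names `get` and `put`, and the classes `InMemoryDriver` and `DriverAdapter`, are made up for illustration, and the in-memory driver only stands in for the separate driver process.

```ruby
# Hypothetical names throughout; this is an illustration, not BigRecord's API.

# The interface the object-mapping layer would code against.
class AbstractAdapter
  def get(table, row_key)
    raise NotImplementedError, "#{self.class} must implement get"
  end

  def put(table, row_key, attributes)
    raise NotImplementedError, "#{self.class} must implement put"
  end
end

# Stand-in for the driver. In the setup described above, this role is played
# by a separate JRuby process wrapping the HBase Java client.
class InMemoryDriver
  def initialize
    # Each table is created lazily as an empty hash of rows.
    @tables = Hash.new { |tables, name| tables[name] = {} }
  end

  def get(table, row_key)
    @tables[table][row_key]
  end

  def put(table, row_key, attributes)
    row = (@tables[table][row_key] ||= {})
    row.merge!(attributes)
    true
  end
end

# Adapter that forwards every call to a driver object. The driver can be a
# local object (as here) or a remote proxy that responds to the same methods.
class DriverAdapter < AbstractAdapter
  def initialize(driver)
    @driver = driver
  end

  def get(table, row_key)
    @driver.get(table, row_key)
  end

  def put(table, row_key, attributes)
    @driver.put(table, row_key, attributes)
  end
end

adapter = DriverAdapter.new(InMemoryDriver.new)
adapter.put("places", "row-1", { "attr:name" => "Montreal" })
p adapter.get("places", "row-1")
```

Under these assumptions, a Thrift-backed adapter would be one more subclass of `AbstractAdapter` that talks to a Thrift client instead of a driver object, and the code calling `get` and `put` would not need to change.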
joost, thanks for the reply. Yeah, I think I didn’t realize that the Java dependency was only at the DRb solution layer. This sounds like a great boon for the Rails community.
Did you see CassandraObject? Looks like the goals are pretty similar. http://github.com/NZKoz/cassandra_object
There’s also http://code.google.com/p/hypertable/wiki/HyperRecord. In both cases, the projects are specific to one database. While we started with HBase, the goal is to abstract the Object-Data mapping layer from the specific database implementation, much like how ActiveRecord works for any SQL database. It would be great to eventually integrate these and other efforts (like http://github.com/sishen/hbase-ruby), though. There is Power In a Union.
Any idea when a Cassandra adapter will be available?
@Trent The Cassandra adapter was partially working in BigRecord a few months ago. Unfortunately, the API, and Cassandra itself, seemed rather unstable, so development on it halted. Depending on how stable Cassandra is now, an adapter is possible within a few months.
We are considering using BigRecord to connect our Rails app to Cassandra. We would be able/willing to help with the adapter.
What’s the current status of your work? We’d prefer a Thrift-based implementation to a Java-based one. Do your concerns about performance still hold?
@Gilles Within the next month, I’ll be developing BigRecord more actively, and the Cassandra adapter’s been on my todo list for a while now.
As I understand it, all client access to Cassandra is now done through Thrift anyway, so the native Java client isn’t even an option. The performance concerns were with regard to HBase only, since we have not used Cassandra at all in-house.
@Gilles Cassandra support has finally been added to BigRecord… see http://www.bigrecord.org/cassandra-support-and-general-updates/