Facebook announced in a blog post on Thursday that it has upgraded the Apache HBase database with a new open source system called HydraBase. Facebook is an avid HBase shop, using it to store data for various services, including the company's internal monitoring system, search indexing, streaming data analysis and data scraping. What makes HydraBase better than HBase, according to the company, is that it is a more reliable database that should minimize downtime when servers fail.
With HBase, data is sharded across many regions, with multiple regions hosted on a set of "region servers." If a region server goes down, all the regions it hosts have to migrate to another region server. And although HBase fails over automatically, according to Facebook that failover can take a long time to complete.
HydraBase counters this lag by hosting each region on multiple region servers, so if a single server goes kaput, the others can act as backups, significantly improving recovery time compared with HBase. The company claims HydraBase could lead to Facebook having "no more than five minutes of downtime in a year."
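To make the difference concrete, here is a minimal sketch of that failover model -- the names and data structures are hypothetical, not Facebook's actual code. Because each region already lives on several servers, recovering from a crash is a matter of promoting a surviving replica rather than migrating data:

```python
# Hypothetical sketch of HydraBase-style failover: each region is hosted
# on several region servers, so a crash promotes a standby replica
# instead of forcing a slow region migration.

regions = {
    "region-1": ["rs-a", "rs-b", "rs-c"],  # first entry acts as leader
    "region-2": ["rs-b", "rs-c", "rs-d"],
}

def fail_over(region, dead_server):
    """Promote a surviving replica to leader instead of migrating data."""
    hosts = regions[region]
    survivors = [rs for rs in hosts if rs != dead_server]
    if not survivors:
        raise RuntimeError("all replicas of %s lost" % region)
    regions[region] = survivors  # promotion is immediate; no data copy
    return survivors[0]          # new leader

print(fail_over("region-1", "rs-a"))  # -> 'rs-b'
```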
From the blog post:
The set of region servers serving each region form a quorum. Each quorum has a leader that services read and write requests from the client. HydraBase uses the RAFT consensus protocol to ensure consistency across the quorum. With a quorum of 2F+1, HydraBase can tolerate up to F failures. Each hosting region server synchronously writes to the WAL corresponding to the modified region, but only a majority of the region servers need to complete their writes to ensure consistency.
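The quorum arithmetic in that passage is easy to sketch. The toy code below is illustrative only (the function and WAL representation are made up): a write against a quorum of 2F+1 servers is committed once a majority of F+1 have appended it to their write-ahead logs, which is exactly why up to F servers can fail or lag without losing consistency.

```python
# Illustrative only: a 2F+1 quorum commits a write once a majority of
# region servers have appended it to their write-ahead logs (WALs).

F = 1                      # failures to tolerate
QUORUM_SIZE = 2 * F + 1    # 3 region servers per region
MAJORITY = F + 1           # 2 acks are enough to commit

def replicate(entry, wals, acks_needed=MAJORITY):
    """Append `entry` to each WAL; commit once a majority has acked."""
    acks = 0
    for wal in wals:
        try:
            wal.append(entry)   # synchronous WAL write on each server
            acks += 1
        except IOError:
            continue            # a failed server simply never acks
        if acks >= acks_needed:
            return True         # committed: a majority holds the entry
    return False                # fewer than a majority acked: not committed

wals = [[], [], []]             # stand-in WALs for three region servers
print(replicate({"row": "r1", "value": 42}, wals))  # -> True
```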
Facebook is testing HydraBase and the company plans on deploying the system in phases across production clusters.
A typical HydraBase deployment. Source: Facebook
In addition to HydraBase, Facebook also open sourced HDFS RAID on Tuesday, a way of using erasure codes -- a method of data protection -- to cut down on the multiple copies of data Hadoop creates as backups in case one copy is lost.
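Erasure coding itself can be demonstrated with the simplest possible code, a single XOR parity block. The sketch below is purely illustrative (production systems like HDFS RAID use XOR and Reed-Solomon codes over many blocks): any one lost block can be rebuilt from the survivors plus the parity, so a full extra replica isn't needed.

```python
from functools import reduce

# Purely illustrative single-parity erasure code: store one XOR parity
# block instead of a full replica; any one lost data block is recoverable.

data_blocks = [0b1010, 0b0110, 0b1100]            # three "blocks" of data
parity = reduce(lambda a, b: a ^ b, data_blocks)  # XOR of all blocks

# Simulate losing block 1, then rebuild it from the survivors + parity.
lost_index = 1
survivors = [b for i, b in enumerate(data_blocks) if i != lost_index]
rebuilt = reduce(lambda a, b: a ^ b, survivors, parity)

assert rebuilt == data_blocks[lost_index]
print(bin(rebuilt))  # -> 0b110
```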
Last year, when the company used HDFS RAID in its data warehouse clusters, the blog post explains, "the cluster's overall replication factor was reduced enough to represent tens of petabytes of capacity saving."
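The scale of that saving is easy to sanity-check with back-of-envelope arithmetic. The numbers below are hypothetical -- the blog post only says the saving amounted to "tens of petabytes" -- but they show how a modest drop in replication factor translates into large raw-capacity savings:

```python
# Back-of-envelope arithmetic with hypothetical numbers; the post only
# says the saving amounted to "tens of petabytes."

logical_data_pb = 100          # assumed logical data in the warehouse (PB)
replication_before = 3.0       # stock HDFS triple replication
replication_after = 2.2        # assumed blended factor after HDFS RAID

raw_before = logical_data_pb * replication_before   # 300 PB of raw storage
raw_after = logical_data_pb * replication_after     # 220 PB of raw storage
print(f"capacity saved: {raw_before - raw_after:.0f} PB")  # -> 80 PB
```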