BigTable (next generation database led by Google)
By MD on Jun 13, 2008 | In Database
BigTable. It can be as big as it sounds. According to Jeffrey Dean, a Google Fellow, the biggest one today holds up to 4000TB and spans thousands of servers... and that is just one table!
Have you ever heard about BigTable? Unless you're a database vendor or a Google infrastructure freak, I'm afraid you haven't.
The paper titled "Bigtable: A Distributed Storage System for Structured Data" describes it like this:
Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving).
Two points.
- scale to a very large size: petabytes of data across thousands of commodity servers
- both in terms of data size and latency requirements
How?
Performance matters
Reducing HDD seek time is key to overall performance, since a disk seek costs roughly a million times more than a CPU operation. So it's wise to fetch as big a chunk as possible at once.
Today, user files keep getting larger and larger. But the unit of HDD access stays small: a sector is usually 512 bytes, and a filesystem block on Linux is typically between 512 bytes and 4KB.
Google File System, Google's underlying distributed filesystem, makes use of a huge chunk, 64MB in size.
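Here is a rough back-of-the-envelope sketch of why a bigger chunk helps. The 10 ms seek time and 100 MB/s transfer rate are illustrative assumptions, not figures from the paper or from Google's hardware, and the function names are made up for this example.

```python
# Illustrative assumptions only, not measured values.
SEEK_TIME_S = 0.010         # assumed cost of one seek (seconds)
TRANSFER_RATE_BPS = 100e6   # assumed sequential transfer rate (bytes/second)


def effective_throughput(chunk_bytes):
    """Bytes per second when every chunk read pays for exactly one seek."""
    transfer_time = chunk_bytes / TRANSFER_RATE_BPS
    return chunk_bytes / (SEEK_TIME_S + transfer_time)


def seek_fraction(chunk_bytes):
    """Share of the total read time that is spent seeking."""
    transfer_time = chunk_bytes / TRANSFER_RATE_BPS
    return SEEK_TIME_S / (SEEK_TIME_S + transfer_time)


if __name__ == "__main__":
    sizes = [("512 B", 512), ("4 KB", 4 * 1024), ("64 MB", 64 * 1024 * 1024)]
    for label, size in sizes:
        print(f"{label:>6}: {effective_throughput(size) / 1e6:8.2f} MB/s, "
              f"{seek_fraction(size) * 100:6.2f}% of time seeking")
```

Under these assumptions, a small read spends almost all of its time seeking, while a 64MB read spends almost all of its time actually transferring data.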
The next thing to consider is layout. How should data be laid out on a block? Contiguous data can be read from or written to disk in one go.
A database usually puts one row in a contiguous space. So as long as all the data you need sits in a single record, you get the best performance. Some databases take another approach: column-oriented storage, where the values of one column are stored together instead.
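The difference between the two layouts can be sketched with a flat list standing in for a disk block. The sample records, field names, and helper functions are all made up for illustration; no real database stores data exactly this way.

```python
# Hypothetical sample data; field order is fixed by FIELDS.
FIELDS = ("id", "name", "age")
ROWS = [
    {"id": 1, "name": "alice", "age": 30},
    {"id": 2, "name": "bob", "age": 25},
    {"id": 3, "name": "carol", "age": 41},
]


def row_layout(rows):
    """Row-oriented: all fields of one record sit next to each other."""
    return [row[f] for row in rows for f in FIELDS]


def column_layout(rows):
    """Column-oriented: all values of one field sit next to each other."""
    return [row[f] for f in FIELDS for row in rows]


def read_row(flat, index):
    """Fetch one whole record from a row-oriented block as one contiguous slice."""
    width = len(FIELDS)
    return flat[index * width:(index + 1) * width]


def read_column(flat, field, n_rows):
    """Fetch one whole column from a column-oriented block as one contiguous slice."""
    start = FIELDS.index(field) * n_rows
    return flat[start:start + n_rows]


if __name__ == "__main__":
    print(read_row(row_layout(ROWS), 1))
    print(read_column(column_layout(ROWS), "age", len(ROWS)))
```

Reading a whole record is one contiguous slice in the row layout, and reading a whole column is one contiguous slice in the column layout. Which one wins depends on the access pattern.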
BigTable is not a conventional table
It's more like a spreadsheet. And a map under the hood.
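The paper describes the data model as a sparse, sorted map indexed by row key, column key, and timestamp. A toy in-memory sketch of that idea might look like the following. The class name TinyTable and its methods are hypothetical and are not Bigtable's API; the real system is distributed and persistent, which this is not.

```python
from collections import defaultdict


class TinyTable:
    """Toy model of a (row, column, timestamp) -> value map. Not Bigtable's API."""

    def __init__(self):
        # row key -> column key -> {timestamp: value}
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, timestamp, value):
        self._rows[row][column][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the value at a timestamp, or the newest version if none is given."""
        versions = self._rows.get(row, {}).get(column, {})
        if not versions:
            return None
        if timestamp is None:
            return versions[max(versions)]
        return versions.get(timestamp)

    def scan(self, start_row, end_row):
        """Yield (row key, columns) in sorted row order for start_row <= key < end_row."""
        for row in sorted(self._rows):
            if start_row <= row < end_row:
                yield row, {col: dict(v) for col, v in self._rows[row].items()}


if __name__ == "__main__":
    table = TinyTable()
    table.put("com.cnn.www", "contents:", 1, "<html>v1")
    table.put("com.cnn.www", "contents:", 2, "<html>v2")
    table.put("com.cnn.www", "anchor:cnnsi.com", 1, "CNN")
    print(table.get("com.cnn.www", "contents:"))
    print([row for row, _ in table.scan("com.a", "com.z")])
```

Like a spreadsheet, cells are addressed by row and column and most of them are empty. Like a map, only the cells that were written take up any space.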
BigTable offers a new approach to both performance and functionality. Next time, I will go into the details.