BigTable (next-generation database led by Google) 1

BigTable. It can be as big as it sounds. According to Jeffrey Dean, a Google Fellow, the biggest one today is up to 4000 TB and spans thousands of servers. And that is a single table!

Have you ever heard about BigTable? Unless you're a database vendor or a Google infrastructure freak, I'm afraid you haven't.

According to the paper titled Bigtable: A Distributed Storage System for Structured Data, it is like:

Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving).

Two points.

  • scale to a very large size: petabytes of data across thousands of commodity servers
  • both in terms of data size and latency requirements

How?

Performance matters

Reducing HDD seek time is key to a computer's overall performance, because a disk access costs on the order of a million times more than a CPU operation. So it's wise to fetch as big a chunk as possible at once.

Today, user files keep getting larger and larger. But the basic unit of an HDD remains small: a sector is usually 512 bytes, and a filesystem block on Linux is typically 512 bytes to 4 KB.

Google File System, Google's underlying distributed filesystem, makes use of a huge chunk, 64MB in size.
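To see why chunk size matters, here is a rough back-of-the-envelope sketch. The 10 ms seek time and 100 MB/s transfer rate are my own assumed round numbers, not figures from the paper, and the function name is made up for this post. Each read pays one seek plus the time to transfer the chunk.

```python
KB = 1024
MB = 1024 * KB

def effective_throughput_mb_s(chunk_bytes, seek_s=0.010, transfer_mb_s=100.0):
    """Throughput when every chunk read pays one seek plus its transfer time.

    seek_s and transfer_mb_s are assumed, illustrative values.
    """
    chunk_mb = chunk_bytes / MB
    transfer_s = chunk_mb / transfer_mb_s
    return chunk_mb / (seek_s + transfer_s)

# Compare a disk sector, a filesystem block, and a GFS-sized chunk.
for label, size in [("512 B", 512), ("4 KB", 4 * KB), ("64 MB", 64 * MB)]:
    print(f"{label:>6}: {effective_throughput_mb_s(size):8.3f} MB/s")
```

Under these assumed numbers, 4 KB reads stay below 1 MB/s because almost all the time goes to seeking, while 64 MB reads come close to the raw transfer rate.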

The next thing to consider is layout. How should data be laid out in a block? Contiguous data can be read from or written to disk in one go.

A database usually puts one row in contiguous space. So as long as you keep all the data you require in a single record, you get the best performance. Some databases take another approach, column-oriented storage, where the values of one column are stored together.
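The difference between the two layouts can be sketched with a flat Python list standing in for a disk block. The sample records and function names are invented for illustration; real databases are far more elaborate.

```python
COLUMNS = ["id", "name", "age"]

# Hypothetical sample records.
rows = [
    {"id": 1, "name": "alice", "age": 30},
    {"id": 2, "name": "bob", "age": 25},
    {"id": 3, "name": "carol", "age": 41},
]

def row_layout(rows):
    """Row-oriented: all fields of one record sit next to each other."""
    return [row[c] for row in rows for c in COLUMNS]

def column_layout(rows):
    """Column-oriented: all values of one column sit next to each other."""
    return [row[c] for c in COLUMNS for row in rows]

row_store = row_layout(rows)
col_store = column_layout(rows)

# Reading one whole record is a single contiguous slice in the row layout.
record_2 = row_store[1 * len(COLUMNS):2 * len(COLUMNS)]

# Scanning one column is a single contiguous slice in the column layout.
ages = col_store[2 * len(rows):3 * len(rows)]

print(record_2)
print(ages)
```

Which layout wins depends on the access pattern: fetching whole records favors rows, while scanning a single attribute across many records favors columns.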

BigTable is not a conventional table

It's more like a spreadsheet. And a map under the hood.
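The paper describes the data model as a sparse, sorted, multi-dimensional map indexed by row key, column key, and timestamp. Here is a toy in-memory sketch of that idea. The class name TinyTable and its methods are my own invention, not Google's API, and the real system is distributed and persistent, which this is not.

```python
import bisect

class TinyTable:
    """Toy sorted map: (row, column, timestamp) -> value."""

    def __init__(self):
        self._keys = []    # sorted list of (row, column, -timestamp)
        self._values = {}  # key -> value; absent cells take no space

    def put(self, row, column, timestamp, value):
        key = (row, column, -timestamp)  # negated so the newest sorts first
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, row, column):
        """Return the newest value of a cell, or None if the cell is empty."""
        i = bisect.bisect_left(self._keys, (row, column, float("-inf")))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._values[self._keys[i]]
        return None

    def scan(self, start_row, end_row):
        """Yield (row, column, timestamp, value) for start_row <= row < end_row."""
        i = bisect.bisect_left(self._keys, (start_row,))
        while i < len(self._keys) and self._keys[i][0] < end_row:
            row, column, neg_ts = self._keys[i]
            yield row, column, -neg_ts, self._values[self._keys[i]]
            i += 1

table = TinyTable()
table.put("com.cnn.www", "contents:", 3, "<html>v3")
table.put("com.cnn.www", "contents:", 5, "<html>v5")
table.put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN")
table.put("org.example", "contents:", 1, "<html>ex")

print(table.get("com.cnn.www", "contents:"))
print(list(table.scan("com", "org")))
```

Because the keys are kept sorted by row, a scan over a range of rows walks neighboring entries, which fits the earlier point about reading contiguous data.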

BigTable offers a new approach to both performance and functionality. Next time, I will show you the details.

1 comment

Comment from: yasso [Visitor]
But many of these are not real databases, at least not the type we usually think of. Although they are quite fast and scale well, they have a very simple structure, and they store and search data by index. Take the LEXST database, for example: if you count on them to handle transactions the way Oracle does, you are wrong.
2009/04/01 @ 20:10
