The Challenge in the record-all-era
By MD on Apr 24, 2008 | In Database
My new column went up today, titled "The Challenge in the record-all-era".
Do you know how fast the amount of data recorded by human beings is growing? Here are two basic questions:
- How much data, in total size, had human beings recorded by the end of the 20th century?
- What about today?
You can find a clue to the first question in an estimate by researchers at Berkeley, cited here at Wikipedia.
It says:
Earlier Berkeley studies estimated that by the end of 1999, the sum of human-produced information (including all audio, video recordings and text/books) was about 12 exabytes of data.
The article "So much data, relatively little space" answers the second question.
The previous best estimate came from researchers at the University of California, Berkeley, who totaled the globe's information production at 5 exabytes in 2003.
And the latest estimate:
Add it all up and IDC determined that the world generated 161 billion gigabytes — 161 exabytes — of digital information last year.
If IDC tracked original data only, its result would have been 40 exabytes.
The article was published in 2007, so its figures refer to 2006. Recently, the amount has been roughly doubling every year!
2005 was perhaps the first year in which more data was created in a single year than in all of history up to the end of the 20th century, and the amount has kept growing since. Today, a single year's output would be roughly 13 times that historical total (40 exabytes in 2006, 80 in 2007, 160 in 2008).
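The arithmetic behind these projections can be checked with a few lines of Python. This is only a back-of-the-envelope sketch that assumes yearly output simply doubles every year (the rate implied by the 40, 80, 160 exabyte figures); the function names `project` and `first_year_exceeding` are hypothetical, and the results are projections, not measurements.

```python
from typing import Dict, Optional

# Berkeley estimate: all human-produced information up to the end of 1999, in exabytes.
CUMULATIVE_BY_1999_EB = 12


def project(start_year: int, start_eb: float, end_year: int) -> Dict[int, float]:
    """Yearly output in exabytes, ASSUMING it doubles every year from start_year."""
    return {year: start_eb * 2 ** (year - start_year)
            for year in range(start_year, end_year + 1)}


def first_year_exceeding(projection: Dict[int, float], threshold: float) -> Optional[int]:
    """First year whose single-year output exceeds threshold, or None."""
    for year in sorted(projection):
        if projection[year] > threshold:
            return year
    return None


if __name__ == "__main__":
    # From Berkeley's 5 EB in 2003: 10 EB in 2004, 20 EB in 2005.
    print(first_year_exceeding(project(2003, 5, 2008), CUMULATIVE_BY_1999_EB))  # 2005
    # From IDC's 40 EB of original data in 2006.
    from_2006 = project(2006, 40, 2008)
    print(from_2006)  # {2006: 40, 2007: 80, 2008: 160}
    print(f"{from_2006[2008] / CUMULATIVE_BY_1999_EB:.1f}x")  # 13.3x
```

Starting from either estimate, doubling puts single-year output above the 12-exabyte historical total by 2005, and 160 exabytes in 2008 is about 13 times that total.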
So far, my column has covered many topics about HDDs, file systems, and other technologies underlying databases. Looking at their challenges and directions, I found a common thread: the fight against fast-growing data.
But the purpose of a database is not storing data; it is retrieving data. So we need to ask, "How can I get what I want when I need it?"
Today, the only entry point available is keyword search. From there, you traverse the links. But each result is an island, structured vertically. So many people still collect information horizontally across the web by hand.
For example, SONY and Nintendo are obviously related in the game business. But how would you discover that by following links?
What we need tomorrow is something like a neural network that represents the logical, dynamic relationships among data and documents. In our brain, for example, the more often a route from neuron A to neuron B is used, the cheaper it becomes; otherwise, it gets more expensive. It is something like this: not a static structure, but a dynamic one.
That is the challenge in the record-all-era: finding useful information through relationships.
Some systems like this are already up and running. More are coming soon.
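To make the idea of a dynamic route cost concrete, here is a toy sketch in Python. It is not a neural network and not the author's design: it is an ordinary weighted graph in which a link gets cheaper each time it is used and more expensive while it sits idle, with Dijkstra's algorithm finding the currently cheapest route. The class `DynamicGraph`, its parameters (`reinforce`, `decay`, the cost bounds), and the example link structure are all hypothetical.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple


class DynamicGraph:
    """Toy graph whose link costs fall when a link is used and rise while it sits idle."""

    def __init__(self, initial_cost: float = 10.0, reinforce: float = 0.5,
                 decay: float = 1.1, min_cost: float = 1.0, max_cost: float = 100.0) -> None:
        self.cost: Dict[str, Dict[str, float]] = defaultdict(dict)
        self.initial_cost = initial_cost
        self.reinforce = reinforce  # factor < 1: each use makes a link cheaper
        self.decay = decay          # factor > 1: each idle period makes a link dearer
        self.min_cost = min_cost
        self.max_cost = max_cost
        self._used: Set[Tuple[str, str]] = set()

    def link(self, a: str, b: str) -> None:
        """Create an undirected link between two documents."""
        self.cost[a][b] = self.initial_cost
        self.cost[b][a] = self.initial_cost

    def use(self, a: str, b: str) -> None:
        """Record a traversal of an existing link: it gets cheaper in both directions."""
        for x, y in ((a, b), (b, a)):
            self.cost[x][y] = max(self.min_cost, self.cost[x][y] * self.reinforce)
            self._used.add((x, y))

    def tick(self) -> None:
        """End of a period: every link not used since the last tick gets more expensive."""
        for x, neighbours in self.cost.items():
            for y in neighbours:
                if (x, y) not in self._used:
                    neighbours[y] = min(self.max_cost, neighbours[y] * self.decay)
        self._used.clear()

    def cheapest_path(self, src: str, dst: str) -> Optional[Tuple[float, List[str]]]:
        """Dijkstra's algorithm over the current link costs."""
        dist = {src: 0.0}
        prev: Dict[str, str] = {}
        heap = [(0.0, src)]
        while heap:
            d, node = heapq.heappop(heap)
            if node == dst:
                path = [dst]
                while path[-1] != src:
                    path.append(prev[path[-1]])
                return d, path[::-1]
            if d > dist[node]:
                continue  # stale heap entry
            for nbr, c in self.cost.get(node, {}).items():
                if d + c < dist.get(nbr, float("inf")):
                    dist[nbr] = d + c
                    prev[nbr] = node
                    heapq.heappush(heap, (d + c, nbr))
        return None


def example_graph() -> DynamicGraph:
    """Hypothetical link structure with two routes from SONY to Nintendo."""
    g = DynamicGraph()
    for a, b in [("SONY", "electronics"), ("electronics", "Nintendo"),
                 ("SONY", "PlayStation"), ("PlayStation", "game"), ("game", "Nintendo")]:
        g.link(a, b)
    return g


if __name__ == "__main__":
    g = example_graph()
    print(g.cheapest_path("SONY", "Nintendo"))  # (20.0, ['SONY', 'electronics', 'Nintendo'])
    for _ in range(3):  # users keep browsing along the game route
        g.use("SONY", "PlayStation")
        g.use("PlayStation", "game")
        g.use("game", "Nintendo")
        g.tick()
    print(g.cheapest_path("SONY", "Nintendo"))  # (3.75, ['SONY', 'PlayStation', 'game', 'Nintendo'])
```

Initially the shorter route through "electronics" wins; after repeated use, the longer game route becomes the cheapest while the idle links drift upward in cost. The multiplicative update with floor and ceiling is just one simple choice; any rule that rewards use and penalizes idleness would illustrate the same point.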