Get Hadoop up and running without DNS
By MD on Sep 14, 2008 | In Database
Over the last couple of days, I tried to get the Hadoop WordCount example running on my local cluster of three CentOS boxes.
Thanks to Running Hadoop On Ubuntu Linux (Multi-Node Cluster), 90% of the setup was as easy as described.
But two problems cost me more time than they should have.
1. RSA authentication with SSH
The authorized_keys file has to be accessible only by its owner. Don't forget to remove all access for the group and for others.
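As a minimal sketch of that permission rule, the Python below checks whether group or others have any access to a file and, if so, restricts it to the owner (mode 0600). The function names are made up for illustration, and the path is passed in by the caller; with OpenSSH it would normally be ~/.ssh/authorized_keys.

```python
import os
import stat


def group_or_other_can_access(path):
    """Return True if any group/other permission bit is set on path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return bool(mode & (stat.S_IRWXG | stat.S_IRWXO))


def restrict_to_owner(path):
    """Set the file to owner read/write only (0600) and return the new mode."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    return stat.S_IMODE(os.stat(path).st_mode)
```

Calling restrict_to_owner(os.path.expanduser("~/.ssh/authorized_keys")) as the Hadoop user on each box would apply the fix.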
2. Host name resolution
Examples of hosts files:

master
::1 localhost6.localdomain6 localhost6
192.168.10.21 master
192.168.10.22 slave.yellow
192.168.10.23 slave.red
* master has to have an IP address that the slaves can reach (not ::1 or 127.0.0.1)

slave.yellow
127.0.0.1 slave.yellow localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
192.168.10.21 master
192.168.10.23 slave.red

slave.red
127.0.0.1 slave.red localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
192.168.10.21 master
192.168.10.22 slave.yellow
Get these network settings right, and WordCount should run.
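The rule for the master entry can be checked mechanically. The sketch below parses hosts-file text and lists the non-loopback addresses mapped to a name; an empty result for "master" means the slaves would not be able to reach it. The function names are hypothetical, and the sample text is the master file from the example above.

```python
import ipaddress


def parse_hosts(text):
    """Return {hostname: [ip, ...]} from /etc/hosts-style text."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name, []).append(ip)
    return table


def reachable_ips(table, name):
    """Addresses for name that are not loopback (not 127.0.0.1 or ::1)."""
    return [ip for ip in table.get(name, [])
            if not ipaddress.ip_address(ip).is_loopback]


MASTER_HOSTS = """\
::1 localhost6.localdomain6 localhost6
192.168.10.21 master
192.168.10.22 slave.yellow
192.168.10.23 slave.red
"""
```

Reading the real file with open("/etc/hosts").read() and passing it to parse_hosts would run the same check on a live box.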
2008/9/14 tested Hadoop-0.18.0, JDK1.6.0_10, CentOS5.2
Essay "Katakana" Dictionary
By MD on Aug 3, 2008 | In Information
Inspired by Haruki Murakami, the unique novelist, we just launched a new blog, which is like a "Katakana" dictionary made of essays.
We have three types of characters in Japanese: Kanji (漢字), Hiragana (ひらがな), and Katakana (カタカナ). Katakana represents sounds, mostly of foreign words like Coca Cola (コカコーラ).
One Katakana word a day. Then, one of us writes an essay related to the word. Interesting, isn't it?
Here's the dictionary. Enjoy!
http://ameblo.jp/teamharukist/
Object fundamentalism and Business
By MD on Aug 2, 2008 | In Information
I love jazz, and I love playing trumpet.
I don't like to manage or to be managed.
I love the open source mindset, like a hippie's.
I love objects.
But the problem is, they are not so successful in business.
Okay. Let me think. Why?
...
Is it only me who enjoys them?
That could be a reason.
But rather, enjoying it would be a prerequisite for success.
The reason is, perhaps, a lack of attention to the audience/end users.
Playing trumpet is a part of my life, even a part of myself.
So it would be great if the audience got high/relaxed,
but that's not the first thing.
Object? That's for my business.
Object fundamentalism? I don't think I am, but some think I am.
Cache, an ultimate enterprise object-capable database.
db4o, an open source object database for embedded systems
Rational, needless to say
The Object Fundamentalism family got a certain level of success.
But it seemed to be limited so far.
Why?
Through other businesses, I realized that
a successful technology can provide end users with benefits directly.
Oracle, Google, VMWare, Salesforce.
And they have lots of believers who bring the bible to end users
to integrate, convince, pray and sell.
Sometimes, those believers put some benefits on top of it,
but even without it, the bible itself is valuable.
What about Object Fundamentalism family?
It depends on engineers.
That means a value is created *by engineers* for their customer.
So, the point would be to hire a great engineer rather than to buy a product.
The Object Fundamentalism itself is worse than a piece of bread for end users.
How to improve the situation?
A product/service should have a clear benefit for end users, not (only) for engineers.
For end users, object jargon is, as the old Japanese saying goes, like Buddha's words to a horse.
- I found an interesting story in "Essential Drucker".
The three stonecutters who were asked what they were doing. The first replied, "I am making a living." The second kept on hammering while he said, "I am doing the best job of stonecutting in the entire country." The third one looked up with a visionary gleam in his eyes and said, "I am building a cathedral."
The third man is, of course, the true "manager." The first man knows what he wants to get out of the work and manages to do so. He is likely to give a "fair day's work for a fair day's pay." It is the second man who is a problem. Workmanship is essential; without it no business can flourish; in fact, an organization becomes demoralized if it does not demand of its members that most scrupulous workmanship they are capable of. But there is always a danger that the true workman, the true professional, will believe that he is accomplishing something when in effect he is just polishing stones or collecting footnotes. Workmanship must be encouraged in the business enterprise. But it must always be related to the needs of the whole.
BigTable(next generation database led by Google) 1
By MD on Jun 13, 2008 | In Database
BigTable. It can be as big as it sounds. According to Jeffrey Dean, a fellow at Google, the biggest one today is up to 4000TB, spanning thousands of servers... and that is just one table!!!
Have you ever heard of BigTable? Unless you're a database vendor or a Google infrastructure freak, I'm afraid you haven't.
The paper titled Bigtable: A Distributed Storage System for Structured Data describes it like this:
Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving).
Two points.
- scale to a very large size: petabytes of data across thousands of commodity servers
- both in terms of data size and latency requirements
How?
Performance matters
Reducing HDD seek time is key to a computer's overall performance, since a disk I/O costs on the order of a million times more than a CPU operation. So it's wise to fetch as big a chunk as possible at once.
Today, most user files keep getting larger and larger. But the basic unit of an HDD stays small, usually a 512-byte sector, and a filesystem block on Linux is typically 512 bytes to 4KB.
Google File System, Google's underlying distributed filesystem, makes use of a huge chunk, 64MB in size.
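To see why chunk size matters, here is a rough back-of-the-envelope model: every chunk read pays one seek, then transfers at the disk's sequential rate. The 10 ms seek time and 100 MB/s transfer rate are illustrative assumptions, not measurements, and the function name is made up for this sketch.

```python
def effective_throughput(chunk_bytes, seek_s=0.010, transfer_bytes_per_s=100e6):
    """Bytes per second achieved when each chunk read costs one seek.

    seek_s and transfer_bytes_per_s are assumed, illustrative disk figures.
    """
    return chunk_bytes / (seek_s + chunk_bytes / transfer_bytes_per_s)


KB = 1024
MB = 1024 * KB

small = effective_throughput(4 * KB)    # one filesystem-sized block per seek
large = effective_throughput(64 * MB)   # one GFS-sized chunk per seek
```

Under these assumptions, 4KB reads stay below 1% of the disk's sequential rate because the seek dominates, while 64MB reads come within a few percent of it.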
The next thing to consider is layout. How should data be laid out on a block? Contiguous data can be read from or written to disk in one go.
A database usually puts one row in contiguous space. So as long as all the data you require is in a single record, you get the best performance. Some databases take another approach: column-oriented storage.
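The difference between the two layouts can be sketched by flattening the same small table in both orders. The sample records and function names are invented for illustration; a real database works with bytes on disk pages, not Python lists.

```python
# Hypothetical sample table: (name, age, city)
rows = [
    ("alice", 30, "tokyo"),
    ("bob", 25, "osaka"),
    ("carol", 41, "kyoto"),
]


def row_layout(rows):
    """Row-oriented: all fields of one record sit next to each other."""
    return [field for row in rows for field in row]


def column_layout(rows):
    """Column-oriented: all values of one column sit next to each other."""
    return [field for column in zip(*rows) for field in column]
```

In the row layout, fetching one whole record touches one contiguous run; in the column layout, scanning a single column such as age does.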
BigTable is not a conventional table
It's more like a spreadsheet. And a map under the hood.
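As a toy illustration of the "map under the hood" idea: the Bigtable paper describes the table as a sorted map indexed by row key, column key, and timestamp. The sketch below uses a plain Python dict with that key shape. It ignores sorting, distribution, and persistence, and the helper names and sample values are hypothetical.

```python
def put(table, row, column, timestamp, value):
    """Store one cell version under (row key, column key, timestamp)."""
    table[(row, column, timestamp)] = value


def get_latest(table, row, column):
    """Return the value with the highest timestamp for a cell, or None."""
    versions = [(ts, value) for (r, c, ts), value in table.items()
                if r == row and c == column]
    if not versions:
        return None
    return max(versions, key=lambda tv: tv[0])[1]


table = {}
put(table, "com.example.www", "contents:", 1, "<html>v1</html>")
put(table, "com.example.www", "contents:", 2, "<html>v2</html>")
```

Each (row, column) pair is like a spreadsheet cell, except that it can hold several timestamped versions.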
BigTable offers a new approach to both performance and functionality. Next time, I will go into the details.