Document Similarity By Color
A way to get document similarity by color
This is the invention I filed a patent application for last week. I think it is the key to taking Information Retrieval up a level.
The first demo will launch in late March.
Background
How is the performance of Information Retrieval measured? If you are not familiar with "Precision" and "Recall", look them up under Information Retrieval.
There are two major Information Retrieval tools: databases and search engines.
A database offers higher precision and lower recall. That means you need to specify detailed search criteria to get what you need.
A search engine does the opposite: lower precision and higher recall. That means you're likely to get what you need, but you have to find it among a huge number of results.
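To make the two measures concrete, here is a minimal Python sketch using the standard set-based definitions: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. The document IDs and result sets are made-up illustrations of the database-like and search-engine-like cases, not real query data.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for a single query."""
    hits = len(retrieved & relevant)  # relevant documents that were actually returned
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


relevant = {"d1", "d2", "d3", "d4"}

# Hypothetical database query: narrow criteria, few but accurate results.
db_results = {"d1", "d2"}
# Hypothetical search-engine query: broad matching, 20 results in total.
se_results = {"d1", "d2", "d3", "d4"} | {f"x{i}" for i in range(16)}

print(precision_recall(db_results, relevant))  # (1.0, 0.5)
print(precision_recall(se_results, relevant))  # (0.2, 1.0)
```

The database-like query misses half of what you need but returns nothing irrelevant; the search-engine-like query finds everything but buries it among 16 irrelevant results.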
Wait, Google isn't that bad.
True. The picture above leaves out two practical concepts: search results can be ordered, and they can be paged.
The earlier what you need shows up in the results, the higher the precision you experience. So in practice a search engine usually delivers both higher precision and higher recall, though that depends on its Ranking.
This is actually a great innovation, and it has driven the adoption of search-engine-like capabilities everywhere.
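One common way to capture the effect of ordering is precision at k (P@k): precision computed over only the top k results, which is roughly what a user sees on the first screen. The sketch below uses hypothetical rankings. Both contain the same 20 documents, so their precision over the whole result list is identical, but ranking the relevant documents first changes P@k completely.

```python
def precision_at_k(ranking: list[str], relevant: set[str], k: int) -> float:
    """Precision computed over only the top-k ranked results."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc in ranking[:k] if doc in relevant)
    return hits / k


relevant = {"d1", "d2", "d3", "d4"}
noise = [f"x{i}" for i in range(16)]  # hypothetical irrelevant results

good_ranking = ["d1", "d2", "d3", "d4"] + noise  # relevant documents ranked first
bad_ranking = noise + ["d1", "d2", "d3", "d4"]   # relevant documents buried at the end

print(precision_at_k(good_ranking, relevant, 4))   # 1.0
print(precision_at_k(bad_ranking, relevant, 4))    # 0.0
print(precision_at_k(bad_ranking, relevant, 20))   # 0.2, same as good_ranking over all 20
```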
Problem
It's been more than 10 years since Google (Stanford, actually) filed the patent application for PageRank. That was the birth of the search engine era.
Thanks to the spread of such a great technology, the number of web documents exploded, a phenomenon known as the Information Explosion.
For example, say you find document A in 5th place out of 100 results. One year later, the number of results has doubled, and you find document A in 10th place out of 200. A year after that, it drops off the first page. (I assume the Ranking technology stays the same.)
Meanwhile, what about the size of a page? That depends on human capacity rather than technology, so it stays about the same: 10 or so results per page. Right?
Now we have a problem. The precision gets lower and lower.
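The document A example can be worked through in a few lines. This minimal sketch assumes, as the example does, that the number of results doubles every year, that the Ranking keeps A at the same relative position, and that a page holds 10 results; `page_of` and `drift` are hypothetical helpers written only for illustration.

```python
PAGE_SIZE = 10  # assumed results per page; fixed by human capacity, not technology


def page_of(rank: int, page_size: int = PAGE_SIZE) -> int:
    """1-based page number on which a 1-based rank appears."""
    return (rank - 1) // page_size + 1


def drift(rank: int, total: int, years: int) -> list[tuple[int, int, int, int]]:
    """Track (year, rank, total, page) while results double every year.

    Assumes the ranking keeps the document at the same relative position,
    so its rank doubles along with the total number of results.
    """
    history = []
    for year in range(years + 1):
        history.append((year, rank, total, page_of(rank)))
        rank, total = rank * 2, total * 2
    return history


for year, rank, total, page in drift(5, 100, 3):
    print(f"year {year}: rank {rank} of {total} -> page {page}")
```

Document A goes from page 1 to page 1, then page 2, then page 4, while the page itself never grows.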
To make it worse, one of the major search engine improvements is topic recognition, and with topic recognition, even more data comes in.
In fact, rather than recognizing more topics, what Google is doing is narrowing down, that is, personalizing, search results.
So I think we need something that helps humans recognize what a page is about. Otherwise, precision will drop even further, and we won't get the real benefits of topics and Ontology.
Solution
This brings us to my invention: Document Similarity By Color.
With this technology, you can understand the topics a document represents by their colors, and then search for what you want by chasing a single color.
I think this implies a power shift to come, so I call the phenomenon "Liberation Of Search".
Just my 2 cents...