Recognize how concepts are formed on the Web
By MD on Mar 23, 2009 | In Database
This is just a brief idea for recognizing how concepts are formed on the Web. Concepts are not static, but dynamic. And here they are expressed in numbers, not in an abstract way, so that we can enjoy applying statistical techniques to them.
Invariant
First, let me introduce how our brain recognizes things. You may want to take a look at this video by Jeff Hawkins to catch up on the latest brain science. An "Invariant", which is the term Jeff uses, is a slowly changing, mostly static concept that is considered to reside in our neocortex at several levels.
Our brain translates things into these invariants. That's why we can tell that you are human, and so am I. Otherwise, everything would look different from everything else, and your life would get wild.
An "Invariant" *expects* what will happen next, based on its context, while the actual input is coming in. An alarm goes off if any difference is observed between the expectation and the actual input. Then your consciousness gets involved to watch out.
It is natural to wonder whether we can refine Information Retrieval with such concepts.
Invariant representations for documents
Let's consider documents, as usual in IR.
What is the "Invariant" for a document? As you can guess, and as you can actually see in various research on clustering, factoring, and topic estimation, the "Topic" looks like the most natural choice.
Y = AX
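Y: Document/Word (probability) matrix, one row per word and one column per document
X: Document/Topic (probability) matrix, one row per topic and one column per document
A: Topic/Word (probability) matrix, one row per word and one column per topic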
Think about such a matrix X, which shows the topical aspects of documents. You can relax; there is no need to be bothered here by the exact meaning of each matrix. This is actually the equation used in GaP by Canny.
We'll get "Invariant"s if we can find such A and X. Please note that this is just one way to get an "Invariant", and, I think, the best way to picture the idea.
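To make "finding such A and X" a little more concrete, here is a minimal sketch in Python. It is not GaP itself, which is a probabilistic Gamma-Poisson model. As a simple stand-in it uses a plain non-negative matrix factorization with multiplicative updates, which also keeps A and X non-negative. The word counts in Y, the function names, and the number of topics are all made up for illustration, and plain lists are used only to keep the sketch free of dependencies.

```python
import random


def matmul(P, Q):
    """Multiply two matrices stored as lists of rows."""
    q_cols = list(zip(*Q))
    return [[sum(p * q for p, q in zip(row, col)) for col in q_cols] for row in P]


def transpose(M):
    return [list(col) for col in zip(*M)]


def scale(M, num, den, eps):
    """Element-wise multiplicative update: M * num / den."""
    return [[m * n / (d + eps) for m, n, d in zip(mr, nr, dr)]
            for mr, nr, dr in zip(M, num, den)]


def factorize(Y, n_topics, n_iter=200, seed=0, eps=1e-9):
    """Approximate Y (words x docs) by A (words x topics) times X (topics x docs)."""
    rng = random.Random(seed)
    n_words, n_docs = len(Y), len(Y[0])
    A = [[rng.random() + eps for _ in range(n_topics)] for _ in range(n_words)]
    X = [[rng.random() + eps for _ in range(n_docs)] for _ in range(n_topics)]
    for _ in range(n_iter):
        # X <- X * (A^T Y) / (A^T A X)
        At = transpose(A)
        X = scale(X, matmul(At, Y), matmul(matmul(At, A), X), eps)
        # A <- A * (Y X^T) / (A X X^T)
        Xt = transpose(X)
        A = scale(A, matmul(Y, Xt), matmul(A, matmul(X, Xt)), eps)
    return A, X


def squared_error(Y, A, X):
    """Sum of squared differences between Y and the product AX."""
    AX = matmul(A, X)
    return sum((y - r) ** 2 for yr, rr in zip(Y, AX) for y, r in zip(yr, rr))


def topic_proportions(X):
    """One vector per document, rescaled so that its topic weights sum to 1."""
    return [[w / (sum(col) or 1.0) for w in col] for col in transpose(X)]


# Hypothetical toy counts: 4 words (rows) x 4 documents (columns).
Y = [
    [3, 2, 0, 0],
    [2, 3, 0, 1],
    [0, 0, 4, 2],
    [0, 1, 2, 3],
]

A, X = factorize(Y, n_topics=2)
doc_topics = topic_proportions(X)
for i, proportions in enumerate(doc_topics):
    print(f"document {i}:", [round(p, 3) for p in proportions])
```

Each printed vector is one column of X rescaled to sum to 1, that is, the "Invariant" of one document in numbers. The rescaling is only for reading the result; the product AX is computed from the unscaled X.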
Difference, Earthquake, Concept Map
We can apply this idea to Information Retrieval, not for search, but to see the statistical aspects of the underlying document collection.
These will form X in the equation above.
Now you have an "Invariant" representation in numbers for a query, that is, for a concept.
You can measure differences, deviations, means, whatever you like.
If we measure the deviations shown in concepts, as we did for "Earthquake", isn't it safe to say that concepts with similar measures are close to each other geometrically? Then we'll get a dynamic "Concept Map" of the Web.
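The measuring step can be sketched as follows. This is only one possible reading of the idea: each concept is summarized by the per-topic mean and standard deviation of its documents' topic vectors, and two concepts are compared by the Euclidean distance between those summaries. The concept names other than "earthquake", the topic vectors, and the helper names are all hypothetical.

```python
import math
from statistics import fmean, pstdev


def summarize(doc_topics):
    """Per-topic means followed by per-topic standard deviations."""
    per_topic = list(zip(*doc_topics))  # one tuple of values per topic
    means = [fmean(values) for values in per_topic]
    deviations = [pstdev(values) for values in per_topic]
    return means + deviations


# Hypothetical topic vectors (three topics) for the documents of each concept.
concepts = {
    "earthquake": [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.8, 0.1, 0.1]],
    "tsunami": [[0.6, 0.3, 0.1], [0.7, 0.2, 0.1]],
    "election": [[0.1, 0.1, 0.8], [0.2, 0.1, 0.7]],
}

summaries = {name: summarize(vectors) for name, vectors in concepts.items()}


def concept_distance(a, b):
    """Euclidean distance between the summaries of two concepts."""
    return math.dist(summaries[a], summaries[b])


names = sorted(concepts)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(f"{a} - {b}: {concept_distance(a, b):.3f}")
```

Recomputing the summaries as the underlying documents change is what would make such a map dynamic. Other distances, such as cosine, would be equally reasonable choices here.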