Semantic Web as Collateral Damage from Keyword Search

When it comes to the Semantic Web in Information Retrieval, the story so far has been one of over-promise and under-delivery.

Meanwhile, existing players like Google, Microsoft, and Yahoo seem to have carefully added "Semantic" features under their hoods.

So why are people still talking about the "Semantic Web"? Are they simply unaware of those features, or are they waiting for something that has not been delivered?

As Alex Iskold pointed out in Semantic Search: The Myth And Reality, the expectations really are too high. The same happened with other over-promised technologies such as character recognition, machine translation, and artificial intelligence, whose advocates keep saying it is only a matter of machine resources.

Will the Semantic Web join that Vapor Service club?

Another argument by Alex concerns the UI of IR systems. As long as the UI is based on keywords entered by users, a Semantic Search system cannot beat existing Search Engines. I agree with him. Here's the supporting evidence.

This is Figure 2 in The Dual Role of Smoothing in the Language Modeling Approach (Chengxiang Zhai and John Lafferty). Compare the two charts: the line on the right chart goes steeply down, while the other holds up until around the middle.

This is about a Language Model IR system, but the same is true of the Vector Space Model or a Boolean system. The more keywords you use, the worse the search quality gets. Try your favorite Search Engine and keep adding keywords: the results either get messy or shrink to only a few.
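The "only a few results" half of that claim is easy to see in a toy Boolean AND search. The sketch below is only an illustration under my own assumptions: the three documents and the class name BooleanAndDemo are made up, and real engines are far more elaborate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy Boolean AND retrieval: a document matches only if it contains every query term.
class BooleanAndDemo {

    // Hypothetical three-document collection, used only for illustration.
    static final List<String> DOCS = List.of(
        "semantic web search engine",
        "keyword search engine ranking",
        "semantic search with keyword queries");

    // Returns the indexes of the documents that contain all query terms.
    static List<Integer> search(String query) {
        List<String> terms = Arrays.asList(query.toLowerCase().split("\\s+"));
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < DOCS.size(); i++) {
            Set<String> words = new HashSet<>(Arrays.asList(DOCS.get(i).split("\\s+")));
            if (words.containsAll(terms)) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Each added keyword can only keep or shrink the result set.
        String[] queries = {
            "search",
            "search keyword",
            "search keyword semantic",
            "search keyword semantic web"
        };
        for (String q : queries) {
            System.out.println(q + " -> " + search(q).size() + " hit(s)");
        }
    }
}
```

With AND semantics every extra keyword is one more condition a document has to satisfy, so the hit count can never go up.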

So the Semantic Web can be seen as collateral damage from Keyword Search.

Here's my 2 cents. What about "Search by Link"? On emerging tablet and mobile devices, users rarely enter keywords; they just follow links while browsing. On Twitter, all that users see is comments plus links. A "Link" carries enough information to extract a long-enough query, right?
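To give a rough idea of what "Search by Link" could mean in code, here is a minimal sketch, assuming the text behind a link is turned into a term-frequency vector and candidates are ranked by cosine similarity. This is a generic technique picked for illustration, not a description of the actual engine; the class LinkQueryDemo and its methods are hypothetical, and only titles are used to keep it short, whereas a real system would presumably use the full page text.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: treat the text behind a link as a long query and score candidates against it.
class LinkQueryDemo {

    // Builds a term-frequency vector from lowercased alphanumeric tokens.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Cosine similarity between two term-frequency vectors; 0.0 if either is empty.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (int v : b.values()) {
            normB += v * v;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // The followed link acts as the query; no keywords are typed by the user.
        Map<String, Integer> query =
            termFrequencies("The future of news: Back to the coffee house");
        String[] candidates = {
            "The future of journalism: Yesterday's papers",
            "Honeybees Might Have Emotions"
        };
        for (String c : candidates) {
            System.out.printf("%.2f  %s%n", cosine(query, termFrequencies(c)), c);
        }
    }
}
```

The point of the design is that the query grows as long as the linked text, without the user typing anything.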

Now, let me show you some examples.

The future of news: Back to the coffee house | The Economist

TOP 10 from my test:

  1. NSFW: 1200 words absolutely, definitely not about Rupert Murdoch and Google (0.08)
  2. Ad-Supported Amazon Kindle Coming for $114 - Techland - TIME.com(0.08)
  3. Blogging: Outreach and outrage | The Economist(0.07)
  4. The newspaper business: Paper tigers | The Economist(0.07)
  5. Sports newspapers: Pink, and read all over | The Economist(0.06)
  6. The Media Bundle Is Dead, Long Live The News Aggregators (0.06)
  7. The New York Times Introduces An iPad App (0.06)
  8. China's new labour law: Union of the state | The Economist(0.06)
  9. The future of journalism: Yesterday's papers | The Economist(0.06)
  10. What Should An iPad Newspaper Look Like? (0.06)

7 Essential Books on Optimism | Brain Pickings

TOP 10 from my test

  1. Mind Reading: Positive Psychologist Martin Seligman on the Good Life – TIME Healthland(0.38)
  2. How to Have Fun Like Monkeys, Whales and Foxes | Wired Science | Wired.com(0.28)
  3. Lemonade without the Lemons: New Search Engine Looks for Uplifting News: Scientific American(0.27)
  4. Honeybees Might Have Emotions | Wired Science | Wired.com(0.26)
  5. Study: Dogs' Separation Anxiety May Be a Sign of Pessimism - - TIME Healthland(0.26)
  6. The study of well-being: Strength in a smile | The Economist(0.25)
  7. A survey of new media: What sort of revolution? | The Economist(0.25)
  8. Bagehot: The hopeful interventionist | The Economist(0.24)
  9. Stay positive: Study shows that optimists live longer – TIME Healthland(0.24)
  10. Observations: Good-Bye Blue Monday(0.24)

* many more from "Today's Deep Story" @savyengine

I would not dare compare this with Search Engines, since that does not make sense: they are two different beasts for different purposes. More than that, this is still only an infant. For the time being, the question is whether it WORKS in many, many more cases.

Yet I am finding some very interesting effects.

1. Good old articles
2. Lengthy articles

I frequently encounter good old articles on semantically the same subject that are lengthy and worth reading. This rarely happens with today's search engines. It is not surprising that many articles survive over time, just like literature.

Thus, articles buried deep by today's Search Engines can be put to good use when users are curious about a subject. At the end of the day, the Semantic Web may bring back the lost Depth of the Web.

Internet People

It was two years ago that I launched a search engine that was based on the concept of "Chigai". It was highly experimental, but ended up as a miserable failure due to too much randomness.

Its underlying algorithm is based on Probabilistic Topic Models.

The algorithm is based on the following distinctive idea:

Topic models (e.g., Blei, Ng, & Jordan, 2003; Griffiths & Steyvers, 2002; 2003; 2004; Hofmann, 1999; 2001) are based upon the idea that documents are mixtures of topics, where a topic is a probability distribution over words.
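To make the quoted idea a little more concrete: if a document is a mixture of topics and each topic is a distribution over words, then the probability of a word in a document is the sum over topics of P(word | topic) x P(topic | document). Below is a minimal sketch of just that arithmetic, with a made-up four-word vocabulary and two made-up topics. It is not the algorithm behind my engine, and it does not learn anything.

```java
// Sketch: word probabilities in a document modeled as a mixture of topics.
class TopicMixtureDemo {

    // P(word | doc) = sum over topics z of P(z | doc) * P(word | z).
    static double wordProbability(double[] topicWeights, double[][] topicWordProbs, int word) {
        double p = 0.0;
        for (int z = 0; z < topicWeights.length; z++) {
            p += topicWeights[z] * topicWordProbs[z][word];
        }
        return p;
    }

    public static void main(String[] args) {
        // Hypothetical vocabulary and topics, for illustration only.
        String[] vocabulary = {"search", "engine", "music", "piano"};
        double[][] topicWordProbs = {
            {0.50, 0.40, 0.05, 0.05},  // topic 0: mostly about search
            {0.05, 0.05, 0.50, 0.40}   // topic 1: mostly about music
        };
        // This document is 80% topic 0 and 20% topic 1.
        double[] topicWeights = {0.8, 0.2};

        for (int w = 0; w < vocabulary.length; w++) {
            System.out.printf("P(%s | doc) = %.2f%n",
                vocabulary[w], wordProbability(topicWeights, topicWordProbs, w));
        }
    }
}
```

Because each topic row and the topic weights each sum to 1, the resulting word probabilities also sum to 1.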

And it says in "Conclusion":

Generative models for text, such as the topic model, have the potential to make important contributions to the statistical analysis of large document collections, and the development of a deeper understanding of human language learning and processing.

This is the attractive part.


the development of a deeper understanding of human language learning and processing

And it gets even more attractive if you consider a somewhat old but still frequently revisited article by Nicholas Carr, "Is Google Making Us Stupid?".

In the article, the author tells a story about Friedrich Nietzsche.

After Nietzsche bought a typewriter and got used to touch typing, a friend of his noticed a subtle effect on his writing, and wrote to Nietzsche about it.

“Perhaps you will through this instrument even take to a new idiom,” the friend wrote in a letter, noting that, in his own work, his “‘thoughts’ in music and language often depend on the quality of pen and paper.”

“You are right,” Nietzsche replied, “our writing equipment takes part in the forming of our thoughts.” Under the sway of the machine, writes the German media scholar Friedrich A. Kittler, Nietzsche’s prose “changed from arguments to aphorisms, from thoughts to puns, from rhetoric to telegram style.”

I think the claim here is that "A tool that expresses one's mind can form one's mind". So in the beginning, you may think you use a tool to express your mind, but at the end of the day, you find you are expressed by the tool.

Then, extending the claim to today, you may find that Google forms your mind. So do Facebook and Twitter. And unfortunately, this effect does not seem to be much appreciated.

We may be seeing the birth of Internet People.


Restlet + GWT StockWatcher sample integration

How do you host the GWT "Retrieving JSON Data" example on Restlet?

Refer to the following:

Serving Static Files
Getting Parameter Values

Now here's StockResource.

Code:

import java.util.Random;

import org.restlet.data.Form;
import org.restlet.data.Parameter;
import org.restlet.resource.Get;
import org.restlet.resource.ServerResource;

public class StockResource extends ServerResource {

  private static final double MAX_PRICE = 100.0; // $100.00
  private static final double MAX_PRICE_CHANGE = 0.02; // +/- 2%

  // Returns a JSON array with a random price and change for each symbol in "q".
  @Get
  public String browse() {
    StringBuilder sb = new StringBuilder();
    Random rnd = new Random();
    boolean first = true;

    sb.append('[');
    Form form = getRequest().getResourceRef().getQueryAsForm();
    for (Parameter parameter : form) {
      if ("q".equals(parameter.getName()) && parameter.getValue() != null) {
        // The symbols in "q" are expected to be space-separated.
        String[] stockSymbols = parameter.getValue().split(" ");
        for (String stockSymbol : stockSymbols) {
          double price = rnd.nextDouble() * MAX_PRICE;
          double change = price * MAX_PRICE_CHANGE * (rnd.nextDouble() * 2.0 - 1.0);

          // Put commas between objects only; a trailing comma is not valid JSON.
          if (!first) {
            sb.append(',');
          }
          first = false;

          sb.append("  {");
          sb.append("    \"symbol\": \"");
          sb.append(stockSymbol);
          sb.append("\",");
          sb.append("    \"price\": ");
          sb.append(price);
          sb.append(',');
          sb.append("    \"change\": ");
          sb.append(change);
          sb.append("  }");
        }
      }
    }
    sb.append(']');

    return sb.toString();
  }

}

And here's the Restlet application.

Code:

import org.restlet.Application;
import org.restlet.Restlet;
import org.restlet.resource.Directory;
import org.restlet.routing.Router;

public class MyApplication extends Application {

  private static final String ROOT_URI = "file:///PathToGWTWar/"; // e.g. file:///Users/Me/workspace/war

  @Override
  public Restlet createInboundRoot() {
    Router router = new Router();
    // Attach the resource.
    router.attach("/testFileUpload", MyResource.class);

    router.attach("/stockwatcher/stockPrices", StockResource.class);

    Directory dir = new Directory(getContext(), ROOT_URI);
    router.attach("/", dir);

    return router;
  }

}

In this case, I added it to the fileUpload sample. The war folder, where GWT generates the js and other static files, is placed at ROOT_URI.

The point is the order in which resources are attached to the Router. Try attaching StockResource last and see what happens.
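Why the order can matter is easier to see in a toy model. The sketch below is plain Java, not Restlet's Router: it assumes first-match routing where every route matches by prefix, and the class FirstMatchRouterDemo and its methods are hypothetical. Check the Router documentation for the routing and matching modes your Restlet version actually uses.

```java
import java.util.ArrayList;
import java.util.List;

// Toy first-match router (an assumption for illustration, not Restlet's implementation).
class FirstMatchRouterDemo {

    // Routes are kept in attachment order as {prefix, target} pairs.
    private final List<String[]> routes = new ArrayList<>();

    void attach(String prefix, String target) {
        routes.add(new String[] {prefix, target});
    }

    // Returns the target of the first attached route whose prefix matches, or null.
    String route(String path) {
        for (String[] r : routes) {
            if (path.startsWith(r[0])) {
                return r[1];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        FirstMatchRouterDemo specificFirst = new FirstMatchRouterDemo();
        specificFirst.attach("/stockwatcher/stockPrices", "StockResource");
        specificFirst.attach("/", "Directory");

        FirstMatchRouterDemo catchAllFirst = new FirstMatchRouterDemo();
        catchAllFirst.attach("/", "Directory");
        catchAllFirst.attach("/stockwatcher/stockPrices", "StockResource");

        String path = "/stockwatcher/stockPrices";
        System.out.println("specific first:  " + specificFirst.route(path));
        System.out.println("catch-all first: " + catchAllFirst.route(path));
    }
}
```

In this model a catch-all "/" route attached first shadows everything attached after it, so the more specific route has to come first.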

As always, Restlet allows us to get a task done easily.

PaaS???

??????????PaaS???????????????????????

Open PaaS (Java PaaS)
Google App Engine
VMforce: VMware + salesforce

Open Source PaaS
openstack: NASA + RackSpace

Proprietary PaaS
Windows Azure: Microsoft
Oracle On Demand: Oracle, private cloud

????????????????????????????Open PaaS??????????????Microsoft?Oracle?????????????????????????????

?????Open PaaS??????????VMforce?App Engine?????VMforce???????????????????????JPA?JDBC???????????

???????mixi?????Open PaaS?VMware???????????????????????

Versant?JPA/JDO???????????????????????????????

???????????????
?????????PaaS??ITPro?
