Semantic Web as Collateral Damage from Keyword Search

When it comes to Semantic Web in Information Retrieval, it has been over-promise and under-delivery.

While existing players like Google, Microsoft, and Yahoo seem that they have carefully added "Semantic" features under their hoods.

So why are people still talking about "Semantic Web"? They are just not aware of it, or waiting for something not delivered?

As Alex Iskold had pointed out in Semantic Search: The Myth And Reality, over expectations are true. As other over-promised services like character recognition, machine translation, and artificial intelligence. They keep saying it is a matter of machine resource.

Semantic Web would join in such Vapor Service club?

Another argument by Alex is UI of IR system. As long as it is based on keywords by users, Semantic Search System can not beat existing Search Engines. I agree with him. Here's the mathematical proof.

This is Figure.2 in The Dual Role of Smoothing in the Language Modeling Approach( Chengxiang Zhai and Joh Lafferty). You see the difference among the two charts. The line on the right chart steeply goest down, while another keeps up to around the middle.

This is about Language Model IR system, but the same is true in Vector Space Model or Boolean system. The more keywords you use, the worse the search quality gets. Try your favorite Search Engine, increase the number of keywords, and you'll see the result gets messy or only a few results.

So, Semantic Web is considered as the collateral damage from Keyword Search.

Here's my 2 cents. What about "Search by Link"? On the emerging tablet/mobile device, users rarely enter keywords, but just follow links when browsing. In Twitter, all what users see is comment + links. "Link" has enough information to extract long-enough queries, right?

Now, let me show you some examples.

The future of news: Back to the coffee house | The Economist

TOP 10 from my test:

  1. NSFW: 1200 words absolutely, definitely not about Rupert Murdoch and Google (0.08)
  2. Ad-Supported Amazon Kindle Coming for $114 - Techland - TIME.com(0.08)
  3. Blogging: Outreach and outrage | The Economist(0.07)
  4. The newspaper business: Paper tigers | The Economist(0.07)
  5. Sports newspapers: Pink, and read all over | The Economist(0.06)
  6. The Media Bundle Is Dead, Long Live The News Aggregators (0.06)
  7. The New York Times Introduces An iPad App (0.06)
  8. China's new labour law: Union of the state | The Economist(0.06)
  9. The future of journalism: Yesterday's papers | The Economist(0.06)
  10. What Should An iPad Newspaper Look Like? (0.06)

7 Essential Books on Optimism | Brain Pickings

TOP 10 from my test

  1. Mind Reading: Positive Psychologist Martin Seligman on the Good Life – TIME Healthland(0.38)
  2. How to Have Fun Like Monkeys, Whales and Foxes | Wired Science | Wired.com(0.28)
  3. Lemonade without the Lemons: New Search Engine Looks for Uplifting News: Scientific American(0.27)
  4. Honeybees Might Have Emotions | Wired Science | Wired.com(0.26)
  5. Study: Dogs' Separation Anxiety May Be a Sign of Pessimism - - TIME Healthland(0.26)
  6. The study of well-being: Strength in a smile | The Economist(0.25)
  7. A survey of new media: What sort of revolution? | The Economist(0.25)
  8. Bagehot: The hopeful interventionist | The Economist(0.24)
  9. Stay positive: Study shows that optimists live longer – TIME Healthland(0.24)
  10. Observations: Good-Bye Blue Monday(0.24)

* many more from "Today's Deep Story" @savyengine

I dare not to compare to Search Engines since it does not make sense. Both are two different beasts for different purpose. No, more than that, it's still only an infant. For the time being, the question is if it WORKS in many many more cases.

Yet, I am finding very interesting effects.

1. Good old articles
2. Lengthy articles

I frequently encounter good old articles semantically the same, which worth reading(lengthy). It is rarely happening on today's search engine. It is not surprising many articles can survive over times just like literature.

Thus, articles buried deep by today's Search Engine can be utilized well when users are curious about the subject. At the end of the day, Semantic Web may bring the lost Depth of the Web.

2 comments

Comment from: Yuriy Guskov [Visitor]
Did you try to look at your conclusions and ones from "Semantic Search: The Myth and Reality" from the different angle? It seems to me that the problem is search is not precise. Ironically, Google founders set it as final goal but still fails to deliver it. The problem which you describe with title and long queries are easily explainable: (a) natural language words are ambiguous, therefore, the more used the more ambiguous is result; (b) search is not aware about how this words are linked with each other. "Semantic Search: Myth and Reality" is talking about the same: (1) Google Statistical Algorithm fits only simple searches, (2) natural language requires analysis, (3) semantic search is closer to tasks, which solved by relational databases today. But is semantic search approach suits for humans? "Search Vs. Queury" shows that, even the first and the second approaches are different. In any case, we need ways to convert human-friendly searches into queries (or triples, or RDF, whatever you call it). Without it, semantic search and Semantic Web, in general, just duplicates tasks, which were solved for the last 50 years, though with brand-new format.



I think the situation may change only if we will have a sort of lightweight Semantic Web, which will be easy to use by everyone and flexible enough. How? Idea is simple: (a) using hypertext marked up with meaning (which will make it human-friendly), (b) using identifiers to make natural language less ambiguous and make formats fine-grained and open-ended, (c) using restricted set of simple rules to relate elements with each other. One of key points is non-URI decentralized routed identification. This way we may improve search results not because search is done through structured data (like triples), but because information itself is made more precise.



If you are interested, you can find more on this in my blog.
2011/07/21 @ 02:28
Comment from: MD [Member] Email
Hi, the different angle correctly reminded me of "Semantic Web". You're right it means to mark up HTML so it carries semantic information by itself.

I am not sure about markups, but easily come up with a couple of hard issues as follows:

1. only a few elaborate on markups
- maybe not if SEO works, right?
2. markups are easily abused by SEOs
- how to check abused markups?
3. even if 1 and 2 are solved, search interfaces become complex to describe Semantic intentions

Anyway, I better not use "Semantic", which is for marketing, but something for users...
2011/07/21 @ 02:51

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
PoorExcellent
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)

Internet People

It was two years ago that I launched a search engine that was based on the concept of "Chigai". It was highly experimental, but ended up as a miserable failure due to too much randomness.

Its underlying algorithm is based on Probabilistic Topic Models.

The algorithm is based on the following distinctive idea:

Topic models (e.g., Blei, Ng, & Jordan, 2003; Griffiths & Steyvers, 2002; 2003; 2004; Hofmann, 1999; 2001) are based upon the idea that documents are mixtures of topics, where a topic is a probability distribution over words.

And it says in "Conclusion":

Generative models for text, such as the topic model, have the potential to make important contributions to the statistical analysis of large document collections, and the development of a deeper understanding of human language learning and processing.

This is the attractive part.


the development of a deeper understanding of human language learning and processing

And it gets even more attractive if you consider a bit old, but recurring famous article by Nicholas Carr, "Is Google Making Us Stupid?".

In the article, the author took a story of Friedrich Nietzsche.

A friend of Nietzsche noticed a subtler effect after Nietzsche bought a typewriter and got used to blind typing. And he wrote to Nietzsche.

"Perhaps you will through this instrument even take to a new idiom,” the friend wrote in a letter, noting that, in his own work, his “‘thoughts’ in music and language often depend on the quality of pen and paper.

”You are right,” Nietzsche replied, “our writing equipment takes part in the forming of our thoughts.” Under the sway of the machine, writes the German media scholar Friedrich A. Kittler , Nietzsche’s prose “changed from arguments to aphorisms, from thoughts to puns, from rhetoric to telegram style.”

I think the claim here is that "A tool that expresses one's mind can form one's mind". So in the beginning, you may think you use a tool to express your mind, but at the end of the day, you find you are expressed by the tool.

Then, by extending the claim to today, you may find Google forms your mind. Facebook, Twitter. And unfortunatelly, its effect seems not much appreciated.

We may be seeing, the birth of Internet People.

Pages: 1 · 2

No feedback yet

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
PoorExcellent
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)

Restlet + GWT StockWatcher sample integration

How to host GWT Retrieving JSON Data example on Restlet?

Refer to the followings:

Serving Static Files
Getting Parameter Values

Now here's StockResource.

Code:

import java.util.Random;
import org.restlet.data.Form;
import org.restlet.data.Parameter;
import org.restlet.resource.Get;
import org.restlet.resource.ServerResource;
 
public class StockResource extends ServerResource {
  
    private static final double MAX_PRICE = 100.0; // $100.00
    private static final double MAX_PRICE_CHANGE = 0.02; // +/- 2%
  
  @Get
  public String browse(){
    StringBuilder sb = new StringBuilder();
    
      Random rnd = new Random();
 
      sb.append('[');
      Form form = getRequest().getResourceRef().getQueryAsForm();
      for (Parameter parameter : form) {
        if(parameter.getName().equals("q")){
            String[] stockSymbols = parameter.getValue().split(" ");
            for (String stockSymbol : stockSymbols) {
 
              double price = rnd.nextDouble() * MAX_PRICE;
              double change = price * MAX_PRICE_CHANGE * (rnd.nextDouble() * 2f - 1f);
 
              sb.append("  {");
              sb.append("    \"symbol\": \"");
              sb.append(stockSymbol);
              sb.append("\",");
              sb.append("    \"price\": ");
              sb.append(price);
              sb.append(',');
              sb.append("    \"change\": ");
              sb.append(change);
              sb.append("  },");
            }
        }
      }
      sb.append(']');
    
    return sb.toString();
  }
 
}

And Restlet application.

Code:

import org.restlet.Application;
import org.restlet.Restlet;
import org.restlet.resource.Directory;
import org.restlet.routing.Router;
 
public class MyApplication extends Application {
  
  private static final String ROOT_URI = "file:///PathToGWTWar/"; // e.g. file:///Users/Me/workspace/war
  
    @Override
    public Restlet createInboundRoot() {
        Router router = new Router();
        // Attach the resource.
        router.attach("/testFileUpload", MyResource.class);
        
        router.attach("/stockwatcher/stockPrices", StockResource.class);
        
        Directory dir = new Directory(getContext(), ROOT_URI);
        router.attach("/", dir);
        
        return router;
    }
 
}

In this case, I added to fileUpload sample. And war folder where GWT generates js and other static files is put at ROOT_URI.

The point is the order to attach resources to Router. Try to see what happens when you attach StockResource at last.

As always, Restlet allows us to get a task done easily.

Feedback awaiting moderation

This post has 1 feedback awaiting moderation...

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
PoorExcellent
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)

PaaS????????? ~ PaaS battle field map

No feedback yet

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
PoorExcellent
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)

PaaS???

??????????PaaS???????????????????????

Open PaaS (Java PaaS)
Google App Engine
VMforce: VMware + salesforce

Open Source PaaS
openstack: NASA + RackSpace

Proprietary PaaS
Windows Azure: Microsoft
Oracle On Demand: Oracle, private cloud

????????????????????????????Open PaaS??????????????Microsoft?Oracle?????????????????????????????

?????Open PaaS??????????VMforce?App Engine?????VMforce???????????????????????JPA?JDBC???????????

???????mixi?????Open PaaS?VMware???????????????????????

Versant?JPA/JDO???????????????????????????????

???????????????
?????????PaaS??ITPro?

No feedback yet

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
PoorExcellent
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)

??????????

?????????????Tata????????CRL?Versant??????????????????HPC??????????????????????????????
?

??????????????????????
?
???????????????????

????????????????????????????????????????????????????????
?
?
???????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

??????????????????????

Twitter??????????????????????????????????????

????????????????????????????????????????????????
?
?
?????????????????saleseforce.com, Google App Engine?Amazon EC2??????SaaS?PaaS?IaaS?????????????????IT????????????????????????????????????????????SaaS?IaaS?????????PaaS????????????????????????????????????????????CRL?PaaS?????

?????????????????????????????????????????????????????????????????????????????????????????IaaS???????????????????????SaaS??????????????????

???????????????????????????????????????????
?
?
????????????????????????????????????????IT?????????????????????

?????????????????????????

No feedback yet

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
PoorExcellent
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)

??????????????????

Intel Nehalem????????CPU?????????Pentium???????????????????????????????????????????????????????CPU?????????????????????????????????????????CEDEC??????????????????????????????????

Amdahl's law?????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

?????????????????????????????????????????????????????????????

?????????????????????????????????????????????????????????????????????????????????????????????Amdahl's law??????????????????????????????

????Performance Analysis Guide for Intel® CoreTM i7 Processor and Intel® XeonTM 5500 processors??????????????????????CPU??????????Execution Unit????????????????????????????????????????????ROB??????????????????????????????????RS???????????????????????????????????????????????????????????????????????????????????????????????????????

?????????????????????????????????????????????????

Hadoop?Map Reduce????????????????????????Mapper?Reduce??????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

???????????????GPU?Shader????????????????

???????????????????????????CPU?????????????????????????????????????

?????????????????????????????????????????

No feedback yet

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
PoorExcellent
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)

Priority management of Hadoop MapReduce

Many MapReduce jobs are now in progress. Then, one critical job arrived. How to get this done?

Priority management is one of the hardest part to do in Java.

Under the design of my prev post, StreamingDBInputFormat for low latency MapReduce, it is easy.

Just changing data feeding rates of currently executing jobs. Each mapper just waits until new data feed is given.

While reducers are under control through StreamingDBOutputFormat data storing rates.

That's how you get some control over priority.

This is effective for such a case that any job-based scheduler can't handle.

Ability to pause/resume tasks

No feedback yet

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
PoorExcellent
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)

StreamingDBInputFormat for low latency MapReduce

There's a sub class of InputFormat, called DBInputFormat, when you want to load data from RDBMS and do MapReduce jobs.

But it is not practical because of its complexity. It does:

1. select count(*) from A
2. select * from A where X offset M limit N
* N = TOTAL NUMBER OF THE RESULT / TOTAL NUMBER OF MAPPER

The problem is that both of the complexity of 1 and 2 are O(n). For 1, n is the number of the result, and M for 2. And 2 has to get repeated multiple times as the total number of mappers. Apparently, this is not a practical design.

Ideal design is streaming.

StreamingDBInputFormat#getSplits tells the server to get started, immediately returns List<inputsplit>, and each of which is delivered to respective mapper. Then, each mapper create RecordReader with the respective InputSplit, and calls nextKeyValue to last.

No explicit waits on the surface.

Under the hood, RecordReader prefetches a batch of records asynchronously from its server. On the server, there's a buffer which is accessed by two threads. One is to respond to user requests. And another is to feed data off from disks.

Feedback awaiting moderation

This post has 1 feedback awaiting moderation...

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
PoorExcellent
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)

Hadoop ~ Hybrid SIMD system

Hadoop is a hybrid SIMD system that consists of MIMD computers.

Now it is rambling around in my head...

MIMD is one that everyone is familiar with on today's programming. SIMD is another like one on GPU computing.

Code:

void main(void)
{
   float gray = dot(gl_Color.rgb, vec3(0.299, 0.587, 0.114));
   gl_FragColor = vec4(gray, gray, gray, 0);
}

This is a simple fragment(pixel) shader that produces gray-scale image. It is a tiny program that is executed for every single pixel in parallel.

Just imagine to do this on CPU. Even the high-end CPU like Intel Xeon 7500 has only 2 virtual cores x 8 physical cores = 16 logical cores(threads). Surprise, X7560 costs more than $10,000 as of today.

While today's decent graphics card equips NVIDIA GeForce GTX 460 costs about $200. It has 336 cores. 1/3 of clock speeds.

Batch processing. Data conversion. There're many that SIMD can deal with a lot better than CPU?

But, SIMD has constraints. GPU can handle only 8bits float in various forms like vectors and matrices. And a task executed in shaders can not depend on other tasks.

That's how Hadoop comes into play. Hadoop MR provides SIMD-like framework with MIMD computers. That enables massive parallelism and throughput while allowing user defined computations. Thus, Hadoop is a hybrid SIMD system that consists of MIMD computers.

There're two limitations. Hadoop can't do:

1. dependent task in parallel (multiple passes required)
2. finish tasks faster than minimum Hadoop latency

But anyway, Hadoop is such an elegant system that supplement today's MIMD computing system.

No feedback yet

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
PoorExcellent
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)

?????????????Hadoop MapReduce

?Hadoop????????????????

???????????????????????????????????????????

?????? -> FileInputFormat -- MR --> FileOutputFormat --> ??????

?????????????????????????????????????

????????Key/Value????????????????Key?Value??????????????????????Key????????????????????????????????????????????

??????API????????????????

Collection<U> doMapReduce<K, V>(Collection<T> input);

???T?????????????K/V????????????MapReduce?????U?????????????????????

??????????MapReduce????????????????????????????????????????????????

??????????????????????????

?????? -> FileInputFormat -- MR --> MySQLDBOutputFormat -> MySQL
MySQL -> MySQLDBInputFormat -- MR --> TeradataOutputFormat -> Teradata
Excel -> ExcelInputFormat -- MR --> SharePointOutputFormat -> SharePoint
SharePoint -> SharePointInputFormat -- MR --> OracleDBOutputFormat -> Oracle
Oracle -> OracleDBInputFormat -- MR --> TeradataOutputFormat -> Teradata

????????????Excel?????????MySQL? SharePoint? Oracle???????????BI??Teradata????????????????MR???????????????????????????????MR??????????????????????????????????????????MR?????????????????????????????????????????????????????????Teradata??????????????????????

Hadoop???????????????????????SIMD???????????Intel?CPU?????????????????????????????????GPU??????????????????????????CPU???????GPU??????????????????SIMD???????????????????????????????????

???????????????????Hadoop????MapReduce????????????????????????????????????????????????????

No feedback yet

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
PoorExcellent
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)

GPU Index

I made some considerations on this topic, how this idea could be a reliable implementations.

Here's an image how it works.

And its data structure.

No feedback yet

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
PoorExcellent
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)

Billions of query processing with GPU

I stop explaining some backgrounds why GPU, and show the tremendous power.

Here, I take an example of GeoCoding. Suppose you need to design a service that can offer reverse GeoCoding, which translates longitude/latitude into something meaningful.

I suggest you to design yourself before moving down.

Fig. 1

Fig. 2

Okay, so now my turn. I use images like the above.

Fig. 1 is RGB colors in 2D. Red is x axis, Green is y axis, at every 16 steps. Along x-axis, (0, 0), (16, 0), (32, 0), ..., (240, 0). Along y-axis, (0, 0), (0, 16), ..., (0, 240). And at each location like (64, 192), there's a tile that consists of 16 sub-tiles, and that is Blue. Blue is placed like the followings:

192 208 224 240
128 144 160 176
064 080 096 112
000 016 032 048

So this can represent 16 x 16 x 16 = 4096 colors. This is small, but can represent 256 x 256 x 256 x 256 = 4,294,967,296 colors in a similar way. I suggest not to try by your hand, but consider to use computer, though.

And the next image, Fig. 2, is in the same format, 16 x 16 x 16. But as you see, it's not the same as Fig. 1.

Its every pixel encodes something meaningful in 32bits color. Maybe IP address, customer id, social security number, etc. The point is if you can get coordinates in 2D RGB.

If you can, now you can put something meaningful in 32bits color at each pixel. And that is what you see in Fig. 2.

Okay, now let's go back to the question, reverse GeoCoding. For example, Tokyo tower is located at (35.658704, 139.745408), that is (35 39' 31'' 334''', 139 44' 43'' 469'''). And it can be encoded like (35 168 161 1, 139 190 211 1). The second and third component are base 256 instead of 60. The 4th component indicates if the first component is positive. But here we use 2D RGB space, so has no Alpha, the 4th, channel. Instead, let's just use two images. 0 and 1.

Thus, you get a conversion of longitude/latitude into two 2D RGB coordinates. And it works as a combination of longitude and latitude, so 2 x 2 = 4 in total. It is a 3D cube consists of two 2D RGB coordinates of longitude and latitude.

The next thing you need to do is to put something meaningful at each pixel(location). Here, let's assume we have a location object, which includes its id, address, postal code, and possibly some references to track address changes. Such an id shall be larger than 256 that fits in a single component. So let's use 4 components as 32bits integer, and which is encoded at corresponding pixels.

Now let's make sum. 256 x 256 x 256 x 256 pixels, and each of which is represented as 32 bits integer. It's about 17GB. That's not small, but every single location is represented in this space. It's easy to trim it down to a couple of GB for your operational unit like by countries.

Now you get it clear, how it works?

But wait, it could get done in CPU with 4 dimensional arrays that holds 32bits integer?

Here comes the tremendous power of GPU.

Reverse GeoCoding requests are queued for each frame. Frame is a unit to draw, usually on a screen. But off-screen drawing is also possible and commonly used. Each request is converted to a color, and drawn as a pixel on a queue image(texture in GPU terms).

And the queue image is rendered by GPU. First, the image size is clipped down to large enough shape to cover those requests. Then, each pixel is processed concurrently by hundreds of processing units in GPU. Each pixel is mapped to the location id 3D cube, and render the mapped color as an output.

Thus, you get an output image that encodes location ids.

Suppose the queue size is a realistic size like 2048 x 2048 = 4,194,304. Each holds 32bits integer, so the size is about 16MB. Reasonable, right? But remember, this can represent 4 millions of query. This means, with 250fps, a billion of query is processed in a second. Not bad, eh?

Remember that you can switch and update the lookup image, location id in this case, even for every frame. And in fact, GPU is a lot cheaper than expensive high-end CPU chip like Xeon 7500.

Also note that GPU processing is for parallel processing. It is CPU for dependent processing. Just use as needed.

No feedback yet

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
PoorExcellent
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)

ssptr - Scalable Smart Pointer

ssptr : http://code.google.com/p/ssptr/

This is a project that hosts Scalable Synchronization with Smart Pointer.

No feedback yet

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
PoorExcellent
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)

Template friend

I have a smart pointer template class.

Code:

template <
  class T,
  template <class> class Storage
  >
class SSP{
  friend class SSPService<T, Storage>;
}

And its service.

Code:

template<
  class AbstractPointee,
  template <class> class Storage>
class SSPService{
}

SSPService instance is a singleton that offers various services like factories, locks, and retrievals. So, it's not convinient to limit its use only for a single hierarchy of AbstractPointee. That's how, the service has methods like the followings.

Code:

template<class U, template <class> class S>
bool generic_lock(SSP<U, S>& ssp){}
 
template<class U, template <class> class S>
bool generic_unlock(SSP<U, S>& ssp){}

Then, now any method invocations for non-AbstractPointee class are not considered a frind of SSP anymore. So it fails to get compiled...

Thanks, there is a solution. I don't know why. Someone, let me know.

Code:

template <
  class T,
  template <class> class Storage
  >
class SSP{
 
  template<class U, template <class> class S> friend class SSPService;
}

No feedback yet

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
PoorExcellent
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)

Segmentation Fault In Recurisve Structure With Smart Pointer

A singly linked list implemented with smart pointer, which has a reference to next, can easily cause Stack Overflow at its destruction because the destructor of the next, and the next, and the next, are called recursively.

Here's the code.

Code:

#include <boost/shared_ptr.hpp>
#include <iostream>
 
#define RECURSIVE
 
class IntSLLNode {
 
private:
  int index;
  boost::shared_ptr<IntSLLNode> next;
 
  IntSLLNode(int ix){
    index = ix;
  }
 
public:
  int hasNext(){return next != 0;}
  boost::shared_ptr<IntSLLNode>& moveNext(){return next;}
  int getIndex(){return index;}
  friend class IntSLList;
};
 
class IntSLList {
private:
  boost::shared_ptr<IntSLLNode> head, tail;
public:  
  ~IntSLList(){
#ifndef RECURSIVE
    while(head != tail){
      // replace
      head = head->next;
    }
#endif
  }
 
  void addNode(int index){
 
    boost::shared_ptr<IntSLLNode> n(new IntSLLNode(index));
    if(head == 0){
      head = tail =n;
    }
    else
    {
      tail->next = n;
      tail = n;
    }
 
  }
 
  boost::shared_ptr<IntSLLNode>& getHeadNode(){return head;}
};
 
int main(){
  IntSLList list;
  // the number to reproduce crush may vary, but seems to be stable
  // On my Core2Duo box, it is around 65000
  // Here I dare to use a value large enough
  for(int i=0; i<1000000; i++){
    list.addNode(i);
  }
  // crush at destruction
  std::cout << "CRUSH?" << std::endl;
}


This works by commenting out the line 4.

For your own smart pointer, the point is line 7 in the following code.

Code:

SmartPointer& operator=(const SmartPointer& src){
    if(this != &src){
      if(counter->decrement() == 0){
        delete counter;
        //src.pointee_ = 0; // NOT DESTRUCTIVE
        counter = src.counter;
        counter->increment(); // increment before deleting pointee_, which might have a reference to src. This results in a temporal no referencer, and triggers recursive destruction
        if(pointee_) delete pointee_;
        pointee_ = src.pointee_;
      }else{
        pointee_ = src.pointee_;
        //src.pointee_ = 0; // NOT DESTRUCTIVE
        counter = src.counter;
        counter->increment();
      }
    }
    return *this;
  }

If you call delete pointee_ before incrementing the counter of src, it triggers recursive destruction in the above case.

For a large graph cached in memory, it's good to have some kind of protection or grouping methods.

No feedback yet

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
PoorExcellent
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)

So far, so bad.

I found a couple of things. A couple of bad things.

1. shared_ptr is not thread safe * it was stack overflow by recursive destruction

Here's the code to reproduce. The reason follows.

SPIntSLList.h

Code:

#ifndef INT_SINGLY_LINKEDLIST
#define INT_SINGLY_LINKEDLIST
 
#include <boost/shared_ptr.hpp>
 
class SPIntSLLNode {
 
private:
  int index;
  long int payload[4];
  boost::shared_ptr<SPIntSLLNode> next;
 
  SPIntSLLNode(int ix){
    index = ix;
  }
 
public:
  int hasNext(){return next != 0;}
  boost::shared_ptr<SPIntSLLNode>& moveNext(){return next;}
  int getIndex(){return index;}
  friend class SPIntSLList;
};
 
class SPIntSLList {
private:
  boost::shared_ptr<SPIntSLLNode> head, tail;
public:  
  ~SPIntSLList();
 
  void addNode(int index);
  boost::shared_ptr<SPIntSLLNode>& getHeadNode(){return head;}
};
 
#endif

SPIntSLList.cpp

Code:

#include <iostream>
#include <time.h>
#include "SPIntSLList.h"
 
SPIntSLList::~SPIntSLList(){
  // no need iterative deletion
}
 
void SPIntSLList::addNode(int index){
  boost::shared_ptr<SPIntSLLNode> n(new SPIntSLLNode(index));
  if(head == 0){
    head = n;
    tail = n;
  }
  else
  {
    tail->next = n;
    tail = n;
  }
}
 
int main(){
 
  std::cout << "sizeof(SPIntSLLNode) = " << sizeof(SPIntSLLNode) << std::endl;
 
  clock_t start, end;
 
  std::cout << "creating huge large SPIntSLList" << std::endl;
  start = clock();
  SPIntSLList huge;
  std::cout << "creating huge SPIntSLListNode" << std::endl;
  // 157160: segmetation fault boarder when load was evaluated
  for(int i=0; i<1000000; i++){
    huge.addNode(i);
    // n, temporal PSP is destructed here
  }
  end = clock();
  std::cout << "creation time: " << (end-start) << " clocks(" << (double) (end-start)/CLOCKS_PER_SEC << " sec)" << std::endl;
 
  std::cout << "iterating over huge SPIntSLListNode" << std::endl;
  start = clock();
  // the following code calls copy constructor instead of assignment operator
  boost::shared_ptr<SPIntSLLNode> current = huge.getHeadNode();
  while(current->hasNext()){
    if(current->getIndex() % 10000 == 0)
      std::cout << "index:" << current->getIndex() << std::endl;
    current = current->moveNext();
  }
  end = clock();
  std::cout << "iteration time: " << (end-start) << " clocks(" << (double) (end-start)/CLOCKS_PER_SEC << " sec)" << std::endl;
  std::cout << "completed huge SPIntSLList" << std::endl;
}

Then, a wonderful gift.

Segmentation fault (core dumped)

And here's the stack trace.

Code:

Program terminated with signal 11, Segmentation fault.
#0  0x0000000000401227 in boost::detail::atomic_exchange_and_add (pw=Cannot access memory at address 0x7fff2ee2dff8
)
    at /home/takenori/workspace_native/Boost/boost/smart_ptr/detail/sp_counted_base_gcc_x86.hpp:36
36      {
(gdb) bt
#0  0x0000000000401227 in boost::detail::atomic_exchange_and_add (pw=Cannot access memory at address 0x7fff2ee2dff8
)
    at /home/takenori/workspace_native/Boost/boost/smart_ptr/detail/sp_counted_base_gcc_x86.hpp:36
#1  0x000000000040137b in boost::detail::sp_counted_base::release (
    this=0xd921130)
    at /home/takenori/workspace_native/Boost/boost/smart_ptr/detail/sp_counted_base_gcc_x86.hpp:143
#2  0x000000000040142b in boost::detail::shared_count::~shared_count (
    this=0xd9210c0, __in_chrg=<value optimized out>)
    at /home/takenori/workspace_native/Boost/boost/smart_ptr/detail/shared_count.hpp:217
#3  0x00000000004014be in boost::shared_ptr<SPIntSLLNode>::~shared_ptr (
    this=0xd9210b8, __in_chrg=<value optimized out>)
    at /home/takenori/workspace_native/Boost/boost/smart_ptr/shared_ptr.hpp:163

They have the same mistake as mine.

Code:

~shared_count() // nothrow
    {
       if( pi_ != 0 ) pi_->release();
#if defined(BOOST_SP_ENABLE_DEBUG_HOOKS)
        id_ = 0;
#endif
    }

After "pi_ != 0", pi is not guarranteed to stay valid. It is possible to delete after its evaluation.

The last two referencing threads can enter release().

Code:

void release() // nothrow
    {
        if( atomic_exchange_and_add( &use_count_, -1 ) == 1 )
        {
            dispose();
            weak_release();
        }
    }

Then, one comes late experiences Segmentation Fault on $use_count_.

2. Atomic types and operations are not as fast as RCU(read copy update)

I expected this(from the presentation by liburcu).

But the performance of lock-free reference counting smart pointer is as slow as mutex, on my dual core linux box.

The difference might be less significant because it's only dual core on 1 CPU. But I doubt it.

The test was like this.

Code:

#include "TsPersistentSmartPointer.h"
#include <iostream>
#include <time.h>
 
#define MAX_THREADS  100
#define ITERATION  1000
 
class Foo{
public:
  Foo(int id) : id_(id){std::cout << id_ << ":constructer called" << std::endl;}
  ~Foo(){std::cout << id_ << ":destructer called" << std::endl;}
 
  void SayHello(){
    std::cout << "Hello, this is " << id_ << std::endl;
  }
private:
  int id_;
};
 
void *copy(void *arg){
  // 1. copy, adding a reference
  TsPersistentSmartPointer<Foo> psp = *((TsPersistentSmartPointer<Foo>*) arg);
  TsPersistentSmartPointer<Foo> pspCopied = psp;
 
  // 2. removing a reference when exiting
}
 
int main(){
  
  TsPersistentSmartPointer<Foo> psp(new Foo(3));
  pthread_t threads[MAX_THREADS];
  int rt;
 
  clock_t start, end;
  std::cout << "Executing..." << std::endl;
  start = clock();
  for(int i=0; i<ITERATION; i++)
  {
    for(int t=0; t<MAX_THREADS; t++){
      rt = pthread_create(&threads[t], NULL, copy, (void*) &psp);
      if(rt != 0){
        std::cout << "Failed to create thread " << t << std::endl;
      }
    }
 
    for(int t=0; t<MAX_THREADS; t++){
      rt = pthread_join(threads[t], NULL);
      if(rt != 0){
        std::cout << "Failed to join thread " << t << std::endl;
      }
    }
  }
 
  end = clock();
  std::cout << "execution time: " << (end-start) << " clocks(" << (double) (end-start)/CLOCKS_PER_SEC << " sec)" << std::endl;
}

This is heavily read intensive test. But each read on reference counting SmartPointer requires incrementing the counter, and decrementing when exiting. So it is deadly heavy write intensive test. It would run as low as Mutex in the figure above.

liburcu is implemented with architecture specific assembly code. But what's the difference? That's what I ought to find...

No feedback yet

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
PoorExcellent
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)

Lock-free reference counting smart pointer

With GCC 4.4, I tested how atomic types and operations of c++0x work by implementing lock-free reference counting smart pointer.

TsPersistentSmartPointer.h

Code:

#ifndef PERSISTENT_SMART_POINTER
#define PERSISTENT_SMART_POINTER
 
#include <cstdatomic>
 
class Counter {
public:
  Counter():i(1){}
  ~Counter(){}
 
  void increment(){
    i.fetch_add( 1, std::memory_order_relaxed );
  }
  int decrement(){
    return i.fetch_sub( 1, std::memory_order_relaxed );
  }
  void reset(){
    i.store(0, std::memory_order_relaxed);
  }
private:
  std::atomic_uint i;
};
 
template <class T>
class TsPersistentSmartPointer {
public:
  explicit TsPersistentSmartPointer(T* pointee) : pointee_(pointee), counter(new Counter()){
  }
  TsPersistentSmartPointer() : pointee_(0), counter(new Counter()){
  }; // for null construction, has to be copied from somewhere else
  ~TsPersistentSmartPointer(){
    if(counter->decrement() == 0){
      delete counter;
      delete pointee_;
    }
  }
 
  T& operator*() const{
    return *pointee_;
  }
 
  T* operator->() const{
    return pointee_;
  }
 
  // assignment
  TsPersistentSmartPointer(const TsPersistentSmartPointer& src){
      pointee_ = src.pointee_;
      counter = src.counter;
      counter->increment();
      //src.pointee_ = 0; // NOT DESTRUCTIVE
  }
  TsPersistentSmartPointer& operator=(const TsPersistentSmartPointer& src){
    if(this != &src){
      if(counter->decrement() == 0){
        delete counter;
        if(pointee_) delete pointee_;
      }
      pointee_ = src.pointee_;
      //src.pointee_ = 0; // NOT DESTRUCTIVE
      counter = src.counter;
      counter->increment();
    }
    return *this;
  }
  TsPersistentSmartPointer& operator=(T* pointee){
    if(pointee_ != pointee){
      if(pointee_)
        delete pointee_;
      pointee_ = pointee;
      counter->reset();
    }
    return *this;
  }
 
  // equality
  inline friend bool operator==(const TsPersistentSmartPointer& psp, const T* pointee){
    return psp.pointee_ == pointee;
  }
  inline friend bool operator==(const T* pointee, const TsPersistentSmartPointer& psp){
    return pointee == psp.pointee_;
  }
  // inequality
  inline friend bool operator!=(const TsPersistentSmartPointer& psp, const T* pointee){
    return psp.pointee_ != pointee;
  }
  inline friend bool operator!=(const T* pointee, const TsPersistentSmartPointer& psp){
    return pointee != psp.pointee_;
  }
 
private:
  T* pointee_;
  Counter* counter;
};
 
#endif

Atomic operations are implemented inside Counter class. The point is the following part, the destructor of TsPersistentSmartPointer:

Code:

if(counter->decrement() == 0){
      delete counter;
      delete pointee_;
    }

At first, I did the following, which resulted in Segmentation Fault under a million of concurrent destructions.

Code:

counter->decrement();
    if(counter->get() == 0){
      delete counter;
      delete pointee_;
    }

The problem could happen when the last two enters concurrently.

Thread1: counter->decrement(); // i = 1
Thread2: counter->decrement(); // i = 0
Thread2: counter->get() == 0 // TRUE
Thread1: counter->get() == 0 // TRUE!
Thread1: delete counter;
Thread2: delete counter; // OUCH!

Here're some more code for testing.

TsPersistentSmartPointer.cpp

Code:

#include "TsPersistentSmartPointer.h"
#include <iostream>
 
class Foo{
public:
  Foo(int id) : id_(id){std::cout << id_ << ":constructer called" << std::endl;}
  ~Foo(){std::cout << id_ << ":destructer called" << std::endl;}
 
  void SayHello(){
    std::cout << "Hello, this is " << id_ << std::endl;
  }
private:
  int id_;
};
 
int main(){
 
  Foo f0(0);
  Foo *f1 = new Foo(1);
  Foo *f2 = new Foo(2); // memory leak
  Foo *f3 = new Foo(3);
  TsPersistentSmartPointer<Foo> psp(0);
 
  // copy
  psp = f3;
 
  // dereference and indirection
  psp->SayHello();
  (*psp).SayHello();
 
  // equality
  std::cout << "equality works? :" << ((psp == f3) & (f3 == psp)) << std::endl;
 
  // copy constructor
  TsPersistentSmartPointer<Foo> pspCopied(psp);
  // dereference and indirection
  pspCopied->SayHello();
  (*pspCopied).SayHello();
 
  // see how destructors are called
  delete f1;
 
  return 0;
}

TsPSPIntSLList.h

Code:

#ifndef INT_SINGLY_LINKEDLIST
#define INT_SINGLY_LINKEDLIST
 
#include "TsPersistentSmartPointer.h"
 
class TsPSPIntSLLNode {
 
private:
  int index;
  long int payload[4];
  TsPersistentSmartPointer<TsPSPIntSLLNode> next;
 
  TsPSPIntSLLNode(int ix){
    index = ix;
  }
 
public:
  int hasNext(){return next != 0;}
  TsPersistentSmartPointer<TsPSPIntSLLNode>& moveNext(){return next;}
  int getIndex(){return index;}
  friend class TsPSPIntSLList;
};
 
class TsPSPIntSLList {
private:
  TsPersistentSmartPointer<TsPSPIntSLLNode> head, tail;
public:  
  ~TsPSPIntSLList();
 
  void addNode(int index);
  TsPersistentSmartPointer<TsPSPIntSLLNode>& getHeadNode(){return head;}
};
 
#endif

TsPSPIntSLList.cpp

Code:

#include <iostream>
#include <time.h>
#include "TsPSPIntSLList.h"
 
TsPSPIntSLList::~TsPSPIntSLList(){
  // no need iterative deletion
}
 
void TsPSPIntSLList::addNode(int index){
  TsPersistentSmartPointer<TsPSPIntSLLNode> n(new TsPSPIntSLLNode(index));
  if(head == 0){
    head = n;
    tail = n;
  }
  else
  {
    tail->next = n;
    tail = n;
  }
}
 
int main(){
 
  std::cout << "sizeof(TsPSPIntSLLNode) = " << sizeof(TsPSPIntSLLNode) << std::endl;
 
  clock_t start, end;
 
  std::cout << "creating huge large TsPSPIntSLList" << std::endl;
  start = clock();
  TsPSPIntSLList huge;
  std::cout << "creating huge TsPSPIntSLListNode" << std::endl;
  // 157160: segmetation fault boarder when load was evaluated
  for(int i=0; i<1000000; i++){
    huge.addNode(i);
    // n, temporal PSP is destructed here
  }
  end = clock();
  std::cout << "creation time: " << (end-start) << " clocks(" << (double) (end-start)/CLOCKS_PER_SEC << " sec)" << std::endl;
 
  std::cout << "iterating over huge TsPSPIntSLListNode" << std::endl;
  start = clock();
  // the following code calls copy constructor instead of assignment operator
  TsPersistentSmartPointer<TsPSPIntSLLNode> current = huge.getHeadNode();
  while(current->hasNext()){
    if(current->getIndex() % 10000 == 0)
      std::cout << "index:" << current->getIndex() << std::endl;
    current = current->moveNext();
  }
  end = clock();
  std::cout << "iteration time: " << (end-start) << " clocks(" << (double) (end-start)/CLOCKS_PER_SEC << " sec)" << std::endl;
  std::cout << "completed huge TsPSPIntSLList" << std::endl;
 
  // followings are called automatically, but here explicitely for measurement
  //current.~TsPersistentSmartPointer();
  //huge.~TsPSPIntSLList();
 
}

Please note no code is required for the destructor of TsPersistentSmartPointer.

So what about reference linking?

No. It can't be lock free if with doubly linked list. Two links to prev and next have to be updated at the same time, otherwise one could see inconsistent stete.

Performance?

I have measured performance with clock() function defined in "time.h" as in TsPSPIntSLList.cpp.

Raw Pointer for single thread
Creation: 0.01 sec
Iteration: 0.02 sec

SmartPointer for single thread
Creation: 0.27 sec
Iteration: 0.05 sec

Lock-free SmartPointer
Creation: 0.45 sec
Iteration: 0.12 sec

In time, there's a significant overhead over plain raw pointer implementation.

Also, a couple of CPU cache statistics collected by vaigrind.

Raw Pointer for single thread
Instruction Read: 409,523,865
Data Read: 115,388,899
L1 Data Read Miss: 2,013,421
L2 Data Read Miss: 2,005,354
Data Write: 69,133,007
L1 Data Write Miss: 1,002,531
L2 Data Write Miss: 1,001,018

Smart Pointer for single thread
Instruction Read: 960,359,665
Data Read: 284,618,659
L1 Data Read Miss: 2,049,844
L2 Data Read Miss: 2,041,472
Data Write: 172,410,809
L1 Data Write Miss: 1,916,330
L2 Data Write Miss: 1,914,709

Lock-free Smart Pointer
Instruction Read: 1,161,535,161
Data Read: 317,391,770
L1 Data Read Miss: 2,013,687
L2 Data Read Miss: 2,005,198
Data Write: 235,134,100
L1 Data Write Miss: 2,002,377
L2 Data Write Miss: 2,001,012

(pthread Mutex SmartPointer)
Creation: 0.51 sec
Iteration: 0.14 sec
Instruction Read: 1,149,615,787
Data Read: 311,412,927
L1 Data Read Miss: 3,014,567
L2 Data Read Miss: 3,005,312
Data Write: 237,144,326
L1 Data Write Miss: 3,002,797
L2 Data Write Miss: 3,001,194

Please note L1 and L2 cache read miss stays around as total number increases.
(except mutex: mutex calls another million of malloc)

It is not free lunch. But for read intensive applications, how attractive it is!

No feedback yet

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
PoorExcellent
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)

A Solution To Von Neumann bottleneck

Here's a question.

Do you have a problem about memory performance that might have been observed as CPU or I/O bottleneck before?

In other words,

Where's your interests about performance bottleneck today? Memory?

Even like this:

Do you use any Virtualization technology?

If yes, you are likely suffering from Von Neumann bottleneck, which is described as follows in Wikipedia:

The separation between the CPU and memory leads to the von Neumann bottleneck, the limited throughput (data transfer rate) between the CPU and memory compared to the amount of memory.

This is not something new, but is considered having been accelerated by increasing number of CPU cores and size of RAM, while having almost the same amount of cache and the same bus size between memory and CPU.

-

Intel released Xeon 7500 series(e.g. X7560) recently. It has NUMA architecture, which has respective dedicated memory bank and bus to each CPU. OS kernels have improved memory management system like SLAB to achieve better cache hit rate from CPU point's of view.

But, inherently, memory allocation is a responsibility of a programmer. A compiler can optimize instructions, but not data manipulated. Today's CPU can predict repeated pattern, but which is extremely simple like a jump.

So, I could say, the third factor, that has accelerated the bottleneck, is a programmer.

-

Object Database is a technology developed long time ago. And its benefit has been considered development efficiency through maximum utilization of Object Oriented Programming Language.

Recently, I found one possible huge benefit of Object Database that has not been addressed before.

ODBMS returns objects that are directly referenced by a user against a query. While returned values are copied to user objects with RDBMS. In this sense, ODBMS provides memory management system. While it is up to a programmer with RDBMS.

So, here's the potential that ODBMS can be a solution to Von Neumann bottleneck.

-

From CPU's point of view, cache is nothing but bringing data up to L1 by the time CPU process it.

L1 <- L2
(L2 <- L3)
L2 <- RAM
(TLB lookup)
(physical address <- virtual address)
(RAM <- I/O: disk or network)

So, I could say, ODBMS is a cache system.

This can explain up to 100x times faster performance on complex data structure like Graph.

Such a redefinition might unveil the true power of ODBMS.

Feedback awaiting moderation

This post has 1 feedback awaiting moderation...

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
PoorExcellent
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)

Object Database As Memory Management System

Object Database has survived for no particular reason. There's an obvious case that Object Database overwhelms other database systems both in performance and productivity.

It is where non-predeterministic query is issued, often in an algorithm.

For example, any algorithms on a Graph are good examples. Suppose you have a practical, optimized shortest path search algorithm. Which verteces you need to retrieve?

Unknown.

Until its previous vertex is evaluated.

So, in this case, it takes too long if it gets done straightforward. With RDBMS, you need to issue a query for every single vertex. That involves heavy tasks like disk and network access. While Object Database can prefetch/cluster them according to their relations. That will save those two heavy tasks down to 1/(prefetch_size * cache_hit_rate). Of course it gets worse if cache_hit_rate is too low, though.

The benefit is significant. In some particular cases. That's why 10 out of 10 US telecom uses Versant.

So, now I am wondering if we can extend the benefit.

Last year, in game industry, there was a trend called Data Oriented Design.

The Latency Elephant.

Until recently, we didn't doubt the best strategy was to bring data onto memory. It is almost the fastest. But it is not true anymore.

It looks like the same thing as described above between client and server happens between memory and CPU. When you zoom in by 1000x(compared to client/server), you'll see there's a long way.

The solution is obvious. To put related objects together on a single page so that cache miss won't happen. The result is shown in the same way, 1/(prefetch_size * cache_hit_rate), compared to the worst scenario(every object is on a different page from each other).

It is obvious, but not easy. OS hids physical representation and only present logical representation both in file and memory allocation. So, you have to roll the abstraction carpet, and go down.

But what about sending Object Database down there, and let it manage. It must be effective not only for existing users, but also for Memory Management System in general.

No feedback yet

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
PoorExcellent
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)

What character do you think of about "A.I."?

Obviously, it is monotonous. One, single profile. Like "Star Wars Episode II". Those clones.

Also, often questioned if it has a mind or not. In Japan, there're many that have minds to help people against evils.

Oops, is it a robot? Robot isn't it "A.I."?

Definition.

"the study and design of intelligent agents," where an intelligent agent is a system that perceives its environment and takes actions that maximize its chances of success.

Okay, so "A.I." has a goal by definition, lives to achieve a goal. This sounds, nothing to do with "intelligence". Or, "intelligence" is an ability to achieve a goal!?

Again, the definition.

Intelligence is an umbrella term describing a property of the mind comprehending related abilities, such as the capacities for abstract thought, reasoning, planning, problem solving, speech, and learning.

A property of the mind. But it should not be of oneself, but of others. It is "others" that think someone as "INTELLIGENT".

So, it doesn't matter if it has mind or is clever and perfect.

Is it safe to redefine "INTELLIGENT" as "RELIABLE"? You can count on it. Then, it is "INTELLIGENT". Like us for human, it is not entirely, but on a specific field/situation. "RELIABLE" needs a small print.

No, but are we "RELIABLE"? No. I could label us "NOT RELIABLE".

Yet, we can accept it. Then, we can get along with.

We expect something perfect. But nothing. Plus and Minus. Up and Down. I guess, it's easier to get along with such a "NOT RELIABLE". Someone perfect must be empty or too disciplined.

So, I guess we have an ability to recognize a certain character. It is a key. Then, we can get along with. A "character". It seems to be the realistic and helpful direction of "A.I.".

No feedback yet

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
PoorExcellent
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)

Infinite loop on a finite state machine

Finite State Machine and Behavior Tree are often used in game AI. In game, designer asks expected actions to give users expected experience. So, generic offerings like machine learning is somewhat useless because it lacks control.

But there IS a difficulty in decision making. What action to take at when? There're many possibilities.

State --> Action --> Path Finding, Movement, etc.

A1 A2 A3 A4
PC
S1 S2 S3 S4 S5 S6 S7 S8

a1 a2 a3 a4
NPC
s1 s2 s3 s4 s5 s6 s7 s8

Action: A, a
State: S, s
* current in bold

Actions may be ordered by behavior trees.
States may be directed.
In FSM, there's only one state at a time, and is designed as directed graph.


So, in this case,
Decision Making is done as "Pick up an action or a chain of actions based on current states". And what leads to better decision making?

In FSM, for example in FPS, state is simple. In combat or not. Whenever NPC become aware of PC, it gets in combat mode to kill you. But the action could be complicated. Which path to get closer? With obstacles to hide? Can work as a team? Evaluation function? More variables gets you less control. If NPC is damn zombie, sending more zombies with damn pathfinder makes sense. But if it is not, it seems another layer of state machine is required. And, that would require another, and another, and another...

It may not help, but think about ourselves. We, human, does this in three stages.

-Define Goal
Often, defining goal is not easy. It is not plain, but hierarchical. In team, we share it.
-Observe
Collect information to describe the current situation towards the goal.
-Analyse
Analyse collected information, and list possible actions.

Wrong analysis on less information or wrong information likely leads to failure.

This happens also in game. Information given is limited both in its amount and its realtime ness due to system resource. Even if information is enough, the situation may not be obvious.

But a game is basically zero-sum game. I mean, there is an opponent. So if you know what the opponent does, you can choose a winning option. But if the opponent knows it, it can choose a counter solution. And if, and if, and if...

Now it seems we are in an infinite loop on a finite state machine. This may represent a common argue, "WE NEED MORE PROCESSING POWER". Seriously?

A sniper will put most of its efforts on collecting information. A soldier on actions to give more damages. If there is a diversity in NPC abilities, which fates its character, their fights to take their advantages to achieve their goals become a game. Doesn't it become an expectation of what your team mates do? Like, one tend to stay back and cover, another to go first and lead. That kind of character makes game play a lot easier to work with and more fun to fight against NPCs.

A character, an expectation, to count on others, simplifies decision making. PC reveals its character, too, through actions made in the past, which makes team play easier, and prevent bad users from the community to keep game experience.

No feedback yet

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
PoorExcellent
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)

NPC Training College

NPC stands for Non Player Character, which moves autonomously. Here's the definition in WIkipedia.

What's for? To have NPC experienced. Then apply the experience to another.
What's needed? Reusability.
What's possible in the future? NPC will forcast, recognize differences from the actual happenings, then concentrait on the current situation(increase sensitibity rather than forcast) to achieve the goal. And it becomes an experience.

NPC:
has a goal.
can recognize the situation.
then make an action.

NPC will:
forcast.
recognize the differences from events actually happened.
concentrait to sense the situation to deal with unknown situation.

Underlying Theorem: Bayes' Theorem
1. Improve prior probability(by expert, by machine learning)
2. Collect posterior to have it experienced
Situation is modeled as Bayesian network. Then, makes an action.

The problem: Reusability
Raw sense information is too concrete to be reused.

Solution: Multiple sense layers(Hierarchical Baysian network )
According to "On INTELLIGENCE, Jeff Hawkins", human neo cortex is layered. HIgher a layer is, more abstracted information is. So sensing happens bottom-up, while forcast happens up-down.

Mmmm, this contains lots of tasks unresolved. I need to focus. But, this is an interesting project to have AI experienced by open, community based efforts.

No feedback yet

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
PoorExcellent
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)

Extend Your OO Programming Language

One major chance to use Object Database is when you need inter-process communication. You have options. File? Socket?

How do you manage an identity of an object? Is it possible to lock an object?

Isn't it a good reason to try Object Database?

Another is when you running out of memory spaces. Suppose you work on A.I. prototype that can take pasts into account, as time goes by, you're in the situation.

It's the best if all the objects fit in memory. But it costs high, both in hardware and power consumption. So it is natural to swap some objects off to disk. The whole graph is not always required. Some are required, others are preferred. Depending on the preference, stays in memory(cached) or stored on disk. But still, in logical representation, your objects exist, consist of the whole graph. Nice, eh?

If your situation were the mix of those two, it is highly recommended to try Object Database. Thus, Object Database can be considered to extend your OO programming language.

No feedback yet

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
PoorExcellent
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)

Server Energy Consumption Estimation Project

According to "The Economic Meltdown of Moore’s Law and the Green Data Center" presented by The Uptime Institute Inc. in 2007, this year, 2009 is when 3-years-electricity-cost will be the same amount as server cost. And in 2012, it is expected to be doubled.

What does this mean?

If you are a facility manager, you must be very sensitive to energy costs today. But what about you, as an IT manager who develops software? Probably not. For your business requirement does not include energy costs.

But soon, you will need to take it into account if the cost is rising as expected. It's easy to get the whole power consumption just by plug it into a measurement equipment, but doesn't tell which consumes how much at all.

Fortunately, there're many studies available. But unfortunately, no practical tool/project available today.

So I will get started my project next month, which is based on the study, "Full-System Power Analysis and Modeling for Server Environments". An overview of the project is illustrated below.

Months ago, I got 3 Dell-T105 cheap boxes with AMD Opteron Quad Core and 4GB RAMs. So they look convenient to begin with.

For the time being, the project is only for Linux. And the main performance counter to use is PerfSuite.

Basically, there won't be any technical problems, but the question is if such a user community can be built. I guess the project could be runnable by revenues through ads from vendors.

But at first, let's see my boxes can show reliable results.

No feedback yet

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
PoorExcellent
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)

Vizoo, YouTube For Graphs

Vizoo, YouTube for graphs, got TechCranched. I have led the db design from scrach over a year.

Not yet in English. But coming soon!

No feedback yet

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
PoorExcellent
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)

cSearch: to create, not to consume

Haruki Murakami is one of the most interesting writer in the world today.

As he often mentions, he likes Raymond Carver.

Once you read Ray's works, you never forget his taste. It is strange. In the begining, it looks normal, but in the end, it becomes totally scary world as if you got through to the other world.

Ray puts a little difference one by one. You can move on without paying too much attention. You can just put an ordinary image on the gap.

But if you do, you find it's hard to picture such a scene. So you have to imagine.

Then, in the end, you made up the story with bunch of images you had.

That is a blank or difference effect, I think. So you find something new whenever you read.

With this hypothesis in your mind, take a look at popular novels. They are filled with pictures, no blanks left.

You can enjoy for the first time, second?, third?, boring, boring, boring...

They are made to be consumed. While something with lots of blanks or differences is to be created in users.

Here, I see a significant difference. And it is a key, I think, to the coming world.

With such a concept, I made a search system, which will give you "How come? Ah-ha!" search experience. Available in 4 languages.

cSearch

Examples are blogged here in English. Just try, and tell me what it is.

No feedback yet

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
PoorExcellent
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)

Database that tells "Chigai"

This world is changing. From the one "how to do" matters, to the other "what to do" does.

Then, problems people have are changing accordingly.

Answering to "what to do" is not something others can do or should do. But to make a decision on "what to do" makes sense when there's any change on its field.

"Chigai" is Japanese that means "difference". An interesting property of "Chigai" is not the difference itself, but the gap that tells A is different from B. So I could say, "Chigai" holds something new information that A and B do not hold.

A change is recognized as "Chigai". And then perceived or interpreted to get accepted.

With this "Chigai" in mind, to deal with the "what to do" era, Database that tells "Chigai" is making more sense. An interesting fact is that such a database can tell a new information that the database doesn't store.

Okay, so how we recognize "Chigai"? Some recent brain sciences answer to this question. By expecting what'll come next, and to which a reality is compared.

I love the concept "Chigai", and its interesting property that says a gap, ?(emptiness), holds the source of creativity.

No feedback yet

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
PoorExcellent
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)

Probabilistic Graph Model

Feedback awaiting moderation

This post has 1 feedback awaiting moderation...

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
PoorExcellent
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)

Recognize how concepts are formed on the Web

This is just a brief idea to recognize how concepts are formed on the Web. It is not static, but dynamic. And it is shown in numbers, not in abstract way, to enjoy statistical techniques on it.

Invariant

At first, let me introduce how our brain recognizes things. You may want to take a look at this video by Jeff Hawkins to catch up the latest brain science. "Invariant", which is the term Jeff takes, is less changing, mostly static concept considered residing in our neocortex at levels.

Our brain translates things into these invariants. That's why we can tell you're human, so am I. Otherwise, every thing is different from each other. Then, your life will get wild.

"Invariant" *expects* what will happen next based on its context when actuall input is coming up. Emergency annunciator will get on if any differences are obserbed between the expectation and actuall inputs. Then, your consciousness gets involved to watch out.

It is natural to wonder if we refine Information Retrieval with such concepts.

Invariant representations for document

Let's consider document as usuall in IR.

What is "Invatiant" for document? As you can guess, and actually see in various researches about clustering, factoring, topic estimation, "Topic" looks the most natural choise to take.

Y = AX
Y: Document/Word(probability) matrix
X: Document/Topic(probability) matrix
A: Topic/Word(probability) matrix

Think about such a matrix X, which shows topical aspects of documents. You can get relax, don't need to get bothered here by the exact meaning of matrix. It is actually the equation used in GaP by Canny.

We'll get "Invatiant"s if we can find such A and X. Please not that it is just one way to get "Invatiant", and I think, is the best way to picture the idea.

Difference, Earthquake, Concept Map

We can apply this idea to Information Retrieval, not for search, but to see its statistical aspects of underlying document collection.

These will form X in the equation above.

Now you have "Invariant" representation by numbers for a query, a concept.

You can measure differences, deviations, means, whatever you like.

If we measure deviations shown in concepts like doing for "Earthquake", isn't it safe to say that similar measures mean they are close to each other geometically? Then, we'll get dynamic "Concept Map" on the web.

No feedback yet

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
PoorExcellent
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)

Document Similarity By Color

The way to get document similarity by color

It is the invention that I made an application for patent last week. That is the key, I think, that puts Information Retrieval one level up higher.

The first demo will be launched late March.

Background

How does Performance of Information Retrieval get measured? If you are not familier with "Precision" and "Recall", check them back at the Information Retrieval.

We have two major Information Retrieval tools. A database and a search engine.

A database provides higher precision and lower recall. That means you need to specify detailed search criteria to get what you need.

While a search engine does opposite, lower precision and higher recall. That means you're likely to get what you need, but need to find out among huge results.

Wait, Google is not bad like that.

True. So this lacks two practical contepts. Search results can be ordered, and search results can be paged.

Earlier you get what you need, higher precision you get. So a search engine usually provides higher precision and higher recall, it depends on Ranking, though.

It is such a great innovation, actually. And it has driven the adoption of search-engine-like-capability everywhere.

Problem

It's been more than 10 years since Google, Stanford actually, made an application for patent about PageRank. It was a birth of search engine era.

Thanks to the spread of such a great technology, web documents got exploded known as Information Explosion.

For example, you find document A at the 5th out of 100 results. One year later, the results get doubled. Then you find document A at the 10th out of 200 results. One more year later, it gets out of the first page. (I assume Ranking technology stays the same)

While, what about the size of a page? That is related to human capability rather than technology, so stays almost the same. 10 results or so per page. No?

Now we have a problem. The precision gets lower and lower.

To make it worse, one of the major search engine improvements is to recognize topics. With topic recognition capability, more data is coming in.

In fact, rather than recognizing more topics, to nail down, to personalize search results is what Google is doing.

In this way, I think, something that improves recognition of a page by human is required. Otherwise, we'll get even lower precisions without getting great benefits by topics, Onthology.

Solution

Then, you reach to my invention. Document Similarity By Color.

With this technology, you can understand topics that a document represents by color, and then can search for what you want like chasing one color.

I think this implicates a power shift to come. So, I call the phenomenon "Liberation Of Search".

Just my 2cents ...

No feedback yet

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
PoorExcellent
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)

Device Database Platform

What is the paramount of Unix for you?

For me, it's modularity. It's amazing, I can't believe, that the design has survived for more than 30 years, especially in the last decade.

There is a reason that makes me astonished.

I am the one among Java generation. I thought C is something fixed, in changes both of requirements and hardware. I mean, C is not something you can deal with changes. But it is not absolutely true. You'll see when you look at all those wide variety of adoptions. Especially, in the appearence of Linux.

Linux, what a great adoption of Unix.

Legend has it that the kernel developers were constantly tweaking the C code they wrote in the Linux kernel in order to control the 80x86 machine code that the GCC compiler was producing.

says Randall Hyde in his book, "WRITE GREAT CODE VOLUME 2: THINKING LOW-LEVEL, WRITING HIGH-LEVEL".

In general, Java guys are indifferent to low-level issues since they think it is the due of JVM or have never thought about it. Of cource, I used to be one of those. Such guys can count on one of the best possible implementation by making use of Java library written by Java developers, but how could JVM translate a code written by them into efficient bytecode? And then to efficient assembly for a target machine?

True. In this Virtualization era, you may wonder, if such an effort makes any difference. I agree. How many layers, abstractions, your application might have to get through? Ten?

Beside, you might argue quoting the phrase, "premature optimization is the root of all evil". But then, I will ask. Have you ever succeeded in tuning performance at last without upgrading hardware?

I have felt some kind of, how can I say, like you're in a cage like a bird. If you feel like that, you may want to forget about all those abstractions, then think at hardware-level from scratch.

Then, what came to me is the idea, Device Database Platform.

It is a modular database. Imagine VFS in Linux kernel.

There are a couple of reasons why a device database has to be modular.

A software on a device has to be optimized as possible as one can do. The work force cost is considered chieper than hardware cost brought by an increase of unit cost for inefficient code. So you need to get rid of unnecesarry parts, and replace inefficient module with another designed best. This requirement keeps device guys stay on RYO database simply because a commercial database lacks the ability for customization, which means quite expensive on a device.

It is easy to imagine that you can design more efficient algorithm for a specific usage. Besides, there are many CPU architectures and OS involved. So, as Linux lernel guys does, there are many rooms improved for every combination.

Actually, device guys have done that. So they write their own database. But it is also true, they want a database.

When I consider these conditions, and learn from Unix and Linux, what occurs to me is to design and write a modular database.

No feedback yet

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
PoorExcellent
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)

Get Hadoop up and running without DNS

In this couple of days, I have tried to get Hadoop Word Count ruuning on my local cluster with 3 CentOS boxes.

Thanks to Running Hadoop On Ubuntu Linux (Multi-Node Cluster), 90% of the set up was easy as described.

But there are two problems that I had to waste my time.

1. RSA authentication with SSH

authorized_keys file has to be accessible only by the user. Don't forget to disable any access by any groups and others.

2. Host name resolution

examples of hosts files

master

::1 localhost6.localdomain6 localhost6
192.168.10.21 master * master has to have accessible IP address(not ::1 nor 127.0.0.1) by slaves
192.168.10.22 slave.yellow
192.168.10.23 slave.red

slave.yellow

127.0.0.1 slave.yellow localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
192.168.10.21 master
192.168.10.23 slave.red

slave.red

127.0.0.1 slave.red localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
192.168.10.21 master
192.168.10.22 slave.yellow

Be careful about these network settings, then the Work Count should run.

2008/9/14 tested Hadoop-0.18.0, JDK1.6.0_10, CentOS5.2

No feedback yet

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
PoorExcellent
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)

Essay "Katakana" Dictionary

Inspired by Haruki Murakami, the unique novel writer, we just launched a new blog, which is like an essay "Katakana" dictionary.

We have three types of words in Japanese. Kanji(??), Hiragana(????), and Katakana(????). Katakana represents sounds, mostly of foreign words like Coca Cola(?????).

One Katakana word a day. Then, one guy writes an essay related to the word. Interesting, isn't it.

Here's the dictionary. Enjoy!
http://ameblo.jp/teamharukist/

No feedback yet

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
PoorExcellent
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)

Object fundamentalism and Business

I love Jazz, and so to play trumpet.

I don't like to manage nor to be managed.

I love OpenSource mind like hippy's.

I love objects.

But the problem is, they are not so successful in Business.

Okay. Let me think. Why?

...

It is only me that enjoys?

That could be a reason.
But rather, it would be a mandatory to succeed.

The reason is, perhaps, the lack of attitude for audience/end users.

Playing trumpet is a part of my life, even myself.
So, that would be great if audience get high/relaxed,
but that's not the first thing.

Object? That's for my business.

Object fundamentalism? I don't think I am, but some think I am.

Cache, an ultimate enterprise object-capable database.
db4o, an open source object database for embedded system
Rational, needless to say

The Object Fundamentalism family got a certain level of success.
But it seemed to be limited so far.

Why?

Through other businesses, I realized that
a successful technology can provide end users with benefits directly.
Oracle, Google, VMWare, Salesforce.
And they have lots of believers who brings the bible to end users
to integrate, convince, pray and sell.
Sometimes, those believers put some benefits on top of it,
but even without it, the bible itself is valuable.

What about Object Fundamentalism family?

It depends on engineers.
That means a value is created *by engineers* for their customer.
So, the point would be to hire a great engineer rather than a product.

The Object Fundamentalism itself is worse than a piece of bread for end users.

How to improve the situation?

A product/service should have a clear benefit for end users, not (only) for engineers.

For end users, object words are as good as, with Japanese old saying, Buddha's words to a horse.

- I found an interesting story from "Essential Drucker".

The three stonecutters who were asked what they were doing. The first replied, "I am making a living." The second kept on hammering while he said, "I am doing the best job of stonecutting in the entire country." The third one looked up with a visionary gleam in his eyes and said, "I am building a cathedral."

The third man is, of course, the true "manager." The first man knows what he wants to get out of the work and manages to do so. He is likely to give a "fair day's work for a fair day's pay." It is the second man who is a problem. Workmanship is essential; without it no business can flourish; in fact, an organization becomes demoralized if it does not demand of its members that most scrupulous workmanship they are capable of. But there is always a danger that the true workman, the true professional, will believe that he is accomplishing something when in effect he is just polishing stones or collecting footnotes. Workmanship must be encouraged in the business enterprise. But it must always be related to the needs of the whole.

No feedback yet

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
PoorExcellent
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)

UNIQLOCK

A bit late, but Enjoy!

No feedback yet

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
PoorExcellent
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)

BigTable(next generation database led by Google) 1

BigTable. It could be pretty big as it sounds. According to Jeffry Dean, who is a fellow at Google, the biggest one today is up to 4000TB, spanning over thousands of servers..., only a table!!!

Have you ever heard about BigTable? Unless you're a database vendor or a Google infrastructure freak, I'm afraid you haven't.

According to the paper titled Bigtable: A Distributed Storage System for Structured Data, it is like:

Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving).

Two points.

  • scale to a very large size: petabytes of data across thousands of commodity servers
  • both in terms of data size and latency requirements

How?

Performance matters

To reduce HDD seek time is a keen point to general performance of computer since I/O costs million times more than CPU's. So, it's wise to fetch as big chunk as possible at once.

Today, most of user files are getting larger and larger, ever larger. But still, a unit of HDD stays smaller, 512KB usually, and 512KB-4KB of filesystem on Linux.

Google File System, Google's underlying distributed filesystem, makes use of a huge chunk, 64MB in size.

The next thing to consider is layout. How data should be laid on a block? Contiguous data can be read/written from/to disk at once.

A database usually put one row on a contiguous space. So as long as you put all the data you require on a single record, you can get the best performance. Some databases provide another approach, column oriented.

BigTable is not a conventional table

It's more like a spreadsheet. And a map under the hood.

BigTable offers a new way both in performance and functionality. Next time, I will show you details.

1 comment

Comment from: yasso [Visitor]
But many are not real database, at least not the type we thought of. Although they are quite fast and have good scalability, they are in very simple structure. And they store data and search data by index. For example LEXST database, if you count on them to do transactions as oracle does, you are wrong.
2009/04/01 @ 20:10

This post has 1 feedback awaiting moderation...

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
PoorExcellent
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)

Google Visualization API On Your Site Example

This is a Google Visualization Gadget example that is hosted on my site.

SONY's sales&op_profits by segments since 1996. Enjoy!

No feedback yet

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
PoorExcellent
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)

Visualize Your Data with Google Visualization API

Today, here at Google I/O 2nd day, a new Google Visualization API was announced.

What's new?

1. events
2. gadget.draw()

Google Visualization Gadget supports selection events so that a developer can respond to end users.

With draw() method, a gadget can draw any tables only if a table supports pre-defined DataTable APIs.

Let's try the new draw() method with the JSON example.

Instead of writing tables with HTML tags, simply pass DataTable object to Table gadget.

Here's the code.

Code:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
  <head>
    <script type="text/javascript" src="http://www.google.com/jsapi"></script>
    <script type="text/javascript">
    
    google.load("visualization", "1", {packages:["table"]});
    
    var xhr;
    try {  xhr = new ActiveXObject('Msxml2.XMLHTTP');   }
    catch (e)
    {
        try {   xhr = new ActiveXObject('Microsoft.XMLHTTP');    }
        catch (e2)
        {
          try {  xhr = new XMLHttpRequest();     }
          catch (e3) {  xhr = false;   }
        }
     }
 
   xhr.open("GET", DATASOURCE_URL,  true);
   xhr.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
   xhr.send(null);
  
   xhr.onreadystatechange  = function()
    {
         if(xhr.readyState  == 4)
         {
              if(xhr.status  == 200){
                  the_object = eval( "(" + xhr.responseText + ")" );
                  handleQueryResponse(the_object.table);
              }else{
                
              }
         }
    };
 
      // Query response handler function.
      function handleQueryResponse(table) {
        var data = new google.visualization.DataTable();
        
        // convert to Google.DataTable
        // column
        for (var col = 0; col < table.cols.length; col++) {
          data.addColumn('string', table.cols[col].label);
        }
        // row
        for (var row = 0; row < table.rows.length; row++) {
          data.addRow();
          for (var col = 0; col < table.cols.length; col++) {
            data.setCell(row, col, table.rows[row][col].v);
          }
        }
        
       var vis_table = new google.visualization.Table(document.getElementById('table_div'));
       vis_table.draw(data, {showRowNumber: false});
        
      }
 
    </script>
  </head>
 
  <body>
    <div id="table_div">Loading...</div>
  </body>
</html>

Thanks a lot, Google Visualization Team! Now that our own data can be easily integrated with Gadget!

No feedback yet

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
PoorExcellent
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)

Inspiration encouraged by blanks

Kenichiro Mogi, who is a brain scientist, explains how inspiration works in his book, "Hirameki-Nou".

"Hirameki" is an inspiration in Japanese, and "Nou" is a brain.

So what is "inspiration" at all?

It is a difficult question. So let's look at a couple of similar events first.

FOK, "Feeling Of Knowing".

FOK is an interesting feeling. "I think I know, but I don't know". If you don't know, how did you know that you know it!?

This is something a today's computer never does.

When you have FOK in a exam or something, you feel "emergency". "emergency" will produce "emergence". It is a kind of "inspiration".

Savant syndrome

There're people who have an unbelievable ability, called Savant syndrome. They can take a picture of something, as it is, exactly the same.

Paintings in Lascaux looks done by people with Savant syndrome. Or an acient people had Savant syndrome.

Human might have lost the ability for the sake of language.

But people with Savant syndrome are not good at communication. You can imagine the movie "Rainman".

Our brain can produce "meanings" out of our memories, which are interweaved by our experiences/learnings.

The ability to find "meanings" looks having a trade-off against the ability of "Savant syndrome".

Slow learnings

2+3=?

It is solved by fast learnings. Modern people are good at fast learnings, each of which has a clear process to solve based on a certain algorithm.

"Computer" is the typical outcome.

While "inspiration" can not be solved by fast learnings. It takes some time to get an inspiration. No one has no idea when it comes.

"Inspiration" requires slow learnings.

ACC and LPFC

ACC(Anterior Cingulate Cortex) is a watch tower in our brain, which is in frontal cortex. When we feel something like pain, ACC reacts at first.

ACC tells LPFC the event. LPFC stands for Lateral Prefrontal Cortex. Then, LPFC issues orders to related parts like "Take a rest" or "Do this immediately".

Blanks

It seems that our brain tries to fill blanks unconsciously. As if it is the purpose for them.

What are those blanks?

Looks like an inconsistency in one's own world out of all the experiences/learnings.

For example, Newton had been wondering "why doesn't the Moon fall down?". It is an inconsistency in his world.

Inspiration encouraged by blanks

Blanks are produced in a bottom up fashion unconsciously. They drive you to fill them.

It will create a certain environment/feeling similar to FOK.

Then, you will learn and experience a lot. They are slow learnings.

Someday, you'll get it, successfully being inspired. Or may not.

In this way, an inspiration is encouraged by blanks.

The mechanism to find "blanks" or an inconsistency is interesting. It can not only suggest an inconsistency, but also tell new findings.

A database that can suggest something has to recognize this kind of blanks, inconsistencies in its world. It is nothing but "inspiration".

No feedback yet

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
PoorExcellent
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)

JSON Restful Web Service in Java

What kind of ways are available to get JSON Restful Web Service in Java?

Wait, what is JSON? Also Restful Web Service exactly?

Actually, I got to know them only a couple of weeks ago.

If you have tried XML Web Service with SOAP, you probably know how hard to get performance, and ease of use without complexity(especially about schema binding).

Then, JSON Restful Web Service must be what you will get interested in.

Let's go back to the first question. You can think of two providers. Sun and Apache. Here're options to go first.

  1. Apache Axis2.0
  2. JAX-WS with JSON extention

In my opinion, they're just a topping on top of XML Web Service. That is the very basic mismatch. Feel free to try if you need to make sure

This is the one, with which you can get a true power of JSON Restful Web Service.

Restlet

Take a look at Introduction on the site if you are not 100% sure about Restful Web Service(REST).

I'm going to show you an example to implement a service that returns DataTable compatible with one of Google Visualization API. The source of information is this thread in Google Visualization Discussion.

Code:

<CHUNK1>
google.visualization.Query.setResponse(
{
        requestId:'0',
        status:'ok',
        signature:'6173382439516707022',                /* Changes when data changes */
</CHUNK1>
So the timeseries gadget must download this JSON every so often and
check the new signature against it's own.  If the signature is
different, it knows that the data on the "spreadsheet" has changed.
<CHUNK2>
        table:{
                cols:
                        [
                                {
                                        id:'A',
                                        label:'Date',
                                        type:'d',               /* d=date */
                                        pattern:'M/d/yyyy'      /* unique date pattern */
                                },
                                {
                                        id:'B',
                                        label:'Budget',
                                        type:'n',       /*      n=number */
                                        pattern:'#0.###############'    /* unique number pattern */
                                },
                                {
                                        id:'C',
                                        label:'Revenue',
                                        type:'n',
                                        pattern:'#0.###############'
                                },
                                {
                                        id:'D',
                                        label:'Movie',
                                        type:'t',       /*      t=text */
                                        pattern:''      /* there is no text pattern */
                                }
                        ],
</CHUNK2>
Chunk2 shows the columns needed for a timeseries chart. (Date,
Value1, ..., ValueN, PopupText)
Note the type and pattern fields.  d=date, n=number, t=text
<CHUNK3>
                rows:
                        [
                                [
                                        {
                                                v:new Date(1981,10,6),
                                                f:'11/6/1981'
                                        },
                                        {
                                                v:5000000.0,
                                                f:'5000000'
                                        },
                                        {
                                                v:4.2365581E7,
                                                f:'42365581'
                                        },
                                        {
                                                v:'Time Bandits'
                                        }
                                ],
</CHUNK3>

To get this type of JSON Object is the goal. So how to do this?

In order to understand the following code, you may want to learn the tutorial how Restlet models the resource oriented Restfull world.

Restlet

Code:

import org.restlet.Application;
import org.restlet.Context;
import org.restlet.Restlet;
import org.restlet.Router;
 
public class JSONApplication extends Application {
 
  public JSONApplication() {
    super();
  }
 
  public JSONApplication(Context arg0) {
    super(arg0);
  }
 
  @Override
  public Restlet createRoot() {
        Router router = new Router(getContext());
        
        router.attach("/table", JSONTableResource.class);
        
    return router;
  }
 
}

Table Resource and JSON Representation

Code:

import org.json.JSONArray;
import org.json.JSONException;
import org.json.JSONObject;
import org.restlet.Context;
import org.restlet.data.CharacterSet;
import org.restlet.data.MediaType;
import org.restlet.data.Request;
import org.restlet.data.Response;
import org.restlet.data.Status;
import org.restlet.ext.json.JsonRepresentation;
import org.restlet.resource.Representation;
import org.restlet.resource.Resource;
import org.restlet.resource.ResourceException;
import org.restlet.resource.Variant;
 
public class JSONTableResource extends Resource {
  
  private static final String JSON_NAME_TABLE = "table";
  
  private static final String JSON_NAME_COLUMNS = "cols";
  private static final String JSON_NAME_COLUMNS_ID = "id";
  private static final String JSON_NAME_COLUMNS_LABEL = "label";
  private static final String JSON_NAME_COLUMNS_TYPE = "type";
  private static final String JSON_NAME_COLUMNS_PATTERN = "pattern";
  
  private static final String JSON_NAME_ROWS = "rows";
  private static final String JSON_NAME_ROWS_V = "v";
  private static final String JSON_NAME_ROWS_F = "f";
 
  public JSONTableResource(Context context, Request request, Response response) {
    super(context, request, response);
    
    getVariants().add(new Variant(MediaType.APPLICATION_JSON));
  }
 
  @Override
  public Representation represent(Variant variant) throws ResourceException {
    JSONObject json = new JSONObject();
    
    try {
      
      json.put("requestId", "0");
      json.put("status", "ok");
      json.put("signature", "6173382439516707022");
      
      json.put(JSON_NAME_TABLE, this.createTable());
      
    } catch (JSONException e) {
      throw new ResourceException(Status.SERVER_ERROR_INTERNAL);
    }
    
    JsonRepresentation jr = new JsonRepresentation(json);
    
    jr.setCharacterSet(CharacterSet.UTF_8);
    
    return jr;
  }
  
  private JSONObject createTable() throws JSONException{
    JSONArray columns = new JSONArray();
    JSONArray rows = new JSONArray();
    JSONObject r_c = new JSONObject();
    r_c.put(JSON_NAME_COLUMNS, columns);
    r_c.put(JSON_NAME_ROWS, rows);
    
    this.createColumns(columns);
    this.createRows(rows);
    
    return r_c;
  }
  
  private void createColumns(JSONArray columns) throws JSONException{
    
    columns.put(this.createColumn("A", "Date", "d", "M/d/yyyy"));
    columns.put(this.createColumn("B", "Budget", "n", "#0.###############"));
    columns.put(this.createColumn("C", "Revenue", "n", "#0.###############"));
    columns.put(this.createColumn("D", "Movie", "t", ""));
    
  }
  
  private void createRows(JSONArray rows) throws JSONException{
    
    JSONArray row = new JSONArray();
    
    row.put(this.createCell("new Date(1981,10,6)", "11/6/1981"));
    row.put(this.createCell("5000000.0", "5000000"));
    row.put(this.createCell("4.2365581E7", "42365581"));
    row.put(this.createCell("Time Bandits", null));
    
    rows.put(row);
    
  }
  
  private JSONObject createCell(String v, String f) throws JSONException{
    JSONObject jo = new JSONObject();
    jo.put(JSON_NAME_ROWS_V, v);
    if(f != null)
      jo.put(JSON_NAME_ROWS_F, f);
    return jo;
  }
  
  private JSONObject createColumn(String id, String label, String type, String pattern) throws JSONException{
    JSONObject jo = new JSONObject();
    jo.put(JSON_NAME_COLUMNS_ID, id);
    jo.put(JSON_NAME_COLUMNS_LABEL, label);
    jo.put(JSON_NAME_COLUMNS_TYPE, type);
    jo.put(JSON_NAME_COLUMNS_PATTERN, pattern);
    return jo;
  }
 
}

web.xml

XML:

<?xml version="1.0" encoding="UTF-8"?>
<web-app id="WebApp_ID" version="2.4"
            xmlns="http://java.sun.com/xml/ns/j2ee"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="http://java.sun.com/xml/ns/j2ee
                 http://java.sun.com/xml/ns/j2ee/web-app_2_4.xsd">
   <display-name>first steps servlet</display-name>
   <!−− Application class name −−>
   <context-param>
      <param-name>org.restlet.application</param-name>
      <param-value>
         firstSteps.JSONApplication
      </param-value>
   </context-param>
 
   <!−− Restlet adapter −−>
   <servlet>
      <servlet-name>RestletServlet</servlet-name>
      <servlet-class>
         com.noelios.restlet.ext.servlet.ServerServlet
      </servlet-class>
   </servlet>
 
   <!−− Catch all requests −−>
   <servlet-mapping>
      <servlet-name>RestletServlet</servlet-name>
      <url-pattern>/json/*</url-pattern>
   </servlet-mapping>
</web-app>

You can get JSON Object in Java here.

Here's the JSON texts you'll get.

Code:

{"status":"ok","requestId":"0",
"table":
  {"cols":[
        {"id":"A","pattern":"M/d/yyyy","label":"Date","type":"d"},
        {"id":"B","pattern":"#0.###############","label":"Budget","type":"n"},
        {"id":"C","pattern":"#0.###############","label":"Revenue","type":"n"},
        {"id":"D","pattern":"","label":"Movie","type":"t"}  
      ],
  "rows":[
        [
          {"f":"11/6/1981","v":"new Date(1981,10,6)"},
          {"f":"5000000","v":"5000000.0"},
          {"f":"42365581","v":"4.2365581E7"},
          {"v":"Time Bandits"}
        ]
      ]
  },
"signature":"6173382439516707022"}

Then, you can call the service, get DataTable, then process to visualize. Here's an example to show a simple table in Ajax.

Code:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
  <head>
    <script type="text/javascript">
    
    var xhr;
    try {  xhr = new ActiveXObject('Msxml2.XMLHTTP');   }
    catch (e)
    {
        try {   xhr = new ActiveXObject('Microsoft.XMLHTTP');    }
        catch (e2)
        {
          try {  xhr = new XMLHttpRequest();     }
          catch (e3) {  xhr = false;   }
        }
     }
 
   xhr.open("GET", JSON_RESTFULL_WEB_SERVICE_URL,  true);
   xhr.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
   xhr.send(null);
  
   xhr.onreadystatechange  = function()
    {
         if(xhr.readyState  == 4)
         {
              if(xhr.status  == 200){
                  the_object = eval( "(" + xhr.responseText + ")" );
                  handleQueryResponse(the_object.table);
              }else{
                
              }
         }
    };
 
      // Query response handler function.
      function handleQueryResponse(table) {
 
        var html = [];
        html.push('<table border="1">');
 
        // Header row
        html.push('<tr><th>Seq</th>');
        for (var col = 0; col < table.cols.length; col++) {
          html.push('<th>' + escapeHtml(table.cols[col].label) + '</th>');
        }
        html.push('</tr>');
 
        for (var row = 0; row < table.rows.length; row++) {
          html.push('<tr><td align="right">