Sort in Lucene

There are so many different posts I want to write that, while thinking about all of them, I end up writing none 😛

Well, I have recently been working on Neo4J. Neo4J is a graph database. It has several advantages over relational databases for workloads with a lot of traversals (for example, social networking sites, where one moves from one friend to another frequently). In a relational database, each such hop requires a join operation, and joins are known to be expensive. Further, it's tough to represent common data structures like ordered lists, hierarchies, trees or web page content in a relational DB.
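To make the "moving from one friend to another" idea concrete, here is a minimal sketch in plain Java. It is not Neo4J's API; the class `FriendGraph` and its methods are made up for illustration. The point is only that when each person holds direct references to their friends, a friends-of-friends lookup is just following those references, with no join step.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration class, not part of Neo4J or Lucene.
public class FriendGraph {
    // Adjacency list: each person maps directly to the set of their friends.
    private final Map<String, Set<String>> friends = new HashMap<>();

    // Friendship is stored in both directions.
    public void addFriendship(String a, String b) {
        friends.computeIfAbsent(a, k -> new LinkedHashSet<>()).add(b);
        friends.computeIfAbsent(b, k -> new LinkedHashSet<>()).add(a);
    }

    // Two hops out: friends of friends, excluding the person and their direct friends.
    public Set<String> friendsOfFriends(String person) {
        Set<String> direct = friends.getOrDefault(person, Collections.<String>emptySet());
        Set<String> result = new TreeSet<>();
        for (String friend : direct) {
            for (String candidate : friends.getOrDefault(friend, Collections.<String>emptySet())) {
                if (!candidate.equals(person) && !direct.contains(candidate)) {
                    result.add(candidate);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        FriendGraph g = new FriendGraph();
        g.addFriendship("asha", "bala");
        g.addFriendship("bala", "chitra");
        g.addFriendship("chitra", "dev");
        System.out.println(g.friendsOfFriends("asha"));
    }
}
```

In a relational schema the same question would typically be a self-join on a friendship table, once per hop.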

Well, this post is not about graph DBs vs relational DBs. That will come sometime later.

Neo4J internally uses a text search engine called Lucene for its indexes (yes, Twitter relies heavily on Lucene too!). So just for fun, I indexed 3 lakh (300,000) nodes in Neo4J. And since Neo4J uses Lucene, I gave it a small query to sort these 3 lakh nodes by the unique random number allotted to each of them.

For comparison, I also sorted 3 lakh random numbers using HeapSort. Here are the results:

So Lucene took 1.593 seconds to sort the 3 lakh nodes, and 0.468 seconds to search for a node with some other random number.
Quick Sort takes 0.3 seconds and HeapSort takes 0.25 seconds on 3 lakh random numbers (MergeSort should be the best choice, but I would have had to code and implement it; HeapSort should be pretty close).
Well, 1.593 seconds is absolutely brilliant, given that there is I/O involved: it needs to load the nodes from the database. With all that, it's quite impressive.
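For anyone who wants to try the plain-numbers side of this comparison, here is a rough sketch of a HeapSort timing run in Java. This is not the exact code I ran; the class name `HeapSortTiming`, the fixed seed and the use of `System.nanoTime()` are assumptions made for the sketch, and the numbers it prints will depend on your machine and JVM.

```java
import java.util.Random;

// Illustrative sketch: times an in-place HeapSort over 3 lakh random ints.
public class HeapSortTiming {

    // Build a max-heap, then repeatedly swap the max to the end and shrink the heap.
    static void heapSort(int[] a) {
        int n = a.length;
        for (int i = n / 2 - 1; i >= 0; i--) {
            siftDown(a, i, n);
        }
        for (int end = n - 1; end > 0; end--) {
            int tmp = a[0];
            a[0] = a[end];
            a[end] = tmp;
            siftDown(a, 0, end);
        }
    }

    // Restore the max-heap property for the subtree rooted at i, within a[0..size).
    static void siftDown(int[] a, int i, int size) {
        while (true) {
            int left = 2 * i + 1;
            int right = left + 1;
            int largest = i;
            if (left < size && a[left] > a[largest]) largest = left;
            if (right < size && a[right] > a[largest]) largest = right;
            if (largest == i) return;
            int tmp = a[i];
            a[i] = a[largest];
            a[largest] = tmp;
            i = largest;
        }
    }

    public static void main(String[] args) {
        int n = 300_000;              // 3 lakh
        Random rnd = new Random(42);  // fixed seed (assumption) so runs are repeatable
        int[] data = new int[n];
        for (int i = 0; i < n; i++) {
            data[i] = rnd.nextInt();
        }

        long start = System.nanoTime();
        heapSort(data);
        double seconds = (System.nanoTime() - start) / 1e9;

        System.out.printf("HeapSort of %d ints took %.3f s%n", n, seconds);
    }
}
```

A single timed run like this includes JIT warm-up, so treat the figure as a ballpark rather than a proper benchmark.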

PS: My comp config –
2.4 GHz, Quad Core
32-bit Windows 7
Java version:
Java(TM) SE Runtime Environment (build 1.7.0_01-b08)
Java HotSpot(TM) Client VM (build 21.1-b02, mixed mode, sharing)