Hack Puzzle – 1

I have been thinking for quite some time about building a puzzle. I think I have built an awesome one and hope everyone finds it awesome to hack :)

You can download the jar from here.

Aim:
  • Make the program terminate gracefully
“Terminate gracefully” implies that the program should end, and it should not end by throwing an exception or in an abnormal manner. Using `System.exit` or the program terminating due to an exception is not graceful termination.
To run:
  • java -jar DeadLock2.jar
If run successfully, it will output:

Oops there is a deadlock. Please solve the deadlock to make me move forward.

Rules:
  1. You cannot replace any of the jar contents with your own files
  2. You cannot recompile any of the classes in the jar and replace them
  3. You are permitted to alter/modify the contents of the jar/class files
  4. You are permitted to add or use any external library to solve it
If successfully solved, the program prints the below:

Puzzle Soved!! Congratulations you have solved the puzzle.

Comment below with the tools you used and the time it took to solve. Happy hacking :-)

Replace View Bounds

Well, the Scala community has decided to deprecate view bounds in Scala. I suspect it might have to do with brevity. The full discussion can be viewed here.

It looks like most view bounds (<%) can be converted to context bounds. Let's see different strategies that can help us do the same. Suppose we have:


scala> def foo[T <% Int](x: T):Int = x
foo: [T](x: T)(implicit evidence$1: T => Int)Int

scala> implicit def a[T](n:T) = n match {
 | case x:String => x.toInt
 | }
warning: there were 1 feature warning(s); re-run with -feature for details
a: [T](n: T)Int

scala> foo("23")
res4: Int = 23

To convert it to a `context bound` we need to implicitly have something that does T => Int. We could:


scala> type L[X] = X => Int
defined type alias L

scala> def goo[T : L](x: T):Int = x
goo: [T](x: T)(implicit evidence$1: T => Int)Int

scala> goo(23)
res2: Int = 23

We could combine the type declaration with the function definition as:


scala> def goo[T : ({type L[X] = X => Int})#L](x: T):Int = x
goo: [T](x: T)(implicit evidence$1: T => Int)Int

//quite cryptic though

A more readable version:


scala> def goo2[T](x: T)(implicit ev: T => Int):Int = x
goo2: [T](x: T)(implicit ev: T => Int)Int

scala> goo2("12")
res8: Int = 12

A more generalized version:


scala> def goo[E,T : ({type L[X] = X => E})#L](x: T):E = x
goo: [E, T](x: T)(implicit evidence$1: T => E)E

scala> def goo2[T, E](x: T)(implicit ev: T => E):E = x
goo2: [T, E](x: T)(implicit ev: T => E)E

scala> goo("1000")
res10: String = 1000

scala> goo2("1000")
res11: String = 1000
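
For completeness, here is a sketch showing that the `[T : L]` form is just the ordinary type-class pattern; `Show` below is a made-up type class for illustration, not something from the standard library:

trait Show[T] { def show(t: T): String }

// evidence instance for Int
implicit val intShow: Show[Int] = new Show[Int] { def show(t: Int) = t.toString }

// a context bound: the compiler looks for an implicit Show[T] in scope
def describe[T : Show](x: T): String = implicitly[Show[T]].show(x)

describe(42) // "42"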

Asynchronous Distributed Key-Value DB

I have developed a simple beta version of an asynchronous distributed key-value database. It is simple and quite robust, and I plan to make it not-so-simple and open to other languages (apart from JVM-based ones) in the future :). You can download it and start using it from github.

In short (refer to the wiki for more info):

  1. It is asynchronous.
  2. You can have a single primary node with any number of secondary nodes, which can join or leave arbitrarily.
  3. It is truly distributed. You just need the address of the “Arbiter”, using which any node running on any instance can start acting as a node.
  4. Insert (key-value pair) can only happen at the primary node. Search can be done on any node.
  5. Order and consistency are maintained at all nodes, i.e. given enough time, all replicas settle on the same view.
  6. Each node has a Persistence service which backs up every insertion/removal. This is client dependent and can be done as desired using any SQL/NoSQL database, or a file for that matter.
  7. Tested vigorously. I believe more effort has gone into testing it than building it.

The following are the current limitations. I hope to remove them in the next version.

  1. Currently updates are only possible on a dedicated node, the primary replica.
  2. The primary (leader) node cannot fail during the uptime of the system.
  3. Membership is handled reliably by a provided subsystem (see the section “The Arbiter”). New replicas which wish to join should send a Join call to the Arbiter. But for a replica to leave, the current process is quite tedious: a manual Replicas(Set[ActorRef]) call has to be sent to the primary replica (see the sketch after this list).
  4. The update rate is assumed to be low, meaning that no incoming requests are rejected due to system overload. If an update were rejected, the store would be left in a possibly inconsistent state, which may require a subsequent succeeding write to the same key to repair it.
  5. Clients are expected not to reuse request IDs before the request has been fully processed and responded to.
  6. There is a very small chance for the database to be in an inconsistent state. Please refer to the section “Consistency in the case of failed replication or persistence” in the overview.
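
As a rough illustration of the membership protocol mentioned in point 3, here is a minimal Akka sketch of an arbiter. The message shapes (Join, Replicas(Set[ActorRef])) follow the names above, but the rest is my assumption, not the project's actual code:

import akka.actor.{Actor, ActorRef}

case object Join
case class Replicas(replicas: Set[ActorRef])

// Minimal arbiter sketch: the first node to join becomes the primary; every
// join updates the primary's view of the current replica set.
class Arbiter extends Actor {
  var primary: Option[ActorRef] = None
  var replicas = Set.empty[ActorRef]

  def receive = {
    case Join =>
      if (primary.isEmpty) primary = Some(sender())
      replicas += sender()
      primary.foreach(_ ! Replicas(replicas))
  }
}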

But in short “It Works!!”

Technologies:

  1. Scala
  2. Akka
  3. SBT as build tool

I highly recommend everyone to have a look at Akka. There are very few frameworks that scale as beautifully as Akka does.

Special thanks to the course “Principles of Reactive Programming” on Coursera for helping me get started.


Atomicity from single-cycle compare & swap

There is a wonderful function defined on the scala.concurrent.Future trait called onComplete. Here is the declaration:

def onComplete[U](func: Try[T] => U)
              (implicit executor: ExecutionContext): Unit

There is one key point in the documentation which says:

If the future has already been completed, this will either be applied immediately or be scheduled asynchronously.

The reason I call it a wonderful function is its implementation. Roughly speaking, doing:

import scala.concurrent._
import scala.util.{Success, Failure}
import ExecutionContext.Implicits.global

val f = Future {
  Thread.sleep(1000)
  "Successful"
}
f onComplete {
  case Success(i) => println(i)
  case Failure(t) => println(t)
}

The `Future { ... }` block dispatches the computation to a thread and returns a Future. The `onComplete` call then registers an asynchronous callback which says: when the task is completed, depending upon the result, please print it.

On reading the documentation, one might think that a new java.lang.Thread is created as soon as `onComplete` is executed, and that this thread waits for the future to complete and then prints the value. But if the thread comes from, say, Executors.newCachedThreadPool, this would also mean that for every call to onComplete a separate thread is blocked waiting for the result. A disaster!!

But its inventors were smart. What they did instead was to keep an internal list which stores all the functions to be called once the Future completes. Once it completes, it invokes all those registered callbacks, either itself or via the ExecutionContext. This way no extra thread has to be created or blocked. What is interesting is how that list is stored: not in a separate synchronized collection, but in a single volatile reference that is updated with compare-and-swap.


// A single volatile field holds the promise's state (either the listener list
// or the completed result); it is swapped atomically via compare-and-swap.
private volatile Object _ref;
final static long _refoffset;

protected final boolean updateState(Object oldState, Object newState) {
  return Unsafe.instance.compareAndSwapObject(this, _refoffset, oldState, newState);
}

Now calling onComplete does the following: it replaces the previous listener list with a new list consisting of the new callback prepended to the previous list. On calling it multiple times, it keeps prepending to that list.

updateState(listeners, runnable :: listeners)

The promise's tryComplete function is called once the future is completed. That is when all the registered listeners are executed, each CallbackRunnable being dispatched to the ExecutionContext.

def tryComplete(value: Try[T]): Boolean = {
  val resolved = resolveTry(value)
  (try {
    @tailrec
    def tryComplete(v: Try[T]): List[CallbackRunnable[T]] = {
      getState match {
        case raw: List[_] =>
          val cur = raw.asInstanceOf[List[CallbackRunnable[T]]]
          if (updateState(cur, v)) cur else tryComplete(v)
        case _ => null
      }
    }
    tryComplete(resolved)
  } finally {
    synchronized { notifyAll() } //Notify any evil blockers
  }) match {
    case null => false
    case rs if rs.isEmpty => true
    case rs => rs.foreach(r => r.executeWithValue(resolved)); true
  }
}

def onComplete[U](func: Try[T] => U)(implicit executor: ExecutionContext): Unit = {
  val preparedEC = executor.prepare
  val runnable = new CallbackRunnable[T](preparedEC, func)

  @tailrec //Tries to add the callback; if already completed, it dispatches the callback to be executed
  def dispatchOrAddCallback(): Unit =
    getState match {
      case r: Try[_] => runnable.executeWithValue(r.asInstanceOf[Try[T]])
      case listeners: List[_] => if (updateState(listeners, runnable :: listeners)) () else dispatchOrAddCallback()
    }
  dispatchOrAddCallback()
}

What is nice about the above is how a single atomic processor instruction – Unsafe.instance.compareAndSwapObject – is used to achieve synchronization. In dispatchOrAddCallback, recursive calls are made to swap the reference. If no other thread changed the reference at that exact moment (i.e. added a callback or completed the future), the first attempt succeeds. If someone did, one more recursive call is made and it tries again. This is a superb optimization: without using any locks, it achieves both memory visibility (the field is volatile) and atomicity (the swap is a compare-and-swap).
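
To make the pattern stand on its own, here is a stripped-down sketch of the same idea written against the public AtomicReference API instead of Unsafe. Names like TinyPromise are mine, purely for illustration, and for brevity the callbacks run on the completing thread rather than on an executor:

import java.util.concurrent.atomic.AtomicReference
import scala.annotation.tailrec
import scala.util.Try

// A single reference holds either the pending callback list (not yet
// completed) or the Try[T] result (completed); CAS retries replace locking.
final class TinyPromise[T] {
  private val state = new AtomicReference[AnyRef](Nil: List[Try[T] => Unit])

  @tailrec
  final def onComplete(cb: Try[T] => Unit): Unit = state.get() match {
    case cbs: List[_] =>
      val cur = cbs.asInstanceOf[List[Try[T] => Unit]]
      // another thread may have swapped the reference in between: retry
      if (!state.compareAndSet(cur, cb :: cur)) onComplete(cb)
    case done: Try[_] => cb(done.asInstanceOf[Try[T]]) // already completed
  }

  @tailrec
  final def tryComplete(result: Try[T]): Boolean = state.get() match {
    case cbs: List[_] =>
      val cur = cbs.asInstanceOf[List[Try[T] => Unit]]
      if (state.compareAndSet(cur, result)) { cur.foreach(_(result)); true }
      else tryComplete(result)
    case _ => false // someone else completed it first
  }
}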

Here is a lazy initialization implementation in Java (github link), inspired by Scala's lazy val implementation. It uses a similar technique to guarantee atomicity along with memory visibility.
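
And a tiny sketch of what such CAS-based lazy initialization can look like (illustrative names, not the code from the linked repository). Note that compute may run more than once under a race, but only one result ever wins, so it should be side-effect free:

import java.util.concurrent.atomic.AtomicReference

final class Lazy[T <: AnyRef](compute: () => T) {
  private val ref = new AtomicReference[T](null.asInstanceOf[T])

  def get: T = {
    val cur = ref.get()
    if (cur != null) cur
    else {
      // only the first successful CAS publishes its value; losers re-read
      ref.compareAndSet(null.asInstanceOf[T], compute())
      ref.get()
    }
  }
}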


Issues with Java 7 on Mac OS

At my workplace, I was involved in building a desktop application for Mac (priority) and Windows (lesser priority). So for about 5 months I did all my development work on a Mac, and now I think I have a good feel for the OS.

The application was built using JavaFX. There were native calls to be made to some C libraries, and I used the awesome JNA to do it. But on the whole, I had a terrible time working with the JVM on Mac, and below is a list of why it is such a pain:

 

Some History:

Until Snow Leopard (I am not sure of the exact version), Java came pre-installed on the Mac, and Apple maintained its own port of the HotSpot VM. But from Java 7, Apple decided to stop porting the JDK and Oracle provides the build. And here start half the problems:

(In no particular order)

  • Firstly, it is excruciating working with Java 7 based desktop applications. Do not even try it on a Retina display, because of the font rendering. It is horrible and blurry. I use Monospaced 13 on Eclipse and it is pathetic, and I don't think any more text can ever describe how blurry and pathetic it is.

It is a registered bug which will only get fixed in Java 8 (What!!). It seems the behavior is due to the fact that the Oracle build does not support HiDPI. A quick working solution is to use Apple Java 6 to run your applications if you can. For Eclipse and NetBeans, see here.

  • A lot of Apple libraries, such as QuickTime and others, are 32-bit builds. The Canon SDK I was working with only has a 32-bit build available, and only a 32-bit JVM can make native calls into 32-bit libraries. Apple Java 6 is a 64-bit VM but can be run in 32-bit mode using the `-d32` JVM argument.

The issue: Java 7 no longer supports the `-d32` option, which means you cannot use it to call 32-bit native libraries. And this breaks half the code. Hence, I simply cannot use Java 7 for development.

  • It is buggy. Our client was Norwegian, so obviously he had a lot of files named in Norwegian. The issue: Java 7 could not detect files that were named with non-ASCII (Norwegian) characters. (This was caught on the client side, and believe me, it hurts when you see this happen.) It was a bug which was eventually solved, but the damage was done.

And then it was corrected. But during testing, I realized that the DirectoryChooser of JavaFX did detect them, yet on user selection it weirdly returned null instead of the file. I have submitted the bug. But again, it was painful telling the client not to use files/directories with Norwegian characters, and I had no time to build a custom component.

With all the above, it was still important to use Java 7. We wanted to build a cross-platform application and had no developer resources to write it separately for each OS. This put C/C++ out of the picture. We did not want to use Adobe Flex (license and future maintenance issues). Not Python, as no one in the team knew it and we were all comfortable with Java.

We built on Java 7 because:

  • Apple does not give a license to redistribute its VM along with an application, but my company wanted to ship the VM with the application. Hence Java 6 could not be used.
  • We wanted to use JavaFX (an awesome MVC UI toolkit) and not Swing. JavaFX looked promising (which it actually is) and hence the decision was taken. But unlike Windows, a JavaFX build is only available from Java 7 onwards on Mac. Hence we are stuck with it now.

The only positive was this small issue, which worked well on Windows but not on Mac.

 

PS: There are a lot more, and I will keep adding to the list as time permits.


Continuation….

This post is a continuation of the previous post – Co-variance and Contra-variance.

Arrays in Java are co-variant, which basically means that if “String” extends “Object” then “String[]” also extends “Object[]”.
Do you see any pitfalls in the above?

Let's look at the following example.

Example – 1

String[] a = new String[]{"Hello", "Hey"}; // line 1
Object[] b = a;                            // line 2
b[0] = new Integer(23);                    // line 3
a[0] = ???                                 // line 4

Because arrays in Java are co-variant, you can do the above. a[0] would now refer to an Integer, which is illegal. To prevent this, line 3 throws an ArrayStoreException at runtime. Truly, this should have been caught at compile time; thank the co-variance of arrays for this behavior. Arrays were decided to be co-variant, and they are reifiable, i.e. they retain their type information at runtime. That allows line 2, but it also gives us line 3. To guard against it, the runtime has to check the type on every store into an array, which is extra overhead.

But why were arrays made co-variant? The Java founders wanted to have Generics right from day 0 (Generics were only introduced in Java 5). Generics follow type erasure, i.e. the type is checked for compatibility and erased at compile time (more on it here). Due to lack of time, they had to postpone Generics. Without generics, a method like the one below would have been impossible unless arrays were made co-variant.

Arrays.sort(Object[] ob)

The above method takes an Object array and sorts its elements based on their Comparable implementations. Hence a single method suffices. Without co-variance of arrays, there would be a need to write a separate method for each data type. Hence the decision.

Another consequence is that you cannot instantiate an array of a generic type, as the next example shows.

Example - 2
List<String>[] l = new ArrayList<String>[10]; // line 1 - illegal, does not compile
Object[] a = l;                               // line 2
List<Integer> l2 = new ArrayList<Integer>();  // line 3
a[0] = l2;                                    // line 4
l[0] = ???                                    // a list of String or of Integer?

Hence line 1 is illegal (though arrays of wildcard types, i.e. List<?>[], are allowed; why is left for the reader to analyse). Similarly, why Generics were decided to be invariant is again left to the reader (hint: similar to the above example, try adding some other animal, say a pig, to a list of dogs - made concrete in the sketch after the two questions below). So now we have seen that writing onto co-variant mutable types always causes one trouble or another. This gives rise to two questions:

  1. If arrays were invariant, would it solve the problem?
  2. If arrays were left co-variant, what change could be made to make them correct? Has mutability got anything to do with it?
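
Before answering, here is the hint above made concrete in Scala: a co-variant container with a public write method does not even compile, because that write method could let a Pig sneak into a container of Dogs.

// Sketch: co-variance plus mutation is rejected by the compiler.
class Box[+A](private[this] var item: A) {
  def get: A = item
  // def set(a: A): Unit = { item = a }
  // error: covariant type A occurs in contravariant position in type A of value a
}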

Suppose arrays were invariant; then the bugs would be detected right at compile time, as raised in Example 1. Thus invariance of arrays looks to be the right way forward. Scala gets this right.

Let's look at the second question. Suppose the generic type below were co-variant, and let's consider an immutable list as the example. (By ImmutableList I mean that any addition to the list creates a new list by copying the previous elements and adding the new element - similar to CopyOnWriteArrayList, except that addition returns a new list every time.)

ImmutableList<String> a = new ImmutableList<String>();
ImmutableList<String> b = a.add("Hey");         // line 2
ImmutableList<Object> x = a;                    // line 3
ImmutableList<Object> y = b.add(new Object());  // line 4

In line 2, adding a string returns an immutable list containing only “Hey”; the ImmutableList “a” is still empty. Adding an Object in line 4 gives a new list containing “Hey” and an Object, which becomes “y”. The static element type of “y” is Object, hence it is safe at all stages: “y” is a list of Objects containing a “Hey” and an “Object”.

So it turns out that with immutability, co-variance can be used safely. Come on, co-variance is such a lovely tool: it lets you say that if A <: B then C[A] <: C[B], which is very handy. Isn't it?
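
Scala's standard library already takes advantage of this: its immutable List is declared co-variant (List[+A]), so the ImmutableList example above works out of the box:

val a: List[String] = Nil
val b: List[String] = "Hey" :: a       // a new list; `a` is untouched
val x: List[Any]    = a                // allowed because List is co-variant
val y: List[Any]    = new Object :: b  // the element type simply widens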

So remember [P2] (from the last post): with writes, co-variance can be troublesome. It is perfect for reads, just like return types in functions.

Interestingly, contra-variance is made for writes. More on it, and on its beautiful implementation in Scala, in the next post.


Co-variance & Contra-variance

I have been working with Scala recently and it has been a pure pleasure. I am in love with the language. And yes, it indeed is a beast.

Generics/parametrization is something that has always kept me on the hook, even after a good amount of experience with it. Thanks to Scala and the lectures by Martin Odersky, I am much better now. There have been some realizations along the way, but to explain them I will first have to explain some basics. Though theoretical and boring in many places, I promise this will improve your skills as an architect if understood well :)

I have planned the post in 4 parts:

  1. Brief overview of co-variance & contra-variance
  2. The above paradigm's behavior in Java, Python, and C#
  3. Overview and shortcomings of Generics in Java; the design flaw with arrays
  4. A detailed theoretical explanation with the help of Scala

Before starting, there is one very important principle in the object-oriented world called the Liskov Substitution Principle. It should always be at the back of your mind while designing APIs. It states:

Let q(x) be a property provable about objects x of type T. Then q(y) should be provable for objects y of type S where S is a subtype of T.

The above means: if “A” extends “B” (A <: B), then all the things that one can do with “B” should also be legal with “A”.

Co-variance & Contra-variance

 Wiki will give you a formal definition. Crudely:

  • Co-variance ( <: ) is converting a type from a wider type to a narrower one. In Java terms: ? extends E*
  • Contra-variance ( >: ) is converting a type from a narrower type to a wider one. In Java terms: ? super E*
  • An in-variant type cannot be converted.
* Where E is the declared generic type and “?” is a wildcard which crudely means “any type”.
So for example:
class Animal { }

class Dog extends Animal { }

class Example {

  void run() {
    Dog dog = new Dog();
    getAnimal(dog);
  }

  void getAnimal(Animal animal) {
    .....
  }
}

In the above example, one is ultimately doing

Animal animal = dog

So you are converting a Dog type (narrow) to an Animal type (wider). This is contra-variance. The reverse of it is co-variance. The same paradigm is also followed in C# and Python.

For co-variance:

class A {
  Animal getCreature() {
    return new Dog();
  }
}

class B extends A {
  Dog getCreature() {
    return new Dog();
  }
}

Check the return type of the getCreature function. getCreature() ultimately returns an Animal, but it returns the type Dog in class B. This is co-variance. The above is perfectly legal, because code that invokes getCreature() on class A expects an Animal in return, and class B provides a Dog (ultimately a Dog is an Animal; vice versa won't work - think!).

Invariance is when the type cannot be converted.

By default, return types of functions in Java, C# and Python are co-variant, and argument types are invariant.

Points to Remember:

I will raise some points now and the explanation will be given by the end. The whole point of the post is the two points below.

  1. Contra-variant types can only appear in places where you write something (i.e. parameter types), and co-variant types can only appear in places where you read something (i.e. return types). [P1]
  2. Co-variance is only suitable when the data is immutable. More on it below. [P2]

I will first raise the issues caused in Java by not following [P1], i.e. the arrays disaster, for which Generics (type erasure) will be discussed; then the issues with [P2]; and finally the improvements and how the issue was dealt with successfully in Scala, in the next post :)
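
As a quick Scala illustration of [P1] before the examples below: the standard Function1 trait is declared as trait Function1[-T1, +R], i.e. contra-variant in its argument (a write position) and co-variant in its result (a read position).

class Animal
class Dog extends Animal

// A function that accepts any Animal and returns a Dog can be used wherever
// a Dog => Animal is expected: the argument varies contra-variantly,
// the result co-variantly.
val f: Dog => Animal = (a: Animal) => new Dog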

Edit: Well some more examples to understand better:

scala> class A[T](n:T)
 defined class A

scala> class B[+T1,-T2](n:T1, m:T2) //co-variant T1 and contra-variant T2
 defined class B

scala> class C[-T1, +T2](n:T1, m:T2) // contra-variant T1 and co-variant T2
 defined class C

scala> List(new B(1d,1), new B(2,"2"), new B("3",3d))
 res0: List[B[Any,Double with String with Int]] = List(B@67ef6bbf, B@178afde8, B@2623966b)

scala> List(new C(1d,1), new C(2,"2"),new C("3",3d))
 res1: List[C[String with Int with Double,Any]] = List(C@13f23e56, C@4da8a55, C@27afd4f0)

scala> List(new A(1d), new A(2),new A("3"))
 res2: List[A[_ >: String with Int with Double]] = List(A@6800bea8, A@8e53cf0, A@203b520)

For evaluating `res0`, the compiler looks for a common super-type of all the arguments:

  1. All arguments are of type B[something1, something2], so the two somethings have to be evaluated, and they have to satisfy the variance bounds of B.
  2. Since type T1 is co-variant, the compiler needs to find a type such that “Int <: something1 & Double <: something1 & String <: something1”, and that can only be: Any.
  3. Since type T2 is contra-variant, the compiler needs to find a type such that “Int >: something2 & Double >: something2 & String >: something2”, and that can be: Int with Double with String.

To confirm point 3, try the below:

type Custom = String with Int with Double
def f[T](implicit ev: Custom <:< T){}

f[Int]
f[String]
f[Double] // all compile fine

Hence res0 is evaluated as the type List[B[Any, Double with String with Int]]. res1 is evaluated by a similar procedure. Evaluating res2 is again interesting:

  1. All arguments are of type A[something].
  2. Since type T is in-variant, the compiler needs to find a type that is as low as possible in the hierarchy and satisfies “Int =:= something & Double =:= something & String =:= something”. The existential type _ >: String with Int with Double satisfies this. Though such a list will be useless, as you cannot do much with it; it is no better than Any.

Cache

This is the first part of a series of posts I plan to put up on effective memory-centric cache implementations.

I have been working with and going through different cache implementations in Java. Below is a discussion of some of them. The aim is to choose the right cache for a given scenario. I have tried to explain the Java-specific basics along the way, to keep the discussion as language-independent as possible.

In general it turns out that LRU (Least Recently Used) is preferable for most scenarios. It can very easily be implemented in Java by extending a beautiful data structure, LinkedHashMap, and overriding removeEldestEntry(final Map.Entry<A, B> eldest); more on it here, and see the sketch below. But the issue with it is the size: how will you set the size of the cache? If the cache is too small, it will not serve its purpose. If it is too large, it might eat away memory the application needs.
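
A minimal sketch of that LinkedHashMap-based LRU cache, written here in Scala over the standard java.util API; maxEntries is an illustrative parameter:

import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}

// accessOrder = true makes iteration order follow access recency;
// removeEldestEntry evicts the least recently used entry once the cap is hit.
class LruCache[K, V](maxEntries: Int)
    extends JLinkedHashMap[K, V](16, 0.75f, true) {

  override protected def removeEldestEntry(eldest: JMap.Entry[K, V]): Boolean =
    size() > maxEntries
}

The fixed maxEntries is exactly the sizing problem discussed next.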

Fixing the size of the cache is indeed tough. Ideally you would want a cache that scales according to the application: under low memory consumption by the application, one won't mind the cache taking a lot of memory; under high memory contention, one would want the cache size to decrease (though this creates performance problems, as the cache might then be practically non-existent - but that is at least better than repetitive Full GCs).

Hence I have been in search for an effective implementation of Cache which is:

  • LRU
  • Sized according to the available memory

Though it is still work in progress, I will be discussing the second point above in this post and the next, and a culmination of both in the third post.

Scaling of Cache

The aim is to generate specifications, based on quantitative analysis, for how best to implement such a scenario.

To implement such a cache, we need some support from the language, which should give us timely input about the available memory - better still if it can itself remove objects from the cache. Luckily Java gives us this flexibility by providing the java.lang.ref package, which offers limited communication with the garbage collector.

For the analysis here I have taken the code from the org.neo4j.kernel.impl.cache package of the Neo4J graph database. I have slightly modified the code to generate statistics. The code is the property of Neo Technologies and is used here only for educational purposes. It implements what we desire very well. The link to the cache code is here. Primarily it has two cache versions:

  • SoftRefCache
  • WeakRefCache
Both of them internally use a ConcurrentHashMap which holds the key-value pairs. The difference between the two is that SoftRefCache wraps the value in a SoftReference, while WeakRefCache wraps the value in a WeakReference.
The key point is that under different memory scenarios the objects in the cache automatically become softly or weakly reachable and ultimately eligible for garbage collection - hence the cache scales according to need. A sketch of the idea follows.
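
As an illustrative sketch (not the Neo4J code itself) of the SoftRefCache idea: a ConcurrentHashMap with SoftReference-wrapped values, where cleared entries are dropped lazily on lookup.

import java.lang.ref.SoftReference
import java.util.concurrent.ConcurrentHashMap

class SoftCache[K, V] {
  private val map = new ConcurrentHashMap[K, SoftReference[V]]()

  def put(key: K, value: V): Unit =
    map.put(key, new SoftReference(value))

  def get(key: K): Option[V] = Option(map.get(key)) match {
    case Some(ref) =>
      val v = ref.get()
      if (v == null) { map.remove(key); None } // the GC cleared it under pressure
      else Some(v)
    case None => None
  }
}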

Cache & Performance – In Worst Case Scenario

I have carried out some tests to decide which version should be used in which scenario. Below is the analysis under high contention, where distinct objects are repeatedly added to the cache. In reality, this can only happen under heavy stress.

The code that exhibits the above can be found in CacheTest.java inside the cache.neo4j package. Roughly it is:

private void go() {
  SoftLruCache<EntityInteger> cache = new SoftLruCache<EntityInteger>("Soft Cache");
  test(cache);
}

private void go2() {
  WeakLruCache<EntityInteger> cache2 = new WeakLruCache<>("Weak Cache");
  test(cache2);
}

private void test(ReferenceCache<EntityInteger> cache) {
  for (int i = 0; i < Integer.MAX_VALUE; i++) {
    cache.put(new EntityInteger(i));
  }
}

Below is an individual analysis of each case, though it is not so much a true comparison of the caches as of SoftReference versus WeakReference.
My configuration:

java version "1.7.0_05"
Java(TM) SE Runtime Environment (build 1.7.0_05-b05)
Java HotSpot(TM) Client VM (build 23.1-b03, mixed mode, sharing)
Windows 7
Intel i3 processor
JVM arguments: -server -Xmx200M -verbose:gc -XX:+PrintGCDetails (-Xmx200M means the JVM heap can grow to at most 200 MB)

SoftLruCache

Below is a graph of size of cache vs time.

The memory behavior:

[Graph: memory behavior of SoftLruCache]

* The memory numbers above are not exact, as the profiler itself might be consuming a decent amount of memory. Without loss of generality, one can assume it is consuming about 80-100 MB.

And the log.

Results:

  • The memory expands to the full 200 MB allotted, which results in repeated Full GCs that free up memory. During this, the cache size drastically drops from ~1,700,000 entries to a mere 400,000.
  • The above cycle repeats periodically, with the cache size increasing to 1,400,000 and then falling drastically to less than 100,000 - in some cases even below 1,000.
  • On average, around 75% of the application's running time was spent in garbage collection.
Analysis:

You will surely face your administrator's wrath if you use it in such a case: 75% of the time spent on GC is just a straight kill. Basically the cache fills up until there is no more memory available, which results in a Full GC; a lot of memory is cleared up and the cache starts filling up again.

Actually, people might not agree with me. Here the objects are added at a very, very fast rate, so it is no wonder that performance is so bad. Yes, waiting until no memory is available before clearing up might be a bad option (as it might affect the application's functioning). But in cases where you know that the cache will never grow that significantly large - or, even if it does, objects are not added at such a high rate later - it might actually be a good option.

To verify the above, I modified the program so that as soon as more than 3,000,000 elements had been added to the cache, the rate of addition was reduced by a factor of 500. The results were:

  • Time spent on GC reduced to less than 50%.
  • Constant memory consumption of 160 MB - different from the previous results.
  • The cache size hardly ever went below 1,000,000, which in a way is a good thing.

The next post will be the same analysis for WeakReference, and later a repetition for a more realistic scenario.

References:

For more on java.lang.ref:

  1. http://docs.oracle.com/javase/7/docs/api/index.html

Another reason to love Neo4J

For one of the projects where Neo4J is being used, I have been working on implementing a K-shortest-paths algorithm using Yen's algorithm for loopless paths.

An issue in the algorithm: say we are computing the top 4 shortest paths. While computing the 2nd shortest path, the algorithm removes arcs of the 1st shortest path from the graph; it likewise removes arcs of the 2nd while computing the 3rd shortest path, and so on. At the end, it adds back all the removed arcs.

The problem now is that while one thread is computing a k-th shortest path, other threads cannot use the algorithm, as the graph is not in its complete shape: the first thread has temporarily removed some arcs, and the same graph cannot be used for any other purpose until those arcs are added back. Hence multiple instances of the algorithm cannot be run in parallel. One could solve the problem by, instead of removing arcs, setting a flag on the relationship for a particular pair of nodes, to be cleared once the computation is over. But when running many instances in parallel there would be many such key-value pairs on the relationships, and there is extra work at the end to clear the flags.

Now this would have been a problem with any other graph library, but the transaction model in Neo4J solves the above issue.

Here is the crazy part, which I am unsure is a good practice at all:
I will have the code inside a transaction, but never let the transaction succeed. That is, I will implement the algorithm and remove the respective arcs in the process; when I have the desired output, I will mark the transaction as a failure. Hence the arcs are never actually removed, and the same graph can be used by other computations. Further, there is no need to add those arcs back later, as in reality they were never removed.
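
A sketch of what that looks like with the embedded Neo4J API of that era (assuming a GraphDatabaseService handle `db`; the algorithm body is elided):

import org.neo4j.graphdb.{GraphDatabaseService, Transaction}

def kShortestPathsWithoutMutation(db: GraphDatabaseService): Unit = {
  val tx: Transaction = db.beginTx()
  try {
    // ... run Yen's algorithm here, deleting relationships as it goes ...
    tx.failure() // deliberately never call success(): all removals roll back
  } finally {
    tx.finish()  // closes the transaction; with failure() nothing is committed
  }
}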

This is so cool. Though there is another issue with it, as pointed out by Mattias Persson of Neo Technologies:

The only problem I  see here is that removing relationships will grab write locks which will be held throughout the transaction and prevent any other shortest path algo to remove those relationships as long as the transaction is alive.

How big this issue is, I do not know, but if carefully crafted it should behave well. In any case it's way better than removing the arcs and halting all the other computations on the graph.

More on it here.

 

New Development:
Another reason why I use Neo4J is to reduce development time. Had I used some in-memory graph library, rebuilding the graph even to test small things would take a lot of time. With Neo4J I don't have to worry about that. Though with JUNG the problem can be solved by using a custom class loader and running the tests as plugins, hence saving time - but all this requires effort.

 

 

MIT IMI

In the summer of 2011, I (in a team of five) participated in the Massachusetts Institute of Technology Indian Mobile Initiative – The Next Big Idea Challenge. We built an app-cum-service called KickStart. Below are the certificates with the official MIT hologram :)


PS: The article on KickStart which appeared in the Navhind Times.