Synchronized block performs 100x faster than Unsynchronized one

Like everyone else, I had read in books and blogs about how expensive context switches can be, but I never really felt it until I experienced it myself recently. The case below is quite interesting because the synchronized code performs much faster than the unsynchronized code (120 times faster, consistently, across about 30 runs).

For some time now I have been working on my own library, very similar to Looper and Handler in Android. It has a lot of other functionality, which will probably be covered in my next post. Anyway, it is about 80% complete, and over the last week I have written many test programs and profiled it to look for data races and race conditions. It has been working well, but I felt it could be optimized further (beware: premature optimization is always bad). Below is a small snippet of my code:

private static final ConcurrentHashMap<Double, Boolean>
    mapBoolean = new ConcurrentHashMap<Double, Boolean>();
private static final ConcurrentHashMap<Double, LinkedBlockingQueue<Runnable>>
    map = new ConcurrentHashMap<Double, LinkedBlockingQueue<Runnable>>();

protected static <T> Future<T> execute(final Double id, Runnable runnable,
                                       T result)
{
    // check if runnable is null
    if (runnable == null)
        throw new NullPointerException("Runnable is null");

    synchronized (id) // line 10
    // 'id' in this case is the same for all threads accessing this object
    {
        if (mapBoolean.get(id)) // line 11, check if there is a call for shutdown
        {
            setPollTime(0);
            throw new RejectedExecutionException();
        }

        RunnableFuture<T> ftask = new FutureTask<T>(runnable, result);
        LinkedBlockingQueue<Runnable> queue = map.get(id);
        queue.add(ftask);
        return ftask;
    }
}

For this post, it is only line 10 that we are concerned with. ‘id’ in this case is the same for all threads accessing this object.
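As a bit of context, here is a minimal sketch of how an id and its queue might get registered before execute(…) is ever called; the register(…) helper below is only an illustration and not the actual library code:

// Illustrative only (not the actual library code): each id gets its own task
// queue, and its shutdown flag starts out as false.
private static void register(Double id) {
    map.putIfAbsent(id, new LinkedBlockingQueue<Runnable>());
    mapBoolean.putIfAbsent(id, Boolean.FALSE); // false = no shutdown requested
}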

Anyway, when I profiled it the results looked good, but just for fun I decided to modify the code as follows:

protected static <T> Future<T> executeLoosely(final Double id, Runnable runnable,
                                              T result)
{
    // check if runnable is null
    if (runnable == null)
        throw new NullPointerException("Runnable is null");

    // the synchronized(id) block is not there

    if (mapBoolean.get(id)) // check if there is a call for shutdown
    {
        setPollTime(0);
        throw new RejectedExecutionException();
    }

    RunnableFuture<T> ftask = new FutureTask<T>(runnable, result);
    LinkedBlockingQueue<Runnable> queue = map.get(id);
    queue.add(ftask);
    return ftask;
}

That is, I basically removed the synchronized(id) part from my previous code. Because atomicity was not that important here, it should still work fine. But when I profiled it, I got the results below:

Comparison between the synchronized block and the unsynchronized one.

What this means is that the unsynchronized code is around 100 times slower than the synchronized one. This is absolutely weird. Initially it blew my mind, because I had never seen such a thing before. After profiling about 30 times, I figured out that the reason lies elsewhere.

For testing, I wrote a program that spawns 600 threads. Each of the 600 threads calls both of the above methods 20 times. Using a CyclicBarrier, I made sure that all the threads start at exactly the same time. That means each of the above methods is called 12,000 times in a very short period of time, so there is high contention among the threads to enter the synchronized block in execute(…), and obviously no contention for the ‘id’ lock in executeLoosely(…). A sketch of such a harness is shown below.
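A minimal sketch of that kind of harness looks like this (illustrative, not my actual test code; the class name ContentionTest and the no-op task are placeholders, and it assumes execute(…)/executeLoosely(…) are accessible and that the shared id has already been registered in the maps):

import java.util.concurrent.CyclicBarrier;

// Rough sketch of the test setup described above: 600 threads wait on a
// CyclicBarrier and then each calls both methods 20 times with the same id.
public class ContentionTest {

    private static final int THREADS = 600;
    private static final int CALLS_PER_THREAD = 20;
    private static final Double SHARED_ID = 1.0; // same Double instance for every thread

    public static void main(String[] args) {
        final CyclicBarrier barrier = new CyclicBarrier(THREADS);
        final Runnable noOpTask = new Runnable() {
            public void run() { /* do nothing */ }
        };
        for (int i = 0; i < THREADS; i++) {
            new Thread(new Runnable() {
                public void run() {
                    try {
                        barrier.await(); // all 600 threads start at the same time
                        for (int j = 0; j < CALLS_PER_THREAD; j++) {
                            execute(SHARED_ID, noOpTask, null);
                            executeLoosely(SHARED_ID, noOpTask, null);
                        }
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            }).start();
        }
    }
}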

But even under such high contention, execute(…) performs much better than executeLoosely(…).

I will cover the analysis of the above code in my next post.

Till then Happy thinking :)