Twitter

Saturday, November 27, 2010

Ever wondered why Google's home page is so quick?

Because they cheat... Read more here :)

http://blog.benstrong.com/2010/11/google-and-microsoft-cheat-on-slow.html

Monday, May 17, 2010

Parallel GC Vs Concurrent GC

Although the words "parallel" and "concurrent" are often used synonymously, they are not the same. Ok, not in the context of Java garbage collection at least.

The parallel collector belongs to the family of pure stop-the-world collectors. That means GC won't kick in until the JVM runs out of memory in the old-generation part of the heap, and when it starts, all the mutator (application) threads stop running.

The concurrent collector (CMS), on the other hand, runs mostly concurrently with the mutator (application) threads and tries to free up memory so the mutator threads can keep running. Nevertheless, it also stops the world for two very short periods, called the initial-mark and remark phases.

I am not going to explain the internals of common GC techniques; for that purpose, please read this. Instead, I am going to show you visually how these two collectors differ in terms of application throughput and responsiveness.

In the test program (download here) there are 3 mutator threads that continuously produce strings and put them into a map. Every 2 seconds another thread clears the map, i.e. all the cleared string objects become garbage and can be collected. Both test runs lasted 60 seconds each and were carried out on a 16-core machine.
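The workload can be sketched roughly like this. The class and method names here are my own, and the downloadable test program may be organized differently; this is just to make the setup concrete:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Rough sketch of the workload described above: producer threads keep
// filling a map with strings, and a janitor thread clears the map every
// 2 seconds, turning all those strings into collectable garbage.
public class GcComparison {

    static final Map<Long, String> MAP = new ConcurrentHashMap<Long, String>();
    static final AtomicLong KEY = new AtomicLong();

    static long runWorkload(long durationMillis, int producers) throws InterruptedException {
        final long deadline = System.currentTimeMillis() + durationMillis;

        Thread janitor = new Thread(new Runnable() {
            public void run() {
                while (System.currentTimeMillis() < deadline) {
                    try { Thread.sleep(2000); } catch (InterruptedException e) { return; }
                    MAP.clear(); // every cleared string becomes garbage
                }
            }
        });
        janitor.setDaemon(true);
        janitor.start();

        Thread[] threads = new Thread[producers];
        for (int i = 0; i < producers; i++) {
            threads[i] = new Thread(new Runnable() {
                public void run() {
                    while (System.currentTimeMillis() < deadline) {
                        long k = KEY.incrementAndGet();
                        MAP.put(k, "garbage-" + k); // short-lived string
                    }
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return KEY.get(); // number of strings produced
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Strings produced: " + runWorkload(60000, 3));
    }
}
```

A loop like this, run for 60 seconds under each of the two JVM command lines shown below, is what produces the VisualGC graphs being compared.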

During the first run the parallel-old GC was used and the resulting VisualGC output is shown below:
java -XX:+PrintGCTimeStamps -verbose:gc -Xmx2G -Xms2G -XX:+UseParallelOldGC GcComparison


During the second run CMS collector was used and the resulting VisualGC output is shown below:
java -XX:+PrintGCTimeStamps -verbose:gc -Xmx2G -Xms2G -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=30 GcComparison


Let us look into some graphs.

GC Time (3rd from top):
- Parallel: 32.4s for 59 collections. 55 (young gen) + 4 (old gen).
- CMS: 28.01s for 104 collections. 78 (young gen) + 26 (old gen).
- Even though the number of collections for parallel < CMS, the total time for which the application threads were stopped for parallel > CMS.
- Note that the light green area for CMS is larger than for parallel. This means CMS was running for more time than the parallel collector. But even then its overall pause cost was lower, because it does the collecting concurrently with the application threads.

Eden Space (4th from top):
- There is not much difference here.

Old Gen (7th from top or 2nd from bottom):
- Parallel: 4.2s for 4 collections. 4 Full GCs, i.e. application threads were stopped for 4.2s.
- CMS: 1.1s for 26 collections. No Full GC, i.e. application threads were stopped for only 1.1s.
- Also you can see that the CMS collector works concurrently (gradual rise and fall) whereas parallel stops the world (4 spikes).
- The height of the gradual rise and fall of CMS can be adjusted with the -XX:CMSInitiatingOccupancyFraction option. This option tells the CMS collector at which old-generation occupancy it should start working.

Hence it is better to use:
1. the CMS collector when
- you have a high number of CPUs
- your application demands short pauses
- you have more memory

2. the parallel collector when
- you have a low number of CPUs
- your application demands throughput and can withstand recurring long pauses
- you have less memory

Wednesday, May 12, 2010

Firefox Vs Chrome - Screen area

Some of my friends and colleagues say that Chrome occupies less screen area than Firefox (i.e. the menu bar + navigation bar + etc.). I think that is true when you run Firefox with default settings. But when you tweak it a bit, you get more browsing space in Firefox than in Chrome.

Have a look at the following picture and decide for yourself. I think the difference is not as big as others make it out to be.

My vote is always for Firefox.

Browse more - learn more :)


Tuesday, April 6, 2010

A peek into JDK7 - java.util.Objects class

There are some nice(?) utility methods available with JDK7.

The java.util.Objects class comes with nearly a dozen very simple static utility methods. But I don't see any big use for these methods ;), since most of them are available (in some other form) in the older JDK versions.

In the following we will see some sample code for two of the newly introduced methods.
You can have a look at the rest of them here.

equals(Object a, Object b) - Returns true if the arguments are equal to each other and false otherwise.
hashCode(Object o) - Returns the hash code of a non-null argument and 0 for a null argument.

Integer obj1 = new Integer( 1 );
Integer obj2 = new Integer( 1 );
Integer objNull = null;
System.out.println( Objects.equals( obj1, obj2 ) );
System.out.println( Objects.equals( obj1, objNull ) );

prints the following...
true
false

One can get the hash code of an object using Objects.hashCode(Object o) (Line 1). We already have Object.hashCode() that does almost exactly the same thing (Line 2); the only real difference is that the static version is null-safe, returning 0 for a null argument instead of throwing a NullPointerException.

Integer obj5 = new Integer( 5 );
// (Line 1) the new one
System.out.println( Objects.hashCode( obj5 ) );
// (Line 2) this is the already existing method
System.out.println( obj5.hashCode() );
Prints the following
5
5
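The null handling is easy to verify. Per the javadoc, the static method returns 0 for a null argument, where the instance call would blow up (the class name below is my own):

```java
import java.util.Objects;

public class NullSafeDemo {
    public static void main(String[] args) {
        Integer objNull = null;
        // the static method tolerates null...
        System.out.println(Objects.hashCode(objNull)); // prints 0
        // ...whereas objNull.hashCode() would throw a NullPointerException
        System.out.println(Objects.equals(objNull, null)); // prints true
    }
}
```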

Let us see whether the Java guys add some more utility methods before the release of JDK7.

Stay tuned.

Wednesday, March 31, 2010

Java - Cache (frequently used immutable objects) using ConcurrentHashMap.

One of the better ways to reduce the impact of garbage collection in any Java application is to reduce the number of newly created objects, thereby reducing the amount of garbage produced.

Let us take the example of a university admission application. There are two possible titles (Master and Bachelor) and two possible majors (Computer and Electrical). Let us name the combination of title-and-major a Degree.

Assume that there are thousands of applications submitted every day. Each application will be for a particular combination of the above-mentioned Degree, e.g. Master-Computer, Master-Electrical, Bachelor-Computer...

Instead of creating a new Degree object for each and every application, we can cache the Degree objects. When a particular Degree object is demanded, look for it in the cache. If the cache doesn't contain it, create it, put it inside the cache, and then return it. Since the cache is built using ConcurrentHashMap, it is also thread safe, i.e. even when more than one thread runs the Degree.valueOf() method for the same pair of "title" and "major" strings, only one instance of that particular Degree will end up being cached and used by all the threads.

It is clear that we will produce less garbage using caching. But there is another add-on advantage: it is cheaper to check whether two Degree objects are equal using reference equality (==) rather than the equals() method, i.e. we can do Master_Computer == Master_Computer_ANOTHER instead of Master_Computer.equals(Master_Computer_ANOTHER). On my machine == is 9 times faster than equals(), thereby saving some CPU cycles.

Have a look at the following code to see how to build a cache of objects (Degree) having two String properties. Complete code here.

The key method is Degree.valueOf(), where all the caching is done.

This is just an example. This technique can be used in telco application servers running with loads in the order of 1000s of TPS. E.g. consider a header/value type of protocol: instead of creating a particular header thousands of times per second, we can reuse the existing cached header, thereby relaxing the CPU and RAM for other useful processing.
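For reference, here is a minimal sketch of what such a valueOf() can look like, built on ConcurrentHashMap.putIfAbsent(). The field and key layout here are illustrative; the complete downloadable code may be organized differently. Note that two racing threads may momentarily construct two instances, but putIfAbsent() guarantees only one is ever cached and handed out:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch of the caching pattern described above: immutable value objects
// interned in a ConcurrentHashMap keyed by their combined properties.
public final class Degree {

    private static final ConcurrentMap<String, Degree> CACHE =
            new ConcurrentHashMap<String, Degree>();

    private final String title;
    private final String major;

    private Degree(String title, String major) {
        this.title = title;
        this.major = major;
    }

    public static Degree valueOf(String title, String major) {
        String key = title + "_" + major;
        Degree degree = CACHE.get(key);
        if (degree == null) {
            Degree created = new Degree(title, major);
            // putIfAbsent returns the previously cached instance if another
            // thread beat us to it, otherwise null (meaning ours won)
            Degree raced = CACHE.putIfAbsent(key, created);
            degree = (raced == null) ? created : raced;
        }
        return degree;
    }

    @Override
    public String toString() {
        return title + "_" + major;
    }
}
```

Because every call with the same arguments returns the very same object, the == comparison mentioned above is safe.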



......

        Degree MASTER_COMPUTER = Degree.valueOf(MASTER, COMPUTER);
        Degree MASTER_ELECTRICAL = Degree.valueOf(MASTER, ELECTRICAL);
        Degree BACHELOR_COMPUTER = Degree.valueOf(BACHELOR, COMPUTER);
        Degree BACHELOR_ELECTRICAL = Degree.valueOf(BACHELOR, ELECTRICAL);
......

        // we ask the cache for all the possible present values that were created by the above lines.
        // Therefore it returns the existing values; Nothing is created anew.
        System.out.println("\nNo more new constructions for existing entries...");
        Degree MASTER_COMPUTER_1 = Degree.valueOf(MASTER, COMPUTER);

......
        // now ask for something that is not there; the cache will create it anew and cache it
        System.out.println("\nNew constructions for non-existing entries...");
        Degree.valueOf("Phd", "Computer");

......    
        // one more advantage of caching : we can compare the reference of two objects instead of checking their equal()ity
        // this is because all requests to Degree.valueOf(MASTER, COMPUTER) always return the very same object
        System.out.println("\nFaster equality check...");

        

Output
New Degree object: Master_Computer
New Degree object: Master_Electrical
New Degree object: Bachelor_Computer
New Degree object: Bachelor_Electrical

No more new constructions for existing entries...
Returning existing instance of Master_Computer

New constructions for non-existing entries...
New Degree object: Phd_Computer

Faster equality check...
Checked with equals().
Checked with reference equality.

Thursday, March 11, 2010

A peek into JDK7 - ForkJoinTask example (RecursiveAction example, Forkjoinpool example)

Consider tasks like sorting an array, or doing complex math on each and every element in an array. E.g. we want to increment all the elements in the array {0,1,2,3,4,5,6,7,8,9} by 1.

The simplest way is to loop over the entire array and do array[i]=array[i]+1. However this will run in a single thread.

But what if we could take advantage of multi-core CPUs, i.e. break the array into two halves and give them to two threads? Then the first thread operates on the left half of the array (modifying the entries to {1,2,3,4,5,......}) whereas the second thread operates on the right half (modifying the entries to {......,6,7,8,9,10}).

The ......s mean that the corresponding thread doesn't know what is there. It doesn't have to care. It is not part of its job!

This is where JDK7's ForkJoinTask comes into play. We give it a complex task to be executed, along with a threshold. If the task's size is greater than the threshold, the task divides itself in two, fork()s the sub-tasks, and waits for them to finish by join()ing them. Hence the name ForkJoinTask. There are two implementations of ForkJoinTask: RecursiveAction and RecursiveTask.

Here is an example. The applyAlgorithm() method is the CPU-intensive part, where each element in the array is modified. When the array is bigger than 5000 elements (the threshold), it is divided into two and the two halves are handled in parallel by the threads available in the ForkJoinPool.
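A minimal version of such a task might look like the following. It is written against the java.util.concurrent package where ForkJoinTask eventually landed (the runs below use the jsr166y preview jar instead), and the class name is my own:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Sketch of the divide-and-conquer increment described above.
public class IncrementTask extends RecursiveAction {

    static final int THRESHOLD = 5000;

    final int[] array;
    final int lo, hi;

    IncrementTask(int[] array, int lo, int hi) {
        this.array = array;
        this.lo = lo;
        this.hi = hi;
    }

    // The CPU-intensive work: increment each element in [lo, hi) by 1.
    void applyAlgorithm() {
        for (int i = lo; i < hi; i++) {
            array[i] = array[i] + 1;
        }
    }

    @Override
    protected void compute() {
        if (hi - lo <= THRESHOLD) {
            applyAlgorithm();            // small enough: just do it
        } else {
            int mid = (lo + hi) >>> 1;   // too big: split, fork both halves, join
            invokeAll(new IncrementTask(array, lo, mid),
                      new IncrementTask(array, mid, hi));
        }
    }

    public static void main(String[] args) {
        int[] data = new int[10000000];
        new ForkJoinPool().invoke(new IncrementTask(data, 0, data.length));
        System.out.println("data[0] after increment: " + data[0]);
    }
}
```

invokeAll() forks one sub-task and runs the other in the current worker thread, then joins both, which is what keeps all the pool's threads busy.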

Following are the results from 2 different machines: one a 16-core machine and the other a dual-core machine. In both cases the parallel execution is well ahead of the single-threaded numbers.

You can download the java code here.

1. On a 16 core machine
myServer $ java -cp test/jsr166y.jar:. ForkJoinAlgoritmicTask
Number of processor available: 16
Array size: 10000000
Treshhold: 5000
Number of runs: 5
 
Parallel processing time: 198
Parallel processing time: 69
Parallel processing time: 64
Parallel processing time: 61
Parallel processing time: 59

Number of steals: 579

Sequential processing time: 438
Sequential processing time: 437
Sequential processing time: 436
Sequential processing time: 436
Sequential processing time: 437

2. On a 2 core machine
muruga-Study$java -cp jsr166y.jar:. -Xms1G -Xmx1G ForkJoinAlgoritmicTask
Number of processor available: 2
Array size: 10000000
Treshhold: 5000
Number of runs: 5

Parallel processing time: 227
Parallel processing time: 206
Parallel processing time: 226
Parallel processing time: 203
Parallel processing time: 208
Number of steals: 12

Sequential processing time: 385
Sequential processing time: 385
Sequential processing time: 385
Sequential processing time: 385
Sequential processing time: 385

Wednesday, March 10, 2010

Adapting the NetBeans default license template.

After participating in the JavaEE6 codecamp, somehow I fell in love with NetBeans.


But whenever I create a new Java file, the editor keeps adding the following default license template to the file.

/*
* To change this template, choose Tools | Templates
* and open the template in the editor.
*/

If you would like to avoid this, go to "Tools->Templates". The "Template Manager" window will pop up; there, expand the "Licenses" folder, select "Default License", and click the "Open in Editor" button (at the bottom).

There you can customize the license text or you can just delete the contents.

Hope this helps someone.

Thursday, February 25, 2010

Optimistic CAS Vs Pessimistic synchronized...

As Brian Goetz says, be optimistic and not pessimistic.

Exclusive (synchronized) locking is pessimistic, while the CAS (compare-and-swap/set) used by AtomicInteger, AtomicLong, etc. is optimistic.

Synchronized (locking) is pessimistic in the sense that we fear something can go wrong, so we lock our stuff and then do our work.

CAS is optimistic in the sense that we do some work optimistically and then try to commit it. But if another thread has already done what we did, we retry our work once again.
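That optimistic retry loop can be sketched like this; it is essentially what AtomicInteger.incrementAndGet() does internally, and the class here is just my own illustration:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustration of the optimistic pattern: read, compute, try to commit
// with compareAndSet, and retry if some other thread committed first.
public class OptimisticCounter {

    private final AtomicInteger value = new AtomicInteger(0);

    public int increment() {
        for (;;) {
            int current = value.get();     // optimistic read
            int next = current + 1;        // do our work
            if (value.compareAndSet(current, next)) {
                return next;               // commit succeeded
            }
            // another thread won the race between get() and compareAndSet();
            // loop around and retry with the fresh value
        }
    }

    public int get() {
        return value.get();
    }
}
```

No thread ever blocks here; a loser of the race just spins once more, which is why this scales better than a lock under moderate contention.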

More here in DeveloperWorks article on Going Atomic.

I wanted to see how CAS outperforms synchronized code, and the following code was the outcome.

Here we increment two variables with two different tasks.

The first task PlainIncrementTask increments the plain int of the Holder class using the custom-built synchronized getPlain() and incrementPlain() methods.

The second task AtomicIncrementTask increments the AtomicInteger of the Holder class using the AtomicInteger's incrementAndGet() and get() methods.

You can download the code here.

......
public int incrementAtomic() {
    return atomicInteger.incrementAndGet();
}

public int getAtomic() {
    return atomicInteger.get();
}

public synchronized int incrementPlain() {
    return ++plainInteger;
}

public synchronized int getPlain() {
    return plainInteger;
}
.....

And the results (of course) favour AtomicInteger.
mbp $ java Holder
Time taken for PLAIN: 1010 ms
Time taken for ATOMIC: 171 ms
mbp $ java Holder
Time taken for PLAIN: 1045 ms
Time taken for ATOMIC: 169 ms
mbp $ java Holder
Time taken for PLAIN: 1043 ms
Time taken for ATOMIC: 172 ms


So it is usually better to use the Atomic* classes for counters and the like.

And never try to outperform the Java Gurus. ;)