The parallel collector belongs to the family of pure stop-the-world collectors. That means all the mutator (application) threads stop running whenever it collects, and a full collection of the old generation won't kick in until the JVM runs out of memory in that part of the heap.
The concurrent collector (CMS), by contrast, runs mostly concurrently with the mutator (application) threads and tries to free up memory so that they can keep on running. Nevertheless, it also stops the world for two very short periods, called the initial-mark and remark phases.
I am not going to explain the internals of common GC techniques. For that purpose, please read this. But I am going to show you visually how these two collectors differ in terms of application throughput and responsiveness.
In the test program (download here) there are 3 mutator threads that continuously produce strings and put them into a map. Every 2 seconds another thread clears the map, i.e. all the cleared string objects become garbage and can be collected. Both test runs lasted 60 seconds. The tests were carried out on a 16-core machine.
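For readers who cannot fetch the download, a workload of this shape can be sketched roughly as follows. This is not the original test program: the class name `GcChurnSketch`, the `run` method, its parameters and the use of `ConcurrentHashMap` are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the test program described above.
class GcChurnSketch {

    /** Runs the workload and returns the number of strings produced. */
    static long run(int mutators, long runMillis, long clearEveryMillis) {
        final Map<Long, String> map = new ConcurrentHashMap<>();
        final AtomicLong counter = new AtomicLong();
        final long deadline = System.currentTimeMillis() + runMillis;

        // Cleaner thread: dropping every entry turns the stored strings into garbage.
        ScheduledExecutorService cleaner = Executors.newSingleThreadScheduledExecutor();
        cleaner.scheduleAtFixedRate(map::clear, clearEveryMillis, clearEveryMillis,
                TimeUnit.MILLISECONDS);

        // Mutator threads: keep allocating strings and storing them in the map.
        Thread[] threads = new Thread[mutators];
        for (int i = 0; i < mutators; i++) {
            threads[i] = new Thread(() -> {
                while (System.currentTimeMillis() < deadline) {
                    long id = counter.incrementAndGet();
                    map.put(id, "value-" + id);
                }
            });
            threads[i].start();
        }

        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        } finally {
            cleaner.shutdownNow();
        }
        return counter.get();
    }

    public static void main(String[] args) {
        // 3 mutators, 60 second run, map cleared every 2 seconds, as in the text.
        System.out.println("strings produced: " + run(3, 60_000, 2_000));
    }
}
```

The count returned by `run` gives a crude throughput figure that can be compared between the two collector settings, although it will vary from machine to machine.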
During the first run the parallel-old GC was used and the resulting VisualGC output is shown below:
java -XX:+PrintGCTimeStamps -verbose:gc -Xmx2G -Xms2G -XX:+UseParallelOldGC GcComparison
During the second run the CMS collector was used and the resulting VisualGC output is shown below:
java -XX:+PrintGCTimeStamps -verbose:gc -Xmx2G -Xms2G -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=30 GcComparison
Let us look into some graphs.
GC Time (3rd from top):
- Parallel: 32.4s for 59 collections. 55 (young gen) + 4 (old gen).
- CMS: 28.01s for 104 collections. 78 (young gen) + 26 (old gen).
- Even though parallel ran fewer collections than CMS, it spent more total time in GC, and for all of that time the application threads were stopped.
- Note that the light green area is larger for CMS than for parallel. This means CMS was running for more of the time than the parallel collector. Even so, its overall time consumption was lower, because it does much of the collecting concurrently with the application threads.
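If you do not have VisualGC at hand, similar counts and times can be read from inside the JVM through the standard `java.lang.management` API. The sketch below is only an illustration (the class name `GcTotals` is made up), and the figures it reports are the JVM's approximate accumulated values, so they may not match the VisualGC graphs exactly.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: sums collection counts and times over all collectors.
class GcTotals {

    /** Returns {total collection count, total collection time in ms}. */
    static long[] totals() {
        long count = 0;
        long millis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long c = gc.getCollectionCount(); // -1 if the collector does not report it
            long t = gc.getCollectionTime();  // approximate accumulated time in ms, or -1
            System.out.printf("%s: %d collections, %d ms%n", gc.getName(), c, t);
            if (c > 0) {
                count += c;
            }
            if (t > 0) {
                millis += t;
            }
        }
        return new long[] {count, millis};
    }

    public static void main(String[] args) {
        long[] t = totals();
        System.out.printf("total: %d collections, %d ms%n", t[0], t[1]);
    }
}
```

Calling `GcTotals.totals()` at the end of a test run prints one line per collector, typically one for the young generation and one for the old generation.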
Eden Space (4th from top):
- There is not much difference here.
Old Gen (7th from top or 2nd from bottom):
- Parallel: 4.2s for 4 collections, all 4 of them Full GCs, i.e. application threads were stopped for 4.2s.
- CMS: 1.1s for 26 collections. No Full GC, i.e. application threads were stopped for only 1.1s.
- You can also see that the CMS collector works concurrently (gradual rise and fall) whereas parallel stops the world (4 spikes).
- The height of the gradual rise and fall of CMS can be adjusted with the -XX:CMSInitiatingOccupancyFraction option. This option sets the old-generation occupancy (as a percentage) at which the CMS collector should start working.
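To put the old-gen numbers in perspective, divide each total by its number of collections. The small sketch below does only that arithmetic on the figures quoted above; the class and method names are made up for illustration.

```java
// Hypothetical helper: average stop time per old-gen collection, from the figures above.
class PauseMath {

    /** Average time per collection in milliseconds. */
    static double avgMillis(double totalSeconds, int collections) {
        return totalSeconds * 1000.0 / collections;
    }

    public static void main(String[] args) {
        double parallel = avgMillis(4.2, 4); // parallel: 4.2s over 4 Full GCs
        double cms = avgMillis(1.1, 26);     // CMS: 1.1s over 26 old-gen collections
        System.out.printf("parallel: %.0f ms per old-gen collection%n", parallel);
        System.out.printf("cms: %.0f ms per old-gen collection%n", cms);
    }
}
```

That works out to roughly 1.05s per Full GC for parallel against about 42ms per CMS cycle. Since a CMS cycle splits its stop-the-world time between the initial-mark and remark phases, each individual pause would be shorter still.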
Hence it is better to use:
1. the CMS collector when
- you have a high number of CPUs
- your application demands short pauses
- you have more memory
2. the parallel collector when
- you have fewer CPUs
- your application demands throughput and can withstand recurring long pauses
- you have less memory