Friday, May 31, 2013

strace and java - Talend JVM troubleshooting

While you could use jstack, I always find myself preferring my good old friends strace or truss when things get really difficult. Ultimately, system calls reveal far more of what you need. You will also find interesting problems in your code, like unnecessary disk and network access ;-)

Suppose you have a Talend Java job, which is after all a shell script wrapping a java command. Such a job would look like:
jobname/jobname/jobname_run.sh --context_param 'p1=v1' --context_param 'p2=v2'
To trace that command, create a test.sh containing the line above followed by an ampersand (so the job runs in the background), and then the command below:
sudo strace -f -p`/opt/jdk/bin/jps|grep jobname|cut -f1 -d ' '`
So far, all you are doing is making sure your "jobname" java command is traced. The final step is to send the output of that trace to a file:
./test.sh 2>&1 | tee ~/strace-talend-ti.log
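One thing that can bite you with the test.sh described above is timing: jps may not list the JVM yet at the moment strace tries to attach. Below is a minimal sketch of a test.sh that polls for the process first. The jps path and job name come from the commands above; the helper names jvm_pid and wait_for_jvm, the JPS variable and the 30 second limit are assumptions made up for illustration.

```shell
#!/bin/sh
# Sketch of test.sh. The JPS variable, helper names and retry count are
# illustrative assumptions, not part of Talend or the JDK.
JPS="${JPS:-/opt/jdk/bin/jps}"

# Print the PID of the first JVM whose jps line contains the given name.
jvm_pid() {
    "$JPS" | grep "$1" | cut -f1 -d ' ' | head -n 1
}

# Poll once per second until the JVM shows up, or give up after N tries.
wait_for_jvm() {
    name=$1
    tries=${2:-30}
    while [ "$tries" -gt 0 ]; do
        pid=$(jvm_pid "$name")
        if [ -n "$pid" ]; then
            echo "$pid"
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Usage inside test.sh (commented out so the helpers can be tried on their own):
# jobname/jobname/jobname_run.sh --context_param 'p1=v1' --context_param 'p2=v2' &
# pid=$(wait_for_jvm jobname) && sudo strace -f -p "$pid"
```

With -f, strace also follows the threads and child processes of the JVM, which is what you want since the actual file access rarely happens on the main thread.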
As an example, just so you have an idea: the Talend job was not saying much, just that it could not open a file:
Exception in component tFileTouch_2
java.io.IOException: No such file or directory
    at java.io.UnixFileSystem.createFileExclusively(Native Method)
    at java.io.File.createNewFile(File.java:947)
But the name of the file was missing. Of course inspecting the source code could help you get an idea, but what about business rules? The file name might be a complicated string that is hard to determine without real-time debugging. Now you start thinking: I just need to add a trace to my Talend job so that it logs the exact file name. However, this might be production, and your release cycle may not allow any quick traces right now without compromising other deliverables. Tracing at the system level will give you that information:
[pid 20946] open("/path/to/tmp.file", O_RDWR|O_CREAT|O_EXCL, 0666) = -1 ENOENT (No such file or directory)
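The trace of a JVM gets long quickly, so you will probably want to filter the log rather than scroll through it. Here is a minimal sketch that pulls out the file opens that failed with ENOENT, assuming the log was written to ~/strace-talend-ti.log as above. The function name find_enoent is made up for illustration, and on newer systems the call may show up as openat instead of open, so the pattern accepts both.

```shell
# find_enoent LOGFILE: print open/openat calls that failed with ENOENT.
find_enoent() {
    grep -E '(open|openat)\(.*= -1 ENOENT' "$1"
}

# Usage:
# find_enoent ~/strace-talend-ti.log
```

Expect some noise: a JVM probes for plenty of optional files at startup, so not every ENOENT line is a problem. The one you want is usually close to the end of the log, right before the exception is reported.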
