Saturday, May 31, 2014

Continuous delivery needs faster server startup. Could #Tomcat #Spring applications cope with that?

I am reluctant to accept the myth that Java web applications don't fit well in agile environments. The main criticism is that unless you use a commercial tool, a plugin architecture or an OSGi modularized app, you will end up having to restart the server in order to deploy the latest application version.

But what if the application actually took just a few seconds to load? Would a user be able to tell a 10-second delay caused by slow database access or a backend web service request apart from one caused by a server restart? The answer is no: a user experiencing a delay does not really care about its nature. If we maintain a proper SLA this will simply be "a hiccup".

Even the fastest web services out there are slow for certain random actions. You will never know if Gmail is taking longer because of a quick server restart, for example. As long as you can get what you need, a wait of a few seconds won't matter.

If the definition of "Done" is "Deployed to production", developers will take more care of deployment performance. Waiting a long time for a deployment means disruption, and the business will never approve minutes of downtime. On the other hand, if you increase your WIP limits you will slow down and quality will suffer. This bottleneck becomes, as expected, a great opportunity for improvement. You have reached a Kaizen moment.

There is a need to work on deployment performance. Without addressing that important issue you will constantly delay deployments and tasks will keep piling up; the consequences will be terrible and will only be discovered if you are actually visualizing the value stream. You need to tune your server and your application so they load faster. A restart should be synonymous with a snap.

In a typical Spring application you proceed as with any other application: logs are your friends. Go ahead and turn on the debug level, search for "initialization completed" and confirm how much time this process takes. In production you are better off using lazy initialization:
<beans ... default-lazy-init="true">
This of course contributes to the overall "Server startup" time. But there is more to do. Check this out.

Simple log inspection should make evident what the culprit for a slow server startup is. Let us review the example below:



The first clear issue is that "Initializing Spring root WebApplicationContext" takes 36 seconds, which is almost half of the time the whole server takes to start up. The second issue is that "Initializing Spring FrameworkServlet" takes 14 seconds, a further 10% of the whole server startup time. Spring tuning is needed in this example.
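If you want to pull just those timing lines out of the logs, a quick grep does it; a minimal sketch, assuming the Tomcat location used elsewhere in this post:
# Extract the Spring context and Tomcat startup timings (log path is an assumption)
grep -E "initialization completed|Server startup in" /opt/tomcat/logs/catalina.out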

What about the other 40% of the time? Servers can also be tuned, and for Tomcat there is a lot we can do. For example, as explained in the link, if you find an entry for "SecureRandom" in catalina.out, most likely your server is spending valuable seconds generating random numbers to use as session IDs. The setting below saves you those seconds, as explained in the link:
-Djava.security.egd=file:/dev/./urandom
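One place to set it, assuming a standard Tomcat layout, is bin/setenv.sh:
# bin/setenv.sh (create it if it does not exist); CATALINA_OPTS is the standard hook for extra JVM flags
export CATALINA_OPTS="$CATALINA_OPTS -Djava.security.egd=file:/dev/./urandom"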
I found myself saving ten seconds by adding the attribute and node shown below. The explanation again can be found in the provided link:
...
<web-app ... metadata-complete="true">
    <absolute-ordering />
...
Eliminating unnecessary jars demands listing them all first. Note that I am sorting them on purpose, just in case we are using one in the app that is already included in the container, or we are using two versions of the same jar (which should be impossible if you are using a plugin to check for duplicates anyway):
find /opt/tomcat/ -name "*.jar"|sed 's#.*/##'|sort
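To spot the duplicates more directly you can strip the version suffix first; a rough sketch, assuming the usual artifact-version.jar naming:
# List base jar names that appear more than once (the sed pattern is an assumption about naming)
find /opt/tomcat/ -name "*.jar"|sed 's#.*/##; s#-[0-9].*##'|sort|uniq -d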
Then find which classes are inside those jars you are unsure whether you need or not. Now you do need the whole path, so let us pick as an example jtidy-r938.jar, which Hibernate includes as a dependency. Here are the relevant commands, which you will need to adapt to your machine paths:
find /opt/tomcat/ -name "jtidy*.jar"
jar -tvf /opt/tomcat/myapp/ROOT/WEB-INF/lib/jtidy-r8-20060801.jar
find /home/dev/workspace/ -name "*.java"|xargs grep "org.w3c.tidy"
In my case, after saving 9MB worth of jar files, I saw no startup time savings for the specific project I picked for this research.

The special attribute startStopThreads in the Engine element of server.xml should have no effect if you are running only one application; however, I believe I saved some seconds after I turned it on:
<Engine ... startStopThreads="0">

Friday, May 30, 2014

Apache mod-proxy should allow for a retry policy before sending back the response to the client

Apache mod-proxy should allow for a retry policy before sending back the response to the client. There is a failonstatus setting, whose definition reads:
failonstatus - A single or comma-separated list of HTTP status codes. If set this will force the worker into error state when the backend returns any status code in the list. Worker recovery behaves the same as other worker errors. Available with Apache HTTP Server 2.2.17 and later.
However, as soon as the backend returns one of those status codes for the first time, the proxy sends it straight back to the client. This behavior should be configurable with, for example, a SilentOnStatus setting that works just like FailOnStatus but prevents the error from being sent back to the client.
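For reference, this is how failonstatus is wired today; a sketch with placeholder hostnames and paths:
# Mark the worker as in error when the backend returns 500 or 503 (backend name is a placeholder)
<Proxy balancer://mycluster>
    BalancerMember http://backend.example.com:8080 failonstatus=500,503 retry=10
</Proxy>
ProxyPass        /app balancer://mycluster/app
ProxyPassReverse /app balancer://mycluster/app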

As it stands, our only recourse is to create an ErrorDocument and include some logic to automatically retry while telling the user that a recovery from the error is coming soon; for example, you could redirect to the domain root after five seconds with a Meta Refresh. This feature is needed to make sure users do not get an error message when an application server is restarting and therefore unavailable (500), or when it is up but the application has not been loaded yet (503).
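A minimal sketch of that workaround, with placeholder paths and a five-second refresh back to the domain root:
# httpd.conf: serve our own page for backend 500/503 responses
ProxyErrorOverride On
ErrorDocument 500 /errors/retry.html
ErrorDocument 503 /errors/retry.html
<!-- /errors/retry.html -->
<html>
  <head><meta http-equiv="refresh" content="5; url=/"></head>
  <body>We are finishing a deployment, you will be redirected in a few seconds...</body>
</html>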

Thursday, May 22, 2014

Mod proxy suddenly failing with 500 / 502 errors because of self-signed expired certificates

Apache was returning 500. From the logs it was not obvious why: the OpenSSL validation of the self-signed certificate will not state the classical "Verify return code: 10 (certificate has expired)" even when the certificate is indeed expired. That is why you are better off checking for expiration directly.
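A couple of ways to do that with openssl; host, port and file paths are placeholders:
# Against the live endpoint
echo | openssl s_client -connect backend.example.com:8443 2>/dev/null | openssl x509 -noout -dates
# Against the certificate file itself
openssl x509 -noout -enddate -in /etc/httpd/ssl/backend.crt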

Thursday, May 15, 2014

Talend Open Source needs Dynamic Schema for Delimited files

Talend Open Source needs Dynamic Schema for Delimited files. Only the commercial version allows dynamic schema.

We need to build a component called tFileInputDelimitedExtract. You could use tFileInputCSVFilter as a reference for the implementation. Unfortunately I don't have time at the moment for this implementation, but let me at least enunciate the specifications for it in case someone decides to take it further; it could be a good project for someone willing to learn Talend component creation, for example. In the meantime, a quick "hack" for new, unexpected inner columns is to use 'cut' to exclude them.
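For example, a minimal sketch that removes the 7th, unneeded column from a pipe-delimited file (file names are placeholders):
# Keep fields 1-6 and 8 onwards, dropping the 7th column
cut -d'|' -f1-6,8- input.psv > output.psv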

Wednesday, May 14, 2014

Where did your Java architecture go?

Where did your Java architecture go? Classycle might have good answers for you. It is easier to configure than JDepend, which was previously the de facto open source cycle analyzer and which you might still want to check out.

Here is how to analyze the Spring core jar, for example. Even though the example below uses the plain command line, Eclipse and Maven plugins are available at the moment, so you might want to check those out. Especially, you should build ddf files to enforce your architectural layers and make the build fail if they are violated.
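A hypothetical sketch of the command line usage; jar names, paths and option names are assumptions, so check the docs of your Classycle version:
# Analyze a jar and write an XML report
java -cp classycle.jar classycle.Analyser -xmlFile=spring-core-report.xml spring-core-3.2.9.RELEASE.jar
# Check layering rules described in a ddf file and fail on violations
java -cp classycle.jar classycle.dependency.DependencyChecker -dependencies=@layers.ddf target/classes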

Subversion anonymous access for just one directory

In the Apache configuration file, inside the "Location" directive, use "Satisfy" before "Require". Note that you might have a second "Require" directive below a "LimitExcept"; make sure you *also* use "Satisfy" there.
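A sketch following that description, with placeholder repository paths and an Apache 2.2 style configuration:
<Location /svn/repo/sandbox>
    # Anonymous access for this directory only
    Satisfy Any
    Allow from all
    Require valid-user
    <LimitExcept GET PROPFIND OPTIONS REPORT>
        # The Satisfy directive has to be repeated here as well
        Satisfy Any
        Allow from all
        Require valid-user
    </LimitExcept>
</Location>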

Friday, May 09, 2014

Is TDD dead?

Is TDD dead? The question drove today's hangout between Kent Beck, Martin Fowler and David Heinemeier Hansson.

David challenges the TDD supporters, stating that doing TDD feels unnatural and unenjoyable most of the time. While Kent and Martin agree that TDD is not the perfect solution to all problems, they argue it has brought tremendous value to many projects they have had in their hands.

Probably Test Driven Development is not a good technique for all projects; however, Test Driven Delivery is. I mean you would never come up with "continuous delivery with high quality" if you did not think up front about how you will test the feature you are about to ship.

Have you ever wondered why so many development teams state exactly the same thing, "Business does not know what they want"? Probably if the business thought about how to test their idea once it is implemented, they would not ask for unnecessary features or forget about important ones.

Have you ever wondered why the defect ratio makes it impossible for the team to deliver a feature in less than a week? Perhaps if not only user stories but clear acceptance test criteria (and test cases derived from them) had been provided, the developer would have automated them, because of course the developer is lazy and will not spend time testing manually in two, three or four different environments.

I would say Test Driven Delivery is very much alive. Is enforcing Test Driven Development a bad idea? Probably yes, if it is an imposition and not a necessity. Velocity and enjoyment cannot be increased at the expense of business value creation.

"In what ways can a desire to TDD damage an architecture?" is the question Martin proposed to be answered. Show us the numbers for a conclusive answer.

There is definitely a path from a few features delivered once a month to delivering new features at an increasingly quick pace, all the way to multiple daily deployments. That cannot be achieved without the confidence Kent is advocating for.

Make sure issues come with test cases up front, that ideas come with acceptance criteria up front, and that the tests run before the feature is considered delivered.

If the business complains about too much time being spent on testing, then keep a backlog of all the acceptance criteria test cases that can be followed manually but have not yet been automated, and measure the defect ratio AND the cycle time to deliver features (not bug resolutions). Switch back to providing the tests and measure again. The numbers should show that over a period of three months the team is able to deliver more features when providing test automation. Ultimately it will demonstrate that having test procedures documented is the very first step to delivering beautiful software that just works as intended, with no more and no less than what is actually required to make money. IMO quality is the most important non-functional requirement.

Thursday, May 08, 2014

NFS extremely slow in VMWare Solaris guest

I had to investigate an issue related to slow NFS writes from a VMWare Solaris VM.

To debug protocol issues you of course need a TCP packet sniffer, so I started with the following test from Solaris: create a 5MB file and transfer it via NFS. The file was taking two minutes to be transferred, and the capture in /tmp/capture uncovered a lot of DUP ACKs. From a Linux box we then ran something similar, and there the write went fast and with no DUP ACKs. After we shipped the issue to Infrastructure they found the culprit to be the usage of a conflictive network adapter in VMware. Using the vmxnet3 network adapter looks to be the right option when it comes to supporting NFS traffic: no more DUP ACKs.
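A sketch of the kind of test involved; interface names, the NFS server name and the mount points are placeholders:
# Solaris guest: create a 5MB test file and capture the NFS traffic while copying it
mkfile 5m /tmp/testfile
snoop -o /tmp/capture -d e1000g0 host nfsserver &
time cp /tmp/testfile /mnt/nfs/
# Inspect the capture afterwards, e.g. snoop -i /tmp/capture, or open it in Wireshark
# Linux box: the equivalent test with dd and tcpdump
dd if=/dev/zero of=/tmp/testfile bs=1M count=5
tcpdump -i eth0 -w /tmp/capture.pcap host nfsserver &
time cp /tmp/testfile /mnt/nfs/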

Who should define Usability? Ask your users

Who should define Usability? Ask your users. They know better than anybody else.

Even when you are having an argument about a backend implementation for a feature, think about the final impact it will have on the UI and UX, and if in doubt simply ask your users; their responses will drive you to the best and simplest approach. Always know *why* you are doing what you are doing.

Administrators should not be able to login from the wild for security reasons

Administrators should not be able to log in from the wild, for security reasons. This is something Unix, and later Linux, got right from the start: if you want to become a superuser or administrator you need to do so after you have gained access to the target system. You still see people doing all kinds of stuff to overcome this "limitation". Don't do it!

Nowadays everything needs to be accessible from everywhere; JSON services feed web applications and native mobile applications, and the trend will continue with the Internet of Things (IoT), wearables, you name it. But we cannot forget about the basics: an application administrator should not have access to the application from the wild. In fact, several other roles are better restricted to accessing the application only from internal networks. Exposing too much power publicly (even if strong authentication and authorization mechanisms are used) is a vulnerability we can avoid if we are willing to sacrifice usability for privileged accounts.

The administrator does not need the same level of usability as the rest of the users, and higher privileged accounts might not need it either. Be wise about IP authorization.

Disclosure of ID data or Predictable ID format vulnerabilities

Disclosure of ID data and predictable ID format vulnerabilities are considered low risk. In fact, you can search around and you will not find much about them. For example, most folks will highlight the advantages of using UUIDs versus numbered IDs when it comes to load balancing, but few will acknowledge the security issue behind the usage of predictable numbers.

Don't be misled by risk classifications: these vulnerabilities can be serious and could cost companies their mere existence.

I hear statements like "well, if the account is compromised then there is nothing we can do". Actually there is a lot we can do to protect the application against stolen authentication. Two-factor authentication is one of them; it is often associated with just the authentication phase, but it can also be used as added authorization protection. Sometimes it is just a matter of compromising on usability.

Disclosure of ID data is about listing views. A list should never provide sensitive information; if you want to access such information you should go an extra step and select the entity first, so that it is shown in the details page only. However, that alone offers little protection: the IDs are still in the list view, and from those each detail view can be retrieved. Avoiding listing pages that lead to sensitive information sounds like the only possible defense, but it is still a difficult one to sell. IMO listing pages should exist only for those who already know what they are retrieving; for example, records should be returned only when keywords like names, addresses or known identifiers are provided.

Predictable ID format is about detail and form views. These types of views demand the use of an ID, and if that ID is predictable, like an incremental number, then someone can easily pull the sensitive data for all IDs. If your current model uses sequential or otherwise guessable numeric IDs, or even symmetrically encrypted IDs, you should consider mapping them to a randomly generated stored value. You could achieve this by generating a random UUID per real ID and storing it in a mapping table; you then expose only the UUID to the user while still persisting the real ID.
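A hypothetical sketch of that mapping in Java; in a real application the two maps would be backed by a database table:
// Map internal numeric IDs to random UUIDs before exposing them to users
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class PublicIdMapper {
    private final Map<Long, String> idToUuid = new ConcurrentHashMap<>();
    private final Map<String, Long> uuidToId = new ConcurrentHashMap<>();

    // Return the public UUID for an internal ID, creating it on first use
    public String publicIdFor(long internalId) {
        return idToUuid.computeIfAbsent(internalId, id -> {
            String uuid = UUID.randomUUID().toString();
            uuidToId.put(uuid, id);
            return uuid;
        });
    }

    // Resolve a public UUID back to the internal ID, or null if unknown
    public Long internalIdFor(String publicId) {
        return uuidToId.get(publicId);
    }
}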

Defense is to be practiced in depth. Even if an account is compromised, you can still prevent a list of sensitive information across your client base from being accessible from the wild.

Monday, May 05, 2014

Using snoop or tcpdump for NFS troubleshooting

Create a test file in Linux or Solaris
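A minimal sketch of the usual commands on each platform:
# Solaris: create a 5MB test file
mkfile 5m /tmp/testfile
# Linux: the dd equivalent
dd if=/dev/zero of=/tmp/testfile bs=1M count=5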

Solaris killall command kills all active processes rather than killing all processes by name

Solaris killall command kills all active processes rather than killing all processes by name. This is confusing for those more used to Linux: the Solaris man page says killall will "kill all active processes", while on Linux you read "kill processes by name". Use something like the below in Solaris instead.
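A minimal sketch, with a placeholder process name:
# Kill processes by name on Solaris
pkill myprocess
# Or preview which processes would match first
pgrep -l myprocess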

On Defense in Depth: Web Application Security starts at checking the source IP

On Defense in Depth: web application security starts at checking the source IP. Even if you have firewalls and proxies in front of your application server, as a developer you must have access to the user's original IP, and the URLs managing private information must be restricted to intranet use.

Let us say, for example, that you have prepared a report listing users with some sensitive information (like their real home addresses) for certain high privileged roles only. Let us suppose this has been exposed not only in your intranet web front end but also in the one facing the outside. Right there your issues start: the user can now access this information from outside the company premises, which means it will become public knowledge if the user's session is compromised.

However if you have designed your middle tier to check for the source IP the user won't be able to access the service from outside even if the functionality leaked for whatever reason.

It is then crucial that all HTTP endpoints related to sensitive information are identified; those should not allow external IPs. It is also crucial to inspect your logs and confirm that you are getting the real IPs of the users hitting your system.
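A sketch of such a restriction at the web server level; the endpoint path and the internal network range are placeholders:
# Apache 2.2 style: only the internal network can reach the sensitive endpoint
<Location /reports/users>
    Order deny,allow
    Deny from all
    Allow from 10.0.0.0/8
</Location>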

Use Defense in Depth concepts when building applications.

Friday, May 02, 2014

Recreating accidentally deleted vfstab in Solaris

So you have accidentally deleted vfstab in Solaris? Look into /etc/mnttab: you can recreate /etc/vfstab out of it, but you will need some understanding of the different fields, or you can always look at a similar machine for guidance. Once the entries are back in /etc/vfstab, just run 'mount -a' to verify everything mounts correctly. Good luck!
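For reference, a sketch of the vfstab layout with an illustrative ufs entry; device names and mount points are placeholders:
# Fields: device to mount, device to fsck, mount point, FS type, fsck pass, mount at boot, options
/dev/dsk/c0t0d0s7   /dev/rdsk/c0t0d0s7   /export/home   ufs   2   yes   -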

Install sudo in Solaris

Make sure you have the OpenCSW pkgutil installed; installing sudo is then easy.
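A sketch following the OpenCSW bootstrap instructions of the time:
# Bootstrap pkgutil from OpenCSW, refresh the catalog, then install sudo
pkgadd -d http://get.opencsw.org/now
/opt/csw/bin/pkgutil -U
/opt/csw/bin/pkgutil -y -i sudo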

List all hidden files in a directory
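A couple of simple ways to do it from the shell:
# List only the hidden (dot) entries of the current directory
ls -A | grep '^\.'
# Or with full details (misses single-character names such as .a)
ls -ld .??* 2>/dev/null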

Thursday, May 01, 2014

Use PDF Bash Tools when your BI tooling like Talend is not enough

Use PDF Bash Tools for quick PDF manipulation from the command line. Ghostscript and xpdf, both open source, are a great combination for getting the most difficult PDF transformations done.

If your BI framework / tooling does not have good solutions for processing PDF files (as is the case with Talend), then you can leverage your old friend the shell, and bash in particular. Simple and effective.
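A couple of examples of the kind of transformation these tools make easy; file names and page ranges are placeholders:
# Extract plain text from a PDF, preserving the layout (xpdf's pdftotext)
pdftotext -layout report.pdf report.txt
# Extract pages 2-4 into a new PDF with Ghostscript
gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dFirstPage=2 -dLastPage=4 -sOutputFile=pages2-4.pdf report.pdf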
