Saturday, June 25, 2011

Java EE application servers - learning from the past

(Edit 15 July 2011: JBoss AS 7 is here, and brings a huge improvement to class loading and memory management. It's not full isolation, but it limits the exposed contact surface between app server and app greatly, and massively improves class loading. Brilliant!)

If you've used a Java EE application server like Glassfish or JBoss AS for long, you will have experienced classloader leaks, though you may not have realized it.

If you've ever seen the error java.lang.OutOfMemoryError: PermGen space at deploy-time, read the linked article above. If you have ever worked on the JVM, on app servers, or on EE applications, please read on.

Even if you haven't hit classloader leaks, you should be aware of the various ways Java EE applications can cause memory leaks in the server and what to do about them.

For those coming here for help fixing an immediate issue with their app: Read the links above, and this article on using jhat's JavaScript interface to find likely leaks. More fancy JavaScript OQL tricks are here.

Background: classloader leaks

Classloader leaks are one of several symptoms of a design flaw in all current application servers: the memory space of the server is not clearly separated from that of the applications running on it. The Java Security Manager (if enabled) prevents apps from casually reaching into the appserver's memory to patch and tweak things. Unfortunately, it does nothing to help the application server free all the memory used by an application when it is undeployed (killed and uninstalled).

Any class in the application server itself, the core Java libraries, or additional libraries deployed to the app server for use by all applications may cause a classloader leak. All it has to do is hold a regular reference (rather than a WeakReference) to an instance of an application-defined class in an object that isn't destroyed when the app is undeployed. The app-defined class in turn holds a reference to the classloader that loaded it. That reference keeps the app's old classloader alive, preventing that memory from being freed when the app is undeployed.
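To make this concrete, here is a minimal sketch of the pattern. SharedRegistry and its method are invented names, standing in for any class loaded by the server's (or the JDK's) classloader:

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical class living in the server's (or JDK's) classloader.
    public class SharedRegistry {
        // Static, strong-reference collection that outlives any one app.
        private static final List<Object> LISTENERS = new ArrayList<Object>();

        // An application registers an instance of one of its own classes.
        public static void register(Object listener) {
            LISTENERS.add(listener);
        }

        // If the app never deregisters, this chain survives undeploy:
        //   SharedRegistry -> listener -> listener.getClass()
        //     -> app classloader -> every class the app loaded
    }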

Since the Java libraries were not designed with application servers in mind, they're full of places where exactly that can happen. The article linked to in the intro shows one such case in java.util.logging.
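As a sketch of how the java.util.logging case can arise: the root Logger is reachable from the JVM-wide, statically held LogManager and holds its handlers strongly, so an application-defined Handler added to it (AppLogHandler below is an invented name) pins the app's classloader unless the app removes the handler on undeploy, which rarely happens:

    import java.util.logging.Handler;
    import java.util.logging.LogRecord;
    import java.util.logging.Logger;

    // Hypothetical handler class packaged inside the application.
    public class AppLogHandler extends Handler {
        @Override public void publish(LogRecord record) { /* ... */ }
        @Override public void flush() { }
        @Override public void close() { }

        public static void install() {
            // The root logger lives in JDK space and holds its handlers
            // strongly, so this reference outlives the app's undeploy.
            Logger.getLogger("").addHandler(new AppLogHandler());
        }
    }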

One could try to clean up all places in the Java libraries where references to user-defined code may be held, but it's going to be a losing battle. Worse, it won't help protect the app server from libraries like JDBC drivers that typically get deployed to the server for the use of all apps.

Even if you did that, you still wouldn't have solved the problem. Classloader leaks are not the only possible kind of leak; they're just one of the most common and troublesome. Classes and classloaders live in the JVM's PermGen space, which is generally of fairly restricted size. This tends to make the leaks more obvious, because the app server starts throwing OutOfMemoryError when an app is redeployed.

Another big problem area is libraries that use ThreadLocal storage for caches. It seems like a great idea, as using a ThreadLocal means you don't have locking overheads and delays. Unfortunately, in an application server environment, things like HTTP threads are pooled between all applications on the server. If you have 200 HTTP threads and one of your applications likes to keep a 5MB ThreadLocal cache, then even if that app isn't used very often, you'll eventually end up with around 1GB of memory consumed by those caches!
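A sketch of the pattern, with invented names; each pooled worker thread that ever touches this code acquires its own map, which lives as long as the thread does:

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical library class using a per-thread cache.
    public class PerThreadCache {
        private static final ThreadLocal<Map<String, byte[]>> CACHE =
                new ThreadLocal<Map<String, byte[]>>() {
                    @Override
                    protected Map<String, byte[]> initialValue() {
                        return new HashMap<String, byte[]>();
                    }
                };

        public static void put(String key, byte[] value) {
            // The map lives as long as the pooled worker thread does,
            // which is usually the lifetime of the server, not the app.
            CACHE.get().put(key, value);
        }

        public static byte[] get(String key) {
            return CACHE.get().get(key);
        }

        // Lets the application discard its own per-thread state.
        public static void clear() {
            CACHE.remove();
        }
    }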

My understanding is that it's hard for the application server to clean up such caches: it can't tell which application owns which ThreadLocal data, so it doesn't know what to get rid of at undeploy time.
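One partial mitigation is for the application to clear its own ThreadLocals before handing the thread back to the pool, for example in a servlet filter. A sketch, reusing the hypothetical PerThreadCache from above:

    import java.io.IOException;
    import javax.servlet.Filter;
    import javax.servlet.FilterChain;
    import javax.servlet.FilterConfig;
    import javax.servlet.ServletException;
    import javax.servlet.ServletRequest;
    import javax.servlet.ServletResponse;

    // Clears this app's per-thread state when each request completes.
    public class ThreadLocalCleanupFilter implements Filter {
        public void init(FilterConfig config) { }

        public void doFilter(ServletRequest req, ServletResponse resp,
                             FilterChain chain)
                throws IOException, ServletException {
            try {
                chain.doFilter(req, resp);
            } finally {
                // The pooled thread carries nothing from this app
                // back into the pool.
                PerThreadCache.clear();
            }
        }

        public void destroy() { }
    }

Of course, this only covers the app's own ThreadLocals; it does nothing about third-party libraries the app can't modify, which is exactly the server's problem.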

Lessons from the past

Rather than trying to manually patch up every leaked reference and fix every library that uses a ThreadLocal cache, why not learn from operating system design?

Old operating systems used to share the same memory space between the OS "kernel" (such as it was) and the application(s). Apps could freely reach into kernel space using a simple pointer, and a mistake in pointer arithmetic or any of numerous other errors could cause the app to trample all over OS memory. Perhaps more importantly, the app had to carefully release any resources it acquired from the OS, otherwise the OS wouldn't regain the use of them until it was rebooted.

Modern systems protect themselves from processes running on them by isolating each process in a separate memory segment. The process cannot access the kernel's memory or that of other processes directly. Access to other processes is done via the kernel, and access to the kernel is done via a very restricted interface (usually a trap). Not only does this have huge advantages for security and protection against accidental memory corruption, but it means the OS can free all the memory used by a process simply by dropping the memory segment(s) it allocated for that process. It doesn't have to care about individually cleaning up each little string, struct and class allocated by the process, and it doesn't care if the application has memory leaks: they're confined to the application's process space, so the leaked memory is recovered along with everything else.

Applying the lessons to EE app servers

How does this apply to Java EE application servers? Two ways. First: cleaning up guest apps in detail and trusting them to behave properly is a losing battle; you need a way to brutally sweep away the remnants when you terminate an app. Second: modern OSes already provide the required tools by running JVMs in protected memory spaces.

Future Java application servers should run each application in its own JVM, using efficient IPC mechanisms for server/client communication. References to appserver-provided objects held by the client would be proxies that use the IPC mechanism to do their work. Because no IPC mechanism is perfectly efficient, some of the application server code would probably have to run as a library within the application's JVM. For example, CDI/Weld would need to run in the app's space, as would any JDBC drivers, Facelets libraries, JAX-RS implementations, etc.
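To illustrate the proxy idea, here is a deliberately naive sketch using java.lang.reflect.Proxy with Java serialization over a plain socket. Every name here is invented, and the call arguments are assumed to be Serializable; a real app server would need pooled connections, proper error handling, and a far leaner wire format.

    import java.io.ObjectInputStream;
    import java.io.ObjectOutputStream;
    import java.lang.reflect.InvocationHandler;
    import java.lang.reflect.Method;
    import java.lang.reflect.Proxy;
    import java.net.Socket;

    // Hands out client-side proxies whose calls cross to the server JVM.
    public class IpcProxyFactory {
        @SuppressWarnings("unchecked")
        public static <T> T proxyFor(final Class<T> iface,
                                     final String host, final int port) {
            return (T) Proxy.newProxyInstance(
                    iface.getClassLoader(),
                    new Class<?>[] { iface },
                    new InvocationHandler() {
                        public Object invoke(Object proxy, Method method,
                                             Object[] args) throws Exception {
                            // One connection per call: simple, but slow.
                            Socket socket = new Socket(host, port);
                            try {
                                ObjectOutputStream out = new ObjectOutputStream(
                                        socket.getOutputStream());
                                out.writeObject(method.getName());
                                out.writeObject(args);
                                out.flush();
                                ObjectInputStream in = new ObjectInputStream(
                                        socket.getInputStream());
                                return in.readObject();
                            } finally {
                                socket.close();
                            }
                        }
                    });
        }
    }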

To keep things reasonably efficient, the app server kernel would need to be able to pass sockets (HTTP connections, etc.) to client JVMs, avoiding the overhead of copying HTTP replies via the master instance; that's possible in most, if not all, OSes.

By clearly defining where the application server core ends and the application begins, it'd be possible to finally fix classloader leaks and all the other issues of the shared process space once and for all.

I'm well aware that this would have performance consequences, and that lots of work would be required to restructure app servers to run this way. It's highly likely that core JVM changes would be required too. Nonetheless, if Java EE app servers are ever to become as dependable as the OSes they run on, it's going to have to happen.

Right now, the common workaround is to run each application in its own application server domain, on its own JVM, listening on its own ports, and so on. When an app is re-deployed, its app server domain is restarted to free leaked classes. A front-end server is usually employed to redirect HTTP requests to the appropriate app server(s). This model is even less efficient than the one I describe above, because it prevents app server instances from sharing anything.
