Thursday, July 22, 2010

JPA (and Hibernate/EclipseLink/OpenJPA/etc) in desktop apps is awful

Using the JPA API in desktop apps seems like a great idea. It provides a standard interface to Java ORM systems, letting you use annotations to map database entities to beans in a quick and convenient manner. Implementations may be plugged in as required, so you can switch from Hibernate to EclipseLink under the hood with little pain and fuss.

You'll find lots of examples and tutorials on the 'net that demonstrate how to use JPA to build simple "CRUD" apps with various frameworks, like the NetBeans platform, the (now defunct) Swing Application Framework, etc. They make it look like a quick, convenient and easy way to develop database access apps in Java.

There's just one wee problem... they don't work in the real world. With desktop apps, that is; I'm sure Hibernate and friends are great on the server in a 3-tier Enterprise Application with a suitable army of coders.

JPA 2.0 relies on lazy initialization of beans to be efficient. You don't want to download all the data you have on hand about 10,000 customers in your database just to display their names in a tree view. You want to retrieve just their names, lazily fetching most other attributes. Later, when the user goes to (say) edit the customer, you can just fetch the lazy attributes on first use. That's how JPA works, and it seems like a great approach that saves a lot of hassle.

Theoretically, anyway. JPA lazy fetching only works within the context of an open EntityManager session. Entity beans retrieved by an EntityManager typically have a much longer lifetime than the EntityManager itself, i.e. they become detached from the database session. Unfortunately, detached entities can't load lazy properties: any attempt to access a lazy property that hasn't been loaded yet throws an exception.
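To make that failure mode concrete, here's a minimal sketch in plain Java. It doesn't use JPA at all: FakeSession, Customer and the "notes" attribute are made-up stand-ins, and a real provider throws its own exception type (Hibernate's is LazyInitializationException) rather than IllegalStateException. The shape of the problem is the same, though.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for a persistence session; not a real JPA class. */
class FakeSession {
    private boolean open = true;
    boolean isOpen() { return open; }
    void close() { open = false; }
}

/** Hypothetical entity with one eager attribute (name) and one lazy attribute (notes). */
class Customer {
    private final String name;                  // eager: loaded with the entity
    private final FakeSession session;
    private final Supplier<String> notesLoader; // stands in for a database round trip
    private String notes;                       // lazy: loaded on first access
    private boolean notesLoaded;

    Customer(String name, FakeSession session, Supplier<String> notesLoader) {
        this.name = name;
        this.session = session;
        this.notesLoader = notesLoader;
    }

    String getName() { return name; }

    String getNotes() {
        if (!notesLoaded) {
            // A lazy load needs a live session; a detached entity has none.
            if (!session.isOpen()) {
                throw new IllegalStateException("lazy attribute 'notes' accessed after session closed");
            }
            notes = notesLoader.get();
            notesLoaded = true;
        }
        return notes;
    }
}

class DetachedDemo {
    public static void main(String[] args) {
        FakeSession session = new FakeSession();
        Customer loadedEarly = new Customer("Alice", session, () -> "prefers email");
        Customer neverTouched = new Customer("Bob", session, () -> "prefers phone");

        loadedEarly.getNotes(); // triggers the lazy load while the session is open
        session.close();        // both entities are now "detached"

        System.out.println(loadedEarly.getNotes());  // already loaded, still readable
        System.out.println(neverTouched.getName());  // eager attribute, still readable
        try {
            neverTouched.getNotes();                 // never loaded, session gone
        } catch (IllegalStateException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

Note that whether a given getter works depends on what happened to be touched before the session closed, which is exactly what makes this so unpleasant to debug.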

It might seem like the sensible thing to do is just keep the EntityManager around, so the entity beans remain attached to the session and can lazily load properties whenever required. Unfortunately this isn't very practical. For one thing, you'll often be using entity beans with APIs (like Swing models) that don't know they're entity beans and have no awareness of their connection to the database. Making this possible is half the point of JPA, but it gives you no way to link the lifespan of the EntityManager to that of the beans it manages, except with the ugly hack of inserting a reference to the entity manager into each managed bean. Even then you can't guarantee that the EntityManager gets closed when the last bean that uses it is disposed of, because on-finalize actions in Java are unreliable. Letting the entity manager be quietly gc'd is unacceptable for the same reason: there's no guarantee it'll properly release its non-memory resources, such as a connection it has checked out of the pool. If you don't explicitly close the entity manager, you can leak connections from your pool.

Worse than that, though, is that even if you keep the entity manager around to handle lazy property loading and you've somehow solved the lifespan/scope issues, lazy loading can cause a thread to block waiting for database/network access whenever you access a bean property that happens to be mapped as lazy. You have no control over what thread the database access is done in, and no way to let a thread do other things while lazy properties are loaded. There's no way to respond to a call to "getCustomerName()" with "Er, try again later, I'm finding that out for you now". This is awful for GUI applications, because it means that the UI thread will block while the database is accessed, without the ability to even display a busy cursor. What if the user's wifi drops, or they're on cellular? Your app just "crashed" as far as they're concerned.

Working around the blocking issue removes most of the benefits of using JPA. You have to do all your database work in a separate database worker thread (or set of threads). Remember Swing 101: don't block the EDT. Because you can't control blocking when lazy loading is used, you have to detach all entities before letting them escape the context of the database worker thread. This means that either you can't use lazy properties (so you waste a LOT of network bandwidth and loading time retrieving things you don't need), or your non-database components need to know how to handle detached entities with lazy properties.
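For what it's worth, the worker-thread arrangement looks roughly like the sketch below. DbWorker and its load method are names invented for illustration, and it uses CompletableFuture from newer JDKs (SwingWorker is the older tool for the same job). The loader stands in for whatever JPA or JDBC call fetches the data, and it has to return something fully loaded, since nothing can be lazily fetched once the result reaches the UI thread.

```java
import java.util.Arrays;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical helper: runs loaders on one database thread, delivers results via a UI executor. */
class DbWorker implements AutoCloseable {
    private final ExecutorService dbThread = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "db-worker");
        t.setDaemon(true); // don't keep the JVM alive just for the worker
        return t;
    });
    private final Executor uiExecutor;

    DbWorker(Executor uiExecutor) {
        this.uiExecutor = uiExecutor;
    }

    /**
     * Runs loader on the database thread, then hands its result to onLoaded
     * on the UI executor. Failures (possibly wrapped in CompletionException)
     * are handed to onError, also on the UI executor.
     */
    <T> CompletableFuture<Void> load(Supplier<T> loader,
                                     Consumer<T> onLoaded,
                                     Consumer<Throwable> onError) {
        return CompletableFuture.supplyAsync(loader, dbThread)
                .thenAcceptAsync(onLoaded, uiExecutor)
                .exceptionally(ex -> {
                    uiExecutor.execute(() -> onError.accept(ex));
                    return null;
                });
    }

    @Override
    public void close() {
        dbThread.shutdown();
    }
}

class DbWorkerDemo {
    public static void main(String[] args) throws Exception {
        // A plain single-thread executor plays the part of the EDT in this demo.
        ExecutorService fakeUi = Executors.newSingleThreadExecutor();
        try (DbWorker worker = new DbWorker(fakeUi)) {
            worker.load(
                    () -> Arrays.asList("Alice", "Bob"),          // pretend query, runs on db-worker
                    names -> System.out.println("UI got " + names),
                    err -> System.out.println("load failed: " + err)
            ).get(); // blocking here only so the demo doesn't exit early
        } finally {
            fakeUi.shutdown();
        }
    }
}
```

In a Swing app you'd construct it as new DbWorker(SwingUtilities::invokeLater) so both callbacks land on the EDT. Notice how much plumbing this is just to read a few customer names, and that every caller now has to be written callback-style.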

Because there's no way to reliably tell what lazy properties of an object might be required when you're retrieving it, you'll find yourself reloading entities frequently as they're passed around the app. Each time there's a database access delay, and at each point you have to be able to handle the user going off and doing other things in the UI, or have to block the UI with appropriate feedback. Every GUI component starts to sprout code to check the entities it's passed to see if they have the required properties loaded and, if not, to return control to the EDT while reloading the entity with the required properties preloaded. For lazy loading to be useful in an environment where you can't block on database access, you need to know in advance which properties of a bean will be used by which components.

At this point, you may start to wonder if working directly with JDBC was such a bad thing after all. At least with JDBC it's easy to retrieve arbitrary subsets of your data, rather than fighting a framework that wants to have a fixed set of lazy/eager attributes and requires messing with "left join fetch" HQL hacks to selectively eagerly fetch.

Update: Someone else who's written about this has some pretty similar issues and no better answers. See
