Saturday, June 23, 2012

Mail to the JPA 2.1 Expert Group re fetch control

I'm setting up another opportunity to look foolish - which I view as a good thing; if I don't risk looking foolish, I don't learn nearly as much.

I've mailed the JPA 2.1 EG re control over fetch mode and strategy on a per-property, per-query basis, as I'm concerned this may not otherwise be considered for JPA 2.1 and thus Java EE 7. As previously expressed here I think it's a big pain point in Java EE development.

I'd appreciate your support if you've found JPA 2's control over eager vs lazy fetching limiting and challenging in your projects. Please post on the JPA 2.1 users mailing list.


UPDATE: JSR 338 (JPA 2.1) has been released with support for "fetch graphs", a feature that appears to meet the specified needs. I'm no longer working with Java EE or JPA, so I haven't tested it out.

UPDATE: http://blog.ringerc.id.au/2012/06/update-on-jpa-21-and-fetch-control.html

UPDATE: Many of these problems are solved, albeit in non-standard and non-portable ways, by EclipseLink. EclipseLink extensions can be used to gain much greater control over fetches at the entity definition level and, more importantly, via powerful per-query hints. GlassFish uses EclipseLink by default, and you can also use EclipseLink instead of Hibernate as your persistence provider in JBoss AS 7.



To the JPA spec team, the users list, and possibly interested others I've BCC'd:

I've been seeing increasing evidence on user-facing forums and mailing lists that control over fetching via JPA is a real challenge for developers. It's certainly been a huge one for me. I'm interested in whether this can be improved for JPA 2.1 and Java EE 7, as in my view the fetch issues are a big pain point.

I'm writing to raise this with the JPA 2.1 spec team, as I don't see any enhancements regarding fetch strategies and modes in the latest draft and didn't spot discussion of it on the list. I'd like to strike up a discussion about what, if anything, can/should be done about this for Java EE 7.

What do apps need to do?

In JPA 2.0 there's solid control over lazy vs eager fetching on an entity/property/relationship basis using the usual @...ToOne / @...ToMany (fetch=FetchType.[LAZY|EAGER]) annotations and the orm.xml equivalents. This works well, but is too simple for many projects.

From my admittedly rather limited experience, and from discussions I've seen, it seems common to have widely referenced entities that you don't want to eagerly load the relationships of most of the time, but *need* them loaded in some situations. Commonly this is because you'll be using them detached from an entity manager context and know you'll need access to normally lazily loaded properties. Sometimes it's a performance issue where you can't afford the expense of lots of little database hits as proxied lazily loaded properties are loaded.

What's currently possible?

Right now, my understanding - and I don't claim it's a great one - is that there's exactly one option to override normally lazy fetching with standard JPA: use a left join fetch, either in JPQL or via the Criteria API. That's OK much of the time.

What's wrong with the current situation?

Being limited to a "left join fetch" can also be really problematic:

There's no way to ask the provider to use a different fetching strategy, like a follow-up batched SELECT, or use subselect fetching.

A left join fetch is fine when you're eagerly fetching one or two lazily fetched entity relationships. It scales extremely poorly if you have several things to fetch and/or more than one level, eg "a.b.c".

Apps sometimes need to do extra JPQL / criteria queries and repeat work in order to load required entities into the persistence context without expensive multiple joins.

The key problem in my view is that the JPA API doesn't give the user any way to ask for normally-lazy relationships to be eagerly fetched without also forcing them to be fetched in a single SQL query. That can be really sub-optimal, and it conflates joins (a matter of query logic) with fetching (a matter of what's retrieved). You can't say "fetch x.y in whatever way is optimal".
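To put rough numbers on why a single-query join fetch scales poorly, here is a minimal sketch of the row-count arithmetic only. It doesn't touch JPA or a database; the class and method names are made up for illustration, and it assumes every parent has the same number of children in each fetched collection.

```java
// Row-count arithmetic only; no JPA provider or database is involved.
// All names here are hypothetical and exist just for this illustration.
final class JoinFetchRowCount {

    /** Rows returned when one query join-fetches every listed collection at once. */
    static long singleJoinRows(long parents, long... childrenPerParent) {
        long rowsPerParent = 1;
        for (long n : childrenPerParent) {
            // A LEFT JOIN still yields one row for a parent with an empty collection.
            rowsPerParent *= Math.max(n, 1);
        }
        return parents * rowsPerParent;
    }

    /** Total rows returned when each collection is loaded by its own follow-up SELECT. */
    static long separateSelectRows(long parents, long... childrenPerParent) {
        long rows = parents; // the root query
        for (long n : childrenPerParent) {
            rows += parents * n; // one batched query per collection
        }
        return rows;
    }

    public static void main(String[] args) {
        // 100 parents, three collections of 10 children each.
        System.out.println(singleJoinRows(100, 10, 10, 10));     // 100000
        System.out.println(separateSelectRows(100, 10, 10, 10)); // 3100
    }
}
```

With one or two small collections the joined result is tolerable; the product grows quickly once a third collection or a second level is added, which is where a follow-up SELECT or subselect strategy would be preferable.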

I've seen numerous recommendations, especially on the Vaadin lists and around Swing apps, to use EclipseLink and allow it to lazily load properties of detached entities using proxies. This is a nasty thing for people to be relying on, as (a) each load is a query, so it's the ultimate in n+1 or worse with nested properties; and (b) those later loads are generally in new transactions, breaking the DB's consistency guarantees in ways optimistic locking often can't help with. That people are having to rely on this is IMO of concern.
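The "each load is a query" point can be illustrated with a small simulation. This is not EclipseLink or any real provider: CountingDb is a hypothetical stand-in that returns dummy data and counts the statements it receives, so the only thing the sketch shows is how the query count grows.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simulation only: no JPA provider is involved and all names are hypothetical.
final class LazyLoadQueryCount {

    /** Stand-in for a database: returns dummy data and counts statements. */
    static final class CountingDb {
        int queries = 0;

        List<Integer> selectParentIds(int howMany) {
            queries++;
            List<Integer> ids = new ArrayList<>();
            for (int i = 1; i <= howMany; i++) ids.add(i);
            return ids;
        }

        List<String> selectChildrenOf(int parentId) {
            queries++;
            return Collections.singletonList("child-of-" + parentId);
        }

        List<String> selectChildrenOfAll(List<Integer> parentIds) {
            queries++;
            List<String> children = new ArrayList<>();
            for (int id : parentIds) children.add("child-of-" + id);
            return children;
        }
    }

    /** One query for the parents, then one more each time a lazy property is touched. */
    static int queriesWithPerEntityLazyLoad(int parents) {
        CountingDb db = new CountingDb();
        for (int id : db.selectParentIds(parents)) {
            db.selectChildrenOf(id);
        }
        return db.queries;
    }

    /** One query for the parents, one batched query for all their children. */
    static int queriesWithBatchedFetch(int parents) {
        CountingDb db = new CountingDb();
        db.selectChildrenOfAll(db.selectParentIds(parents));
        return db.queries;
    }

    public static void main(String[] args) {
        System.out.println(queriesWithPerEntityLazyLoad(50)); // 51
        System.out.println(queriesWithBatchedFetch(50));      // 2
    }
}
```

The simulation leaves out the transactional half of the problem: in a real application each of those later loads may also run in a different transaction from the original query.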

It doesn't help that the Root.fetch(...) API is difficult to use correctly and has been acknowledged to be poorly specified. It's easy to land up doing a second unnecessary join, or to get a "query specified join fetching, but the owner of the fetched association was not present in the select list" error. This article used to talk about it:

http://blogs.sun.com/ldemichiel/entry/jpa_next_thinking_about_the#comment-1291653518000

but has since been devoured by the Oracle transition.

What can be done via implementation-specific extensions?

Some JPA implementations offer fetch controls via extensions, but there's nothing consistently available.

EclipseLink gives quite good fetching control via JPA query hints, allowing default fetch modes to be overridden on a per-property basis and allowing the specification of alternative fetch strategies. It also supports lazy loading of properties in detached entities, which has several problems as mentioned above.

Hibernate, as far as I've been able to determine, doesn't expose anything equivalent at the JPA level. It has setFetchMode(...) in its own Criteria API, but as far as I've been able to find out it doesn't expose that to JPA via hints or other mechanisms. I'm frequently told that Hibernate is best suited for short-transaction stateless applications because it doesn't lazy load on detached entities - presumably because it's too hard to specify what you want eagerly loaded.

I'm not sufficiently familiar with other implementations to say what they offer.

What's needed?

In my view, the key thing JPA needs to do is provide fetch mode and strategy controls at a per-query, per-relationship level without requiring a left join fetch. I'd be interested in what your thoughts are.

Per-query, per-property overrides for eager vs lazy fetching

Clients need to be able to specify to the ORM that a given property should be eagerly or lazily fetched in a particular query. An API that avoids the need for providers to have to parse free-form properties (and is thus more checkable) would be good, so adding something like:

CriteriaQuery.setFetchMode(String propertyName, FetchType fetchType)

would seem ideal to me, where "propertyName" can be a dot-path to sub-properties, or of course a metamodel object/path.

Different fetch strategies are supported by different implementations, and I don't think the JPA spec can really specify a complete set of possible strategies, so the fetch mode type should probably be a simple EAGER | LAZY enum, handily already provided by javax.persistence.FetchType. The implementer should be free to choose the most appropriate fetch method, so long as properties marked EAGER are in fact attached to the persistence context when the query completes.

Per-query, per-property control over fetch strategies

IMO if explicit specification of fetch strategy is provided through the JPA API (which would be nice) it should be by string names for strategies, or at least allow them. There's no predicting what fetch strategies will be possible. For example, with PostgreSQL's new JSON data type support it's possible to do an eager fetch of a relationship using a join or subquery with query_to_json, using array_agg and array_to_json, or using record_to_json. The ORM no longer needs to de-duplicate a cross product. Standardizing this would be nuts, but a way to ask an ORM that's aware of it to use it makes sense. I'd like to see something like:

CriteriaQuery.setFetchStrategy(String propertyName, String strategy)

... and maybe ...

CriteriaQuery.setFetchStrategy(String propertyName, FetchStrategy strategy)

... with FetchStrategy being an enum { JOIN, SELECT, SUBSELECT, ANY }, as those are the widely recognised strategies plus one that lets the implementation choose (default for FetchType.EAGER).
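As a rough sketch of how the proposed calls might fit together, the class below records per-property overrides in the shape suggested above. None of this exists in JPA: FetchOverrides is a hypothetical standalone holder rather than part of CriteriaQuery, FetchType is a local stand-in for javax.persistence.FetchType so the sketch compiles without a JPA jar, and "postgres-json" is an invented provider-specific strategy name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical holder for per-query fetch overrides; not part of any JPA API.
final class FetchOverrides {
    private final Map<String, FetchType> modes = new LinkedHashMap<>();
    private final Map<String, String> strategies = new LinkedHashMap<>();

    /** propertyName may be a dot-path such as "customer.orders". */
    FetchOverrides setFetchMode(String propertyName, FetchType fetchType) {
        modes.put(propertyName, fetchType);
        return this;
    }

    /** String form, so providers can accept strategy names the spec never listed. */
    FetchOverrides setFetchStrategy(String propertyName, String strategy) {
        strategies.put(propertyName, strategy);
        return this;
    }

    /** Enum form for the widely recognised strategies. */
    FetchOverrides setFetchStrategy(String propertyName, FetchStrategy strategy) {
        return setFetchStrategy(propertyName, strategy.name());
    }

    /** The override if one was set, otherwise the default from the entity mapping. */
    FetchType modeFor(String propertyName, FetchType mappingDefault) {
        return modes.getOrDefault(propertyName, mappingDefault);
    }

    /** The requested strategy, or ANY so the provider is free to choose. */
    String strategyFor(String propertyName) {
        return strategies.getOrDefault(propertyName, FetchStrategy.ANY.name());
    }

    public static void main(String[] args) {
        FetchOverrides overrides = new FetchOverrides()
            .setFetchMode("customer.orders", FetchType.EAGER)
            .setFetchStrategy("customer.orders", FetchStrategy.SUBSELECT)
            .setFetchStrategy("customer.address", "postgres-json"); // invented name
        System.out.println(overrides.modeFor("customer.orders", FetchType.LAZY));
        System.out.println(overrides.strategyFor("customer.orders"));
        System.out.println(overrides.strategyFor("customer.notes"));
    }
}

/** Local stand-in for javax.persistence.FetchType. */
enum FetchType { LAZY, EAGER }

/** The enum suggested above; ANY lets the implementation choose. */
enum FetchStrategy { JOIN, SELECT, SUBSELECT, ANY }
```

Keeping the string overload as the primary one and having the enum overload delegate to it reflects the argument above that the set of strategies can't be predicted, while still giving checkable names for the common cases.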

Fetch groups?

It may also be worth thinking about another often-sought-after facility, fetch groups, but IMO control over fetch mode and strategy on a per-query, per-relationship level is much more important.


BTW, I wrote a bit about this earlier here:
http://blog.ringerc.id.au/2012/06/jpa2-is-very-inflexible-with-eagerlazy.html

--
Craig Ringer

6 comments:

  1. I created a spec issue for this very problem a while back at: http://java.net/jira/browse/JPA_SPEC-27

    Unfortunately it doesn't seem to have gotten much attention yet.

  2. This feature is absolutely essential. I can not imagine moving from our custom data access layer to JPA unless this is implemented.

  3. Absolutely a crucial thing to have!

    Why aren't more people vocal about this? Why isn't the JIRA ticket listed above getting more votes?

    If -we- want to change anything here, then -we- have to speak up.

  4. You mean just like JDO fetch groups (also applicable on queries), available since 2005 :-) Good luck on getting that; absolutely essential for any serious project

  5. Hi Craig,

    Great blog!!!
    Is there any email to contact you in private?

    Regards,
    Dimitris

    Replies
    1. echo 'cmluZ2VyY0ByaW5nZXJjLmlkLmF1Cg==' | base64 -d

      Not that there's really much point in masking it given it's all over the 'net, but comment scraping on blogger seems particularly aggressive.
