Tuesday, July 3, 2012

PostgreSQL rocks, and so does explain.depesz.com

I was modifying one of my aggregate views today and I was struck by just how impressive PostgreSQL is, and how powerful relational databases in general really are. PostgreSQL was executing queries against the view a completely different way when one entry was being queried vs when the whole view was being requested.

You might take this for granted - but when you think about it, it's seriously amazing how a database can take your description of what you want and work out the how for its self. Mostly.

Imagine writing this yourself. Say you're working in Java. You have a complex Criteria query against several different tables related to information about customers. You now want to get information for just one customer, so you add a WHERE clause to the top level. If the database didn't nearly magically push this filter criterion down into all those complex sub-queries and joins you'd be at it for hours; days or weeks if you wanted to make it re-usable and generic across a set of similar criteria queries.

Instead, the DB just does it for you.


Compare the query plan for fetching the whole view to a plan for fetching only one entry. They don't even look remotely similar, yet they're both plans a simple SELECT * FROM customer_summary_dynamic; one is just constrained by a WHERE clause to only return one customer entry.

Yes, explain.depesz.com is awesome. If you use PostgreSQL, you should know about it. It even understands the extended explain syntax, the JSON format, etc.

The amazing thing is that such flexible query planning is utterly routine for relational databases and is usually beneath developers' notice, as is the database's transparent maintenance of data consistency and all the other vital work it does.

Pg is a vital tool for me. Sure, it has issues, but the issue-to-awesome ratio in Pg is superior to almost anything else I've used.

Whenever you think "is an NoSQL database a better solution for my problem" - remember what you're throwing out, as well as what you're gaining.

It does make me a little sad that the relational model as currently approached fits relatively poorly with session-per-request pagination. The relational answer to pagination is cursors, but cursors don't work well when you have users who might wander off for a few hours. We really need a way to allow the DB to use a cursor internally but discard it and re-do the work if it's abandoned, idle for too long, or the resources are needed elsewhere.

No comments:

Post a Comment