The Tech Faucet: Natural sorting: An example of the utility of Pg's composite types and arrays

Friday, October 19, 2012

Natural sorting: An example of the utility of Pg's composite types and arrays

While looking at a recent stack overflow question I found myself wondering if it was possible to write a natural sort for strings containing numbers interleaved with non-number text using only PostgreSQL's core functionality.

Natural sorts are an important usability feature, as Jeff points out in his post on natural sorts above.

So I asked for ideas, and it turns out that yes, you can, though it's a bit long-winded. Props to Erwin Brandstetter for persistently refining the approach. The general idea is to create a composite type of `(text,integer)` then sort on an array of that type. See the linked question for details.

This illustrates how powerful Pg's composite types and arrays are, though I'm not sure you should consider any of the proposed solutions for real world production use.

It it also helps to show how nice it'd be to have access to native OS-independent Unicode collation in PostgreSQL using the International Components for Unicode (ICU) project, which would not only solve those nasty Windows-vs-Linux locate name issues when restoring dumps, but would also allow the use of advanced collation flags like UCOL_NUMERIC_COLLATION.

I'd really love to be able to use a custom collation function in Pg, either via an ORDER BY extension or by creating a collation that uses a user-defined collation function then using that collation in the COLLATE clause. Then I could write a C function to use ICU to do the special collation required for a particular job. This doesn't appear to be possible at the moment.

I recommend reading Jeff's post on natural sorting and why it's important; as usual, it's excellent.

2 comments:

Josh BerkusOctober 21, 2012 at 3:07 AM
Craig,

IIRC, the reason we didn't add ICU support years ago was sheer size; the ICU library is huge and would have added a significant amount to the overall PostgreSQL code base. Not sure if that's still a real concern.

BTW, can you please remove the Captcha on your blog comments? It's really annoying ... this will be my last comment if it's still there in the future.
ReplyDelete
Replies

Add comment

Captchas suck. Bots suck more. Sorry.