Friday, May 25, 2012

PostgreSQL usability - PgAdmin-III and Pg need some usability love

As anyone who's read much here will know, I'm a huge fan of PostgreSQL, the powerful open source relational database management system. It is an amazingly project with a team that keeps on releasing high quality updates full of very useful new features and improvements.

This is my blog, so there must be a "but", right? You're quite right.

I already wrote a PostgreSQL: Great even when you can see the warts in response to a perhaps overly glowing write-up someone did a while ago. I'm not covering that again here; this post is specifically about a topic more and more dear to my heart, that of usability.

As a long-time UNIX geek I've always found PostgreSQL quite easy to use, because I live on the command line where psql is just wonderful. Recently, though, I had to use PostgreSQL on Windows, and in the process couldn't help seeing it from the new user's point of view. In fact, I tried to think like a new user as I performed the tasks I needed to do.

It wasn't all roses. I knew enough about Pg to get past the issues pretty easily, but some of them would be real roadblocks for new or even intermediate users, and I think they're worth highlighting. Many of the issues were with PgAdmin-III, but far from all of them.

This post started out as a minor critique of a few usability sore points in Pg and PgAdmin-III. As I went through step-by-step producing screenshots and thinking through the process as a user, though, I realised it's an absolute, complete and utter train-wreck. If this had been my first experience with PostgreSQL, I'd be a MySQL, MS-SQL or Oracle user now.


The task that prompted all this was me trying to restore a Pg backup to a Windows box, but most of these issues aren't Windows specific at all.

We start with a simple scenario: A user has a backup of the database bs, created with pg_dump -Fc -f bs.backup bs on a Linux box running Pg 9.1.3. They want to restore it to a Windows box running Pg 9.1.3. Maybe they created the backup file following a command-line recipe, or maybe they were given it by someone else; the point is, we're dealing with a relative newbie not an experienced cross platform command line user. I'm trying to write from their perspective, drawing on experience with questions on the Pg list, Stack Overflow, etc to get some idea what people struggle with.

If this seems overly harsh in places, well, sorry. I actually found myself getting frustrated as I wrote this up, it was really beginning to piss me off how borked some of the behaviour was. After all, the entirety of the following was equivalent to simply running:

C:\....>"C:\Program Files\PostgreSQL\9.1\bin\psql.exe" -U postgres
Password for user postgres:
postgres=# \i /Users/Craig/Desktop/globals.sql
ALTER ROLE
psql:/Users/Craig/Desktop/globals.sql:37: ERROR:  role "postgres" already exists
postgres=# CREATE DATABASE bs WITH OWNER bs_dbowner ENCODING 'UTF-8';
CREATE DATABASE
postgres=# \q
C:\....>"C:\Program Files\PostgreSQL\9.1\bin\pg_restore.exe" -U postgres --dbname bs bs.backup
Password:

... which I won't claim is user friendly (what with the normal/expected error message, the need to restore globals separately, the need to use 'nix style forward slashes in paths in \i, and the inability to use pg_restore -C because of locale name mismatches) but hey, at least it's simple if you know what you're doing with Pg. All those non-PgAdmin specific quirks and more are covered below, along with a bunch of PgAdmin-III issues.

All I want to do is restore a backup!

We have a file called bs.backup and we want to restore it to an already-installed local PostgreSQL instance.

This should be the simplest task, but it really, really isn't.

I'll just open the backup file

The first thing a Windows user is going to do is try to "open" the backup file. OK. *clickety click*.

Nope, that's not going to work. It doesn't help that PgAdmin-III uses the file extension .backup. When you see a file called something.backup do you think "PostgreSQL database backup"? Or "huh?". At least it has a header so file can tell it isn't just a tar archive.

Recommendation: PgAdmin-III really should encourage a better file name convention like .pgbackup, associate its self with the file extension on Windows/Mac, get a vendor MIME type for it, and associate with that MIME type on *nix.

Recommendation: In the mean time, PgAdmin-III should install a non-default "Open with..." handler for .backup files on Windows.

Back to our user. We'll assume the user knows that PgAdmin-III is their primary (GUI, on Windows) interaction point for PostgreSQL because, frankly, if they can't find it in the Start menu they're way below the skill level it's worth caring about for Pg usability.

Because they know it is their primary tool, and they know how to use Windows, they might try to associate PgAdmin-III with .backup files. Here's what happens if they do:

Nope, that won't work.

Recommendation: PgAdmin-III should accept .backup or .pgbackup files as command line arguments, prompt for a server, and offer to examine/restore. Since that'd probably be quite a lot of work to implement and I'm not going to have the time to do it, a reasonable quick and dirty alternative would be to emit a message saying "Cannot restore backup files from PgAdmin command line; please use the pg_restore command (manual link) or the Restore command from the Pgadmin-III GUI".

I'll just open it in PgAdmin-III

The next most commonplace interaction with a file is to open it from a suitable program, so let's try that. File menu...

Nope. Lets hunt around the menus. I should note at this point that the PgAdmin-III 3 help, when searched for "restore", doesn't tell you where the "Restore" tool is, but let's presume we found it, it's not exactly hidden.

Tools -> Restore is greyed out

OK, there's a "Restore" option in "Tools". That looks like what we want, but it's greyed out. Moving the cursor over it doesn't produce any hovering tooltip, nor any message in the status bar telling us why it's disabled. Clicking on it does nothing. Mystery. We're connected to the server, so we should be able to restore a DB, right?

Wrong. As any experienced Pg user knows, you have to connect to a database even if you're using pg_restore to create the database you are going to restore to. The "postgres" database isn't automatically used for this, and the user can even drop or alter it, it isn't a protected system database at all. A new Pg user will not know this, especially when they don't know the PgAdmin-III backup/restore GUI is a thin wrapper around pg_dump and pg_restore. They have no reason to expect to have to select a database.

Recommendation 1: PgAdmin-III should automatically use the "postgres" database and check the "Create..." option if no database is selected to restore to. "Create..." should be ticked and greyed out, with a note explaining no database was selected to restore to so one must be created. If the postgres database doesn't exist or can't be connected to, PgAdmin-III should tell the user about it, not just fail weirdly.

Recommendation 2: If disabled, the "Restore" option should use a statusline tooltip on hover to explain why it's disabled, eg "Cannot restore: Not connected to any databases". Either that, or should not be disabled and should show a dialog instead.

Recommendation 3: Longer term, Pg tools like pg_restore need to connect to an empty read-only database that always exists, like a version of template0 that accepts connections. Tools like pg_restore, createdb, createuser etc should use this db automatically (though of course not pg_restore without --create). Having to connect to one specific DB in order to create another or do some global task is counter-intuitive. Many people will use template1 for this, but it's way too easy to screw up and find yourself restoring random crap into template1 that way...

Recommendation 4: In PgAdmin-III, restore to an existing DB and restore to a new DB created during the restore should be entirely different UI options. It makes absolutely no sense for the context menu "restore" on one database to let you restore to a completely different one, yet that's currently the case. IMO, Tools->restore should ignore selection context and always ask you to pick a target server & dB or create a new DB on a server. The context menus should always operate on their targets, not a different database.

Time to RTFM

At this point at least some people will turn to the manual. Hopefully. A search for "Restore" in the PgAdmin-III manual. Oddly, it's mostly a long series of un-annotated screenshots, but there's some explanatory text down the bottom. At the default help window size this text is invisible, all you see is a screenshot.

At full size or scrolled down, you can see the limited explanatory text:

It:

  • Seems to imply you have to view a backup before you restore it.
  • Doesn't say what a "valid backup file" is. In particular, it doesn't explain that you can't use an SQL dump here, you can only use a backup created via PgAdmin-III's "Backup" dialog or via direct invocation of pg_backup with -Fc to get a custom format backup. The open dialog does say "tar/custom" for the file type, but few are going to know what that means.
  • Says the restore tool intentionally doesn't create a database, which is true by default but not always. It doesn't explain how to use the create option by connecting to a different DB like postgres to create and restore a database in one go.

The help hasn't directly answered our question ("why is Restore greyed out") but it's given us a hint: restore doesn't create the DB, we have to do that first. So we create a new DB and select it then choose Tools -> Restore or use the context menu.

Progress! OK, errors, but progress...

C:/Program Files/PostgreSQL/9.1/bin\pg_restore.exe --host localhost --port 5432 --username "postgres" --dbname "bs" --no-password  --verbose "C:\Users\Craig\dev\booksys\nymm\bs.backup"
pg_restore: connecting to database for restore
pg_restore: creating SCHEMA pe
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 6; 2615 16393 SCHEMA pe bs_dbowner
pg_restore: [archiver (db)] could not execute query: ERROR:  role "bs_dbowner" does not exist
    Command was: ALTER SCHEMA pe OWNER TO bs_dbowner;

  ... followed by another 2300 lines of errors!

Our restore has failed. Why? The backup worked fine, and it's always restored fine when I dropped and re-created the database on the other machine. Instead I got this giant spew of errors.

(On a sidenote, to copy & paste the errors, I had to put the cursor before the 1st char, click & drag down, then waggle the mouse back and forth to scroll down while selecting because the window didn't autoscroll! Control-A doesn't work. It turns out there is a "select all" context menu item, but c'mon!)

If you know anything about Pg, you'll be thinking "missing roles, missing extensions, or missing languages". The error messages sure don't tell the newbie this in any terms they'll understand, though. Sure, role "bs_dbowner" does not exist, but why?. What's wrong with my backup?

No hints, no nothing. This is going to be a stumped-point for some people. There's certainly nothing to point them to the fact that they need to go back to the original server and run pg_dumpall --globals-only then run the resulting SQL script!

In fact, I'd go so far as to say that this is completely insane. A -Fc backup should automatically pull in all global objects (like roles) it refers to. Always. I view this a as a major bug, as unless the user tests their backups on a completely clean install they'll think a backup is complete when it isn't. There are some security considerations with including the password hashes of global login roles in the backup that bear consideration, but an answer is really needed here.

VERY STRONG RECOMMENDATION: pg_restore should always include global objects in custom-format backups unless instructed otherwise. An option to include their roles but not their encrypted passwords would perhaps also be wise; those can be reset, and the DB will restore fine if the roles don't have valid passwords.

Recommendation: When pg_restore fails, a link to an FAQ page on the wiki or in the docs should be produced so people can learn about these limitations and issues and what to do about them.

Recommendation: PgAdmin-III should understand control-A in the messages panel and the messages panel contents should autoscroll when the mouse is held while leaving the widget.


Restore globals first

OK, say somehow I've found out that I have to dump and restore the globals first. I've run pg_dumpall --globals-only on the server, which thankfully I still have easy remote access to because I'm not in a disaster recovery or remote situation. (Yes, test your backups, but how many users will actually do that?).

I've copied the resulting output to the local machine, it's in this globals.sql file here. Double-clicking on it produces the familiar "Windows can't open this file" because PgAdmin-III hasn't registered a handler - fair enough for a generic file type, but what do I do with it?

I'll try the restore dialog, now I've figured it out. The file selector filters for ".backup" files so I don't see it. I'll change to "All files (*.*)" and open it.

C:/Program Files/PostgreSQL/9.1/bin\pg_restore.exe --host localhost --port 5432 --username "postgres" --dbname "postgres" --no-password  --verbose "C:\Users\Craig\Desktop\globals.sql"
pg_restore: [archiver] input file does not appear to be a valid archive

Process returned exit code 1.

Nope, that isn't it.

Recommendation: PgAdmin-III's "restore" dialog should accept .sql files, offering to run them with psql and showing you the output in the messages window. Copy and paste of large SQL scripts into the SQL editor is not practical, and forcing a newbie or GUI addict onto the command line to run psql is really not ideal if usability is a concern.

It's an SQL script, there must be a "run script" somewhere...

  • File menu - nope
  • postgres database context menu - nope. There's "CREATE script" but that doesn't do what I want, and there's no "from script" in "new object".
  • server context menu - nope
  • Tools menu - nope. The "Scripts" menu doesn't have "run script", though it does have this "CREATE SCRIPT" option that doesn't do what I want. What about "Query tool"? Doesn't look like what I want...

OK, how the hell do I run this script? pg_dump from the command line? User hunts around GUI and finds "SQL" button on toolbar or happens to open "Query" tool on the off chance it'll help. Hm, ok, this Query window has a File -> Open option and matching toolbar entry. Opening the SQL script with that shows it in the editor, and I can run it with the green arrow. Ok.

ERROR:  syntax error at or near "\"
LINE 5: \connect postgres
        ^

********** Error **********

ERROR: syntax error at or near "\"
SQL state: 42601
Character: 48

W.T.F.. Seriously. At this point, as an experienced Pg user, this is beginning to amaze me. How the hell is a new or mid-level user supposed to know what the hell to do here? For that matter, why is that statement there, since:

  • The user can delete/drop the "postgres" database so it may not exist;
  • The user must already have specified a database to connect to if they're running psql or the PgAdmin-III SQL editor, and these are globals so it doesn't matter which DB the statements get run in.

Recommendation: pg_dumpall should not emit \connect postgres for --globals-only.

Recommendation: Pg needs a hidden, read-only, non-droppable system database for this kind of thing! Seriously!


Say the user successfully runs the globals restore script via psql from the command line or via psql's \i option. You'd expect them to also be able to run it by deleting the \connect postgres line, but no, PgAdmin-III has silently wrapped the script in a transaction (!!) and so the whole thing gets rolled back when it hits the CREATE ROLE postgres; line that will always fail on any normal PostgreSQL install. They have to delete that too. If you think having to hand edit your backups to restore them is insane, you're not alone, but this is pretty much what we're making users do.

RECOMMENDATION: pg_dumpall should emit CREATE ROLE ... IF NOT EXISTS statements at least for the postgres role. Well, if that were supported, anyway. It should not be normal for an error to be emitted by restoring a dump.

RECOMMENDATION: PgAdmin-III should make it explicit that it's wrapping code in the SQL editor in a transaction, and make that optional. Seriously, I had to think briefly about this error, and I've been using PostgreSQL heavily for years. In usability terms, this stinks.

Anyway, let's say our persistent user has managed to restore their roles, or has given up and hand-recreated them from the errors they got restoring their dump. Time to restore our data.

So close, then it hangs!

I create the target database in PgAdmin-III, right-click on it, and choose "Restore". I choose the file created by pg_dump -Fc on the origin machine, and click "Restore". It switches to a "Messages" tab where the following is displayed, with no progress indication other than the "cancel" button becoming enabled. When I click in the PgAdmin-III restore window, it becomes greyed out with a spinning "wait" cursor, the usual indication in Windows that a program is non-responsive (usually crashed or in an infinite loop). It's 4:42. I click the "close" box, and get a "this program is not responding" message.

Say I don't force-quit PgAdmin-III or throw my laptop across the room, but instead patiently wait. At 4:46, the restore finishes, as this is a pretty small database on a fast-ish machine. PgAdmin-III never did respond to UI interaction until the restore finished, it was completely blocked on pg_restore.

RECOMMENDATION: PgAdmin-III should monitor pg_restore via a separate thread, not block on reading it. If it doesn't continue to respond to window events in the main thread, Windows thinks it has crashed/hung, and so does the user.

Recommendation: I know this one isn't easy, but from the user's point of view some kind of progress for a restore is needed, even if it's just a field saying "Current command: " or a counter ticking the number of queries run or something that moves. Even a pointless spinner.

OK, we waited, and the dialog has produced a giant string of messages in the window all at once, followed by:

Process returned exit code 0.

Now, 0 is normally false (error) to anyone who hasn't been broken by the inverted convention used on the UNIX shell, but there are no obvious errors in the above messages spew, so our user is going to assume that all is well. Or hope so, anyway. "Cancel" is still enabled, but the "Done" button is enabled, and that must be a good thing, right? (Side question: At this point, does cancel actually cancel anything? I don't see what it could do...).

WE HAVE A DATABASE AGAIN!

Of course, I didn't have any extensions or extra PLs installed. If I'd had plperlu in my old DB, I'd still be up shit creek without a paddle.

A different approach

Now, imagine if we'd noticed the "Restore" context item on the postgres database instead of going and R'ing TFM, then spotted the "Restore options #1" checkbox "Include CREATE DATABASE statement". We check that, and we also think running the whole lot in a single transction is a great idea, so we check "Single transaction" too.

C:/Program Files/PostgreSQL/9.1/bin\pg_restore.exe --host localhost --port 5432 --username "postgres" --dbname "postgres" --no-password  --create --single-transaction --verbose "C:\Users\Craig\dev\booksys\nymm\bs.backup"
pg_restore: [archiver] -C and -1 are incompatible options
pg_restore: *** aborted because of error

Process returned exit code 1.

Um, ok. What are "-C" and "-1"? (Yes, of course, I know that, but the poor user doesn't, they bear no relationship whatsoever to the checkboxes they chose).

RECOMMENDATION: PgAdmin-III should refuse to check "Single transaction" when "CREATE DATABASE statement" is checked or vice versa. The excluded option should be disabled, and an explanatory label should appear beside the disabled option that says why. Ditto other mutually exclusive options.

RECOMMENDATION: When pg_restore is asked to create a database and run in a single transaction, it should create the DB, then run the job in a single transaction in the new DB. If the CREATE DATABASE succeeded then the DB that pg_restore connects to must be the one it just created, unless someone's dropped it and created a new one in the last few ms, or swapped two by rename. They'd have to be someone with the rights to do that, though, and thus had better be trusted. For extreme paranoia points, CREATE DATABASE could always be extended to report the oid of the created database, so pg_restore could make sure the DB it connects to is the same one it created. I think that's going too far.


Anyway: User looks those options up in the docs and matches them to the UI, or figures out they must be the two they selected. They run the restore again with only "Include CREATE DATABASE statement" selected.

C:/Program Files/PostgreSQL/9.1/bin\pg_restore.exe --host localhost --port 5432 --username "postgres" --dbname "postgres" --no-password  --create --verbose "C:\Users\Craig\dev\booksys\nymm\bs.backup"
pg_restore: connecting to database for restore
pg_restore: creating DATABASE bs
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 3415; 1262 16392 DATABASE bs bs_dbowner
pg_restore: [archiver (db)] could not execute query: ERROR:  invalid locale name en_US.UTF-8
    Command was: CREATE DATABASE bs WITH TEMPLATE = template0 ENCODING = 'UTF8' LC_COLLATE = 'en_US.UTF-8' LC_CTYPE = 'en_US.UTF-8';



pg_restore: [archiver (db)] could not execute query: ERROR:  database "bs" does not exist
    Command was: ALTER DATABASE bs OWNER TO bs_dbowner;


pg_restore: connecting to new database "bs"
pg_restore: connecting to database "bs" as user "postgres"
pg_restore: [archiver (db)] could not reconnect to database: FATAL:  database "bs" does not exist
pg_restore: *** aborted because of error

Process returned exit code 1.

This is a major WTF point for anyone. Seriously. The cause of this error is the fact that Pg uses the system locale names, so en_US.UTF-8 doesn't exist on Windows and can't be used. But how the hell is the poor user supposed to know that? This is far from just a PgAdmin-III issue, it's a big fat wart on PostgreSQL in general and pg_restore in particular.

How is the user supposed to know they can create the DB themselves with the equivalent Windows locale (if there is one) then restore to it? Even once they figure out that they need to care about the first error message only, because the others are just consequences of the first?

How are they supposed to have a hope of mapping the locale names, unless they're a really experienced cross platform user who knows about text encodings?

Seriously, this is the kind of usability I expect to see from Oracle, not PostgreSQL.

Here's a list of posts, etc where people have been confused by these issues. They can be used to inform usability design and choices. I'm not actively searching, just adding things as I see them on the mailing lists.

7 comments:

  1. Great post. I really wish I could help with pgAdmin3 development but unfortunately I really dislike wxWidgets, and.. I don't have enough skills anyways.

    ReplyDelete
  2. I dislike Wx too (it's like an uglier buggier version of SWT). Then again, the only GUI toolkit I actually like is Qt, and nobody uses it, so I often have to work with what is there. Knowing different toolkits is a useful skill, however unpleasant.

    There's work to be done on things other than GUI code, though. Documentation fixes in PgAdmin-III, as well as improvements to pg_dump and pg_restore. Even if you can't code, you can join in patch review (https://commitfest.postgresql.org/, http://wiki.postgresql.org/wiki/Reviewing_a_Patch) and help out that way.

    I'm planning on tackling some of this, but there's a lot here. If you really want to help, even rewriting the "Restore" documentation page would go a long way.

    ReplyDelete
  3. I just ran into the issue with locales as well. The only option is to use createdb the dismiss -C option :S

    ReplyDelete
  4. Great post, it perfectly reflects my newbie frustration with the restore process. Thankfully I didn't attempt to "open" the backup files (come on, that's pushing it a bit ;))
    Anyhow, I'm trying to restore a .dump file and the help documents suggest using a command of the format psql dbname < infile, but the psql console only seems to recognize commands preceded by a backslash (and \psql isn't among them). I'd be grateful for any advice, or otherwise you can add it to your collection of potential newbie frustrations with pgAdmin3.
    Btw, finding the "PSQL Console" took me a loooong time.

    ReplyDelete
    Replies
    1. Frankly, I think you should be able to "open" the backup files. And, alas, it isn't pushing it; you'd be amazed by the number of "how do I open this backup file" and "I have this PostgreSQL backup now what do I do with it" questions that turn up on the mailing lists and Stack Overflow.

      As for psql dbname < infile ... that's a shell command, not a psql command, so it's run from the cmd.exe shell (Windows), Terminal.app (mac), etc. I'm not sure exactly which part of the above text you're referring to, either. An alternative is to open a psql session and run \i /path/to/infile .

      Delete
    2. Ah, you mean in the PostgreSQL help, not what I've written. Makes more sense. Can you link to the part of the documentation that was confusing, so I can take a look?

      Yeah, that's a shell command that uses stdin redirection. It's equivalent to:

      psql -f infile dbname

      and I'd recommend you use that form since it gives you better error messages.

      On windows, you'd run that by opening a Command Prompt session and running:

      "C:\Program Files\PostgreSQL\9.2\bin\psql.exe" -f "\path\to\infile.sql" dbname

      Note the double quotes. Adjust path and PostgreSQL version to suit your install. The reason for the full path is that psql isn't on the PATH by default on Windows.

      psql isn't part of PgAdmin, they're completely separate.

      Delete
  5. I have been trying a few times on windows, without success. Now, I have the manual. thx

    ReplyDelete

Captchas suck. Bots suck more. Sorry.