Friday, April 16, 2010

Reasons why Microsoft Access absolutely stinks as a DB UI RAD tool

When used as a UI builder & RAD tool for accessing a real database (in this case PostgreSQL), Microsoft Access absolutely stinks. It has some severe problems that require cumbersome workarounds. In fact, if you're vaguely competent with a programming language it's almost certainly better to just write the application in your preferred language and be done with it. Java + Swing + Hibernate, for example, would do the job very well, as would C++ & Qt .
I was forced to use Access in this most recent project, and thought I'd post a warning for others who might be considering it but have alternatives available.
Note that none of this applies if you use MS SQL Server as the backend since Access doesn't use ODBC linked tables for MS SQL Server, but rather its own custom interface that's much smarter and more capable.
Problems using Access with ODBC linked tables include:
  • It doesn't understand server-side generated synthetic primary keys. It can't just insert a record with a DEFAULT key and then retrieve the generated key through the ODBC driver. To use a PostgreSQL SERIAL column (a sequence) you must write an event procedure in VB that fires on the Form_BeforeInsert event. This procedure must issue a passthrough query to invoke nextval('seqname'), then store the value in the primary key field of the form. Hooray for a tool that does everything for you transparently, simply, and automatically.
  • When you write your passthrough query, you will notice when examining the server query logs that it fires twice. That's because, if the ReturnsRecords property is set on a query, Access appears to issue the query twice. Presumably it's using the first instance to enumerate the returned fields for type-checking purposes, as it does not do this when the ReturnsRecords property is false. In any case, this means that you can't get the return value of a stored procedure with side effects without invoking the procedure twice. So, for nextval, I've had to first call nextval('seqname') and discard the return value, then call currval('seqname') to get the return value. That's two round trips to the database that were completely unnecessary, and a total of four for a single INSERT.
  • Access has no concept of data dependencies between fields. If one field is (say) a combo box populated by a query that depends on the value of another field, you must use an event procedure in VB to trigger a refresh of the dependent field when the field it depends on is changed. This would seem to be one of the more obvious and basic things a RAD database tool would do for you, and I was stunned that it does not.
  • Access loves to retrieve huge amounts of unnecessary data. If you open a form that's bound to a linked table, it'll issue a query like SELECT pkey FROM table;, which will result in a sequential scan on the whole table for databases that can't return the data from the index (like Pg¹). Even for databases that can, they must still read and scan the whole index. That's also potentially a lot of network I/O. You can work around this with EXTREMELY careful use of filter rules, but you have to be quite paranoid to make sure it never sees the form/table without a filter rule.
  • Access cannot natively perform non-trivial queries server-side - you have to write them yourself with Visual Basic and use unbound forms, which isn't too dissimilar to how you'd be working with (say) Hibernate, just uglier. With a bound form and a linked table, Access will use a WHERE clause on the primary key when fetching a record, but that's about it. It can perform simple inner joins server side too, apparently, but I've hidden everything I can behind server-side updatable views so I haven't had to struggle with this aspect yet.
  • Access cannot handle NULL boolean values.. Use a smallint with a constraint on it instead.
  • Access needs some custom operators and casts defined to work with PostgreSQL.. It assumes that certain types may be cast to other types, compared to them, etc in ways that aren't allowed by default. Thankfully PostgreSQL supports user-defined operators and casts, so you can just add them. Here's one page with a useful set of operators and casts..
  • Access likes to test for nullity with the = operator.. This is just wrong - the result of NULL = NULL is NULL, not TRUE, since the two unknown values may or may not be equal (the result is also unknown). PostgreSQL has a hack parameter, transform_null_equals, that may be set to enable the incorrect behaviour that MS Access expects. Access MAY have been fixed in version 2007; I haven't had cause to test this particular quirk yet.
You can work around a lot with careful use of filters, lots of Visual Basic code, plenty of views, updateable views, and stored procedures, and by writing your stored procedures to have EITHER side effects OR a return value (never both). It's not much fun, though, and it's still far from efficient.
On the upside, with row versioning enabled in the PostgreSQL ODBC driver, Access uses the MVCC attributes of the tuples (xmin, etc) to do optimistic locking. This allows it to avoid long-running transactions and long-held locks during user interaction without having to mess with the schema to add a field for oplocking use. It'll even play well with the oplocking used by things like Hibernate, and will work properly with apps that use ordinary transactional locking (SELECT ... FOR UPDATE and so on) without the need for any sort of support triggers. It's very nice.
To use Access with PostgreSQL, you REALLY need to set these options in the ODBC DSN for the connection:
  • Row Versioning causes the driver to use only the primary key in WHERE clauses, and to use xmin & affected record count during updates to avoid conflicts.
  • True as -1
  • (unchecked) Bool as Char
¹ This isn't as stupid as it sounds. PostgreSQL uses an MVCC approach that allows for very efficient, highly concurrent inserts and selects. The price is that it must store tuple visibility information to know which transactions should see which tuples, since the table contains the data at many different points in time all at once. Visibility information isn't in the indexes (because it'd generate lots more write activity on the indexes, slowing down I/O) so once an index indicates a match on a tuple, it must be read from the heap to check visibility even if the index contains the actual data point we want. It's a performance trade-off that is not particularly great when you're doing a `count(*)' or `select pkey from table' but is wonderful on highly concurrent systems.

No comments:

Post a Comment

Captchas suck. Bots suck more. Sorry.