Thursday, August 23, 2012

For anyone who programs in any language or designs databases and formats

Anybody who does any kind of design or programming needs to read these two articles. So should anybody who designs data formats, database schema, or specifies any kind of standards.

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

Falsehoods programmers believe about names

If you haven't read - and acted on - these, then you're probably producing bugs at a fair rate. I harp on about some of these points fairly regularly, so I thought I'd collect the articles in one place.

Sadly it's often hard to avoid the whole "exactly one name structured in two parts" thing in the real world, because you're often dealing with other mis-designed systems that expect them to be different fields. Some of those systems aren't even software; they can be business processes, legal processes, and more.

The same applies to gender/sex, where it really isn't as simple as that [M/F] radio button or pulldown you probably have ... but half the legal processes and 3rd party APIs you work with think it is. Your user is trans-F-to-M or XXY indeterminate? They just have to shove themselves into a box.

Sadly, most people with non-Western-European names living in Western countries are used to butchering their names to fit the split name model. For that matter, people with Western European names are used to butchering them to work in systems that like to turn "Renée" into "Renee" or - depressingly frequently - "RenÃ©e". Does your last name have a space in it? You're doomed to being "Jacobsen, James van", "Jacobsen, James Van" (thanks to "helpful" auto-capitalisation) or even "Jacobsen, James V." forever.
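If you've never seen how these manglings actually happen, here's a minimal Python sketch of the three usual culprits. It assumes the damage comes from (a) UTF-8 bytes being decoded as Latin-1, (b) accents being stripped to force ASCII, and (c) naive title-casing; real systems can mangle names in other ways too.

```python
import unicodedata

name = "Renée"

# Mojibake: the UTF-8 encoding of "é" is two bytes (0xC3 0xA9). Decoding
# those bytes as Latin-1 turns them into two characters, "Ã" and "©".
mojibake = name.encode("utf-8").decode("latin-1")

# The damage is reversible only if you know exactly what went wrong.
recovered = mojibake.encode("latin-1").decode("utf-8")

# Accent stripping: NFKD splits "é" into "e" plus a combining acute accent,
# then encoding to ASCII with errors="ignore" silently drops the accent.
stripped = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")

# "Helpful" auto-capitalisation: str.title() upper-cases every word's first letter.
capitalised = "James van Jacobsen".title()

print(mojibake)     # RenÃ©e
print(recovered)    # Renée
print(stripped)     # Renee
print(capitalised)  # James Van Jacobsen
```

The lesson is to treat names as opaque Unicode text end to end: decode with the encoding the bytes were actually written in, and don't "normalise" or re-case what the user typed.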

Would you like to have your name "corrected" - to something wrong - or fail to validate in every second system you use? If not, consider those for whom that's true and fix your software. Ditto for gender/sex - don't ask for it if possible, and if you must, provide a free-form field.

I usually compromise by having a "display name" field that's used throughout the application, and by making no assumptions about names being unique, comparable, or in any way divisible. If interaction with 3rd party systems that want split-form names is required I have an additional field for the user to enter the name they usually use in identity documents, etc., but I don't use it within my app and I make its purpose clear.
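As a rough illustration of that compromise, here's a minimal Python sketch. All the names (`UserProfile`, `document_given_name`, `name_for_third_party` and so on) are hypothetical, not from any real library. It assumes users are identified by an ID rather than by name, treats gender as optional free-form text, and refuses to guess a split of the display name when a third-party system demands one.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class UserProfile:
    # Users are identified by ID; names are never assumed unique or comparable.
    user_id: int
    # One free-form name, used everywhere inside the application.
    display_name: str
    # Optional split name as it appears on identity documents. Only ever
    # passed to 3rd party systems that insist on two fields.
    document_given_name: Optional[str] = None
    document_family_name: Optional[str] = None
    # Optional free-form text instead of an [M/F] radio button.
    gender: Optional[str] = None

    def greeting(self) -> str:
        # Show the name exactly as entered: no splitting, no re-casing.
        return f"Hello, {self.display_name}"

    def name_for_third_party(self) -> Tuple[str, str]:
        # Never guess a split of display_name; ask the user instead.
        if self.document_given_name is None or self.document_family_name is None:
            raise ValueError("user has not supplied a split-form document name")
        return (self.document_given_name, self.document_family_name)
```

For example, `UserProfile(user_id=1, display_name="Renée van Jacobsen").greeting()` gives back the name untouched, while calling `name_for_third_party()` on that same profile raises `ValueError` until the user fills in the document-name fields. Raising rather than splitting on the last space is deliberate: a wrong guess is exactly how "van" ends up in the wrong field.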