The Tech Faucet: 2010

Monday, November 1, 2010

Kogan KGN1080P32VAA - Avoid this TV and any other with a CultraView CV119MA mainboard

The Kogan KGN1080P32VAA TV is a lemon. Avoid it, and Kogan.

Support

Before I go on, I want to note that Kogan's support were responsive, if not helpful, about these issues. They acknowledged the EDID defect, and while they won't fix it or anything else that's wrong with the TV they did credit me $40 for my trouble, showing some sign they accept their error. Kogan have been much better than, say, Belkin when it comes to supporting their products. Given that they still haven't actually fixed anything, that says scary things about the industry, doesn't it? Anyway, here's why you should avoid this product, anything else by CultraView, and possibly anything else by Kogan:

720p EDID on a 1080p panel, mainboard programmed wrong

When I bought the KGN1080P32VAA I expected (and got) a 1080p TV with cheap and nasty LCD panel; this is, after all, a $600 1080p TV. Fine, you get what you pay for, and it's still a 1080p full HD panel.

What I didn't expect was an incorrectly programmed mainboard that sends the EDID for a 720p TV rather than a 1080p TV, making the TV not "full hd" at all (and in fact worse than native 720p due to scaling) for anything that respects the EDID! What does this say about quality control and product testing? I certainly didn't expect Kogan to have no interest in issuing a firmware update to fix it, given that the TV's mainboard, by physical examination determined to be the CultraView CV119MA is still in production and supports firmware updates via USB key. CultraView don't respond to queries and don't offer their tools or firmware for download without a username and password to their vendor/partner FTP site. Kogan say:

We have tried to get a firmware upgrade for you TV however we cannot get solution for this issue.

I would still suggest you to use the VGA as you get Full HD resolution on VGA without any issue.

Alas, there's only one VGA input and associated 3.5mm audio port. I didn't buy a TV with only one (working) input. Additionally, at 1920x1080 the (analog) VGA looks ... kind of nasty. This is partly explained by the picture issue described later.

HDMI audio and overscan

The problems with HDMI don't end with the EDID, though. The CultraView CV119MA, as shipped by Kogan, turns on overscan on digital inputs when it detects a HDMI audio signal. No HDMI audio, no overscan. HDMI audio, overscan. This means you can't get decent picture quality and HDMI audio at the same time - you have to endure the image butchery of overscan and overscan compensation, or have no audio on that input. Overscan makes no sense for digital inputs, where a perfect 1:1 pixel representation is sent and there are no scan line fringes, etc, to deal with. Sadly overscan does need to be an option because some older/broken digital output devices produce images with overscan for compatibility with ancient TVs, but should never be the default, and is something you must be able to disable. It certainly shouldn't be controlled by whether or not HDMI audio is being sent!

It's possible to disable overscan by disabling HDMI audio - if your device supports this. For my media PC, that requires telling the nVidia driver to completely ignore the EDID so it doesn't detect the HDMI audio capability of the output. I had to override the EDID anyway because the TV sends a totally bogus EDID with only 1080i and 720p resolutions over HDMI. There are a couple of ways to override the EDID though - I found that overriding just the resolutions left me with an overscanned 1080p image, while overriding the whole EDID got me a 1:1 pixel sharp-ish 1080p image. The difference: HDMI audio.

The TV doesn't offer a "1:1 pixel" mode in its zoom/aspect list, nor in its menus. The only way to control overscan is via HDMI audio. I'm gobsmacked.

Vendors unwilling to fix anything

Neither Kogan nor CultraView will supply a firmware update, or the tools to create one using the vendor customisation tools CultraView offer to vendors. Neither will supply the documentation for the mainboard, which has "PC DEBUG" (JTAG?) and "PC INPUT" (DB9 serial headers? Maybe?) on it and has updatable firmware. Examination of a firmware image I found on the 'net confirms that it's capable of being reflashed via USB.

Brightness controls pixel values not backlight

The problems don't end with a misprogrammed EDID and idiotic overscanning behaviour there, though. When you change the "brightness" setting in the TV, it doesn't adjust the backlight, it scales the pixel values down toward the black end, so you lose contrast and get horrid banding. While this can be written off as a "cheap TV" problem, it's one I never even imagined might be possible.

Horizontal blur at native 1080p

Worse: Even now that I've finally achieved a non-overscanned image at the panel's native 1080p resolution, the image is still crap. By creating a few stippled and striped b&w test patterns, I was able to determine that horizontal lines are sharp and clear - so there's no vertical scaling or distortion. An array of 1-pixel wide b&w vertical lines, though, are blurred and smudged when displayed on the TV, as if they've been scaled down then up, or up then down, before display.

The test image, unscaled, appears below. If your LCD display is set to native resolution and your browser isn't scaling the image, this image should appear as a block of fine vertical lines.

... and here's a photo of what you see on the Kogan TV. The connected machine was running Linux, but I see the same problem under Windows 7's (to use Media Center) over HDMI:

Here's an image I scaled down by 5%, then back up to original size, which demonstrates a similar effect:

Imagine what this does to text, typefaces, and most kinds of pattern. Argh!

It's just too crap

Before you say "It's a TV, not a computer monitor", let me quote from Kogan's product page for their new model 32" 1080p HD TV, the 1080P-BD32:

32"/ 81cm Full HD panel (1920x1080)
Full High Definition panel ensuring crystal clear TV with Progressive Scan up to 1080P. Doubles up as an excellent computer monitor.

While that text did not appear on the product page for my TV, it shows that Kogan, like me, think that it's reasonable to use modern HDTVs as monitors. In any case, I'm using mine for a lounge room media PC, not a regular PC, and just want it to work! I don't expect amazing quality, but actually displaying the pixels I send it and doing so with working HDMI audio surely isn't too much to ask?

After hours of work and several conversations with Kogan, I finally have the TV running at its rated 1920x1080 (1080p) resolution without horribly butchering the image with overscan. Of course, to get that far I had to override the EDID and disable HDMI audio. If there's no HDMI audio signal the TV plays the 3.5mm input on all HDMI channels - but that means you have only one input for all your channels, making the HDMI combined audio/video feature spectacularly useless, and rendering the TV effectively capable of only one distinct A+V input among from the 2 HDMI and 1 VGA ports. Needless to say, this sucks.

At this point the TV is annoying me enough that I'm planning on gutting my old laptop and using its 1080p capable LVDS output to drive the TV's panel directly, bypassing the TV mainboard. That way I'll be able to see how much of this is the panel and how much is that CultraView mainboard. If the panel is OK, I'll grab a little ARM or Via Nano board with a couple of HDMI inputs and a high-res LVDS output and build that into the case as an embedded media PC, getting rid of the CultraView part entirely.

Well, I've certainly learned one thing. I won't be buying from Kogan again unless I can test the product in person with my equipment before buying. They assemble products from 3rd party parts - which I knew. They don't push issues back to those suppliers and get fixes for them, which I didn't know and didn't expect. Poor show, Kogan.

Saturday, October 16, 2010

Are you a programmer+sysadmin+support person at a small company? You need to read this.

I'm a sysadmin, developer of in-house software, and end-user tech support person at a small company. I wear a lot of hats, and do everything from Java EE applications to data recovery from crashed user laptops. If you can't open that email attachment or get to that website, I'm your man.

Needless to say, this makes the programming side rather challenging. I've always thought of interruption as the enemy of effective programming, and I've observed my effectiveness falling as the rate of interruption increases.

This article gives me reason to reconsider that idea, and consider trying to plan for and work with interruption instead. I cannot recommend it enough.

http://www.stevestreeting.com/2010/09/04/work-2-0/

Friday, October 15, 2010

On SSDs

I suspect that some of the ideas behind the design of SSD drives as currently sold are rather flawed. Update: See the end of the article for an alternative viewpoint, though.

These drives are embedded computers, with their own CPUs, RAM, firmware, etc. They are much more complicated than they should be for a simple storage device, and are more like mini RAID controllers than they are like hard disks.

Notes on gdb use

I find myself looking up how to do certain things with gdb, because I use it rarely enough not to remember, but frequently enough for these to be somewhat annoying. This post notes how to handle things like signals and paging in gdb.

BoB (Belkin F1PI243EGau) DNS is still broken

Update Oct 19, 2010:Contacting Belkin sales and customer feedback and pointing out my increasing efforts to show how poorly they've performed in public finally got a response from Belkin. Admittedly the response is to say they've put the issue though to an "overseas engineer" ... but maybe something will happen. More likely, it'll stay trapped in another layer of disinterest and poor management, this time one I can't apply direct pressure to.

UPDATE Oct 6, 2010:An indirect approach, by bothering a friend who works at iiNet, got this issue through the support wall and to people who can deal with it. Belkin has been notified at a higher level too. It's a real shame that a clearly demonstrable issue like this got stuck behind support people at both companies, to the point where I had to bother a friend who shouldn't have to deal with this stuff just to get the issue looked into. Sometimes tech support acts as a barrier that prevents a company from finding out about real problems, an issue I've seen not only with iiNet and Belkin but with endless other companies. Anyway, hopefully Belkin will be getting onto this now.

A year ago, I reported to iiNet that AAAA lookups in the BoB (F1PI243EGau) DNS forwarder were always timing out, rather than returning SERVFAIL or correctly forwarding the query to the upstream server. This is the cause of the slow browsing issues reported for the BoB.

A couple of months ago I got hold of a pre-release firmware that fixed this, and about a month ago the firmware was finally put up for public use. This fixes the AAAA issues, so browsers like Firefox and Safari on IPv6-capable operating systems like Mac OS X and Windows 7 don't take an eternity to resolve every DNS query.

Unfortunately, Belkin didn't take this as a hint to properly test their DNS forwarder. They fixed AAAA lookup, but didn't fix it to return SERVFAIL when it encountered something it didn't understand, and failed to test other record types like TXT and SRV.

Sure enough, TXT and SRV lookup have the same problem. This is currently causing problems with Google Talk (using Pidgin) that require the configuration of a fallback connect server to bypass TXT record lookup.

Belkin support do not understand the problem. The low-level support folks at iiNet don't seem to get it either. Neither are passing the problem on to somebody with the experience and knowledge to understand the problem, and neither seem to have access to suitable hardware - or the inclination to use it if they do - to verify the issue.

Here's the explanation I sent to them.

Exception breakpoints

Do you have a StackOverflowError thrown deep inside some (probably 3rd party) code like EclipseLink's Criteria API? Are you trying to track down exactly where it's being thrown and under what circumstances? Is it driving you insane?

Use an exception breakpoint. In Netbeans: Debug -> New Breakpoint. Breakpoint type "Exception". Specify the exception class being thrown, constrain the throwing class if desired, and you're set.

This made my life a lot, lot, LOT easier recently. Thanks again NetBeans.

Sunday, August 29, 2010

Java EE apps as first-class citizens

Juliano Viana writes that he hopes to see Java EE applications promoted to first-class citizens in Java EE 8.

His comments are focused on protecting apps running in a container from each other, and the container from the apps. The primary focus of the post is on resource leaks. He's interested in making it possible to cleanly and totally unload an app, much as you can kill a process on a regular OS to completely clean it up.

This lead me to an interesting realization, though one I'm sure others have had before:

Java EE containers are in some ways a lot like MS-DOS, Mac OS 6/7/8/9, Symbian, etc ... applications share the same memory space and have access to each other's innards. There's polite agreement not to poke where they don't belong, but no strong enforcement of that. Without pointers and the resulting fun with corrupted stacks and dangling pointers leading apps to trample all over each other's memory the JVM and Java EE don't have it quite so bad - but there are still real issues with access control, resource ownership, etc.

Currently a Java EE app can delve into the innards of the container if it wants. It's not *meant* to, but it's not easily prevented and it makes the container's integrity less trustworthy. An app can break the container or cause it to misbehave. Expecting apps to be well-written is, as demonstrated by the OSes listed above, a great way to get a reputation for being unstable and generally crap. More importantly, as the referenced article notes, Java EE apps can cause the container to leak resources, or can disrupt resources the apps shares with other apps via the container, like pooled connections. That makes it hard to share one container among many different apps, let alone different users/customers.

This (and garbage collection + resource use issues with big JVMs) makes admins reluctant to have many apps hosted in a single container, so you land up doing all sorts of icky inter-JVM work to make your environment managable. Shared-hosting services with Java are expensive and uncommon due to related issues.

I certainly think that much stronger protection is needed to isolate the container's private core from apps, and apps from each other. I'm not at all convinced that trying to build that isolation entirely in software in the JVM is best, though. Modern successful OSes isolate apps using *hardware* features to do most of the grunt-work and protect them from implementation bugs; apps and the kernel are isolated from each other by default and can only interact by configured exception. Protected mode, separate logical address spaces, and the security ring system in x86 provide rather strong OS/app isolation. Trying to implement similarly capable isolation purely in software in the JVM would be bug-prone and hard to get right, kind of like the half-baked and buggy app isolation in Windows 3.x or even 9x.

Perhaps a more viable future approach would be to let multiple JVMs running on a host integrate more tightly via shared memory, high performance IPC, or whatever, so the container could run in its own JVM and each app could have a private JVM, with only those resources *explicitly* shared between them accessible across JVMs. That way, apps cleanup would be as simple as killing the app's jvm and letting the OS+hardware clean up the mess. The exposed surface for problems would be restricted to that part of the container that's explicitly accessible via the shared resources.

Or ... maybe the JVM will need more OS co-operation than that, so the JVM its self can use hardware features to isolate apps running within the JVM. I can't imagine OS designers wanting to let the JVM get its hooks into the kernel's memory mapping and process management, but with the advent of VMs and VM-friendly CPU extensions like VT-X and VT-IO I wonder if the JVM could use those instructions, like the kernel of a guest OS does in a VM, to isolate apps running within the JVM.

Much as I'd love the JVM to be able to isolate apps properly, in my admittedly pretty ignorant opinion I don't see it happening without some kind of major re-design. I imagine a half-assed solution that works about as well as Windows 9x did is possible with the current JVM, but giving the impression apps can be isolated and cleaned up without being able to really do a comprehensive job of it is IMO in many ways worse than the current situation. Without some really powerful, low level isolation features I don't see that happening. Look at how well "shared" OSes have typically achieved app isolation for examples of just how hard the problem is.

I find the idea of using hardware VT-X support to help the JVM isolate apps in an EE container quite intriguing. I wonder if it's something anyone's done any investigation of.

Thursday, August 26, 2010

Three hours in

I'm three hours or so into my real work, after being side-tracked isolating and reporting Glassfish and Mojarra bugs. I just found another one in Mojarra, albeit a trivial one.

(Edit) I'd written a little rant about the problem I was trying to solve, in the assumption that I'd hit another limitation/bug in JSF2. Well, this time I was just plain wrong; I'd used an incorrect test to draw an invalid conclusion and from there ran down the path - from my previous experience, admittedly - of assuming that this, too, was a tools issue.

The correct answer is here: To permit null/empty/none selection in a h:selectOneMenu driven by a f:selectItems , just add a f:selectItem with no value before the f:selectItems.

Wednesday, August 25, 2010

Converting a NetBeans-generated JSF2 project to use CDI

I started playing with Netbeans 6.9's JSF2 interface generation for database apps, and quickly noticed that (a) it uses JSF2 injection, not CDI and (b) it generates some verbose, ugly converters.

I decided to switch it over, as a bit of a learning experience. Here's what landed up being necessary.

Untangling Java EE 6 - a broad conceptual overview

WARNING

This document was written by someone who's only learning Java EE 6. On one hand, I remember the frustrations clearly and still encounter issues of understanding (and bugs) regularly, so I'm in a good position to think about how a newbie needs to have things explained. On the other hand, a bunch of this is probably wrong because my own understanding is still developing. Beware.

This is also a work in progress. It's been published for comment and review, and isn't a fully edited and checked final work.

If you're coming from Spring, rather than starting with Java EE cold, you should start here.

Java EE 6

Java EE 6 is not a product you can download and install. It is a specification that describes how implementing software should behave. It refers to many sub-specifications for particular features, each of which has one or more implementations in software. Some of these implementations can be used outside a full Java EE 6 environment as standalone products.

Java EE 6 doesn't "just work" in even trivial real-world uses

Lincoln Baxter III writes that with Java EE 6, It all “Just works”.

Personally, I have not found that to be the case in even simple, toy applications.

I suspect it'll be true in a few years, probably after a much needed "Java EE 6.2" and some major bug-fixing. Right now it all "just works" ... except when it doesn't, which is frequently, due to spec oversights, bugs, and a general lack of real-world use and testing.

Simple Java EE 6 project fails in different ways on Glassfish and JBoss AS 6

(Update: this issue was caused by a Glassfish bug. See Bug 13040 in the Glassfish tracker.)

I'm coming to really like Java EE 6 ... or at least, I would if anything worked properly. There seem to be an awful lot of quirks and bugs in even fairly basic functionality, though.

Take this simple project, where a @Stateless EJB that inherits from an abstract base is injected into a regular JSF2 backing bean. It's done two different ways - once using the new CDI/Weld injection facilities, and once with old-style JSF2 injection and EJB container management.

One works on glassfish, the other on JBoss. Neither works on both.

The glassfish failure is caused by a problem resolving methods inherited from superclasses in the Weld-generated local no-interface EJB proxies. The JBoss issue appears to be caused by issues with its local no-interface EJB support, too, though I'm not sure exactly what yet.

The point: Local no-interface EJBs were supposed to save time and pain for developers, yet all I'm finding so far is bugs, or at least quirks and inconsistent behaviour.

Sources to the sample can be found here and a deployable war is here.

Posted on StackOverflow and The Glassfish forums

Monday, August 16, 2010

Linux (ubuntu 10.04) ppp maxfail 0 and persist not working? (WORKAROUND)

I've been wondering for a while why, after an upgrade to Ubuntu 10.04, the ppp daemon on my router/server had stopped automatically reconnecting after it lost ADSL line sync or otherwise had its connection dropped.

The culprit turns out to be a helper script (/etc/init/network-interface.conf) in upstart that tries to handle hotplugging of network interfaces, automatically bringing them up when plugged in and down when unplugged.

Unfortunately, it views ppp interfaces as hot-plugged, and helpfully calls ifdown on them when they dissappear after a connection is lost. This kills pppd, preventing it from creating a new ppp interface and trying to connect again.

Unlike most interfaces, ppp interfaces are created and destroyed as a consequence of ifup and ifdown calls. Well, really pppd invocation, but that's usually done via ifup/ifdown these days. Calling ifdown when a ppp interface vanishes isn't always wrong, but it's wrong if the pppd is set to "maxfail 0 persist".

As the script isn't smart enough to know that, and I don't hotplug interfaces on my router anyway, I've opted to simply disable the script in question by renaming it to /etc/init/network-interface.conf.disabled. That'll be broken whenever the upstart package is updated, though, so a better solution is required.

Unconditionally ignoring all ppp interfaces in the script isn't necessarily right, as it would be nice to ifdown them cleanly when the pppd has exited and is thus no long retrying. It's hard to do that reliably as the interface is destroyed before pppd actually terminates, and it's hard to query pppd to find out if it plans to quit or retry. Arguably the retry logic should be moved out of pppd and into upstart or network-manager, but with the state of those tools at this point that's a recipe for pain and suffering.

In the mean time, you can work around it by disabling the problematic init script.

Sunday, August 15, 2010

simple script to extract Stanza epub ebooks from an iTunes backup

iTunes and the iPhone don't like to let go of data they have their claws into. Sometimes, however, you might need that data outside the Apple Garden, in which case you're going to have to get your hands dirty diving in iTunes backups, as they're the easiest way to regain control of your files.

(Tip: if you need to fight your software to access your own files, your platform is hostile. I don't use the iPhone myself and loathe the way Apple does things, but sometimes I have to work with it for other people. Hence this post.)

I needed to recover some Stanza ebooks from an iPhone. It's hard enough to get them *on* to the phone, and getting them off is nigh on impossible, as Apple continuously changes things to make it hard to access the phone via anything but iTunes. In this case, though, the latest change (to the backup format) made it easier to work with, not harder, so the extraction wasn't too hard.

I'm not smart enough to use Java EE

I don't seem to be smart enough to use Java EE.

Reading the Java EE tutorial certainly helps. It's very good at explaining a lot of the how, though not as much of the why; there isn't a strong overview of what EE is and how it fits together, but there's plenty on how the parts work and how to use them. Before sitting down for some quality reading time, I was totally lost. This is not something you can dive head-first into. After reading the tutorial I'm only very confused. [Edit 2012: The tutorial seems to have more overview info now, or I understood it better this time around. It's still hard to see what uses what, builds on what, and relates to what from the tutorial though.]

UPDATE 2011: I seem to be managing despite the learning curve and the quality challenges, and am trying to help others avoid the same suffering I went through by providing some overview level documentation on how it all fits together and what all the damn acronyms mean.

UPDATE mid-2012: It's two years since I wrote this, and sometimes I still struggle with the complexity of the Java EE stack. JSF2 in particular feels significantly over-engineered and doesn't have documentation to match its complexity; it feels as complicated as the rest of Java EE put together without close to as much useful documentation (and lots more bugs).

The remainder was written in August 2010, not long after I moved from Swing development over to Java EE 6 - unbeknownst to me, just as Java EE 6 implementations were in their infancy and not in any sense stable or production ready.

Problem 1: Acronym soup

Seam. Struts. Tapestry. Weave. Weld. JSF. JSF2. JPA. JPA2. EJB2. EJB3. CDI. Hibernate, EclipseLink, etc. J2EE. JARs in OSGi bundles in WARs in EARs. Glassfish. JBoss. Tomcat. Jetty. @Inject. @EJB. @Resource. @PersistenceContext. @javax.annotation.ManagedBean. @javax.faces.bean.ManagedBean. @Named. @Stateless. @Stateful. Spring. Servlet filters. @javax.faces.bean.RequestScoped, javax.enterprise.context.RequestScoped, @javax.faces.bean.SessionScoped, javax.enterprise.context.SessionsCoped, @javax.faces.bean.ApplicationScoped, javax.enterprise.context.ApplicationScoped, javax.enterprise.context.ConversationScoped, @javax.faces.bean.ViewScoped. @Singleton. DAOs and data access layers. JTA. JNDI. JMS. RMI. Deployment descriptors (web.xml, etc). Vendor deployment descriptors (sun-web.xml, sun-resources.xml, etc). Various config files, some of which have an effect even when empty (faces-config.xml, beans.xml). Maven plugins, oh god the maven plugins. RichFaces, MyFaces, IceFaces, PrimeFaces, SeamFaces.

Java EE makes me understand how "non computer people" feel when listening to programmers talk.

Problem 2: Bad or incomplete documentation, narrow-view docs without an overview

For a newbie, it's extremely hard to understand the layering, and made even more confusing by the mixing of specs (eg JAX-RS, JSF2) with implementations (eg Jackson or RestEASY, Mojarra or MyFaces). Each project's documentation talks largely about its self in isolation.

Most of the documentation, even the Java EE 6 tutorial, focuses on narrow parts of the spec without ever trying to help you understand how it all fits together. Some overview documentation that focuses on the basic concepts is sorely needed.

It doesn't help that the JSF2 spec is [wording edited] a very dense document that specifies but does not document or explain JSF2. The assumption seems to be that you'll get a book for JSF2[/end edit]. [Added 2012:] Many recommend Core JavaServerFaces over JavaServer Faces 2.0, The Complete Reference; I certainly found the former helpful.[/end added]

Problem 3: Legacy, duplication, and specs created in isolation

Part of the problem is that there are several generation of "technology" that are mashed together in an overlapping mess. For example, J2EE6 uses CDI with Weld, but everyone already does this with Spring, which also does all sorts of other things so you might want to still use it even though you're doing DI with Weld in a J2EE container. Your objects might be being managed by some combination of JSF2 @ManagedBean lifecycle management, @EJB lifecycle management, Weld, or (if you're using it) Spring.

Or does JSF2 actually use CDI to manage its beans via Weld? Who knows! (Update: the answer is "Maybe", btw. If you're using CDI, JSF2 beans are managed with CDI as implemented by Weld. Otherwise they're managed by JSF2 its self.) Every framework uses every other framework or has semi-transparent integration hooks for them - hooks that mostly work, but with some subtle bugs or limitations. Every framework claims to make everything simple, while adding yet another layer of nightmarish complexity. Specifications inevitably have a reference implementation from Sun/Oracle/JBoss, and an Apache implementation, so you have to figure out what the relationship between the implementation names and the spec name is for each, what they actually do, which implementation you might want to choose and why, etc.

When trying to learn about this stuff, it's hard to find materials that don't already assume you know the older technologies and just want to update to the newer ones. Hell, it's hard to figure out what the older ones and newer ones even are, or why you should want to use one over the other.

The J2EE platform releases are supposed to help with that, but seem to add complexity more than remove it, due to the need for backward compat, support for older frameworks and code that uses them, the messy overlap between different specs, the way different specs each added feature like injection in different and incompatible ways, etc.

Problem 4: Too much magic, not enough visibility when (not if) the magic breaks

There's so much transparent management of application state, object lifetimes, request handling, etc that it becomes quite hard to figure out what your app is actually doing, how and when to do something, or how to alter some part of the automatic behaviour when you need to. As for debugging... good luck following your app though a JSF2 request lifecycle.

Heaven save you if you forget to put an empty beans.xml magic marker file in a project. It'll be NullPointerException soup with absolutely NO indication that CDI is turned off. Obvious and simple with experience, hell when I first got started.

Struggling? You're not alone

On the other hand, it makes me feel better when really, really smart people point out that all this architecture makes it hard to see what you're actually trying to do.

... and all this means that paradoxically, even as we have higher and higher level programming tools with better and better abstractions, becoming a proficient programmer is getting harder and harder.

.... and really, reading the manual makes it a little more approachable. Entirely avoiding JSF2 in favour of something sane like Vaadin or even Play if you're so inclined will help even more.

Wednesday, August 4, 2010

Great service providers I use

A few of the service providers I use deserve some credit for great work. They are:

Linode.com, a great VPS host with excellent prices, good service, and really, really good tools for VPS management and control. You'll never need to call support because you broke SSH or installed a busted kernel ever again, since you have a web-based virtual serial console and full bootloader control over the web interface. Check them out, they're excellent.
SimpleCDN.com, whose mirroring CDN service "Mirror Buckets" is truly incredible. I put about fifty bucks into it six months ago, and still haven't run out. It's about two thousand times cheaper than my old host's traffic charges - though admittedly that was an Australian VPS host with typically Australian bandwidth-prices-of-doom.
Internode, who are the only Australian ISP at time of writing who're offering native IPv6 to regular ADSL customers. They're otherwise pretty decent within the limits of a cut-price cut-throat industry. Imperfect, but they're in telecommunications, and compared to the telco/ISP average they don't even rank on the suck chart.

Monday, August 2, 2010

All hail GNU Parted

A user bought in a dead laptop today. It was booting into Dell MediaDirect (an XP Embedded OS stored on the same disk as the main Windows XP install) instead of Windows XP, then bluescreening early in MediaDirect boot.

The user had accidentally hit the hardware button that boots the machine into MediaDirect instead of XP.

The Java needs language-level property support (and a reset)

I find myself increasingly bewildered by some of the design choices in the Java world. The core language's use of accessor methods is a notable example.

Not that properties are the only bewildering Java design choice.

Sanity in the J2EE / JSF world

I was very impressed with this tutorial from CoreServlets.com, which explains JSF and JSF2 development without assuming that you know all the Java server-side acronyms and mess already.

It also has a rational, calm discussion of web framework selection, covering Google Web Toolkit (GWT), Wicket/Tapestry, JSF2, etc that focuses on what each is good for, rather than trying to name some One True Framework. In my reading on the J2EE world thus far, this is unique.

Another article that anyone looking to get into JSF2 should read is this DZone article summarizing how to handle some common tasks in JSF2. It won't make much sense until you've read some basics about JSF, like the tutorial linked to at the start of this post, but it's a great overview and refresher once you have.

Of course, as others have said very well indeed, getting into the Java Enterprise programming world of JSF2 etc may be a mistake.

Sunday, July 25, 2010

A message for SCO OpenServer users

I've seen a few queries on the PostgreSQL list from people who want to run PostgreSQL on SCO OpenServer, or upgrade ancient versions of Pg to modern ones on SCO OpenServer boxes. Some want to solve issues with connection count limits on their elderly SCO installs, so they can increase client counts above 100 or so. Sometimes people even want to upgrade to a new (ie post-1995, not truly new) SCO OpenServer release and want to know how Pg will cope.

I have a message for those folks, and the management behind them who're usually the ones pushing to stay on SCO.

Your boss may not realize that SCO basically dropped OpenServer as a product line in favour of UnixWare in the late 90s. Since then there was no significant work done on OpenServer. There's been no work done on it at all (as far as I can tell) since Caldera bought the SCO name and OpenServer product from the original Santa Cruz Operation, fired all the software engineers, hired some lawyers and sued world+dog. The Santa Cruz Operation renamed themselves Tarantella after their primary profitable product and went on with life, but "SCO" as a company is history.

OpenServer is dead, dead, dead. Any money put into products targeting openserver is a sunk cost, and you can't change that, but you should really avoid sinking more money into that mess. If your management is still sticking to OpenServer, they should probably read about escalation of commitment, a decision making tendency that's very dangerous and very easy to fall into if you don't think about it carefully.

Upgrading from 5.0.5 / 5.0.7 to 6.0 is like upgrading from Windows 95 to Windows ME in 2010. Or Mac OS 7.1 to Mac OS 9.2. You're upgrading from the corpse of an operating system to one that's still twitching feebly. This is not going to be a good way to invest time and money.

In case you think I'm just a Linux zealot flag-waving, I have a SCO OpenServer 5.0.5 box in the back room, running business critical applications. The apps are actually for Microsoft Xenix (yes, 1983 binaries) running in the Xenix persionality on OpenServer. I considered a port to OpenServer 6.0, but realized it was just slightly delaying the inevitable move to something modern.

So ... I keep it running - in VMWare*, since 5.0.5 runs about ten times faster as a VMWare guest on a Linux host than it does natively on the same hardware. It's faster because SCO doesn't use much RAM for disk cache, doesn't readahead, and is generally just sloooooow in its disk access and memory use strategies. The Linux guest in a vmware setup can cache the whole SCO OS and apps disk in RAM, making the whole setup much faster. It seems more stable under VMWare than running natively on modern hardware, too.

I'd recommend you do much what I've done. Move your SCO instances to VMs running under Linux. Provide modern PostgreSQL on the Linux host, and just compile libpq for the SCO guest. Then start work on migrating your app to run natively on Linux/BSD/Solaris/whatever.

* SCO OpenServer doesn't seem to run under KVM or qemu due to bugs and limitations in their SCSI emulation. VMWare Server is free, so just use that until you're free of SCO.

Thursday, July 22, 2010

Java "Enterprise" tools: every problem half-solved with twice the required complexity

The more I work with Java, the more I feel like it's a great core language surrounded by endless onion-like layers of increasingly awful frameworks and tools.

The Netbeans platform. JPA2. Hibernate / OpenJPA / EclipseLink / Swing. EJB3. OSGi. With tools like those, you'd think that a simple task like writing a business database application would be easy, right?

Ha! You spend so much time working around problems created by these tools' solutions to other problems that you're lucky to get any actual work done on your app.

Java is like a variant of the game of Tetris in which none of the pieces can fill gaps created by the other pieces, so all you can do is pile them up endlessly. -- Steve Yegge (2007, Codes Worst Enemy)

Examples demonstrating the use of JPA, Hibernate, etc are full of assumptions and limitations that render them useful only for utterly trivial toy projects. For tools that're touted as making it easy to build "scalable" apps, I find this interesting. One Hibernate+JSF tutorial, for example, writes:

One of JCatalog's assumptions is that no more than 500 products are in the catalog. All product information can fit into the user session.

We're building an Enterprise Application with n-teir Enterprise Architecture ... that's limited to many fewer items tracked than there are lines of code in the project. Does this seem off to you? If it's so easy to build scalable data access with Hibernate, why isn't it done in demoes and tutorials? My own experience suggests that's because it's not actually at all easy, it's painful and complex if possible at all.

This post expresses much of what I'm trying to say below, and expresses it much, much better than I can.

JPA (and Hibernate/EclipseLink/OpenJPA/etc) in desktop apps is awful

Using the JPA API in desktop apps seems like a great idea. It provides a standard interface to Java ORM systems, letting you use annotations to map database entities to beans in a quick and convenient manner. Implementations may be plugged in as required, so you can switch from Hibernate to EclipseLink under the hood with little pain and fuss.

You'll find lots of examples and tutorials on the 'net that demonstrate how to use JPA to build simple "CRUD" apps with various frameworks, like the NetBeans platform, the (now defunct) Swing Application Framework, etc. They make it look like a quick, convenient and easy way to develop database access apps in Java.

There's just one wee problem... they don't work in the real world. With desktop apps, that is; I'm sure Hibernate and friends are great on the server in a 3-tier Enterprise Application with a suitable army of coders.

I'm on the verge of recommending that people by Mac laptops

.... solely because the amount of work required for Joe Average to take your average Windows laptop and strip the crapware out is too high.

How to make SCO OpenServer 5.0.5 printing work

SCO OpenServer's printing is more than a little bit broken. lpsched tends to stop responding, and lpd seems to decide it doesn't feel like processing any jobs after a while. Killing lpsched and lpd then re-starting lpsched seems to solve the issue ... for a while.
I eventually got sick of it, and replaced the SCO print system entirely. I would've liked to use CUPS, but getting even the CUPS client to build on SCO was going to be an exciting effort in porting. The fact that one of the SCO patches broke the C compiler didn't help any, either.
In the end I wrote a simple Python script to relay print data to a Python server running on a modern Linux box, where the print data is handed to CUPS for printing. Since replacing the print system on the SCO box I've not had to intervene to fix SCO printing even once.
On the chance that others are stuck running this elderly operating system for their own scary legacy apps, I'm publishing the scripts here as a base for others to adapt to their own uses.

Brain-dead Bob

iiNet's BoB (actually a Belkin F1PI243EGau) isn't very bright. For a highly capable device, "he" has some bad habits that just aren't being trained out.

UPDATE September 2010: Belkin/iiNet have finally released a firmware with a fix for the broken IPv6 lookup / DNS delay issue. See Firmware 1.2.37.86 in support answer id 2498. It's taken a year too long, but they got there in the end, and you don't have to ask for the Super Top Secret firmware from support anymore.

UPDATE2 October 2010: Belkin's firmware update fixed AAAA records, but SRV and TXT records are still broken. Morons. Same thing: TXT and SRV queries just time out. This is less than impressive QA, especially after being informed of the AAAA issue. One would think that Belkin would have a test suite to verify basic DNS support in their products that's run as part of the release process...

Core 2 Duo CPU whine - workaround for Linux

Do you use Linux on a laptop and fear you're going to be driven totally insane by the high-pitched whining noise (squeal/screech) coming from your Core 2 Duo CPU's badly designed cpu packaging and voltage regulation? Like proximity to your laptop gives you tinea?

Me too. A workaround that'll cost you a little battery life (but probably not more than a few tens of minutes out of four-plus hour runtimes) is to add processor.max_cstate=3 to your grub command line. This turns off the problematic C4 power state of deep CPU sleep at reduced voltage, but otherwise leaves power management unaffected.

( For Ubuntu users, dpkg-reconfigure grub2 will offer to let you edit the kernel command line. )

Thursday, May 20, 2010

Using PKCS#12 client certificates with PgJDBC

(This post is a reference copy of a mailing list post I made explaining how to use client certificates with PgJDBC if you wanted to be able to accept user-provided PKCS#12 files.)

Using IET (ISCSI Enterprise Target) for Windows Server Backup

Windows Server Backup in win2k8 server is fantastic - it's a consistent, snapshot-based automatic backup system capable of full disaster recovery / bare metal restore. I'm not a huge fan of much of the way the Windows servers work, but the backup setup is fantastic. With one wee flaw...

Manual backups may be made to a network share, or to a local volume then copied to a network share. Fuss free, but only with operator intervention.

Unfortunately, automatic scheduled backups require direct access to a drive, they won't work on a mounted NTFS volume or on a network share. This doesn't do me much good for disaster recovery, as even a USB2 or FireWire drive nearby has a good chance of being destroyed by anything that takes out my server. It rained (and hailed) in my server room last month, so I'm taking disaster recovery even more seriously, and a nearby HDD just isn't good enough.

I could run a FireWire 800 drive over cat5e to the near-site backup location, but that's surprisingly expensive to do, especially as I want redundant storage to protect against pesky HDD failures. I have a perfectly good Ethernet-connected Linux server with a 10TB RAID array running Bacula to back up everything else on thge network, and I'd prefer to just use it for Windows Server Backup too.

The solution: Win2k8 has a built-in iSCSI initiator. Simply turn the backup server into an iSCSI target, then use Windows 2008's built-in iSCSI initiator to connect to it so Windows Server Backup sees it as a local disk and can write backups to it. This turns out to be astonishingly easy, at least on an Ubuntu system.

Security notice

The following configuration does NOT authenticate the windows server to the iSCSI target via iSCSI mutual authentication, so it may be possible to trick the server into backing up onto a different server and "steal" the backup. It also passes the actual backup over the network in the clear, as it doesn't use IPSec. You may wish to address those limitations in your implementation.

It would be a very good idea to enable mutual authentication, but by time of writing I was unable to get it working. The win2k8 iSCSI initiator complained about secret length, even though the provided secret appeared to match its criteria and had been entered in the main part of the control panel where the mutual authentication secret is expected. Similarly, IPSec wouldn't be a bad idea to prevent your backups passing over the network in the clear.

Configuring the iSCSI target

First, install the ISCSI Enterprise Target software (IET):

apt-get install iscsitarget

Now provision a volume to export as a target. This may be a local raw disk or partition, a logical volume provided by LVM, or even a great honking file on one of your mounted file systems. I'm using LVM, so I'll just allocate a logical volume:

lvm lvcreate -n winimagebackup -L 300G backupvg

There is no need to format the volume; Windows does that. Just export it via iSCSI by adding a suitable target entry to /etc/ietd.conf (it might be /etc/iet/ietd.conf on your system):

 Target iqn.2010-01.localnet.backup:winimagebackup
        Lun 0 Path=/dev/backup/winimagebackup_iscsi,Type=blockio
        Alias winimagebackup
        IncomingUser iqn.1991-05.com.microsoft:winhostname xxxx

See the comments in the default ietd.conf and the ietd.conf man page for details on this. In brief:

Change "localnet.backup" to the reversed host and domain name of your target server's name (mine is called "backup.localnet").
Change "IncomingUser" to the user name you want the Win2k8 server to have to give to be permitted to connect, and "xxxx" to the password you wish to require. By default a 2k8 box will give the above user name, with "winhostname" replaced with the win2k8 box's hostname.
Set the path after "Path" to the location of your storage.
If you're using a file, you may need to specify "fileio" instead of "blockio" as the Type.

Restart ietd, and you're ready to connect the 2k8 box.

Connecting 2k8 to the iSCSI Target

Connecting to the target from win2k8 is similarly trivial. In the iSCSI Target control panel, in the "discovery" tab enter the dns name or ip of the target. Do not configure authentication (unless you've deviated from the ietd.confabove), just accept the dialog.

The server should appear in "target portals" and no error should be displayed. If successful, go to the "targets" tab, where you should see a target named "winimagebackup". Click "Log on..." to connect to it. Check the option to restore the connection at boot-time. Under Advanced, Configure CHAP authentication, using the password given in ietd.conf for IncomingUser under the target winimagebackup. Do not enable mutual authentication*. Accept the dialog, and the status of the volume should change to "connected".

Configuring Windows Server Backup

You're now ready to use Windows Server Backup with the volume. You do not need to format it under the disk mmc snapin before use. Just fire up Windows Server Backup and click "Backup Schedule", then follow the prompts, picking the iSCSI target as the backup storage when prompted.

Monday, May 3, 2010

Ubuntu Lucid and LTSP thin clients - wow

For a while I've been suspecting that most people who do any Linux-based GUI develpoment neglect remote X11 and thin client considerations. Performance has been decreasing slowly and painfully, and more and more workaround have been needed to get systems to behave.

Well, no longer. The new Ubuntu release, lucid, with the latest gtk etc absolutely screams along. I'm seriously astonished that remote X11 can be this fast, given the round-trip-happy nature of the toolkits and the protocol's flaws. It's a real pleasure to use.

To everyone involved in gtk development - thank you!. Doubly so to those who looked into my bug reports on specific areas where gtk used excessive network round trips, particularly the Evolution compose window bug.

By the way, for those of you who deploy LTSP, another thing that'll make a huge difference to performance is to make sure your LTSP clients are on a private VLAN and then enable direct X11 communication between client and server with LDM_DIRECTX = Y in lts.conf. With this option you still use ssh to login and establish a session (so there's no godawful XDMCP to fight), but only the ssh login is tunneled and encrypted. During session setup, DISPLAY is redirected to point directly to the client's listening X server. This offers a huge performance boost, especially to slower/older clients without onboard hardware crypto engines. (Oddly, the 600MHz Via C3 boxes outperform the Intel Core 2 boxes when all X11 comms are encrypted).

Friday, April 30, 2010

Splash screens are a plague upon Linux systems

How many splash screen tools can you remember appearing in at least one released version of at least one distro?

Too many, I bet, and all of them buggy.

In recent Ubuntu alone both usplash and plymouth have come to plague users. These tools purport to make bootup "friendly", but in fact:

Cover up important messages/warnings
Make recovering from issues during boot well-nigh impossible
Make boot-time fsck fragile and hard to interact with
Interact poorly with graphics drivers (xorg or kms)
Are much, much too hard to disable

Unfortunately, they don't even have a standard mechanism for disabling them. "nosplash" on the kernel command line used to work most of the time, but Plymouth displays a splash screen if the sequence "splash" appears anywhere in the kernel command line - including mid-word. It'll merrily accept "nosplash" as a request for a splash screen. With plymouth you must instead omit "splash" from the kernel command line entirely - and woe betide you if something else you need happens to include those characters!

Even better, Plymouth can't be uninstalled without ripping the guts out of an Ubuntu lucid system completely. It's wired deep into the package dependency system.

Argh. I'm coming to dread distro upgrades because I have to learn how to get rid of the blasted splash screen all over again. If only they'd stay uninstalled...

Saturday, April 24, 2010

Intel, where's my EFI Network/USB/FireWire Target Disk Mode?

I'm torn.

I really don't like working with Mac OS X (or Windows, for that matter - I'm a Linux user by preference) .... but I love some features of Apple's hardware. The build quality and disk bays of the Mac Pro, for example. But above all else, what I love and envy about macs is ... Target Disk Mode. It's a service tech and sysadmin's dream.

Intel likes to make a lot of fuss about all its fancy in-chipset managment features, yet it seems to lack that one most crucial and handy feature - a Target Disk Mode equivalent. C'mon Intel, you can do better than this! You can not only implement FireWire target disk, but Ethernet-based iSCSI Target Disk for true sysadmin heaven. For bonus points, add TPM support so only authorized service techs for the company can get in, and use the built-in network management features to let admins remote-reboot a machine into target mode.

Sadly, I suspect the reason we're not all using this is that the "good" (cough, cough) old PC BIOS is still malingering, and failing to decently give way to EFI/OpenFirmware/whatever like it should.

Friday, April 23, 2010

Use Linux software RAID? Schedule periodic RAID scrubbing or lose your data

This post is bought to you by the fun of unnecessary wasted time and work rebuilding a server after a double-disk RAID array failure. RAID scrubbing is essential - and is supported by Linux's software RAID, but not used without explicit user action.

Linux's `md' software RAID isn't unique in this, but as its use is so wide spread it's worth singling out. No matter what RAID card/driver/system you use, you need to do raid scrubbing. Sometimes RAID vendors call this a "patrol read", "verify" or "consistency check", but it's all generally the same thing.

Why you should scrub your RAID arrays regularly

I had a disk fail in the backup server (some time ago now). No hassle - replace it, trigger a rebuild, and off I go. Unfortunately, during the rebuild another disk was flagged as faulty, rendering the array useless as it had a half-rebuilt spare and a second failed drive.

You'd think the chances of this were pretty low, but the trouble is that the second failed drive will have developed just a couple of defective sectors (a SMART check confirms this) that weren't detected because those sectors weren't being read. Until the drive was sequentially read during the rebuild, that is.

To reduce the chance of this, you can periodically verify your arrays and if bad sectors are discovered, attempt to force them to be remapped (by rewriting them from redundant data) or failing that fail the drive. This will also detect any areas where different disks disagree on what the correct data is, helping you to catch corruption caused by failing hardware early.

Unfortunately, Linux's software RAID doesn't do this automatically.

A simple shell script like this, dropped in /etc/cron.weekly and flagged executable, will save you a LOT of hassle:

#!/bin/bash
for f in /sys/block/md? ; do 
    echo check > $f/md/sync_action
done

Make sure to TEST YOUR EMAIL NOTIFICATION from mdadm, too. If a drive fails and you never get notified, you're very likely to lose that data the moment anything else goes wrong.

Use S.M.A.R.T too

For extra protection from failure, install smartmontools and configure smartd to run regular "long" tests of all your RAID member disks, so you're more likely to discover failing disks early.

Unfortunately, many consumer-oriented disk firmwares lie to the host and try to cover up bad sectors and read errors - probably to reduce warranty costs. Manufacturer's disk tools tend to do the same thing. Some even seem to lie during S.M.A.R.T self-testing, re-allocating sectors as they find bad ones and then claiming that everything is fine. In fact, I've actually had a consumer SATA drive that can't even read sector 0 return PASSED when queried for a SMART general health check, though at least it failed an explicitly requested self-test.

My point is that SMART testing alone isn't sufficient to ensure your disks are trustworthy, you really need to use a redundant array with some kind of parity or data duplication and do proper RAID scrubbing. And, of course, good backups.

If you use "RAID friendly" disks (usually the same physical drive with honest firmware and a much bigger price tag) you shouldn't have as many issues with SMART self-tests.

Battery backed cache for Linux software raid (md / mdadm)?

Linux's software RAID implementation is absolutely wonderful. Sufficiently so that I no longer use hardware RAID controllers unless I need write caching for database workloads, in which case a battery backed cache is a necessity. I'm extremely thankful to those involved in its creation and maintenance.

Alas, when I do need write-through mode (write caching), I can't use mdadm software RAID. There's actually no technical reason hardware already on the market (like "RAM Drives") can't be used as write cache, it's just that the Linux `md' layer doesn't know how to do it.

I say "it's just that" in the same way that I "just" don't know how to fly a high performance fighter jet with my eyes closed. Being possible doesn't make it easy to implement or practical to implement in a safe, reliable and robust way.

This would be a really interesting project to tackle to bring software RAID truly on par with hardware RAID, but I can't help wondering if there's a reason nobody's already done it.

Are you wondering what write caching is, why write caching is so important for databases, or how on earth you can safely write-cache with software RAID? Read on...

LVM vs VSS - it's no contest

Richard Jones writes about Microsoft's Virtual Shadow Copy Service (VSS, not to be confused with Visual Source Safe), and laments the lack of any equivalent or good alternative on Linux servers.

I couldn't agree more, but there's more to it than what he focuses on. Application consistency is in fact the least of the problems.

APP CONSISTENCY

App consistency and pausing is missing in Linux, but is in principle not hard to introduce. It is something where D-BUS can be very useful, as it provides another piece of the puzzle in the form of inter-application and application<->system signaling. The trouble is that, like with most things in the Linux-related world, some kind of agreement needs to be reached by interested parties.

All it really takes for app consistency is two d-bus events:

"Prepare for snapshot - pause activity, fsync, etc."
"Snapshot taken/failed/cancelled, safe to resume"

... though there needs to be some aliveness-checking and time-limiting in place so that a broken application can't cause indefinite outages in other apps by not responding to a snapshot message.

Off the top of my head, products that'd benefit from this include Cyrus IMAPd, PostgreSQL, MySQL, any other database you care to name...

... IS ONLY A SMALL PART OF THE PUZZLE

As compared to VSS, using dm-snapshot (LVM snapshots) on Linux suffers from a number of significant deficiencies and problems:

Snapshots require you to use LVM. LVM doesn't support write barriers, so file systems must use much slower full flushes instead.
Snapshots are block-level not file-system-level, so the file system isn't aware of the snapshot being taken.
Because snapshots are block-level and not filesystem-aware, the snapshot must track even low-level file system activity like file defragmentation, erasing free space, etc. This means they grow fast and have a higher impact on system I/O load.
Accessing a snapshot requires mounting it manually to another location and faffing around with the changed paths. There's no way for an app to simply request that it sees a consistent point-in-time view of the FS instead of the current state, as in VSS. This is clumsy and painful especially for backups and the like.
The file system being mounted has to be able to cope with being mounted read-only in crashed state - it can't do journal replay/recovery etc. LVM doesn't even give the *filesystem* a chance to make its self consistent before the snapshot. Some file systems, like XFS, offer manual pause/resume commands that may be issued before and after a snapshot is taken to work around this.
Snapshots don't age out particularly gracefully. You have to guess how much space you'll need to store changes, as LVM isn't smart enough to just use free space in the VG until it runs out. The snapshot's change tracking space is reserved and can't be used for anything else (even another snapshot) until the snapshot is freed. If you over-estimate you can have fewer snapshots and need to keep more free space around. If you under-estimate your snapshot may die before you finish using it. Argh!

So: even with the ability to pause app activity, there's unfortunately a lot more to be done at the LVM level before anything even half as good as VSS is possible. LVM has been progressing little, if at all, for quite a few years now and some of these problems (such as the namespace issues and lack of FS integration) aren't even practical to solve with a block-driver level approach like LVM's.

At this point, one can only hope that BTRFS can do a better job, so that we can switch to runnning btrfs on mdadm raid volumes and breathe a sigh of relief.

The Australian Maginot Line - because it worked great last time

Peter Thrush of ICANN recently commented that the Australian Internet Filter proposal is akin to the Maginot Line of WWII French fame. We all know how well that worked.

This is a surprisingly good analogy. The Maginot line presumed that the attacker would do what was expected of them, and wouldn't take the defenses into consideration when planning what they were doing. In much the same way, the Australian internet filter presumes that if it blocks what people do now, they won't change their behavior to circumvent the blocking with trivially available tools and techniques like encryption, tunneling, outside proxies, etc.

We already know that's an invalid assumption - not only is it rather contrary to general human nature, but it's being seen over and over in China with the Great Firewall. This despite the fact that China's Great Firewall is much more restrictive than Australia's is ever likely to be even under the most moralistic, conservative, idiotic government. Let's not forget, also, that in China it can be unhealthy to circumvent blocks that prevent you from accessing or posting information that's not meant to get around ... something I don't see becoming the case here.

So - in much more hostile circumstances, people still just waltz through the Great Firewall. Heck, I've done it myself - I had a workmate in China who needed unfiltered access, and it was the work of a few seconds to help him set up an encrypted SSH tunnel to a proxy on work's servers from which he could get to whatever websites he liked and do so undetectably. It's not even possible to tell that the encrypted data is web browsing data rather than something else.

Once again, it's clear that the only way the internet filter can work is if it's a whitelist. If a site isn't approved, you can't access it. If a protocol can't be inspected and content-filtered, it's blocked. No encryption of any sort may be used. Even that's imperfect due to cracking of whitelisted sites and use of them for proxies, etc.

It's a dumb idea. Why are we still wasting time and taxpayer money on such blithering idiocy?

Scripts to add replaygain tags to a flac collection

If, like me, you like to be able to play your music without constantly having to lunge for the volume control, and you store the master copy of your (ripped) CD collection in FLAC format you might be interested in a quick way to (re)calculate all the ReplyGain tags on your music. I'll also go into how to automatically create smaller MP3 copies of the files with the replaygain bias "burned in" so stupid players still get the volume right.
Finally, there's another little script here that can be used to fix up issues like inconsistent artist naming. It only does "The X" -> "X, The" at the moment, but it's easy enough to extend.
Read on if you're happy with shell scripting (and a Linux or advanced Mac OS X user).

The Great Australian Firewall just won't work

The proposed Australian Internet censorship rules will not work. Like most such blacklist-based schemes without active human monitoring it can be trivially bypassed by anybody capable of following simple instructions. As such, all the people it's designed to stop (like kiddie porn scum) will ignore its supposed blocking effects completely. Meanwhile we'll all have to live with the performance degradation, reliability problems, and so on.

Reasons why Microsoft Access absolutely stinks as a DB UI RAD tool

When used as a UI builder & RAD tool for accessing a real database (in this case PostgreSQL), Microsoft Access absolutely stinks. It has some severe problems that require cumbersome workarounds. In fact, if you're vaguely competent with a programming language it's almost certainly better to just write the application in your preferred language and be done with it. Java + Swing + Hibernate, for example, would do the job very well, as would C++ & Qt .
I was forced to use Access in this most recent project, and thought I'd post a warning for others who might be considering it but have alternatives available.
Note that none of this applies if you use MS SQL Server as the backend since Access doesn't use ODBC linked tables for MS SQL Server, but rather its own custom interface that's much smarter and more capable.
Problems using Access with ODBC linked tables include:

It doesn't understand server-side generated synthetic primary keys. It can't just insert a record with a DEFAULT key and then retrieve the generated key through the ODBC driver. To use a PostgreSQL SERIAL column (a sequence) you must write an event procedure in VB that fires on the Form_BeforeInsert event. This procedure must issue a passthrough query to invoke nextval('seqname'), then store the value in the primary key field of the form. Hooray for a tool that does everything for you transparently, simply, and automatically.
When you write your passthrough query, you will notice when examining the server query logs that it fires twice. That's because, if the ReturnsRecords property is set on a query, Access appears to issue the query twice. Presumably it's using the first instance to enumerate the returned fields for type-checking purposes, as it does not do this when the ReturnsRecords property is false. In any case, this means that you can't get the return value of a stored procedure with side effects without invoking the procedure twice. So, for nextval, I've had to first call nextval('seqname') and discard the return value, then call currval('seqname') to get the return value. That's two round trips to the database that were completely unnecessary, and a total of four for a single INSERT.
Access has no concept of data dependencies between fields. If one field is (say) a combo box populated by a query that depends on the value of another field, you must use an event procedure in VB to trigger a refresh of the dependent field when the field it depends on is changed. This would seem to be one of the more obvious and basic things a RAD database tool would do for you, and I was stunned that it does not.
Access loves to retrieve huge amounts of unnecessary data. If you open a form that's bound to a linked table, it'll issue a query like SELECT pkey FROM table;, which will result in a sequential scan on the whole table for databases that can't return the data from the index (like Pg¹). Even for databases that can, they must still read and scan the whole index. That's also potentially a lot of network I/O. You can work around this with EXTREMELY careful use of filter rules, but you have to be quite paranoid to make sure it never sees the form/table without a filter rule.
Access cannot natively perform non-trivial queries server-side - you have to write them yourself with Visual Basic and use unbound forms, which isn't too dissimilar to how you'd be working with (say) Hibernate, just uglier. With a bound form and a linked table, Access will use a WHERE clause on the primary key when fetching a record, but that's about it. It can perform simple inner joins server side too, apparently, but I've hidden everything I can behind server-side updatable views so I haven't had to struggle with this aspect yet.
Access cannot handle NULL boolean values.. Use a smallint with a constraint on it instead.
Access needs some custom operators and casts defined to work with PostgreSQL.. It assumes that certain types may be cast to other types, compared to them, etc in ways that aren't allowed by default. Thankfully PostgreSQL supports user-defined operators and casts, so you can just add them. Here's one page with a useful set of operators and casts..
Access likes to test for nullity with the = operator.. This is just wrong - the result of NULL = NULL is NULL, not TRUE, since the two unknown values may or may not be equal (the result is also unknown). PostgreSQL has a hack parameter, transform_null_equals, that may be set to enable the incorrect behaviour that MS Access expects. Access MAY have been fixed in version 2007; I haven't had cause to test this particular quirk yet.

You can work around a lot with careful use of filters, lots of Visual Basic code, plenty of views, updateable views, and stored procedures, and by writing your stored procedures to have EITHER side effects OR a return value (never both). It's not much fun, though, and it's still far from efficient.
On the upside, with row versioning enabled in the PostgreSQL ODBC driver, Access uses the MVCC attributes of the tuples (xmin, etc) to do optimistic locking. This allows it to avoid long-running transactions and long-held locks during user interaction without having to mess with the schema to add a field for oplocking use. It'll even play well with the oplocking used by things like Hibernate, and will work properly with apps that use ordinary transactional locking (SELECT ... FOR UPDATE and so on) without the need for any sort of support triggers. It's very nice.
To use Access with PostgreSQL, you REALLY need to set these options in the ODBC DSN for the connection:

Row Versioning causes the driver to use only the primary key in WHERE clauses, and to use xmin & affected record count during updates to avoid conflicts.
True as -1
(unchecked) Bool as Char

¹ This isn't as stupid as it sounds. PostgreSQL uses an MVCC approach that allows for very efficient, highly concurrent inserts and selects. The price is that it must store tuple visibility information to know which transactions should see which tuples, since the table contains the data at many different points in time all at once. Visibility information isn't in the indexes (because it'd generate lots more write activity on the indexes, slowing down I/O) so once an index indicates a match on a tuple, it must be read from the heap to check visibility even if the index contains the actual data point we want. It's a performance trade-off that is not particularly great when you're doing a `count(*)' or `select pkey from table' but is wonderful on highly concurrent systems.

Nice work, NetworkManager

I'm yet to encounter a cellular modem that NetworkManager 0.7 (in Ubuntu 8.10 and 9.04-beta) doesn't automatically recognize without any user configuration, driver installation, or anything. Just plug it in (if not built in) and start using it.

Very nice work.

Dongles are EVIL

The device you see on the right is actually the devil. Or, at least, it's close enough if you are a system administrator.

It is a single piece of hardware that controls your access to business-critical programs. Lost the dongle? Whoops, no classified ads in the newspaper this week. Dongle broke? Ditto. Dongle fried by a computer malfunction or power fault? Ditto. Computer stolen? Ditto.

What's even more fun is that as computers move on and older interfaces become obsolete, it becomes hard to even find a computer you can plug the dongle in to. Most machines don't have parallel ports anymore, so parallel dongles like this one are a big problem. At least that can be worked around using USB adapters.

Of course, then you run into exciting issues like XP being unable to allow 16-bit code access to the parallel port. The program would work fine on XP, but for the stupid bloody dongle. So you're forced to maintain legacy hardware or waste time on complex emulation/virtualisation options just to get the program working, when it'd be just fine but for this dongle.

So, if you are ever offered software for any reason that requires a dongle, just say no.

Bought to you by the exciting battle to get an old and alas mission-critical win16 app to work under WinXP or even WINE.

Getting GNOME Evolution to offer a client certificate for IMAP SSL/TLS

GNOME Evolution isn't noted for its client certificate support. Entries in the bug tracker about it have rotted for years, and it has absolutely no acknowledged support whatsoever. Most other mail clients have had client cert support for years if not decades.

Unfortunately, Evolution is quite attractive in other ways - calendar integration, LDAP address books, etc. Unlike Thunderbird (especially when large images are involved) it also has acceptable performance over remote X11 connections.

So - I'd rather like to be able to use Evolution, but it's client support ... isn't.

It turns out, though, that Evolution uses the Network Security Services library from Netscape/Mozilla . It's used, among other things, for IMAP SSL/TLS support. This library does support client certificates; after all, Thunderbird and Firefox support client certificates and they do their crypto through NSS.

Is it not then possible to introduce a client certificate at the libnss level, so Evolution doesn't even know it's doing client certificate negotiation during its hand-off to NSS for SSL/TLS setup?

Why, yes, it is, and it takes one line of code in camel-tcp-stream-ssl.c to do it.

camel-tcp-stream-ssl.c:
- /*SSL_GetClientAuthDataHook (sslSocket, ssl_get_client_auth, (void *) certNickname);*/
+ SSL_GetClientAuthDataHook (ssl_fd, (SSLGetClientAuthData)&NSS_GetClientAuthData, NULL );

Because Evolution its self still has no idea about client certificates, if the server demands one and you don't have one installed you'll still get a useless error message instead of an appropraite prompt to install a client certificate. Just like Thunderbird and most other client-cert supporting apps. However, if you install a client cert by importing it into the Certificates section of the preferences, evolution (or more accurately libnss) will present it and use it when the server asks for it.

Update late 2009:

Committed in stable (gnome 2.28.1+) http://git.gnome.org/cgit/evolution-data-server/commit/?h=gnome-2-28&id=87238717ceb0a158a00c76fc07c6e27c769c2cf0
Committed in master (gnome 2.29.1+) http://git.gnome.org/cgit/evolution-data-server/commit/?id=429a106d101bf205ba0c8ee8f94a818327c2d736

Update mid 2010:

This code has now hit shipping Evolution versions in up-to-date distros like Ubuntu 10.04 and Fedora 13. I've tested it in Ubuntu 10.04 and verified that client cert support works now. Hooray!

Getting central certificate management working on modern Linux

Modern Linux (GNOME, anyway) systems actually have a central certificate store. It's a bit lacking in management UI so far, but it works, and you can use it instead of loading your PKCS#12 certificates into every app you use manually.

First, import your certificate into the GNOME keyring with:

gnome-keyring import /path/to/certificate.p12

Install the libnss3-tools package (containing modutil).

Now exit every application you can, particularly your browser and mail client. Kill evolution-data-server too.

Find all instances of the nss security module database on your homedir, and for each one (a) test to make sure it's not open and (b) install the gnome-keyring PKCS#11 provider in it. The following shell script snippet will do this for you. Just copy and paste it onto your command line:

for f in $(find . -maxdepth 5  -name secmod.db -type f  2>/dev/null ); do
  echo "Testing: `basename $f`"
  if fuser `dirname $f`/cert8.db >&/dev/null; then
    echo -n "In use by: "; fuser `dirname $f`/cert8.db; echo " - Skipping"
  else
    modutil -force -dbdir `dirname $f` -add GnomeKeyring \
            -libfile /usr/lib/gnome-keyring/gnome-keyring-pkcs11.so
  fi
done

Now all your NSS-based apps should know about gnome-keyring and use the gnome-keyring certificate store.

If you use Evolution and want client certificate support, patch evolution-data-server as per GNOME bug 270893 to enable that too. It'll use gnome-keyring automatically.

i1Pro and the Australia Tax

The X-Rite i1Pro is an important instrument for anyone doing serious computer colour work, particularly for print and pre-press. It's also incredibly pricey, especially if you happen to live in Australia.

There's this oddity known locally as the "Australia Tax". The Australian Tax Office may not know about it, but it appears that local distributors for international businesses are convinced that it exists, and that it's high. That's the only explanation I can find for some of the jaw-dropping price differences between US and Australian versions of the same products - most of which are made in China anyway.

I got the X-Rite i1Pro instrument cheap (ish) - at AU$1500 ex GST and shipping compared to the AU$1800 quoted price. X-Rite force you to buy through exclusive local dealerships that add a huge markup, so while the US price is US$995 for the same instrument (AU$1200 @ current rates) you can't just order from the US. They won't ship it to you. You can use a US remailing service but X-Rite won't register it and won't support it outside the US - and neither will the AU distributor. You can't get it recalibrated etc without a painful amount of effort.

The AU dealership tries to claim it "adds value" ... but they don't do local advanced tech support, don't have any techs or offices outside metro Sydney, ship the instruments off to the US (3-4 week round trip) for calibration, and don't even keep spares in stock. So what value, exactly, do they add, other than to the price tag?

X-Rite and their distributors are raking it in with this arrangement. Unfortunately, X-Rite have been buying out all their competitors (like GretagMacbeth) so they're the only game in town. Like Quark, they'll suffer for their customer-hostile attitude and parallel import restrictions eventually, but right now they're in the "raking in the dough" phase.

(Of course, literally three days after I bought the i1Pro, Graham Gill, who develops Argyll CMS, announced support for the much cheaper ColorMunki spectrophotometer ... but hey, the i1Pro is a much better instrument so no harm done.)

Client certificate WTF

Why does almost nobody bother to support X.509 client certificates properly? They're a weak, poorly implemented afterthought in many systems if they're supported at all.
(Note: Android info here may be dated, as I last tested with 2.0, and applies only to the stock Android distribution not apps or patched versions supplied by vendors.)

Microsoft Windows: Perfect support, though it requires PKCS#12 (.p12) format certificates. Most 3rd party apps (subversion, mozilla apps, etc) use own cert stores rather than the OS's for no good reason.
Mac OS X: Limited support in OS keychain. No support in OS services like Apple Mail (IMAP+TLS, IMAPs, SMTP+TLS or SMTPs), WebDAV over HTTPs, etc so I have no idea why they even bothered adding support to keychain. Some apps have their own support, eg Mozilla apps via NSS, but OS has none and most Apple apps have none. No 3rd party apps seem to look in the system keystore for certificates.
Linux systems: All major SSL/TLS libraries have support, but there's no system-wide or desktop-wide keystore or key management. Netscape security suite apps have good support but certificates must be installed individually in each app - even though the underlying libnss library supports a shared user-account-level security store. GnuTLS- and OpenSSL-using apps must implement their own certificate management but can support client certificates if they provide appropriate callbacks. As a result, support is very app dependent and often buggy or very clumsy. Real-world support is inconsistent - eg Subversion supports client certs, but many svn front ends don't handle the cert prompts; Thunderbird & Firefox support client certs via NSS; Evolution supports via NSS but has broken nss init code that breaks client certificates and is often built with GnuTLS instead of NSS anyway; etc. Overall it's painful but usually usable.
Symbian (Series 60) phones: Support is perfect in OS and apps. Very smooth.
Sony Ericsson phones: Seem to have no concept of client certificates, and treat request for client cert by remote mail server as an SSL/TLS error.
Windows Mobile phones: Basically perfect from all reports. Pity about the rest of the OS.
Apple iPhone: Decent client cert store support. Unclear how much access 3rd party apps have. It's used for safari. Reported broken in Mail (comment by Martin).
Android phones: are a near-total information void. Apparently it's just assumed you'll use Google's services, not (say) your own secure mail server with your work. Because, you know, who needs confidentiality anyway? If you download the SDK and phone emulator, you'll quickly find out that not only does the OS lack any way to import a client certificate or use one in negotiation, but it lacks any way to even import new CA certificates. That's stunningly, jaw-droppingly pathetic. Of course, this is a phone with a read-only IMAP client so it's not clear what, exactly, it's meant to do...

Sigh.

Update: It looks like there's half-baked and mostly user-inaccessible support for importing CA certificates in some flavours of Android. This app exploits the facility. Client certificates don't appear to be so blessed.

Google Android is not a smartphone OS

... it's a simple phone OS plus a web browser and some Google services. At least if the devkit phone simulator I used to see if this was something I might want to actually use is anything to go by.

It appears to lack some pretty fundamental things you'd expect from a smartphone.

Ability to browse and view local files on phone memory or SD card, eg open HTML files, PDFs, etc
An IMAP client that can delete messages, mark them as read on the server, etc
Any ability at all to support corporate private CAs (Certificate Authorities), since it can't import new X.509 CA certificates
Any X.509 client certificate support for secure mail and intranet access
Decent sync and backup facilities to a laptop/PC. Oh, wait, you only use Google services, right?

This is Google's fancy new phone platform? Call me again in a few years, once you've grown up a bit - right now, even the iPhone OS is a more solid choice.

The best eBook reader for Linux is currently....

Microsoft Reader run under WINE.

Sigh. Not only is it the best, it's practically the only one unless you're content with fixed-format PDF. Few eBooks are available in, or reasonably convertible to, HTML, and even if they were there aren't any HTML renderers that can do half-decent H&J. None at all can hyphenate even poorly, and justification support tends to be limited to clumsy expansion-only justification that is ugly and not very nice to read.

So, to get a decent result one would basically have to hand-convert a plain text or HTML format book (possibly after pdf-to-text conversion) to TeX and typeset it for a particular display. That's not exactly a nice, convenient way to sit down for a good read. Even then, unless you use pdftex and read with Adobe Reader it won't even remember your place!

By contrast the Microsoft Reader .lit format is fairly widespread, supports automatic and somewhat decent H&J (though nothing on TeX / InDesign ), remembers your place in each book in the library, tracks and manages the library without forcing a particular on-disk structure on it, supports easy drag-and-drop of a book onto the program even from Nautilus, etc. It's friggin' emulated* Windows software that hasn't been updated or improved since it was practically abandoned by MS in 2003 and it's still better than anything available natively for Linux.

The situation is just as dire for Linux-based ( and Symbian-based ) phones and tablets. Given the spread of Qt to more and more devices, as well as all major platforms, I'm increasingly tempted to start work on a Qt-based reader with decent H&J, library management, place tracking / bookmarking / margin notes, etc. But how can there not be something out there already? Am I just blind, or is there really a gaping hole this big in free software capabilities?

Any suggestions? Anyone interested in working on one?

* I know, I know; I just don't care that W.I.N.E. Kudos to the WINE folks for their amazing work.

Australian Bandwidth Pricing

I recently switched hosting of large files (12Mb or so PDFs) on my employer's website from the existing Australian host to one in the USA. Why? Because it's cheaper to send data from the USA to Australian users than it is to send it from within Australia..

About 100x cheaper, in this case, when comparing Anchor Networks per-Gb pricing to SimpleCDN's.

SimpleCDN doesn't have any Australian node(s). Data gets requested by their US nodes on first request, cached, and sent back to Australia. Yet they're incredibly, vastly cheaper than anything local I can find.

The root of the problem appears to be that Australian hosting providers charge all IP traffic as if it were to go via an international link. There's no provision made for peering or intra-national traffic at the majority of hosts. This may be an issue with the hosting provider its self, or it may be with their upstream bandwidth suppliers, but I don't care. Internet routing is designed to solve this sort of problem - thankyou BGP - and peering points exist for a reason.

It's actually way cheaper to store your data in the UK, Ireland, Germany, France, or pretty much anywhere except Australia even if your users are 100% Australian. Isn't that kind of sad?

Technical Support for Commercial / Proprietary Software

The Anchor Networks head sysadmin has an opinion on commercial support for software that's pretty similar to mine - it's garbage. Both of us have learned this from painful experience.

The post is well worth a read if you're in a sysadmin/tech line of work. It mirrors my experiences with several vendors very closely, except that this particular case doesn't include any inter-vendor buck-passing or blame games. There's a reason more and more of the systems at work run on software I have source code to and can rely on myself to maintain - because that way, things actually get fixed.

If you think Anchor's experience with dedicated commercial support organizations is bad, you should try contacting tech support for incredibly expensive commercial software you've licensed and asking them to support their product! I've had totally disinterested or completely useless support from vendors of ten thousand dollar software packages. After all, just because I paid for it doesn't mean I should expect it to work as advertised or expect them to be interested in fixing bugs, right?

Adobe, Quark, MYOB, Apple. This means you.

Anyway, the downside of doing all the support work in-house is that you need to have the skills to undestand and run the systems you use. You can't run a DNS server if you don't understand DNS, can't run a mail server if you don't understand IMAP,SMTP,POP3 and TLS, etc. However, given that vendor support seems to be totally useless except for problems a retarded monkey could figure out, it's beyond me how people with no understanding of the systems they work with ever get anything done, whether or not they're paying for support.

Maybe they don't? It'd explain a lot about many of the businesses I work with...

Monday, November 1, 2010

Support

720p EDID on a 1080p panel, mainboard programmed wrong

HDMI audio and overscan

Vendors unwilling to fix anything

Brightness controls pixel values not backlight

Horizontal blur at native 1080p

It's just too crap

Saturday, October 16, 2010

Friday, October 15, 2010

Monday, October 4, 2010

Wednesday, September 8, 2010

Sunday, August 29, 2010

Thursday, August 26, 2010

Wednesday, August 25, 2010

Tuesday, August 24, 2010

WARNING

Java EE 6

Sunday, August 22, 2010

Wednesday, August 18, 2010

Monday, August 16, 2010

Sunday, August 15, 2010

Monday, August 9, 2010

Problem 1: Acronym soup

Problem 2: Bad or incomplete documentation, narrow-view docs without an overview

Problem 3: Legacy, duplication, and specs created in isolation

Problem 4: Too much magic, not enough visibility when (not if) the magic breaks

Struggling? You're not alone

Wednesday, August 4, 2010

Monday, August 2, 2010

Sunday, August 1, 2010

Friday, July 30, 2010

Sunday, July 25, 2010

Thursday, July 22, 2010

Tuesday, July 20, 2010

Saturday, July 10, 2010

Tuesday, June 29, 2010

Wednesday, May 26, 2010

Thursday, May 20, 2010

Wednesday, May 5, 2010

Monday, May 3, 2010

Friday, April 30, 2010

Saturday, April 24, 2010

Friday, April 23, 2010

Friday, April 16, 2010