Tuesday, August 9, 2016

Gross overconfidence with public data

The Australian Bureau of Statistics (ABS) is showing all the signs of being grossly overconfident about every aspect of the 2016 Census, to a degree that borders on incompetence.

You've heard all about the data retention in broad terms, but what exactly does it mean? And why could it be bad? After all, the data is "anonymized", with personally identifiable information removed before it is shared, right? The original, non-anonymized records are encrypted and safe in the hands of ABS administration, so there's nothing to worry about.

Well, it's not that simple.

Let's talk about anonymization vs. aggregation, how de-anonymization works, and why the "statistical linkage key" is appallingly flawed.