Multicultural NSW
Data randomisation

The information presented in the tables is based on detailed tables produced by the Australian Bureau of Statistics at the Local Government Area level and, for suburbs and small areas, at the Statistical Area Level 1 (SA1) level in 2011 (the Census Collection District (CD) level for prior Census years).

The Australian Bureau of Statistics (ABS) randomises the information it provides in order to preserve confidentiality. Cell values are slightly adjusted so that no individual's personal details can be identified.

In data tables released prior to the 2006 Census, the ABS randomly adjusted small numbers (values of 1 or 2) to either 0 or 3. Because each table is randomly adjusted independently of the others, totals may differ slightly between tables covering the same population. The effect of randomisation increases when CDs are aggregated into suburbs and localities.
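The small-number adjustment described above can be illustrated with a minimal Python sketch. The ABS does not state here how it chose between 0 and 3, so the rule below (round up with probability value/3, which keeps the expected value unchanged) is an assumption, and the function names and CD counts are hypothetical.

```python
import random

def adjust_small_cell(value, rng):
    """Replace a value of 1 or 2 with 0 or 3; leave other values unchanged."""
    if value in (1, 2):
        # Assumption: round up to 3 with probability value/3, otherwise down to 0.
        return 3 if rng.random() < value / 3 else 0
    return value

def adjust_table(cells, rng):
    """Adjust every cell of a table independently."""
    return [adjust_small_cell(v, rng) for v in cells]

rng = random.Random(2011)  # fixed seed so the sketch is repeatable

# Hypothetical counts for one category across eight CDs in a suburb.
cd_counts = [1, 2, 0, 5, 2, 1, 14, 2]
adjusted = adjust_table(cd_counts, rng)

print("original:", cd_counts, "suburb total:", sum(cd_counts))
print("adjusted:", adjusted, "suburb total:", sum(adjusted))
```

Each adjusted small cell can be off by up to 2, so a suburb total built from many CDs accumulates those errors, which is why aggregation increases the effect.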

For the 2006 and 2011 Censuses, a different method called perturbation was used. Any figure in any table may be randomly adjusted by a small amount, which introduces small random errors. This method was introduced so that individuals could not be directly identified in the data, and so that "differencing" could not be used to derive individual characteristics. Differencing means obtaining two separate tables that differ only slightly and subtracting one from the other; the result may describe as little as one person. Perturbation makes this impossible.
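A differencing attack, and the reason random adjustment defeats it, can be sketched as follows. This is an illustration only: the counts are hypothetical, and the `perturb` function (independent noise of up to 2 in either direction) is an assumed stand-in, not the ABS perturbation algorithm, which also keeps totals consistent within a table.

```python
import random

# Hypothetical unadjusted counts of people with some characteristic.
whole_suburb = 48
suburb_minus_one_household = 47

# Without any adjustment, subtracting the two tables isolates one person.
true_difference = whole_suburb - suburb_minus_one_household

def perturb(count, rng, max_shift=2):
    """Assumed perturbation: shift a count by a small random integer, never below 0."""
    return max(0, count + rng.randint(-max_shift, max_shift))

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Repeat the subtraction on independently perturbed tables.
differences = {
    perturb(whole_suburb, rng) - perturb(suburb_minus_one_household, rng)
    for _ in range(1000)
}

print("true difference:", true_difference)
print("differences seen after perturbation:", sorted(differences))
```

Because the published difference can take several values unrelated to the true one, a reader cannot tell whether the gap between the two tables reflects a real person or only the random adjustment.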

Although the information value of the table as a whole is not impaired, care should be taken when interpreting very small numbers, since randomisation affects the relative size of small numbers far more than that of larger numbers. The randomisation methodology also ensures that values of 1 and 2 do not appear in tables.
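The point about relative size can be made concrete with a short calculation. The adjustment of 3 used below is a hypothetical figure chosen for illustration, not a documented ABS value.

```python
def relative_effect(count, adjustment=3):
    """Size of an adjustment as a fraction of the cell value (adjustment is hypothetical)."""
    return adjustment / count

for count in (3, 30, 300, 3000):
    print(f"cell value {count:>4}: adjustment is {relative_effect(count):.1%} of the value")
```

Under this assumption the same adjustment is 100.0% of a cell of 3 but only 0.1% of a cell of 3,000.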

No reliance should be placed on small cells, as they are affected by random adjustment and by respondent and processing errors.

Table totals and subtotals will be internally consistent, but discrepancies may be observed between tables that cross-tabulate the same population by different variables. While perturbation makes table totals appear inconsistent, for a population of any significant size (over 1,000) its effect is insignificant, and the result is still the best available socio-demographic data at the suburb level. This level of compromise should not prevent decision makers from making effective resource allocation and planning decisions.