A software package for the application of probabilistic anonymisation to sensitive individual-level data: a proof of principle with an example from the ALSPAC birth cohort study

Demetris Avraam; Andy Boyd; Harvey Goldstein; Paul Burton

doi:10.14301/llcs.v9i4.478

Authors

Demetris Avraam Institute of Health and Society, Newcastle University, Newcastle, UK, and Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK
Andy Boyd Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK
Harvey Goldstein Graduate School of Education, University of Bristol, Bristol, UK and Institute of Child Health, University College London, London, UK
Paul Burton Institute of Health and Society, Newcastle University, Newcastle, UK, and Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK

DOI:

https://doi.org/10.14301/llcs.v9i4.478

Keywords:

Probabilistic anonymisation, disclosure control, measurement error, h-rank index, ALSPAC

Abstract

Individual-level data require protection from unauthorised access to safeguard the confidentiality and security of sensitive information. Risks of disclosure are evaluated through privacy risk assessments and are controlled or minimised before data sharing and integration. The evolution from ‘Micro Data Laboratory’ traditions (i.e. access in controlled physical locations) to ‘Open Data’ (i.e. sharing individual-level data) drives the development of efficient anonymisation methods and protection controls. Effective anonymisation techniques should increase the uncertainty surrounding re-identification while retaining data utility, allowing informative data analysis. ‘Probabilistic anonymisation’ is one such technique, which alters the data by adding random noise. In this paper, we describe the implementation of one probabilistic anonymisation technique as operational software written in R, and we demonstrate its applicability by analysing asthma-related data from the ALSPAC cohort study. The software is designed for use by data managers and users, without requiring advanced statistical knowledge.

Published

2018-10-19

Issue

Vol. 9 No. 4 (2018): Longitudinal and Life Course Studies

Section

Papers

License

Authors who published with Longitudinal and Life Course Studies Volumes 1–9 agreed to the following terms:

1. Authors retain copyright and grant the Journal right of first publication with the work, simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.

2. Following first publication in this Journal, Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal, provided always that no charge is made for its use.

3. Authors are permitted and encouraged to post their work online (e.g. in institutional repositories or on their own website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.
