Empirical aspects of record linkage across multiple data sets using statistical linkage keys: The experience of the PIAC cohort study

Karmel, Rosemary, Anderson, Phil, Gibson, Diane, Peut, Ann, Duckett, Stephen and Wells, Yvonne (2010) Empirical aspects of record linkage across multiple data sets using statistical linkage keys: The experience of the PIAC cohort study. BMC Health Services Research, 10 41-1-41-13. doi:10.1186/1472-6963-10-41

Author Karmel, Rosemary
Anderson, Phil
Gibson, Diane
Peut, Ann
Duckett, Stephen
Wells, Yvonne
Title Empirical aspects of record linkage across multiple data sets using statistical linkage keys: The experience of the PIAC cohort study
Journal name BMC Health Services Research
ISSN 1472-6963
Publication date 2010-02
Sub-type Article (original research)
DOI 10.1186/1472-6963-10-41
Open Access Status DOI
Volume 10
Start page 41-1
End page 41-13
Total pages 13
Place of publication London, U.K.
Publisher BioMed Central
Collection year 2011
Language eng
Formatted abstract
Background. In Australia, many community service program data collections developed over the last decade, including several for aged care programs, contain a statistical linkage key (SLK) to enable derivation of client-level data. In addition, a common SLK is now used in many collections to facilitate the statistical examination of cross-program use. In 2005, the Pathways in Aged Care (PIAC) cohort study was funded to create a linked aged care database using the common SLK to enable analysis of pathways through aged care services. Linkage using an SLK is commonly deterministic. The purpose of this paper is to describe an extended deterministic record linkage strategy for situations where there is a general person identifier (e.g. an SLK) and several additional variables suitable for data linkage. This approach can allow for variation in client information recorded on different databases.
Methods. A stepwise deterministic record linkage algorithm was developed to link datasets using an SLK and several other variables. Three measures of likely match accuracy were used: the discriminating power of match key values, an estimated false match rate, and an estimated step-specific trade-off between true and false matches. The method was validated through examining link properties and clerical review of three samples of links.
Results. The deterministic algorithm resulted in up to an 11% increase in links compared with simple deterministic matching using an SLK. The links identified are of high quality: validation samples showed that less than 0.5% of links were false positives, and very few matches were made using non-unique match information (0.01%). There was a high degree of consistency in the characteristics of linked events.
Conclusions. The linkage strategy described in this paper has allowed the linking of multiple large aged care service datasets using a statistical linkage key while allowing for variation in its reporting. More widely, our deterministic algorithm, based on statistical properties of match keys, is a useful addition to the linker's toolkit. In particular, it may prove attractive when insufficient data are available for clerical review or follow-up, and the researcher has fewer options in relation to probabilistic linkage.
© 2010 Karmel et al; licensee BioMed Central Ltd.
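The stepwise deterministic strategy outlined in the Methods can be illustrated with a minimal sketch. This is not the PIAC algorithm itself. The field names (slk, dob, sex, postcode), the two match keys, the record values, and the rule of accepting only one-to-one matches within a step are all assumptions made for illustration. The three accuracy measures named in the abstract are not implemented here.

```python
from collections import defaultdict


def stepwise_link(records_a, records_b, match_keys):
    """Link two record lists in ordered passes.

    Each pass uses one match key (a tuple of field names, hypothetical here).
    A pair is linked only if the key value is unique among the still-unlinked
    records on both sides; linked records are removed before the next pass.
    Returns a list of (index_in_a, index_in_b, step_number) tuples.
    """
    links = []
    unlinked_a = dict(enumerate(records_a))
    unlinked_b = dict(enumerate(records_b))

    for step, fields in enumerate(match_keys, start=1):
        index_a = defaultdict(list)
        index_b = defaultdict(list)

        # Index the remaining records by their value on this step's match key,
        # skipping records with a missing value in any key field.
        for index, remaining in ((index_a, unlinked_a), (index_b, unlinked_b)):
            for rec_id, rec in remaining.items():
                value = tuple(rec.get(f) for f in fields)
                if None not in value:
                    index[value].append(rec_id)

        # Accept only one-to-one matches (an assumption of this sketch).
        for value, ids_a in index_a.items():
            ids_b = index_b.get(value, [])
            if len(ids_a) == 1 and len(ids_b) == 1:
                links.append((ids_a[0], ids_b[0], step))
                del unlinked_a[ids_a[0]]
                del unlinked_b[ids_b[0]]

    return links


# Made-up records; the "slk" strings are placeholders, not real linkage keys.
dataset_a = [
    {"slk": "S1", "dob": "1930-01-01", "sex": "F", "postcode": "2600"},
    {"slk": "S2", "dob": "1928-05-17", "sex": "M", "postcode": "3000"},
    {"slk": "S3", "dob": "1935-09-30", "sex": "F", "postcode": "4000"},
]
dataset_b = [
    {"slk": "S1", "dob": "1930-01-01", "sex": "F", "postcode": "2600"},
    {"slk": "S9", "dob": "1928-05-17", "sex": "M", "postcode": "3000"},
    {"slk": "S7", "dob": "1940-03-12", "sex": "M", "postcode": "5000"},
]

# Step 1 matches on the SLK alone; step 2 falls back to other variables.
links = stepwise_link(
    dataset_a,
    dataset_b,
    match_keys=[("slk",), ("dob", "sex", "postcode")],
)
```

In this toy example the first pair links on the SLK in step 1, while the second pair, whose SLK was recorded differently in the two datasets, links only in step 2 on the fallback key. Recording the step number with each link is one way to support step-specific checks of match quality.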
Keyword Pathways in aged care (PIAC)
Statistical linkage key (SLK)
General person identifier
Clerical review
Q-Index Code C1
Q-Index Status Confirmed Code
Institutional Status UQ
Additional Notes Article # 41

Document type: Journal Article
Collections: Faculty of Health and Behavioural Sciences -- Publications
Official 2011 Collection
School of Public Health Publications
Citation counts: Thomson Reuters Web of Science: cited 9 times
Scopus: cited 13 times
Created: Sun, 11 Apr 2010, 00:07:41 EST