Symbolic rule-based classification of lung cancer stages from free-text pathology reports

Nguyen, Anthony N., Lawley, Michael J., Hansen, David P., Bowman, Rayleen V., Clarke, Belinda E., Duhig, Edwina E. and Colquist, Shoni (2010) Symbolic rule-based classification of lung cancer stages from free-text pathology reports. Journal of the American Medical Informatics Association, 17 (4): 440-445. doi:10.1136/jamia.2010.003707


Author Nguyen, Anthony N.
Lawley, Michael J.
Hansen, David P.
Bowman, Rayleen V.
Clarke, Belinda E.
Duhig, Edwina E.
Colquist, Shoni
Title Symbolic rule-based classification of lung cancer stages from free-text pathology reports
Journal name Journal of the American Medical Informatics Association
ISSN 1067-5027
Publication date 2010-07
Sub-type Article (original research)
DOI 10.1136/jamia.2010.003707
Volume 17
Issue 4
Start page 440
End page 445
Total pages 6
Place of publication London, United Kingdom
Publisher BMJ Group
Collection year 2011
Language eng
Formatted abstract
Objective To automatically classify lung cancer tumor-node-metastasis (TNM) stages from free-text pathology reports using symbolic rule-based classification.
Design By exploiting report substructure and the symbolic manipulation of Systematized Nomenclature of Medicine-Clinical Terms (SNOMED CT) concepts in reports, statements in free text can be evaluated for relevance against the factors defined in the staging guidelines. Template-based post-coordinated SNOMED CT expressions were defined, populated with concepts found in the reports, and tested for subsumption by the staging factors. The subsumption results were then used to build rule logic, following the staging guidelines, that calculates the TNM stage.
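A minimal sketch of the subsumption-then-rules step described above follows. It is not the authors' implementation: the concept hierarchy, the attribute names (`morphology`, `finding site`), the focus concept, and the staging factors with their T categories are hypothetical, heavily simplified stand-ins rather than real SNOMED CT content or the full staging guideline, and a toy structural-subsumption check stands in for a proper description-logic classifier.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy is-a hierarchy (concept -> direct parents). These labels are
# illustrative stand-ins, NOT real SNOMED CT concepts or identifiers.
IS_A: Dict[str, Set[str]] = {
    "direct invasion": {"invasion"},
    "invasion": {"morphologic abnormality"},
    "visceral pleura": {"pleura"},
    "pleura": {"body structure"},
    "chest wall": {"body structure"},
    "mediastinum": {"body structure"},
}


def ancestors_or_self(concept: str) -> Set[str]:
    """Return the concept plus everything reachable through is-a links."""
    seen: Set[str] = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(IS_A.get(current, set()))
    return seen


def is_a(child: str, parent: str) -> bool:
    return parent in ancestors_or_self(child)


@dataclass(frozen=True)
class Expression:
    """Simplified post-coordinated expression: a focus concept plus
    (attribute, value) refinements filled in from a report template."""
    focus: str
    refinements: Tuple[Tuple[str, str], ...]


def subsumes(general: Expression, specific: Expression) -> bool:
    """Toy structural subsumption: the focus and every refinement value of
    `general` must subsume the corresponding parts of `specific`."""
    if not is_a(specific.focus, general.focus):
        return False
    values = dict(specific.refinements)  # assumes each attribute appears once
    return all(attr in values and is_a(values[attr], value)
               for attr, value in general.refinements)


def invasion_factor(site: str) -> Expression:
    """Hypothetical staging-factor template: tumour invading a given site."""
    return Expression("tumour finding",
                      (("morphology", "invasion"), ("finding site", site)))


# Hypothetical, simplified invasion-based T factors; real staging rules also
# use tumour size and many more factors.
T_FACTORS: List[Tuple[str, Expression]] = [
    ("T2", invasion_factor("visceral pleura")),
    ("T3", invasion_factor("chest wall")),
    ("T4", invasion_factor("mediastinum")),
]


def t_stage(report_exprs: List[Expression]) -> Optional[str]:
    """Return the highest T category whose factor subsumes any report
    expression, or None when no factor matches."""
    matched = [t for t, factor in T_FACTORS
               for expr in report_exprs if subsumes(factor, expr)]
    return max(matched) if matched else None  # "T2" < "T3" < "T4" as strings
```

For example, a report expression with focus `tumour finding`, morphology `direct invasion` and finding site `visceral pleura` is subsumed by the T2 factor (because `direct invasion` is-a `invasion` in the toy hierarchy), so `t_stage` returns `"T2"`. Taking the highest matched category mirrors the idea that the most advanced factor found in a report determines the stage.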
Measurements Accuracy and confusion matrices were used to evaluate the TNM stages assigned by the symbolic rule-based system. The system was evaluated against a database of multidisciplinary team staging decisions and compared with a machine learning-based text classification system using support vector machines.
Results Overall accuracy on a corpus of pathology reports for 718 lung cancer patients, measured against a database of pathological TNM staging decisions, was 72%, 78%, and 94% for T, N, and M staging, respectively. The system's performance was also comparable to that of the support vector machine classification approaches.
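The two evaluation measures named under Measurements are standard; the sketch below shows one way they could be computed for a single staging axis (for example T), with the multidisciplinary team decisions playing the role of the reference labels. The function names are hypothetical, and the stage labels in the usage are made-up toy values, not data or results from the study.

```python
from collections import Counter
from typing import Dict, List, Tuple


def confusion_matrix(gold: List[str], predicted: List[str]) -> Dict[Tuple[str, str], int]:
    """Count (reference stage, predicted stage) pairs; off-diagonal cells are errors."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    return dict(Counter(zip(gold, predicted)))


def accuracy(gold: List[str], predicted: List[str]) -> float:
    """Fraction of reports whose predicted stage equals the reference stage."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("need two non-empty lists of equal length")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def print_matrix(cm: Dict[Tuple[str, str], int]) -> None:
    """Print rows as reference stages and columns as predicted stages."""
    labels = sorted({label for pair in cm for label in pair})
    print("gold\\pred " + " ".join(f"{label:>4}" for label in labels))
    for g in labels:
        print(f"{g:>9} " + " ".join(f"{cm.get((g, p), 0):>4}" for p in labels))


if __name__ == "__main__":
    gold = ["T1", "T2", "T2", "T3"]       # toy reference stages
    predicted = ["T1", "T2", "T3", "T3"]  # toy system output
    print(accuracy(gold, predicted))      # 0.75
    print_matrix(confusion_matrix(gold, predicted))
```

Accuracy summarises performance in one number, while the confusion matrix shows which stages are confused with which (here, one T2 report predicted as T3).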
Conclusion A system to classify lung cancer TNM stages from free-text pathology reports was developed, and it was verified that the symbolic rule-based approach using SNOMED CT can be used to extract key lung cancer characteristics from free-text reports. Future work will investigate whether the proposed methodology can be applied to extracting other cancer characteristics and other cancer types.
Keyword Cancer classification
Pathology reports
Lung cancer
Q-Index Code C1
Q-Index Status Confirmed Code
Institutional Status UQ

Document type: Journal Article
Collections: Official 2011 Collection
School of Medicine Publications
Citation counts: Cited 16 times in Thomson Reuters Web of Science; cited 32 times in Scopus
Created: Tue, 15 Mar 2011, 08:31:31 EST by Lisa Hennell on behalf of Royal Brisbane Clinical School