Efficient discovery of risk patterns in medical data

Li, Jiuyong, Fu, Ada Wai-chee and Fahey, Pau (2009) Efficient discovery of risk patterns in medical data. Artificial Intelligence in Medicine, 45(1): 77-89. doi:10.1016/j.artmed.2008.07.008


Author Li, Jiuyong
Fu, Ada Wai-chee
Fahey, Pau
Title Efficient discovery of risk patterns in medical data
Journal name Artificial Intelligence in Medicine
ISSN 0933-3657; 1873-2860
Publication date 2009-01
Year available 2009
Sub-type Article (original research)
DOI 10.1016/j.artmed.2008.07.008
Volume 45
Issue 1
Start page 77
End page 89
Total pages 13
Editor K.P. Adlassnig
Place of publication Amsterdam, Netherlands
Publisher Elsevier
Collection year 2010
Language eng
Subject 119999 Medical and Health Sciences not elsewhere classified
929999 Health not elsewhere classified
111706 Epidemiology
Formatted abstract Objective.
This paper studies the problem of efficiently discovering risk patterns in medical data. Risk patterns are defined by a statistical metric, relative risk, which has been widely used in epidemiological research.
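
For readers unfamiliar with the metric, relative risk is conventionally the risk of the outcome among records that match a pattern divided by the risk among records that do not. The sketch below illustrates that standard epidemiological definition only; the function name, argument names, and the handling of a zero denominator are assumptions made for this example and are not taken from the paper.

```python
import math


def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk among records matching a pattern divided by risk among the rest.

    Argument names are illustrative; "exposed" means "matches the pattern".
    """
    if exposed_total <= 0 or unexposed_total <= 0:
        raise ValueError("both groups must contain at least one record")
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    if risk_unexposed == 0:
        # Assumption: report infinity when only the exposed group has cases,
        # and NaN when neither group has any.
        return math.inf if risk_exposed > 0 else math.nan
    return risk_exposed / risk_unexposed


# Made-up counts: 40 of 80 matching records have the outcome (risk 0.5),
# against 10 of 80 non-matching records (risk 0.125).
rr_example = relative_risk(40, 80, 10, 80)
```

A value above 1 indicates that records matching the pattern carry a higher risk of the outcome than the remaining records.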

Methods.
To avoid fruitless search in the complete exploration of risk patterns, we define an optimal risk pattern set that excludes superfluous patterns, i.e. complicated patterns with lower relative risk than their corresponding simpler-form patterns. We prove that mining optimal risk pattern sets conforms to an anti-monotone property that supports an efficient mining algorithm. We propose an efficient algorithm for mining optimal risk pattern sets based on this property. We also propose a hierarchical structure to present discovered patterns for easy perusal by domain experts.
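
The idea of a superfluous pattern can be illustrated with a brute-force filter over patterns that have already been scored. This is a minimal sketch, not the paper's algorithm, which prunes during the search using the anti-monotone property. The representation of a pattern as a frozenset of (attribute, value) pairs, the function name, the toy values, and the choice to keep ties are all assumptions made for this example.

```python
def drop_superfluous(patterns):
    """Keep patterns whose relative risk is not lower than any simpler form.

    `patterns` maps a frozenset of (attribute, value) pairs to a relative
    risk. A pattern is treated as superfluous when some proper subset of it
    (a simpler pattern) has a strictly higher relative risk.
    """
    kept = {}
    for items, rr in patterns.items():
        superfluous = any(
            other < items and other_rr > rr  # `<` is proper subset on frozensets
            for other, other_rr in patterns.items()
        )
        if not superfluous:
            kept[items] = rr
    return kept


# Hypothetical scored patterns, for illustration only.
smoker = frozenset({("smoker", "yes")})
smoker_male = frozenset({("smoker", "yes"), ("sex", "male")})
smoker_older = frozenset({("smoker", "yes"), ("age", "over 60")})

scored = {smoker: 3.0, smoker_male: 2.5, smoker_older: 4.2}
optimal = drop_superfluous(scored)
```

Here the pattern combining smoker and male is dropped because the simpler smoker pattern already has a higher relative risk, while the pattern combining smoker and age over 60 is kept because it raises the relative risk.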

Results.
The proposed approach is compared with two well-known rule discovery methods, decision tree and association rule mining, on benchmark data sets, and is applied to a real-world application. The proposed method discovers more and better-quality risk patterns than a decision tree approach. The decision tree method is not designed for such applications and is inadequate for pattern exploration. Unlike an association rule mining approach, the proposed method does not discover a large number of uninteresting superfluous patterns. The proposed method is also more efficient than an association rule mining method. A real-world case study shows that the method reveals some interesting risk patterns to medical practitioners.

Conclusion.
The proposed method is an efficient approach to exploring risk patterns. It quickly identifies cohorts of patients that are vulnerable to a risk outcome from a large data set. The proposed method is useful for exploratory studies of large medical data sets to generate and refine hypotheses. The method is also useful for designing medical surveillance systems.


Keyword Relative risk
Risk pattern
Data mining
Association rule
Decision tree
Epidemiology
Q-Index Code C1
Q-Index Status Provisional Code
Institutional Status Non-UQ

Document type: Journal Article
Collections: ERA 2012 Admin Only
School of Nursing and Midwifery Publications
 
Citation counts: Thomson Reuters Web of Science: cited 12 times
Scopus: cited 21 times
Access Statistics: 129 Abstract Views
Created: Tue, 09 Feb 2010, 13:56:21 EST by Vicki Percival on behalf of School of Nursing and Midwifery