A method for evaluating quality of clustering DNA fragments encoded in different nucleotide frequencies

Chan, Chon-Kit Kenneth, Hsu, Arthur L., Tang, Sen-Lin and Halgamuge, Saman K. (2007). A method for evaluating quality of clustering DNA fragments encoded in different nucleotide frequencies. In: Daniel Howard, Phill Kyu Rhee, Saman Halgamuge, Seong-Joon Yoo and Bob Werner, Proceedings of the Frontiers in the Convergence of Bioscience and Information Technologies FBIT 2007 International Conference. FBIT 2007: Frontiers in the Convergence of Bioscience and Information Technologies 2007, Jeju Island, Korea, (60-63). 11-13 October 2007.


Author Chan, Chon-Kit Kenneth
Hsu, Arthur L.
Tang, Sen-Lin
Halgamuge, Saman K.
Title of paper A method for evaluating quality of clustering DNA fragments encoded in different nucleotide frequencies
Conference name FBIT 2007: Frontiers in the Convergence of Bioscience and Information Technologies 2007
Conference location Jeju Island, Korea
Conference dates 11-13 October 2007
Proceedings title Proceedings of the Frontiers in the Convergence of Bioscience and Information Technologies FBIT 2007 International Conference
Journal name Proceedings of the Frontiers in the Convergence of Bioscience and Information Technologies
Place of Publication Los Alamitos, CA, U.S.A.
Publisher The Institute of Electrical and Electronics Engineers (IEEE) Computer Society
Publication Year 2007
Sub-type Fully published paper
DOI 10.1109/FBIT.2007.70
ISBN 9780769529998
0769529992
Editor Daniel Howard
Phill Kyu Rhee
Saman Halgamuge
Seong-Joon Yoo
Bob Werner
Start page 60
End page 63
Total pages 4
Language eng
Formatted Abstract/Summary The whole-genome shotgun sequencing technique has been successfully applied to environmental genomes. However, a considerable number of DNA sequences and small contigs generally remain unassembled after shotgun sequencing. Binning is the step of grouping these sequences based on biological and molecular features. The combination of oligonucleotide frequency and the Self-Organising Map (SOM) clustering algorithm shows high potential as a compositional binning tool. As previous work did not provide methods for assessing results, we propose a systematic quantitative method to evaluate the clustering results specifically for this type of application. We used this method to investigate the suitability of di-, tri-, tetra- and pentanucleotide frequencies as training features for this binning technique. The results show that dinucleotide frequency is unable to bin Wkb DNA sequence fragments into well-clustered species groups. Furthermore, we noticed that increasing the order of oligonucleotide frequency may degrade the assignment of DNA sequences to classes in our test, which indicates the possible existence of an optimal species-specific oligonucleotide frequency. The results suggest that combining trinucleotide frequency with SOM as a binning process gives sufficiently good clustering quality in this case.
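The oligonucleotide frequency encoding named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the function name `oligo_frequencies`, the lexicographic ACGT ordering, the normalisation by total count, and the skipping of windows containing ambiguous bases are all assumptions made for illustration, and the authors' actual preprocessing may differ.

```python
from itertools import product


def oligo_frequencies(seq, k):
    """Return the normalised k-mer frequency vector of a DNA fragment.

    Hypothetical helper, not the paper's implementation.
    k = 2, 3, 4, 5 gives di-, tri-, tetra- and pentanucleotide frequencies.
    """
    seq = seq.upper()
    # All 4**k possible k-mers in lexicographic order over A, C, G, T.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    # Slide a window of length k over the fragment, one base at a time.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # assumption: skip windows with ambiguous bases such as N
            counts[window] += 1
    total = sum(counts.values())
    # Normalise so that fragments of different lengths are comparable.
    return [counts[m] / total if total else 0.0 for m in kmers]


if __name__ == "__main__":
    fragment = "ACGTACGTTGCA"  # toy fragment, far shorter than real binning input
    for k in (2, 3, 4, 5):
        print(k, len(oligo_frequencies(fragment, k)))
```

Each fragment becomes a vector of length 4^k (16, 64, 256 or 1024 entries), which would then serve as the training feature for a clustering algorithm such as SOM. The rapid growth in dimensionality with k is one plausible reason a higher order need not give better clusters, though the abstract itself does not state the cause.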
Q-Index Code E1
Q-Index Status Provisional Code
Institutional Status Non-UQ

Document type: Conference Paper
Collection: School of Agriculture and Food Sciences
Created: Tue, 22 Mar 2011, 23:18:02 EST by Dr Chon-Kit Kenneth Chan on behalf of School of Agriculture and Food Sciences