Integration of Open Access Literature into the Protein Data Bank Using BioLit

Prlic, Andreas, Martinez, Marco A., Dimitropoulos, Dimitris, Beran, Bojan, Yukich, Benjamin T., Rose, Peter W., Bourne, Philip E. and Fink, J. Lynn (2010) Integration of Open Access Literature into the Protein Data Bank Using BioLit. BMC Bioinformatics, 11. doi:10.1186/1471-2105-11-220

Author Prlic, Andreas
Martinez, Marco A.
Dimitropoulos, Dimitris
Beran, Bojan
Yukich, Benjamin T.
Rose, Peter W.
Bourne, Philip E.
Fink, J. Lynn
Title Integration of Open Access Literature into the Protein Data Bank Using BioLit
Journal name BMC Bioinformatics
ISSN 1471-2105
Publication date 2010-04-29
Sub-type Article (original research)
DOI 10.1186/1471-2105-11-220
Open Access Status DOI
Volume 11
Total pages 5
Place of publication London, United Kingdom
Publisher BioMed Central
Collection year 2015
Language eng
Formatted abstract
Background: Biological data have traditionally been stored and made publicly available through a variety of online
databases, whereas biological knowledge has traditionally been found in the printed literature. With journals now online
and providing an increasing amount of open access content, often free of copyright restriction, this distinction
between database and literature is blurring. To exploit this opportunity we present the integration of open access
literature with the RCSB Protein Data Bank (PDB).

Results: BioLit provides an enhanced view of articles with markup of semantic data and links to biological databases,
based on the content of the article. For example, words matching terms in existing biological ontologies are highlighted and
database identifiers are linked to their database of origin. Among other functions, it identifies PDB IDs that are
mentioned in the open access literature, by parsing the full text for all research articles in PubMed Central (PMC) and
exposing the results as simple XML Web Services. Here, we integrate BioLit results with the RCSB PDB website by using
these services to find PDB IDs that are mentioned in research articles and subsequently retrieving abstract, figures, and
text excerpts for those articles. A new RCSB PDB literature view permits browsing through the figures and abstracts of
the articles that mention a given structure. The BioLit Web Services that provide the underlying data are publicly
accessible, and a Java client library is provided that supports querying them.
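
As a rough illustration of the ID-detection step described above, the following Java sketch scans a text excerpt for tokens shaped like classic four-character PDB IDs (a digit from 1 to 9 followed by three alphanumeric characters). It is not BioLit's actual parser, which this record does not describe. The class name `PdbIdFinder`, the rule that a candidate must contain at least one letter, and the 40-character look-back for the word "PDB" are all hypothetical heuristics chosen to reduce false positives such as years.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PdbIdFinder {
    // Classic PDB IDs: one digit 1-9 followed by three alphanumeric characters.
    private static final Pattern CANDIDATE =
            Pattern.compile("\\b[1-9][A-Za-z0-9]{3}\\b");

    // Hypothetical heuristic: how many characters before a candidate
    // are searched for the word "PDB".
    private static final int CONTEXT_WINDOW = 40;

    /** Returns candidate PDB IDs (upper-cased) in order of first appearance. */
    public static Set<String> findPdbIds(String text) {
        Set<String> ids = new LinkedHashSet<>();
        Matcher m = CANDIDATE.matcher(text);
        while (m.find()) {
            String token = m.group();
            // Assumption: skip purely numeric tokens such as years ("2010").
            if (token.chars().noneMatch(Character::isLetter)) {
                continue;
            }
            // Assumption: only accept a candidate if "PDB" occurs shortly before it.
            int from = Math.max(0, m.start() - CONTEXT_WINDOW);
            String context = text.substring(from, m.start()).toUpperCase(Locale.ROOT);
            if (context.contains("PDB")) {
                ids.add(token.toUpperCase(Locale.ROOT));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        String excerpt = "The structure of hemoglobin (PDB ID 4HHB) was compared with "
                + "myoglobin (PDB: 1mbn) in a 2010 study of 1234 samples.";
        System.out.println(findPdbIds(excerpt)); // prints [4HHB, 1MBN]
    }
}
```

A production system would likely also validate each candidate against the list of IDs that actually exist in the PDB, since a shape-based match alone cannot distinguish a real entry from an unrelated token.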

Conclusions: The integration between literature and websites, as demonstrated here with the RCSB PDB, provides a
broader view of how a given structure has been analyzed and used. This approach detects the mention of a PDB
structure even if it is not formally cited in the paper. Other structures related through the same literature references can
also be identified, possibly providing new scientific insight. To our knowledge, this is the first time that database and
literature have been integrated in this way and it speaks to the opportunities afforded by open and free access to both
database and literature content.
Q-Index Code C1
Q-Index Status Provisional Code
Institutional Status Non-UQ

Document type: Journal Article
Collection: UQ Diamantina Institute Publications
Citation counts: Thomson Reuters Web of Science: cited 5 times
Scopus: cited 7 times
Created: Mon, 15 Dec 2014, 16:28:35 EST by Dr Lynn Fink on behalf of UQ Diamantina Institute