dc.contributor.advisor: Bork, Peer
dc.contributor.advisor: Valencia Herrera, Alfonso
dc.contributor.author: K Shah, Parantu
dc.contributor.other: UAM. Departamento de Biología Molecular [es_ES]
dc.date.accessioned: 2016-10-11T08:32:43Z
dc.date.available: 2016-10-11T08:32:43Z
dc.date.issued: 2005-12-21
dc.identifier.uri: http://hdl.handle.net/10486/674061
dc.description: Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Facultad de Ciencias, Departamento de Biología Molecular. Date of defense: 21-12-2005 [es_ES]
dc.description.abstract: Function annotation in the genomic context is one of the major challenges facing the discipline of Bioinformatics today. Sequences of entire genomes are continuously being deposited in public databases, waiting to be analyzed and annotated. Computational methods and data coming out of various types of high-throughput experiments are now being used to assist in functional annotation and knowledge discovery. Published findings, mostly analyzing the roles of individual genes, are used for gene annotations. Similarly, curated sets of facts established in the literature are required in order to check the quality of computational methods and analyses of high-throughput data. Hence, there is a great demand for information extraction tools that automatically extract structured information about genes and gene products from the scientific literature and prepare knowledgebases. Before one sets out to devise tools for information extraction from scientific literature, several questions must be answered. Where does the useful information reside? Is this information structured enough to be extracted? What tools should be utilized for accurate retrieval and extraction of information? Also, how useful is the mining of information from biomedical texts for advancing the present level of knowledge? Moreover, the suitability of tools developed for processing general English should also be checked for their usability on biomedical texts. The work presented in this thesis tries to answer the questions posed above. Keyword-based analysis of full-text articles from Nature Genetics was carried out in order to analyze and compare the distribution of information in different sections of papers. Keyword-based methods, while very useful for exploring the overall structure and contents of articles, do not provide the exact relationships mentioned in the literature.
Biologically important events and relationships can only be extracted using structured templates based on the contents of sentences describing events of interest, which is a non-trivial task. The potential of predicate argument structures for providing semantic templates for accurate information extraction was explored for verbs describing gene expression, molecular interactions and signal transduction. Predicate argument structures (PAS) were defined for important verbs by analyzing sentences from abstracts as well as full-text articles; they were then compared systematically with PropBank PAS for general English in order to characterize domain-specific usage of predicates in biomedical texts. A database of transcript diversity was generated using a composite procedure that combined retrieval of appropriate sentences from MEDLINE with information extraction using rules based on PAS. Support vector machines proved to be the best sentence categorization/retrieval method when compared to other retrieval methods. LSAT, a database of alternative transcripts, was generated after the PAS-based information extraction step. Information residing in LSAT was utilized for MeSH term and gene annotations, and for studying the extent of synergy and the preference of different organ systems for different transcript-diversity-generating mechanisms. [en]
dc.format.extent: 98 pag. [es_ES]
dc.format.mimetype: application/pdf [en]
dc.language.iso: eng [en]
dc.subject.other: Bioinformática - Tesis doctorales [es_ES]
dc.title: Data mining from scientific literature [en]
dc.type: doctoralThesis [en]
dc.subject.eciencia: Biología y Biomedicina / Biología [es_ES]
dc.rights.accessRights: closedAccess [en]
dc.facultadUAM: Facultad de Ciencias

