Methods for validating large-scale protein identifications and development and implementation of standards in Proteomics
Entity
UAM. Department of Molecular Biology
Date
2013-10-12
Subjects
Proteins - Analysis - Doctoral theses; Biology and Biomedicine / Biology
Note
Unpublished doctoral thesis. Universidad Autónoma de Madrid, Faculty of Sciences, Department of Molecular Biology. Defense date: 12-10-2013
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Abstract
High-throughput identification of peptides in databases from tandem mass spectrometry
data is a key technique in modern proteomics. Common approaches to interpreting large-scale
peptide identification results rely on the statistical analysis of average score
distributions. These distributions are built from the set of best scores obtained for large collections
of MS/MS spectra with search engines such as SEQUEST. Other approaches calculate
individual peptide identification probabilities from theoretical models or from single-spectrum
score distributions, which are built from the set of scores produced by each MS/MS
spectrum. In this work, we study the mathematical properties of average SEQUEST score
distributions by introducing the concept of spectrum quality and expressing these average
distributions as compositions of single-spectrum distributions. Our analysis leads to a novel
indicator, the probability ratio. It is non-parametric and robust, and it makes it unnecessary to classify
spectra by parameters such as charge state. Measured by false discovery rate, it also yields better peptide
identification performance than other empirical statistical approaches. We also developed another method based
on the construction of single-spectrum SEQUEST score distributions. These results, together with the
robustness, conceptual simplicity, and ease of automation of the probability ratio algorithm, make it a
very attractive alternative for determining peptide identification confidence and error rates in
high-throughput experiments.
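The abstract does not define the probability ratio itself, so the sketch below does not implement it. Instead, it illustrates two generic ingredients the abstract refers to, under stated assumptions. The first is an empirical p-value of a spectrum's best score, computed against that spectrum's own candidate scores (its single-spectrum distribution). The second is a standard target-decoy estimate of the false discovery rate at a score threshold, which assumes separate target and decoy database searches. All function names are hypothetical.

```python
from typing import Optional, Sequence


def single_spectrum_pvalue(best_score: float, candidate_scores: Sequence[float]) -> float:
    """Empirical p-value of a spectrum's best score within its own score distribution.

    Assumption: candidate_scores holds the scores of every candidate peptide the
    search engine compared against this one spectrum (best score included).
    """
    if not candidate_scores:
        raise ValueError("spectrum has no candidate scores")
    # Fraction of this spectrum's candidates scoring at least as high as its best hit.
    n_as_good = sum(1 for s in candidate_scores if s >= best_score)
    return n_as_good / len(candidate_scores)


def decoy_fdr(target_scores: Sequence[float], decoy_scores: Sequence[float],
              threshold: float) -> float:
    """Generic target-decoy FDR estimate: decoy hits / target hits at or above threshold."""
    n_target = sum(1 for s in target_scores if s >= threshold)
    n_decoy = sum(1 for s in decoy_scores if s >= threshold)
    # No accepted targets means nothing is reported, so no false discoveries.
    return n_decoy / n_target if n_target else 0.0


def threshold_for_fdr(target_scores: Sequence[float], decoy_scores: Sequence[float],
                      max_fdr: float) -> Optional[float]:
    """Lowest observed target score whose estimated FDR is at most max_fdr, else None."""
    for t in sorted(set(target_scores)):
        if decoy_fdr(target_scores, decoy_scores, t) <= max_fdr:
            return t
    return None
```

Consider targets `[5.0, 4.0, 3.0, 2.0, 1.0]` and decoys `[2.5, 1.5, 0.5]`. Here `threshold_for_fdr(targets, decoys, 0.25)` returns `2.0`, because 1 decoy and 4 targets score at or above 2.0. A simple forward scan like this is not guaranteed to pick the best threshold when the estimated FDR is non-monotonic in the score. Procedures based on q-values handle that case more carefully.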
On the other hand, the recent development of HUPO-PSI (Proteomics Standards Initiative)
standard data formats and MIAPE guidelines (Minimum Information About a Proteomics
Experiment) is contributing to proteomics data sharing within the scientific
community. In addition, specialized journals have emphasized the use of these standards and
guidelines to facilitate the evaluation and publication of new articles. However, there is an
evident lack of bioinformatics tools specifically designed to manage the information these standards require
and to connect it with the proteomics pipeline. In this work we
describe the development of a set of tools based on PSI standards and MIAPE guidelines:
semantic and MIAPE validators for proteomics standard data files; a proteomics experiment
repository based on MIAPE guidelines; a Java library for managing and extracting
MIAPE information from standard data files; and a tool for a complete proteomics data analysis
workflow. This tool supports the aggregation, filtering and inspection of large amounts of data, and
their dissemination through the preparation of a complete ProteomeXchange submission. Additionally, we
present our contribution to the definition of the MIAPE guidelines for quantitative
proteomics experiments, recently accepted as a new global standard for the proteomics
community.
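At its simplest, a minimum-information check verifies that a standard data file carries a required set of items. The sketch below is a hypothetical and much-simplified check built on Python's standard `xml.etree.ElementTree`. It reports which required element names are absent from an XML document and ignores XML namespaces. The `REQUIRED` tuple is an illustrative assumption, with element names in the style of mzIdentML, and is not the actual MIAPE checklist. Likewise, `missing_items` is not the API of the validators described in the thesis.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List

# Hypothetical checklist for illustration only; NOT the official MIAPE requirements.
REQUIRED = ("AnalysisSoftware", "SearchDatabase", "SpectrumIdentificationProtocol")


def local_name(tag: str) -> str:
    """Strip an ElementTree namespace prefix such as '{urn:x}Name' -> 'Name'."""
    return tag.rsplit("}", 1)[-1]


def missing_items(xml_text: str, required: Iterable[str] = REQUIRED) -> List[str]:
    """Return the required element names that do not occur anywhere in the document."""
    root = ET.fromstring(xml_text)
    present = {local_name(el.tag) for el in root.iter()}
    return [name for name in required if name not in present]
```

A real MIAPE validator would also check values and controlled-vocabulary terms, not just the presence of elements. The sketch only shows the basic "report what is missing" pattern.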
Author
Martínez de Bartolomé Izquierdo, Salvador
Except where otherwise noted, this item's license is described as http://creativecommons.org/licenses/by-nc-nd/3.0/es/
Related items
Showing items related by title, author, creator and subject.
- Carotid artery ultrasonography in ischemic stroke: validation of grayscale for the identification of unstable atheromatous plaques
Ruiz Ares, Adalberto Gerardo
2012