October 10, 2005
Similar papers
June 29, 2016
Despite recent advances in mass spectrometry (MS), summarizing and analyzing high-throughput mass-spectrometry data remains a challenging task. This is due, on the one hand, to the complexity of the measured spectral signal and, on the other, to the limit of detection (LOD). The LOD reflects the instrument's limited ability to measure markers present at relatively low levels. As a consequence, the outcome data set from the quantification step of proteomic analys...
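The LOD issue this abstract raises is commonly treated as left-censored missingness: intensities below the detection limit show up as missing values. A minimal sketch of one crude but widely used baseline, half-minimum imputation, assuming a NumPy matrix with NaNs for below-LOD values (the function name and data here are illustrative, not from the paper):

```python
import numpy as np

def half_min_impute(intensities: np.ndarray) -> np.ndarray:
    """Replace NaNs (values below the LOD) with half the smallest
    observed value of the corresponding feature (column).

    This treats missingness as left-censoring at the detection limit.
    It is a simple baseline, not the only (or best) option.
    """
    imputed = intensities.copy()
    for j in range(imputed.shape[1]):
        col = imputed[:, j]           # view: writes modify `imputed`
        mask = np.isnan(col)
        if mask.any() and (~mask).any():
            col[mask] = np.nanmin(col) / 2.0
    return imputed

# Toy example: 4 samples x 3 peptides, two values censored (NaN).
x = np.array([[5.0, np.nan, 7.0],
              [6.0, 2.0,    np.nan],
              [5.5, 3.0,    6.5],
              [4.0, 2.5,    7.5]])
print(half_min_impute(x))
```

More principled alternatives model the censoring explicitly (e.g. drawing imputations from the left tail of the feature's distribution), but the half-minimum rule is a common point of comparison.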
November 13, 2023
Proteomics is the large-scale study of protein structure and function in biological systems through protein identification and quantification. "Shotgun proteomics" or "bottom-up proteomics" is the prevailing strategy, in which proteins are hydrolyzed into peptides that are analyzed by mass spectrometry. Proteomics can be applied to diverse studies ranging from simple protein identification to studies of proteoforms, protein-protein interactions, protein structural a...
June 11, 2015
Background: High-throughput proteomics techniques, such as mass spectrometry (MS)-based approaches, produce very high-dimensional datasets. In a clinical setting, one is often interested in how mass spectra differ between patients of different classes, for example spectra from healthy patients vs. spectra from patients with a particular disease. Machine learning algorithms are needed to (a) identify these discriminating features and (b) classify unknown spectra based on thi...
July 25, 2008
Given a predictor of outcome derived from a high-dimensional dataset, pre-validation is a useful technique for comparing it to competing predictors on the same dataset. For microarray data, it allows one to compare a newly derived predictor for disease outcome to standard clinical predictors on the same dataset. We study pre-validation analytically to determine if the inferences drawn from it are valid. We show that while pre-validation generally works well, the straightforwa...
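The pre-validation idea this abstract studies can be sketched concretely: derive the high-dimensional predictor inside cross-validation folds, so that each sample's "pre-validated" score is computed without using that sample's own outcome; the resulting score vector can then be compared against clinical predictors in one model. A minimal NumPy sketch under simplifying assumptions (the internal predictor here is just the single most outcome-correlated feature; this is an illustration, not the paper's exact construction):

```python
import numpy as np

def prevalidated_scores(X: np.ndarray, y: np.ndarray, k: int = 5) -> np.ndarray:
    """For each of k folds, build a simple predictor on the training
    folds (the single feature most correlated with y) and score the
    held-out fold with it. No sample's score uses its own outcome."""
    n = len(y)
    scores = np.empty(n)
    folds = np.array_split(np.random.default_rng(0).permutation(n), k)
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        Xtr, ytr = X[train_idx], y[train_idx]
        # Correlation of every feature with the outcome on training data.
        ytr_c = ytr - ytr.mean()
        Xc = Xtr - Xtr.mean(axis=0)
        corr = (Xc * ytr_c[:, None]).sum(axis=0) / (
            np.linalg.norm(Xc, axis=0) * np.linalg.norm(ytr_c) + 1e-12)
        best = int(np.argmax(np.abs(corr)))
        # Score held-out samples with the selected feature (sign-aligned).
        scores[test_idx] = X[test_idx, best] * np.sign(corr[best])
    return scores
```

One would then regress the outcome on these scores together with the clinical covariates; the abstract's point is that the naive version of this final comparison can be subtly biased.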
August 17, 2021
All human diseases involve proteins, yet our current tools to characterize and quantify them are limited. To better elucidate proteins across space, time, and molecular composition, we provide provocative projections for technologies to meet the challenges that protein biology presents. With a broad perspective, we discuss grand opportunities to transition the science of proteomics into a more propulsive enterprise. Extrapolating recent trends, we offer potential futures for ...
June 4, 2020
Motivated by an open problem of validating protein identities in label-free shotgun proteomics workflows, we present a testing procedure to validate class/protein labels using available measurements across instances/peptides. More generally, we present a solution to the problem of identifying instances that are deemed, based on some distance (or quasi-distance) measure, as outliers relative to the subset of instances assigned to the same class. The proposed procedure is non-...
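The general idea behind such a procedure can be illustrated with a simplified stand-in (not the paper's actual test): for each instance, measure how far it sits from the other instances assigned to the same class, and flag instances whose distance is extreme relative to their class. A NumPy sketch, assuming Euclidean distance and a naive quantile cutoff:

```python
import numpy as np

def flag_label_outliers(X: np.ndarray, labels, q: float = 0.95) -> np.ndarray:
    """For each instance, compute the median Euclidean distance to the
    other instances sharing its class label, then flag instances whose
    median distance exceeds the class-wise q-quantile. A simplified
    illustration of distance-based label validation, not a formal test."""
    labels = np.asarray(labels)
    flags = np.zeros(len(labels), dtype=bool)
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        # Pairwise distances within the class; ignore self-distances.
        D = np.linalg.norm(X[idx, None, :] - X[None, idx, :], axis=-1)
        np.fill_diagonal(D, np.nan)
        med = np.nanmedian(D, axis=1)
        flags[idx] = med > np.quantile(med, q)
    return flags
```

A formal version would calibrate the cutoff nonparametrically (e.g. by permutation) rather than using a fixed quantile of the observed distances.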
October 30, 2019
The multiple-biomarker classifier problem and its assessment are reviewed against the background of some fundamental principles from the field of statistical pattern recognition, machine learning, or what is now called "data science". A narrow reading of that literature has led many authors to neglect the contribution to the total uncertainty of performance assessment from the finite training sample. Yet the latter is a fundamental indicator of the stability of a classifi...
April 16, 2013
This paper describes and compares two methods for estimating the variance function associated with iTRAQ (isobaric tags for relative and absolute quantitation) isotopic labeling in quantitative mass-spectrometry-based proteomics. Measurements generated by the mass spectrometer are proportional to the concentration of peptides present in the biological sample. However, the iTRAQ reporter signals are subject to errors that depend on the peptide amounts. The variance function of ...
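The signal-dependent error this abstract describes is often summarized by a mean-variance relationship with an additive and a multiplicative component, var(μ) ≈ σ² + (c·μ)², which can be estimated by regressing replicate variances on the squared mean signal. A NumPy sketch on simulated data (the noise model and parameter values are assumptions for illustration, not the paper's estimator):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated reporter intensities: additive noise (sd_add) plus a
# multiplicative component (cv) whose effect grows with peptide amount.
sd_add, cv = 5.0, 0.1
mu = np.linspace(20, 2000, 200)                  # true peptide amounts
reps = mu[:, None] + rng.normal(size=(200, 30)) * np.sqrt(
    sd_add**2 + (cv * mu[:, None])**2)           # 30 replicates each

# Empirical variance per peptide, then fit var = a + b * mu^2.
v = reps.var(axis=1, ddof=1)
b, a = np.polyfit(mu**2, v, 1)
print(f"estimated cv ~ {np.sqrt(b):.3f} (simulated with cv = {cv})")
```

Once such a variance function is in hand, it can feed weighted model fits or a variance-stabilizing transform of the reporter signals.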
October 25, 2018
There has been a significant increase in the number of diagnostic and prognostic models published in the last decade. Testing such models in an independent, external validation cohort gives some assurance that the model will transfer to a naturalistic healthcare setting. Of 2,147 published models in the PubMed database, we found just 120 included some kind of separate external validation cohort. Of these studies, not all were sufficiently well documented to allow a judgement about...
December 23, 2024
Deep learning is an advanced technology that relies on large-scale data and complex models for feature extraction and pattern recognition. It has been widely applied across various fields, including computer vision, natural language processing, and speech recognition. In recent years, deep learning has demonstrated significant potential in the realm of proteomics informatics, particularly in deciphering complex biological information. The introduction of this technology not o...