Several software packages can be used to identify and quantify the peptides and proteins present in a sample (e.g., Proteome Discoverer for TMT analyses, MaxQuant for DDA analyses, DIA-NN for DIA analyses, Skyline for targeted analyses).
Identification
Peptide identification is based on comparing the data acquired by the mass spectrometer with protein sequences available in public databases (Uniprot.org). For species whose proteome is incomplete in these databases, alternative approaches can be applied, such as searching the databases of closely related species or using genomic data.
Peptides are then grouped by protein following the principle of parsimony, taking into account the different isoforms present in the database. Results are therefore reported as protein groups rather than as individual proteins.
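The parsimony idea can be illustrated with a minimal sketch. It assumes a greedy strategy (repeatedly keep the protein group that explains the most peptides not yet explained) and merges proteins with identical peptide evidence into one group. The function name and the example identifiers are hypothetical, and the identification software listed above may implement protein inference differently.

```python
def parsimonious_groups(protein_to_peptides):
    """Return a small list of protein groups that explains every peptide (greedy sketch)."""
    # Proteins with exactly the same peptide evidence cannot be told apart,
    # so they are merged into a single group.
    groups = {}
    for protein, peptides in protein_to_peptides.items():
        groups.setdefault(frozenset(peptides), []).append(protein)

    # Fixed iteration order so that ties are broken reproducibly.
    candidates = sorted(groups, key=lambda peps: sorted(groups[peps]))
    unexplained = set()
    for peps in candidates:
        unexplained |= peps

    selected = []
    while unexplained:
        # Keep the group that explains the most still-unexplained peptides.
        best = max(candidates, key=lambda peps: len(peps & unexplained))
        selected.append(sorted(groups[best]))
        unexplained -= best
    return selected


# Hypothetical example: P1 and its isoform P1-2 share all their peptides,
# and P3 is fully explained by the peptides of P1.
example = {
    "P1": {"pepA", "pepB", "pepC"},
    "P1-2": {"pepA", "pepB", "pepC"},
    "P2": {"pepC", "pepD"},
    "P3": {"pepB"},
}
protein_groups = parsimonious_groups(example)
print(protein_groups)  # [['P1', 'P1-2'], ['P2']]
```

In this example the two isoforms are reported together as one group, and P3 is dropped because its only peptide is already explained.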
Lists of identified peptides and proteins are validated at a 1% false discovery rate (FDR) using the target-decoy method, which estimates the number of false positives in the identification list.
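As a rough illustration of the target-decoy principle, the sketch below ranks identifications by score and estimates the FDR at each rank as the number of decoy hits divided by the number of target hits. The function name, the input layout (score, decoy flag) and the simple decoys/targets estimator are assumptions made for the example; each search engine uses its own scoring and FDR formula.

```python
def filter_at_fdr(matches, max_fdr=0.01):
    """Keep target matches up to the lowest score where the estimated FDR <= max_fdr.

    matches: list of (score, is_decoy) tuples, where a higher score is better.
    """
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)
    targets = decoys = 0
    last_ok = -1  # index of the deepest rank still within the FDR limit
    for index, (score, is_decoy) in enumerate(ranked):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Decoy hits estimate how many target hits are false positives.
        estimated_fdr = decoys / max(targets, 1)
        if estimated_fdr <= max_fdr:
            last_ok = index
    return [score for score, is_decoy in ranked[: last_ok + 1] if not is_decoy]


# Toy data set, filtered at 20% FDR so that the effect is visible on 8 matches.
toy_matches = [
    (10, False), (9, False), (8, False), (7, False),
    (6.5, True), (6, False), (5, True), (4, False),
]
accepted = filter_at_fdr(toy_matches, max_fdr=0.2)
print(accepted)  # [10, 9, 8, 7, 6]
```

With real data the same logic is applied at `max_fdr=0.01`, which requires far more matches than this toy list to be meaningful.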
Quantification
For quantification, two approaches are possible depending on whether the experiment is TMT-labeled or label-free (label-free quantification, LFQ).
For TMT analyses, the software extracts the intensities of the reporter ions for each peptide. Each reporter ion gives the intensity of the peptide in one of the samples.
For LFQ analyses (and targeted analyses), the software aligns the different injections from the same experiment and uses the spectral data to reconstruct the elution profile of each peptide in each sample (i.e., in each analysis). Integrating the area under the elution peak gives the quantification value.
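The integration step can be sketched as follows, assuming the elution profile is available as paired retention times and intensities and using the trapezoidal rule. The function name and the values are illustrative only; quantification software also handles peak detection, boundaries and background, which are not shown here.

```python
def peak_area(times, intensities):
    """Area under an elution profile, integrated with the trapezoidal rule."""
    if len(times) != len(intensities):
        raise ValueError("times and intensities must have the same length")
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        mean_height = 0.5 * (intensities[i] + intensities[i - 1])
        area += width * mean_height
    return area


# Hypothetical elution profile: retention time in seconds, intensity in arbitrary units.
retention_times = [0.0, 1.0, 2.0, 3.0, 4.0]
signal = [0.0, 50.0, 100.0, 50.0, 0.0]
area = peak_area(retention_times, signal)
print(area)  # 200.0
```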
To maximize the number of quantified values, information is cross-referenced between injections to search for signals corresponding to the same peptides, even when they are of low intensity (match between runs).
In all cases, the software applies a normalization suited to proteomic data, based on the total signal observed in each sample. Non-normalized data can also be exported in order to apply other normalization methods.
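A minimal sketch of a normalization based on the total signal is given below. It assumes that each sample is rescaled so that its summed intensity equals the average total across samples; the function name and the data layout are hypothetical, and the actual algorithm differs from one software package to another.

```python
def normalize_total_signal(samples):
    """Rescale each sample so that all samples share the same total intensity.

    samples: dict mapping sample name -> dict mapping protein -> intensity.
    """
    totals = {name: sum(values.values()) for name, values in samples.items()}
    # Common target: the average total signal across samples (an assumption).
    target = sum(totals.values()) / len(totals)
    return {
        name: {protein: value * target / totals[name] for protein, value in values.items()}
        for name, values in samples.items()
    }


# Hypothetical case: sample B was loaded at twice the amount of sample A.
raw = {
    "A": {"P1": 100.0, "P2": 300.0},
    "B": {"P1": 200.0, "P2": 600.0},
}
normalized = normalize_total_signal(raw)
print(normalized["A"])  # {'P1': 150.0, 'P2': 450.0}
print(normalized["B"])  # {'P1': 150.0, 'P2': 450.0}
```

After rescaling, the two samples have identical profiles, which shows that the initial difference came only from the amount of material loaded.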
Data Representation and Statistics
For quantitative analyses, we use R software to obtain graphical representations and statistics from data generated by proteomic software.
We produce principal component analysis (PCA) plots and heatmaps to visualize the overall proteomic profile of each sample and to assess the variability between samples.
We can also compare two conditions/groups of interest (pairwise analysis) by calculating a ratio and performing a statistical test (Welch test, Limma) for each quantified protein. Ratios are centered (z-score calculation) and the p-values from the statistical test are adjusted for multiple testing using the Benjamini-Hochberg method to obtain q-values. Proteins with |z| > 1.96 and q < 0.05 are considered significantly regulated between groups. These results are displayed as volcano plots.
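The selection criteria can be sketched as below, starting from ratios and p-values that have already been computed. Two assumptions are made for the example: the ratios are expressed as log2 values, and the z-score uses the mean and sample standard deviation of all ratios. The function names and data are hypothetical, and the sketch is written in Python for illustration even though the analyses themselves are run in R.

```python
from statistics import mean, stdev


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def zscores(log_ratios):
    """Center and scale the log ratios (assumes at least two values)."""
    mu = mean(log_ratios)
    sd = stdev(log_ratios)
    return [(x - mu) / sd for x in log_ratios]


def significantly_regulated(log_ratios, pvalues, z_cut=1.96, q_cut=0.05):
    """Flag proteins with |z| > z_cut and q < q_cut."""
    z = zscores(log_ratios)
    q = benjamini_hochberg(pvalues)
    return [abs(zi) > z_cut and qi < q_cut for zi, qi in zip(z, q)]


# Hypothetical data for 10 proteins: only the last one has a large ratio.
log2_ratios = [0.1, -0.1, 0.0, 0.2, -0.2, 0.05, -0.05, 0.1, -0.1, 3.0]
p_values = [0.001, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.0001]
flags = significantly_regulated(log2_ratios, p_values)
```

In this example the first protein has a small q-value but a ratio close to the center of the distribution, so it is not flagged; only the last protein passes both thresholds.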
Other tests and graphical representations can also be produced from the proteomic data, depending on the needs of the project (ANOVA, box plots, correlation plots, clustering).
Other Bioinformatics Analyses
Upon request, we can also perform more advanced bioinformatics analyses such as functional analyses (Gene Ontology, KEGG pathways), interaction network analyses, biomarker signature searches, or multi-omics analyses.
For these analyses, we collaborate closely with the bioinformatics platform of the Genomics Center.