ProQuant™ is precise, but is it accurate as well?

All of our experimental optimisations during the development of the ProQuant™ proteomics platform were aimed at improving the precision of the method. We have already shown the marked effect that our proprietary optimisations have had.

But there’s a difference between precision and accuracy. There’s a good summary on Wikipedia if you want to delve a bit deeper, but put simply…

Precision is a measure of how close measurements are to each other
Accuracy is a measure of how close measurements are to the true value
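
To make the distinction concrete, here is a minimal sketch in Python. The numbers are invented for illustration, and using the coefficient of variation for precision and the relative bias for accuracy is our assumption for this example, not a description of how ProQuant™ reports its results.

```python
from statistics import mean, stdev

def precision_cv(measurements):
    """Coefficient of variation (%): how tightly replicates cluster around their own mean."""
    return 100 * stdev(measurements) / mean(measurements)

def accuracy_bias(measurements, true_value):
    """Relative bias (%): how far the replicate mean sits from the true value."""
    return 100 * (mean(measurements) - true_value) / true_value

# Hypothetical replicate measurements of a protein whose true value is 100 units.
tight_but_off = [120.0, 121.0, 119.0]     # precise, but not accurate
loose_but_centred = [80.0, 100.0, 120.0]  # accurate on average, but not precise
```

The first set of replicates has a small spread but a large bias; the second has no bias on average but a large spread.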

By being highly precise, ProQuant™ gives you reproducible and reliable measurements of protein and peptide abundances and PTM fractional modifications. But how close are the measurements to the true value? This is where things get tricky, of course, because the true value of anything we attempt to measure in science is notoriously difficult to know. The approach we took was to compare the results from one of our non-hypothesis-driven analyses of human serum with the concentrations of those proteins we obtained from the scientific literature.

It’s not a surprise to find that there is an association between these two datasets, but what shows the superior performance of ProQuant™ is how much stronger the association is with our data than with data from two renowned online databases:

PAXdb: a comprehensive absolute protein abundance database maintained by the Bioinformatics / Systems Biology group at the University of Zurich.

PeptideAtlas: a compendium of results from MS proteomics datasets published by the Human Proteome Organization.

The conclusion from this is clear. Proteomics analysis using ProQuant™ generates data with higher accuracy than the proteomics methods used by PAXdb and PeptideAtlas.

Details of the methods used

For those who are interested, the methods used for this analysis were as follows. Firstly, data were retrieved from three different sources:

    1. A non-hypothesis-driven analysis of human serum using ProQuant™ following depletion of seven abundant serum proteins. In line with our standard methodology, only proteins with at least two unique peptides were included.
    2. The “H.sapiens – Serum, SC (Peptideatlas,jul,2021)” dataset downloaded from the PAXdb protein abundance database on 06 Dec 2022.
    3. The “Human Plasma 2021-07 build” dataset downloaded from the PeptideAtlas protein abundance database on 06 Dec 2022.

The concentrations of proteins in the serum proteome from the scientific literature were collated by a scientist who had no access to any of the above datasets (Thank you Becca!). One of the key sources of data was the Geigy Scientific Tables (vol 3), but where data were not available there, a brief review of the scientific literature was undertaken. Where multiple sources gave similar values for the serum concentration of a given protein, the median value was taken. Where there were clearly considerable discrepancies in the literature, or the data were hard to find, that protein was excluded.
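
The collation rule can be sketched as follows. This is only an illustration: the collation was a manual judgement, and the `max_fold_spread` threshold used here to decide whether sources are "similar" is a hypothetical parameter of ours, not a value used in the analysis.

```python
from statistics import median

def collate_literature(values, max_fold_spread=3.0):
    """Return the median of the reported concentrations, or None to exclude the protein.

    max_fold_spread is an assumed cut-off for 'considerable discrepancy';
    the original collation relied on manual judgement instead.
    """
    if not values or min(values) <= 0:
        return None  # no usable data found for this protein
    if max(values) / min(values) > max_fold_spread:
        return None  # sources disagree too much, so exclude the protein
    return median(values)
```

For example, `collate_literature([1.0, 1.2, 1.1])` keeps the protein with a value of 1.1, whereas `collate_literature([1.0, 50.0])` excludes it.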

Having collated and aligned all of the data, two further restrictions were made prior to analysis.

Firstly, proteins that are composed of multiple polypeptide chains, or that can be found in multiple forms comprising different polypeptide chains, were excluded. For example, proteomics data for haemoglobin are output as HBA_HUMAN and HBB_HUMAN, whereas the literature combines these into a single ‘haemoglobin’ protein concentration. Similarly, members of the complement and clotting cascades are found in multiple forms in plasma, with the predominant form often comprising only some of the regions of the protein detected by proteomics methods.

Secondly, the ProQuant™ analysis of the human serum proteome was carried out following abundant protein depletion, so the depleted proteins were excluded from the analysis.

At this point the data were filtered to complete cases only: proteins for which we have ‘predicted concentrations’ from the literature and data from all three proteomics datasets. This left 137 proteins for the analysis.
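
As a rough sketch of the complete-case filter, assuming each dataset has been loaded into a dictionary keyed by protein identifier (the function name, the dictionary layout and all of the values below are illustrative assumptions, not the actual data or pipeline):

```python
def complete_cases(literature, proquant, paxdb, peptideatlas, excluded=()):
    """Return the sorted protein IDs that have a value in all four datasets."""
    datasets = (literature, proquant, paxdb, peptideatlas)
    shared = set.intersection(
        *({pid for pid, value in d.items() if value is not None} for d in datasets)
    )
    # Drop proteins ruled out beforehand (multi-chain or depleted proteins).
    return sorted(shared - set(excluded))

# Made-up values purely for illustration; only the identifiers' presence matters here.
literature = {"ALBU_HUMAN": 40000.0, "TTHY_HUMAN": 250.0, "APOA1_HUMAN": 1400.0, "RET4_HUMAN": 45.0}
proquant = {"TTHY_HUMAN": 1.0, "APOA1_HUMAN": 5.2, "RET4_HUMAN": 0.3}
paxdb = {"ALBU_HUMAN": 9.0, "TTHY_HUMAN": 2.0, "APOA1_HUMAN": 7.5, "RET4_HUMAN": None}
peptideatlas = {"ALBU_HUMAN": 8.0, "TTHY_HUMAN": 3.0, "APOA1_HUMAN": 6.1, "RET4_HUMAN": 0.9}

kept = complete_cases(literature, proquant, paxdb, peptideatlas, excluded={"ALBU_HUMAN"})
```

Here only the proteins present in every dataset, and not on the exclusion list, survive the filter.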

The r² values shown on the graphs are Pearson’s r², calculated from the log-transformed data.
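
For readers who want to reproduce this kind of statistic on their own data, a minimal sketch is given below. The choice of base-10 logarithms and the function name are our assumptions for the example; the base of the logarithm does not affect r².

```python
import math

def pearson_r2_log(x, y):
    """Pearson's r squared between log10(x) and log10(y); all values must be positive."""
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    # Sums of squares and cross-products around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    sxx = sum((a - mean_x) ** 2 for a in lx)
    syy = sum((b - mean_y) ** 2 for b in ly)
    return sxy ** 2 / (sxx * syy)
```

Working on the log scale means that proteins spanning many orders of magnitude in concentration contribute in proportion to their fold differences rather than their absolute differences.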

 
