The Blood Proteoform Atlas: A Reference Map of Proteoforms in Human Hematopoietic Cells
Bibliographic info
- Authors: Rafael D. Melani†, Vincent R. Gerbasi†, Lissa C. Anderson†, Jacek W. Sikora, Timothy K. Toby, Josiah E. Hutton, David S. Butcher, Fernanda Negrão, Henrique S. Seckler, Kristina Srzentic, Luca Fornelli, Jeannie M. Camarillo, Richard D. LeDuc, Anthony J. Cesnik, Emma Lundberg, Joseph B. Greer, Ryan T. Fellers, Matthew T. Robey, Caroline J. DeHart, Eleonora Forte, Christopher L. Hendrickson, Susan E. Abbatiello, Paul M. Thomas, Andy I. Kokaji, Josh Levitsky*, Neil L. Kelleher* († equal contribution; * corresponding authors)
- Journal: Science, 2022, vol. 375, pp. 411–418; DOI: 10.1126/science.aaz5284
- Institutions: Northwestern University (Kelleher lab); NHMFL / Florida State University; Stanford University
Key question
Can deep top-down proteomics (TDP) produce a comprehensive reference map (the “Blood Proteoform Atlas”, BPA) of proteoform primary structures across human hematopoietic cell types, and are proteoforms more cell-type-specific than protein-level information?
Methods
- Sample type: 21 human hematopoietic cell types (including T cells, B cells, NK cells, monocytes, macrophages, dendritic cells, neutrophils, and pre-B cells) plus plasma; cells sorted by FACS
- Technique: Top-down proteomics (TDP) — deep, high-resolution intact protein LC-MS; Orbitrap and 21T FT-ICR instruments
- Scale: ~30,000 unique proteoforms from 1,690 genes; nearly 10× more than the previous largest TDP study
- Quantification: Label-free TDP for B cell vs T cell comparison; FACS-sorted B cell subtypes
- Bioinformatics: ProForma proteoform notation; PFR identifiers (e.g., PFR1033 = histone H4); t-SNE plots; accumulation curves; hierarchical clustering
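The ProForma notation listed above writes each modification in square brackets directly after the residue it modifies. The sketch below is a minimal illustration, assuming only that simple residue-level bracket form; the full ProForma specification covers much more (terminal modifications, mass-shift tags, ambiguity groups), and the peptide string is a made-up example, not a BPA entry. The function name `parse_simple_proforma` is hypothetical.

```python
def parse_simple_proforma(s):
    """Split a simple ProForma-style string into (sequence, modifications).

    Handles only residue-level 'X[Mod]' tags. This is an illustrative
    subset, not a full ProForma parser.
    """
    sequence = []
    mods = []  # list of (1-based residue position, modification name)
    i = 0
    while i < len(s):
        ch = s[i]
        if ch == "[":
            if not sequence:
                raise ValueError("modification tag before any residue")
            end = s.index("]", i)  # raises ValueError if the tag is unclosed
            mods.append((len(sequence), s[i + 1:end]))
            i = end + 1
        elif ch.isalpha() and ch.isupper():
            sequence.append(ch)
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at index {i}")
    return "".join(sequence), mods


# Made-up example string (not taken from the atlas)
seq, mods = parse_simple_proforma("EM[Oxidation]EVEES[Phospho]PEK")
print(seq)   # EMEVEESPEK
print(mods)  # [(2, 'Oxidation'), (7, 'Phospho')]
```

Keeping positions 1-based matches how modification sites are usually quoted in text (for example, "phospho at S7").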
Main findings
Scale achievement
- ~30,000 unique proteoforms identified; demonstrates that deep TDP is feasible in human cells
- Expressed from 1,690 genes: a substantial fraction of the proteome expressed in blood
- 8.3% coverage of total human proteome; 16% of predicted proteome <30 kDa
- Accumulation curve shows ~80% of possible protein IDs were captured — near-saturation for identified genes
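To make the accumulation-curve idea concrete: such a curve counts how many distinct identifications have been seen after each additional run or sample, and a flattening curve indicates that new runs add little. The sketch below uses toy identifiers and is only meant to show the bookkeeping; it is not the paper's analysis, and the saturation estimate in the paper presumably relies on a fitted model rather than raw counts.

```python
def accumulation_curve(runs):
    """Return the cumulative number of unique IDs after each run, in order."""
    seen = set()
    curve = []
    for ids in runs:
        seen.update(ids)
        curve.append(len(seen))
    return curve


# Toy data: each set holds the protein IDs found in one run (hypothetical IDs)
runs = [
    {"P1", "P2", "P3"},
    {"P2", "P3", "P4"},
    {"P1", "P4", "P5"},
    {"P2", "P5"},
]
curve = accumulation_curve(runs)
print(curve)  # [3, 4, 5, 5]
```

In this toy case the last run contributes nothing new, which is the pattern a near-saturated dataset would show.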
Proteoforms vs proteins for cell type discrimination
- Proteoforms are more cell-type-specific than proteins:
  - The average protein was found in 6.51 cell types; the average proteoform in only 2.19
  - Mean number of entries unique to a single cell type: 1,346 proteoforms vs only 76 proteins
  - Clustering distance for proteoforms: about one order of magnitude larger than for proteins
  - 58% of proteoforms were found in only ONE cell type
- t-SNE clustering: protein and proteoform data group cell types similarly, but the proteoform level provides higher specificity
- Hematopoietic differentiation hierarchy correctly recapitulated at proteoform level
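The specificity figures above (mean number of cell types per entry, share of entries seen in only one cell type, unique entries per cell type) can all be derived from a simple presence table. The following is a minimal sketch on toy data, assuming a mapping from identifier to the set of cell types where it was observed; the identifiers and the function name `specificity_summary` are hypothetical.

```python
from collections import Counter


def specificity_summary(presence):
    """Summarize cell-type specificity.

    presence maps an ID to the set of cell types in which it was observed.
    Returns (mean cell types per ID, fraction of IDs seen in exactly one
    cell type, count of single-cell-type IDs per cell type).
    """
    n_types = [len(cells) for cells in presence.values()]
    mean_cell_types = sum(n_types) / len(n_types)
    fraction_single = sum(1 for n in n_types if n == 1) / len(n_types)
    unique_per_cell = Counter(
        next(iter(cells)) for cells in presence.values() if len(cells) == 1
    )
    return mean_cell_types, fraction_single, dict(unique_per_cell)


# Toy presence table (hypothetical identifiers, not atlas data)
toy = {
    "PFR_A": {"B cell"},
    "PFR_B": {"B cell", "T cell"},
    "PFR_C": {"T cell"},
    "PFR_D": {"NK cell"},
    "PFR_E": {"B cell", "T cell", "NK cell", "monocyte"},
}
mean_types, frac_single, unique_counts = specificity_summary(toy)
print(mean_types)   # 1.8
print(frac_single)  # 0.6
```

Running the same summary once on protein-level IDs and once on proteoform-level IDs gives the kind of side-by-side comparison reported in the notes.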
Quantitative TDP
- Label-free comparison of B cells (CD19+) vs T cells (CD3+): proteoform-level quantification was successful
- B cell subtypes (pre-B-I, pre-B-II, pre-B-III, memory B, naïve B): differentiated by proteoform profiles
- Example: Histone H4 (PFR1033, UniProt P62805) — multiple modification states distinguishable
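As a rough illustration of what a label-free comparison computes at its core, the sketch below takes replicate intensities for one proteoform in two cell types and reports a log2 fold change. This is a deliberately simplified assumption-laden example: the intensity values are invented, and a real label-free TDP workflow would add normalization, missing-value handling, and statistical testing that are not shown here.

```python
import math
from statistics import mean


def log2_fold_change(group_a, group_b):
    """log2 of the ratio of mean intensities (group_a over group_b)."""
    return math.log2(mean(group_a) / mean(group_b))


# Invented replicate intensities for a single proteoform
b_cells = [8.0e6, 7.5e6, 8.5e6]
t_cells = [2.0e6, 2.1e6, 1.9e6]

lfc = log2_fold_change(b_cells, t_cells)
print(lfc)  # 2.0, i.e. about 4x more abundant in the B cell replicates
```

Working in log2 space makes up- and down-regulation symmetric: a value of +2 and a value of -2 both describe a fourfold difference.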
Key PTM examples detected
- Histone modifications (acetylation, methylation, trimethylation) in cell-type-specific patterns
- N-terminal acetylation; phosphorylation patterns
- Multiple isoform combinations (splicing + PTMs)
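One reason the PTMs above call for high-resolution instruments is that acetylation and trimethylation are nearly isobaric. The sketch below works through the arithmetic using standard monoisotopic mass shifts; the base mass of 11,300 Da is a rounded, assumed value for a histone-H4-sized protein, chosen only to show the order of magnitude.

```python
# Standard monoisotopic mass shifts in Da
MOD_MASS = {
    "acetyl": 42.010565,     # C2H2O
    "methyl": 14.015650,     # CH2
    "trimethyl": 42.046950,  # C3H6
    "phospho": 79.966331,    # HPO3
}


def ppm_difference(mass_a, mass_b):
    """Relative difference between two masses in parts per million."""
    return abs(mass_a - mass_b) / ((mass_a + mass_b) / 2) * 1e6


BASE = 11300.0  # assumed, rounded intact mass in Da (illustrative only)
acetylated = BASE + MOD_MASS["acetyl"]
trimethylated = BASE + MOD_MASS["trimethyl"]

delta = trimethylated - acetylated
ppm = ppm_difference(acetylated, trimethylated)
print(round(delta, 6))  # 0.036385
print(round(ppm, 1))    # 3.2
```

A difference of roughly 0.036 Da on an 11 kDa protein is only a few ppm, so intact mass alone separates the two forms only with very accurate measurement; in practice fragmentation data help localize and confirm the modification.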
Clinical context and connection to our work
- Relevance to PTM-CQFD project: the BPA provides the benchmark for what deep TDP can achieve in blood cells; PTM-CQFD aims at the analogous task in human serum (extracellular proteins rather than cell proteomes)
- Key distinction: BPA focuses on intracellular proteins in sorted cell populations; PTM-CQFD focuses on secreted/circulating proteins (plasma/serum) — different matrix, different challenges
- Limitation note (as cited in PTM-CQFD project application): BPA approach requires cell sorting, large amounts of material, specialized equipment — not directly translatable to clinical serum samples. PTM-CQFD addresses the clinically accessible matrix gap.
Limitations
- Cell type specificity demonstrated but not yet linked to clinical outcomes
- Most proteoforms <30 kDa (smaller proteins better covered by TDP)
- Not primarily a serum/plasma study (plasma was included, but the atlas centers on sorted cells); clinical accessibility requires a different approach (our group’s expertise)
- No pharmacological or disease cohort — reference map only
Connections
- Top-down proteomics — the Kelleher lab is the global leader in TDP methodology; BPA is the methodological benchmark
- PTM-CQFD project: the BPA is explicitly cited in the ImpactHealth application as the US atlas to be complemented by a European serum-focused approach
- HSA (human serum albumin): not directly covered in the BPA (serum protein, not intracellular), but the BPA establishes the TDP ecosystem
- Bottom-up proteomics: the BPA illustrates why TDP is better suited to proteoform resolution (BU digests proteins into peptides, so combinations of modifications on one molecule cannot be assigned reliably)
Take home notes
- The BPA is the Kelleher lab’s landmark Science paper — defines the field of proteoform biology in blood. It is the key international benchmark cited in the PTM-CQFD grant application.
- The core argument from the BPA for our work: proteoforms are fundamentally different from proteins and are far more cell/state-specific; this supports the top-down isoform approach of ALBOM and PTM-CQFD.
- The fact that 58% of proteoforms are unique to ONE cell type is astonishing — if the same pattern holds for disease states, then PTM profiles could be exquisitely specific disease indicators.
- Computational bottleneck: 30,000 proteoforms require sophisticated bioinformatics. PTM-CQFD explicitly plans AI integration — a direct analog to the data analysis challenge Kelleher faced.
- One critical difference: the BPA works with sorted cell populations (requiring leukapheresis/FACS). Our serum approach requires only a blood draw, which should make it far easier to deploy clinically.