The Blood Proteoform Atlas: A Reference Map of Proteoforms in Human Hematopoietic Cells

Bibliographic info

Authors: Rafael D. Melani†, Vincent R. Gerbasi†, Lissa C. Anderson†, Jacek W. Sikora, Timothy K. Toby, Josiah E. Hutton, David S. Butcher, Fernanda Negrão, Henrique S. Seckler, Kristina Srzentic, Luca Fornelli, Jeannie M. Camarillo, Richard D. LeDuc, Anthony J. Cesnik, Emma Lundberg, Joseph B. Greer, Ryan T. Fellers, Matthew T. Robey, Caroline J. DeHart, Eleonora Forte, Christopher L. Hendrickson, Susan E. Abbatiello, Paul M. Thomas, Andy I. Kokaji, Josh Levitsky*, Neil L. Kelleher* († equal contribution; * corresponding authors)
Journal: Science, 2022, vol. 375, pp. 411–418; DOI: 10.1126/science.aaz5284
Institutions: Northwestern University (Kelleher lab); NHMFL / Florida State University; Stanford University

Key question

Can deep top-down proteomics (TDP) create a comprehensive reference map (“Blood Proteoform Atlas”) of the primary structures of proteoforms across all human hematopoietic cell types, and are proteoforms more cell-type-specific than protein-level information?

Methods

Sample type: 21 human hematopoietic cell types (including T cells, B cells, NK cells, monocytes, macrophages, dendritic cells, neutrophils, and pre-B cells), plus plasma; cells sorted by FACS
Technique: Top-down proteomics (TDP) — deep, high-resolution intact protein LC-MS; Orbitrap and 21T FT-ICR instruments
Scale: ~30,000 unique proteoforms from 1,690 genes — nearly 10× more than previous largest TDP study
Quantification: Label-free TDP for B cell vs T cell comparison; FACS-sorted B cell subtypes
Bioinformatics: ProForma proteoform notation; PFR identifiers (e.g., PFR1033 = histone H4); t-SNE plots; accumulation curves; hierarchical clustering
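
As a rough illustration of the ProForma notation mentioned above, the sketch below parses a small subset of it: a plain amino-acid sequence, bracketed modification names placed after a residue, and an optional `[Name]-` N-terminal prefix. The function name `parse_proforma_subset` and the example string (a short fragment resembling the histone H4 N-terminus) are assumptions made for illustration, not an entry taken from the atlas, and the full ProForma specification covers many cases this does not handle.

```python
import re


def parse_proforma_subset(text):
    """Parse a small ProForma-like subset (hypothetical helper).

    Returns (bare_sequence, mods), where mods maps a 1-based residue
    position to a modification name; position 0 means the N-terminus.
    """
    mods = {}
    # Optional N-terminal modification written as a "[Name]-" prefix.
    n_term = re.match(r"\[([^\]]+)\]-", text)
    if n_term:
        mods[0] = n_term.group(1)
        text = text[n_term.end():]

    residues = []
    i = 0
    while i < len(text):
        if text[i] == "[":
            end = text.index("]", i)  # raises ValueError if unbalanced
            if not residues:
                raise ValueError("modification without a preceding residue")
            # The bracket annotates the residue just before it.
            mods[len(residues)] = text[i + 1:end]
            i = end + 1
        else:
            residues.append(text[i])
            i += 1
    return "".join(residues), mods


# Illustrative string only (assumption, not copied from the paper).
sequence, mods = parse_proforma_subset("[Acetyl]-SGRGKGGKGLGKGGAKRHRK[Dimethyl]")
print(sequence)  # SGRGKGGKGLGKGGAKRHRK
print(mods)      # {0: 'Acetyl', 20: 'Dimethyl'}
```

Keeping the modifications in a position-keyed dictionary makes it easy to compare two proteoforms of the same gene: same bare sequence, different `mods`.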

Main findings

Scale achievement

~30,000 unique proteoforms identified — landmark achievement; proves deep TDP is feasible in human cells
Expressed from 1,690 genes — substantial fraction of expressed human proteome in blood
8.3% coverage of total human proteome; 16% of predicted proteome <30 kDa
Accumulation curve shows ~80% of possible protein IDs were captured — near-saturation for identified genes
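
The idea behind an accumulation curve can be sketched in a few lines: add datasets one at a time and record how many distinct identifications have been seen so far. A curve that flattens suggests that further runs contribute few new identifications. The function name `accumulation_curve` and the toy identifiers are assumptions for illustration; the notes do not describe how the authors built or extrapolated their curve.

```python
def accumulation_curve(runs):
    """Cumulative count of distinct identifiers after each run.

    `runs` is an ordered list of collections of identifiers
    (for example protein accessions found in each LC-MS run).
    """
    seen = set()
    curve = []
    for identifiers in runs:
        seen.update(identifiers)
        curve.append(len(seen))
    return curve


# Toy data (made up): the third run adds nothing new.
toy_runs = [{"P1", "P2"}, {"P2", "P3"}, {"P3"}]
print(accumulation_curve(toy_runs))  # [2, 3, 3]
```

The shape depends on the order in which runs are added, so in practice such curves are often averaged over several random orderings.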

Proteoforms vs proteins for cell type discrimination

Proteoforms are more cell-type-specific than proteins:
- Average protein found in 6.51 cell types; average proteoform found in only 2.19 cell types
- Mean unique proteoforms per cell type: 1,346 vs only 76 proteins
- Clustering distance for proteoforms: one order of magnitude higher than for proteins
- 58% of proteoforms found in only ONE cell type
t-SNE clustering: protein-level and proteoform-level data group the cell types similarly, but the proteoform level gives higher specificity
Hematopoietic differentiation hierarchy correctly recapitulated at proteoform level
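
The two specificity metrics quoted above (mean number of cell types per protein or proteoform, and the share found in a single cell type) can be computed from a simple list of (identifier, cell type) observations. The sketch below uses made-up identifiers and a hypothetical function name, `specificity_summary`; it shows the arithmetic only and is not the authors' pipeline.

```python
from collections import defaultdict


def specificity_summary(observations):
    """Return (mean cell types per identifier, fraction seen in one cell type).

    `observations` is an iterable of (identifier, cell_type) pairs;
    repeated pairs are counted once.
    """
    cell_types_by_id = defaultdict(set)
    for identifier, cell_type in observations:
        cell_types_by_id[identifier].add(cell_type)

    counts = [len(cell_types) for cell_types in cell_types_by_id.values()]
    if not counts:
        raise ValueError("no observations given")
    mean_cell_types = sum(counts) / len(counts)
    fraction_single = sum(1 for c in counts if c == 1) / len(counts)
    return mean_cell_types, fraction_single


# Toy observations (made up): pfA is in two cell types, pfB and pfC in one.
toy = [
    ("pfA", "T cell"), ("pfA", "B cell"),
    ("pfB", "T cell"),
    ("pfC", "NK cell"), ("pfC", "NK cell"),
]
mean_ct, frac_one = specificity_summary(toy)
print(round(mean_ct, 2), round(frac_one, 2))  # 1.33 0.67
```

Running the same function once on protein-level identifiers and once on proteoform-level identifiers gives the kind of side-by-side comparison reported in the notes (6.51 vs 2.19 cell types on average).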

Quantitative TDP

Label-free comparison of B cells (CD19+) vs T cells (CD3+): proteoform-level quantification was successful
B cell subtypes (pre-B-I, pre-B-II, pre-B-III, memory B, naïve B): differentiated by proteoform profiles
Example: Histone H4 (PFR1033, UniProt P62805) — multiple modification states distinguishable
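
For orientation, a label-free comparison of one proteoform between two cell types usually comes down to comparing intensities on a log scale. The minimal sketch below computes a log2 fold change from replicate intensities. The function name and the numbers are assumptions, and the notes do not record which normalization or statistical test the authors used, so this is only the basic arithmetic.

```python
import math
from statistics import mean


def log2_fold_change(intensities_a, intensities_b):
    """Difference of mean log2 intensities (group A minus group B).

    Inputs are lists of positive intensities for the same proteoform,
    one value per replicate.
    """
    log_a = [math.log2(x) for x in intensities_a]
    log_b = [math.log2(x) for x in intensities_b]
    return mean(log_a) - mean(log_b)


# Made-up intensities: four times higher in B cells than in T cells.
b_cell = [8.0, 8.0]
t_cell = [2.0, 2.0]
print(log2_fold_change(b_cell, t_cell))  # 2.0
```

A real analysis would also need normalization across runs, handling of missing values, and multiple-testing correction across all quantified proteoforms.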

Key PTM examples detected

Histone modifications (acetylation, methylation, trimethylation) in cell-type-specific patterns
N-terminal acetylation; phosphorylation patterns
Multiple isoform combinations (splicing + PTMs)

Clinical context and connection to our work

Relevance to PTM-CQFD project: BPA provides the benchmark for what deep TDP can achieve in blood cells; PTM-CQFD aims to do the analogous thing in human serum (extracellular proteins, not cell proteomes)
Key distinction: BPA focuses on intracellular proteins in sorted cell populations; PTM-CQFD focuses on secreted/circulating proteins (plasma/serum) — different matrix, different challenges
Limitation note (as cited in PTM-CQFD project application): BPA approach requires cell sorting, large amounts of material, specialized equipment — not directly translatable to clinical serum samples. PTM-CQFD addresses the clinically accessible matrix gap.

Limitations

Cell type specificity demonstrated but not yet linked to clinical outcomes
Most proteoforms <30 kDa (smaller proteins better covered by TDP)
Not a serum/plasma study: clinical accessibility requires a different approach (our group’s expertise)
No pharmacological or disease cohort — reference map only

Connections

Top-down proteomics — the Kelleher lab is the global leader in TDP methodology; BPA is the methodological benchmark
PTM-CQFD project — explicitly cited in the ImpactHealth application as the US atlas to complement with European serum-focused approach
HSA — not directly covered in BPA (serum protein, not intracellular) but establishes the TDP ecosystem
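Bottom-up proteomics: BPA demonstrates why TDP is better suited to proteoform resolution (bottom-up digests proteins into peptides, so the link between modifications on the same molecule is lost and proteoforms cannot be assigned reliably)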

Take home notes

The BPA is the Kelleher lab’s landmark Science paper — defines the field of proteoform biology in blood. It is the key international benchmark cited in the PTM-CQFD grant application.
The core argument from the BPA for our work: proteoforms are fundamentally different from proteins and are far more cell/state-specific — this validates the entire top-down isoform approach of ALBOM and PTM-CQFD.
The fact that 58% of proteoforms are unique to ONE cell type is astonishing — if the same pattern holds for disease states, then PTM profiles could be exquisitely specific disease indicators.
Computational bottleneck: 30,000 proteoforms require sophisticated bioinformatics. PTM-CQFD explicitly plans AI integration — a direct analog to the data analysis challenge Kelleher faced.
One critical difference: BPA works with sorted cell populations (requiring leukapheresis/FACS). Our serum approach requires only a blood draw, which should make it considerably easier to deploy clinically.
