Human Serum Albumin Profiling by Top-Down Analysis Enables Multi-Class Liver Fibrosis Staging: A Cross-Platform Validation Study
Bibliographic info
- Authors: Souleiman El Balkhi*, Racym Berrah*, François Ludovic Sauvage, Léa Le Du, Mohamad Ali Rahali, Roy Lakis, Pierre Marquet, Franck Saint-Marcoux, Véronique Loustaud-Ratti, Paul Carrier (*equal contribution)
- Journal: Scientific Reports (2026) — PUBLISHED, Article in Press
- DOI: 10.1038/s41598-026-57614-y
- Received: 9 April 2026 · Accepted: 8 June 2026 · Open Access (CC BY-NC-ND 4.0)
- Funding: local funding from CHU Limoges. Authors declare no conflicts of interest.
- Institution: P&T UMR1248 INSERM / CHU Limoges, France
- Manuscript file: raw/my_work/ALBOM for SciRep0506_clean.docx; published PDF: raw/... /s41598-026-57614-y_reference.pdf
- Longitudinal / multicentric follow-up study: MALAHBAR (NCT06318949), ongoing; 560 patients enrolled across 8 French CHUs (target >700); expected to validate the SEB test and isoform signature in 2027
Key question
Can the profile of circulating HSA isoforms, quantified by top-down LC-MS, non-invasively stage chronic liver disease across its full spectrum (F0/F1 → F4_C), and is this signature reproducible across LC-HR-MS platforms from different manufacturers?
Methods
Study design
- Type: Prospective, single-centre, cross-sectional
- Period: January 2021 – January 2023
- Centre: Department of Hepatology, CHU Limoges, France
Cohort
| Group | n | Notes |
|---|---|---|
| Healthy controls | 82 | No clinical or biological evidence of liver disease |
| CLD total | 172 | All fibrosis stages |
| — F0/F1 | 36 (20.9%) | Early fibrosis |
| — F2 | 23 (13%) | Significant fibrosis |
| — F3 | 30 (17%) | Advanced fibrosis |
| — F4_A | 37 (21.5%) | Compensated cirrhosis (Child-Pugh A) |
| — F4_B | 26 (15.1%) | Decompensated cirrhosis (Child-Pugh B, MELD ~15) |
| — F4_C | 20 (11.6%) | Decompensated cirrhosis (Child-Pugh C, MELD ~21) |
Etiology by fibrosis stage (from Table 1):
| Etiology | F0/F1 | F2 | F3 | F4_A | F4_B | F4_C | Total |
|---|---|---|---|---|---|---|---|
| MASH | 13 | 13 | 22 | 12 | 3 | 0 | 63 (37%) |
| Alcohol (ALD) | 1 | 0 | 0 | 9 | 16 | 14 | 40 (23%) |
| HBV | 10 | 1 | 2 | 2 | 0 | 0 | 15 (9%) |
| HCV | 2 | 3 | 1 | 5 | 0 | 0 | 11 (6%) |
| AIH | 3 | 2 | 4 | 1 | 0 | 1 | 11 (6%) |
| ALD+MASH | 0 | 0 | 0 | 2 | 5 | 4 | 11 (6%) |
| PBC | 3 | 0 | 0 | 0 | 0 | 0 | 3 |
| PSC | 0 | 2 | 0 | 0 | 0 | 0 | 2 |
| Cryptogenic | 0 | 0 | 1 | 3 | 0 | 0 | 4 |
| Other | 4 | 2 | 0 | 3 | 2 | 1 | 12 |
⚠️ Critical etiology–stage correlation: MASH dominates early/intermediate fibrosis (F0/F1–F3), whereas ALD and ALD+MASH together dominate decompensated cirrhosis (F4_B: 21/26, 81%; F4_C: 18/20, 90%). Stage and etiology are therefore confounded when interpreting isoform patterns.
Clinical decompensation in F4 patients (from Table 1):
- Average MELD: F4_A = 8 (compensated) | F4_B = 15 | F4_C = 21
- Ascites: F4_A 0 | F4_B 14 | F4_C 16 patients
- Hepatic encephalopathy (grade II/III): F4_A 1 | F4_B 7 | F4_C 10 patients
Staging methods used (from Table 2 — n patients, proportion):
| Stage | FibroScan | Liver biopsy | Both |
|---|---|---|---|
| F0/F1 | 35 (97%) | 4 (11%) | 3 (8%) |
| F2 | 23 (100%) | 5 (22%) | 5 (22%) |
| F3 | 28 (93%) | 12 (40%) | 10 (33%) |
| F4_A | 25 (68%) | 5 (13%) | 2 (5%) |
| F4_B | 3 (11%) | 6 (23%) | 1 (4%) |
| F4_C | 4 (20%) | 8 (40%) | 1 (5%) |
⚠️ F4_B and F4_C patients could not reliably undergo FibroScan (ascites, decompensation) — biopsy or clinical consensus was primary. This highlights why alternative non-invasive markers are most needed at these stages.
Analytical method
- Core technique: Top-down LC-HR-MS (intact HSA isoform profiling)
- Sample prep: 1:50 (v/v) plasma dilution in 0.9% NaCl; equine myoglobin (4 g/L final) as internal standard for mass recalibration and quantification
- Chromatography: C4 reverse-phase column; gradient elution
- Ionization / mass analyzer: electrospray ionization (ESI) coupled to a QTOF analyzer
- Mass range: 66,000–68,000 Da (deconvoluted)
- Quantification: absolute (g/L), using internal calibration method validated in Lakis et al. [ref 22]
- Platforms:
  - Platform 1 (P1): Bruker timsTOF Pro2
  - Platform 2 (P2): Sciex TripleTOF 5600+
Fibrosis staging method
Hierarchical approach: primary = transient elastography (FibroScan); confirmed by liver biopsy (METAVIR) when available; consensus diagnosis for cases where neither was feasible.
Machine learning
- Preprocessing: TIC normalization → Probabilistic Quotient Normalization (PQN)
- Feature selection: random-forest permutation importance; k = 75 spectral features (66,000–68,000 Da) chosen by 5-fold cross-validated quadratic weighted kappa (QWK)
- Model: Hierarchical OrdinalForest (HOF) — best of 3 architectures tested (RF, HRF, HOF)
- Combined model: 75 spectral features + 4 clinical variables (total protein, albumin, INR, bilirubin)
- Software: R v4.4.2 (ordinalForest, ranger, yardstick packages); RStudio 2025.05.1+513
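The preprocessing chain above (TIC normalization followed by PQN) can be sketched in a few lines. This is a minimal standard-library illustration, not the study's R code: the function names, the use of the per-bin median spectrum as the PQN reference, and the toy intensities are all assumptions made for the example.

```python
from statistics import median

def tic_normalize(spectrum):
    """Scale one spectrum so its intensities sum to 1 (total ion current)."""
    total = sum(spectrum)
    return [value / total for value in spectrum]

def pqn_normalize(spectra):
    """Probabilistic Quotient Normalization applied after TIC scaling.

    Assumption: the reference spectrum is the per-bin median across samples.
    """
    scaled = [tic_normalize(s) for s in spectra]
    reference = [median(column) for column in zip(*scaled)]
    normalized = []
    for spectrum in scaled:
        # Quotient of each bin against the reference; the median quotient
        # estimates the sample's overall dilution factor.
        quotients = [v / r for v, r in zip(spectrum, reference) if r > 0]
        factor = median(quotients)
        normalized.append([v / factor for v in spectrum])
    return normalized

# Hypothetical 4-bin spectra: a base profile, a 2-fold diluted copy,
# and a copy whose last peak is doubled.
spectra = [
    [10.0, 40.0, 20.0, 30.0],
    [5.0, 20.0, 10.0, 15.0],
    [10.0, 40.0, 20.0, 60.0],
]
normalized = pqn_normalize(spectra)
```

In this toy case TIC scaling alone would shrink the three unchanged peaks of the third spectrum, because the doubled peak inflates the total. The PQN step restores them to the base profile and leaves only the altered peak elevated.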
Main findings
1. Native HSA declines with CLD severity
| Stage | Native HSA (g/L) | AUC vs controls |
|---|---|---|
| Controls | 12.2 | — |
| F0/F1 | 10.6 | 0.67 |
| F2 | 9.9 | — |
| F3 | 10.6 | — |
| F4_A | 10.2 | — |
| F4_B | 4.1 | 0.99 |
| F4_C | 4.2 | 0.89 |
Key finding: native HSA is reliable for decompensated cirrhosis (F4_B: AUC = 0.99) but has limited discriminative value for early fibrosis (F0/F1: AUC = 0.67).
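For reference, an AUC such as those in the table above can be read as the probability that a randomly chosen control has a higher native HSA concentration than a randomly chosen patient. The sketch below computes that quantity by pairwise comparison (the Mann-Whitney formulation). The concentrations are invented for illustration and are not study data; the function name is hypothetical.

```python
def auc_lower_in_disease(controls, patients):
    """AUC for a marker expected to be lower in patients than in controls.

    Counts control/patient pairs where the patient value is lower;
    ties count as half a win.
    """
    wins = 0.0
    for c in controls:
        for p in patients:
            if p < c:
                wins += 1.0
            elif p == c:
                wins += 0.5
    return wins / (len(controls) * len(patients))

# Invented native HSA values (g/L), for illustration only.
controls = [12.5, 11.8, 12.9, 10.9]
patients = [4.0, 5.2, 3.8, 11.0]
auc = auc_lower_in_disease(controls, patients)
print(auc)  # 15 of 16 pairs are correctly ordered -> 0.9375
```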
2. Ten isoforms identified — three pattern types
Pattern A — Progressive decrease (truncation / irreversible oxidation)
- HSA-DA (N-terminal Asp-Ala truncation): ↓ across all stages vs controls (~0.2 g/L in controls); earliest-onset change. ⚠️ Unexpected; may reflect accelerated clearance
- HSA-L (C-terminal truncation): no significant change
- HSA-DA+CYS (truncated-cysteinylated): ↓ F4_B and F4_C
- HSA+SO₃H (irreversible sulfonylation): ↓ F4_B and F4_C
Pattern B — Biphasic (moderate oxidative stress / glycation)
- HSA+CYS (cysteinylation Cys34): controls 8.7 g/L → peak F4_A 11.1 g/L → F4_B 7.9 g/L → F4_C 6.8 g/L
- HSA+GLYC (mono-glycation): similar biphasic; peak F3/F4_A
- HSA+CYS+GLYC (cysteinylated + glycated): peak 1.9 g/L at F3 → declines F4_B/C
- HSA+2GLYC (doubly glycated): biphasic; peak at F4_A
Pattern C — Progressive increase (multiply-modified, end-stage marker)
- HSA+CYS+2GLYC (doubly glycated + cysteinylated): ~undetectable in controls → F4_A 0.12 g/L → F4_B 0.18 g/L → F4_C 0.2 g/L ← marker of cumulative end-stage damage
3. Isoform ratios amplify diagnostic signal
Normalizing to native HSA resolves the apparent paradox of biphasic absolute concentrations:
| Ratio | Best discrimination | Sens | Spec |
|---|---|---|---|
| HSA+CYS/Native | F4_C vs controls | 65% | 99% |
| HSA+GLYC/Native | F4_B vs controls | 85% | 100% |
| HSA+GLYC/Native | F4_C vs controls | 70% | 99% |
| HSA+GLYC/Native | Controls vs F2 (earliest) | — | — |
| HSA+CYS+GLYC/Native | F4_B vs controls | 77% | 99% |
| HSA+CYS+GLYC/Native | F4_C vs controls | 70% | 99% |
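The normalization step can be illustrated with the group-level concentrations quoted earlier in this note for HSA+CYS and native HSA. This is only a sketch: dividing group summaries is not the same as averaging per-patient ratios, and the function name is hypothetical.

```python
# Group-level concentrations (g/L) quoted in this note.
native = {"Controls": 12.2, "F4_A": 10.2, "F4_B": 4.1, "F4_C": 4.2}
hsa_cys = {"Controls": 8.7, "F4_A": 11.1, "F4_B": 7.9, "F4_C": 6.8}

def isoform_ratio(isoform, reference):
    """Return the isoform/native ratio for each stage."""
    return {stage: isoform[stage] / reference[stage] for stage in isoform}

ratios = isoform_ratio(hsa_cys, native)
for stage, value in ratios.items():
    print(f"{stage}: HSA+CYS = {hsa_cys[stage]:.1f} g/L, ratio = {value:.2f}")
```

In absolute terms HSA+CYS in F4_B (7.9 g/L) sits below the control value (8.7 g/L), yet its ratio to native HSA (about 1.93) is well above the control ratio (about 0.71), because native HSA collapses in decompensated cirrhosis.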
4. PCA confirms stage-specific spectral fingerprint
PCA on the 66,000–67,500 Da mass region shows progressive separation of patient clusters from controls with each advancing stage (F0/F1 → F4_C), confirming the HSA molecular fingerprint evolves with disease.
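As a rough illustration of how PCA separates groups whose spectral intensity shifts between mass regions, the sketch below extracts the first principal component by power iteration on the covariance matrix. Power iteration is a stand-in chosen to keep the example dependency-free; it is not necessarily the routine used in the study. The three-bin "spectra" are invented.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (loading vector, scores) of the first principal component.

    Uses power iteration on the sample covariance matrix. Assumes the data
    have non-zero variance; the start vector is arbitrary.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [float(j + 1) for j in range(d)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores

# Invented normalized intensities in three mass bins
# (low-mass, mid, high-mass); rows 0-1 mimic controls, rows 2-3 patients.
spectra = [
    [0.60, 0.30, 0.10],
    [0.58, 0.31, 0.11],
    [0.30, 0.30, 0.40],
    [0.28, 0.32, 0.40],
]
loadings, scores = first_principal_component(spectra)
```

The two control-like rows land on one side of the first component and the two patient-like rows on the other, which is the kind of separation the PCA panels describe.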
5. OrdinalForest classifier outperforms FIB-4
| Metric | Platform 1 (Bruker) | Platform 2 (Sciex) |
|---|---|---|
| QWK | 0.862 (95% CI: 0.735–0.923) | 0.916 (95% CI: 0.822–0.964) |
| Accuracy (3-class triage) | 81.5% | — |
- FIB-4 comparator: 59.3% accuracy (3-class), versus 81.5% for the LC-TOF + Clinical model on Platform 1 (+22.2 percentage points)
- F4_C: near-perfect (4/4 both platforms); F4_A: most misclassified (2/7 P1) — reflects biological continuum of compensated cirrhosis
- FIB-4 gray zone (1.30–2.67): 62.5% of in-zone patients correctly triaged by ALBOM model ⚠️ preliminary (n=8)
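QWK, the headline metric above, penalizes a misclassification by the squared distance between the true and predicted stage, so an adjacent-stage error costs far less than a distant one. The sketch below is a minimal implementation of the standard formula. Encoding the six CLD stages as integers 0 to 5 is an assumption for the example, and the labels are invented.

```python
def quadratic_weighted_kappa(truth, predicted, n_classes):
    """Cohen's kappa with quadratic disagreement weights.

    Labels are integers 0..n_classes-1 in ordinal order.
    """
    n = len(truth)
    observed = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(truth, predicted):
        observed[t][p] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(n_classes))
                  for j in range(n_classes)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row_totals[i] * col_totals[j] / n
    return 1.0 - weighted_observed / weighted_expected

# Hypothetical encoding: 0 = F0/F1, 1 = F2, 2 = F3, 3 = F4_A, 4 = F4_B, 5 = F4_C
truth = [0, 1, 2, 3, 4, 5]
perfect = quadratic_weighted_kappa(truth, truth, 6)
near_miss = quadratic_weighted_kappa(truth, [0, 1, 2, 3, 4, 4], 6)  # F4_C called F4_B
far_miss = quadratic_weighted_kappa(truth, [0, 1, 2, 3, 4, 0], 6)   # F4_C called F0/F1
```

Both error cases contain a single wrong prediction, but the near miss keeps kappa close to 1 while the far miss drops it sharply. This is why QWK suits ordinal staging better than plain accuracy.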
6. Cross-platform reproducibility confirmed
- McNemar’s test on paired predictions: p = 0.149 → no significant difference between platforms
- Jaccard Similarity Index of errors: 0.696 → ~70% of misclassifications identical → errors are biologically driven, not instrument-specific
- Both classifiers fall in the “substantial to near-perfect agreement” range, with overlapping QWK confidence intervals
- ⚠️ Platform 1 baseline correction algorithm suppresses broad high-mass peaks (>67,500 Da) → partial attenuation of diagnostic signal for glycated species in F4_B/C — actionable optimization target
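The two agreement statistics above can be sketched as follows. The Jaccard index compares the sets of patients misclassified on each platform; McNemar's test looks only at discordant patients (wrong on one platform, right on the other). The exact binomial form of McNemar's test is used here as an assumption, since this note does not say which variant the paper applied. The patient IDs are invented.

```python
from math import comb

def jaccard(errors_a, errors_b):
    """Shared errors divided by all errors made on either platform."""
    union = errors_a | errors_b
    return len(errors_a & errors_b) / len(union) if union else 1.0

def mcnemar_exact(only_a_wrong, only_b_wrong):
    """Two-sided exact McNemar p-value from the discordant counts."""
    n = only_a_wrong + only_b_wrong
    if n == 0:
        return 1.0
    k = min(only_a_wrong, only_b_wrong)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Invented IDs of misclassified patients on each platform.
errors_p1 = {3, 7, 12, 18}
errors_p2 = {3, 7, 12, 21, 25}

similarity = jaccard(errors_p1, errors_p2)
p_value = mcnemar_exact(len(errors_p1 - errors_p2), len(errors_p2 - errors_p1))
print(similarity, p_value)  # 3 shared of 6 total errors -> 0.5; 1 vs 2 discordant -> 1.0
```

A high Jaccard index with a non-significant McNemar result is the pattern reported above: the platforms mostly fail on the same patients, and neither fails systematically more often than the other.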
PTMs reported
| Protein | Isoform | PTM(s) | Mass shift (Da) | Pattern |
|---|---|---|---|---|
| HSA | Native | None (Cys34 free) | 0 | Progressive ↓ |
| HSA | HSA-DA | N-term Asp-Ala truncation | −186 | ↓ all stages |
| HSA | HSA-L | C-term truncation | variable | No change |
| HSA | HSA+CYS | Cysteinylation (Cys34) | +119 | Biphasic |
| HSA | HSA+SO₃H | Sulfonylation (Cys34) | +48 | ↓ F4_B/C |
| HSA | HSA+GLYC | Glycation (Lys) | +162 | Biphasic |
| HSA | HSA-DA+CYS | Truncation + cysteinylation | −186 +119 | ↓ F4_B/C |
| HSA | HSA+CYS+GLYC | Cysteinylation + glycation | +281 | Biphasic |
| HSA | HSA+2GLYC | Double glycation | +324 | Biphasic |
| HSA | HSA+CYS+2GLYC | Cysteinylation + double glycation | +443 | Progressive ↑ (end-stage marker) |
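The mass shifts of the multiply-modified isoforms in the table are simply sums of the single-modification shifts (+119 Da per cysteinylation, +162 Da per glycation). A short check, with a hypothetical helper name:

```python
# Nominal single-modification shifts (Da) taken from the table above.
CYS_SHIFT = 119
GLYC_SHIFT = 162

def combined_shift(n_cys, n_glyc):
    """Nominal mass shift for n_cys cysteinylations and n_glyc glycations."""
    return n_cys * CYS_SHIFT + n_glyc * GLYC_SHIFT

shifts = {
    "HSA+CYS+GLYC": combined_shift(1, 1),
    "HSA+2GLYC": combined_shift(0, 2),
    "HSA+CYS+2GLYC": combined_shift(1, 2),
}
print(shifts)  # {'HSA+CYS+GLYC': 281, 'HSA+2GLYC': 324, 'HSA+CYS+2GLYC': 443}
```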
Clinical context
- Disease: Liver fibrosis / Cirrhosis (full spectrum F0→F4_C)
- Clinical problem: Liver biopsy is invasive, sampling-dependent, not repeatable → need non-invasive staging
- Existing tools: Transient elastography (physical surrogate), FIB-4 (score), FibroTest, FibroMeter (panels)
- Our approach: HSA molecular fingerprint as direct biochemical window into liver function — not just a structural proxy
- Paradigm shift proposed: From anatomical/structural staging (METAVIR histology, elastometry stiffness) → functional and molecular staging via HSA PTM profile
Limitations
- Cross-sectional design: no intra-individual longitudinal tracking (MALAHBAR NCT06318949 underway)
- FibroScan limitations for F2/F3 separation — explains cluster dispersion in intermediate stages
- Etiology distribution not uniform across severity groups (MASH predominates in F0/F1–F3, ALD in F4_B/C); confounder risk
- Platform 1 baseline correction suppresses high-mass glycated species >67,500 Da — must be controlled
- Gray zone FIB-4 analysis is hypothesis-generating only (n=8)
Figures
Main figures
| Figure | File | Description |
|---|---|---|
| Fig 1 | raw/assets/Fig 1.jpg | Native HSA concentration (dot plots, both platforms) + ROC curves by stage |
| Fig 2 | raw/assets/Fig 2.jpg | Nine modified isoform concentrations across CLD stages |
| Fig 3 | raw/assets/Fig 3.jpg | Normalized isoform ratios (CYS/Native, GLYC/Native, CYS+GLYC/Native) + ROC curves |
| Fig 4 | raw/assets/Fig_4_300dpi.png | PCA of full spectral profile (66,000–67,500 Da); pairwise control vs. each stage |
| Fig 5 | raw/assets/Fig_5_AB_fixed.png | Confusion matrices for OrdinalForest (Platform 1 and Platform 2) |
| Fig 6 | (not in assets) | Patient triage scatter (FIB-4 vs ALBOM, colored by correct/incorrect) |
Supplemental figures (from raw/my_work/Supplemental data.docx)
| Figure | Description |
|---|---|
| Fig S1 | Classical markers by stage: routine albumin (g/L), FIB-4, AST (U/L), total bilirubin (µmol/L), ALT (U/L) |
| Fig S2 | ROC curves for classical markers (routine albumin, AST, bilirubin) and native HSA by LC-HR-MS vs controls |
| Fig S3 | Feature importance lollipop chart — top 30 OrdinalForest predictors. Top 4 = clinical variables (total protein, albumin, INR, bilirubin); spectral peaks cluster in two sub-regions: ~66,230–66,600 Da (native/cysteinylated) and ~67,024–67,457 Da (poly-glycated) |
| Fig S4 | Bootstrap QWK density plots (1,000 iterations) — Bruker vs Sciex; overlapping distributions confirm platform independence |
| Fig S5 | Cross-platform scatter — patient-level predictions Bruker vs Sciex; alignment along identity line; Jaccard=0.696, McNemar p=0.149 |
Classical marker values (from Supplemental data)
| Marker | Controls (n=19) | F0/F1 | F2 | F3 | F4_A | F4_B | F4_C |
|---|---|---|---|---|---|---|---|
| Routine albumin (g/L) | 45 | 47.5 | 44.6 | 42.7 | 40.7 | 29.9 | 26.4 |
| FIB-4 | Low | Low | ↑ | ↑ | ↑↑ | ↑↑↑ | ↑↑↑↑ |
| AST (U/L) | Low | Low | Low | Low | ↑↑ (sig vs F0/F1) | ↑↑↑ | ↑↑↑ |
| Bilirubin (µmol/L) | Low | Low | Low | Low | Low | ↑↑↑↑ | ↑↑↑↑↑ |
| ALT | — | — | — | — | — | — | Not discriminatory |
Routine albumin becomes abnormal only at F4 (all sub-classes), later and with less granularity than native HSA measured by LC-HR-MS, which reaches AUC = 0.99 for F4_B.
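For context on the FIB-4 row, the score is computed with the standard formula: age (years) × AST (U/L) divided by platelets (10⁹/L) × √ALT (U/L). The sketch below applies it with the 1.30 and 2.67 gray-zone bounds quoted earlier in this note. The function names, the category labels and the patient values are invented for illustration.

```python
import math

def fib4(age_years, ast_u_l, alt_u_l, platelets_10e9_l):
    """FIB-4 index: (age x AST) / (platelets x sqrt(ALT))."""
    return (age_years * ast_u_l) / (platelets_10e9_l * math.sqrt(alt_u_l))

def fib4_zone(score, low=1.30, high=2.67):
    """Place a FIB-4 score relative to the gray zone (labels are hypothetical)."""
    if score < low:
        return "low"
    if score > high:
        return "high"
    return "indeterminate"

# Invented patient: 60 years, AST 40 U/L, ALT 36 U/L, platelets 200 x 10^9/L.
score = fib4(60, 40, 36, 200)
print(score, fib4_zone(score))  # 2.0 indeterminate
```

Patients who land in the indeterminate band are the ones for whom the gray-zone analysis described earlier is relevant.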
Connections
- HSA — sole protein studied; 10 isoforms quantified
- ALBOM study — this manuscript IS the ALBOM study publication
- Glycation — key PTM; biphasic pattern; ratio to native amplifies diagnostic signal
- Cysteinylation — key PTM at Cys34; biphasic pattern
- Liver fibrosis — disease context; CLD staging F0→F4_C
- Top-down proteomics — core analytical method; two platforms (Bruker timsTOF Pro2, Sciex TripleTOF 5600+)
- CQFD-PTM pipeline — quantification pipeline underlying the HSA isoform quantification
- OrdinalForest classification — ML model used for multivariate staging
- hsa-isoforms-cld — biomarker summary page
- DILI — related liver disease context (drug-induced liver injury is a cause of CLD)
My notes
- The biphasic paradox is the key conceptual contribution: absolute concentrations of CYS and GLYC isoforms peak in compensated cirrhosis then fall in decompensation, because the substrate (native HSA) is depleted. Ratios (isoform/native) resolve this and reveal the true signal. Any future study comparing absolute vs. ratio metrics should cite this.
- HSA+CYS+2GLYC is the end-stage marker to watch: monotonically increasing, near-zero in controls, measurable in all F4 sub-classes. Excellent biological specificity.
- HSA-DA anomaly: N-terminally truncated form is lower in patients than controls at all stages — counterintuitive for a damage product. Worth investigating: is this a clearance effect, or is the control population exceptional?
- FIB-4 gray zone: the 62.5% classification rate in the indeterminate zone is exciting but n=8 is underpowered. MALAHBAR will be key.
- Platform 1 caveat: the Bruker baseline correction algorithm geometrically misidentifies broad poly-glycated peaks as baseline drift. This is a fixable technical artifact, not a biological limitation.
- Etiology confound is real: MASH patients dominate F0/F1–F3; ALD dominates F4_B/C. If MASH and ALD have different isoform profiles (plausible given different oxidative stress mechanisms), then the “stage effect” may partly be an “etiology effect” in the cirrhosis sub-classes. Etiology-stratified analysis is a key gap.
- FibroScan failure in decompensated patients: F4_B/C patients mostly staged by biopsy or clinical consensus (FibroScan only 11–20%) — exactly the group where a non-invasive blood test would add the most value.
- Feature importance (S3): the two spectral sub-regions driving the classifier are ~66,230–66,600 Da (native + cysteinylated isoforms, early-to-mid disease signal) and ~67,024–67,457 Da (poly-glycated adducts, advanced disease signal). This is consistent with the biological 3-pattern model.