Human Serum Albumin Profiling by Top-Down Analysis Enables Multi-Class Liver Fibrosis Staging: A Cross-Platform Validation Study

Bibliographic info

  • Authors: Souleiman El Balkhi*, Racym Berrah*, François Ludovic Sauvage, Léa Le Du, Mohamad Ali Rahali, Roy Lakis, Pierre Marquet, Franck Saint-Marcoux, Véronique Loustaud-Ratti, Paul Carrier (*equal contribution)
  • Journal: Scientific Reports (2026) — PUBLISHED, Article in Press
  • DOI: 10.1038/s41598-026-57614-y
  • Received: 9 April 2026 · Accepted: 8 June 2026 · Open Access (CC BY-NC-ND 4.0)
  • Funding: local funding from CHU Limoges. Authors declare no conflicts of interest.
  • Institution: P&T UMR1248 INSERM / CHU Limoges, France
  • Manuscript file: raw/my_work/ALBOM for SciRep0506_clean.docx; published PDF raw/... /s41598-026-57614-y_reference.pdf
  • Longitudinal / multicentric follow-up study: MALAHBAR (NCT06318949) — ongoing; 560 patients enrolled across 8 French CHUs (target >700); expected to validate the SEB test and isoform signature in 2027

Key question

Can the profile of circulating HSA isoforms, quantified by top-down LC-MS, non-invasively stage chronic liver disease across its full spectrum (F0/F1 → F4_C), and is this signature reproducible across LC-HR-MS platforms from different manufacturers?

Methods

Study design

  • Type: Prospective, single-centre, cross-sectional
  • Period: January 2021 – January 2023
  • Centre: Department of Hepatology, CHU Limoges, France

Cohort

Group | n | Notes
Healthy controls | 82 | No clinical or biological evidence of liver disease
CLD total | 172 | All fibrosis stages
- F0/F1 | 36 (20.9%) | Early fibrosis
- F2 | 23 (13%) | Significant fibrosis
- F3 | 30 (17%) | Advanced fibrosis
- F4_A | 37 (21.5%) | Compensated cirrhosis (Child-Pugh A)
- F4_B | 26 (15.1%) | Decompensated cirrhosis (Child-Pugh B, MELD ~15)
- F4_C | 20 (11.6%) | Decompensated cirrhosis (Child-Pugh C, MELD ~21)

Etiology by fibrosis stage (from Table 1):

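Etiology | F0/F1 | F2 | F3 | F4_A | F4_B | F4_C | Total
MASH | 13 | 13 | 22 | 12 | 3 | 0 | 62 (36%)
Alcohol (ALD) | 1 | 0 | 0 | 9 | 16 | 14 | 40 (23%)
HBV | 10 | 1 | 2 | 2 | 0 | 0 | 15 (9%)
HCV | 2 | 3 | 1 | 5 | 0 | 0 | 11 (6%)
AIH | 3 | 2 | 4 | 1 | 0 | 1 | 11 (6%)
ALD+MASH | 0 | 0 | 0 | 2 | 5 | 4 | 11 (6%)
PBC | 3 | 0 | 0 | 0 | 0 | 0 | 3
PSC | 0 | 2 | 0 | 0 | 0 | 0 | 2
Cryptogenic | 0 | 0 | 1 | 3 | 0 | 0 | 4
Other | 4 | 2 | 0 | 3 | 2 | 1 | 12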

⚠️ Critical etiology–stage correlation: MASH dominates early and intermediate fibrosis (F0/F1–F3), whereas ALD and ALD+MASH together account for most decompensated cirrhosis (F4_B 21/26, 81%; F4_C 18/20, 90%). Stage and etiology are therefore confounded when isoform patterns are interpreted.

Clinical decompensation in F4 patients (from Table 1):

  • Average MELD: F4_A = 8 (compensated) | F4_B = 15 | F4_C = 21
  • Ascites: F4_A 0 | F4_B 14 | F4_C 16 patients
  • Hepatic encephalopathy (grade II/III): F4_A 1 | F4_B 7 | F4_C 10 patients

Staging methods used (from Table 2 — n patients, proportion):

Stage | FibroScan | Liver biopsy | Both
F0/F1 | 35 (97%) | 4 (11%) | 3 (8%)
F2 | 23 (100%) | 5 (22%) | 5 (22%)
F3 | 28 (93%) | 12 (40%) | 10 (33%)
F4_A | 25 (68%) | 5 (13%) | 2 (5%)
F4_B | 3 (11%) | 6 (23%) | 1 (4%)
F4_C | 4 (20%) | 8 (40%) | 1 (5%)

⚠️ F4_B and F4_C patients could not reliably undergo FibroScan (ascites, decompensation) — biopsy or clinical consensus was primary. This highlights why alternative non-invasive markers are most needed at these stages.

Analytical method

  • Core technique: Top-down LC-HR-MS (intact HSA isoform profiling)
  • Sample prep: 1:50 (v/v) plasma dilution in 0.9% NaCl; equine myoglobin (4 g/L final) as internal standard for mass recalibration and quantification
  • Chromatography: C4 reverse-phase column; gradient elution
  • Ionization / mass analyzer: electrospray ionization (ESI) with QTOF mass analysis
  • Mass range: 66,000–68,000 Da (deconvoluted)
  • Quantification: absolute (g/L), using internal calibration method validated in Lakis et al. [ref 22]
  • Platforms:
    • Platform 1 (P1): Bruker timsTOF Pro2
    • Platform 2 (P2): Sciex TripleTOF 5600+

Fibrosis staging method

Hierarchical approach: primary = transient elastography (FibroScan); confirmed by liver biopsy (METAVIR) when available; consensus diagnosis for cases where neither was feasible.

Machine learning

  • Preprocessing: TIC normalization → Probabilistic Quotient Normalization (PQN)
  • Feature selection: random-forest permutation importance; k = 75 spectral features (66,000–68,000 Da) chosen by 5-fold cross-validated quadratic weighted kappa (QWK)
  • Model: Hierarchical OrdinalForest (HOF) — best of 3 architectures tested (RF, HRF, HOF)
  • Combined model: 75 spectral features + 4 clinical variables (total protein, albumin, INR, bilirubin)
  • Software: R v4.4.2 (ordinalForest, ranger, yardstick packages); RStudio 2025.05.1+513
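
The preprocessing chain listed above (TIC normalization followed by PQN) can be illustrated with a minimal Python sketch. This is not the study's R code: the function names are hypothetical, the input is assumed to be a list of equally binned intensity vectors, and the reference spectrum is assumed to be the per-bin median across all samples (a control-only reference would be an equally plausible choice).

```python
from statistics import median


def tic_normalize(spectrum):
    """Scale one spectrum so that its intensities sum to 1 (total ion current)."""
    total = sum(spectrum)
    if total <= 0:
        raise ValueError("spectrum has no signal")
    return [x / total for x in spectrum]


def pqn_normalize(spectra):
    """Probabilistic quotient normalization of TIC-normalized spectra.

    Assumption: the reference is the per-bin median over all samples.
    """
    tic = [tic_normalize(s) for s in spectra]
    n_bins = len(tic[0])
    reference = [median(s[j] for s in tic) for j in range(n_bins)]
    normalized = []
    for s in tic:
        # Quotients are taken only where both sample and reference have signal.
        quotients = [
            s[j] / reference[j]
            for j in range(n_bins)
            if reference[j] > 0 and s[j] > 0
        ]
        factor = median(quotients)  # most probable dilution factor
        normalized.append([x / factor for x in s])
    return normalized


# Hypothetical toy spectra that differ only by an overall dilution factor.
spectra = [[1, 2, 3, 4], [2, 4, 6, 8], [10, 20, 30, 40]]
normalized = pqn_normalize(spectra)
```

In this toy case the three spectra collapse onto the same profile, which is the behaviour PQN is meant to provide when samples differ mainly in overall dilution.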

Main findings

1. Native HSA declines with CLD severity

Stage | Native HSA (g/L) | AUC vs controls
Controls | 12.2 |
F0/F1 | 10.6 | 0.67
F2 | 9.9 |
F3 | 10.6 |
F4_A | 10.2 |
F4_B | 4.1 | 0.99
F4_C | 4.2 | 0.89

Key finding: native HSA is reliable for decompensated cirrhosis (F4_B: AUC=0.99) but limited for early fibrosis.

2. Ten isoforms identified — three pattern types

Pattern A — Progressive decrease (truncation / irreversible oxidation)

  • HSA-DA (N-terminal –Asp truncation): ↓ across all stages vs controls (~0.2 g/L in controls); earliest-onset change ⚠️ unexpected — may reflect accelerated clearance
  • HSA-L (C-terminal truncation): no significant change
  • HSA-DA+CYS (truncated-cysteinylated): ↓ F4_B and F4_C
  • HSA+SO₃H (irreversible sulfonylation): ↓ F4_B and F4_C

Pattern B — Biphasic (moderate oxidative stress / glycation)

  • HSA+CYS (cysteinylation Cys34): controls 8.7 g/L → peak F4_A 11.1 g/L → F4_B 7.9 g/L → F4_C 6.8 g/L
  • HSA+GLYC (mono-glycation): similar biphasic; peak F3/F4_A
  • HSA+CYS+GLYC (cysteinylated + glycated): peak 1.9 g/L at F3 → declines F4_B/C
  • HSA+2GLYC (doubly glycated): biphasic; peak at F4_A

Pattern C — Progressive increase (multiply-modified, end-stage marker)

  • HSA+CYS+2GLYC (doubly glycated + cysteinylated): ~undetectable in controls → F4_A 0.12 g/L → F4_B 0.18 g/L → F4_C 0.2 g/L ← marker of cumulative end-stage damage

3. Isoform ratios amplify diagnostic signal

Normalizing to native HSA resolves the apparent paradox of biphasic absolute concentrations:

Ratio | Best discrimination | Sens | Spec
HSA+CYS/Native | F4_C vs controls | 65% | 99%
HSA+GLYC/Native | F4_B vs controls | 85% | 100%
HSA+GLYC/Native | F4_C vs controls | 70% | 99%
HSA+GLYC/Native | Controls vs F2 (earliest) | |
HSA+CYS+GLYC/Native | F4_B vs controls | 77% | 99%
HSA+CYS+GLYC/Native | F4_C vs controls | 70% | 99%
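
As a worked illustration of the normalization, the sketch below divides the group-level HSA+CYS concentrations by the group-level native HSA concentrations quoted in findings 1 and 2. It is only an approximation of the idea: the study presumably computes ratios per patient, and whether the quoted values are means or medians is not stated here. The function name is hypothetical.

```python
# Group-level concentrations (g/L) copied from findings 1 and 2 of this note.
native = {"Controls": 12.2, "F4_A": 10.2, "F4_B": 4.1, "F4_C": 4.2}
hsa_cys = {"Controls": 8.7, "F4_A": 11.1, "F4_B": 7.9, "F4_C": 6.8}


def isoform_ratio(isoform, native_hsa):
    """Return isoform/native ratios for every stage present in `isoform`."""
    return {stage: isoform[stage] / native_hsa[stage] for stage in isoform}


ratios = isoform_ratio(hsa_cys, native)
for stage, ratio in ratios.items():
    print(f"{stage}: HSA+CYS = {hsa_cys[stage]:.1f} g/L, ratio = {ratio:.2f}")
```

With these numbers the absolute HSA+CYS concentration in F4_B (7.9 g/L) is below the control value (8.7 g/L), yet the ratio to native HSA rises from about 0.71 in controls to about 1.93 in F4_B, because native HSA falls much faster than the modified form.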

4. PCA confirms stage-specific spectral fingerprint

PCA on the 66,000–67,500 Da mass region shows progressive separation of patient clusters from controls with each advancing stage (F0/F1 → F4_C), confirming the HSA molecular fingerprint evolves with disease.

5. OrdinalForest classifier outperforms FIB-4

Metric | Platform 1 (Bruker) | Platform 2 (Sciex)
QWK | 0.862 (95% CI: 0.735–0.923) | 0.916 (95% CI: 0.822–0.964)
Accuracy (3-class triage) | 81.5% |
  • FIB-4 comparator: 59.3% accuracy (3-class) → +26 percentage points for LC-TOF + Clinical model
  • F4_C: near-perfect (4/4 both platforms); F4_A: most misclassified (2/7 P1) — reflects biological continuum of compensated cirrhosis
  • FIB-4 gray zone (1.30–2.67): 62.5% of in-zone patients correctly triaged by ALBOM model ⚠️ preliminary (n=8)
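
For readers unfamiliar with the headline metric, quadratic weighted kappa (QWK) penalizes a misclassification by the squared distance between the true and predicted ordinal stage, so confusing F2 with F3 costs far less than confusing F2 with F4_C. The sketch below is a hand-written Python version for illustration only; the study used R packages, and the toy labels are hypothetical, not study data.

```python
def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """Cohen's kappa with quadratic weights for labels coded 0..n_classes-1."""
    n = len(y_true)
    observed = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    row = [sum(observed[i]) for i in range(n_classes)]
    col = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            weighted_observed += weight * observed[i][j]
            # Expected count under independence of true and predicted labels.
            weighted_expected += weight * row[i] * col[j] / n
    if weighted_expected == 0:
        raise ValueError("kappa is undefined when all labels fall in one class")
    return 1.0 - weighted_observed / weighted_expected


def accuracy(y_true, y_pred):
    """Fraction of exact matches."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy labels on a 4-level ordinal scale.
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 1, 1, 1, 2, 3, 3, 3]
qwk = quadratic_weighted_kappa(y_true, y_pred, n_classes=4)
acc = accuracy(y_true, y_pred)
```

In this toy case both errors are off by a single stage, so QWK is 0.90 while plain accuracy is only 0.75, which shows why QWK suits ordinal staging better than accuracy alone.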

6. Cross-platform reproducibility confirmed

  • McNemar’s test on paired predictions: p = 0.149 → no significant difference between platforms
  • Jaccard Similarity Index of errors: 0.696 → ~70% of misclassifications identical → errors are biologically driven, not instrument-specific
  • Both classifiers fall in the “substantial to near-perfect agreement” range, and their QWK confidence intervals overlap
  • ⚠️ Platform 1 baseline correction algorithm suppresses broad high-mass peaks (>67,500 Da) → partial attenuation of diagnostic signal for glycated species in F4_B/C — actionable optimization target
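
The two agreement statistics above can be sketched in a few lines of Python. The patient identifiers and flags below are hypothetical, and the exact (binomial) form of McNemar's test is an assumption: this note does not say whether the study used the exact or the chi-square version.

```python
from math import comb


def jaccard_index(errors_a, errors_b):
    """Shared misclassified patients divided by all misclassified patients."""
    a, b = set(errors_a), set(errors_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mcnemar_exact(correct_a, correct_b):
    """Two-sided exact McNemar p-value from paired correct/incorrect flags."""
    only_a = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)
    only_b = sum(1 for x, y in zip(correct_a, correct_b) if y and not x)
    n = only_a + only_b  # discordant pairs
    if n == 0:
        return 1.0
    k = min(only_a, only_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical example: ten patients, three errors per platform, two shared.
patients = [f"p{i}" for i in range(1, 11)]
errors_p1 = {"p2", "p5", "p7"}
errors_p2 = {"p2", "p5", "p9"}
correct_p1 = [p not in errors_p1 for p in patients]
correct_p2 = [p not in errors_p2 for p in patients]

jaccard = jaccard_index(errors_p1, errors_p2)
p_value = mcnemar_exact(correct_p1, correct_p2)
```

A high Jaccard index means the two platforms tend to fail on the same patients, and a non-significant McNemar p-value means neither platform is wrong systematically more often than the other on discordant cases.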

PTMs reported

Protein | Isoform | PTM(s) | Mass shift (Da) | Pattern
HSA | Native | None (Cys34 free) | 0 | Progressive ↓
HSA | HSA-DA | N-term –Asp truncation | −115 | ↓ all stages
HSA | HSA-L | C-term truncation | variable | No change
HSA | HSA+CYS | Cysteinylation (Cys34) | +119 | Biphasic
HSA | HSA+SO₃H | Sulfonylation (Cys34) | +48 | ↓ F4_B/C
HSA | HSA+GLYC | Glycation (Lys) | +162 | Biphasic
HSA | HSA-DA+CYS | Truncation + cysteinylation | −115 +119 | ↓ F4_B/C
HSA | HSA+CYS+GLYC | Cysteinylation + glycation | +281 | Biphasic
HSA | HSA+2GLYC | Double glycation | +324 | Biphasic
HSA | HSA+CYS+2GLYC | Cysteinylation + double glycation | +443 | Progressive ↑ (end-stage marker)
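
The combined mass shifts in this table are additive over the single modifications, which can be checked with a short Python sketch. The shift values are the nominal ones listed above; the dictionary keys and function name are hypothetical shorthand.

```python
# Nominal single-modification mass shifts (Da), as listed in the table above.
SHIFT = {"DA": -115, "CYS": 119, "GLYC": 162, "SO3H": 48}


def combined_shift(modifications):
    """Sum the nominal shifts of every modification carried by an isoform."""
    return sum(SHIFT[m] for m in modifications)


isoforms = {
    "HSA-DA+CYS": ["DA", "CYS"],
    "HSA+CYS+GLYC": ["CYS", "GLYC"],
    "HSA+2GLYC": ["GLYC", "GLYC"],
    "HSA+CYS+2GLYC": ["CYS", "GLYC", "GLYC"],
}
for name, mods in isoforms.items():
    print(f"{name}: {combined_shift(mods):+d} Da")
```

The sums reproduce the +281, +324 and +443 Da entries. With these nominal values HSA-DA+CYS has a net shift of only +4 Da relative to native HSA, so separating the two depends on the mass accuracy of the deconvoluted spectrum.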

Clinical context

  • Disease: Liver fibrosis / Cirrhosis (full spectrum F0→F4_C)
  • Clinical problem: Liver biopsy is invasive, sampling-dependent, not repeatable → need non-invasive staging
  • Existing tools: Transient elastography (physical surrogate), FIB-4 (score), FibroTest, FibroMeter (panels)
  • Our approach: HSA molecular fingerprint as direct biochemical window into liver function — not just a structural proxy
  • Paradigm shift proposed: From anatomical/structural staging (METAVIR histology, elastometry stiffness) → functional and molecular staging via HSA PTM profile

Limitations

  1. Cross-sectional design: no intra-individual longitudinal tracking (MALAHBAR NCT06318949 underway)
  2. FibroScan limitations for F2/F3 separation — explains cluster dispersion in intermediate stages
  3. Etiology distribution not uniform across severity groups (MASH + ALD predominate) — confounder risk
  4. Platform 1 baseline correction suppresses high-mass glycated species >67,500 Da — must be controlled
  5. Gray zone FIB-4 analysis is hypothesis-generating only (n=8)

Figures

Main figures

Figure | File | Description
Fig 1 | raw/assets/Fig 1.jpg | Native HSA concentration (dot plots, both platforms) + ROC curves by stage
Fig 2 | raw/assets/Fig 2.jpg | Nine modified isoform concentrations across CLD stages
Fig 3 | raw/assets/Fig 3.jpg | Normalized isoform ratios (CYS/Native, GLYC/Native, CYS+GLYC/Native) + ROC curves
Fig 4 | raw/assets/Fig_4_300dpi.png | PCA of full spectral profile (66,000–67,500 Da); pairwise control vs. each stage
Fig 5 | raw/assets/Fig_5_AB_fixed.png | Confusion matrices for OrdinalForest (Platform 1 and Platform 2)
Fig 6 | (not in assets) | Patient triage scatter (FIB-4 vs ALBOM, colored by correct/incorrect)

Supplemental figures (from raw/my_work/Supplemental data.docx)

Figure | Description
Fig S1 | Classical markers by stage: routine albumin (g/L), FIB-4, AST (U/L), total bilirubin (µmol/L), ALT (U/L)
Fig S2 | ROC curves for classical markers (routine albumin, AST, bilirubin) and native HSA by LC-HR-MS vs controls
Fig S3 | Feature importance lollipop chart: top 30 OrdinalForest predictors. Top 4 = clinical variables (total protein, albumin, INR, bilirubin); spectral peaks cluster in two sub-regions: ~66,230–66,600 Da (native/cysteinylated) and ~67,024–67,457 Da (poly-glycated)
Fig S4 | Bootstrap QWK density plots (1,000 iterations), Bruker vs Sciex; overlapping distributions confirm platform independence
Fig S5 | Cross-platform scatter of patient-level predictions, Bruker vs Sciex; alignment along identity line; Jaccard=0.696, McNemar p=0.149

Classical marker values (from Supplemental data)

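Marker | Controls (n=19) | F0/F1 | F2 | F3 | F4_A | F4_B | F4_C
Routine albumin (g/L) | 45 | 47.5 | 44.6 | 42.7 | 40.7 | 29.9 | 26.4

Qualitative trends for the other markers:

  • FIB-4: low in controls and F0/F1; elevated from F2 through F4_C
  • AST (U/L): low from controls through F3; ↑↑ in F4_A (significant vs F0/F1); remains elevated in F4_B and F4_C
  • Bilirubin (µmol/L): low from controls through F4_A; markedly elevated in F4_B and F4_C
  • ALT: not discriminatory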

Routine albumin becomes abnormal only in F4 (all sub-classes), significantly later and with less granularity than native HSA measured by LC-HR-MS, which reaches AUC = 0.99 for F4_B.


Connections

  • HSA — sole protein studied; 10 isoforms quantified
  • ALBOM study — this manuscript IS the ALBOM study publication
  • Glycation — key PTM; biphasic pattern; ratio to native amplifies diagnostic signal
  • Cysteinylation — key PTM at Cys34; biphasic pattern
  • Liver fibrosis — disease context; CLD staging F0→F4_C
  • Top-down proteomics — core analytical method; two platforms (Bruker timsTOF Pro2, Sciex TripleTOF 5600+)
  • CQFD-PTM pipeline — quantification pipeline underlying the HSA isoform quantification
  • OrdinalForest classification — ML model used for multivariate staging
  • hsa-isoforms-cld — biomarker summary page
  • DILI — related liver disease context (drug-induced liver injury is a cause of CLD)

My notes

  • The biphasic paradox is the key conceptual contribution: absolute concentrations of CYS and GLYC isoforms peak in compensated cirrhosis then fall in decompensation, because the substrate (native HSA) is depleted. Ratios (isoform/native) resolve this and reveal the true signal. Any future study comparing absolute vs. ratio metrics should cite this.
  • HSA+CYS+2GLYC is the end-stage marker to watch: monotonically increasing, near-zero in controls, measurable in all F4 sub-classes. Excellent biological specificity.
  • HSA-DA anomaly: N-terminally truncated form is lower in patients than controls at all stages — counterintuitive for a damage product. Worth investigating: is this a clearance effect, or is the control population exceptional?
  • FIB-4 gray zone: the 62.5% classification rate in the indeterminate zone is exciting but n=8 is underpowered. MALAHBAR will be key.
  • Platform 1 caveat: the Bruker baseline correction algorithm treats broad poly-glycated peaks as baseline drift because of their shape and subtracts them. This is a fixable technical artifact, not a biological limitation.
  • Etiology confound is real: MASH patients dominate F0/F1–F3; ALD dominates F4_B/C. If MASH and ALD have different isoform profiles (plausible given different oxidative stress mechanisms), then the “stage effect” may partly be an “etiology effect” in the cirrhosis sub-classes. Etiology-stratified analysis is a key gap.
  • FibroScan failure in decompensated patients: F4_B/C patients mostly staged by biopsy or clinical consensus (FibroScan only 11–20%) — exactly the group where a non-invasive blood test would add the most value.
  • Feature importance (S3): the two spectral sub-regions driving the classifier are ~66,230–66,600 Da (native + cysteinylated isoforms, early-to-mid disease signal) and ~67,024–67,457 Da (poly-glycated adducts, advanced disease signal). This is consistent with the biological 3-pattern model.