Social disadvantage accelerates aging | Nature Medicine

Study design and participants
We analyzed pseudonymized individual-level data from four prospective cohort studies: the UK Biobank, FPS, Whitehall II and ARIC, with appropriate approvals for access.
The UK Biobank study is a nationwide, prospective cohort study of half a million participants aged 38–73 years, living in the United Kingdom (https://www.ukbiobank.ac.uk)42. Baseline clinical examinations, including measures of social disadvantage, were conducted between 2006 and 2010. Participant follow-up via linked electronic health records was conducted until 2021. Our analyses were conducted using the UK Biobank Resource under application numbers 60565 and 22627. Data collection in the UK Biobank was approved by the North West Multi-Centre Research Ethics Committee. All participants provided written informed consent before their involvement in the study.
The FPS is a prospective occupational cohort study3. Study participants included 286,475 men and women aged 40 or older, who were followed up via electronic health records of national health registries until 2016. These registries provide dates and diagnoses of in-patient care in hospitalizations and deaths for all residents in the country. Participants provided electronic informed consent for both baseline assessments and register linkage. The FPS has received ethical approval from the ethical committee of the Helsinki and Uusimaa hospital district (HUS/1210/2016).
The Whitehall study is a prospective occupational cohort study of 10,308 London-based office staff, aged 35–55, with a baseline examination conducted between 1985 and 198843. Blood samples for proteomic analyses were collected for 6,545 participants between 1997 and 1999, all of whom were linked to NHS electronic health records. Written informed consent from participants was obtained at each contact, and ethics approval was obtained from the University College London Hospital Committee on the Ethics of Human Research (reference number 85/0938).
The ARIC study is a population-based cohort study of 15,792 participants aged 45–64 years at baseline (1987–1989), selected through probability sampling from 4 US communities (Washington County, Maryland; Forsyth County, North Carolina; the Minneapolis, Minnesota, suburbs; Jackson, Mississippi)44. In proteomic analyses, self-identified non-Black and non-White participants and self-identified Black participants at Washington County and Minneapolis study sites were excluded owing to small sample sizes (n = 78 at visit 2 and n = 29 at visit 5). Blood samples for proteomic analyses were available for 11,798 participants during visit 2 (1990–1992) and for 5,195 participants during visit 5 (2011–2013). Mortality was ascertained through multiple data sources up until 31 December 2021. The study was approved by each field center’s institutional review board, and all participants provided written informed consent before the study.
A flowchart of sample selection and baseline characteristics of the participants are presented in Extended Data Figs. 1 and 2, and Supplementary Table 1.
Measurement of social disadvantage
Social disadvantage at baseline was assessed using several indicators: education, father’s occupational status, neighborhood deprivation, occupational position and household income. In the UK Biobank and the Whitehall study, educational attainment was self-reported and classified into three categories: high (college or university degree), intermediate (A levels and AS levels or equivalent, O levels and GCSEs or equivalent, CSEs or equivalent, NVQ or HND or HNC or equivalent, other professional qualifications, for example, nursing, teaching), low (none of the above). In the FPS, information about education was obtained from Statistics Finland via record linkage, including the following three categories: high (tertiary qualification, college or university), intermediate (all other educational qualifications), low (no qualifications, compulsory schooling). In ARIC, self-reported education was categorized as high (any college), intermediate (high school/GED/vocational school) and low (less than high school).
In the UK Biobank and the Whitehall study, neighborhood deprivation was assessed using the Townsend index45 linked to the participants’ residential address. This index is calculated from census data on four variables: unemployment (as a percentage of those aged 16 and over who are economically active), non-car ownership (as a percentage of all households), non-home ownership (as a percentage of all households) and household overcrowding. We divided the neighborhood deprivation score into three categories: high (top quartile), intermediate (middle quartiles) and low (bottom quartile).
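The three-level split of the deprivation score can be sketched as follows. This is a minimal illustration, assuming the quartile cut points are computed from the analytic sample with `statistics.quantiles`; the function name, the example scores and the handling of values that fall exactly on a cut point are hypothetical and are not specified in the text.

```python
from statistics import quantiles


def deprivation_category(score, q1, q3):
    """Map a deprivation score to low / intermediate / high.

    Assumption: values exactly on a cut point go to the intermediate group.
    """
    if score < q1:
        return "low"           # bottom quartile (least deprived)
    if score > q3:
        return "high"          # top quartile (most deprived)
    return "intermediate"      # middle two quartiles


# Hypothetical Townsend scores (higher = more deprived).
scores = [-5.2, -3.1, -2.4, -1.0, 0.3, 1.8, 2.9, 4.4]
q1, _median, q3 = quantiles(scores, n=4)   # the three quartile cut points
categories = [deprivation_category(s, q1, q3) for s in scores]
```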
For each FPS participant, the residential address was linked to a Statistics Finland score for neighborhood deprivation. The score is derived from three components: the proportion of adults with low education, the unemployment rate and the proportion of people living in rented housing within each 250 m by 250 m grid area3. As previously, we categorized these data as low (an area deprivation score lower than the national mean), intermediate (from the national mean to the national mean plus 0.5 s.d.) and high (higher than the national mean plus 0.5 s.d.)3.
In the ARIC study, neighborhood deprivation was measured using the National Level Area Deprivation Index, which was derived by linking participants’ geocoded address coordinates with the US Census block group boundaries. Each census block was assigned a rank ranging from 1 to 100, calculated from 18 census-based indicators of socioeconomic disadvantage using the Singh method46. We divided the index into tertiles, to represent low, intermediate and high levels of neighborhood deprivation.
In the FPS, information about occupational position was coded based on the International Standard Classification of Occupations (ISCO)47 and categorized into three occupational position groups: high (non-manual occupations, ISCO classes 1 and 2, for example, physicians, lawyers), intermediate (non-manual occupations, ISCO classes 3 and 4, for example, registered nurses) and low (service and manual occupations, ISCO classes 5–9, for example, cleaners, maintenance workers). In the Whitehall study, occupational position was obtained from the British civil service occupational grade at baseline, a 3-level variable representing high (administrative), intermediate (professional or executive) and low (clerical or support) grades43. In addition, father’s occupational class was assessed with the question ‘What is/was your father’s main job, what kind of work does/did he do in it’. Responses were coded as manual or non-manual.
PRS for education
The UK Biobank genetic data include genotypes for 488,377 participants, assayed using two very similar genotyping arrays: 807,411 markers using the Applied Biosystems UK BiLEVE Axiom Array by Affymetrix (now part of Thermo Fisher Scientific, for a subset of 49,950 participants) and 825,927 markers using the closely related Applied Biosystems UK Biobank Axiom Array (438,427 participants; shares 95% of marker content with the UK BiLEVE Axiom Array)48. In the Whitehall study, genotyping data include the Illumina Human Drug Core Array, which features a whole-genome single-nucleotide polymorphism scaffold with enhanced coverage of 200,000 custom markers in 4,500 drugged or druggable genes49.
We constructed a polygenic index for education, as defined in ref. 21, to examine whether the associations between social disadvantage, hallmark-related diseases and age-related proteins are attributable to genetic effects across all these constructs.
Follow-up for ARDs and mortality
UK Biobank and Whitehall participants were linked to the UK NHS Hospital Episode Statistics database for hospital admissions and the NHS Central Registry for mortality. Electronic health records, including dates and the International Classification of Diseases diagnostic codes of hospitalizations and deaths, were retrieved until 2021 in the UK Biobank study and until 2019 in the Whitehall study. The NHS provides most of the healthcare in the country, including in- and out-patient care, and record linkage was undertaken using a unique NHS identifier held by all UK residents. FPS participants were linked by their unique identification number to national hospital discharge (recorded by the Finnish Institute for Health and Welfare) and mortality (recorded by Statistics Finland) registries. These electronic health records included cause (International Classification of Diseases codes) and date of hospitalization or mortality, or both, until 2016.
In the UK Biobank, FPS and Whitehall study, we measured the 4 primary and 5 compensatory and integrative hallmarks of aging indirectly, based on a person’s vulnerability to specific hallmark-related diseases as defined in refs. 4,22. This list comprises a total of 85 diseases, including those 30 diseases most strongly related to each hallmark. These diseases encompass conditions linked to a single hallmark and also those shared by two or more hallmarks. The presence of a specific hallmark of aging is indicated if the participant has one or more diseases related to that hallmark. The 83 diseases and their diagnostic codes for each aging hallmark that were available for this study are listed in Supplementary Table 1 (ref. 23).
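The rule that a hallmark counts as present when a participant has one or more related diseases can be sketched as follows. This is an illustration only: the mapping, the disease labels and the function name are hypothetical placeholders, and the actual disease list and diagnostic codes are those in the supplementary table cited above.

```python
# Hypothetical hallmark-to-disease mapping (placeholder labels, not real codes).
# A disease may belong to more than one hallmark, as described in the text.
HALLMARK_DISEASES = {
    "genomic_instability": {"disease_a", "disease_b"},
    "cellular_senescence": {"disease_b", "disease_c"},
}


def hallmark_indicators(participant_diseases, mapping=HALLMARK_DISEASES):
    """Return {hallmark: True/False}; True if at least one related disease is present."""
    diagnosed = set(participant_diseases)
    return {hallmark: bool(diagnosed & diseases)
            for hallmark, diseases in mapping.items()}


# A participant with a disease shared by two hallmarks is positive for both.
example = hallmark_indicators(["disease_b"])
```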
An extended model of aging hallmarks includes 3 additional hallmarks: disabled macroautophagy (a special case of LOP in the original model), chronic inflammation (a component of AIC in the original model) and dysbiosis (also part of AIC in the original model)33. A predefined validated list of diseases related to these 3 new hallmarks was not available at the time of our study.
In the ARIC study, mortality was ascertained through contact with participant proxies via telephone, hospital records, death certificates or vital statistics from the National Death Index until 31 December 2021. As unified national hospitalization registries with comprehensive coverage, such as those in the United Kingdom and Finland, are not available in the United States, similarly high-quality data on hallmark-related diseases were not available in the ARIC study.
Statistical power and reproducibility
Sample sizes were determined based on available data in the cohorts. As shown in Supplementary Figs. 1 and 2, participants were excluded from the analyses owing to missing data on social disadvantage or hallmark-related diseases. A total of 492,257 participants were included in the analysis of the UK Biobank and 286,475 participants in the analysis of the FPS. In prospective analyses, participants with hallmark-related diseases at or before baseline were additionally excluded. The number of included participants varied between 430,307 and 452,305 in the UK Biobank and between 267,046 and 281,348 in the FPS, depending on the hallmark under investigation (Supplementary Table 5).
Statistical power varied depending on the outcome. For any incident hallmark-related disease in the UK Biobank, we were able to detect a 3–4% difference in hazard between high- and low-education groups (hazard ratios 1.03–1.04) at 90% power and an alpha level of 0.05. The corresponding detectable difference in the FPS was 6–7% (hazard ratios 1.06–1.07). For specific hallmark-related diseases, the minimally detectable hazard ratio ranged from 1.05 (cataract and osteoarthritis) to 2.65 (essential tremor and scleroderma) in the UK Biobank and from 1.10 (cataract and osteoarthritis) to 3.75 (essential tremor and Barrett’s esophagus) in the FPS.
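To give a sense of how a minimally detectable hazard ratio relates to the number of events, the sketch below uses Schoenfeld's approximation for a two-group Cox comparison. This is an assumption for illustration: the text does not state which power formula was used, and the event count, the exposed fraction and the two-sided test are hypothetical inputs.

```python
from math import exp, sqrt
from statistics import NormalDist


def minimal_detectable_hr(events, exposed_fraction, alpha=0.05, power=0.90):
    """Smallest hazard ratio (> 1) detectable under Schoenfeld's approximation.

    events           -- total number of outcome events
    exposed_fraction -- share of participants in the exposed group
    Assumption: two-sided test at level alpha.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    log_hr = (z_alpha + z_beta) / sqrt(
        events * exposed_fraction * (1 - exposed_fraction)
    )
    return exp(log_hr)


# Hypothetical scenario: 10,000 events, half of the sample exposed.
mdhr = minimal_detectable_hr(events=10_000, exposed_fraction=0.5)
```

Under this approximation the detectable hazard ratio shrinks toward 1 as the number of events grows, which is consistent with rarer diseases having much larger minimally detectable hazard ratios.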
Reproducibility was assessed using multiple indicators of social disadvantage across four independent cohort studies from the United Kingdom, Finland and the United States. Further examination of reproducibility used methodological triangulation, that is, examining the consistency of the results across alternative epidemiological approaches and sensitivity analyses. The main finding of the social disadvantage–aging hallmark association was consistent across the three independent cohort studies, four alternative indicators of social disadvantage and in multiple alternative analyses.
Measurement of plasma proteins
Plasma ethylenediaminetetraacetic acid (EDTA) samples drawn at baseline, coinciding with the assessment of social disadvantage, were stored in 0.25-ml aliquots at −80 °C. Proteins were analyzed using the SomaScan version 4.0 and 4.1 assays by SomaLogic, which can measure more than 7,000 unique proteins. We used the SOMAmer-based capture array, which quantifies the relative concentration of plasma proteins or protein complexes. The SomaScan platform uses short single-stranded DNA with chemically modified nucleotides (modified aptamers) that act as protein-binding reagents with defined three-dimensional structures and unique nucleotide sequences. These aptamers are identifiable and quantifiable using DNA detection technology. This measurement was blinded to participant characteristics, including social disadvantage or hallmark-related disease.
SomaLogic normalization and quality control
All samples underwent standard SomaLogic normalization, calibration and quality control. To control for batch effects during assay quantification, pooled reference standards and buffer standards are included on each plate. To control for both within-plate and across-plate technical variation, samples are normalized within and across plates using median signal intensities in reference standards. Samples are further normalized to a pooled reference using an adaptive maximum likelihood procedure. If signal intensities deviate significantly from the expected range, samples are additionally flagged by SomaLogic and these samples were excluded from the analysis. SomaLogic provides data on the resulting expression values (the ‘raw’ data).
The scale factor acceptance criterion per plate is between 0.4 and 2.5, and this criterion was passed for all 48 plates for Whitehall EDTA samples. The ‘calibrator percent-in-tails’ refers to the percentage of plate calibration scale factors with values outside the expected range, 0.6–1.4. The alert criterion of 10% was slightly exceeded in 2 of the 48 plates (10.8% and 10.9%, respectively), the values for other plates ranging between 1.7% and 9.6%. The ‘QC percent-in-tails’ refers to the percentage of SOMAmer reagents in the QC control that are outside the accepted accuracy range, 0.8–1.2, when compared with the reference. All plates in the Whitehall study passed the acceptance criteria.
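The plate-level checks described above reduce to simple range tests. The sketch below is a hedged illustration using the ranges quoted in the text (0.4–2.5 for the plate scale factor, 0.6–1.4 for calibration scale factors, a 10% alert level); the function names and the example values are hypothetical, and this is not SomaLogic's own software.

```python
def percent_in_tails(values, lower, upper):
    """Percentage of values falling outside the closed range [lower, upper]."""
    outside = sum(1 for v in values if v < lower or v > upper)
    return 100.0 * outside / len(values)


def plate_scale_factor_ok(scale_factor, lower=0.4, upper=2.5):
    """Acceptance criterion for a plate scale factor, as quoted in the text."""
    return lower <= scale_factor <= upper


# Hypothetical calibration scale factors for one plate.
factors = [0.95, 1.02, 1.45, 0.58, 1.10, 0.99, 1.20, 0.80, 1.00, 1.05]
pct = percent_in_tails(factors, 0.6, 1.4)   # two of ten values are outside
exceeds_alert = pct > 10.0                  # alert criterion of 10%
```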
We applied the version 4.0 → version 4.1 multiplication scaling factors provided by SomaLogic to the raw version 4.0 assay expression values to allow direct comparisons across samples analyzed using version 4.0 and version 4.1. The raw data were transformed to a normal distribution by inverse rank-based normal transformation before analysis, as the assay has an expected log-normal distribution.
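An inverse rank-based normal transformation can be sketched in a few lines. This is a minimal version under stated assumptions: the Blom offset (0.375) and the averaging of tied ranks are common choices but are not specified in the text, and the function name and example values are hypothetical.

```python
from statistics import NormalDist


def inverse_rank_normal(values, offset=0.375):
    """Rank-based inverse normal transform.

    Assumptions: Blom offset by default; tied values receive their average rank.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1          # mean of the 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    z = NormalDist()
    return [z.inv_cdf((r - offset) / (n - 2 * offset + 1)) for r in ranks]


# Hypothetical right-skewed protein intensities.
transformed = inverse_rank_normal([3.2, 150.0, 7.5, 0.4, 21.0])
```

Because only ranks are used, the extreme value 150.0 is mapped to the same z-score it would receive if it were 22.0, which is what makes the transform robust to the skew of log-normally distributed assay values.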
SomaLogic probe validation
SomaLogic SomaScan assay technology has been widely used in biomedical research, with ~700 peer-reviewed publications. The performance of the SomaScan assay and the modified aptamer binding has been described in detail elsewhere50,51. Briefly, there is minimal replicate sample variability (coefficient of variation). The specificity of aptamer reagents is good and has been confirmed in several ways. The median intra- and inter-assay coefficients of variation for SomaScan are ~5%, and assay sensitivity is comparable to that of typical immunoassays, with a median lower limit of detection in the femtomolar range. The majority of SomaScan protein measurements are stable, and a subset of proteins has been validated using laboratory-developed tests. These validated proteins have been delivered from SomaLogic’s Clinical Laboratory Improvement Amendments (CLIA)-certified laboratory to physicians and patients in the context of medical management.
All 7,524 probes on the version 4.1 assay undergo rigorous primary validation of binding and sensitivity to the target protein, including determination of the equilibrium binding affinity dissociation constant, pull-down assay of cognate protein from buffer, demonstration of dose response in the SomaScan assay and estimation of endogenous cognate protein signals in human plasma above the limit of detection. A total of 70% of the SomaScan probes have at least one orthogonal source of validation from mass spectrometry, antibody-based measurements, cis-protein quantitative trait loci analysis, absence of binding with the nearest neighbor (that is, no detected signal from the protein that is most closely related in sequence to the cognate protein) or correlation with mRNA levels in cell lines.
Measurement of organ-specific aging signatures
We computed a total of 9 organ-age gaps, accounting for cohort characteristics, using the ‘organage package’24 in Python (https://github.com/hamiltonoh/organage). This package requires SomaScan data version 4.0 or version 4.1, and age and sex as inputs, to compute z-scores (mean = 0, s.d. = 1) for organismal and organ-specific age gaps, that is, the biological age of an individual’s organs or body relative to those of same-aged peers. The resulting variables relate to the organ-specific ages of the arteries (14 proteins), brain (202 proteins), heart (10 proteins), immune function (173 proteins), intestine (33 proteins), kidneys (12 proteins), liver (113 proteins), lungs (9 proteins) and pancreas (34 proteins), in addition to an overall organismal age (3,907 proteins). The list of proteins included in each age gap is provided in Supplementary Table 9.
Identification of proteins related to hallmarks of aging
We identified plasma proteins related to each hallmark of aging using the Human Protein Atlas (www.proteinatlas.org) with the expanded 65-term taxonomy in ref. 4:
- Genomic instability, mtDNA mutations, mtDNA damage, transposable elements, DNA damage, DNA repair deficiencies, mutations, DNA breaks, ssDNA breaks, dsDNA breaks, chromosome breakage;
- Telomere attrition, decreased telomere length, decreased leukocyte telomere length;
- Epigenetic alterations, gene transcription, coding-RNA, noncoding RNA, microRNA, DNA methylation, histone modifications, histone acetylation, histone methylation;
- Loss of proteostasis, endoplasmic reticulum stress, unfolded protein response, proteolysis, proteasome, autophagy, protein aggregation, chaperone;
- Deregulated nutrient sensing, insulin resistance, dyslipidaemia, nutrient sensing pathways, sirtuin 1, Insulin/insulin-like growth factor-1 signaling, mTORC1, AMP-activated protein kinase;
- Mitochondrial dysfunction, mitochondrial toxicity, reactive oxygen species, mitochondrial bioenergetics, electron transport chain, Krebs cycle, mitochondrial dynamics, mitochondrial turnover, mitochondrial degradation, mitochondrial biogenesis;
- Cellular senescence, senescence markers, senescence-associated secretory phenotype (SASP), immune-senescence;
- Stem cell exhaustion, stem cell differentiation, progenitor cell, stem cell self-renewal;
- Altered intercellular communication, inflammatory signaling, inflammaging, inflammation, neural signaling, neurotransmitters, hormonal signaling, hormones.
Details of the search are provided in Supplementary Table 12. To confirm the link with age and controlling for multiple testing, we included only those proteins that were additionally associated with chronological age after Bonferroni adjustment (P < 6.58 × 10⁻⁶), a total of 1,044 proteins (Supplementary Table 13). Of these, 43 were related to SCE, 61 to AIC, 253 to DNS, 11 to CS, 159 to MD, 274 to EA, 1 to TA, 438 to LOP and 296 to GI (Supplementary Table 14).
Reproducibility of proteomic findings
To confirm the reproducibility of our findings on protein–social disadvantage associations, we repeated the discovery analysis using low occupational position as the indicator of social disadvantage instead of low education; only findings that were replicated across the different indicators of social disadvantage are reported. If multiple aptamers were available to determine a protein hit, we confirmed that the findings were consistent across all the aptamers.
To assess the reproducibility of the associations between proteins and social disadvantage, as well as between proteins and mortality, findings from the UK Whitehall study were validated using an independent dataset from the United States, the ARIC cohort study. Extending the Whitehall findings, the main analyses were repeated separately for individuals with protein measurements taken during midlife and those with measurements taken during old age.
Statistical approach
In the examination of hallmark-related morbidities, data from the UK Biobank and the FPS were analyzed separately. In a retrospective analysis, we computed the rate of hallmark-specific diseases per 100 person-years for each aging hallmark and examined their rate ratios and 95% CIs by each index of social disadvantage using Poisson regression adjusted for age, sex and ethnicity. We modeled these rates and rate ratios for age 70 in the UK Biobank and for age 55 in the FPS. These were the mean ages at the end of follow-up in these studies.
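The quantities reported here, rates per 100 person-years and rate ratios with 95% CIs, can be illustrated with an unadjusted calculation. This sketch is a simplification: the analysis described above used Poisson regression adjusted for age, sex and ethnicity, whereas the code below computes a crude rate ratio with a Wald interval on the log scale. The event counts and person-years are hypothetical.

```python
from math import exp, log, sqrt


def rate_per_100_py(events, person_years):
    """Event rate per 100 person-years."""
    return 100.0 * events / person_years


def crude_rate_ratio(events_exposed, py_exposed, events_ref, py_ref, z=1.96):
    """Unadjusted rate ratio with a Wald 95% CI computed on the log scale."""
    rr = (events_exposed / py_exposed) / (events_ref / py_ref)
    se_log = sqrt(1 / events_exposed + 1 / events_ref)
    return rr, exp(log(rr) - z * se_log), exp(log(rr) + z * se_log)


# Hypothetical counts: low-education group versus high-education reference.
rate_low_edu = rate_per_100_py(300, 10_000)
rate_high_edu = rate_per_100_py(200, 10_000)
rr, lower, upper = crude_rate_ratio(300, 10_000, 200, 10_000)
```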
In prospective analyses of participants with no hallmark-related diseases at baseline, we used Cox proportional hazards models to examine the age-, sex- and ethnicity-adjusted associations of social disadvantage indicators with the first onset of hallmark-related disease within each aging hallmark. To control for genetic confounding, we included PRS for education as an additional covariate in the model.
We used the same approach to examine the age-, sex- and ethnicity-adjusted associations of social disadvantage indicators with the onset of the second ARD among individuals who had already developed one ARD. In addition, we examined the corresponding associations with the onset of the third ARD among those who already had two ARDs.
We calculated clustering coefficients (range 0–1; a higher coefficient indicates stronger connections between the diseases) for groups of diseases that related to a specific hallmark of aging by using the Barrat method of global network transitivity52.
We computed phi coefficients to produce a correlation matrix across nine dichotomous aging hallmark variables. For each hallmark, this variable was defined as having at least one versus none of the hallmark-related diseases after baseline.
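For two dichotomous variables, the phi coefficient is computed from the 2 × 2 table of joint counts. The sketch below shows the standard formula; the function name and the example indicator vectors are hypothetical.

```python
from math import sqrt


def phi_coefficient(x, y):
    """Phi coefficient between two equal-length sequences of 0/1 indicators."""
    a = sum(1 for xi, yi in zip(x, y) if xi and yi)          # both present
    b = sum(1 for xi, yi in zip(x, y) if xi and not yi)      # only x present
    c = sum(1 for xi, yi in zip(x, y) if not xi and yi)      # only y present
    d = sum(1 for xi, yi in zip(x, y) if not xi and not yi)  # neither present
    denom = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("phi is undefined when a variable is constant")
    return (a * d - b * c) / denom


# Hypothetical indicators for two hallmarks across eight participants.
hallmark_1 = [1, 1, 1, 0, 0, 0, 1, 0]
hallmark_2 = [1, 1, 0, 0, 0, 1, 1, 0]
phi = phi_coefficient(hallmark_1, hallmark_2)
```

Applying the function to every pair of the nine hallmark indicators would fill the correlation matrix described above.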
To examine whether the associations between social disadvantage indicators and the first onset of hallmark-related disease within each hallmark were driven by specific hallmark-related diseases, we analyzed the associations of social disadvantage indicators with all 83 hallmark-related diseases in separate Cox models adjusted for age, sex and ethnicity. To identify the strongest associations, we computed the mean hazard ratio across the two studies and all indicators of social disadvantage weighted by the number of disease cases.
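A case-weighted mean hazard ratio can be sketched as below. This is one plausible reading of the description: it assumes an arithmetic mean of the hazard ratios themselves (not of their logarithms), weighted by the number of disease cases behind each estimate. The function name and the input values are hypothetical.

```python
def case_weighted_mean_hr(estimates):
    """Weighted mean of hazard ratios.

    estimates -- iterable of (hazard_ratio, n_cases) pairs, one per
                 study-by-indicator estimate for a given disease.
    Assumption: arithmetic mean on the hazard-ratio scale.
    """
    estimates = list(estimates)
    total_cases = sum(cases for _, cases in estimates)
    return sum(hr * cases for hr, cases in estimates) / total_cases


# Hypothetical estimates for one disease across studies and indicators.
mean_hr = case_weighted_mean_hr([(1.40, 500), (1.20, 1500), (1.60, 1000)])
```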
We computed a total of nine organ age gaps, accounting for cohort characteristics, using the ‘organage package’ in Python (https://github.com/hamiltonoh/organage)24. This package requires SomaScan data version 4.0 or version 4.1, and age and sex as inputs, to compute z-scores (mean = 0, s.d. = 1) for organismal and organ-specific age gaps, that is, the biological age of an individual’s organs or body relative to those of same-age peers.
Before protein discovery analyses were conducted in the Whitehall study, proteins were transformed to a normal distribution using inverse rank-based normal transformation. We included those hallmark-related proteins that were associated with chronological age at proteome-wide significance (0.05/30,000, P = 1.67 × 10⁻⁶) after adjustment for sex and ethnicity. We then examined associations between hallmark-related proteins and indicators of social disadvantage (low occupational status and low education) using logistic regression analysis adjusted for chronological age, sex and ethnicity (White versus non-White) and corrected for multiple testing using the Bonferroni method (P = 0.05/1,044, P = 4.78 × 10⁻⁵).
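The Bonferroni thresholds quoted above follow from dividing the alpha level by the number of tests. A short worked check, with hypothetical function names:

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def is_significant(p_value, alpha, n_tests):
    """True if a P value passes the Bonferroni-corrected threshold."""
    return p_value < bonferroni_threshold(alpha, n_tests)


age_threshold = bonferroni_threshold(0.05, 30_000)    # proteome-wide step
social_threshold = bonferroni_threshold(0.05, 1_044)  # 1,044 age-related proteins
```

Here `age_threshold` evaluates to about 1.67 × 10⁻⁶ and `social_threshold` to about 4.79 × 10⁻⁵, in line with the values quoted in the text up to rounding of the last digit.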
We examined the associations between social disadvantage and the first onset of hallmark-related diseases in the Whitehall study, as was done in the UK Biobank and FPS, including testing for genetic confounding. To investigate the association of change in social disadvantage between early and later adulthood with the levels of proteins, we compared individuals with low education and intermediate or high occupational status (upward social trajectory) with those with low education and low occupational status (persistently low trajectory), as well as individuals with high education and intermediate or low occupational status (downward trajectory) with those with high education and high occupational status (persistently high trajectory). Accumulation of risk was examined using age-, sex- and ethnicity-adjusted Cox models, treating ‘adult life course social standing’ as a five-category exposure and hallmark-related diseases as the outcome.
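The four trajectory groups compared above can be derived from the education and occupational status categories as sketched below. The function name and category labels are hypothetical. Participants with intermediate education are not part of either comparison in the text, so the sketch returns None for them. The five-category ‘adult life course social standing’ exposure is not defined in this section and is not reproduced here.

```python
LEVELS = {"low", "intermediate", "high"}


def social_trajectory(education, occupation):
    """Classify a participant by early (education) and later (occupation) standing."""
    if education not in LEVELS or occupation not in LEVELS:
        raise ValueError("levels must be 'low', 'intermediate' or 'high'")
    if education == "low":
        # Low education: stayed low, or moved up to intermediate/high occupation.
        return "persistently low" if occupation == "low" else "upward"
    if education == "high":
        # High education: stayed high, or moved down to intermediate/low occupation.
        return "persistently high" if occupation == "high" else "downward"
    return None  # intermediate education is outside both comparisons


trajectories = [
    social_trajectory("low", "high"),
    social_trajectory("low", "low"),
    social_trajectory("high", "intermediate"),
    social_trajectory("high", "high"),
]
```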
To confirm the association between hallmark-related proteins and total mortality in both Whitehall and ARIC, we performed a Cox proportional hazards analysis for proteins associated with age and social disadvantage, testing whether these proteins were also linked to mortality after adjustment for age, sex and ethnicity. We report only hallmark-related proteins that were consistently associated with chronological age, mortality and social disadvantage during early (low education) and later life (low occupational status).
To identify protein mediators of the associations between social disadvantage and hallmark-related diseases, we estimated the proportion of this association mediated by social disadvantage-related proteins in a subgroup of people with no hallmark-related diseases at baseline using Cox regression models. Specifically, we used an inverse odds ratio-weighted method to estimate the extent to which the exposure (here social disadvantage) and the mediator (proteins) act as if they are independent of each other, that is, how the exposure directly affects the outcome when excluding the mediator pathway53. The inverse odds ratio-weighted method allows simultaneous assessment of multiple mediators, as well as individual mediators, by regressing the set of mediators on the exposure. This approach is particularly well suited for causal mediation analysis in complex models such as ours.
Analyses were performed using R, SAS (9.4), Stata (17.0) and RStudio (2023.03.1). To identify biological processes enriched by the identified 14 proteins, we used the clusterProfiler package and Gene Ontology (GO)-term enrichment analysis in R (ref. 54). This approach uses a hypergeometric test and false discovery rate correction for enrichment analyses. To identify a protein interaction network for the 14 proteins, we used the STRINGdb package in R (ref. 55). Interactions were searched across the entire String database with a minimum required interaction score of 0.4. Graphs were generated in Excel and the BioRender platform (https://www.biorender.com).
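The statistics behind a GO-term enrichment analysis, a hypergeometric over-representation test followed by false discovery rate correction, can be sketched in plain Python. This illustrates the calculation only and does not reproduce clusterProfiler, which is an R package. The Benjamini–Hochberg procedure is assumed as the false discovery rate method, and the counts in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """Upper-tail P(X >= k) for an over-representation test.

    N -- background proteins, K -- background proteins annotated to the term,
    n -- proteins in the hit list, k -- hit-list proteins annotated to the term.
    """
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):            # walk from largest to smallest P
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical example: 14 hit proteins, 4 of which fall in a term that
# annotates 50 of 1,000 background proteins.
p_term = hypergeom_enrichment_p(k=4, n=14, K=50, N=1000)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03])
```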
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.