Decomposing socioeconomic differences in self-rated health and healthcare expenditure by chronic conditions and social determinants | International Journal for Equity in Health
Data
In this cross-sectional study, we decomposed SES-differences in both SRH and HCE among respondents of the Dutch Health Monitor 2022. The Dutch Health Monitor (DHM) is conducted every four years (with an extra round in 2022) by Statistics Netherlands (CBS), the Dutch National Institute for Public Health and the Environment (RIVM), Municipal Health Services (GGD’en), and the Dutch National Umbrella Organisation of the Municipal Health Services and Regional Medical Assistance Organisations (GGD GHOR) among a random sample of persons aged 18 years and older living in private households (n = 364,557) [31, 32]. Targeted oversampling in groups with lower expected response rates and mixed-mode data collection (online survey and at-home interviews) were employed to maximize representativeness of the sample. The response rate in 2022 was 33%. Survey weights based on sex, age, marital status, country of origin, household size, income, region, and municipality were applied to generate a representative sample of the Dutch population. We restricted the analysis to the working-age population (aged 25–64 years). We merged DHM survey data with routinely collected registry data at the individual level provided by Statistics Netherlands, Vektis and the Dutch National Health Care Institute. Statistics Netherlands functioned as a trusted third party, enabling the secure linkage of datasets while ensuring individuals’ privacy according to Dutch law (Statistics Netherlands Act 2003).
Measures
Self-rated health and healthcare expenditure
The two outcome measures were SRH and total HCE (in 2021 euros) at the individual level. In the DHM, respondents rated their health on a five-point scale (very good, good, okay, bad, very bad), which was later categorized as a binary outcome (0 = very bad, bad, okay; 1 = good, very good).
Individual healthcare expenditures, provided by Vektis, included all expenditures that have been reimbursed by the health insurance company for all types of care covered by the mandatory benefits package, including actual reimbursed cost, and cost of mandatory or voluntary deductibles paid by the insured. Expenditures excluded out-of-pocket payments and non-submitted invoices [33]. The broad benefits basket of mandatory health insurance covers the majority of essential medical care, medicines and medical devices or aids [25].
Socioeconomic status
We used disposable household income, standardized for household composition [34], as the main indicator for SES because it directly reflects the economic and material resources available to a household and is most reflective of recent socioeconomic changes [35]. Household income was categorized into quintiles defined over all Dutch citizens. This implies that the income quintiles in the DHM population may be unequally sized (see Table 1). We used educational level and financial welfare as alternative SES-indicators to check robustness. According to Statistics Netherlands, the highest completed level of education was categorized as low (lower vocational educational level, lower secondary educational level or less), moderate (intermediate vocational educational level or higher secondary educational level) or high (higher vocational educational level or university) [36]. Missing data for education were supplemented with self-reported education data from the DHM for 28% of the study population. Household financial welfare, derived from the Dutch Tax Administration tax filings [37], combined standardized household income and net household wealth (assets minus debts). Households were ranked by cumulative income and wealth shares within the entire Dutch population, then grouped into quintiles. Because the DHM is a sample of the Dutch national population, welfare quintile sizes in the study population may not be equally sized.
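The consequence of defining quintiles at the national level can be illustrated with a minimal sketch. The data below are synthetic, and the function names (`national_cutoffs`, `assign_quintile`) and the income-weighted sampling scheme are assumptions made purely for illustration; they are not the procedure used by Statistics Netherlands.

```python
import numpy as np

def national_cutoffs(population_income):
    """Quintile cut points (20th, 40th, 60th, 80th percentile) of the full population."""
    return np.quantile(population_income, [0.2, 0.4, 0.6, 0.8])

def assign_quintile(income, cutoffs):
    """Map each income to a quintile label 1..5 using fixed cut points."""
    return np.searchsorted(cutoffs, income, side="right") + 1

# Synthetic "national" income distribution (assumption: lognormal shape).
rng = np.random.default_rng(0)
population = rng.lognormal(mean=10.0, sigma=0.5, size=100_000)
cutoffs = national_cutoffs(population)

# A non-proportional sample: higher incomes are more likely to be drawn
# (hypothetical selection, only to make the effect visible).
sample = rng.choice(population, size=5_000, p=population / population.sum())
sample_quintile = assign_quintile(sample, cutoffs)

population_counts = np.bincount(assign_quintile(population, cutoffs), minlength=6)[1:]
sample_counts = np.bincount(sample_quintile, minlength=6)[1:]
print("population:", population_counts)
print("sample:    ", sample_counts)
```

In the population the five groups are equally sized by construction, whereas in the sample the group sizes follow the sample's own income distribution.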
Drug use for chronic conditions
Following Huber, Szucs [29], a total of 22 chronic conditions were identified using drug registry data from the Dutch National Health Care Institute. The drug registry data included all prescribed medications, issued to individuals and reimbursed under the mandatory health insurance, through public pharmacies at the Anatomical Therapeutic Chemical (ATC) 4th classification level (chemical, pharmacological, or therapeutic subgroups) [38]. If the classification of Huber, Szucs [29] was more precise than the available ATC-4 level, all drugs within the corresponding ATC-4 group were included. Dementia and tuberculosis were excluded due to insufficient case numbers within the study population. The remaining 20 chronic conditions included in the analysis were: acid-related disorders, bone diseases, cancer, cardiovascular diseases, diabetes mellitus, epilepsy, glaucoma, gout and hyperuricemia, HIV, hyperlipidemia, intestinal inflammatory diseases, iron deficiency anemia, migraines, pain, Parkinson’s disease, psychological disorders, psychoses, respiratory illness, rheumatologic conditions and thyroid disorders.
Social determinants
Figure 1 shows the operationalization of the social determinants using the WHO framework. We strived for comparability with this framework by selecting appropriate indicators, balancing availability, relevance to the Dutch context, avoidance of collinearity and statistical significance in the univariate decomposition model [21]. Collinearity was assessed using Cramér’s V statistic, with a maximum allowable value of 0.7. If Cramér’s V for a pair of indicators was larger than 0.7, we retained the indicator that was statistically significant in the univariate model or deemed most relevant.
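As an illustration of the collinearity screen, the following is a minimal sketch of Cramér’s V for two categorical indicators. The function `cramers_v`, the absence of a bias correction, and the toy data are assumptions for illustration; only the 0.7 threshold comes from the text.

```python
import numpy as np

def cramers_v(x, y):
    """Cramér's V between two categorical variables (no bias correction)."""
    x = np.asarray(x).ravel()
    y = np.asarray(y).ravel()
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    xi, yi = xi.ravel(), yi.ravel()
    # Build the contingency table of observed counts.
    table = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(table, (xi, yi), 1)
    n = table.sum()
    k = min(table.shape) - 1
    if k == 0:
        return 0.0  # a constant variable carries no association
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    return float(np.sqrt(chi2 / (n * k)))

# Toy data for two hypothetical indicators (not taken from the study).
home_owner = ["yes", "yes", "no", "no", "yes", "no"]
ends_meet_difficulty = ["no", "no", "yes", "yes", "no", "no"]

v = cramers_v(home_owner, ends_meet_difficulty)
too_collinear = v > 0.7  # threshold stated in the text
print(round(v, 3), too_collinear)
```

When a pair exceeds the threshold, the choice of which indicator to keep is a judgment based on significance or relevance, as described above, and is not automated here.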

Fig. 1 Overview of the social determinants used to operationalize the WHO framework [21]
Health services
Health service determinants included the distance to the nearest general practitioner (GP), the voluntary deductible level and having postponed healthcare during the COVID-19 pandemic. Distance to the nearest GP indicated whether the road distance to the nearest GP was less than or greater than 1 km in 2022 [39]. The voluntary deductible level (€0/100–400/500) indicated the voluntary deductible a person agreed to for their health insurance in addition to the mandatory deductible of €385 in 2021 [33]. Insurance companies may offer a lower premium for the mandatory basic health insurance in exchange for a higher voluntary deductible level. In the DHM, respondents indicated whether they had experienced one or more appointments with a healthcare provider being postponed or cancelled during the COVID-19 pandemic (yes/no).
Income security & social protection
Income security & social protection included source of income, home ownership, problematic debts and self-reported difficulties in making ends meet. The primary source of household income in 2022 was categorized as wages from employment, director-shareholder income, self-employment, financial assistance, or property or rents [34]. Home ownership indicated whether a person owned a home in 2022 [34]. Problematic debts indicated whether a person belonged to a household with registered problematic debts in 2021, following the definition of Statistics Netherlands [40]. Finally, in the DHM, respondents were asked whether they had any (major) difficulties in making ends meet (yes/no).
Living conditions
Living conditions included energy poverty, crowdedness, green space, noise pollution, air pollution and neighborhood safety. Energy poverty indicated whether the household’s energy bill exceeded 10% of the household’s disposable income in 2021 (yes/no) [41]. Crowdedness indicated the home’s living area per household member in 2021 [41]; a household was considered overcrowded when the home had less than 20 m² of living area per household member (yes/no). Green space indicated whether an individual lived in an area with a low (less than 20%), moderate (20–50%) or high level (more than 50%) of green space, as measured in 2020 [42]. Noise pollution indicated whether an individual lived in an area where less than 2%, 2–5%, 5–10%, or more than 10% of the population experienced more than 60 dB noise pollution within 24 h, as measured in 2016 [43]. Air pollution indicated the average level of particulate matter (PM2.5) in the air per neighborhood in 2019. Air pollution was categorized as very good (PM2.5 < 10), good (10 ≤ PM2.5 < 20), moderate (20 ≤ PM2.5 < 25) and poor (PM2.5 ≥ 25) [44]. Neighborhood safety measures were obtained from the Leefbaarometer 2020 [45]. Low safety and high safety were defined as, respectively, 0.25 standard deviations below or above the mean safety score (average safety).
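The threshold-based indicators above can be sketched as simple classification rules. The function names are hypothetical, the handling of values that fall exactly on a boundary is an assumption where the text does not specify it, and PM2.5 is taken in the same (unstated) units as the cut-offs in the text.

```python
def energy_poor(energy_bill, disposable_income):
    """True when the energy bill exceeds 10% of disposable household income."""
    return energy_bill > 0.10 * disposable_income

def overcrowded(living_area_m2, household_size):
    """True when living area per household member is below 20 m2."""
    return living_area_m2 / household_size < 20

def green_space_category(green_share_pct):
    """Low (<20%), moderate (20-50%) or high (>50%) share of green space."""
    if green_share_pct < 20:
        return "low"
    if green_share_pct <= 50:
        return "moderate"
    return "high"

def pm25_category(pm25):
    """Air quality class from the neighborhood-average PM2.5 level."""
    if pm25 < 10:
        return "very good"
    if pm25 < 20:
        return "good"
    if pm25 < 25:
        return "moderate"
    return "poor"

# Hypothetical household: 60 m2 home, 4 members, 2,500 energy bill, 20,000 income.
print(energy_poor(2_500, 20_000), overcrowded(60, 4),
      green_space_category(35), pm25_category(12.3))
```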
Social & human capital
Social & human capital included marital status, household composition, voluntary work, informal caregiving, emotional support, resilience, alcohol drinking, smoking, and exercising. Marital status indicated whether someone was married, divorced, widowed or unmarried in 2022 [46]. Household composition indicated being part of a one-person household, couple with/without children living at home, one parent family or another type of multiple-person household [34]. The DHM provided information on whether a person participated in voluntary work (yes/no), provided informal care (yes/no), missed emotional support (yes/no), reported (very) low resilience (yes/no), was a heavy drinker (at least one day a week 6 units or more for men and 4 units or more for women; yes/no), was currently smoking (yes/no), and met physical activity norms (yes/no).
Employment and working conditions
Employment and working conditions included labor participation and work-related stress. Labor participation indicated the share of months with paid work in the last four years (2018–2022) and was categorized as 0% (no labor participation or missing data), 1–24%, 25–49%, 50–74%, 75–99% and 100% [47]. Experience of work-related stress was measured within the DHM (yes/no).
Statistical analysis
Oaxaca-Blinder decomposition
The statistical Oaxaca-Blinder decomposition method allows for the examination of the relative contribution of independent variables to the mean differences of the dependent variable between two groups [48]. The method can estimate to what extent the differences in the mean predicted outcome are due to differences in the mean value of the independent variables between the groups (i.e. differences in X, referred to as “the explained part”) or, alternatively, to what extent these differences are due to a differential effect of the independent variables on the dependent variable between the groups or to unobserved factors (i.e. differences in β, referred to as “the unexplained part”). The Oaxaca-Blinder decomposition can be written as:
$$\begin{aligned}\Delta Y &= \left(E\left(X_{A}\right)-E\left(X_{B}\right)\right)\beta^{*}+E\left(X_{A}\right)\left(\beta_{A}-\beta^{*}\right)\\ &\quad+E\left(X_{B}\right)\left(\beta^{*}-\beta_{B}\right),\end{aligned}$$
where \(\Delta Y\) refers to the difference in the dependent variable between group A and group B, \(X\) is a vector of the independent variable values, \(\beta_{A}\) and \(\beta_{B}\) are vectors of regression coefficients from a regression including, respectively, only group A or B, and \(\beta^{*}\) refers to the pooled regression estimate for group A and B. Here, \(\left(E\left(X_{A}\right)-E\left(X_{B}\right)\right)\beta^{*}\) refers to the explained part and \(E\left(X_{A}\right)\left(\beta_{A}-\beta^{*}\right)+E\left(X_{B}\right)\left(\beta^{*}-\beta_{B}\right)\) refers to the unexplained part [48,49,50,51].
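The twofold decomposition can be illustrated numerically with a minimal sketch. It uses ordinary least squares on synthetic data, and the pooled model is fitted without a group indicator; both are simplifying assumptions of this example, as are the function names and all simulated values. It is not the estimation routine used in the study.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for design matrix X (intercept included)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def oaxaca_blinder(X_a, y_a, X_b, y_b):
    """Twofold decomposition of mean(y_a) - mean(y_b) with pooled coefficients."""
    Xa = np.column_stack([np.ones(len(X_a)), X_a])
    Xb = np.column_stack([np.ones(len(X_b)), X_b])
    beta_a = ols(Xa, y_a)
    beta_b = ols(Xb, y_b)
    # Pooled coefficients beta*: one regression on both groups stacked together.
    beta_star = ols(np.vstack([Xa, Xb]), np.concatenate([y_a, y_b]))
    mean_a, mean_b = Xa.mean(axis=0), Xb.mean(axis=0)
    explained = (mean_a - mean_b) @ beta_star
    unexplained = mean_a @ (beta_a - beta_star) + mean_b @ (beta_star - beta_b)
    return {
        "gap": float(y_a.mean() - y_b.mean()),
        "explained": float(explained),
        "unexplained": float(unexplained),
    }

# Synthetic example: one binary covariate with different prevalence and
# different effect in the two groups (all numbers are made up).
rng = np.random.default_rng(42)
n = 1_000
x_a = rng.binomial(1, 0.4, n).astype(float)
x_b = rng.binomial(1, 0.2, n).astype(float)
y_a = 6.0 + 1.0 * x_a + rng.normal(0, 0.5, n)
y_b = 5.8 + 0.7 * x_b + rng.normal(0, 0.5, n)

result = oaxaca_blinder(x_a.reshape(-1, 1), y_a, x_b.reshape(-1, 1), y_b)
print(result)
```

Because each group regression includes an intercept, the group mean of the outcome equals the mean covariate vector times the group coefficients, so the explained and unexplained parts add up to the raw gap.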
The Oaxaca-Blinder decomposition is limited in its ability to provide causal interpretation of the estimates. In other words, this method cannot predict how differences in SRH or HCE would change if differences in chronic conditions and/or social determinants were reduced [51]. The unexplained part of the Oaxaca-Blinder decomposition seeks to determine the differential impact of independent variables. However, any transformation of the independent variables (X), for example, through mean centering or the choice of an omitted/base category, changes the intercept and coefficients (β) within the unexplained part; this is known as the “transformation problem” [48]. Addressing this issue by transforming effects into deviations from the mean or applying alternative weighting structures remains inherently arbitrary [48, 52, 53], so the variable-specific estimates of the unexplained part cannot be meaningfully interpreted. To approximate the differential effects of independent variables on SRH or HCE between the lowest and highest income groups, we instead compared regression model estimates stratified by income (referred to as “differential associations”).
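The transformation problem can be made concrete with a small sketch: recoding a binary covariate (swapping its base category) leaves the total unexplained part unchanged but shifts how it is split between the intercept and the covariate. The data are synthetic, and the OLS setup and function names are assumptions for illustration only.

```python
import numpy as np

def fit(x, y):
    """OLS of y on an intercept and one covariate; returns (coefficients, mean design row)."""
    X = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, X.mean(axis=0)

def detailed_unexplained(x_a, y_a, x_b, y_b):
    """Per-term unexplained contributions, ordered [intercept, covariate]."""
    b_a, m_a = fit(x_a, y_a)
    b_b, m_b = fit(x_b, y_b)
    b_s, _ = fit(np.concatenate([x_a, x_b]), np.concatenate([y_a, y_b]))
    return m_a * (b_a - b_s) + m_b * (b_s - b_b)

# Synthetic groups with a different effect of the binary covariate.
rng = np.random.default_rng(1)
x_a = rng.binomial(1, 0.4, 500).astype(float)
x_b = rng.binomial(1, 0.2, 500).astype(float)
y_a = 1.0 + 0.8 * x_a + rng.normal(0, 0.1, 500)
y_b = 0.5 + 0.3 * x_b + rng.normal(0, 0.1, 500)

original = detailed_unexplained(x_a, y_a, x_b, y_b)
recoded = detailed_unexplained(1 - x_a, y_a, 1 - x_b, y_b)  # swap base category

print("original [intercept, covariate]:", original)
print("recoded  [intercept, covariate]:", recoded)
print("totals:", original.sum(), recoded.sum())
```

The two totals agree, while the covariate-specific term differs between codings by the gap between the group-specific coefficients, which is why the variable-level unexplained estimates depend on an arbitrary coding choice.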
Application of Oaxaca-Blinder decomposition and regression models
Applied to our study aims, the Oaxaca-Blinder decomposition method decomposes the mean differences in the log odds of SRH and logarithmic HCE between the lowest and highest income quintile. It indicates how much of the differences in the two outcome measures between low and high income groups are due to a different prevalence of chronic diseases and/or a different distribution of social determinants. All categorical variables were normalized to ensure that the results were independent of the choice of the reference category. Applied to our research setting, the Oaxaca-Blinder regression formula for SRH and HCE, respectively, is written as:
$$\begin{aligned}\Delta SRH &= \left(E\left(X_{LIG}\right)-E\left(X_{HIG}\right)\right)\beta^{*}+E\left(X_{LIG}\right)\left(\beta_{LIG}-\beta^{*}\right)\\ &\quad+E\left(X_{HIG}\right)\left(\beta^{*}-\beta_{HIG}\right),\end{aligned}$$
$$\begin{aligned}\Delta HCE &= \left(E\left(X_{LIG}\right)-E\left(X_{HIG}\right)\right)\beta^{*}+E\left(X_{LIG}\right)\left(\beta_{LIG}-\beta^{*}\right)\\ &\quad+E\left(X_{HIG}\right)\left(\beta^{*}-\beta_{HIG}\right),\end{aligned}$$
where \(\Delta SRH\) and \(\Delta HCE\) refer to differences in SRH and HCE, respectively, between the lowest income group (LIG) and the highest income group (HIG). \(X\) is a vector of values for the social determinants and/or chronic conditions. \(\beta_{LIG}\) and \(\beta_{HIG}\) are vectors of regression coefficients derived from separate regression models for the lowest and highest income groups only, respectively. \(\beta^{*}\) denotes a vector of the pooled regression estimates combining data from both the lowest and highest income groups.
The results were divided into three parts, each containing an Oaxaca-Blinder decomposition and regression models with SRH or HCE as the dependent variable. In the first part of the analysis, we apply the Oaxaca-Blinder method to decompose the contribution of chronic conditions to SES-differences in SRH or HCE. It indicates how much of the differences in the two outcome measures between low and high income groups are due to a different prevalence of chronic diseases. The remaining inequality could be interpreted as “the difference in SRH or HCE if low and high income groups had the same prevalence of these chronic conditions”, which may then be due to a differential effect of having chronic conditions on SRH or HCE or to unobserved factors that influence SRH or HCE. Regression models of SRH or HCE on chronic conditions, stratified by income group, were fitted to indicate which chronic conditions showed a differential association between the lowest and highest income groups.
In the second part of the analysis, we apply the Oaxaca-Blinder method to decompose the contribution of social determinants to SES-differences in both SRH and HCE. A logit regression model was fitted for the binary outcome measure SRH, while HCE was log-transformed. These models indicate how much of the differences between low and high income groups in the two outcome measures are due to a different distribution of social determinants. The remaining inequality could be interpreted as “the difference in the log odds of SRH or logarithmic HCE if low and high income groups had the same distribution of social determinants”, which may reflect differential effects of the social determinants or unobserved differences. Regression models with social determinants as independent variables were fitted to examine their differential association between the lowest and highest income groups.
In the third part of the analysis, we conducted the decomposition analysis with chronic conditions alongside the social determinants to examine the additional value of expanding the WHO model with chronic conditions. Additionally, regression models with both chronic conditions and social determinants were conducted.
The regression models were performed within the DHM sample, stratified by individuals belonging to the lowest and highest income group. For SRH, logistic regression models were conducted with a quasi-binomial distribution and logit link. To correct for the right-skewed distribution of HCE, generalized linear models with Gaussian distribution and identity link function were fitted on the natural logarithm of HCE.
Confidence intervals (CI) were set at 95%. Data preparation and regression analyses were conducted using R version 4.1.2. The Oaxaca-Blinder decomposition was conducted using Stata 18.
Sensitivity analyses
To test for robustness across SES-indicators, the analyses were repeated using educational level and financial welfare in addition to income as SES-indicators. Furthermore, to test the consistency of the relative contribution of social determinants among individuals with chronic conditions, we stratified the Oaxaca-Blinder decomposition analyses of differences in SRH between low and high income groups by the social determinants into five specific chronic conditions: cardiovascular diseases, acid-related disorders, migraines, psychological disorders and rheumatologic conditions. These five chronic conditions represent conditions with smaller and larger income differences, lifestyle-related conditions and mental health-related conditions, while having a sufficiently large prevalence to perform analyses.
