Statistical Analysis of Air Quality Impact on Respiratory Disease Prevalence

Authors

1. Dr. Khimya Amlani, Professor, Vidyalankar Institute of Technology, Mumbai, India

Abstract

India’s deteriorating urban air quality has generated a mounting public health emergency that demands rigorous quantitative analysis. This paper reports findings from a five-year, multi-city epidemiological study examining how ambient air pollutants, principally fine particulate matter (PM₂.₅), relate to respiratory disease burden across Mumbai, Delhi, Kolkata, Chennai, and Bangalore between 2020 and 2025. Four complementary statistical methods are applied: multiple linear regression (MLR), polynomial regression, binary logistic regression, and two-way analysis of variance (ANOVA) with post-hoc comparisons. A stepwise variable selection procedure, validated through ten-fold cross-validation, further refines the predictor set. The central empirical finding is a statistically robust non-linear threshold in the PM₂.₅–respiratory admission relationship, estimated through segmented regression at approximately 59.8 μg/m³ (95% CI: 56.2–63.4 μg/m³). Beyond this point, hospital admissions rise at an accelerating rate (β = 2.31, p < .001). Logistic regression further shows that each 10 μg/m³ increment in PM₂.₅ is associated with 48% higher odds of chronic respiratory disease (OR = 1.48; AUC = 0.88). Scenario modelling projects that a sustained 30% reduction in PM₂.₅ could be associated with approximately 52,000 fewer premature deaths annually. The results carry direct implications for India’s National Clean Air Programme (NCAP) and city-level clinical resource planning.

Keywords

PM₂.₅; respiratory disease; multiple linear regression; logistic regression; ANOVA

Conclusion

Five years of concurrent air quality monitoring and hospital admission data across five major Indian cities converge on a consistent picture: fine particulate matter is, by a clear margin, the pollutant most strongly associated with respiratory disease burden in urban India. The MLR model explains 84.7% of the variance in admission rates (R² = .847; adjusted R² = .841), with PM₂.₅ carrying the largest regression coefficient (β = 2.31; 95% CI: 1.96, 2.66; p < .001). Model diagnostics support inferential validity: VIF values are below 3.8, the residuals show no significant departure from normality (Shapiro–Wilk p = .062), and there is no evidence of heteroscedasticity (Breusch–Pagan p = .123).
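As a rough consistency check on the figures above, the reported confidence interval for the PM₂.₅ coefficient can be converted back into an approximate standard error and test statistic, and the two R² values can be related to each other. The sketch below assumes a normal (Wald) approximation with a 1.96 critical value, which the paper does not state, and all function names are illustrative.

```python
import math

Z_95 = 1.96  # assumed critical value (normal approximation; not stated in the paper)


def se_from_ci(lower: float, upper: float, z: float = Z_95) -> float:
    """Approximate standard error implied by a symmetric 95% CI."""
    return (upper - lower) / (2.0 * z)


def wald_z(beta: float, se: float) -> float:
    """Wald test statistic: coefficient divided by its standard error."""
    return beta / se


def two_sided_p(z_stat: float) -> float:
    """Two-sided p-value under a standard normal reference distribution."""
    return math.erfc(abs(z_stat) / math.sqrt(2.0))


def df_ratio(r2: float, adj_r2: float) -> float:
    """(n - 1) / (n - p - 1) implied by R^2 and adjusted R^2.

    Follows from adj_R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1).
    """
    return (1.0 - adj_r2) / (1.0 - r2)


# Values reported in the text: beta = 2.31, 95% CI (1.96, 2.66)
se = se_from_ci(1.96, 2.66)
z_stat = wald_z(2.31, se)
p_value = two_sided_p(z_stat)

# R^2 = .847 (84.7% of variance) and adjusted R^2 = .841
ratio = df_ratio(0.847, 0.841)

print(f"SE = {se:.3f}, z = {z_stat:.1f}, p < .001: {p_value < 0.001}")
print(f"(n - 1) / (n - p - 1) = {ratio:.3f}")
```

Under these assumptions the implied standard error is about 0.18 and the test statistic about 12.9, which is in line with the reported p < .001. The small gap between R² and adjusted R² corresponds to a degrees-of-freedom ratio of roughly 1.04, although rounding in the reported values makes this only indicative.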

The most policy-consequential finding is the formally derived PM₂.₅ threshold at 59.8 μg/m³ (95% CI: 56.2, 63.4), confirmed through both polynomial (Equation 6) and segmented regression. Beyond this level, the concentration–response slope steepens considerably. Northern Indian cities operate so far above this level in winter that the NCAP’s current 20–30% reduction target would still leave Delhi well within the supra-threshold zone. Achieving health-relevant pollution levels demands reductions of 65–70% from peak concentrations, requiring a fundamentally more ambitious policy portfolio.
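The arithmetic behind this argument can be made explicit. The following is a minimal sketch, assuming that "health-relevant" means reaching the 59.8 μg/m³ threshold (an interpretation, not a statement from the paper); the function names are hypothetical and no city-specific concentrations are taken from the study.

```python
THRESHOLD = 59.8  # ug/m3, PM2.5 threshold reported in the text


def required_reduction(peak: float, threshold: float = THRESHOLD) -> float:
    """Fractional cut needed to bring `peak` down to `threshold`.

    Returns 0.0 when the starting level is already at or below the threshold.
    """
    if peak <= 0:
        raise ValueError("peak concentration must be positive")
    return max(0.0, 1.0 - threshold / peak)


def concentration_after(peak: float, reduction: float) -> float:
    """Concentration remaining after a fractional reduction."""
    return peak * (1.0 - reduction)


def peak_implied_by(reduction: float, threshold: float = THRESHOLD) -> float:
    """Starting level from which `reduction` lands exactly on the threshold."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return threshold / (1.0 - reduction)


# Highest starting level for which a 30% cut still reaches the threshold
max_start_for_30pct = peak_implied_by(0.30)

# Starting levels consistent with a required cut of 65% and 70%
implied_peak_low = peak_implied_by(0.65)
implied_peak_high = peak_implied_by(0.70)

print(f"30% cut suffices only below {max_start_for_30pct:.1f} ug/m3")
print(f"65-70% cuts imply peaks of {implied_peak_low:.0f}-{implied_peak_high:.0f} ug/m3")
```

Read this way, a 30% reduction reaches the threshold only from starting levels of about 85 μg/m³ or lower, while a required cut of 65–70% corresponds to peak concentrations of roughly 171–199 μg/m³.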

Logistic regression (Equation 3) adds a patient-level dimension: each 10 μg/m³ PM₂.₅ increment is associated with 48% higher odds of chronic respiratory disease (OR = 1.48), and the model discriminates well between cases and non-cases (AUC = 0.88). This risk compounds with smoking (OR = 3.14), older age, and industrial occupation. Ten-fold cross-validation confirmed acceptable generalisation (mean R² = .831), and the Hosmer–Lemeshow test indicated no significant lack of fit (χ² = 8.6, p = .38).
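To illustrate how the per-10 μg/m³ odds ratio scales to other exposure differences, the sketch below assumes the logit is linear in PM₂.₅, as in a standard binary logistic model; Equation 3 itself is not reproduced here, and the function names are illustrative.

```python
import math

OR_PER_10 = 1.48  # reported odds ratio per 10 ug/m3 increment in PM2.5


def log_odds_per_unit(or_per_10: float = OR_PER_10) -> float:
    """Logistic coefficient per 1 ug/m3 implied by the per-10 odds ratio."""
    return math.log(or_per_10) / 10.0


def odds_ratio_for_increment(delta: float, or_per_10: float = OR_PER_10) -> float:
    """Odds ratio for an arbitrary PM2.5 difference `delta` (ug/m3).

    Assumes a log-linear dose-response, so odds ratios multiply across
    successive increments.
    """
    return math.exp(log_odds_per_unit(or_per_10) * delta)


coef = log_odds_per_unit()
or_30 = odds_ratio_for_increment(30.0)

print(f"coefficient per ug/m3 = {coef:.4f}")
print(f"OR for a 30 ug/m3 difference = {or_30:.2f}")
```

Under this log-linear assumption, a 30 μg/m³ difference corresponds to an odds ratio of 1.48³, or about 3.24. Odds ratios describe relative odds rather than relative risk, so the two should not be read interchangeably when the outcome is common.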

These results provide Indian environmental health authorities and the NCAP with a set of quantitative thresholds, validated effect sizes, and confidence intervals on which to base more precisely targeted abatement and clinical planning decisions. The statistical associations reported here are robust and consistent with instrumental variable studies in the literature; however, they represent strong epidemiological evidence rather than established causal proof. Longitudinal cohort designs and causal inference methods remain the appropriate next step.


Author Contribution

The author takes full responsibility for the entire study process, including design, data collection, analysis, and manuscript writing.

Funding

The research, authorship, and publication of this article were not funded by any specific grants from public, commercial, or non-profit agencies.

Software Information

Not applicable

Conflict of Interest

No conflicts of interest are reported by the author.

Acknowledgement

I extend my gratitude to everyone who contributed their expertise to this study and manuscript, and to the anonymous reviewers for their helpful comments.

Data availability

Not applicable