Data Cleaning
Data Integration
Data Transformation
Statistical Hypothesis Testing
Visualizations & Forecasting
1. Influenza deaths by geography, time, age, and gender from the CDC. Dataset can be downloaded here.
2. Population data by geography from the US Census Bureau. Dataset can be downloaded here.
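Integrating the two datasets comes down to joining deaths and population on a shared geography key, then converting raw counts into a rate. The sketch below does that join. It assumes both datasets have already been cleaned into dictionaries keyed by state name. The function name `death_rate_per_100k` and the per-100,000 scale are illustrative choices, not taken from the original analysis, and the real CDC and Census column names will differ.

```python
def death_rate_per_100k(deaths: dict[str, int], population: dict[str, int]) -> dict[str, float]:
    """Inner-join deaths and population on state; return deaths per 100,000 people.

    Hypothetical helper: assumes both inputs are already cleaned and keyed by state name.
    """
    rates = {}
    for state, n_deaths in deaths.items():
        pop = population.get(state)
        if not pop:  # skip states missing from the population data (or with zero population)
            continue
        rates[state] = n_deaths * 100_000 / pop
    return rates


if __name__ == "__main__":
    # Toy numbers for illustration only, not values from the CDC or Census datasets.
    deaths = {"State A": 120, "State B": 45, "State C": 80}
    population = {"State A": 400_000, "State B": 300_000}
    print(death_rate_per_100k(deaths, population))
```

This behaves as an inner join, so "State C" is dropped because it has no population record. A real pipeline would log or inspect such mismatches rather than drop them silently.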
Research Hypothesis: If a population has a larger share of vulnerable people (aged over 65), then it will have more flu deaths.
Independent Variable: Population
Dependent Variable: Death rate for age group >65
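Written formally, the comparison tested below can be expressed as the following pair of hypotheses. This assumes a one-sided comparison of mean death rates between the two age groups and uses Welch's unequal-variance form of the two-sample t statistic. The source does not state which variant of the t-test was used.

```latex
H_0:\ \mu_{>65} \le \mu_{<65}
\qquad
H_1:\ \mu_{>65} > \mu_{<65}
\qquad
t = \frac{\bar{x}_{>65} - \bar{x}_{<65}}{\sqrt{\dfrac{s_{>65}^{2}}{n_{>65}} + \dfrac{s_{<65}^{2}}{n_{<65}}}}
```

Here $\bar{x}$, $s^2$ and $n$ are the sample mean, sample variance and sample size of the death rates in each age group.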
Using a statistical t-test to test the hypothesis above, the resulting p-value is very small (2.07 × 10^-21), far below the significance level α = 0.05. This means the result is statistically significant. If there were no real difference between the age groups, a difference this large would be extremely unlikely to arise by random chance alone.
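The following sketch shows how a one-sided two-sample t-test could be computed by hand. It assumes Welch's unequal-variance variant, since the write-up does not say which variant was used. The function name and the synthetic data are hypothetical.

The p-value here uses a standard normal approximation to the t distribution. That approximation is only reasonable when the degrees of freedom are large (roughly 30 or more). For an exact t-distribution p-value, SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False, alternative="greater")` can be used instead.

```python
import math
import statistics


def welch_t_test_greater(a: list[float], b: list[float]) -> tuple[float, float, float]:
    """One-sided Welch's t-test of H1: mean(a) > mean(b).

    Returns (t statistic, Welch-Satterthwaite degrees of freedom, approximate p-value).
    The p-value uses a standard normal approximation, so it is only reliable for large df.
    """
    n_a, n_b = len(a), len(b)
    # Squared standard errors of each mean, from sample variances (n - 1 denominator).
    se2_a = statistics.variance(a) / n_a
    se2_b = statistics.variance(b) / n_b
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    p = statistics.NormalDist().cdf(-t)  # upper-tail probability P(T > t) under H0
    return t, df, p


if __name__ == "__main__":
    # Synthetic per-state death rates (per 100,000) for illustration only, NOT the CDC figures.
    over_65 = [40.0 + (i % 9) * 2.5 for i in range(50)]
    under_65 = [3.0 + (i % 6) * 0.8 for i in range(50)]
    t, df, p = welch_t_test_greater(over_65, under_65)
    print(f"t = {t:.2f}, df = {df:.1f}, p ≈ {p:.3g}")
```

With gaps as large as the one reported, the normal approximation underflows to 0 in floating point. The takeaway is the comparison against α, not the exact digits.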
This indicates that the death rate for the vulnerable population (age group >65 years old) is higher than that of the non-vulnerable population (age group <65 years old), so the death rate is associated with age group.
Since there are more deaths in the ageing population, we recommend giving more focus to states with larger vulnerable populations. This could mean allocating more staff to these states, or creating intervention programs on influenza and vaccination awareness, along with guidelines for the ageing population.
The chart above shows that death rates are much higher in the 75-and-above age group. Overall, the correlation between age and death rate is 0.64 when all age groups are included. If we correlate only ages 55+ against the death rate, the correlation is higher, at 0.87-0.89 across the years.
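The following sketch computes a Pearson correlation between age and death rate, both across all ages and restricted to older age groups. It assumes each age group has been encoded as a single number, such as its midpoint or lower bound. That encoding, the helper names, and the sample rows are hypothetical, not the original analysis.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def age_rate_correlation(rows: list[tuple[float, float]], min_age: float = 0) -> float:
    """Correlate numeric age (hypothetical encoding) with death rate for ages >= min_age."""
    ages = [age for age, _ in rows if age >= min_age]
    rates = [rate for age, rate in rows if age >= min_age]
    return pearson_r(ages, rates)


if __name__ == "__main__":
    # Synthetic (age, death rate per 100,000) pairs for illustration only.
    rows = [(2, 1.5), (10, 0.3), (30, 0.6), (50, 2.0), (60, 6.0), (70, 14.0), (80, 40.0)]
    print("all ages:", round(age_rate_correlation(rows), 2))
    print("ages 55+:", round(age_rate_correlation(rows, min_age=55), 2))
```

Filtering before correlating is what lets the "55+ only" figure differ from the all-ages figure. Near-zero rates in the younger groups add little trend and weaken the overall linear relationship.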