Project: Pandemic Data Analysis

The motivation

I was reading up on pandemics when I stumbled upon a very well-generated mock dataset. Given that I was kind of bored, I started looking into how you would learn about the spread of the virus from this data.

Methodology

The dataset includes the number of confirmed cases over time in each location. So the very first step is finding some way of scoring the spread of the virus across the various cities, counties, and states. This score is valuable because everything else can be correlated against it: which characteristics high and low infection rate locations share or differ in, which interventions have been effective, and so on.

So the best way to start is by finding the curve that best fits the scatter plot of confirmed cases. Given that it's a pandemic, an exponential curve is the natural choice, and the score is how steeply exponential the fit is. Luckily, SciPy already has a function built for that purpose:

from scipy.optimize import curve_fit

The base equation for infection data is y = e^x. We can add a parameter A to optimize in one of two ways: y = A*e^x or y = e^(A*x). However, we want to pick only one of them and not optimize y = A*e^(B*x), because we want a single score to represent the spread of the infection. Since infections grow exponentially, the thing we want to tune is that exponential characteristic itself, which is what the exponent controls. So I picked y = e^(A*x), and the fitted curve did a very good job of matching the data. The fitted A is the score, and it gave me the core piece of information to handle the rest.
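Here is a minimal sketch of what that fit could look like. It is not my original code: the helper names exp_growth and spread_score, the starting guess p0, and the synthetic case counts are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit


def exp_growth(x, a):
    """Single-parameter exponential model y = e^(a*x)."""
    return np.exp(a * x)


def spread_score(days, cases):
    """Fit y = e^(a*x) to one location's case counts and return the fitted a.

    Hypothetical helper: the fitted exponent is used as the spread score.
    """
    days = np.asarray(days, dtype=float)
    cases = np.asarray(cases, dtype=float)
    # p0 is an assumed small positive starting guess for the optimizer.
    params, _ = curve_fit(exp_growth, days, cases, p0=[0.1])
    return float(params[0])


# Synthetic, noise-free data generated with a known growth rate of 0.2,
# standing in for one location's confirmed cases over 30 days.
days = np.arange(30)
cases = np.exp(0.2 * days)

score = spread_score(days, cases)
print(f"fitted score A = {score:.3f}")
```

Running spread_score once per location would give one number per city, county, or state. Real data is noisy and does not start at exactly one case, so the fit would be rougher than in this clean example.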

Unfortunately, my computer died and I didn't have one to work with for months, so I didn't pursue this research further. But the idea was that after getting the score, I would correlate each location's infection score with data about its characteristics to figure out which ones made it more or less susceptible. And once data comes out about which interventions were used in each location, we can figure out what has been most effective in that respect as well.
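I never got to build this step, so the following is only a rough sketch of how the correlation could be done. The function correlate_with_score, the characteristic names, and all the numbers are made-up placeholders, and Pearson correlation is just one assumed choice of measure.

```python
import numpy as np


def correlate_with_score(scores, characteristics):
    """Return the Pearson correlation of each characteristic with the score.

    scores: one spread score per location.
    characteristics: dict mapping a characteristic name to one value
    per location, in the same order as scores.
    """
    scores = np.asarray(scores, dtype=float)
    return {
        name: float(np.corrcoef(scores, np.asarray(values, dtype=float))[0, 1])
        for name, values in characteristics.items()
    }


# Placeholder values for four imaginary locations, not real measurements.
scores = [0.05, 0.10, 0.15, 0.20]
characteristics = {
    "population_density": [100, 200, 300, 400],  # rises with the score
    "median_age": [45, 40, 35, 30],              # falls as the score rises
}

result = correlate_with_score(scores, characteristics)
for name, r in result.items():
    print(f"{name}: r = {r:+.2f}")
```

A value near +1 or -1 would flag a characteristic worth a closer look, while a value near 0 suggests no linear relationship. Correlation alone would not show that a characteristic causes faster spread.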

In a situation like this, proper data analysis, or in other words insight, is one of the most valuable weapons in the fight.