Data scrapers
These scripts pull data from various sources for use in Covasim. To run all scrapers, type
./run_scrapers
1. Corona Data Scraper
To quote the Corona Data Scraper web page,
Corona Data Scraper pulls COVID-19 Coronavirus case data from verified sources.
The loader below scrapes this data and places it, in CSV format, in the data/epi_data/corona-data-scraper-project directory.
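As a rough illustration of how the scraped files could be consumed, the sketch below reads every CSV file in a directory using only the standard library. The `*.csv` file pattern, the presence of a header row, and the helper name `read_scraped_rows` are assumptions made for this example; they are not taken from the loader itself.

```python
import csv
from pathlib import Path


def read_scraped_rows(directory):
    """Yield one dict per data row from every *.csv file in `directory`.

    Hypothetical helper: assumes each file has a header row.
    """
    for csv_path in sorted(Path(directory).glob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                yield row
```

For example, `list(read_scraped_rows("data/epi_data/corona-data-scraper-project"))` would collect all rows after the loader has been run. Values come back as strings, so numeric columns need converting before use.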
Here is a sample of the data.
| | key | population | aggregate | cum_positives | cum_death | cum_recovered | cum_active | cum_tests | cum_hospitalized | cum_discharged | date | day | positives | death | tests | hospitalized | discharged | recovered | active |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 57089 | Roane County, Tennessee, United States | 53382.0 | county | 1.0 | | | | | | | 2020-03-21 | 0 | 1.0 | | | | | | |
| 57090 | Roane County, Tennessee, United States | 53382.0 | county | 1.0 | | | | | | | 2020-03-22 | 1 | 0.0 | | | | | | |
| 57091 | Roane County, Tennessee, United States | 53382.0 | county | 1.0 | | | | | | | 2020-03-23 | 2 | 0.0 | | | | | | |
| 57092 | Roane County, Tennessee, United States | 53382.0 | county | 1.0 | | | | | | | 2020-03-24 | 3 | 0.0 | | | | | | |
| 57093 | Roane County, Tennessee, United States | 53382.0 | county | 1.0 | | | | | | | 2020-03-25 | 4 | 0.0 | | | | | | |
| 57094 | Roane County, Tennessee, United States | 53382.0 | county | 1.0 | | | | | | | 2020-03-26 | 5 | 0.0 | | | | | | |
| 57095 | Roane County, Tennessee, United States | 53382.0 | county | 1.0 | | | | | | | 2020-03-27 | 6 | 0.0 | | | | | | |
| 57096 | Roane County, Tennessee, United States | 53382.0 | county | 1.0 | | | | | | | 2020-03-28 | 7 | 0.0 | | | | | | |
| 57097 | Roane County, Tennessee, United States | 53382.0 | county | 2.0 | | | | | | | 2020-03-29 | 8 | 1.0 | | | | | | |
| 57098 | Roane County, Tennessee, United States | 53382.0 | county | 2.0 | | | | | | | 2020-03-30 | 9 | 0.0 | | | | | | |
| 57099 | Roane County, Tennessee, United States | 53382.0 | county | 2.0 | | | | 88.0 | | | 2020-03-31 | 10 | 0.0 | | 88.0 | | | | |
| 57100 | Roane County, Tennessee, United States | 53382.0 | county | 2.0 | | | | 91.0 | | | 2020-04-01 | 11 | 0.0 | | 3.0 | | | | |
| 57101 | Roane County, Tennessee, United States | 53382.0 | county | 3.0 | | | | 131.0 | | | 2020-04-02 | 12 | 1.0 | | 40.0 | | | | |
| 57102 | Roane County, Tennessee, United States | 53382.0 | county | 3.0 | | | | 150.0 | | | 2020-04-03 | 13 | 0.0 | | 19.0 | | | | |
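
In the sample above, each daily value appears to be the difference between consecutive cumulative values (for instance, the cumulative series 88.0, 91.0, 131.0, 150.0 sits alongside the daily values 88.0, 3.0, 40.0, 19.0). The following is a minimal sketch of that relationship, not the loader's actual code; the function name and the handling of missing values are assumptions.

```python
def daily_from_cumulative(cumulative):
    """Convert a cumulative series into per-day increments.

    Missing entries (None) are passed through as None and do not
    reset the running total. Assumed behaviour, for illustration only.
    """
    daily = []
    previous = 0.0
    for value in cumulative:
        if value is None:
            daily.append(None)
            continue
        daily.append(value - previous)
        previous = value
    return daily
```

Calling `daily_from_cumulative([None, 88.0, 91.0, 131.0, 150.0])` yields `[None, 88.0, 3.0, 40.0, 19.0]`, which matches the last four rows of the sample.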
Updating: To update the Corona Data Scraper data, run
python data/load_corona_data_scraper_data.py
As of April 4, 2020, there are apparently 3874 data sets.
2. European Centre for Disease Prevention and Control
To quote the European Centre for Disease Prevention and Control web page,
Since the beginning of the coronavirus pandemic, ECDC’s Epidemic Intelligence team has been collecting the number of COVID-19 cases and deaths, based on reports from health authorities worldwide. This comprehensive and systematic process is carried out on a daily basis. To insure the accuracy and reliability of the data, this process is being constantly refined. This helps to monitor and interpret the dynamics of the COVID-19 pandemic not only in the European Union (EU), the European Economic Area (EEA), but also worldwide.
The data is stored in CSV format in the data/epi_data/european-centre-for-disease-prevention-and-control directory.
Here is a sample of the data:
| | day | new_positives | new_death | key | population | date |
|---|---|---|---|---|---|---|
| 3960 | 0 | 2 | 0 | Greenland | 56025.0 | 2020-03-20 |
| 3959 | 1 | 0 | 0 | Greenland | 56025.0 | 2020-03-21 |
| 3958 | 2 | 0 | 0 | Greenland | 56025.0 | 2020-03-22 |
| 3957 | 3 | 0 | 0 | Greenland | 56025.0 | 2020-03-23 |
| 3956 | 4 | 2 | 0 | Greenland | 56025.0 | 2020-03-24 |
| 3955 | 5 | 0 | 0 | Greenland | 56025.0 | 2020-03-25 |
| 3954 | 6 | 1 | 0 | Greenland | 56025.0 | 2020-03-26 |
| 3953 | 7 | 1 | 0 | Greenland | 56025.0 | 2020-03-27 |
| 3952 | 8 | 3 | 0 | Greenland | 56025.0 | 2020-03-28 |
| 3951 | 9 | 1 | 0 | Greenland | 56025.0 | 2020-03-29 |
| 3950 | 10 | 0 | 0 | Greenland | 56025.0 | 2020-03-30 |
| 3949 | 11 | 0 | 0 | Greenland | 56025.0 | 2020-03-31 |
| 3948 | 12 | 0 | 0 | Greenland | 56025.0 | 2020-04-01 |
| 3947 | 13 | 0 | 0 | Greenland | 56025.0 | 2020-04-02 |
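
The `day` column in the samples looks like the number of days elapsed since the first date recorded for a given key (2020-03-20 is day 0 and 2020-04-02 is day 13 for Greenland). A minimal sketch of that calculation follows; the function name is hypothetical, and the way the loader actually derives `day` is an assumption.

```python
from datetime import date


def day_index(iso_dates):
    """Return days elapsed since the earliest date, one entry per input date."""
    parsed = [date.fromisoformat(text) for text in iso_dates]
    first = min(parsed)
    return [(current - first).days for current in parsed]
```

Because the offset is measured from the earliest date, the input does not need to be sorted, which is convenient when the source lists the most recent records first.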
Updating: To update the European Centre for Disease Prevention and Control data, run
python data/load_ecdp_data.py
This adds 10,538 records (as of April 15, 2020) from countries and territories in Africa, Asia, the Americas, Europe, and Oceania. More details at: https://www.ecdc.europa.eu/en/geographical-distribution-2019-ncov-cases
3. The COVID Tracking Project
The COVID Tracking Project “obtains, organizes, and publishes high-quality data required to understand and respond to the COVID-19 outbreak in the United States.” The project website is https://covidtracking.com
We transform this data into the Covasim parameter format. It is stored in CSV format in the data/epi_data/covid-tracking-project directory.
| | date | key | cum_hospitalized | cum_in_icu | cum_on_ventilator | death | new_death | new_hospitalized | new_negatives | new_positives | new_tests | day | num_icu | num_on_ventilator |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2210 | 2020-03-04 | NY | | | | | | | | | | 0 | | |
| 2191 | 2020-03-05 | NY | | | | | 0.0 | 0.0 | 28.0 | 16.0 | 44.0 | 1 | | |
| 2163 | 2020-03-06 | NY | | | | | 0.0 | 0.0 | 16.0 | 11.0 | 27.0 | 2 | | |
| 2122 | 2020-03-07 | NY | | | | | 0.0 | 0.0 | 0.0 | 43.0 | 43.0 | 3 | | |
| 2071 | 2020-03-08 | NY | | | | | 0.0 | 0.0 | 0.0 | 29.0 | 29.0 | 4 | | |
| 2020 | 2020-03-09 | NY | | | | | 0.0 | 0.0 | 0.0 | 37.0 | 37.0 | 5 | | |
| 1969 | 2020-03-10 | NY | | | | | 0.0 | 0.0 | 0.0 | 31.0 | 31.0 | 6 | | |
| 1918 | 2020-03-11 | NY | | | | | 0.0 | 0.0 | 0.0 | 43.0 | 43.0 | 7 | | |
| 1867 | 2020-03-12 | NY | | | | | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 8 | | |
| 1816 | 2020-03-13 | NY | | | | | 0.0 | 0.0 | 2687.0 | 205.0 | 2892.0 | 9 | | |
| 1765 | 2020-03-14 | NY | | | | | 0.0 | 0.0 | 0.0 | 103.0 | 103.0 | 10 | | |
| 1714 | 2020-03-15 | NY | | | | 3.0 | 3.0 | 0.0 | 1764.0 | 205.0 | 1969.0 | 11 | | |
| 1661 | 2020-03-16 | NY | | | | 7.0 | 4.0 | 0.0 | 0.0 | 221.0 | 221.0 | 12 | | |
| 1605 | 2020-03-17 | NY | | | | 7.0 | 0.0 | 0.0 | 963.0 | 750.0 | 1713.0 | 13 | | |
Updating: To update the COVID Tracking Project data, run
python data/load_covid_tracking_project_data.py
4. Demographic data scraper
To scrape demographic data, run
python data/load_demographic_data.py
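
For readers who prefer to drive the loaders from Python, the sketch below runs the four loader scripts named in this document one after another. It is not the contents of `./run_scrapers`, which are not shown here; the function names and the choice to stop at the first failure are assumptions made for illustration.

```python
import subprocess
import sys

# Loader scripts named in this document, relative to the repository root.
LOADERS = [
    "data/load_corona_data_scraper_data.py",
    "data/load_ecdp_data.py",
    "data/load_covid_tracking_project_data.py",
    "data/load_demographic_data.py",
]


def build_commands(loaders=LOADERS, python=sys.executable):
    """Return one argument list per loader script."""
    return [[python, script] for script in loaders]


def run_all(loaders=LOADERS):
    """Run each loader in turn; raises CalledProcessError on the first failure."""
    for command in build_commands(loaders):
        print("Running:", " ".join(command))
        subprocess.run(command, check=True)
```

Calling `run_all()` from the repository root would invoke each script with the current Python interpreter. Passing a shorter list, such as `run_all(["data/load_demographic_data.py"])`, refreshes a single source.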