analysis
Additional analysis functions that are not part of the core Covasim workflow, but which are useful for particular investigations.
Classes
| Name | Description |
|---|---|
| Analyzer | Base class for analyzers. Based on the Intervention class. |
| Calibration | A class to handle calibration of Covasim simulations. Uses the Optuna hyperparameter optimization library. |
| Fit | A class for calculating the fit between the model and the data. |
| TransTree | A class for holding a transmission tree. |
| age_histogram | Calculate statistics across age bins, including histogram plotting functionality. |
| daily_age_stats | Calculate daily counts by age, saving for each day of the simulation. |
| daily_stats | Print out daily statistics about the simulation. |
| nab_histogram | Store a histogram of the log10(NAb) distribution. |
| snapshot | Analyzer that takes a “snapshot” of the sim.people array at specified points in time. |
Analyzer
analysis.Analyzer(label=None)
Base class for analyzers. Based on the Intervention class. Analyzers are used to provide more detailed information about a simulation than is available by default: for example, pulling states out of sim.people on a particular timestep before they are updated in the next timestep.
To retrieve a particular analyzer from a sim, use sim.get_analyzer().
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| label | str | a label for the Analyzer (used for ease of identification) | None |
Methods
| Name | Description |
|---|---|
| apply | Apply the analyzer at each time point. |
| finalize | Finalize analyzer |
| initialize | Initialize the analyzer, e.g. convert date strings to integers. |
| shrink | Remove any excess stored data from the analyzer; for use with sim.shrink(). |
| to_json | Return JSON-compatible representation |
apply
analysis.Analyzer.apply(sim)
Apply the analyzer at each time point. The analyzer has full access to the sim object, and typically stores data/results in itself. This is the core method which each analyzer object needs to implement.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| sim | Sim | the Sim instance | required |
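The lifecycle described above (initialize once, apply on every timestep, finalize once at the end) can be sketched without Covasim itself. In this minimal sketch, MiniAnalyzer, ToySim, and count_infected are hypothetical stand-ins for cv.Analyzer, cv.Sim, and a user-defined analyzer; they only mimic the calling order, not Covasim's internals.

```python
class MiniAnalyzer:
    """Hypothetical stand-in for cv.Analyzer that only tracks lifecycle calls."""

    def __init__(self, label=None):
        self.label = label
        self.initialized = False
        self.finalized = False

    def initialize(self, sim=None):
        self.initialized = True

    def apply(self, sim):
        raise NotImplementedError('Each analyzer must implement apply()')

    def finalize(self, sim=None):
        self.finalized = True


class count_infected(MiniAnalyzer):
    """Hypothetical analyzer: record the number of infected people on each timestep."""

    def __init__(self, label='count_infected'):
        super().__init__(label=label)
        self.t = []
        self.n_infected = []

    def apply(self, sim):
        # Read the current state before the sim moves on to the next timestep
        self.t.append(sim.t)
        self.n_infected.append(sum(sim.infected))


class ToySim:
    """Hypothetical stand-in for cv.Sim: one new person is infected per timestep."""

    def __init__(self, n_people=5, n_days=3, analyzers=None):
        self.infected = [True] + [False] * (n_people - 1)
        self.n_days = n_days
        self.analyzers = analyzers or []
        self.t = 0

    def run(self):
        for analyzer in self.analyzers:
            analyzer.initialize(self)
        for t in range(self.n_days + 1):
            self.t = t
            for analyzer in self.analyzers:
                analyzer.apply(self)  # analyzers see the state at time t
            if False in self.infected:  # then update the population
                self.infected[self.infected.index(False)] = True
        for analyzer in self.analyzers:
            analyzer.finalize(self)
        return self


analyzer = count_infected()
ToySim(analyzers=[analyzer]).run()
print(analyzer.t, analyzer.n_infected)
```

With Covasim installed, the equivalent would be a subclass of cv.Analyzer that calls super().__init__() and implements apply(sim), passed in via cv.Sim(analyzers=...) and retrieved after the run with sim.get_analyzer().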
finalize
analysis.Analyzer.finalize(sim=None)
Finalize the analyzer.
This method is run once as part of sim.finalize(), enabling the analyzer to perform any final operations after the simulation is complete (e.g. rescaling).
initialize
analysis.Analyzer.initialize(sim=None)
Initialize the analyzer, e.g. convert date strings to integers.
shrink
analysis.Analyzer.shrink(in_place=False)
Remove any excess stored data from the analyzer; for use with sim.shrink().
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| in_place | bool | whether to shrink the analyzer (else shrink a copy) | False |
to_json
analysis.Analyzer.to_json()
Return a JSON-compatible representation.
Custom classes can’t be directly represented in JSON. This method is a one-way export to produce a JSON-compatible representation of the analyzer. It will attempt to JSONify each attribute of the analyzer, skipping any that fail.
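The "JSONify each attribute, skipping any that fail" behaviour can be illustrated with the standard library json module. This is a sketch under assumptions: to_json_sketch and Example are hypothetical names, and Covasim's own method may convert some values (such as NumPy arrays) rather than skip them.

```python
import json


def to_json_sketch(obj):
    """Return a dict of the object's attributes that can be serialized to JSON."""
    output = {}
    for key, value in vars(obj).items():
        try:
            json.dumps(value)  # raises TypeError for non-serializable values
        except (TypeError, ValueError):
            continue  # skip attributes that can't be represented in JSON
        output[key] = value
    return output


class Example:
    def __init__(self):
        self.label = 'my_analyzer'
        self.days = [10, 20]
        self.callback = lambda sim: None  # functions are not JSON-serializable
        self.cache = {1, 2, 3}            # neither are sets


print(json.dumps(to_json_sketch(Example())))
```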
Returns
| Name | Type | Description |
|---|---|---|
|  |  | JSON-serializable representation |
Calibration
analysis.Calibration(
sim,
calib_pars=None,
fit_args=None,
custom_fn=None,
par_samplers=None,
n_trials=None,
n_workers=None,
total_trials=None,
name=None,
db_name=None,
keep_db=None,
storage=None,
label=None,
die=False,
verbose=True,
)
A class to handle calibration of Covasim simulations. Uses the Optuna hyperparameter optimization library (optuna.org), which must be installed separately (via pip install optuna).
Note: running a calibration does not guarantee a good fit! You must ensure that you run for a sufficient number of iterations, have enough free parameters, and that the parameters have wide enough bounds. Please see the tutorial on calibration for more information.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| sim | Sim | the simulation to calibrate | required |
| calib_pars | dict | a dictionary of the parameters to calibrate, in the format dict(key1=[best, low, high]) | None |
| fit_args | dict | a dictionary of options that are passed to sim.compute_fit() to calculate the goodness-of-fit | None |
| par_samplers | dict | an optional mapping from parameters to the Optuna sampler to use for choosing new points for each; by default, suggest_float | None |
| custom_fn | func | a custom function for modifying the simulation; receives the sim and calib_pars as inputs, should return the modified sim | None |
| n_trials | int | the number of trials per worker | None |
| n_workers | int | the number of parallel workers (default: the maximum available) | None |
| total_trials | int | the total number of trials; if n_trials is not supplied, it is calculated by dividing this number by n_workers | None |
| name | str | the name of the database (default: 'covasim_calibration') | None |
| db_name | str | the name of the database file (default: 'covasim_calibration.db') | None |
| keep_db | bool | whether to keep the database after calibration (default: false) | None |
| storage | str | the location of the database (default: sqlite) | None |
| label | str | a label for this calibration object | None |
| die | bool | whether to stop if an exception is encountered | False |
| verbose | bool | whether to print details of the calibration | True |
| kwargs | dict | passed to cv.Calibration() | |
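The calib_pars format and the total_trials/n_workers relationship in the table above can be made concrete with a small sketch. Two assumptions are labelled in the code: n_trials is taken as the ceiling of total_trials / n_workers (Covasim's exact rounding may differ), and the check that best lies within [low, high] is a sanity check added here, not necessarily one Covasim performs. Both helper names are hypothetical.

```python
import math


def trials_per_worker(total_trials, n_workers):
    """Split a total trial budget across workers (ceiling division is an assumption)."""
    return math.ceil(total_trials / n_workers)


def check_calib_pars(calib_pars):
    """Check each entry has the form [best, low, high] with low <= best <= high."""
    for key, (best, low, high) in calib_pars.items():
        if not low <= best <= high:
            raise ValueError(f'{key}: best value {best} is outside [{low}, {high}]')
    return True


calib_pars = dict(beta=[0.015, 0.010, 0.020])
check_calib_pars(calib_pars)
print(trials_per_worker(total_trials=100, n_workers=8))
```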
Returns
| Name | Type | Description |
|---|---|---|
|  | Calibration | A Calibration object |
Example::
import covasim as cv
sim = cv.Sim(datafile='data.csv')
calib_pars = dict(beta=[0.015, 0.010, 0.020])
calib = cv.Calibration(sim, calib_pars, total_trials=100)
calib.calibrate()
calib.plot_sims()
New in version 3.0.3.
Methods
| Name | Description |
|---|---|
| calibrate | Actually perform calibration. |
| make_study | Make a study, deleting one if it already exists |
| parse_study | Parse the study into a data frame – called automatically |
| plot_all | Plot every point in the calibration. Warning: very slow for more than a few hundred trials. |
| plot_best | Plot only the points with lowest mismatch. New in version 3.1.1. |
| plot_sims | Plot sims, before and after calibration. |
| plot_stride | Plot a fixed number of points in order across the results. |
| plot_trend | Plot the trend in best mismatch over time. |
| remove_db | Remove the database file if keep_db is false and the path exists. |
| run_sim | Create and run a simulation |
| run_trial | Define the objective for Optuna |
| run_workers | Run multiple workers in parallel |
| summarize | Print out results from the calibration |
| to_json | Convert the data to JSON. |
| worker | Run a single worker |
calibrate
analysis.Calibration.calibrate(calib_pars=None, verbose=True, **kwargs)
Actually perform calibration.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| calib_pars | dict | if supplied, overwrite stored calib_pars | None |
| verbose | bool | whether to print output from each trial | True |
| kwargs | dict | if supplied, overwrite stored run_args (n_trials, n_workers, etc.) | {} |
make_study
analysis.Calibration.make_study()
Make a study, deleting one if it already exists.
parse_study
analysis.Calibration.parse_study()
Parse the study into a data frame; called automatically.
plot_all
analysis.Calibration.plot_all()
Plot every point in the calibration. Warning: very slow for more than a few hundred trials.
New in version 3.1.1.
plot_best
analysis.Calibration.plot_best(best_thresh=2)
Plot only the points with the lowest mismatch.
New in version 3.1.1.
plot_sims
analysis.Calibration.plot_sims(**kwargs)
Plot sims, before and after calibration.
New in version 3.1.1: renamed from plot() to plot_sims().
plot_stride
analysis.Calibration.plot_stride(npts=200)
Plot a fixed number of points in order across the results.
New in version 3.1.1.
plot_trend
analysis.Calibration.plot_trend(best_thresh=2)
Plot the trend in best mismatch over time.
New in version 3.1.1.
remove_db
analysis.Calibration.remove_db()
Remove the database file if keep_db is false and the path exists.
New in version 3.1.0.
run_sim
analysis.Calibration.run_sim(calib_pars, label=None, return_sim=False)
Create and run a simulation.
run_trial
analysis.Calibration.run_trial(trial)
Define the objective for Optuna.
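Conceptually, the objective samples a value for each calibrated parameter within its [low, high] bounds, runs the sim, and returns the fit's mismatch for the optimizer to minimize. The sketch below substitutes plain random search (Python's random module) for Optuna, and a hypothetical toy_model and squared-error mismatch for cv.Sim and sim.compute_fit(); it shows the shape of the loop, not Covasim's implementation.

```python
import random


def toy_model(beta, n_days=10):
    """Hypothetical stand-in for a sim: cases grow geometrically at rate beta."""
    cases = [1.0]
    for _ in range(n_days):
        cases.append(cases[-1] * (1 + beta))
    return cases


def mismatch(model, data):
    """Sum of squared differences (a stand-in for the fit's mismatch)."""
    return sum((m - d) ** 2 for m, d in zip(model, data))


def run_trial(calib_pars, data, rng):
    """One trial: sample each parameter within [low, high], run the model, score it."""
    pars = {key: rng.uniform(low, high) for key, (_, low, high) in calib_pars.items()}
    return pars, mismatch(toy_model(**pars), data)


def calibrate(calib_pars, data, n_trials=200, seed=1):
    """Random search: keep the trial with the lowest mismatch."""
    rng = random.Random(seed)
    trials = [run_trial(calib_pars, data, rng) for _ in range(n_trials)]
    return min(trials, key=lambda trial: trial[1])


data = toy_model(beta=0.3)  # synthetic "data" generated from a known beta
best_pars, best_mismatch = calibrate(dict(beta=[0.2, 0.0, 0.5]), data)
print(best_pars, best_mismatch)
```

Optuna's default sampler (TPE) chooses new points adaptively based on earlier trials rather than uniformly at random, so it typically needs far fewer trials than this naive search.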
run_workers
analysis.Calibration.run_workers()
Run multiple workers in parallel.
summarize
analysis.Calibration.summarize()
Print out results from the calibration.
to_json
analysis.Calibration.to_json(filename=None)
Convert the data to JSON.
New in version 3.1.1.
worker
analysis.Calibration.worker()
Run a single worker.
Fit
analysis.Fit(
sim,
weights=None,
keys=None,
custom=None,
compute=True,
verbose=False,
die=True,
label=None,
**kwargs,
)
A class for calculating the fit between the model and the data. Note that the following terminology is used here:
- fit: nonspecific term for how well the model matches the data
- difference: the absolute numerical differences between the model and the data (one time series per result)
- goodness-of-fit: the result of passing the difference through a statistical function, such as mean squared error
- loss: the goodness-of-fit for each result multiplied by user-specified weights (one time series per result)
- mismatches: the sum of the losses over time for each result (a single scalar value per result)
- mismatch: the sum of the mismatches; this is the value to be minimized during calibration
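The terminology above maps onto a short pipeline. This pure-Python sketch is illustrative only: it uses squared error as the goodness-of-fit statistic (cv.compute_gof() offers other options, and its default may differ), uses the default weights listed for the weights parameter (10 for deaths, 5 for diagnoses), and the result keys and values are made up.

```python
def compute_fit(sim_results, data, weights=None, default_weight=1):
    """Illustrative difference -> goodness-of-fit -> loss -> mismatch pipeline."""
    weights = weights or {}
    mismatches = {}
    for key in data:
        # difference: one time series per result
        diffs = [s - d for s, d in zip(sim_results[key], data[key])]
        # goodness-of-fit: here, the squared error at each time point
        gofs = [diff ** 2 for diff in diffs]
        # loss: goodness-of-fit multiplied by the result's weight
        weight = weights.get(key, default_weight)
        losses = [weight * gof for gof in gofs]
        # mismatches: one scalar per result (sum over time)
        mismatches[key] = sum(losses)
    # mismatch: the single value minimized during calibration
    return mismatches, sum(mismatches.values())


data = {'deaths': [0, 1, 2], 'diagnoses': [3, 5, 8]}
sim_results = {'deaths': [0, 2, 2], 'diagnoses': [4, 5, 6]}
weights = {'deaths': 10, 'diagnoses': 5}
print(compute_fit(sim_results, data, weights))
```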
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| sim | Sim | the sim object | required |
| weights | dict | the relative weight to place on each result (by default: 10 for deaths, 5 for diagnoses, 1 for everything else) | None |
| keys | list | the keys to use in the calculation | None |
| custom | dict | a custom dictionary of additional data to fit; format is e.g. {'my_output':{'data':[1,2,3], 'sim':[1,2,4], 'weights':2.0}} | None |
| compute | bool | whether to compute the mismatch immediately | True |
| verbose | bool | detail to print | False |
| die | bool | whether to raise an exception if no data are supplied | True |
| label | str | the label for the analyzer | None |
| kwargs | dict | passed to cv.compute_gof() – see this function for more detail on goodness-of-fit calculation options | {} |
Example::
import covasim as cv
sim = cv.Sim(datafile='my-data-file.csv')
sim.run()
fit = sim.compute_fit()
fit.plot()
Methods
| Name | Description |
|---|---|
| compute | Perform all required computations |
| compute_diffs | Find the differences between the sim and the data |
| compute_gofs | Compute the goodness-of-fit |
| compute_losses | Compute the weighted goodness-of-fit |
| compute_mismatch | Compute the final mismatch |
| plot | Plot the fit of the model to the data. |
| reconcile_inputs | Find matching keys and indices between the model and the data |
| summarize | Print out results from the fit |
compute
analysis.Fit.compute()
Perform all required computations.
compute_diffs
analysis.Fit.compute_diffs(absolute=False)
Find the differences between the sim and the data.
compute_gofs
analysis.Fit.compute_gofs(**kwargs)
Compute the goodness-of-fit.
compute_losses
analysis.Fit.compute_losses()
Compute the weighted goodness-of-fit.
compute_mismatch
analysis.Fit.compute_mismatch(use_median=False)
Compute the final mismatch.
plot
analysis.Fit.plot(
keys=None,
width=0.8,
fig_args=None,
axis_args=None,
plot_args=None,
date_args=None,
do_show=None,
fig=None,
**kwargs,
)
Plot the fit of the model to the data. For each result, plot the data and the model; the difference; and the loss (weighted difference). Also plots the loss as a function of time.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| keys | list | which keys to plot (default: all) | None |
| width | float | bar width | 0.8 |
| fig_args | dict | passed to pl.figure() | None |
| axis_args | dict | passed to pl.subplots_adjust() | None |
| plot_args | dict | passed to pl.plot() | None |
| date_args | dict | passed to cv.plotting.reset_ticks() (handle date format, rotation, etc.) | None |
| do_show | bool | whether to show the plot | None |
| fig | fig | if supplied, use this figure to plot in | None |
| kwargs | dict | passed to cv.options.with_style() | {} |
Returns
| Name | Type | Description |
|---|---|---|
|  | Figure | Figure object |
reconcile_inputs
analysis.Fit.reconcile_inputs()
Find matching keys and indices between the model and the data.
summarize
analysis.Fit.summarize()
Print out results from the fit.
TransTree
analysis.TransTree(sim, to_networkx=False, **kwargs)
A class for holding a transmission tree. There are several different representations of the transmission tree: “infection_log” is copied from the people object and is the simplest representation. “detailed” includes additional attributes about the source and target. If NetworkX is installed (required for most methods), “graph” includes an NX representation of the transmission tree.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| sim | Sim | the sim object | required |
| to_networkx | bool | whether to convert the graph to a NetworkX object | False |
Example::
import covasim as cv
sim = cv.Sim()
sim.run()
tt = sim.make_transtree()
tt.plot()
tt.plot_histograms()
New in version 2.1.0: tt.detailed is a dataframe rather than a list of dictionaries; for the latter, use tt.detailed.to_dict('records').
Methods
| Name | Description |
|---|---|
| animate | Animate the transmission tree. |
| count_targets | Count the number of targets each infected person has. |
| count_transmissions | Iterable over edges corresponding to transmission events |
| day | Convenience function for converting an input to an integer day |
| make_detailed | Construct a detailed transmission tree, with additional information for each person |
| plot | Plot the transmission tree. |
| plot_histograms | Plots a histogram of the number of transmissions. |
| r0 | Return average number of transmissions per person |
animate
analysis.TransTree.animate(*args, **kwargs)
Animate the transmission tree.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| animate | bool | whether to animate the plot (otherwise, show when finished) | optional |
| verbose | bool | print out progress of each frame | optional |
| markersize | int | size of the markers | optional |
| sus_color | list | color for susceptibles | optional |
| fig_args | dict | arguments passed to pl.figure() | optional |
| axis_args | dict | arguments passed to pl.subplots_adjust() | optional |
| plot_args | dict | arguments passed to pl.plot() | optional |
| delay | float | delay between frames in seconds | optional |
| colors | list | color of each person | optional |
| cmap | str | colormap for each person (if colors is not supplied) | optional |
| fig | fig | if supplied, use this figure | optional |
Returns
| Name | Type | Description |
|---|---|---|
| fig | Figure | the figure object |
count_targets
analysis.TransTree.count_targets(start_day=None, end_day=None)
Count the number of targets each infected person has. If start and/or end days are given, it will only count the targets of people who got infected between those dates (it does not, however, filter on the date the target got infected).
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| start_day | int / str | the day on which to start counting people who got infected | None |
| end_day | int / str | the day on which to stop counting people who got infected | None |
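A sketch of the date filtering described above, assuming each infection log entry is a dict with source, target, and date keys (with source=None for seed infections) and that the start and end days are inclusive. The function returns a dict rather than whatever structure Covasim returns, and the log values are made up. Note that person 2, infected on day 3, still gets credit for infecting person 4 on day 9.

```python
def count_targets(infection_log, start_day=None, end_day=None):
    """Count onward infections per person, filtering on the day the *source* was infected."""
    # Day on which each person was infected (seed infections included)
    infected_on = {entry['target']: entry['date'] for entry in infection_log}
    counts = {
        person: 0
        for person, day in infected_on.items()
        if (start_day is None or day >= start_day) and (end_day is None or day <= end_day)
    }
    for entry in infection_log:
        if entry['source'] in counts:  # the target's infection day is not checked
            counts[entry['source']] += 1
    return counts


infection_log = [
    dict(source=None, target=0, date=0),  # seed infection: no source
    dict(source=0, target=1, date=2),
    dict(source=0, target=2, date=3),
    dict(source=1, target=3, date=6),
    dict(source=2, target=4, date=9),
]
print(count_targets(infection_log))
print(count_targets(infection_log, start_day=2, end_day=3))
```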
count_transmissions
analysis.TransTree.count_transmissions()
Iterable over edges corresponding to transmission events.
This excludes edges corresponding to seeded infections without a source.
day
analysis.TransTree.day(day=None, which=None)
Convenience function for converting an input to an integer day.
make_detailed
analysis.TransTree.make_detailed(people, reset=False)
Construct a detailed transmission tree, with additional information for each person.
plot
analysis.TransTree.plot(fig_args=None, plot_args=None, do_show=None, fig=None)
Plot the transmission tree.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| fig_args | dict | passed to pl.figure() | None |
| plot_args | dict | passed to pl.plot() | None |
| do_show | bool | whether to show the plot | None |
| fig | fig | if supplied, use this figure | None |
plot_histograms
analysis.TransTree.plot_histograms(
start_day=None,
end_day=None,
bins=None,
width=0.8,
fig_args=None,
fig=None,
)
Plots a histogram of the number of transmissions.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| start_day | int / str | the day on which to start counting people who got infected | None |
| end_day | int / str | the day on which to stop counting people who got infected | None |
| bins | list | bin edges to use for the histogram | None |
| width | float | width of bars | 0.8 |
| fig_args | dict | passed to pl.figure() | None |
| fig | fig | if supplied, use this figure | None |
r0
analysis.TransTree.r0(recovered_only=False)
Return the average number of transmissions per person.
This doesn’t include seed transmissions. By default, it also doesn’t adjust for the length of infection: people infected toward the end of the simulation will have fewer recorded transmissions, because their infection may extend past the end of the simulation and any transmissions after that point are not included. If recovered_only=True, downstream transmissions are only included for people who recover before the end of the simulation, ensuring that they all had the same amount of time to transmit.
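A sketch of the calculation, using an assumed infection log format (dicts with source, target, and date; source=None for seed infections) plus a hypothetical date_recovered mapping from person to recovery day. It is a plain mean over infected people, not Covasim's implementation, and the values are made up.

```python
def r0(infection_log, date_recovered, n_days, recovered_only=False):
    """Mean number of onward transmissions per infected person."""
    counts = {entry['target']: 0 for entry in infection_log}
    for entry in infection_log:
        if entry['source'] is not None:  # seed infections are not transmissions
            counts[entry['source']] += 1
    if recovered_only:
        # Only keep people who recovered by the end of the sim, so everyone
        # counted had their full infectious period inside the simulation
        counts = {
            person: n for person, n in counts.items()
            if date_recovered.get(person) is not None and date_recovered[person] <= n_days
        }
    return sum(counts.values()) / len(counts)


infection_log = [
    dict(source=None, target=0, date=0),  # seed infection: no source
    dict(source=0, target=1, date=2),
    dict(source=0, target=2, date=3),
    dict(source=1, target=3, date=6),
    dict(source=2, target=4, date=9),
]
date_recovered = {0: 8, 1: 9, 2: 12}  # persons 3 and 4 have not recovered
print(r0(infection_log, date_recovered, n_days=10))
print(r0(infection_log, date_recovered, n_days=10, recovered_only=True))
```

Restricting to recovered people raises the estimate here because the late infections (persons 3 and 4), who had no time to transmit within the simulation, are dropped.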
age_histogram
analysis.age_histogram(
days=None,
states=None,
edges=None,
datafile=None,
sim=None,
die=True,
**kwargs,
)
Calculate statistics across age bins, including histogram plotting functionality.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| days | list | list of ints/strings/date objects, the days on which to calculate the histograms (default: last day) | None |
| states | list | which states of people to record (default: exposed, tested, diagnosed, dead) | None |
| edges | list | edges of age bins to use (default: 10 year bins from 0 to 100) | None |
| datafile | str | the name of the data file to load in for comparison, or a dataframe of data (optional) | None |
| sim | Sim | only used if the analyzer is being used after a sim has already been run | None |
| die | bool | whether to raise an exception if dates are not found (default true) | True |
| kwargs | dict | passed to Analyzer() | {} |
Examples::
import covasim as cv
sim = cv.Sim(analyzers=cv.age_histogram())
sim.run()
agehist = sim.get_analyzer()
agehist = cv.age_histogram(sim=sim) # Alternate method
agehist.plot()
Methods
| Name | Description |
|---|---|
| compute_windows | Convert cumulative histograms to windows |
| from_sim | Create an age histogram from an already run sim |
| get | Retrieve a specific histogram from the given key (int, str, or date) |
| plot | Simple method for plotting the histograms. |
compute_windows
analysis.age_histogram.compute_windows()
Convert cumulative histograms to windows.
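Because the stored histograms are cumulative (counts up to each requested day), a window is the difference between consecutive cumulative histograms. A sketch under assumptions: the first window is taken to be the first cumulative histogram itself, the bins are the default 10-year bins from 0 to 100, and bin_ages, compute_windows, and the ages are hypothetical.

```python
import bisect


def bin_ages(ages, edges):
    """Count ages into bins [edges[i], edges[i+1]); the top edge goes in the last bin."""
    counts = [0] * (len(edges) - 1)
    for age in ages:
        if edges[0] <= age <= edges[-1]:
            i = min(bisect.bisect_right(edges, age) - 1, len(counts) - 1)
            counts[i] += 1
    return counts


def compute_windows(cumulative_hists):
    """Turn cumulative histograms (one per day) into per-window counts."""
    windows = [list(cumulative_hists[0])]  # first window: everything up to the first day
    for prev, curr in zip(cumulative_hists, cumulative_hists[1:]):
        windows.append([c - p for c, p in zip(curr, prev)])
    return windows


edges = list(range(0, 101, 10))  # 10-year bins from 0 to 100
by_day_10 = bin_ages([5, 34, 35, 71], edges)  # made-up cumulative diagnoses
by_day_20 = bin_ages([5, 34, 35, 71, 12, 38, 99], edges)
print(compute_windows([by_day_10, by_day_20]))
```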
from_sim
analysis.age_histogram.from_sim(sim)
Create an age histogram from an already run sim.
get
analysis.age_histogram.get(key=None)
Retrieve a specific histogram from the given key (int, str, or date).
plot
analysis.age_histogram.plot(
windows=False,
width=0.8,
color='#F8A493',
fig_args=None,
axis_args=None,
data_args=None,
**kwargs,
)
Simple method for plotting the histograms.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| windows | bool | whether to plot windows instead of cumulative counts | False |
| width | float | width of bars | 0.8 |
| color | hex or rgb | the color of the bars | '#F8A493' |
| fig_args | dict | passed to pl.figure() | None |
| axis_args | dict | passed to pl.subplots_adjust() | None |
| data_args | dict | ‘width’, ‘color’, and ‘offset’ arguments for the data | None |
| kwargs | dict | passed to cv.options.with_style(); see that function for choices | {} |
daily_age_stats
analysis.daily_age_stats(states=None, edges=None, **kwargs)
Calculate daily counts by age, saving for each day of the simulation. Can plot either time series by age or a histogram over all time.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| states | list | which states of people to record (default: ['diagnoses', 'deaths', 'tests', 'severe']) | None |
| edges | list | edges of age bins to use (default: 10 year bins from 0 to 100) | None |
| kwargs | dict | passed to Analyzer() | {} |
Examples::
import covasim as cv
sim = cv.Sim(analyzers=cv.daily_age_stats())
sim.run()
daily_age = sim.get_analyzer()
daily_age.plot()
daily_age.plot(total=True)
Methods
| Name | Description |
|---|---|
| plot | Plot the results. |
| to_df | Create dataframe totals for each day |
| to_total_df | Create dataframe totals across days |
plot
analysis.daily_age_stats.plot(
total=False,
do_show=None,
fig_args=None,
axis_args=None,
plot_args=None,
dateformat=None,
width=0.8,
color='#F8A493',
**kwargs,
)
Plot the results.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| total | bool | whether to plot the total histograms rather than time series | False |
| do_show | bool | whether to show the plot | None |
| fig_args | dict | passed to pl.figure() | None |
| axis_args | dict | passed to pl.subplots_adjust() | None |
| plot_args | dict | passed to pl.plot() | None |
| dateformat | str | the format to use for the x-axes (only used for time series) | None |
| width | float | width of bars (only used for histograms) | 0.8 |
| color | hex / rgb | the color of the bars (only used for histograms) | '#F8A493' |
| kwargs | dict | passed to cv.options.with_style() | {} |
to_df
analysis.daily_age_stats.to_df()
Create dataframe totals for each day.
to_total_df
analysis.daily_age_stats.to_total_df()
Create dataframe totals across days.
daily_stats
analysis.daily_stats(
days=None,
verbose=True,
reporter=None,
save_inds=False,
**kwargs,
)
Print out daily statistics about the simulation. Note that this analyzer takes a considerable amount of time, so it should be used primarily for debugging, not in production code. To keep the analyzer but toggle it off, pass an empty list of days.
To show the stats for a day after a run has finished, use e.g. daily_stats.report('2020-04-04').
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| days | list | days on which to print out statistics (if None, assume all) | None |
| verbose | bool | whether to print on each timestep | True |
| reporter | func | if supplied, a custom parser of the stats object into a report (see make_report() function for syntax) | None |
| save_inds | bool | whether to save the indices of every infection at every timestep (also recoverable from the infection log) | False |
Example::
import covasim as cv
sim = cv.Sim(analyzers=cv.daily_stats())
sim.run()
sim['analyzers'][0].plot()
Methods
| Name | Description |
|---|---|
| intersect | Compute the intersection between arrays of indices. |
| make_report | Turn the statistics into a report |
| plot | Plot the daily statistics recorded. Some overlap with e.g. sim.plot(to_plot='overview'). |
| report | Print out one or all reports; takes a date string or an int. |
| transpose | Transpose the data from a list-of-dicts-of-dicts to a dict-of-dicts-of-lists |
intersect
analysis.daily_stats.intersect(*args)
Compute the intersection between arrays of indices, handling either keys to precomputed indices or lists of indices. With two array inputs, simply performs np.intersect1d(arr1, arr2).
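A sketch of the behaviour described above, with Python sets standing in for np.intersect1d (both yield the sorted unique common values). The precomputed keyword argument and the index dictionary are hypothetical; the real method looks up keys in the analyzer's own stored indices.

```python
def intersect(*args, precomputed=None):
    """Intersect index arrays; string arguments are looked up in `precomputed`."""
    precomputed = precomputed or {}
    arrays = [precomputed[arg] if isinstance(arg, str) else arg for arg in args]
    common = set(arrays[0])
    for arr in arrays[1:]:
        common &= set(arr)
    return sorted(common)  # sorted unique values, like np.intersect1d


inds = {'diagnosed': [2, 5, 7, 9], 'severe': [5, 9, 11]}  # hypothetical precomputed indices
print(intersect([1, 5, 9, 12], [9, 5, 3]))
print(intersect('diagnosed', 'severe', precomputed=inds))
```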
make_report
analysis.daily_stats.make_report(sim, stats, show_empty='count')
Turn the statistics into a report.
plot
analysis.daily_stats.plot(
fig_args=None,
axis_args=None,
plot_args=None,
do_show=None,
**kwargs,
)
Plot the daily statistics recorded. Some overlap with e.g. sim.plot(to_plot='overview').
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| fig_args | dict | passed to pl.figure() | None |
| axis_args | dict | passed to pl.subplots_adjust() | None |
| plot_args | dict | passed to pl.plot() | None |
| do_show | bool | whether to show the plot | None |
| kwargs | dict | passed to cv.options.with_style() | {} |
report
analysis.daily_stats.report(day=None)
Print out one or all reports; takes a date string or an int.
transpose
analysis.daily_stats.transpose(keys=None)
Transpose the data from a list-of-dicts-of-dicts to a dict-of-dicts-of-lists.
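The reshaping can be shown on a small example. This sketch assumes one dict per day with the same keys and subkeys every day; the key names ('diagnosed', 'severe', 'n', 'frac') and values are hypothetical and do not reflect the actual structure of the daily_stats records.

```python
def transpose(stats, keys=None):
    """Convert a list (one entry per day) of {key: {subkey: value}} dicts
    into {key: {subkey: [value on each day]}}."""
    if keys is None:
        keys = list(stats[0].keys())
    output = {}
    for key in keys:
        # Assumes every day has the same keys and subkeys as the first day
        output[key] = {subkey: [day[key][subkey] for day in stats] for subkey in stats[0][key]}
    return output


daily = [
    {'diagnosed': {'n': 3, 'frac': 0.1}, 'severe': {'n': 1, 'frac': 0.0}},
    {'diagnosed': {'n': 5, 'frac': 0.2}, 'severe': {'n': 2, 'frac': 0.1}},
]
print(transpose(daily))
```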
nab_histogram
analysis.nab_histogram(days=None, edges=None, **kwargs)
Store a histogram of the log10(NAb) distribution.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| days | list | days on which to calculate the NAb histogram (if None, assume last day) | None |
| edges | list | log10 bin edges for histogram | None |
Example::
import covasim as cv
sim = cv.Sim(analyzers=cv.nab_histogram())
sim.run()
sim.get_analyzer().plot()
New in version 3.1.0.
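What the analyzer stores can be sketched as counts of log10(NAb) values in the given bin edges. Assumptions: people with zero NAbs are skipped (log10 is undefined there), bins are right-open, values outside the edges are dropped, and the NAb levels and edges are made up. Covasim's default edges and its handling of zero NAbs may differ.

```python
import bisect
import math


def nab_histogram(nabs, edges):
    """Count log10(NAb) values into right-open bins [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for nab in nabs:
        if nab <= 0:
            continue  # log10 is undefined for zero NAbs (assumption: skip them)
        i = bisect.bisect_right(edges, math.log10(nab)) - 1
        if 0 <= i < len(counts):  # values outside the edges are dropped
            counts[i] += 1
    return counts


nabs = [0.0, 0.05, 0.3, 0.5, 2.0, 30.0, 500.0]  # made-up NAb levels
print(nab_histogram(nabs, edges=[-2, -1, 0, 1, 2]))
```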
Methods
| Name | Description |
|---|---|
| plot | Plot the results |
plot
analysis.nab_histogram.plot(
fig_args=None,
axis_args=None,
plot_args=None,
do_show=None,
**kwargs,
)
Plot the results.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| fig_args | dict | passed to pl.figure() | None |
| axis_args | dict | passed to pl.subplots_adjust() | None |
| plot_args | dict | passed to pl.plot() | None |
| do_show | bool | whether to show the plot | None |
| kwargs | dict | passed to cv.options.with_style() | {} |
snapshot
analysis.snapshot(days, *args, die=True, **kwargs)
Analyzer that takes a “snapshot” of the sim.people array at specified points in time, and saves them to itself. To retrieve them, you can either access the dictionary directly, or use the get() method.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| days | list | list of ints/strings/date objects, the days on which to take the snapshot | required |
| args | list | additional day(s) | () |
| die | bool | whether or not to raise an exception if a date is not found (default true) | True |
| kwargs | dict | passed to Analyzer() | {} |
Example::
import covasim as cv
sim = cv.Sim(analyzers=cv.snapshot('2020-04-04', '2020-04-14'))
sim.run()
snapshot = sim['analyzers'][0]
people = snapshot.snapshots[0] # Option 1
people = snapshot.snapshots['2020-04-04'] # Option 2
people = snapshot.get('2020-04-14') # Option 3
people = snapshot.get(34) # Option 4
people = snapshot.get() # Option 5
Methods
| Name | Description |
|---|---|
| get | Retrieve a snapshot from the given key (int, str, or date) |
get
analysis.snapshot.get(key=None)
Retrieve a snapshot from the given key (int, str, or date).
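Accepting an int, a date string, or a date object means normalizing the key before lookup. This sketch assumes snapshots are kept in a dict keyed by 'YYYY-MM-DD' strings and that an integer key is a day number counted from the sim's start day (assumed here to be 2020-03-01, which matches the example above, where day 34 and '2020-04-04' refer to the same snapshot). get_snapshot is a hypothetical helper, returning the first snapshot when no key is given is an assumption, and the placeholder strings stand in for People objects.

```python
import datetime as dt


def get_snapshot(snapshots, key=None, start_day='2020-03-01'):
    """Look up a snapshot by day number (int), date string, or date object."""
    if key is None:
        return next(iter(snapshots.values()))  # assumption: default to the first snapshot
    if isinstance(key, int):
        start = dt.date.fromisoformat(start_day)
        key = start + dt.timedelta(days=key)  # day number counted from the start day
    elif isinstance(key, str):
        key = dt.date.fromisoformat(key)
    elif isinstance(key, dt.datetime):
        key = key.date()
    return snapshots[key.isoformat()]


snapshots = {
    '2020-04-04': 'people on 2020-04-04',  # placeholder strings stand in for People objects
    '2020-04-14': 'people on 2020-04-14',
}
print(get_snapshot(snapshots, 34))
print(get_snapshot(snapshots, '2020-04-14'))
```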