analysis

Additional analysis functions that are not part of the core Covasim workflow, but which are useful for particular investigations.

Classes

Name	Description
Analyzer	Base class for analyzers. Based on the Intervention class.
Calibration	A class to handle calibration of Covasim simulations.
Fit	A class for calculating the fit between the model and the data.
TransTree	A class for holding a transmission tree.
age_histogram	Calculate statistics across age bins, including histogram plotting functionality.
daily_age_stats	Calculate daily counts by age, saving for each day of the simulation.
daily_stats	Print out daily statistics about the simulation.
nab_histogram	Store histogram of log10(NAb) distribution
snapshot	Analyzer that takes a “snapshot” of the sim.people array at specified points in time, and saves them to itself.

Analyzer

analysis.Analyzer(label=None)

Base class for analyzers. Based on the Intervention class. Analyzers are used to provide more detailed information about a simulation than is available by default – for example, pulling states out of sim.people on a particular timestep before it gets updated in the next timestep.

To retrieve a particular analyzer from a sim, use sim.get_analyzer().

Parameters

Name	Type	Description	Default
label	str	a label for the Analyzer (used for ease of identification)	`None`

Methods

Name	Description
apply	Apply analyzer at each time point.
finalize	Finalize analyzer
initialize	Initialize the analyzer, e.g. convert date strings to integers.
shrink	Remove any excess stored data from the analyzer; for use with sim.shrink().
to_json	Return JSON-compatible representation

apply

analysis.Analyzer.apply(sim)

Apply analyzer at each time point. The analyzer has full access to the sim object, and typically stores data/results in itself. This is the core method which each analyzer object needs to implement.

Parameters

Name	Type	Description	Default
sim		the Sim instance	required
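
As a rough illustration of the pattern described above (apply() is called once per timestep and the analyzer stores results in itself), the sketch below avoids importing Covasim so that it runs on its own. `FakeSim` and `CountInfectious` are hypothetical stand-ins, not Covasim classes; a real analyzer would presumably subclass the `Analyzer` base class and read from `sim.people`.

```python
class FakeSim:
    """Hypothetical stand-in for a Sim: exposes a timestep and an infectious count."""
    def __init__(self, n_infectious_by_day):
        self.t = 0
        self.n_infectious_by_day = n_infectious_by_day

    @property
    def n_infectious(self):
        return self.n_infectious_by_day[self.t]


class CountInfectious:
    """Analyzer-like object: apply() records one value per timestep."""
    def __init__(self, label=None):
        self.label = label
        self.days = []
        self.counts = []

    def apply(self, sim):
        # Read the current state before the next timestep overwrites it
        self.days.append(sim.t)
        self.counts.append(sim.n_infectious)


sim = FakeSim([1, 3, 7, 12])
analyzer = CountInfectious(label='infectious')
for t in range(4):  # stand-in for the simulation loop
    sim.t = t
    analyzer.apply(sim)
```

After the loop, `analyzer.days` and `analyzer.counts` hold one entry per timestep.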

finalize

analysis.Analyzer.finalize(sim=None)

Finalize analyzer

This method is run once as part of sim.finalize(), enabling the analyzer to perform any final operations after the simulation is complete (e.g. rescaling).

initialize

analysis.Analyzer.initialize(sim=None)

Initialize the analyzer, e.g. convert date strings to integers.

shrink

analysis.Analyzer.shrink(in_place=False)

Remove any excess stored data from the analyzer; for use with sim.shrink().

Parameters

Name	Type	Description	Default
in_place	bool	whether to shrink the analyzer in place (else shrink a copy)	`False`

to_json

analysis.Analyzer.to_json()

Return JSON-compatible representation

Custom classes can’t be directly represented in JSON. This method is a one-way export that produces a JSON-compatible representation of the analyzer. It attempts to JSONify each attribute of the analyzer, skipping any that fail.

Returns

Name	Type	Description
		JSON-serializable representation
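
The "try each attribute, skip any that fail" behaviour can be sketched with the standard library alone. This is only an approximation of the idea, not Covasim's implementation; `to_json_sketch` and `Example` are hypothetical names.

```python
import json


def to_json_sketch(obj):
    """Export every attribute that json can serialize; silently skip the rest."""
    out = {}
    for key, value in vars(obj).items():
        try:
            json.dumps(value)  # probe: raises TypeError if not serializable
        except (TypeError, ValueError):
            continue
        out[key] = value
    return out


class Example:
    def __init__(self):
        self.label = 'my_analyzer'
        self.days = [10, 20]
        self.callback = print  # functions cannot be represented in JSON


exported = to_json_sketch(Example())
```

Because unserializable attributes are dropped, the export is one-way: the original object cannot be rebuilt from it.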

Calibration

analysis.Calibration(
    sim,
    calib_pars=None,
    fit_args=None,
    custom_fn=None,
    par_samplers=None,
    n_trials=None,
    n_workers=None,
    total_trials=None,
    name=None,
    db_name=None,
    keep_db=None,
    storage=None,
    label=None,
    die=False,
    verbose=True,
)

A class to handle calibration of Covasim simulations. Uses the Optuna hyperparameter optimization library (optuna.org), which must be installed separately (via pip install optuna).

Note: running a calibration does not guarantee a good fit! You must ensure that you run for a sufficient number of iterations, have enough free parameters, and that the parameters have wide enough bounds. Please see the tutorial on calibration for more information.

Parameters

Name	Description	Default
sim (Sim)	the simulation to calibrate	required
calib_pars (dict)	a dictionary of the parameters to calibrate, in the format dict(key1=[best, low, high])	`None`
fit_args (dict)	a dictionary of options that are passed to sim.compute_fit() to calculate the goodness-of-fit	`None`
par_samplers (dict)	an optional mapping from parameters to the Optuna sampler to use for choosing new points for each; by default, suggest_float	`None`
custom_fn (func)	a custom function for modifying the simulation; receives the sim and calib_pars as inputs, should return the modified sim	`None`
n_trials (int)	the number of trials per worker	`None`
n_workers (int)	the number of parallel workers (default: maximum)	`None`
total_trials (int)	if n_trials is not supplied, calculate it by dividing this number by n_workers	`None`
name (str)	the name of the database (default: 'covasim_calibration')	`None`
db_name (str)	the name of the database file (default: 'covasim_calibration.db')	`None`
keep_db (bool)	whether to keep the database after calibration (default: false)	`None`
storage (str)	the location of the database (default: sqlite)	`None`
label (str)	a label for this calibration object	`None`
die (bool)	whether to stop if an exception is encountered	`False`
verbose (bool)	whether to print details of the calibration	`True`
kwargs (dict)	passed to cv.Calibration()	required
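
To make the `calib_pars` format and the `total_trials` / `n_workers` relationship concrete, here is a small standalone sketch. The second parameter entry is purely illustrative, `trials_per_worker` is a hypothetical helper, and rounding up is an assumption about how a non-integer quotient would be handled.

```python
import math

# Each entry follows the documented format: key=[best, low, high]
calib_pars = dict(
    beta=[0.015, 0.010, 0.020],
    rel_death_prob=[1.0, 0.5, 2.0],  # illustrative second parameter
)

# Split the triples into separate lookups
best = {k: v[0] for k, v in calib_pars.items()}
low = {k: v[1] for k, v in calib_pars.items()}
high = {k: v[2] for k, v in calib_pars.items()}

# Sanity check: every initial guess should lie within its bounds
for key in calib_pars:
    assert low[key] <= best[key] <= high[key], f'{key}: best guess outside bounds'


def trials_per_worker(total_trials, n_workers):
    """Assumed rule: split the total across workers, rounding up."""
    return math.ceil(total_trials / n_workers)


n_trials = trials_per_worker(total_trials=100, n_workers=8)
```

Checking the bounds up front is cheap and catches a common mistake: a best guess that falls outside the range the sampler is allowed to explore.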

Returns

Name	Type	Description
		A Calibration object

Example::

sim = cv.Sim(datafile='data.csv')
calib_pars = dict(beta=[0.015, 0.010, 0.020])
calib = cv.Calibration(sim, calib_pars, total_trials=100)
calib.calibrate()
calib.plot_sims()

New in version 3.0.3.

Methods

Name	Description
calibrate	Actually perform calibration.
make_study	Make a study, deleting one if it already exists
parse_study	Parse the study into a data frame – called automatically
plot_all	Plot every point in the calibration. Warning, very slow for more than a few hundred trials.
plot_best	Plot only the points with lowest mismatch. New in version 3.1.1.
plot_sims	Plot sims, before and after calibration.
plot_stride	Plot a fixed number of points in order across the results.
plot_trend	Plot the trend in best mismatch over time.
remove_db	Remove the database file if keep_db is false and the path exists.
run_sim	Create and run a simulation
run_trial	Define the objective for Optuna
run_workers	Run multiple workers in parallel
summarize	Print out results from the calibration
to_json	Convert the data to JSON.
worker	Run a single worker

calibrate

analysis.Calibration.calibrate(calib_pars=None, verbose=True, **kwargs)

Actually perform calibration.

Parameters

Name	Type	Description	Default
calib_pars	dict	if supplied, overwrite stored calib_pars	`None`
verbose	bool	whether to print output from each trial	`True`
kwargs	dict	if supplied, overwrite stored run_args (n_trials, n_workers, etc.)	`{}`

make_study

analysis.Calibration.make_study()

Make a study, deleting one if it already exists

parse_study

analysis.Calibration.parse_study()

Parse the study into a data frame – called automatically

plot_all

analysis.Calibration.plot_all()

Plot every point in the calibration. Warning, very slow for more than a few hundred trials.

New in version 3.1.1.

plot_best

analysis.Calibration.plot_best(best_thresh=2)

Plot only the points with lowest mismatch. New in version 3.1.1.

plot_sims

analysis.Calibration.plot_sims(**kwargs)

Plot sims, before and after calibration.

New in version 3.1.1: renamed from plot() to plot_sims().

plot_stride

analysis.Calibration.plot_stride(npts=200)

Plot a fixed number of points in order across the results.

New in version 3.1.1.

plot_trend

analysis.Calibration.plot_trend(best_thresh=2)

Plot the trend in best mismatch over time.

New in version 3.1.1.

remove_db

analysis.Calibration.remove_db()

Remove the database file if keep_db is false and the path exists.

New in version 3.1.0.

run_sim

analysis.Calibration.run_sim(calib_pars, label=None, return_sim=False)

Create and run a simulation

run_trial

analysis.Calibration.run_trial(trial)

Define the objective for Optuna

run_workers

analysis.Calibration.run_workers()

Run multiple workers in parallel

summarize

analysis.Calibration.summarize()

Print out results from the calibration

to_json

analysis.Calibration.to_json(filename=None)

Convert the data to JSON.

New in version 3.1.1.

worker

analysis.Calibration.worker()

Run a single worker

Fit

analysis.Fit(
    sim,
    weights=None,
    keys=None,
    custom=None,
    compute=True,
    verbose=False,
    die=True,
    label=None,
    **kwargs,
)

A class for calculating the fit between the model and the data. Note the following terminology is used here:

- fit: nonspecific term for how well the model matches the data
- difference: the absolute numerical differences between the model and the data (one time series per result)
- goodness-of-fit: the result of passing the difference through a statistical function, such as mean squared error
- loss: the goodness-of-fit for each result multiplied by user-specified weights (one time series per result)
- mismatches: the sum of all the losses (a single scalar value per time series)
- mismatch: the sum of the mismatches -- this is the value to be minimized during calibration
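
The chain from difference to mismatch can be traced by hand on toy numbers. The sketch below is standalone and uses one assumed goodness-of-fit statistic (absolute difference normalized by the largest data value); the statistic actually used depends on the options passed through to the goodness-of-fit calculation. The data values are invented; the weights follow the documented defaults for deaths and diagnoses.

```python
data = {'deaths': [1, 2, 4], 'diagnoses': [10, 20, 30]}
model = {'deaths': [1, 3, 4], 'diagnoses': [12, 20, 27]}
weights = {'deaths': 10, 'diagnoses': 5}

diffs, gofs, losses, mismatches = {}, {}, {}, {}
for key in data:
    # difference: model minus data, one time series per result
    diffs[key] = [m - d for m, d in zip(model[key], data[key])]
    # goodness-of-fit: assumed statistic, |difference| / max(data)
    scale = max(data[key])
    gofs[key] = [abs(x) / scale for x in diffs[key]]
    # loss: goodness-of-fit multiplied by the weight for this result
    losses[key] = [weights[key] * g for g in gofs[key]]
    # mismatches: one scalar per result
    mismatches[key] = sum(losses[key])

# mismatch: the single number a calibration would try to minimize
mismatch = sum(mismatches.values())
```

Here the deaths series contributes 2.5 and the diagnoses series about 0.83, so the larger weight on deaths dominates the total even though the raw errors in diagnoses are bigger.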

Parameters

Name	Type	Description	Default
sim	Sim	the sim object	required
weights	dict	the relative weight to place on each result (by default: 10 for deaths, 5 for diagnoses, 1 for everything else)	`None`
keys	list	the keys to use in the calculation	`None`
custom	dict	a custom dictionary of additional data to fit; format is e.g. {'my_output':{'data':[1,2,3], 'sim':[1,2,4], 'weights':2.0}}	`None`
compute	bool	whether to compute the mismatch immediately	`True`
verbose	bool	whether to print details	`False`
die	bool	whether to raise an exception if no data are supplied	`True`
label	str	the label for the analyzer	`None`
kwargs	dict	passed to cv.compute_gof() – see this function for more detail on goodness-of-fit calculation options	`{}`

Example::

sim = cv.Sim(datafile='my-data-file.csv')
sim.run()
fit = sim.compute_fit()
fit.plot()

Methods

Name	Description
compute	Perform all required computations
compute_diffs	Find the differences between the sim and the data
compute_gofs	Compute the goodness-of-fit
compute_losses	Compute the weighted goodness-of-fit
compute_mismatch	Compute the final mismatch
plot	Plot the fit of the model to the data.
reconcile_inputs	Find matching keys and indices between the model and the data
summarize	Print out results from the fit

compute

analysis.Fit.compute()

Perform all required computations

compute_diffs

analysis.Fit.compute_diffs(absolute=False)

Find the differences between the sim and the data

compute_gofs

analysis.Fit.compute_gofs(**kwargs)

Compute the goodness-of-fit

compute_losses

analysis.Fit.compute_losses()

Compute the weighted goodness-of-fit

compute_mismatch

analysis.Fit.compute_mismatch(use_median=False)

Compute the final mismatch

plot

analysis.Fit.plot(
    keys=None,
    width=0.8,
    fig_args=None,
    axis_args=None,
    plot_args=None,
    date_args=None,
    do_show=None,
    fig=None,
    **kwargs,
)

Plot the fit of the model to the data. For each result, plot the data and the model; the difference; and the loss (weighted difference). Also plots the loss as a function of time.

Parameters

Name	Type	Description	Default
keys	list	which keys to plot (default, all)	`None`
width	float	bar width	`0.8`
fig_args	dict	passed to `pl.figure()`	`None`
axis_args	dict	passed to `pl.subplots_adjust()`	`None`
plot_args	dict	passed to `pl.plot()`	`None`
date_args	dict	passed to `cv.plotting.reset_ticks()` (handle date format, rotation, etc.)	`None`
do_show	bool	whether to show the plot	`None`
fig	`fig`	if supplied, use this figure to plot in	`None`
kwargs	dict	passed to `cv.options.with_style()`	`{}`

Returns

Name	Type	Description
		Figure object

reconcile_inputs

analysis.Fit.reconcile_inputs()

Find matching keys and indices between the model and the data

summarize

analysis.Fit.summarize()

Print out results from the fit

TransTree

analysis.TransTree(sim, to_networkx=False, **kwargs)

A class for holding a transmission tree. There are several different representations of the transmission tree: “infection_log” is copied from the people object and is the simplest representation. “detailed” includes additional attributes about the source and target. If NetworkX is installed (required for most methods), “graph” includes an NX representation of the transmission tree.

Parameters

Name	Type	Description	Default
sim	Sim	the sim object	required
to_networkx	bool	whether to convert the graph to a NetworkX object	`False`

Example::

sim = cv.Sim()
sim.run()
tt = sim.make_transtree()
tt.plot()
tt.plot_histograms()

New in version 2.1.0: tt.detailed is a dataframe rather than a list of dictionaries; for the latter, use tt.detailed.to_dict('records').

Methods

Name	Description
animate	Animate the transmission tree.
count_targets	Count the number of targets each infected person has.
count_transmissions	Iterable over edges corresponding to transmission events
day	Convenience function for converting an input to an integer day
make_detailed	Construct a detailed transmission tree, with additional information for each person
plot	Plot the transmission tree.
plot_histograms	Plots a histogram of the number of transmissions.
r0	Return average number of transmissions per person

animate

analysis.TransTree.animate(*args, **kwargs)

Animate the transmission tree.

Parameters

Name	Type	Description	Default
animate	bool	whether to animate the plot (otherwise, show when finished)	required
verbose	bool	print out progress of each frame	required
markersize	int	size of the markers	required
sus_color	list	color for susceptibles	required
fig_args	dict	arguments passed to pl.figure()	required
axis_args	dict	arguments passed to pl.subplots_adjust()	required
plot_args	dict	arguments passed to pl.plot()	required
delay	float	delay between frames in seconds	required
colors	list	color of each person	required
cmap	str	colormap for each person (if colors is not supplied)	required
fig	`fig`	if supplied, use this figure	required

Returns

Name	Type	Description
fig		the figure object

count_targets

analysis.TransTree.count_targets(start_day=None, end_day=None)

Count the number of targets each infected person has. If start and/or end days are given, it will only count the targets of people who got infected between those dates (it does not, however, filter on the date the target got infected).

Parameters

Name	Type	Description	Default
start_day	int / str	the day on which to start counting people who got infected	`None`
end_day	int / str	the day on which to stop counting people who got infected	`None`
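
The filtering rule (filter on when the source was infected, not on when the target was) is easy to misread, so here is a standalone sketch on a toy infection log. The log entries, the inclusive treatment of both bounds, and the name `count_targets_sketch` are assumptions made for illustration.

```python
# Hypothetical infection log; source is None for a seed infection
infection_log = [
    dict(source=None, target=0, date=0),
    dict(source=0, target=1, date=3),
    dict(source=0, target=2, date=5),
    dict(source=1, target=3, date=9),
]


def count_targets_sketch(log, start_day=None, end_day=None):
    """Targets per infected person, keeping people infected within [start_day, end_day]."""
    date_infected = {entry['target']: entry['date'] for entry in log}
    n_targets = {person: 0 for person in date_infected}
    for entry in log:
        if entry['source'] is not None:
            n_targets[entry['source']] += 1  # counted regardless of the target's date

    kept = {}
    for person, day in date_infected.items():
        if start_day is not None and day < start_day:
            continue
        if end_day is not None and day > end_day:
            continue
        kept[person] = n_targets[person]
    return kept


all_counts = count_targets_sketch(infection_log)
late_counts = count_targets_sketch(infection_log, start_day=3)
```

With `start_day=3`, person 0 (infected on day 0) is dropped, while person 1 keeps their one target even though that target was infected much later.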

count_transmissions

analysis.TransTree.count_transmissions()

Iterable over edges corresponding to transmission events

This excludes edges corresponding to seeded infections without a source

day

analysis.TransTree.day(day=None, which=None)

Convenience function for converting an input to an integer day

make_detailed

analysis.TransTree.make_detailed(people, reset=False)

Construct a detailed transmission tree, with additional information for each person

plot

analysis.TransTree.plot(fig_args=None, plot_args=None, do_show=None, fig=None)

Plot the transmission tree.

Parameters

Name	Type	Description	Default
fig_args	dict	passed to pl.figure()	`None`
plot_args	dict	passed to pl.plot()	`None`
do_show	bool	whether to show the plot	`None`
fig	`fig`	if supplied, use this figure	`None`

plot_histograms

analysis.TransTree.plot_histograms(
    start_day=None,
    end_day=None,
    bins=None,
    width=0.8,
    fig_args=None,
    fig=None,
)

Plots a histogram of the number of transmissions.

Parameters

Name	Type	Description	Default
start_day	int / str	the day on which to start counting people who got infected	`None`
end_day	int / str	the day on which to stop counting people who got infected	`None`
bins	list	bin edges to use for the histogram	`None`
width	float	width of bars	`0.8`
fig_args	dict	passed to pl.figure()	`None`
fig	`fig`	if supplied, use this figure	`None`

r0

analysis.TransTree.r0(recovered_only=False)

Return average number of transmissions per person

This doesn’t include seed transmissions. By default, it also doesn’t adjust for length of infection: people infected towards the end of the simulation will have fewer transmissions, because their infection may extend past the end of the simulation and those later transmissions are not included. If `recovered_only=True`, downstream transmissions are only included for people who recover before the end of the simulation, ensuring that they all had the same amount of time to transmit.
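
The effect of `recovered_only` can be seen on a four-person toy example. This standalone sketch assumes the statistic is the mean number of targets per infected person; the `people` dictionary, `n_days`, and `r0_sketch` are hypothetical and only loosely mirror the transmission tree's data.

```python
from statistics import mean

n_days = 10  # hypothetical final day of the simulation

# Hypothetical per-person records: recovery day and the people they infected
people = {
    0: dict(date_recovered=6, targets=[1, 2]),
    1: dict(date_recovered=9, targets=[3]),
    2: dict(date_recovered=12, targets=[]),  # still infected at the end
    3: dict(date_recovered=15, targets=[]),  # still infected at the end
}


def r0_sketch(people, n_days, recovered_only=False):
    """Mean number of transmissions per infected person."""
    counts = []
    for person in people.values():
        if recovered_only and person['date_recovered'] > n_days:
            continue  # did not have a complete infectious period
        counts.append(len(person['targets']))
    return mean(counts)


r0_all = r0_sketch(people, n_days)
r0_recovered = r0_sketch(people, n_days, recovered_only=True)
```

Including the two people who are still infected at the end pulls the average down to 0.75; restricting to recovered people gives 1.5.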

age_histogram

analysis.age_histogram(
    days=None,
    states=None,
    edges=None,
    datafile=None,
    sim=None,
    die=True,
    **kwargs,
)

Calculate statistics across age bins, including histogram plotting functionality.

Parameters

Name	Type	Description	Default
days	list	list of ints/strings/date objects, the days on which to calculate the histograms (default: last day)	`None`
states	list	which states of people to record (default: exposed, tested, diagnosed, dead)	`None`
edges	list	edges of age bins to use (default: 10 year bins from 0 to 100)	`None`
datafile	str	the name of the data file to load in for comparison, or a dataframe of data (optional)	`None`
sim	Sim	only used if the analyzer is being used after a sim has already been run	`None`
die	bool	whether to raise an exception if dates are not found (default true)	`True`
kwargs	dict	passed to Analyzer()	`{}`
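
For intuition about the default `edges`, the following standalone sketch bins a handful of ages into 10-year bins from 0 to 100 using only the standard library. Treating an age equal to the top edge as belonging to the last bin is an assumption, and `age_hist_sketch` is a hypothetical name.

```python
from bisect import bisect_right

edges = list(range(0, 101, 10))  # 0, 10, ..., 100 -> ten bins
ages = [3, 10, 25, 25, 67, 99]


def age_hist_sketch(ages, edges):
    """Count ages per bin, where bin i covers edges[i] <= age < edges[i+1]."""
    counts = [0] * (len(edges) - 1)
    for age in ages:
        index = bisect_right(edges, age) - 1
        index = min(index, len(counts) - 1)  # assumed: top edge falls in the last bin
        counts[index] += 1
    return counts


hist = age_hist_sketch(ages, edges)
```

An age of exactly 10 lands in the 10-20 bin rather than the 0-10 bin, since each bin includes its lower edge.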

Examples::

sim = cv.Sim(analyzers=cv.age_histogram())
sim.run()

agehist = sim.get_analyzer()
agehist = cv.age_histogram(sim=sim) # Alternate method
agehist.plot()

Methods

Name	Description
compute_windows	Convert cumulative histograms to windows
from_sim	Create an age histogram from an already run sim
get	Retrieve a specific histogram from the given key (int, str, or date)
plot	Simple method for plotting the histograms.

compute_windows

analysis.age_histogram.compute_windows()

Convert cumulative histograms to windows
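
One plausible reading of "windows" is the count accrued between consecutive histogram days, i.e. the difference between successive cumulative histograms. The standalone sketch below follows that assumption; the day keys, the counts, and `windows_sketch` are invented for illustration.

```python
# Hypothetical cumulative counts per age bin, keyed by day
cumulative = {
    30: [0, 2, 5],
    60: [1, 6, 9],
    90: [1, 9, 15],
}


def windows_sketch(cumulative):
    """Counts accrued since the previous histogram day."""
    windows = {}
    previous = None
    for day in sorted(cumulative):
        current = cumulative[day]
        if previous is None:
            windows[day] = list(current)  # first window: everything up to the first day
        else:
            windows[day] = [c - p for c, p in zip(current, previous)]
        previous = current
    return windows


windows = windows_sketch(cumulative)
```

Summing the windows for a bin recovers the last cumulative value for that bin, which is a quick consistency check.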

from_sim

analysis.age_histogram.from_sim(sim)

Create an age histogram from an already run sim

get

analysis.age_histogram.get(key=None)

Retrieve a specific histogram from the given key (int, str, or date)

plot

analysis.age_histogram.plot(
    windows=False,
    width=0.8,
    color='#F8A493',
    fig_args=None,
    axis_args=None,
    data_args=None,
    **kwargs,
)

Simple method for plotting the histograms.

Parameters

Name	Type	Description	Default
windows	bool	whether to plot windows instead of cumulative counts	`False`
width	float	width of bars	`0.8`
color	hex or `rgb`	the color of the bars	`'#F8A493'`
fig_args	dict	passed to pl.figure()	`None`
axis_args	dict	passed to pl.subplots_adjust()	`None`
data_args	dict	‘width’, ‘color’, and ‘offset’ arguments for the data	`None`
kwargs	dict	passed to `cv.options.with_style()`; see that function for choices	`{}`

daily_age_stats

analysis.daily_age_stats(states=None, edges=None, **kwargs)

Calculate daily counts by age, saving for each day of the simulation. Can plot either time series by age or a histogram over all time.

Parameters

Name	Type	Description	Default
states	list	which states of people to record (default: ['diagnoses', 'deaths', 'tests', 'severe'])	`None`
edges	list	edges of age bins to use (default: 10 year bins from 0 to 100)	`None`
kwargs	dict	passed to Analyzer()	`{}`

Examples::

sim = cv.Sim(analyzers=cv.daily_age_stats())
sim.run()
daily_age = sim.get_analyzer()
daily_age.plot()
daily_age.plot(total=True)

Methods

Name	Description
plot	Plot the results.
to_df	Create dataframe totals for each day
to_total_df	Create dataframe totals across days

plot

analysis.daily_age_stats.plot(
    total=False,
    do_show=None,
    fig_args=None,
    axis_args=None,
    plot_args=None,
    dateformat=None,
    width=0.8,
    color='#F8A493',
    **kwargs,
)

Plot the results.

Parameters

Name	Type	Description	Default
total	bool	whether to plot the total histograms rather than time series	`False`
do_show	bool	whether to show the plot	`None`
fig_args	dict	passed to pl.figure()	`None`
axis_args	dict	passed to pl.subplots_adjust()	`None`
plot_args	dict	passed to pl.plot()	`None`
dateformat	str	the format to use for the x-axes (only used for time series)	`None`
width	float	width of bars (only used for histograms)	`0.8`
color	hex / `rgb`	the color of the bars (only used for histograms)	`'#F8A493'`
kwargs	dict	passed to `cv.options.with_style()`	`{}`

to_df

analysis.daily_age_stats.to_df()

Create dataframe totals for each day

to_total_df

analysis.daily_age_stats.to_total_df()

Create dataframe totals across days

daily_stats

analysis.daily_stats(
    days=None,
    verbose=True,
    reporter=None,
    save_inds=False,
    **kwargs,
)

Print out daily statistics about the simulation. Note that this analyzer takes a considerable amount of time, so it should be used primarily for debugging, not in production code. To keep the analyzer but toggle it off, pass an empty list of days.

To show the stats for a day after a run has finished, use e.g. daily_stats.report('2020-04-04').

Parameters

Name	Type	Description	Default
days	list	days on which to print out statistics (if None, assume all)	`None`
verbose	bool	whether to print on each timestep	`True`
reporter	`func`	if supplied, a custom parser of the stats object into a report (see make_report() function for syntax)	`None`
save_inds	bool	whether to save the indices of every infection at every timestep (also recoverable from the infection log)	`False`

Example::

sim = cv.Sim(analyzers=cv.daily_stats())
sim.run()
sim['analyzers'][0].plot()

Methods

Name	Description
intersect	Compute the intersection between arrays of indices.
make_report	Turn the statistics into a report
plot	Plot the daily statistics recorded. Some overlap with e.g. `sim.plot(to_plot='overview')`.
report	Print out one or all reports – take a date string or an int
transpose	Transpose the data from a list-of-dicts-of-dicts to a dict-of-dicts-of-lists

intersect

analysis.daily_stats.intersect(*args)

Compute the intersection between arrays of indices, handling either keys to precomputed indices or lists of indices. With two array inputs, simply performs np.intersect1d(arr1, arr2).
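
The mixed "keys or lists" handling can be sketched without NumPy. In this standalone example, `sorted(set(a) & set(b))` stands in for `np.intersect1d`, which likewise returns the sorted, unique common values; the `precomputed` dictionary and `intersect_sketch` are hypothetical.

```python
from functools import reduce

# Hypothetical precomputed index arrays, looked up by key
precomputed = {
    'infectious': [1, 4, 5, 8, 9],
    'diagnosed': [4, 8, 9, 12],
}


def intersect_sketch(*args):
    """Intersect any mix of keys (resolved via `precomputed`) and explicit index lists."""
    arrays = [precomputed[arg] if isinstance(arg, str) else arg for arg in args]
    # Pairwise reduction: sorted unique values common to all inputs
    return reduce(lambda a, b: sorted(set(a) & set(b)), arrays)


both = intersect_sketch('infectious', 'diagnosed')
three = intersect_sketch('infectious', 'diagnosed', [8, 9, 10])
```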

make_report

analysis.daily_stats.make_report(sim, stats, show_empty='count')

Turn the statistics into a report

plot

analysis.daily_stats.plot(
    fig_args=None,
    axis_args=None,
    plot_args=None,
    do_show=None,
    **kwargs,
)

Plot the daily statistics recorded. Some overlap with e.g. sim.plot(to_plot='overview').

Parameters

Name	Type	Description	Default
fig_args	dict	passed to pl.figure()	`None`
axis_args	dict	passed to pl.subplots_adjust()	`None`
plot_args	dict	passed to pl.plot()	`None`
do_show	bool	whether to show the plot	`None`
kwargs	dict	passed to `cv.options.with_style()`	`{}`

report

analysis.daily_stats.report(day=None)

Print out one or all reports – take a date string or an int

transpose

analysis.daily_stats.transpose(keys=None)

Transpose the data from a list-of-dicts-of-dicts to a dict-of-dicts-of-lists
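
The reshaping from a list-of-dicts-of-dicts (one entry per day) to a dict-of-dicts-of-lists (one time series per statistic) is shown below on toy data. The group and statistic names are invented, and `transpose_sketch` is a hypothetical standalone function rather than the method itself.

```python
# One entry per day: list-of-dicts-of-dicts
stats = [
    {'stock': {'susceptible': 99, 'infectious': 1}, 'flow': {'infected': 1}},
    {'stock': {'susceptible': 97, 'infectious': 3}, 'flow': {'infected': 2}},
]


def transpose_sketch(stats):
    """Collect each statistic into a list ordered by day."""
    out = {}
    for day_stats in stats:
        for group, values in day_stats.items():
            for name, value in values.items():
                out.setdefault(group, {}).setdefault(name, []).append(value)
    return out


by_name = transpose_sketch(stats)
```

The transposed form is the more convenient one for plotting, since each innermost list is already a time series.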

nab_histogram

analysis.nab_histogram(days=None, edges=None, **kwargs)

Store histogram of log10(NAb) distribution

Parameters

Name	Type	Description	Default
days	list	days on which to calculate the NAb histogram (if None, assume last day)	`None`
edges	list	log10 bin edges for histogram	`None`

Example::

sim = cv.Sim(analyzers=cv.nab_histogram())
sim.run()
sim.get_analyzer().plot()

New in version 3.1.0.

Methods

Name	Description
plot	Plot the results

plot

analysis.nab_histogram.plot(
    fig_args=None,
    axis_args=None,
    plot_args=None,
    do_show=None,
    **kwargs,
)

Plot the results

Parameters

Name	Type	Description	Default
fig_args	dict	passed to pl.figure()	`None`
axis_args	dict	passed to pl.subplots_adjust()	`None`
plot_args	dict	passed to pl.plot()	`None`
do_show	bool	whether to show the plot	`None`
kwargs	dict	passed to `cv.options.with_style()`	`{}`

snapshot

analysis.snapshot(days, *args, die=True, **kwargs)

Analyzer that takes a “snapshot” of the sim.people array at specified points in time, and saves them to itself. To retrieve them, you can either access the dictionary directly, or use the get() method.

Parameters

Name	Type	Description	Default
days	list	list of ints/strings/date objects, the days on which to take the snapshot	required
args	list	additional day(s)	`()`
die	bool	whether or not to raise an exception if a date is not found (default true)	`True`
kwargs	dict	passed to Analyzer()	`{}`

Example::

sim = cv.Sim(analyzers=cv.snapshot('2020-04-04', '2020-04-14'))
sim.run()
snapshot = sim['analyzers'][0]
people = snapshot.snapshots[0]            # Option 1
people = snapshot.snapshots['2020-04-04'] # Option 2
people = snapshot.get('2020-04-14')       # Option 3
people = snapshot.get(34)                 # Option 4
people = snapshot.get()                   # Option 5
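
In the example above, a date string and an integer day can refer to the same snapshot. The standalone sketch below shows the conversion, assuming the simulation starts on 2020-03-01 (an assumption; the start date is not stated in this section). `to_day` is a hypothetical helper, not the library's own conversion routine.

```python
from datetime import date

start_day = date(2020, 3, 1)  # assumed simulation start date


def to_day(key, start=start_day):
    """Convert an int, ISO date string, or date object to days since the start."""
    if isinstance(key, int):
        return key  # already an integer day
    if isinstance(key, str):
        key = date.fromisoformat(key)
    return (key - start).days


day_a = to_day('2020-04-04')
day_b = to_day(date(2020, 4, 14))
```

Under that assumed start date, '2020-04-04' corresponds to day 34, which matches the integer used in Option 4.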

Methods

Name	Description
get	Retrieve a snapshot from the given key (int, str, or date)

get

analysis.snapshot.get(key=None)

Retrieve a snapshot from the given key (int, str, or date)