Skip to content

Structured logging

Asif Tamuri edited this page Jul 3, 2020 · 6 revisions

All logging should use the module provided by TLOmodel: from tlo import logging

Configuring a simulation

Seeding the random number generator and configuring logging when instantiating the Simulation ensures reproducibility of the run.

sim = Simulation(start_date=start_date, seed=seed, log_config=log_config)

Log configuration

By default, all log messages at INFO level or higher are printed to screen (stdout). Configure logging behaviour using the log_config argument of Simulation

log_config should be given a dictionary with the keys:

  • filename: the name of the output file, to which a timestamp will suffixed. If not given, log is only printed to screen.
  • directory: overrides the default output path ./outputs
  • custom_levels: a dictionary defining the log levels for the disease/intervention modules, having a structure of {'disease_module_path': logging_level}. These are applied in order.
    • '*' matches all loggers to override the default level for all loggers
    • You can hide output from some modules and enable INFO or DEBUG level for others.
example_log_config = {
    "filename": "labour_analysis", 
    "directory": "./outputs",  
    "custom_levels": { 
        "*": logging.WARNING,  # set the level of all modules to 'WARNING'
        "tlo.methods.labour": logging.INFO,  # set level of labour to 'INFO'
    }
}

Example simulation setup with all configuration

# To reproduce the results, you must set the seed for the Simulation instance. The Simulation
# will seed the random number generators for each module when they are registered.
# If a seed argument is not given, one is generated. It is output in the log and can be
# used to reproduce results of a run
seed = 567

log_config = {
    "filename": "labour_analysis",   # The name of the output file (a timestamp will be appended).
    "directory": "./outputs",  # The default output path is `./outputs`. Change it here, if necessary
    "custom_levels": {  # Customise the output of specific loggers. They are applied in order:
        "*": logging.WARNING,  # Asterisk matches all loggers - we set the default level to WARNING
        "tlo.methods.labour": logging.INFO,
        "tlo.methods.newborn_outcomes": logging.DEBUG
    }
}

# Basic arguments required for the simulation
start_date = Date(2010, 1, 1)
end_date = Date(2020, 1, 1)
pop_size = 1000

# This creates the Simulation instance for this run. Because we've passed the `seed` and
# `log_config` arguments, these will override the default behaviour.
sim = Simulation(start_date=start_date, seed=seed, log_config=log_config)

# Path to the resource files used by the disease and intervention methods
resources = "./resources"

# Used to configure health system behaviour
service_availability = ["*"]

# We register all modules in a single call to the register method, calling once with multiple
# objects. This is preferred to registering each module in multiple calls because we will be
# able to handle dependencies if modules are registered together
sim.register(
    demography.Demography(resourcefilepath=resources),
    enhanced_lifestyle.Lifestyle(resourcefilepath=resources),
    healthsystem.HealthSystem(resourcefilepath=resources, service_availability=service_availability),
    symptommanager.SymptomManager(resourcefilepath=resources),
    healthseekingbehaviour.HealthSeekingBehaviour(resourcefilepath=resources),
    contraception.Contraception(resourcefilepath=resources),
    labour.Labour(resourcefilepath=resources),
    pregnancy_supervisor.PregnancySupervisor(resourcefilepath=resources),
)
# create and run the simulation
sim.make_initial_population(n=pop_size)
sim.simulate(end_date=end_date)

# parse the simulation logfile to get the output dataframes
log_df = parse_log_file(sim.log_filepath)

Data types that can be logged

Structured logging will convert logged information into a dataframe for each key. The simulation date at the time of logging is automatically added. Below, we show some examples of logging different types and the parsed output.

Recommended types for logging

Simple key/value dictionaries

These are the preferred datatype for logging. Simple scalar data types will be preserved, supporting:

  • Numeric types (integers and floats)
  • String types
  • Categories (will be converted to strings)
  • Timestamps
  • Missing values (NaN, NaT, None)
death_summary = {
    'died_count': died,
    'died_proportion': died / len(df),
    'most_common_symptom_on_death': rng.choice(
        ['fever', 'vomiting', 'stomachache', 'sore_throat', 'respiratory_symptoms', 'headache']
    )
}

logger.info(key='death',
            data=death_summary,
            description='Summary of deaths this month')
date died_count died_ratio most_common_symptom_on_death
0 2010-01-01 00:00:00 52 0.0052 respiratory_symptoms
1 2010-02-01 00:00:00 54 0.0054 respiratory_symptoms
2 2010-03-01 00:00:00 77 0.0077 stomachache
3 2010-04-01 00:00:00 41 0.0041 vomiting
4 2010-05-01 00:00:00 90 0.009 sore_throat

Dataframe with a single row

A dataframe with only one row can be logged without any changes required.

# get single row of dataframe (but not a series)
individual = df.loc[[1]]

logger.debug(key='individual_check',
             data=individual,
             description='following an individual through simulation')
date is_alive date_of_birth date_of_death sex mother_id age_exact_years
0 2010-01-01 00:00:00 True 2006-08-09 00:00:00 F -1 3.39774
1 2010-02-01 00:00:00 True 2006-08-09 00:00:00 F -1 3.47988
2 2010-03-01 00:00:00 True 2006-08-09 00:00:00 F -1 3.55654
3 2010-04-01 00:00:00 True 2006-08-09 00:00:00 F -1 3.64142
4 2010-05-01 00:00:00 True 2006-08-09 00:00:00 F -1 3.72355

Fixed-length lists

Lists that will always have the same number of items can be logged. To make them fit into the dataframe structure, each position in a list becomes a column.

top_3_symptoms: list = rng.choice(
    ['fever', 'vomiting', 'stomachache', 'sore_throat', 'respiratory_symptoms', 'headache'],
    3).tolist()

logger.info(key='top_symptoms',
            data=top_3_symptoms,
            description='Top 3 most common symptoms in the population')
date item_1 item_2 item_3
0 2010-01-01 00:00:00 sore_throat sore_throat fever
1 2010-02-01 00:00:00 fever stomachache headache
2 2010-03-01 00:00:00 stomachache respiratory_symptoms stomachache
3 2010-04-01 00:00:00 respiratory_symptoms stomachache fever
4 2010-05-01 00:00:00 fever stomachache stomachache

Series

A pandas series can be logged, it is handled just like a list

first_age_years = df.age_years.head(6) # series of 6 individual's age years
logger.info(key='age_years',
            data=first_age_years,
            description='First 6 individuals age in years')
date item_1 item_2 item_3 item_4 item_5 item_6
0 2010-01-01 00:00:00 3 53 11 19 31 4
1 2010-02-01 00:00:00 3 53 11 19 31 5
2 2010-03-01 00:00:00 3 53 11 19 32 5
3 2010-04-01 00:00:00 3 53 11 19 32 5
4 2010-05-01 00:00:00 3 53 11 19 32 5

Other less recommended types

You can log other types of data, but you might need to do work after parsing the logfile to convert the datatypes back to the right type.

Variable length list

If your list varies in length, you log it by wrapping it in a dictionary

number_to_pick = rng.randint(1, 4)
symptoms_at_time = rng.choice(
    ['fever', 'vomiting', 'stomachache', 'sore_throat', 'respiratory_symptoms', 'headache'],
    number_to_pick).tolist()

logger.info(key='variable_selection',
            data={'list_head': symptoms_at_time},
            description='Symptoms an individual experiences at a given time')
date list_head
0 2010-01-01 00:00:00 ['respiratory_symptoms']
1 2010-02-01 00:00:00 ['respiratory_symptoms', 'stomachache', 'respiratory_symptoms']
2 2010-03-01 00:00:00 ['headache']
3 2010-04-01 00:00:00 ['stomachache']
4 2010-05-01 00:00:00 ['fever', 'sore_throat', 'fever']

Dataframe with more than one row

Logging an entire dataframe is possible, converting it to a multi-index dataframe. The first index in the pair defines the timepoint that the dataframe was logged, the second index is from the dataframe being logged.

date is_alive date_of_birth date_of_death sex mother_id
(0, '0') 2010-01-01 00:00:00 True 2006-08-09T00:00:00 F -1
(0, '1') 2010-01-01 00:00:00 True 1956-10-13T00:00:00 M -1
(0, '2') 2010-01-01 00:00:00 True 1998-12-10T00:00:00 F -1
(1, '0') 2010-02-01 00:00:00 True 2006-08-09T00:00:00 F -1
(1, '1') 2010-02-01 00:00:00 True 1956-10-13T00:00:00 M -1
(1, '2') 2010-02-01 00:00:00 True 1998-12-10T00:00:00 F -1
(2, '0') 2010-03-01 00:00:00 True 2006-08-09T00:00:00 F -1
(2, '1') 2010-03-01 00:00:00 True 1956-10-13T00:00:00 M -1
(2, '2') 2010-03-01 00:00:00 True 1998-12-10T00:00:00 F -1
Clone this wiki locally