Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Population dataframe accesses in HSI_Contraception_FamilyPlanningAppt.EXPECTED_APPT_FOOTPRINT are performance bottleneck #1467

Open
matt-graham opened this issue Sep 26, 2024 · 2 comments
Labels
performance question Further information is requested

Comments

@matt-graham
Copy link
Collaborator

Look at the most recent successful profiling run results

https://github-pages.ucl.ac.uk/TLOmodel-profiling/_static/profiling_html/schedule_1226_dbf33d69edadbacdc2dad1fc32c93eb771b03f9b.html

calls to the HSI_Contraception_FamilyPlanningAppt.EXPECTED_APPT_FOOTPRINT property are consuming a lot of run time, with a total of 2570s spent in evaluating this property out of a total run time of 27 400s (9.3%). There are three functions calling HSI_Contraception_FamilyPlanningAppt.EXPECTED_APPT_FOOTPRINT with a significant amount of time recorded in each call path

  • HealthSystem.check_hsi_event_is_valid (1110s)
  • HSI_Contraception_FamilyPlanningAppt.initialise (545s)
  • HSI_Contraception_FamilyPlanningAppt.run (911s)

and in all cases the vast majority of the time in the HSI_Contraception_FamilyPlanningAppt.EXPECTED_APPT_FOOTPRINT call is spent in .loc indexing operations on the population dataframe.

Looking at the implementation of the property

@property
def EXPECTED_APPT_FOOTPRINT(self):
"""Return the expected appt footprint based on contraception method and whether the HSI has been rescheduled."""
person_id = self.target
current_method = self.sim.population.props.loc[person_id].co_contraception
if self._number_of_times_run > 0: # if it is to re-schedule due to unavailable consumables
return self.make_appt_footprint({})
# if to switch to a method
elif self.new_contraceptive in ['female_sterilization']:
return self.make_appt_footprint({'MinorSurg': 1})
elif self.new_contraceptive != current_method:
return self.make_appt_footprint({'FamPlan': 1})
# if to maintain on a method
elif self.new_contraceptive in ['injections', 'IUD', 'implant']:
return self.make_appt_footprint({'FamPlan': 1})
elif self.new_contraceptive in ['male_condom', 'other_modern', 'pill']:
return self.make_appt_footprint({'PharmDispensing': 1})
else:
return self.make_appt_footprint({})
warnings.warn(UserWarning("Assumed empty footprint for Contraception Routine appt because couldn't find"
"actual case."))

we can see this must be arising from the access to the current contraception method property for the individual the HSI event is targetting in the line

current_method = self.sim.population.props.loc[person_id].co_contraception 
This pattern of the `EXPECTED_APPT_FOOTPRINT` being a property rather than an attribute seems to also be used by a few other HSI events

HSI_Hiv_StartOrContinueTreatment

@property
def EXPECTED_APPT_FOOTPRINT(self):
"""Returns the appointment footprint for this person according to their current status:
* `NewAdult` for an adult, newly starting treatment
* `EstNonCom` for an adult, re-starting treatment or already on treatment
(NB. This is an appointment type that assumes that the patient does not have complications.)
* `Peds` for a child - whether newly starting or already on treatment
"""
person_id = self.target
if self.sim.population.props.at[person_id, 'age_years'] < 15:
return self.make_appt_footprint({"Peds": 1}) # Child
if (self.sim.population.props.at[person_id, 'hv_art'] == "not") & (
pd.isna(self.sim.population.props.at[person_id, 'hv_date_treated'])
):
return self.make_appt_footprint({"NewAdult": 1}) # Adult newly starting treatment
else:
return self.make_appt_footprint({"EstNonCom": 1}) # Adult already on treatment

HSI_Tb_StartTreatment

TLOmodel/src/tlo/methods/tb.py

Lines 2175 to 2183 in 5b4f328

@property
def EXPECTED_APPT_FOOTPRINT(self):
"""
Return the expected appt footprint based on whether the HSI has been rescheduled due to unavailable treatment.
"""
if self.number_of_occurrences == 0:
return self.make_appt_footprint({'TBNew': 1})
else:
return self.make_appt_footprint({'PharmDispensing': 1})

HsiBaseVaccine

@property
def EXPECTED_APPT_FOOTPRINT(self):
"""Returns the EPI appointment footprint for this person according to the vaccine:
* tetanus/diphtheria for pregnant women uses 1 EPI appt
* childhood vaccines can occur in bundles at birth, weeks 6, 10 and 14
* measles/rubella always given in one appt (`HSI_MeaslesRubellaVaccine`) in months 9, 15
* hpv given for adolescents uses 1 EPI appt
* if a vaccine is given at the same time as other vaccines, decrease the EPI footprint to 0.5
"""
if self.suppress_footprint:
return self.make_appt_footprint({})
# these vaccines are always given jointly with other childhood vaccines.
# NB. If p["vaccine_schedule"] changes, this would need to be updated.
vaccine_bundle = ['Epi_Childhood_Bcg', 'Epi_Childhood_Opv', 'Epi_Childhood_DtpHibHep', 'Epi_Childhood_Rota',
'Epi_Childhood_Pneumo']
# determine whether this HSI gives a vaccine as part of a vaccine bundle
# if vaccine is in list of vaccine bundles, return EPI footprint 0.5
# all other vaccines use 1 full EPI appt
if self.treatment_id() in vaccine_bundle:
return self.make_appt_footprint({"EPI": 0.5})
else:
return self.make_appt_footprint({"EPI": 1})

I'm assuming this has been implemented like this to allow the expected appointment footprint to be based on the target's properties at the point at which it is run rather than when it is scheduled? Tagging @tbhallett, @BinglingICL and @EvaJanouskova as I think this was originally implemented in #826.

One thing I'm not sure about is how this pattern interacts with this logic in HSI_Event.initialise

if sum(self.BEDDAYS_FOOTPRINT.values()):
self.EXPECTED_APPT_FOOTPRINT = (
health_system.bed_days.add_first_day_inpatient_appts_to_footprint(
self.EXPECTED_APPT_FOOTPRINT
)
)

which reassigns self.EXPECTED_APPT_FOOTPRINT in certain cases, in which case it would end up as a plain (static) attribute rather than a property. As HSI_Event.initialise appears to be called in HealthSystem.schedule_hsi_event, that is at the time the HSI event is scheduled, this means in some cases even when EXPECTED_APPT_FOOTPRINT is defined as a property it will be 'locked-in' as a static attribute with values taken from population dataframe at point at which it is scheduled.

As accessing the population dataframe dynamically when EXPECTED_APPT_FOOTPRINT is a property seems to be a performance bottleneck, and given it does seem to be guaranteed this will actually be done at the point the HSI event is run rather than scheduled, it seems that enforcing that EXPECTED_APPT_FOOTPRINT is an attribute which is contructed when the HSI event is constructed (when scheduling) might be a good idea to both keep the behaviour consistent and allow us to reduce the population dataframes by just doing once on construction rather than each access of EXPECTED_APPT_FOOTPRINT.

@matt-graham matt-graham added question Further information is requested performance labels Sep 26, 2024
@EvaJanouskova
Copy link
Collaborator

@BinglingICL, I think it is okay to set the EXPECTED_APPT_FOOTPRINT with the same logic when HSI is scheduled instead of when it is run, isn't it?

@BinglingICL
Copy link
Collaborator

BinglingICL commented Oct 7, 2024

@BinglingICL, I think it is okay to set the EXPECTED_APPT_FOOTPRINT with the same logic when HSI is scheduled instead of when it is run, isn't it?

Hi Eva, it seems so to me, if you meant the way to assign appt footprint within the property.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance question Further information is requested
Projects
None yet
Development

No branches or pull requests

3 participants