LinearModel: precise output type of predict() fnc #1479

EvaJanouskova · 2024-10-07T22:04:20Z

I'm getting an error because the predict() function in some cases returns numpy.bool instead of pd.Series.

(I've added an assert here to point it out.) Could we change the function to return pd.Series in all cases, as declared in the function definition?

tbhallett · 2024-10-08T08:07:09Z

From the docstring, I think you can guarantee that a Series is returned by setting 'squeeze_single_row_output' to False.
If this doesn't solve the question, then please raise an issue with a demonstration of what you're doing and the behaviour you expect.
Thanks

matt-graham · 2024-10-08T08:39:17Z

Hi @EvaJanouskova - as @tbhallett said I think the squeeze_single_row_output argument here should allow you to override the default behaviour of the predict method to consistently give a pandas.Series output. As various parts of the code rely on the existing behaviour of treating single row outputs differently from multi row outputs, globally changing to always return a pandas.Series would be quite disruptive.

EvaJanouskova · 2024-10-08T10:29:12Z

Thank you, @tbhallett and @matt-graham, it is working.

Still, shouldn't the declaration reflect that it may sometimes return a different type? Something like -> Union[pd.Series, np.bool]?

EvaJanouskova · 2024-10-08T10:37:41Z

And if the rng attribute is not set to a NumPy RandomState instance, it will also return the model output directly. So I would say -> Union[pd.Series, np.bool, float]:?

matt-graham · 2024-10-08T11:11:02Z

@EvaJanouskova you're right that the current type annotation is a bit misleading (though this won't have any functional effect). I think pd.Series | np.bool_ (or equivalently Union[pd.Series, np.bool_]) would be more accurate. I don't think it's possible to get out a scalar float, as the squeeze_single_row_output=True argument only has an effect if rng is not None, in which case a boolean outcome is returned.
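As a rough illustration of what the wider annotation means for calling code, here is a minimal sketch. The alias `PredictOutput` and the helper `count_positive` are hypothetical names introduced only for this example and are not part of TLOmodel; the point is simply that a caller can narrow the union with `isinstance`.

```python
from typing import Union

import numpy as np
import pandas as pd

# Hypothetical alias for the return type suggested in the comment above.
PredictOutput = Union[pd.Series, np.bool_]


def count_positive(output: PredictOutput) -> int:
    """Count True outcomes, whichever of the two types was returned."""
    if isinstance(output, pd.Series):
        # Multi-row (or unsqueezed) case: sum the boolean series.
        return int(output.sum())
    # Squeezed single-row case: a lone NumPy boolean.
    return int(output)
```

With this shape, callers that currently rely on the scalar behaviour keep working, while a type checker can flag code that assumes a Series unconditionally.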

EvaJanouskova · 2024-10-08T11:49:55Z

@matt-graham, What I meant is that it returns float if rng is None. Isn't it so?

matt-graham · 2024-10-08T12:09:21Z

> @matt-graham, What I meant is that it returns float if rng is None. Isn't it so?

If rng is None then the method will return a pandas.Series object irrespective of the value of squeeze_single_row_output:

TLOmodel/src/tlo/lm.py

Lines 454 to 463 in 8d0cfee

    
    if rng:
        outcome = rng.random_sample(len(result)) < result
        # pop the boolean out of the series if we have a single row,
        # otherwise return the series
        if len(outcome) == 1 and squeeze_single_row_output:
            return outcome.iloc[0]
        else:
            return outcome
    else:
        return result

The dtype of the pandas.Series object will be a floating point type in this case, but the data will be wrapped in a series even if there is only one value.
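The three return paths can be checked with a small standalone sketch. `predict_sketch` is a hypothetical stand-in that reproduces only the branches quoted from lm.py and takes the already-computed `result` series as an argument; it is not the actual `LinearModel.predict` signature. The annotation uses the union suggested earlier in the thread.

```python
from typing import Optional, Union

import numpy as np
import pandas as pd


def predict_sketch(
    result: pd.Series,
    rng: Optional[np.random.RandomState] = None,
    squeeze_single_row_output: bool = True,
) -> Union[pd.Series, np.bool_]:
    """Hypothetical stand-in mirroring only the quoted return branches."""
    if rng:
        outcome = rng.random_sample(len(result)) < result
        # Single row and squeezing requested: return the bare boolean.
        if len(outcome) == 1 and squeeze_single_row_output:
            return outcome.iloc[0]
        return outcome
    # No rng: the (floating point) series is returned unchanged.
    return result


single_row = pd.Series([0.25])

no_rng = predict_sketch(single_row)
squeezed = predict_sketch(single_row, rng=np.random.RandomState(0))
unsqueezed = predict_sketch(
    single_row, rng=np.random.RandomState(0), squeeze_single_row_output=False
)

print(type(no_rng).__name__, type(squeezed).__name__, type(unsqueezed).__name__)
```

Only the middle call yields a scalar; the call without rng stays a one-element series of floating point dtype.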

EvaJanouskova · 2024-10-08T13:27:54Z

what about in this case:

TLOmodel/src/tlo/lm.py

Lines 439 to 442 in 8d0cfee

    
    # Ensure result of floating point type even if all predictor coefficients
    # are integer but intercept is floating point
    if isinstance(self.intercept, float) and result.dtype == int:
        result = result.astype(float)

matt-graham · 2024-10-08T14:04:29Z

> what about in this case:
>
> TLOmodel/src/tlo/lm.py
>
> Lines 439 to 442 in 8d0cfee
>
>     # Ensure result of floating point type even if all predictor coefficients
>     # are integer but intercept is floating point
>     if isinstance(self.intercept, float) and result.dtype == int:
>         result = result.astype(float)

The result object there is still a pandas.Series: the Series.astype method returns another series with the dtype changed to the specified datatype (or potentially the same series if the specified datatype is the same as the one currently set).
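A quick way to confirm this is to cast a small integer series by hand. The values below are made up for illustration and have nothing to do with a real model.

```python
import pandas as pd

# Integer-valued series, as might arise when all coefficients are integers.
int_result = pd.Series([1, 2, 3])

# Casting changes the dtype of the elements, not the container type.
float_result = int_result.astype(float)

print(type(float_result).__name__, float_result.dtype)
```

The cast affects only the element dtype, so the object handed back is a series either way.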

EvaJanouskova · 2024-10-08T17:56:15Z

Ah, yeah. I can see it now, thank you @matt-graham.

lm: predict() does not always return pd.Series

3464e3e

EvaJanouskova assigned tamuri and matt-graham Oct 7, 2024

EvaJanouskova changed the title ~~LinearModel: predict() does not always return pd.Series~~ LinearModel: precise output type of predict() fnc Oct 8, 2024

EvaJanouskova added 2 commits October 8, 2024 19:09

Revert "lm: predict() does not always return pd.Series"

03b018c

This reverts commit 3464e3e.

lm: precise output type of predict()

71efb57

EvaJanouskova assigned EvaJanouskova and unassigned tamuri and matt-graham Oct 8, 2024

EvaJanouskova requested a review from matt-graham October 8, 2024 18:10

lm: update description of squeeze_single_row_output by stating output type when False

789b1b5
