[ENH] - Define new object organization (Data / Results / Base / Algorithm) #291
base: main
Conversation
This looks really cool / useful! In the last case you gave, could the superclass be `SpectralModel`? Or what is the benefit of using the base class + the `type` magic when defining a new class / procedure? For example, the bit where:

```python
class CustomModelNoRobustAP(SpectralModel):

    def _robust_ap_fit(self, *args, **kwargs):
        return self._simple_ap_fit(*args, **kwargs)

cfm = CustomModelNoRobustAP()
cfm.fit(freqs, powers)
```
@ryanhammonds - yeh, the more I think about this, the more I think I over-engineered the last part about redefining the base class lol. What you posted works, and I don't think that redefining base classes as I did adds any real benefit for this kind of change. Overall, I think simpler approaches to defining new objects and overloading specific methods are possible, even with version 1.0 (in general, I think). I was having fun digging into redefining base classes. What I find useful about this change is (a) I think it makes the kind of procedure editing we're talking about more obvious in some ways (even if it doesn't necessarily change what is possible), but also I'm thinking about this more prospectively, if we move in this direction.
Cool! I agree with you that this is useful and in the direction of being more flexible. The distinction between the base and model / API is nice. I think it would make sense to standardize or have requirements for new / custom models - like, a new class should implement a…
@ryanhammonds - yeh, fully agreed on having some kind of standardization / template for new models. I've started to explore in this direction - recent commits look at the idea of a `BaseModel` object, and related `Base` objects.
@ryanhammonds & @voytek - I'm tagging you back in here, since while the theme of this PR is the same as before, the details of the changes and implementations have changed a lot as I've explored. The main details are in the now heavily edited first comment - let me know any thoughts about this: I'm looking for general reads on whether we like this direction. Also: sorry it's so long - I've been somewhat over-writing the PR descriptions to try and explore the ideas (and hopefully the text here can / will be adapted into docs & release notes as we choose what to include).
Picking up on this! Overall, I updated the first comment with a now up-to-date overview / description of the current changes. Recent updates:

For @ryanhammonds & @voytek - I'm now pretty confident that this is the overall way to go, as I can see a way (that I'm pretty confident works) to finalize the new organization to support variable fit functions and fit algorithms. I haven't merged this yet, pending some broader user tests to check that nothing goes weird (given how broad the changes here are). In terms of review - let me know anything you notice if you try it out / take a quick look - but since this is going to be further updated (mainly - #298), I think if the overall strategy looks good, and doesn't seem to be breaking, we can move on this, and do an overall deep-dive review of the end results after more things are merged in.
@TomDonoghue I took a quick look and here are some (maybe half-baked) thoughts:
Data objects:
An alternative approach may have been to have a single base data class that accepts an arbitrary nd-array, with power along the last dim (or an axis arg to specify which dim power is along, defaulting to the last), then reshape to 2d, fit, and reshape the results back to nd. So, an input of shape (2, 3, 4, 100) would go to (24, 100), get fit, and the results would be reshaped back to match the leading (2, 3, 4) dims.

Also, the reshape approach would take a 1d input and reshape it to (1, n), fit, then reshape the results back to 1d. This would force everything to 2d before fitting, allowing easy parallelization and a single interface - removing the need for both SpectralGroupModel and SpectralModel. SpectralModel could handle any nd input, simplifying things for the user.

However, it looks like there is currently specific / custom processing for various dims that may require a non-general solution that I haven't wrapped my head around yet.
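A minimal NumPy sketch of the reshape idea described above (the helper name `fit_nd` and the toy 'fit' function are made up for illustration, not part of specparam):

```python
import numpy as np

def fit_nd(powers, fit_1d):
    """Sketch: flatten any nd power array to 2d, fit each row (spectrum),
    then reshape the results back to the leading input dims."""
    flat = powers.reshape(-1, powers.shape[-1])        # e.g. (2, 3, 4, 100) -> (24, 100)
    results = np.stack([fit_1d(row) for row in flat])  # fit each spectrum
    return results.reshape(powers.shape[:-1] + results.shape[1:])

# Toy 'fit' returning two summary values per spectrum
out = fit_nd(np.random.rand(2, 3, 4, 100),
             lambda spec: np.array([spec.mean(), spec.max()]))
print(out.shape)  # (2, 3, 4, 2)
```

The same path also handles 1d input: shape (100,) becomes (1, 100) for fitting, and the results reshape back down to 1d.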
Custom procedures:
A custom model I've written in the past is a robust ap fit, followed by peak fitting, removing the simple ap fit - since I found a way to get the robust fit to be accurate enough for a particular use case, and dropping the simple fit sped up computation a bit. Doing this seems difficult still, unless I wanted to overwrite the entire `._fit` method? One possibility is:
```python
class CustomAlgorithm(Base):

    def _robust_ap_fit(self, *args, **kwargs):
        # Custom robust ap fit code
        ...

    def _simple_ap_fit(self, *args, **kwargs):
        pass
```

```python
# In SpectralFitAlgorithm._fit
...
aperiodic_params_ = self._simple_ap_fit(self.freqs, self._spectrum_peak_rm)
if aperiodic_params_ is not None:
    self.aperiodic_params_ = aperiodic_params_
...
```
Currently, `self.aperiodic_params_` would get updated with None if I overwrite `_simple_ap_fit`, I think. If doing this is possible with the PR, sorry for not digging deeper! Also, maybe this is too specific of a use case.
This is an exploratory PR for re-organizing the objects in `specparam`.

New Objects

`Data`, `Results`, and `Base` objects

This PR defines new sets of objects, `Data` and `Results`, which define the core properties of managing data and managing model results, respectively. These establish the object layouts for spectral models (fitting function and algorithm agnostic). This includes no information about what model to fit or how to fit it - but it means any model built on top of this base should always have the same basic API for common and data elements. This also allows for a conceptual distinction between 'Data' and 'Results' elements. There are separate objects for 1D / 2D / 3D data & associated results.

Here, we also add a set of `Base` objects, which combine `Data` and `Results` objects, as well as adding any elements that require the combination of data & results. At this point, the `Base` object defines the core API / interface for the data & results components, without having an algorithm or any fit functions defined - establishing a common platform for fitting.

Note that none of these objects are really expected to be used independently or accessed by the user - even when defining a new algorithm, they should only need to be inherited from, without much direct interaction. Relatedly, the `Data` objects can be used to manage / check data independently, but the `Results` and `Base` objects don't actually offer the capacity to do anything without a defined algorithm.

Algorithm Objects
With the `Base` objects, we now need to define an algorithm, with the idea being that a separate object defines the algorithm - and only the algorithm. In doing so, an `Algorithm` object should be aware of `Base` (meaning, it should use and expect the methods and attributes that `Base` defines, as a combination of `Data` and `Results`). If so, then by combining a `Base` and an `Algorithm` object, we get a functional spectral parameterization object, with a common core for managing data & results, that operates with a specified algorithm.

In this current PR, the `SpectralFitAlgorithm` object (name could definitely be changed) defines the current algorithm. If this object is used as the Algorithm object, everything should end up the same as 1.0. The idea here is that if someone wants to define a new fitting algorithm, they can define a new Algorithm object, without having to implement any of the underlying basics, and get the same overall API out at the end.

Note that in separating out the algorithm, this PR currently exposes the same set of 'public' settings as before, but now also allows for passing through updates to 'private' settings (see the `SpectralFitAlgorithm` definition).

Model Objects
At this point, we have objects that define the basic data, results, and algorithm features - and just need to combine these into a user-facing object, which also defines the user-facing API (doing things like printing, plotting, getting results). This is what is now in the `SpectralModel`, `SpectralGroupModel`, etc objects (now in `model` and `group` files), which, from the user perspective, should act exactly the same as they did prior to this update.

Note - as it stands, the `SpectralModel` object inherits directly from the current Algorithm object, and defines the API on top of that. To put in a new fitting algorithm, one has to redefine the model objects, updated to use a new algorithm object (see outline below) - by using the same Base (& additional model methods), the API should end up identical, with a different internal fit procedure. Further updates should be able to add helpers / shortcuts to redefine model objects with new fit functions, without too much work on the user end to reconstitute the objects.

Overall Recap
From the user perspective, nothing should change - objects such as `SpectralModel` should work the same.

From a developer / internal perspective, we now have the following object organization:

- `Data` objects: structured objects for managing data (with versions for 1D, 2D, 3D, etc)
- `Results` objects: structured objects for managing model fit results (with versions for 1D, 2D, 3D, etc)
- `Base` objects: combinations of Data & Results, plus any additional methods that cover both aspects
- `Algorithm` objects: define a fitting procedure
- `Model` objects: combine base & algorithm objects to create a model object (plus some additional user-facing methods)

Code Examples
Importing New Base Objects
With the caveat that there is no real use case for using the Base objects directly - these objects can be imported and defined:
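As a rough, runnable illustration of the layering, here is a toy sketch (all class names here are illustrative stand-ins, not specparam's actual objects or import paths):

```python
# Toy stand-ins for the Data / Results / Base layering described above --
# illustrative only, not specparam's actual classes or API.

class Data:
    """Manages input data (frequencies & power values)."""
    def add_data(self, freqs, powers):
        self.freqs, self.powers = list(freqs), list(powers)

class Results:
    """Manages model fit results."""
    def __init__(self):
        self.params_ = None

class Base(Data, Results):
    """Combines Data & Results - a common platform, with no algorithm."""
    def __init__(self):
        Results.__init__(self)

base = Base()
base.add_data([1, 2, 3], [1.0, 0.5, 0.25])
print(base.powers)   # data is managed
print(base.params_)  # None - no algorithm defined, nothing to fit with
```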
Algorithm Objects: Opening Up Access to "private" parameters
One of the original goals of the updates here was to open up explicit access to the "private" settings, for particular use cases that might want to tweak them, without complicating the Model object for standard usage. This works for that - as before, we can pass through "private" settings into the standard model object, using `**model_kwargs`:
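As a schematic sketch of the pass-through pattern (the setting names below are made up for illustration, not specparam's actual settings):

```python
# Toy sketch of passing 'private' settings through a model's **model_kwargs --
# the class and setting names here are illustrative stand-ins.

class AlgorithmToy:
    def __init__(self, max_n_peaks=6, _ap_bounds=(0.1, 10.0)):
        self.max_n_peaks = max_n_peaks   # 'public' setting
        self._ap_bounds = _ap_bounds     # 'private' setting

class ModelToy(AlgorithmToy):
    def __init__(self, **model_kwargs):
        # Any extra keyword arguments, including 'private' ones,
        # are passed through to the algorithm layer
        super().__init__(**model_kwargs)

model = ModelToy(max_n_peaks=4, _ap_bounds=(0.5, 5.0))
print(model._ap_bounds)  # (0.5, 5.0)
```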
Where this is going: defining new fit algorithms

A key goal of this update is to support the generalization of fitting. While I think this will help with supporting new / different fit functions (updates to come), the more obvious support this adds is making it relatively easy to add a new fit procedure. At least, as compared to the previous organization - where I do not think it would have been very easy / obvious to define a new fit procedure while maintaining the other, more general functionality - now an interested party should be able to define a brand new `Algorithm` object and plug it in (inherit from Base / put into the API object), and it should work. We can add a bit of guidance on what needs to be followed for this to work - but I think it should be quite tractable.

As an outline of what it would look like to add a new fit function, see this (functional, but schematic) example:
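A runnable toy version of the idea (stand-in classes and a deliberately trivial 'algorithm', not specparam's actual API) might look like:

```python
# Toy sketch: plugging a new algorithm into a shared base -- all names
# here are stand-ins for illustration, not specparam's actual objects.

class Base:
    """Common platform: manages data & results, no algorithm."""
    def add_data(self, freqs, powers):
        self.freqs, self.powers = list(freqs), list(powers)

class MedianAlgorithm(Base):
    """A (trivial) new 'fit algorithm', defined on top of Base."""
    def _fit(self):
        vals = sorted(self.powers)
        self.params_ = vals[len(vals) // 2]

class SpectralModelToy(MedianAlgorithm):
    """User-facing model object: same API regardless of algorithm."""
    def fit(self, freqs, powers):
        self.add_data(freqs, powers)
        self._fit()
        return self.params_

model = SpectralModelToy()
print(model.fit([1, 2, 3], [3.0, 1.0, 2.0]))  # 2.0
```

Swapping in a different algorithm object would leave the `fit` API unchanged, which is the point of the separation.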
Additional notes and examples

Below are some additional notes and examples that are somewhat tangential (the code examples are partly from older explorations, and all still work) - but they are more like cool / interesting extras than anything that is actually important in relation to this update.
Having a minimal Model object

In the current version, the Base objects can be joined with an Algorithm object to form an operational, minimal fit object, without the additional methods that are defined on the full model objects (basically, without the public-facing API). It is now fairly easy to create a functional 'minimal fit object' - a functionally complete object for fitting the model, without the user-facing methods on top (reports, plots, fancy parameter access, etc) - though, again, this might have no clear use case.

This code shows how to make a minimal fit object:
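A toy sketch of what such a minimal object amounts to (stand-in classes, not the actual specparam objects):

```python
# Toy sketch of a 'minimal fit object': an Algorithm joined onto a Base,
# with no user-facing extras. All names are illustrative stand-ins.

class Base:
    def add_data(self, freqs, powers):
        self.freqs, self.powers = list(freqs), list(powers)

class Algorithm(Base):
    def _fit(self):
        # A stand-in 'fit': average power
        self.params_ = sum(self.powers) / len(self.powers)

# Joining Base & Algorithm (here, via inheritance) gives a functional,
# minimal fit object - no reports, plots, or fancy parameter access
minimal = Algorithm()
minimal.add_data([1, 2, 3, 4], [4.0, 3.0, 2.0, 1.0])
minimal._fit()
print(minimal.params_)  # 2.5
```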
Note: although a more minimal object, the Base object does the exact same thing as the previous object for fitting, so it is no faster. We did chat briefly about a "minimum / speedy" version of the object / fit function that reduces the number of checks, etc, that are done. I don't think there's much to do on that - we are pretty efficient on that stuff, and our time is spent fitting the actual model.
Editing a fit procedure

More nuanced than defining a brand new algorithm is what was included in the previous version about editing fit approaches (including redefining Base objects). I think the current layout suggests slightly different mechanics for doing so, while supporting all the same possibilities.

For example, we can customize an existing Algorithm object to modify a fit:
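A toy sketch of the kind of customization meant here, echoing the earlier example in this thread of skipping the robust aperiodic fit (stand-in classes and method names, not the actual API):

```python
# Toy sketch: customizing an existing algorithm by overriding one internal
# step -- class and method names are illustrative stand-ins.

class AlgorithmToy:
    def _fit(self):
        self.steps = []
        self._simple_ap_fit()
        self._robust_ap_fit()

    def _simple_ap_fit(self):
        self.steps.append('simple')

    def _robust_ap_fit(self):
        self.steps.append('robust')

class NoRobustToy(AlgorithmToy):
    # Override one internal step, skipping the robust refit entirely
    def _robust_ap_fit(self):
        pass

custom = NoRobustToy()
custom._fit()
print(custom.steps)  # ['simple']
```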
Interesting result - with basic simulated data, it's less catastrophic a change than I anticipated 🤷:
Before, there was an example showing something similar by retrofitting an existing object with a new Base (now Algorithm) object. If we go in the current direction, the recommended procedure for a brand new algorithm would probably be different - but, FWIW, this does still work:
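A toy sketch of what rebuilding a class with a swapped-in algorithm can look like, using Python's built-in `type` to recombine stand-in pieces (illustrative only, not the actual specparam objects):

```python
# Toy sketch: rebuilding a model class with a different algorithm swapped in,
# using type() -- all names here are illustrative stand-ins.

class ModelAPI:
    """Stand-in for the user-facing model methods."""
    def describe(self):
        return 'model with ' + self.name

class NewAlgorithm:
    """Stand-in for a newly defined fit algorithm."""
    name = 'new fit procedure'

# Dynamically recombine the API layer with the new algorithm
NewModel = type('NewModel', (ModelAPI, NewAlgorithm), {})

print(NewModel().describe())  # 'model with new fit procedure'
```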
Relevant previous text (for posterity):
The above object has an overloaded fit function, and so prints out the message ("what if... a new fit procedure model!"), but otherwise has everything defined on the normal `SpectralModel` object! This means that by filling out the new `fit` function with a new procedure, one could build out new algorithm approaches, using the toolbox, but without needing to edit inside the module.

A benefit of moving in this direction (as well as being a bit more modular) is that the explicit isolation of the actual fitting from all the "management" code opens up the possibility of plugging different fit functions into the same overlying object to interact with the model.
A heads up though - I'm not sure that what is shown here (relating to redefining underlying objects) is something we want to lean into exposing / promoting for people trying custom fit approaches - but I find it cool, and it could also be useful as a development feature to explore how different fit procedures work.
Thoughts & Conclusions

This update opens up our set of objects, re-organizing the 'do it all together' approach into a set of distinct objects that can be combined, which should allow for developing different fit algorithms and more easily using different fit functions. The argument from modularity is that this is a much more organized layout, even if the increased number of entities may seem (on the face of it) more complicated. Insofar as `specparam` is a framework for spectral parameterization (more than any specific algorithm or function for doing so), I think something like this is probably what we want - but there's definitely room for tweaks and different ideas. As it stands, I think this current change is useful in terms of doing what we want it to do (opening private setting access), and has potentially useful benefits going forward (opens up more access to defining custom fit procedures).

Reviewing
This is still at a pretty high-level stage of deciding on key strategy - so in terms of reviewing, I'm not really looking for detailed review, but rather strategy discussions - so any conceptual / strategic check-ins are more than welcome!