Initial refactor / restructure of the codebase #73

Closed
sfmig opened this issue Oct 31, 2023 · 3 comments

@sfmig
Collaborator

sfmig commented Oct 31, 2023

The codebase is getting a bit wild 🐆, and I think some steps are consolidated enough now to make them a bit more established.

Roughly the pipeline would involve the following steps:

  • Labelling prep: frame extraction and video re-encoding. Maybe also checking the parameters of the raw videos (frame rate, image size, etc.)
  • Labels postprocessing: combining existing labels and putting them in the correct format (COCO)
  • Model training
  • Evaluation
  • Inference

The last three steps are less well defined at this point.
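As a rough illustration of the labels postprocessing step, here is a minimal sketch of combining several label sets into one COCO-style dictionary. It assumes the inputs are already COCO-like dicts with `images`, `annotations` and `categories` keys, which may not match the format the labels are actually exported in. The function name `merge_coco` is made up for this example.

```python
from itertools import count


def merge_coco(datasets):
    """Merge COCO-style dicts, reassigning image and annotation ids."""
    merged = {"images": [], "annotations": [], "categories": []}
    category_ids = {}  # category name -> id in the merged dataset
    image_id, annotation_id = count(1), count(1)

    for dataset in datasets:
        # Map this dataset's category ids onto the merged ones, matching by name.
        category_map = {}
        for category in dataset["categories"]:
            name = category["name"]
            if name not in category_ids:
                category_ids[name] = len(category_ids) + 1
                merged["categories"].append({"id": category_ids[name], "name": name})
            category_map[category["id"]] = category_ids[name]

        # Give every image a fresh id and remember the old -> new mapping.
        image_map = {}
        for image in dataset["images"]:
            image_map[image["id"]] = next(image_id)
            merged["images"].append({**image, "id": image_map[image["id"]]})

        # Re-point each annotation at the new image and category ids.
        for annotation in dataset["annotations"]:
            merged["annotations"].append(
                {
                    **annotation,
                    "id": next(annotation_id),
                    "image_id": image_map[annotation["image_id"]],
                    "category_id": category_map[annotation["category_id"]],
                }
            )
    return merged
```

The ids are reassigned because separately labelled batches will typically reuse the same image and annotation ids, and those would clash once combined.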

@nikk-nikaznan and I chatted a bit today and some ideas came up:

  • One option could be to follow a more functional programming approach: we have config files for each of the pipeline steps holding the main parameters, and scripts that take these config files as CLI arguments. Then the steps are run using bash scripts. I am not very keen on more config files but I think this would be the easiest to transform to atm. This is sort of what we are doing now with the frame extraction step.

  • Would this functional programming option be similar to the chain of responsibility @samcunliffe suggested?

  • Another option would be a more OOP approach, I was thinking maybe similar to SLEAP's pipelines?

  • Nik and Matt also suggested having a look at DVC - seems very well suited for ML but still flexible, might be a good investment to learn about it.
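To make the first option a bit more concrete, here is a minimal sketch of a script that reads its parameters from a config file passed as a CLI argument. Everything in it is an assumption for illustration: the config keys (`video_dir`, `output_dir`, `n_frames`), the function names, and the choice of JSON (picked only to stay within the standard library). `extract_frames` is a placeholder, not the project's actual frame extraction code.

```python
import argparse
import json
from pathlib import Path


def load_config(path):
    """Read the parameters for one pipeline step from a JSON file."""
    with open(path) as f:
        return json.load(f)


def extract_frames(video_dir, output_dir, n_frames):
    """Placeholder standing in for the real frame extraction code."""
    print(f"Would extract {n_frames} frames per video from {video_dir} into {output_dir}")
    return n_frames


def main(argv=None):
    # argv=None makes argparse fall back to the real command line arguments.
    parser = argparse.ArgumentParser(description="Frame extraction step (sketch)")
    parser.add_argument("--config", type=Path, required=True, help="path to a JSON config file")
    args = parser.parse_args(argv)

    config = load_config(args.config)
    return extract_frames(config["video_dir"], config["output_dir"], config["n_frames"])
```

With `main()` called from an `if __name__ == "__main__":` guard, a bash script could then run each step in turn with its own config file, one script per step.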

Any thoughts more than welcome, happy to discuss further at our next gemba.

@samcunliffe
Member

samcunliffe commented Nov 5, 2023

Spoke a bit about this on Thursday (@nikk-nikaznan).

My CoR-like idea is something closer to SLEAP's, although their objects don't have an inheritance structure. It'd be something in code on our side. I was thinking something like:

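import abc


class OhDearSomethingWentWrongError(RuntimeError):
    """Raised when a pipeline step reports a failure."""


class PipelineStep(abc.ABC):
    @abc.abstractmethod
    def run(self):
        """Run the step; return an object with an is_ok() method."""


class VideoConversionStep(PipelineStep):
    def __init__(self, encoder, list_of_videos):
        # encoder and list_of_videos are placeholders for the real inputs
        self.encoder = encoder
        self.list_of_videos = list_of_videos

    def run(self):
        # run the actual encoder code
        return self.encoder.encode(self.list_of_videos)


class FrameExtractionStep(PipelineStep):
    def run(self):
        ...


class Pipeline:
    def __init__(self, steps: list[PipelineStep]):
        self.steps = steps

    def run(self):
        for step in self.steps:
            output = step.run()
            if not output.is_ok():
                raise OhDearSomethingWentWrongError()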
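
The sketch above relies on each step returning something with an `is_ok()` method, which is left undefined. One possible shape for that return value, purely as an assumption, is a small dataclass. The names `StepResult`, `PipelineError`, `run_steps`, `AlwaysOk` and `AlwaysFails` below are all hypothetical and only there to show the stop-on-first-failure behaviour.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    """Hypothetical return value of a step's run() method."""

    success: bool
    message: str = ""

    def is_ok(self):
        return self.success


class PipelineError(RuntimeError):
    """Raised when a step reports a failure."""


def run_steps(steps):
    """Run steps in order and stop at the first one that is not ok."""
    completed = []
    for step in steps:
        result = step.run()
        if not result.is_ok():
            raise PipelineError(f"{type(step).__name__} failed: {result.message}")
        completed.append(type(step).__name__)
    return completed


class AlwaysOk:
    """Toy step that always succeeds."""

    def run(self):
        return StepResult(success=True)


class AlwaysFails:
    """Toy step that always reports a failure."""

    def run(self):
        return StepResult(success=False, message="no videos found")
```

Carrying a message on the result means the error raised by the pipeline can say which step failed and why, rather than only that something went wrong.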

Other options we discussed:

My kneejerk would be to try snakemake first.

@sfmig
Collaborator Author

sfmig commented Nov 16, 2023

This was referenced Nov 16, 2023
@sfmig
Collaborator Author

sfmig commented Nov 20, 2023

After a first chunk of light refactoring (#86), we merged the new structure to main, and then went on to close currently open PRs:

@samcunliffe samcunliffe changed the title Refactor / restructure codebase Initial refactor / restructure codebase Feb 19, 2024
@samcunliffe samcunliffe changed the title Initial refactor / restructure codebase Initial refactor / restructure of the codebase Feb 19, 2024