Maintenance of configs and update README (#229)
* Edit pre-commit config to fix missing `wheel` dependency

* Check if problem is macos15

* Update pyproject.toml to match movement

* Update precommit to match movement

* Add precommit CI

* Run CI on  intel macOS and macos-15

* Make new precommits happy

* Make new precommits happy

* Some more pre-commit changes

* Make ruff precommit happy with tests - pending mypy

* Make mypy pass

* Remove sleap comment

* Update readme

* Fix test with typer  and ellipsis in argument

* Remove macOS-15 from CI

* Fixed check-manifest issue

* Update evaluate command description

* Update readme and cli help

* Change cli of detect+track to better match the other entry points. Simplify structure of outputs.

* Update readme of detect+track to reflect current status

* Fix test on track video CLI
sfmig authored Oct 29, 2024
1 parent 5d58d85 commit 7105c4c
Showing 43 changed files with 1,001 additions and 716 deletions.
6 changes: 4 additions & 2 deletions .github/workflows/test_and_deploy.yml
@@ -30,9 +30,11 @@ jobs:
# Run all supported Python versions on linux
os: [ubuntu-latest]
python-version: ["3.9", "3.10"]
# Include one macos run
# Include 1 Intel macos (13) and 1 M1 macos (latest)
include:
- os: macos-latest
- os: macos-13 # intel macOS
python-version: "3.10"
- os: macos-latest # M1 macOS
python-version: "3.10"
steps:
- uses: neuroinformatics-unit/actions/test@v2
101 changes: 65 additions & 36 deletions .pre-commit-config.yaml
@@ -1,37 +1,66 @@
# exclude: 'conf.py' --- relevant for docs
# Configuring https://pre-commit.ci/
ci:
autoupdate_schedule: monthly
repos:
- repo: https://github.com/pre-commit/mirrors-prettier
rev: v3.0.0-alpha.9-for-vscode
hooks:
- id: prettier
args: [--ignore-path=guides/CorrectingTrackLabellingSteps.md]
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.4.0
hooks:
- id: check-docstring-first
# - id: check-executables-have-shebangs TODO: fix later
- id: check-merge-conflict
- id: check-toml
- id: end-of-file-fixer
- id: mixed-line-ending
args: [--fix=lf]
- id: trailing-whitespace
- repo: https://github.com/charliermarsh/ruff-pre-commit
rev: v0.0.280
hooks:
- id: ruff
- repo: https://github.com/psf/black
rev: 23.7.0
hooks:
- id: black
- repo: https://github.com/pre-commit/mirrors-mypy
rev: v1.3.0
hooks:
- id: mypy
additional_dependencies:
- types-setuptools
- repo: https://github.com/mgedmin/check-manifest
rev: "0.49"
hooks:
- id: check-manifest
args: [--no-build-isolation]
additional_dependencies: [setuptools-scm]
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v5.0.0
hooks:
- id: check-added-large-files
- id: check-docstring-first
- id: check-executables-have-shebangs
- id: check-case-conflict
- id: check-merge-conflict
- id: check-symlinks
- id: check-yaml
- id: check-toml
- id: debug-statements
- id: end-of-file-fixer
- id: mixed-line-ending
args: [--fix=lf]
- id: name-tests-test
args: ["--pytest-test-first"]
exclude: ^tests/fixtures
- id: requirements-txt-fixer
- id: trailing-whitespace
# - repo: https://github.com/pre-commit/pygrep-hooks
# rev: v1.10.0
# hooks:
# - id: rst-backticks
# - id: rst-directive-colons
# - id: rst-inline-touching-normal
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.6.9
hooks:
- id: ruff
- id: ruff-format
- repo: https://github.com/pre-commit/mirrors-mypy
rev: v1.11.2
hooks:
- id: mypy
additional_dependencies:
- attrs
- types-setuptools
- pandas-stubs
- types-attrs
- types-PyYAML
- types-requests
- repo: https://github.com/mgedmin/check-manifest
rev: "0.49"
hooks:
- id: check-manifest
args: [--no-build-isolation]
additional_dependencies: [setuptools-scm]
# - repo: https://github.com/codespell-project/codespell
# # Configuration for codespell is in pyproject.toml
# rev: v2.3.0
# hooks:
# - id: codespell
# additional_dependencies:
# # tomli dependency can be removed when we drop support for Python 3.10
# - tomli
exclude: |
(?x)(
^notebooks/|
^tests/data/
)
158 changes: 122 additions & 36 deletions README.md
@@ -12,76 +12,162 @@ A toolkit for detecting and tracking crabs in the field.

<!-- Any tools or versions of languages needed to run code. For example specific Python or Node versions. Minimum hardware requirements also go here. -->

requires Python 3.9 or 3.10 or 3.11.
`crabs` uses neural networks to detect and track multiple crabs in the field. The detection model is based on the [Faster R-CNN](https://arxiv.org/abs/1506.01497) architecture. The tracking model is based on the [SORT](https://github.com/abewley/sort) tracking algorithm.

The package supports Python 3.9 or 3.10, and is tested on Linux and macOS.

We highly recommend running `crabs` on a machine with a dedicated graphics device, such as an NVIDIA GPU or an Apple M1+ chip.


### Installation

<!-- How to build or install the application. -->
#### Users
To install the `crabs` package, first clone this git repository.
```bash
git clone https://github.com/SainsburyWellcomeCentre/crabs-exploration.git
```

### Data Structure
Then, navigate to the root directory of the repository and install the `crabs` package in a conda environment:

We assume the following structure for the dataset directory:
```bash
conda create -n crabs-env python=3.10 -y
conda activate crabs-env
pip install .
```
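
After installation, you can optionally check whether a GPU is visible from Python. The snippet below assumes the package pulls in PyTorch as a dependency (the detector is a Faster R-CNN model) and uses only standard PyTorch calls; it is a sanity check, not part of the installation:

```bash
# Optional check: can PyTorch see a CUDA GPU or an Apple Silicon (MPS) device?
python -c "import torch; print('CUDA:', torch.cuda.is_available()); print('MPS:', torch.backends.mps.is_available())"
```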

#### Developers
For development, we recommend installing the package in editable mode and with additional `dev` dependencies:

```bash
pip install -e .[dev] # or ".[dev]" if you are using zsh
```
|_ Dataset
|_ frames
|_ annotations
|_ VIA_JSON_combined_coco_gen.json

### CrabsField - Sept2023 dataset

We trained the detector model on our [CrabsField - Sept2023](https://gin.g-node.org/SainsburyWellcomeCentre/CrabsField) dataset. The dataset consists of 53041 annotations (bounding boxes) over 544 frames extracted from 28 videos of crabs in the field.

The dataset is currently private. If you have access to the [GIN](https://gin.g-node.org/) repository, you can download the dataset using the GIN CLI tool. To set it up:
1. Create [a GIN account](https://gin.g-node.org/user/sign_up).
2. [Download GIN CLI](https://gin.g-node.org/G-Node/Info/wiki/GIN+CLI+Setup#setup-gin-client) and set it up by running:
```
$ gin login
```
You will be prompted for your GIN username and password.
3. Confirm that everything is working properly by typing:
```
$ gin --version
```

Then, to download the dataset, run the following command from the directory where you want the data to be stored:
```
gin get SainsburyWellcomeCentre/CrabsField
```
This command will clone the data repository to the current working directory, and download the large files in the dataset as lightweight placeholder files. To download the content of these placeholder files, run:
```
gin download --content
```
Because the large files in the dataset are **locked**, this command will download the content to the git annex subdirectory, and turn the placeholder files in the working directory into symlinks that point to that content. For more information on how to work with a GIN repository, see the corresponding [NIU HowTo guide](https://howto.neuroinformatics.dev/open_science/GIN-repositories.html).
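
As a quick optional check that the content was fetched, you can list one of the dataset files; with git annex, locked files that have been downloaded typically show up as symlinks pointing into `.git/annex/objects`. The path below is only a placeholder:

```bash
# Placeholder path: substitute any large file in the downloaded dataset
ls -l <path-to-a-large-file-in-the-dataset>
# once its content is downloaded, the file should be a symlink into .git/annex/objects
```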

The default name assumed for the annotations file is `VIA_JSON_combined_coco_gen.json`. This is used if no input files are passed. Other filenames (or fullpaths) can be passed with the `--annotation_files` command-line argument.
## Basic commands

### Running Locally
### Train a detector

For training
To train a detector on an existing dataset, run the following command:

```bash
python train-detector --dataset_dirs {parent_directory_of_frames_and_annotation} {optional_second_parent_directory_of_frames_and_annotation} --annotation_files {path_to_annotation_file.json} {path_to_optional_second_annotation_file.json}
```
train-detector --dataset_dirs <list-of-dataset-directories>
```

Example (using default annotation file and one dataset):
This command assumes each dataset directory has the following structure:

```bash
python train-detector --dataset_dirs /home/data/dataset1
```
dataset
|_ frames
|_ annotations
|_ VIA_JSON_combined_coco_gen.json
```

Example (passing the full path of the annotation file):
The default name assumed for the annotations file is `VIA_JSON_combined_coco_gen.json`. Other filenames (or full paths to annotation files) can be passed with the `--annotation_files` command-line argument.
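
For example, a hypothetical call passing two dataset directories, each with a non-default annotation file name, could look like this (the paths are placeholders):

```bash
# Hypothetical example: two dataset directories, each with its own annotation file
train-detector --dataset_dirs /home/data/dataset1 /home/data/dataset2 --annotation_files annotations_dataset1.json annotations_dataset2.json
```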

```bash
python train-detector --dataset_dirs /home/data/dataset1 --annotation_files /home/user/annotations/annotations42.json
To see the full list of possible arguments to the `train-detector` command, run:
```
train-detector --help
```

Example (passing several datasets with annotation filenames different from the default):
### Monitor a training job

We use [MLflow](https://mlflow.org) to monitor the training of the detector and log the hyperparameters used.

To run MLflow, execute the following command from your `crabs-env` conda environment:

```bash
python train-detector --dataset_dirs /home/data/dataset1 /home/data/dataset2 --annotation_files annotation_dataset1.json annotation_dataset2.json
```
mlflow ui --backend-store-uri file:///<path-to-ml-runs>
```

For evaluation
Replace `<path-to-ml-runs>` with the path to the directory where the MLflow output is stored. By default, the output is placed in an `ml-runs` folder under the directory from which the `train-detector` command is launched.
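
For example, if you launched training from the current directory and kept the default output location, the command could look like this (the path is an assumption based on the default `ml-runs` folder):

```bash
# Point MLflow at the default ml-runs folder in the current working directory
mlflow ui --backend-store-uri file://$(pwd)/ml-runs
```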

```bash
python evaluate-detector --model_dir {directory_to_saved_model} --images_dirs {parent_directory_of_frames_and_annotation} {optional_second_parent_directory_of_frames_and_annotation} --annotation_files {annotation_file.json} {optional_second_annotation_file.json}
In the MLflow browser-based user interface, you can find the path to the checkpoints directory for any run under the `path_to_checkpoints` parameter. This is useful for evaluating the trained model. The model from the end of the training job is saved as `last.ckpt` in the `path_to_checkpoints` directory.

### Evaluate a detector

To evaluate a trained detector on the test split of the dataset, run the following command:

```
evaluate-detector --trained_model_path <path-to-ckpt-file>
```

Example:
This command assumes the trained detector model (a `.ckpt` checkpoint file) is saved in an MLflow database structure. That is, the checkpoint is assumed to be under a `checkpoints` directory, which in turn should be under a `<mlflow-experiment-hash>/<mlflow-run-hash>` directory. This will be the case if the model has been trained using the `train-detector` command.
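
As an illustration, with the default `ml-runs` output folder the checkpoint path passed to `evaluate-detector` would typically follow the layout below; the experiment and run hashes are placeholders for this example:

```bash
# Placeholder path following the MLflow layout described above
evaluate-detector --trained_model_path ml-runs/<mlflow-experiment-hash>/<mlflow-run-hash>/checkpoints/last.ckpt
```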

```bash
python evaluate-detector --model_dir model/model_00.pt --main_dir /home/data/dataset1/frames /home/data/dataset2/frames --annotation_files /home/data/dataset1/annotations/annotation_dataset1.json /home/data/dataset2/annotations/annotation_dataset2.json
The `evaluate-detector` command will print to screen the average precision and average recall of the detector on the test set. It will also log those metrics to the MLflow database, along with the hyperparameters of the evaluation job. To visualise the MLflow summary of the evaluation job, run:
```
mlflow ui --backend-store-uri file:///<path-to-ml-runs>
```
where `<path-to-ml-runs>` is the path to the directory where the MLflow output is stored.

For running inference
To see the full list of possible arguments to the `evaluate-detector` command, run it with the `--help` flag.

### Run detector+tracking on a video

To track crabs in a new video, using a trained detector and a tracker, run the following command:

```bash
python crabs/detection_tracking/inference_model.py --model_dir {path_to_trained_model} --vid_path {path_to_input_video}
```
detect-and-track-video --trained_model_path <path-to-ckpt-file> --video_path <path-to-input-video>
```

This will produce a `tracking_output_<timestamp>` directory with the output from tracking.

The tracking output consists of:
- a `.csv` file named `<video-name>_tracks.csv`, containing the tracked bounding boxes;
- if the `--save_video` flag is passed: a video file named `<video-name>_tracks.mp4`, showing the tracked bounding boxes;
- if the `--save_frames` flag is passed: a subdirectory named `<video_name>_frames`, in which the video frames are saved.

The .csv file with tracked bounding boxes can be imported in [movement](https://github.com/neuroinformatics-unit/movement) for further analysis. See the [movement documentation](https://movement.neuroinformatics.dev/getting_started/input_output.html#loading-bounding-boxes-tracks) for more details.

Note that when using `--save_frames`, the frames of the video are saved as-is, without added bounding boxes. The aim is to support the visualisation and correction of the predictions using the [VGG Image Annotator (VIA)](https://www.robots.ox.ac.uk/~vgg/software/via/) tool. To do so, follow the instructions of the [VIA Face track annotation tutorial](https://www.robots.ox.ac.uk/~vgg/software/via/docs/face_track_annotation.html).

If a file with ground-truth annotations is passed to the command (with the `--annotations_file` flag), the MOTA metric for evaluating tracking is computed and printed to screen.
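
Putting the optional flags together, a hypothetical invocation that saves the tracked video and the individual frames, and also evaluates tracking against a ground-truth file, could look like this (all paths are placeholders):

```bash
# Placeholder paths; the optional flags are described above
detect-and-track-video --trained_model_path <path-to-ckpt-file> --video_path <path-to-input-video> --save_video --save_frames --annotations_file <path-to-ground-truth-annotations>
```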

### MLFLow
<!-- When used in combination with the `--save_video` flag, the tracked video will contain predicted bounding boxes in red, and ground-truth bounding boxes in green. -- PR 216-->

We are using [MLflow](https://mlflow.org) to log our training loss and the hyperparameters used.
To run MLflow, execute the following command in your terminal:
To see the full list of possible arguments to the `detect-and-track-video` command, run it with the `--help` flag.



<!-- ### Evaluate the tracking performance
To evaluate the tracking performance of a trained detector + tracker, run the following command:
```
mlflow ui --backend-store-uri file:///<path-to-ml-runs>
evaluate-tracking ...
```
Replace `<path-to-ml-runs>` with the path to the directory where you want to store the MLflow output. By default, it's an `ml-runs` directory under the current working directory.
We currently only support the SORT tracker, and the evaluation is based on the MOTA metric. -->

<!-- # Other common workflows -->
<!-- [TODO: add separate guides for this? eventually make into sphinx docs?] -->
<!-- - Prepare data for training a detector -->
<!-- - Extract frames from videos -->
<!-- - Annotate the frames with bounding boxes -->
<!-- - Combine several annotation files into a single file -->
<!-- - Retrain a detector on an extended dataset -->
<!-- - Prepare data for labelling ground truth for tracking -->
2 changes: 2 additions & 0 deletions conftest.py
@@ -1,3 +1,5 @@
"""Pytest configuration file."""

pytest_plugins = [
"tests.fixtures.frame_extraction",
]