Maintenance of configs and update README (#229)
* Edit pre-commit config to fix missing `wheel` dependency

* Check if problem is macos15

* Update pyproject.toml to match movement

* Update precommit to match movement

* Add precommit CI

* Run CI on  intel macOS and macos-15

* Make new precommits happy

* Make new precommits happy

* Some more pre-commit changes

* Make ruff precommit happy with tests - pending mypy

* Make mypy pass

* Remove sleap comment

* Update readme

* Fix test with typer  and ellipsis in argument

* Remove macOS-15 from CI

* Fixed check-manifest issue

* Update evaluate command description

* Update readme and cli help

* Change cli of detect+track to better match the other entry points. Simplify structure of outputs.

* Update readme of detect+track to reflect current status

* Fix test on track video CLI
sfmig authored Oct 29, 2024
1 parent 5d58d85 commit 7105c4c
Showing 43 changed files with 1,001 additions and 716 deletions.
6 changes: 4 additions & 2 deletions .github/workflows/test_and_deploy.yml
@@ -30,9 +30,11 @@ jobs:
# Run all supported Python versions on linux
os: [ubuntu-latest]
python-version: ["3.9", "3.10"]
# Include one macos run
# Include 1 Intel macos (13) and 1 M1 macos (latest)
include:
- os: macos-latest
- os: macos-13 # intel macOS
python-version: "3.10"
- os: macos-latest # M1 macOS
python-version: "3.10"
steps:
- uses: neuroinformatics-unit/actions/test@v2
101 changes: 65 additions & 36 deletions .pre-commit-config.yaml
@@ -1,37 +1,66 @@
# exclude: 'conf.py' --- relevant for docs
# Configuring https://pre-commit.ci/
ci:
autoupdate_schedule: monthly
repos:
- repo: https://github.com/pre-commit/mirrors-prettier
rev: v3.0.0-alpha.9-for-vscode
hooks:
- id: prettier
args: [--ignore-path=guides/CorrectingTrackLabellingSteps.md]
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.4.0
hooks:
- id: check-docstring-first
# - id: check-executables-have-shebangs TODO: fix later
- id: check-merge-conflict
- id: check-toml
- id: end-of-file-fixer
- id: mixed-line-ending
args: [--fix=lf]
- id: trailing-whitespace
- repo: https://github.com/charliermarsh/ruff-pre-commit
rev: v0.0.280
hooks:
- id: ruff
- repo: https://github.com/psf/black
rev: 23.7.0
hooks:
- id: black
- repo: https://github.com/pre-commit/mirrors-mypy
rev: v1.3.0
hooks:
- id: mypy
additional_dependencies:
- types-setuptools
- repo: https://github.com/mgedmin/check-manifest
rev: "0.49"
hooks:
- id: check-manifest
args: [--no-build-isolation]
additional_dependencies: [setuptools-scm]
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v5.0.0
hooks:
- id: check-added-large-files
- id: check-docstring-first
- id: check-executables-have-shebangs
- id: check-case-conflict
- id: check-merge-conflict
- id: check-symlinks
- id: check-yaml
- id: check-toml
- id: debug-statements
- id: end-of-file-fixer
- id: mixed-line-ending
args: [--fix=lf]
- id: name-tests-test
args: ["--pytest-test-first"]
exclude: ^tests/fixtures
- id: requirements-txt-fixer
- id: trailing-whitespace
# - repo: https://github.com/pre-commit/pygrep-hooks
# rev: v1.10.0
# hooks:
# - id: rst-backticks
# - id: rst-directive-colons
# - id: rst-inline-touching-normal
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.6.9
hooks:
- id: ruff
- id: ruff-format
- repo: https://github.com/pre-commit/mirrors-mypy
rev: v1.11.2
hooks:
- id: mypy
additional_dependencies:
- attrs
- types-setuptools
- pandas-stubs
- types-attrs
- types-PyYAML
- types-requests
- repo: https://github.com/mgedmin/check-manifest
rev: "0.49"
hooks:
- id: check-manifest
args: [--no-build-isolation]
additional_dependencies: [setuptools-scm]
# - repo: https://github.com/codespell-project/codespell
# # Configuration for codespell is in pyproject.toml
# rev: v2.3.0
# hooks:
# - id: codespell
# additional_dependencies:
# # tomli dependency can be removed when we drop support for Python 3.10
# - tomli
exclude: |
(?x)(
^notebooks/|
^tests/data/
)
158 changes: 122 additions & 36 deletions README.md
@@ -12,76 +12,162 @@ A toolkit for detecting and tracking crabs in the field.

<!-- Any tools or versions of languages needed to run code. For example specific Python or Node versions. Minimum hardware requirements also go here. -->

requires Python 3.9 or 3.10 or 3.11.
`crabs` uses neural networks to detect and track multiple crabs in the field. The detection model is based on the [Faster R-CNN](https://arxiv.org/abs/1506.01497) architecture. The tracking model is based on the [SORT](https://github.com/abewley/sort) tracking algorithm.

The package supports Python 3.9 or 3.10, and is tested on Linux and macOS.

We highly recommend running `crabs` on a machine with a dedicated graphics device, such as an NVIDIA GPU or an Apple M1+ chip.


### Installation

<!-- How to build or install the application. -->
#### Users
To install the `crabs` package, first clone this git repository.
```bash
git clone https://github.com/SainsburyWellcomeCentre/crabs-exploration.git
```

### Data Structure
Then, navigate to the root directory of the repository and install the `crabs` package in a conda environment:

We assume the following structure for the dataset directory:
```bash
conda create -n crabs-env python=3.10 -y
conda activate crabs-env
pip install .
```
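
After installation, you can optionally check whether a GPU is visible from Python. The snippet below assumes the package pulls in PyTorch as a dependency (the detector is a Faster R-CNN model) and uses only standard PyTorch calls; it is a sanity check, not part of the installation:

```bash
# Optional check: can PyTorch see a CUDA GPU or an Apple Silicon (MPS) device?
python -c "import torch; print('CUDA:', torch.cuda.is_available()); print('MPS:', torch.backends.mps.is_available())"
```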

#### Developers
For development, we recommend installing the package in editable mode and with additional `dev` dependencies:

```bash
pip install -e .[dev] # or ".[dev]" if you are using zsh
```
|_ Dataset
|_ frames
|_ annotations
|_ VIA_JSON_combined_coco_gen.json

### CrabsField - Sept2023 dataset

We trained the detector model on our [CrabsField - Sept2023](https://gin.g-node.org/SainsburyWellcomeCentre/CrabsField) dataset. The dataset consists of 53041 annotations (bounding boxes) over 544 frames extracted from 28 videos of crabs in the field.

The dataset is currently private. If you have access to the [GIN](https://gin.g-node.org/) repository, you can download the dataset using the GIN CLI tool. To set it up:
1. Create [a GIN account](https://gin.g-node.org/user/sign_up).
2. [Download GIN CLI](https://gin.g-node.org/G-Node/Info/wiki/GIN+CLI+Setup#setup-gin-client) and set it up by running:
```
$ gin login
```
You will be prompted for your GIN username and password.
3. Confirm that everything is working properly by typing:
```
$ gin --version
```

Then, to download the dataset, run the following command from the directory where you want the data to be stored:
```
gin get SainsburyWellcomeCentre/CrabsField
```
This command will clone the data repository to the current working directory, and download the large files in the dataset as lightweight placeholder files. To download the content of these placeholder files, run:
```
gin download --content
```
Because the large files in the dataset are **locked**, this command will download the content to the git annex subdirectory, and turn the placeholder files in the working directory into symlinks that point to that content. For more information on how to work with a GIN repository, see the corresponding [NIU HowTo guide](https://howto.neuroinformatics.dev/open_science/GIN-repositories.html).
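
As a quick optional check that the content was fetched, you can list one of the dataset files; with git annex, locked files that have been downloaded typically show up as symlinks pointing into `.git/annex/objects`. The path below is only a placeholder:

```bash
# Placeholder path: substitute any large file in the downloaded dataset
ls -l <path-to-a-large-file-in-the-dataset>
# once its content is downloaded, the file should be a symlink into .git/annex/objects
```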

The default name assumed for the annotations file is `VIA_JSON_combined_coco_gen.json`. This is used if no input files are passed. Other filenames (or fullpaths) can be passed with the `--annotation_files` command-line argument.
## Basic commands

### Running Locally
### Train a detector

For training
To train a detector on an existing dataset, run the following command:

```bash
python train-detector --dataset_dirs {parent_directory_of_frames_and_annotation} {optional_second_parent_directory_of_frames_and_annotation} --annotation_files {path_to_annotation_file.json} {path_to_optional_second_annotation_file.json}
```
train-detector --dataset_dirs <list-of-dataset-directories>
```

Example (using default annotation file and one dataset):
This command assumes each dataset directory has the following structure:

```bash
python train-detector --dataset_dirs /home/data/dataset1
```
dataset
|_ frames
|_ annotations
|_ VIA_JSON_combined_coco_gen.json
```

Example (passing the full path of the annotation file):
The default name assumed for the annotations file is `VIA_JSON_combined_coco_gen.json`. Other filenames (or full paths to annotation files) can be passed with the `--annotation_files` command-line argument.
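
For example, a hypothetical call passing two dataset directories, each with a non-default annotation file name, could look like this (the paths are placeholders):

```bash
# Hypothetical example: two dataset directories, each with its own annotation file
train-detector --dataset_dirs /home/data/dataset1 /home/data/dataset2 --annotation_files annotations_dataset1.json annotations_dataset2.json
```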

```bash
python train-detector --dataset_dirs /home/data/dataset1 --annotation_files /home/user/annotations/annotations42.json
To see the full list of possible arguments to the `train-detector` command, run:
```
train-detector --help
```

Example (passing several datasets with annotation filenames different from the default):
### Monitor a training job

We use [MLflow](https://mlflow.org) to monitor the training of the detector and log the hyperparameters used.

To run MLflow, execute the following command from your `crabs-env` conda environment:

```bash
python train-detector --dataset_dirs /home/data/dataset1 /home/data/dataset2 --annotation_files annotation_dataset1.json annotation_dataset2.json
```
mlflow ui --backend-store-uri file:///<path-to-ml-runs>
```

For evaluation
Replace `<path-to-ml-runs>` with the path to the directory where the MLflow output is stored. By default, the output is placed in an `ml-runs` folder under the directory from which the `train-detector` command is launched.
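
For example, if you launched training from the current directory and kept the default output location, the command could look like this (the path is an assumption based on the default `ml-runs` folder):

```bash
# Point MLflow at the default ml-runs folder in the current working directory
mlflow ui --backend-store-uri file://$(pwd)/ml-runs
```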

```bash
python evaluate-detector --model_dir {directory_to_saved_model} --images_dirs {parent_directory_of_frames_and_annotation} {optional_second_parent_directory_of_frames_and_annotation} --annotation_files {annotation_file.json} {optional_second_annotation_file.json}
In the MLflow browser-based user interface, you can find the path to the checkpoints directory for any run under the `path_to_checkpoints` parameter. This is useful for evaluating the trained model. The model from the end of the training job is saved as `last.ckpt` in the `path_to_checkpoints` directory.

### Evaluate a detector

To evaluate a trained detector on the test split of the dataset, run the following command:

```
evaluate-detector --trained_model_path <path-to-ckpt-file>
```

Example:
This command assumes the trained detector model (a `.ckpt` checkpoint file) is saved in an MLflow database structure. That is, the checkpoint is assumed to be under a `checkpoints` directory, which in turn should be under a `<mlflow-experiment-hash>/<mlflow-run-hash>` directory. This will be the case if the model has been trained using the `train-detector` command.
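
As an illustration, with the default `ml-runs` output folder the checkpoint path passed to `evaluate-detector` would typically follow the layout below; the experiment and run hashes are placeholders for this example:

```bash
# Placeholder path following the MLflow layout described above
evaluate-detector --trained_model_path ml-runs/<mlflow-experiment-hash>/<mlflow-run-hash>/checkpoints/last.ckpt
```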

```bash
python evaluate-detector --model_dir model/model_00.pt --main_dir /home/data/dataset1/frames /home/data/dataset2/frames --annotation_files /home/data/dataset1/annotations/annotation_dataset1.json /home/data/dataset2/annotations/annotation_dataset2.json
The `evaluate-detector` command will print to screen the average precision and average recall of the detector on the test set. It will also log those metrics to the MLflow database, along with the hyperparameters of the evaluation job. To visualise the MLflow summary of the evaluation job, run:
```
mlflow ui --backend-store-uri file:///<path-to-ml-runs>
```
where `<path-to-ml-runs>` is the path to the directory where the MLflow output is stored.

For running inference
To see the full list of possible arguments to the `evaluate-detector` command, run it with the `--help` flag.

### Run detector+tracking on a video

To track crabs in a new video, using a trained detector and a tracker, run the following command:

```bash
python crabs/detection_tracking/inference_model.py --model_dir {path_to_trained_model} --vid_path {path_to_input_video}
```
detect-and-track-video --trained_model_path <path-to-ckpt-file> --video_path <path-to-input-video>
```

This will produce a `tracking_output_<timestamp>` directory with the output from tracking.

The tracking output consists of:
- a `.csv` file named `<video-name>_tracks.csv`, containing the tracked bounding boxes;
- if the `--save_video` flag is passed: a video file named `<video-name>_tracks.mp4`, showing the tracked bounding boxes;
- if the `--save_frames` flag is passed: a subdirectory named `<video_name>_frames`, in which the video frames are saved.

The .csv file with tracked bounding boxes can be imported in [movement](https://github.com/neuroinformatics-unit/movement) for further analysis. See the [movement documentation](https://movement.neuroinformatics.dev/getting_started/input_output.html#loading-bounding-boxes-tracks) for more details.

Note that when using `--save_frames`, the frames of the video are saved as-is, without added bounding boxes. The aim is to support the visualisation and correction of the predictions using the [VGG Image Annotator (VIA)](https://www.robots.ox.ac.uk/~vgg/software/via/) tool. To do so, follow the instructions of the [VIA Face track annotation tutorial](https://www.robots.ox.ac.uk/~vgg/software/via/docs/face_track_annotation.html).

If a file with ground-truth annotations is passed to the command (with the `--annotations_file` flag), the MOTA metric for evaluating tracking is computed and printed to screen.
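
Putting the optional flags together, a hypothetical invocation that saves the tracked video and the individual frames, and also evaluates tracking against a ground-truth file, could look like this (all paths are placeholders):

```bash
# Placeholder paths; the optional flags are described above
detect-and-track-video --trained_model_path <path-to-ckpt-file> --video_path <path-to-input-video> --save_video --save_frames --annotations_file <path-to-ground-truth-annotations>
```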

### MLFLow
<!-- When used in combination with the `--save_video` flag, the tracked video will contain predicted bounding boxes in red, and ground-truth bounding boxes in green. -- PR 216-->

We are using [MLflow](https://mlflow.org) to log our training loss and the hyperparameters used.
To run MLflow, execute the following command in your terminal:
To see the full list of possible arguments to the `detect-and-track-video` command, run it with the `--help` flag.



<!-- ### Evaluate the tracking performance
To evaluate the tracking performance of a trained detector + tracker, run the following command:
```
mlflow ui --backend-store-uri file:///<path-to-ml-runs>
evaluate-tracking ...
```
Replace `<path-to-ml-runs>` with the path to the directory where you want to store the MLflow output. By default, it's an `ml-runs` directory under the current working directory.
We currently only support the SORT tracker, and the evaluation is based on the MOTA metric. -->

<!-- # Other common workflows -->
<!-- [TODO: add separate guides for this? eventually make into sphinx docs?] -->
<!-- - Prepare data for training a detector -->
<!-- - Extract frames from videos -->
<!-- - Annotate the frames with bounding boxes -->
<!-- - Combine several annotation files into a single file -->
<!-- - Retrain a detector on an extended dataset -->
<!-- - Prepare data for labelling ground truth for tracking -->
2 changes: 2 additions & 0 deletions conftest.py
@@ -1,3 +1,5 @@
"""Pytest configuration file."""

pytest_plugins = [
"tests.fixtures.frame_extraction",
]