Contributions to WAZP are absolutely encouraged, whether to fix a bug, develop a new feature, or improve the documentation. If you're unsure about any part of the contributing process, please get in touch. It's best to reach out in public, e.g. by opening an issue, so that others can benefit from the discussion.
It is recommended to use conda to install a development environment for WAZP. Once you have conda installed, the following commands will create and activate a conda environment with the requirements needed for development:

```sh
conda create -n wazp-dev -c conda-forge python=3 pytables
conda activate wazp-dev
```
This installs packages that often can't be installed via `pip`, including `hdf5`.
To install WAZP for development, clone the GitHub repository, and then run from inside the repository:

```sh
pip install -e '.[dev]'
```

This will install the package, its dependencies, and its development dependencies.
In all cases, please submit code to the main repository via a pull request. We recommend, and adhere to, the following conventions:
- Please submit draft pull requests as early as possible to allow for discussion.
- One approval of a PR (by a repo owner) is enough for it to be merged.
- Unless someone approves the PR with optional comments, the PR is immediately merged by the approving reviewer.
- Ask for a review from someone specific if you think they would be a particularly suited reviewer.
Running `pre-commit install` will set up pre-commit hooks to ensure a consistent formatting style. Currently, these are:
- `ruff` does a number of jobs, including enforcing PEP 8 and sorting imports
- `black` for auto-formatting
- `mypy` as a static type checker
These hooks will prevent code from being committed if any of them fail. To run them individually (from the root of the repository), you can use:

```sh
ruff .
black ./
mypy -p wazp
```
To run all the hooks before committing:

```sh
pre-commit run     # for staged files
pre-commit run -a  # for all files in the repository
```
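The repository's own `.pre-commit-config.yaml` is the source of truth for which hooks run. For orientation only, a configuration wiring up these three hooks typically looks something like the following; the repository URLs are the standard upstream hook mirrors, and the `rev` values are illustrative, not the pinned versions used by WAZP:

```yaml
repos:
  - repo: https://github.com/charliermarsh/ruff-pre-commit
    rev: v0.0.270  # illustrative; use the rev pinned in the repo's actual config
    hooks:
      - id: ruff
  - repo: https://github.com/psf/black
    rev: 23.3.0  # illustrative
    hooks:
      - id: black
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.3.0  # illustrative
    hooks:
      - id: mypy
```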
We use pytest for testing, and our integration tests require Google Chrome or Chromium and a compatible `chromedriver`.
Please try to ensure that all functions are tested, including both unit and integration tests.
Write your test methods and classes in the `test` folder.
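To illustrate the expected layout, a minimal unit test placed in the `test` folder could look like the sketch below. Both the file name and the function under test are hypothetical stand-ins, not actual WAZP code:

```python
# test/test_example.py -- hypothetical file name, for illustration only


def add_durations(a: float, b: float) -> float:
    """Toy function standing in for the WAZP code under test."""
    return a + b


def test_add_durations():
    # pytest collects any function whose name starts with `test_`
    assert add_durations(1.5, 2.5) == 4.0
```

Running `pytest` from the repository root will collect and run any such `test_*` functions.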
The integration tests start a server and browse with Chrome(ium), so you will need to download and install Google Chrome or Chromium (if you don't already use one of them). You will then need to download a compatible version of `chromedriver`.
Depending on your OS you may also need to trust the executable.
### Ubuntu

Installing chromium and chromedriver is a one-liner (tested on Ubuntu 20.04 and 22.04):

```sh
sudo apt install chromium-chromedriver
pytest  # in the root of the repository
```
### MacOS

There is also a [homebrew cask](https://formulae.brew.sh/cask/chromedriver) for `chromedriver`, so instead of downloading it from the web you should be able to run:

```sh
brew install chromedriver
brew info chromedriver
```

Take note of the installation path (it's probably something like `/opt/homebrew/Caskroom/chromedriver/<version>`).
However you obtained `chromedriver`, you can trust the executable via the security settings and/or keychain GUI, or just:

```sh
cd /place/where/your/chromedriver/is
xattr -d com.apple.quarantine chromedriver
```
Once downloaded, make sure the folder containing the `chromedriver` binary is in your `PATH`, and check that you can run the integration tests:

```sh
export PATH=$PATH:/place/where/your/chromedriver/is
pytest  # in the root of the repository
```
### Windows

For Windows, be sure to download the `chromedriver_win32.zip` file and extract the executable; it's probably easiest to simply place it in the directory where you want to run `pytest`.
It's a good idea to test locally before pushing. Pytest will run all tests and also report test coverage.
For some tests, you will need to use real experimental data. We store some sample projects in an external data repository. See sample projects for more information.
All pushes and pull requests will be built by GitHub Actions. This will usually include linting, testing and deployment. A GitHub Actions workflow (`.github/workflows/test_and_deploy.yml`) has been set up to run on each commit/PR:

- Linting checks (pre-commit).
- Testing (only if linting checks pass).
- Release to PyPI (only if a git tag is present and if tests pass). Requires `TWINE_API_KEY` from PyPI to be set in repository secrets.
We use semantic versioning, which includes `MAJOR.MINOR.PATCH` version numbers:

- PATCH = small bugfix
- MINOR = new feature
- MAJOR = breaking change
We use `setuptools_scm` to automatically version WAZP. It has been pre-configured in the `pyproject.toml` file. `setuptools_scm` will automatically infer the version using git. To manually set a new semantic version, create a tag and make sure the tag is pushed to GitHub. Make sure you commit any changes you wish to be included in this version. E.g. to bump the version to `1.0.0`:

```sh
git add .
git commit -m "Add new changes"
git tag -a v1.0.0 -m "Bump to version 1.0.0"
git push --follow-tags
```
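For reference, `setuptools_scm` is typically enabled with a few lines in `pyproject.toml`. The following is an illustrative sketch of that standard setup; the repository's actual configuration is authoritative and may differ in version pins:

```toml
[build-system]
# setuptools_scm derives the package version from git tags at build time
requires = ["setuptools>=45", "setuptools_scm[toml]>=6.2"]
build-backend = "setuptools.build_meta"

# The presence of this (possibly empty) table activates setuptools_scm
[tool.setuptools_scm]
```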
Pushing a tag to GitHub triggers the package's deployment to PyPI. The version number is automatically determined from the latest tag on the `main` branch.
The documentation is hosted via GitHub Pages at sainsburywellcomecentre.github.io/WAZP/. Its source files are located in the `docs` folder of this repository. They are written in either reStructuredText or Markdown. The `index.rst` file corresponds to the main page of the documentation website. Other `.rst` or `.md` files are included in the main page via the `toctree` directive.
We use Sphinx and the PyData Sphinx Theme to build the source files into html output. This is handled by a GitHub Actions workflow (`.github/workflows/publish_docs.yml`) which is triggered whenever changes are pushed to the `main` branch. The workflow builds the html output files and sends them to a `gh-pages` branch.
To edit the documentation, first clone the repository, and install WAZP in a development environment (see instructions above). Now open a new branch, edit the documentation source files (`.md` or `.rst` in the `docs` folder), and commit your changes. Submit your documentation changes via a pull request, following the same guidelines as for code changes (see pull requests).
If you create a new documentation source file (e.g. `my_new_file.md` or `my_new_file.rst`), you will need to add it to the `toctree` directive in `index.rst` for it to be included in the documentation website:

```rst
.. toctree::
   :maxdepth: 2

   existing_file
   my_new_file
```
We recommend that you build and view the documentation website locally, before you push it. To do so, first install the requirements for building the documentation:

```sh
pip install -r docs/requirements.txt
```

Then, from the root of the repository, run:

```sh
sphinx-build docs/source docs/build
```

You can view the local build by opening `docs/build/index.html` in a browser.
To refresh the documentation after making changes, remove the `docs/build` folder and re-run the above command:

```sh
rm -rf docs/build
sphinx-build docs/source docs/build
```
We maintain some sample WAZP projects to be used for testing, examples and tutorials on an external data repository. Our hosting platform of choice is called GIN and is maintained by the German Neuroinformatics Node. GIN has a GitHub-like interface and git-like CLI functionalities.
The projects are stored in folders named after the species, e.g. `jewel-wasp` (*Ampulex compressa*).
Each species folder may contain various WAZP sample projects as zipped archives. For example, the `jewel-wasp` folder contains the following projects:

- `short-clips_raw.zip` - a project containing short ~10 second clips extracted from raw .avi files.
- `short-clips_compressed.zip` - same as above, but compressed using the H.264 codec and saved as .mp4 files.
- `entire-video_raw.zip` - a project containing the raw .avi file of an entire video, ~32 minutes long.
- `entire-video_compressed.zip` - same as above, but compressed using the H.264 codec and saved as an .mp4 file.
Each WAZP sample project has the following structure:

```
{project-name}.zip
├── videos
│   ├── {video1-name}.{ext}
│   ├── {video1-name}.metadata.yaml
│   ├── {video2-name}.{ext}
│   ├── {video2-name}.metadata.yaml
│   └── ...
├── pose_estimation_results
│   ├── {video1-name}{model-name}.h5
│   ├── {video2-name}{model-name}.h5
│   └── ...
├── WAZP_config.yaml
└── metadata_fields.yaml
```
To learn more about how the sample projects were generated, see `scripts/generate_sample_projects` in the WAZP GitHub repository.
To fetch the data from GIN, we use the `pooch` Python package, which can download data from pre-specified URLs and store them locally for all subsequent uses. It also provides some nice utilities, like verification of sha256 hashes and decompression of archives.

The relevant functionality is implemented in the `wazp.datasets` module. The most important parts of this module are:
- The `sample_projects` registry, which contains a list of the zipped projects and their known hashes.
- The `find_sample_projects()` function, which returns the names of available projects per species, in the form of a dictionary.
- The `get_sample_project()` function, which downloads a project (if not already cached locally), unzips it, and returns the path to the unzipped folder.
Example usage:

```python
>>> from wazp.datasets import find_sample_projects, get_sample_project
>>> projects_per_species = find_sample_projects()
>>> print(projects_per_species)
{'jewel-wasp': ['short-clips_raw', 'short-clips_compressed', 'entire-video_raw', 'entire-video_compressed']}
>>> project_path = get_sample_project('jewel-wasp', 'short-clips_raw')
>>> print(project_path)
/home/user/.WAZP/sample_data/jewel-wasp/short-clips_raw
```
By default, the projects are stored in the `~/.WAZP/sample_data` folder. This can be changed by setting the `LOCAL_DATA_DIR` variable in the `wazp.datasets` module.
Only core WAZP developers may add new projects to the external data repository. To add a new project, you will need to:

1. Create a GIN account.
2. Ask to be added as a collaborator on the WAZP data repository (if not already).
3. Download the GIN CLI and set it up with your GIN credentials, by running `gin login` in a terminal.
4. Clone the WAZP data repository to your local machine, by running `gin get SainsburyWellcomeCentre/WAZP` in a terminal.
5. Add your new projects, followed by `gin commit -m <message> <filename>`. Make sure to follow the project organisation as described above. Don't forget to modify the README file accordingly.
6. Upload the committed changes to the GIN repository, by running `gin upload`. Latest changes to the repository can be pulled via `gin download`. `gin sync` will synchronise the latest changes bidirectionally.
7. Determine the sha256 checksum hash of each new project archive, by running `sha256sum {project-name.zip}` in a terminal. Alternatively, you can use `pooch` to do this for you: `python -c "import pooch; print(pooch.file_hash('/path/to/file.zip'))"`. If you wish to generate a text file containing the hashes of all the files in a given folder, you can use `python -c "import pooch; pooch.make_registry('/path/to/folder', 'hash_registry.txt')"`.
8. Update the `wazp.datasets` module on the WAZP GitHub repository by adding the new projects to the `sample_projects` registry. Make sure to include the correct sha256 hash, as determined in the previous step. Follow all the usual guidelines for contributing code. Additionally, you may want to update the scripts in `scripts/generate_sample_projects`, depending on how you generated the new projects. Make sure to test whether the new projects can be fetched successfully (see fetching projects above) before submitting your pull request.

You can also perform steps 3-6 via the GIN web interface, if you prefer to avoid using the CLI.
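If you prefer not to rely on `sha256sum` or `pooch`, the same checksum can be computed with Python's standard library alone. This is a minimal sketch (not part of WAZP); the `hello.txt` file is created here purely for demonstration:

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex sha256 digest of a file, reading it in chunks
    so that large project archives don't have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstration on a tiny throwaway file (equivalent to `sha256sum hello.txt`)
Path("hello.txt").write_bytes(b"hello")
print(sha256_of_file("hello.txt"))
# 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
```

The resulting hex string is what goes into the `sample_projects` registry alongside the archive name.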
This package layout and configuration (including pre-commit hooks and GitHub Actions) have been copied from the python-cookiecutter template.