Merge pull request #456 from ttngu207/dev_aeon-docs-78
Docs to deploy/onboard new datajoint pipeline for project aeon & ingest/process released data
MilagrosMarin authored Dec 4, 2024
2 parents 6ed933a + 94ef78a commit bf4e56c
Showing 5 changed files with 628 additions and 5 deletions.
15 changes: 10 additions & 5 deletions aeon/dj_pipeline/README.md
@@ -89,20 +89,25 @@ animals, cameras, food patches setup, etc.
+ This information is either entered by hand, or parsed and inserted from configuration
  YAML files.
+ For experiments, this information can be inserted by running
    + [create_experiment_01](create_experiments/create_experiment_01.py)
    + [create_socialexperiment_0](create_experiments/create_socialexperiment_0.py)
    + [create_experiment_02](create_experiments/create_experiment_02.py)
    + [create_socialexperiment](create_experiments/create_socialexperiment.py)
  (this only needs to be done once)

Tables in DataJoint are written with a `make()` function:
the instruction to generate and insert new records into the table itself, based on data from upstream tables.
Triggering the auto-ingestion and processing/computation routine essentially amounts to
calling the `.populate()` method on all relevant tables.
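
To make this concrete, here is a minimal, hypothetical sketch of the pattern
(the schema and table names are invented for illustration; they are not the pipeline's actual tables):

```python
import datajoint as dj

schema = dj.schema("aeon_demo")  # hypothetical schema name


@schema
class Session(dj.Manual):
    definition = """
    session_id: int
    ---
    session_start: datetime
    """


@schema
class SessionAnalysis(dj.Computed):
    definition = """
    -> Session
    ---
    duration_hours: float
    """

    def make(self, key):
        # a real make() would fetch data from upstream tables via (Session & key),
        # compute results, and insert them; here we insert a placeholder value
        self.insert1({**key, "duration_hours": 1.0})


# triggering the auto-processing routine is just calling .populate()
SessionAnalysis.populate(display_progress=True)
```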

- These routines are prepared in this [auto-processing script](populate/process.py).
+ These routines are prepared in this [auto-processing script](populate/worker.py).
Essentially, turning on the auto-processing routine amounts to running the
- following 2 commands (in different processing threads)
+ following 4 commands, either in sequence or in parallel (with different processing threads).
Data ingestion/population with DataJoint is idempotent, so it is safe to run the same command multiple times.

- aeon_ingest high
+ aeon_ingest pyrat_worker

- aeon_ingest mid
+ aeon_ingest acquisition_worker

+ aeon_ingest streams_worker

+ aeon_ingest analysis_worker
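
As a sketch of what running these in parallel can look like in practice
(assuming the `aeon_ingest` command is on your `PATH` after installing the pipeline),
the four workers could be launched as separate processes:

```python
# launch each ingestion worker in its own process and wait for all of them;
# equivalently, each command can be run in its own terminal or scheduler job
import subprocess

workers = ["pyrat_worker", "acquisition_worker", "streams_worker", "analysis_worker"]
processes = [subprocess.Popen(["aeon_ingest", worker]) for worker in workers]
for process in processes:
    process.wait()
```
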
68 changes: 68 additions & 0 deletions aeon/dj_pipeline/docs/PIPELINE_LOCAL_DEPLOYMENT.md
@@ -0,0 +1,68 @@
# Pipeline Deployment (On-Premises)

This page describes the processes and required resources to deploy the Project Aeon data pipeline on-premises.

## Prerequisites

At the most basic level, to deploy and operate a DataJoint pipeline, you will need:

1. A MySQL database server (version 8.0) configured to be DataJoint-compatible
   - see [here](https://github.com/datajoint/mysql-docker/blob/master/config/my.cnf) for the MySQL server configuration required for DataJoint compatibility
   - alternatively, to use a preconfigured Docker container ([install Docker](https://docs.docker.com/engine/install/)), run the following command:
```bash
docker run -d \
--name db \
-p 3306:3306 \
-e MYSQL_ROOT_PASSWORD=simple \
-v ./mysql/data:/var/lib/mysql \
datajoint/mysql:8.0 \
mysqld --default-authentication-plugin=mysql_native_password
```

A new MySQL server will be launched in a Docker container with the following credentials:
- host: `localhost`
- username: `root`
- password: `simple`
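
As a quick sanity check (a sketch assuming the credentials above), you can confirm that DataJoint can reach the server from Python:

```python
# minimal connectivity check against the MySQL server started above
import datajoint as dj

dj.config["database.host"] = "localhost"
dj.config["database.user"] = "root"
dj.config["database.password"] = "simple"

connection = dj.conn()          # raises an error if the server is unreachable
print(connection.is_connected)  # True once connected
```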

To stop the container, run the following command:

```bash
docker stop db
```

2. A GitHub repository with the [codebase](https://github.com/SainsburyWellcomeCentre/aeon_mecha) of the DataJoint pipeline
   - this repository is the codebase; no additional modifications are needed to deploy it locally
3. File storage
   - the pipeline requires a location to access and store the data files (this can be a local directory or mounted network storage)
4. Compute
   - you need a compute environment with the right software installed to run the pipeline (this could be a laptop, a local workstation, or an HPC cluster)

## Download the Data

The released data for Project Aeon can be downloaded from the data repository [here](https://zenodo.org/records/13881885).
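
If you prefer to script the retrieval, the sketch below uses Zenodo's public REST API
(the record ID comes from the link above; the response fields are assumed to follow Zenodo's documented record format):

```python
# list the files in the Zenodo record, with their direct download URLs
import json
import urllib.request

record_id = "13881885"
url = f"https://zenodo.org/api/records/{record_id}"
with urllib.request.urlopen(url) as response:
    record = json.load(response)

for entry in record["files"]:
    print(entry["key"], entry["links"]["self"])  # filename and download URL
```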


## Pipeline Installation & Configuration

### Installation Instructions

To run the pipeline, follow the instructions in the [Local set-up](../../../README.md#local-set-up) section to install this codebase.

### Configuration Instructions

DataJoint requires a configuration file named `dj_local_conf.json`. This file should be located in the root directory of the codebase.

1. Generate the `dj_local_conf.json` file:
- Make a copy of the `sample_dj_local_conf.json` file with the exact name `dj_local_conf.json`.
- Update the file with your database credentials (username, password, and database host).
- Keep this file secure: it contains your database credentials and should not be committed to version control or otherwise shared.
2. In the `custom` section, specify the `database.prefix` - you can keep the default `aeon_`.
3. In the `custom` section, update the value of `ceph_aeon` (under `repository_config`) to the root directory of the downloaded data.
For example, if you download the data to `D:/data/project-aeon/aeon/data/raw/AEON3/...`, then update `ceph_aeon` to `D:/data/project-aeon/aeon/data`.
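
Equivalently (a sketch; the prefix and path are the examples above), the same `custom` values can be set and saved from Python:

```python
# set the custom values programmatically and write them to dj_local_conf.json
import datajoint as dj

dj.config["custom"] = {
    "database.prefix": "aeon_",
    "repository_config": {"ceph_aeon": "D:/data/project-aeon/aeon/data"},
}
dj.config.save_local()  # writes dj_local_conf.json in the current directory
```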


## Data Ingestion & Processing

Now that the pipeline is installed and configured, you can start ingesting and processing the downloaded data.

Follow the instructions in the [Data Ingestion & Processing](./notebooks/Data_Ingestion_and_Processing.ipynb) guide to learn how to ingest and process the downloaded data.