Merge pull request #456 from ttngu207/dev_aeon-docs-78
Docs to deploy/onboard new datajoint pipeline for project aeon & ingest/process released data
MilagrosMarin authored Dec 4, 2024
2 parents 6ed933a + 94ef78a commit bf4e56c
Showing 5 changed files with 628 additions and 5 deletions.
15 changes: 10 additions & 5 deletions aeon/dj_pipeline/README.md
@@ -89,20 +89,25 @@ animals, cameras, food patches setup, etc.
+ This information is either entered by hand, or parsed and inserted from configuration
  YAML files.
+ For experiments, this information can be inserted by running
    + [create_experiment_01](create_experiments/create_experiment_01.py)
    + [create_socialexperiment_0](create_experiments/create_socialexperiment_0.py)
    + [create_experiment_02](create_experiments/create_experiment_02.py)
    + [create_socialexperiment](create_experiments/create_socialexperiment.py)
  (this only needs to be done once)

Tables in DataJoint are written with a `make()` function:
the instruction to generate and insert new records into the table itself, based on data from upstream tables.
Triggering the auto-ingestion and processing/computation routine essentially amounts to
calling the `.populate()` method on all relevant tables.
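
To make this concrete, here is a minimal, hypothetical sketch of the pattern
(the schema and table names are invented for illustration; they are not the pipeline's actual tables):

```python
import datajoint as dj

schema = dj.schema("aeon_demo")  # hypothetical schema name


@schema
class Session(dj.Manual):
    definition = """
    session_id: int
    ---
    session_start: datetime
    """


@schema
class SessionAnalysis(dj.Computed):
    definition = """
    -> Session
    ---
    duration_hours: float
    """

    def make(self, key):
        # a real make() would fetch data from upstream tables via (Session & key),
        # compute results, and insert them; here we insert a placeholder value
        self.insert1({**key, "duration_hours": 1.0})


# triggering the auto-processing routine is just calling .populate()
SessionAnalysis.populate(display_progress=True)
```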

- These routines are prepared in this [auto-processing script](populate/process.py).
+ These routines are prepared in this [auto-processing script](populate/worker.py).
Essentially, turning on the auto-processing routine amounts to running the
- following 2 commands (in different processing threads)
+ following 4 commands, either in sequence or in parallel (with different processing threads).
Data ingestion/population with DataJoint is idempotent, so it is safe to run the same command multiple times.

- aeon_ingest high
+ aeon_ingest pyrat_worker

- aeon_ingest mid
+ aeon_ingest acquisition_worker

+ aeon_ingest streams_worker

+ aeon_ingest analysis_worker
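
As a sketch of what running these in parallel can look like in practice
(assuming the `aeon_ingest` command is on your `PATH` after installing the pipeline),
the four workers could be launched as separate processes:

```python
# launch each ingestion worker in its own process and wait for all of them;
# equivalently, each command can be run in its own terminal or scheduler job
import subprocess

workers = ["pyrat_worker", "acquisition_worker", "streams_worker", "analysis_worker"]
processes = [subprocess.Popen(["aeon_ingest", worker]) for worker in workers]
for process in processes:
    process.wait()
```
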
68 changes: 68 additions & 0 deletions aeon/dj_pipeline/docs/PIPELINE_LOCAL_DEPLOYMENT.md
@@ -0,0 +1,68 @@
# Pipeline Deployment (On-Premises)

This page describes the processes and required resources to deploy the Project Aeon data pipeline on-premises.

## Prerequisites

At the most basic level, to deploy and operate a DataJoint pipeline, you will need:

1. A MySQL database server (version 8.0) configured to be DataJoint-compatible
   - see [here](https://github.com/datajoint/mysql-docker/blob/master/config/my.cnf) for the MySQL server configuration required for DataJoint compatibility
   - alternatively, to use a preconfigured Docker container ([install Docker](https://docs.docker.com/engine/install/)), run the following command:
```bash
docker run -d \
--name db \
-p 3306:3306 \
-e MYSQL_ROOT_PASSWORD=simple \
-v ./mysql/data:/var/lib/mysql \
datajoint/mysql:8.0 \
mysqld --default-authentication-plugin=mysql_native_password
```

A new MySQL server will be launched in a Docker container with the following credentials:
- host: `localhost`
- username: `root`
- password: `simple`
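
As a quick sanity check (a sketch assuming the credentials above), you can confirm that DataJoint can reach the server from Python:

```python
# minimal connectivity check against the MySQL server started above
import datajoint as dj

dj.config["database.host"] = "localhost"
dj.config["database.user"] = "root"
dj.config["database.password"] = "simple"

connection = dj.conn()          # raises an error if the server is unreachable
print(connection.is_connected)  # True once connected
```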

To stop the container, run the following command:

```bash
docker stop db
```

2. A GitHub repository with the [codebase](https://github.com/SainsburyWellcomeCentre/aeon_mecha) of the DataJoint pipeline
   - this repository is the codebase; no additional modifications are needed to deploy it locally
3. File storage
   - the pipeline requires a location to access and store the data files (this can be a local directory or mounted network storage)
4. Compute
   - you need a compute environment with the right software installed to run the pipeline (this could be a laptop, a local workstation, or an HPC cluster)

## Download the Data

The released data for Project Aeon can be downloaded from the data repository [here](https://zenodo.org/records/13881885).
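
If you prefer to script the retrieval, the sketch below uses Zenodo's public REST API
(the record ID comes from the link above; the response fields are assumed to follow Zenodo's documented record format):

```python
# list the files in the Zenodo record, with their direct download URLs
import json
import urllib.request

record_id = "13881885"
url = f"https://zenodo.org/api/records/{record_id}"
with urllib.request.urlopen(url) as response:
    record = json.load(response)

for entry in record["files"]:
    print(entry["key"], entry["links"]["self"])  # filename and download URL
```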


## Pipeline Installation & Configuration

### Installation Instructions

To run the pipeline, follow the instructions in the [Local set-up](../../../README.md#local-set-up) section to install this codebase.

### Configuration Instructions

DataJoint requires a configuration file named `dj_local_conf.json`. This file should be located in the root directory of the codebase.

1. Generate the `dj_local_conf.json` file:
- Make a copy of the `sample_dj_local_conf.json` file with the exact name `dj_local_conf.json`.
- Update the file with your database credentials (username, password, and database host).
- Keep this file secure: it contains your database credentials and should not be committed to version control or otherwise shared.
2. In the `custom` section, specify the `database.prefix` - you can keep the default `aeon_`.
3. In the `custom` section, update the value of `ceph_aeon` (under `repository_config`) to the root directory of the downloaded data.
For example, if you download the data to `D:/data/project-aeon/aeon/data/raw/AEON3/...`, then update `ceph_aeon` to `D:/data/project-aeon/aeon/data`.
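
Equivalently (a sketch; the prefix and path are the examples above), the same `custom` values can be set and saved from Python:

```python
# set the custom values programmatically and write them to dj_local_conf.json
import datajoint as dj

dj.config["custom"] = {
    "database.prefix": "aeon_",
    "repository_config": {"ceph_aeon": "D:/data/project-aeon/aeon/data"},
}
dj.config.save_local()  # writes dj_local_conf.json in the current directory
```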


## Data Ingestion & Processing

Now that the pipeline is installed and configured, you can start ingesting and processing the downloaded data.

Follow the instructions in the [Data Ingestion & Processing](./notebooks/Data_Ingestion_and_Processing.ipynb) guide to learn how to ingest and process the downloaded data.