Run Inference on cluster #189

Open
wants to merge 60 commits into
base: main
60 commits
4d1383a
adding config file, load from checkpoint
nikk-nikaznan Jun 17, 2024
94761b8
adding inference to toml
nikk-nikaznan Jun 17, 2024
e4f1bac
adding bash script
nikk-nikaznan Jun 18, 2024
0b3ddd9
change variable
nikk-nikaznan Jun 18, 2024
892914e
change variable
nikk-nikaznan Jun 18, 2024
66c22be
naming error
nikk-nikaznan Jun 18, 2024
3fab713
naming error
nikk-nikaznan Jun 18, 2024
2b0d273
fixed import
nikk-nikaznan Jun 18, 2024
85452af
cleaned up sort
nikk-nikaznan Jun 18, 2024
f056b41
add app_wrapper
nikk-nikaznan Jun 18, 2024
8780c36
changed accelerator
nikk-nikaznan Jun 18, 2024
56b74ff
bugs
nikk-nikaznan Jun 18, 2024
a30b0dc
removed accelerator
nikk-nikaznan Jun 18, 2024
918674d
removed accelerator
nikk-nikaznan Jun 18, 2024
2d6da1e
wrong path
nikk-nikaznan Jun 18, 2024
e458c6d
edit path
nikk-nikaznan Jun 19, 2024
29cfea6
adding batches
nikk-nikaznan Jun 19, 2024
ec6886a
debugging oom
nikk-nikaznan Jun 19, 2024
83ed342
save video to false
nikk-nikaznan Jun 19, 2024
d3942ff
save video to false
nikk-nikaznan Jun 19, 2024
2900a9e
adding device
nikk-nikaznan Jun 19, 2024
500d274
revert the batch out
nikk-nikaznan Jun 20, 2024
7260ca8
modify bash script
nikk-nikaznan Jun 20, 2024
def687a
add guide
nikk-nikaznan Jun 21, 2024
1a5d853
debugging
nikk-nikaznan Jun 21, 2024
8ca41c3
fixed codec
nikk-nikaznan Jun 21, 2024
be6cff9
cleaned up
nikk-nikaznan Jun 21, 2024
7117511
adding gt_dir
nikk-nikaznan Jun 21, 2024
45cd8bd
codev revert
nikk-nikaznan Jun 21, 2024
1c56dfc
Merge branch 'main' into nikkna/inference_cluster
nikk-nikaznan Jun 21, 2024
6077a7e
adding some logging
nikk-nikaznan Jun 21, 2024
e5d362f
Merge branch 'main' of github.com:SainsburyWellcomeCentre/crabs-explo…
nikk-nikaznan Jun 28, 2024
a114200
cleaned up rebase
nikk-nikaznan Jun 28, 2024
17146ad
some changes based on the new modules
nikk-nikaznan Jun 28, 2024
1e250b0
Merge branch 'main' into nikkna/inference_cluster
nikk-nikaznan Jul 4, 2024
3ccc258
Merge branch 'main' into nikkna/inference_cluster
nikk-nikaznan Jul 4, 2024
6d22c4f
Merge branch 'main' into nikkna/inference_cluster
nikk-nikaznan Jul 8, 2024
bfd97bd
Merge branch 'main' into nikkna/inference_cluster
nikk-nikaznan Jul 9, 2024
8284157
adding bash script for running all escape events
nikk-nikaznan Jul 9, 2024
cf04af3
small changes on the bash script
nikk-nikaznan Jul 9, 2024
8d4c5a2
changed to the correct video example
nikk-nikaznan Jul 9, 2024
2ffce7a
changes of guide
nikk-nikaznan Jul 9, 2024
8663563
removed device, already set in code
nikk-nikaznan Jul 9, 2024
9af60ee
check cuda status
nikk-nikaznan Jul 9, 2024
86a309b
modified some path
nikk-nikaznan Jul 9, 2024
3d33730
changes branch to main
nikk-nikaznan Jul 9, 2024
b72b4b3
add args to handle run on directory on the cluster
nikk-nikaznan Jul 10, 2024
2b9973e
add args to handle run on directory on the cluster
nikk-nikaznan Jul 10, 2024
feace52
cleaned up
nikk-nikaznan Jul 10, 2024
7977b48
cleaned up
nikk-nikaznan Jul 10, 2024
bff7606
forgot the args
nikk-nikaznan Jul 10, 2024
9c0a560
Merge branch 'main' into nikkna/inference_cluster
nikk-nikaznan Jul 22, 2024
c5bd870
Update guides/TrackingModelHPC.md
nikk-nikaznan Jul 22, 2024
cd497d7
extension, check dir
nikk-nikaznan Jul 22, 2024
f87814c
Merge branch 'nikkna/inference_cluster' of github.com:SainsburyWellco…
nikk-nikaznan Jul 22, 2024
586d412
Update bash_scripts/run_tracking.sh
nikk-nikaznan Jul 22, 2024
742ee1a
debug
nikk-nikaznan Jul 29, 2024
b96d4fb
debug
nikk-nikaznan Jul 29, 2024
e8d77f0
add log
nikk-nikaznan Jul 29, 2024
5121e45
add log
nikk-nikaznan Jul 29, 2024
101 changes: 101 additions & 0 deletions bash_scripts/run_tracking.sh
@@ -0,0 +1,101 @@
#!/bin/bash

#SBATCH -p gpu # a100 # partition
#SBATCH --gres=gpu:1
#SBATCH -N 1 # number of nodes
#SBATCH --ntasks-per-node 8 # 2 # max number of tasks per node
#SBATCH --mem 64G # memory pool for all cores
#SBATCH -t 3-00:00 # time (D-HH:MM)
#SBATCH -o slurm.%A.%N.out
#SBATCH -e slurm.%A.%N.err
#SBATCH --mail-type=ALL
#SBATCH [email protected]

# ---------------------
# Source bashrc
# ----------------------
# Otherwise `which python` points to the miniconda module's Python
source ~/.bashrc

# memory
# see https://pytorch.org/docs/stable/notes/cuda.html#environment-variables
# exported so that the tracking process launched below inherits it
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

# -----------------------------
# Error settings for bash
# -----------------------------
# see https://wizardzines.com/comics/bash-errors/
set -e # do not continue after errors
set -u # throw error if variable is unset
set -o pipefail # make the pipe fail if any part of it fails

# ---------------------
# Define variables
# ----------------------

# video and inference config
VIDEO_PATH=/ceph/zoo/users/sminano/crabs_tracks_label/04.09.2023-04-Right_RE_test/04.09.2023-04-Right_RE_test.mp4
CONFIG_FILE=/ceph/zoo/users/sminano/cluster_tracking_config.yaml

# checkpoint
TRAINED_MODEL_PATH=/ceph/zoo/users/sminano/ml-runs-all/ml_runs-nikkna-copy/243676951438603508/8dbe61069f17453a87c27b4f61f6e681/checkpoints/last.ckpt


# output directory
OUTPUT_DIR=/ceph/zoo/users/sminano/crabs_track_output

# ground truth if available
GT_PATH=/ceph/zoo/users/sminano/crabs_tracks_label/04.09.2023-04-Right_RE_test/04.09.2023-04-Right_RE_test_corrected_ST_csv.csv

# version of the codebase
GIT_BRANCH=main

# -----------------------------
# Create virtual environment
# -----------------------------
module load miniconda

# Define an environment for each job in the
# temporary directory of the compute node
ENV_NAME=crabs-dev-$SLURM_JOB_ID
ENV_PREFIX=$TMPDIR/$ENV_NAME

# create environment
conda create \
--prefix $ENV_PREFIX \
-y \
python=3.10

# activate environment
conda activate $ENV_PREFIX

# install crabs package in virtual env
python -m pip install git+https://github.com/SainsburyWellcomeCentre/crabs-exploration.git@$GIT_BRANCH


# log pip and python locations
echo $ENV_PREFIX
which python
which pip

# print the version of crabs package (last number is the commit hash)
echo "Git branch: $GIT_BRANCH"
conda list crabs
echo "-----"

# ------------------------------------
# GPU specs
# ------------------------------------
echo "Memory used per GPU before tracking"
echo $(nvidia-smi --query-gpu=name,memory.total,memory.free,memory.used --format=csv) #noheader
echo "-----"

# -------------------
# Run evaluation script
# -------------------
detect-and-track-video \
--trained_model_path $TRAINED_MODEL_PATH \
--video_path $VIDEO_PATH \
--config_file $CONFIG_FILE \
--output_dir $OUTPUT_DIR \
--gt_path $GT_PATH
116 changes: 116 additions & 0 deletions bash_scripts/run_tracking_all_escape_events.sh
Collaborator

I like this one! ✨

Maybe going forwards I can combine them to read a dir or a single video, but this is a great starting point
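
One way the two scripts could be combined, as the comment suggests, is to resolve the input up front: a directory expands to its matching videos, while a single file passes through unchanged. A minimal Python sketch of that idea follows. It assumes a hypothetical helper `collect_videos` (not part of the crabs package) and the `mp4` extension used in the script as its default.

```python
from pathlib import Path


def collect_videos(input_path: str, video_ext: str = "mp4") -> list[Path]:
    """Return the videos to process for a file or directory input.

    Hypothetical helper for illustration; not part of the crabs package.
    """
    path = Path(input_path)
    if path.is_dir():
        # Directory input: every file with the requested extension,
        # sorted so that the processing order is reproducible.
        return sorted(p for p in path.glob(f"*.{video_ext}") if p.is_file())
    if path.is_file():
        # Single-video input: process just that file.
        return [path]
    raise FileNotFoundError(f"No such file or directory: {input_path}")
```

The caller could then loop over the returned list in both cases, so the single-video and directory workflows would share one code path.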

@@ -0,0 +1,116 @@
#!/bin/bash

#SBATCH -p gpu # a100 # partition
#SBATCH --gres=gpu:1
#SBATCH -N 1 # number of nodes
#SBATCH --ntasks-per-node 8 # 2 # max number of tasks per node
#SBATCH --mem 64G # memory pool for all cores
#SBATCH -t 3-00:00 # time (D-HH:MM)
#SBATCH -o slurm.%A.%N.out
#SBATCH -e slurm.%A.%N.err
#SBATCH --mail-type=ALL
#SBATCH [email protected]

# ---------------------
# Source bashrc
# ----------------------
# Otherwise `which python` points to the miniconda module's Python
source ~/.bashrc

# memory
# see https://pytorch.org/docs/stable/notes/cuda.html#environment-variables
# exported so that the tracking process launched below inherits it
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

# -----------------------------
# Error settings for bash
# -----------------------------
# see https://wizardzines.com/comics/bash-errors/
set -e # do not continue after errors
set -u # throw error if variable is unset
set -o pipefail # make the pipe fail if any part of it fails

# ---------------------
# Define variables
# ----------------------

# video and inference config
VIDEO_DIR=/ceph/scratch/nikkna/crabs-exploration/crab_video
VIDEO_EXT=mp4
CONFIG_FILE=/ceph/zoo/users/sminano/cluster_tracking_config.yaml

# checkpoint
TRAINED_MODEL_PATH=/ceph/zoo/users/sminano/ml-runs-all/ml_runs-nikkna-copy/243676951438603508/8dbe61069f17453a87c27b4f61f6e681/checkpoints/last.ckpt

# output directory
OUTPUT_DIR=/ceph/scratch/nikkna/crabs-exploration/crabs_track_output

# version of the codebase
GIT_BRANCH=main

# Check if the target is not a directory
if [ ! -d "$VIDEO_DIR" ]; then
Collaborator

nice! ✨

exit 1
fi

# -----------------------------
# Create virtual environment
# -----------------------------
module load miniconda

# Define an environment for each job in the
# temporary directory of the compute node
ENV_NAME=crabs-dev-$SLURM_JOB_ID
ENV_PREFIX=$TMPDIR/$ENV_NAME

# create environment
conda create \
--prefix $ENV_PREFIX \
-y \
python=3.10

# activate environment
conda activate $ENV_PREFIX

# install crabs package in virtual env
python -m pip install git+https://github.com/SainsburyWellcomeCentre/crabs-exploration.git@$GIT_BRANCH

# log pip and python locations
echo $ENV_PREFIX
which python
which pip

# print the version of crabs package (last number is the commit hash)
echo "Git branch: $GIT_BRANCH"
conda list crabs
echo "-----"

# ------------------------------------
# GPU specs
# ------------------------------------
echo "Memory used per GPU before tracking"
echo $(nvidia-smi --query-gpu=name,memory.total,memory.free,memory.used --format=csv) #noheader
echo "-----"

# -------------------
# Run evaluation script for each video in VIDEO_DIR with extension VIDEO_EXT
# -------------------

TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
PARENT_OUTPUT_DIR="${OUTPUT_DIR}_${TIMESTAMP}"
mkdir -p "$PARENT_OUTPUT_DIR"

for VIDEO_PATH in "$VIDEO_DIR"/*"$VIDEO_EXT"; do
    VIDEO_BASENAME=$(basename "$VIDEO_PATH" ."$VIDEO_EXT")

    echo "Processing video: $VIDEO_PATH"

    VIDEO_OUTPUT_DIR="$PARENT_OUTPUT_DIR/$VIDEO_BASENAME"

    mkdir -p "$VIDEO_OUTPUT_DIR"

    # no trailing backslash on the last argument, otherwise `done`
    # is passed to the command and the loop is never closed
    detect-and-track-video \
        --trained_model_path "$TRAINED_MODEL_PATH" \
        --video_path "$VIDEO_PATH" \
        --config_file "$CONFIG_FILE" \
        --output_dir "$VIDEO_OUTPUT_DIR"
done
16 changes: 15 additions & 1 deletion crabs/tracker/track_video.py
@@ -63,6 +63,11 @@ def setup(self):
         """
         Load tracking config, trained model and input video path.
         """
+        # Check for CUDA availability
+        if self.device == "cuda" and not torch.cuda.is_available():
+            logging.info("CUDA is not available. Falling back to CPU.")
+            self.device = "cpu"
+
         with open(self.config_file, "r") as f:
             self.config = yaml.safe_load(f)
 
@@ -83,11 +88,15 @@ def prep_outputs(self):
         """
         Prepare csv writer and if required, video writer.
         """
+        logging.info(self.video_file_root)
         (
             self.csv_writer,
             self.csv_file,
             self.tracking_output_dir,
-        ) = prep_csv_writer(self.args.output_dir, self.video_file_root)
+        ) = prep_csv_writer(
+            self.args.output_dir,
+            self.video_file_root,
+        )
 
         if self.args.save_video:
             frame_width = int(self.video.get(cv2.CAP_PROP_FRAME_WIDTH))
@@ -299,6 +308,11 @@ def tracking_parse_args(args):
         action="store_true",
         help="Save frame to be used in correcting track labelling",
     )
+    # parser.add_argument(
+    #     "--run_on_video_dir",
+    #     action="store_true",
+    #     help="option to run track video on directory instead of a video.",
+    # )
     parser.add_argument(
         "--device",
         type=str,
Expand Down
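
The CUDA fallback added to `setup` can be isolated as a small function, which makes it easy to check without a GPU. The sketch below is illustrative only: `resolve_device` is a hypothetical name, and CUDA availability is passed in as a boolean (in the PR it comes from `torch.cuda.is_available()`) so the example runs without PyTorch installed.

```python
import logging


def resolve_device(requested: str, cuda_available: bool) -> str:
    """Return the device to use, falling back to CPU if CUDA is missing.

    Hypothetical helper for illustration; `cuda_available` stands in for
    the result of `torch.cuda.is_available()`.
    """
    if requested == "cuda" and not cuda_available:
        # Same message as in the PR; the request is downgraded, not rejected.
        logging.info("CUDA is not available. Falling back to CPU.")
        return "cpu"
    # Any other request (including "cpu") is returned unchanged.
    return requested
```

Since Python's root logger shows only warnings and above by default, logging the fallback at warning level might make it easier to spot in the SLURM output files.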
18 changes: 13 additions & 5 deletions crabs/tracker/utils/io.py
@@ -1,4 +1,5 @@
 import csv
+import logging
 import os
 from datetime import datetime
 from pathlib import Path
@@ -29,11 +30,18 @@ def prep_csv_writer(output_dir: str, video_file_root: str):
     Tuple
         A tuple containing the CSV writer, the CSV file object, and the tracking output directory path.
     """
-
-    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
-    tracking_output_dir = Path(output_dir + f"_{timestamp}") / video_file_root
-    # Create the subdirectory for the specific video file root
-    tracking_output_dir.mkdir(parents=True, exist_ok=True)
+    logging.info(video_file_root)
+    if os.path.isdir(Path(video_file_root)):
+        logging.info("here")
+        tracking_output_dir = Path(output_dir)
+        logging.info(tracking_output_dir)
+    else:
+        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
+        tracking_output_dir = (
+            Path(output_dir + f"_{timestamp}") / video_file_root
+        )
+        # Create the subdirectory for the specific video file root
+        tracking_output_dir.mkdir(parents=True, exist_ok=True)
 
     csv_file = open(
         f"{str(tracking_output_dir)}/predicted_tracks.csv",
Expand Down
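
The branch added to `prep_csv_writer` decides where `predicted_tracks.csv` ends up. A minimal sketch of that decision as a standalone function is given below, under some assumptions: `resolve_tracking_output_dir` is a hypothetical name, the timestamp is passed in so that the result is deterministic, the debug logging is left out, and the directory is created only in the timestamped branch.

```python
import os
from datetime import datetime
from pathlib import Path


def resolve_tracking_output_dir(
    output_dir: str, video_file_root: str, now: datetime
) -> Path:
    """Return the directory that will hold the tracking output.

    Hypothetical helper for illustration; it mirrors the if/else branch
    in `prep_csv_writer` but is not part of the crabs package.
    """
    if os.path.isdir(video_file_root):
        # Directory case: use output_dir as given. In this sketch the
        # caller is expected to have created it already.
        return Path(output_dir)

    # Single-video case: <output_dir>_<timestamp>/<video_file_root>
    timestamp = now.strftime("%Y%m%d_%H%M%S")
    tracking_output_dir = Path(f"{output_dir}_{timestamp}") / video_file_root
    tracking_output_dir.mkdir(parents=True, exist_ok=True)
    return tracking_output_dir
```

Passing the timestamp as an argument is a design choice of the sketch rather than of the PR; it keeps the function easy to test, at the cost of one extra parameter at the call site.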