Run Inference on cluster #189
Open
nikk-nikaznan wants to merge 60 commits into main from nikkna/inference_cluster
60 commits
4d1383a
adding config file, load from checkpoint
nikk-nikaznan 94761b8
adding inference to toml
nikk-nikaznan e4f1bac
adding bash script
nikk-nikaznan 0b3ddd9
change variable
nikk-nikaznan 892914e
change variable
nikk-nikaznan 66c22be
naming error
nikk-nikaznan 3fab713
naming error
nikk-nikaznan 2b0d273
fixed import
nikk-nikaznan 85452af
cleaned up sort
nikk-nikaznan f056b41
add app_wrapper
nikk-nikaznan 8780c36
changed accelerator
nikk-nikaznan 56b74ff
bugs
nikk-nikaznan a30b0dc
removed accelerator
nikk-nikaznan 918674d
removed accelerator
nikk-nikaznan 2d6da1e
wrong path
nikk-nikaznan e458c6d
edit path
nikk-nikaznan 29cfea6
adding batches
nikk-nikaznan ec6886a
debugging oom
nikk-nikaznan 83ed342
save video to false
nikk-nikaznan d3942ff
save video to false
nikk-nikaznan 2900a9e
adding device
nikk-nikaznan 500d274
revert the batch out
nikk-nikaznan 7260ca8
modify bash script
nikk-nikaznan def687a
add guide
nikk-nikaznan 1a5d853
debugging
nikk-nikaznan 8ca41c3
fixed codec
nikk-nikaznan be6cff9
cleaned up
nikk-nikaznan 7117511
adding gt_dir
nikk-nikaznan 45cd8bd
codev revert
nikk-nikaznan 1c56dfc
Merge branch 'main' into nikkna/inference_cluster
nikk-nikaznan 6077a7e
adding some logging
nikk-nikaznan e5d362f
Merge branch 'main' of github.com:SainsburyWellcomeCentre/crabs-explo…
nikk-nikaznan a114200
cleaned up rebase
nikk-nikaznan 17146ad
some changes based on the new modules
nikk-nikaznan 1e250b0
Merge branch 'main' into nikkna/inference_cluster
nikk-nikaznan 3ccc258
Merge branch 'main' into nikkna/inference_cluster
nikk-nikaznan 6d22c4f
Merge branch 'main' into nikkna/inference_cluster
nikk-nikaznan bfd97bd
Merge branch 'main' into nikkna/inference_cluster
nikk-nikaznan 8284157
adding bash script for running all escape events
nikk-nikaznan cf04af3
small changes on the bash script
nikk-nikaznan 8d4c5a2
changed to the correct video example
nikk-nikaznan 2ffce7a
changes of guide
nikk-nikaznan 8663563
removed device, already set in code
nikk-nikaznan 9af60ee
check cuda status
nikk-nikaznan 86a309b
modified some path
nikk-nikaznan 3d33730
changes branch to main
nikk-nikaznan b72b4b3
add args to handle run on directory on the cluster
nikk-nikaznan 2b9973e
add args to handle run on directory on the cluster
nikk-nikaznan feace52
cleaned up
nikk-nikaznan 7977b48
cleaned up
nikk-nikaznan bff7606
forgot the args
nikk-nikaznan 9c0a560
Merge branch 'main' into nikkna/inference_cluster
nikk-nikaznan c5bd870
Update guides/TrackingModelHPC.md
nikk-nikaznan cd497d7
extension, check dir
nikk-nikaznan f87814c
Merge branch 'nikkna/inference_cluster' of github.com:SainsburyWellco…
nikk-nikaznan 586d412
Update bash_scripts/run_tracking.sh
nikk-nikaznan 742ee1a
debug
nikk-nikaznan b96d4fb
debug
nikk-nikaznan e8d77f0
add log
nikk-nikaznan 5121e45
add log
nikk-nikaznan
@@ -0,0 +1,101 @@
#!/bin/bash

#SBATCH -p gpu # a100 # partition
#SBATCH --gres=gpu:1
#SBATCH -N 1 # number of nodes
#SBATCH --ntasks-per-node 8 # 2 # max number of tasks per node
#SBATCH --mem 64G # memory pool for all cores
#SBATCH -t 3-00:00 # time (D-HH:MM)
#SBATCH -o slurm.%A.%N.out
#SBATCH -e slurm.%A.%N.err
#SBATCH --mail-type=ALL
#SBATCH [email protected]

# ---------------------
# Source bashrc
# ----------------------
# Otherwise `which python` points to the miniconda module's Python
source ~/.bashrc

# memory
# see https://pytorch.org/docs/stable/notes/cuda.html#environment-variables
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True # exported so child processes inherit it

# -----------------------------
# Error settings for bash
# -----------------------------
# see https://wizardzines.com/comics/bash-errors/
set -e # do not continue after errors
set -u # throw error if variable is unset
set -o pipefail # make the pipe fail if any part of it fails

# ---------------------
# Define variables
# ----------------------

# video and inference config
VIDEO_PATH=/ceph/zoo/users/sminano/crabs_tracks_label/04.09.2023-04-Right_RE_test/04.09.2023-04-Right_RE_test.mp4
CONFIG_FILE=/ceph/zoo/users/sminano/cluster_tracking_config.yaml

# checkpoint
TRAINED_MODEL_PATH=/ceph/zoo/users/sminano/ml-runs-all/ml_runs-nikkna-copy/243676951438603508/8dbe61069f17453a87c27b4f61f6e681/checkpoints/last.ckpt


# output directory
OUTPUT_DIR=/ceph/zoo/users/sminano/crabs_track_output

# ground truth if available
GT_PATH=/ceph/zoo/users/sminano/crabs_tracks_label/04.09.2023-04-Right_RE_test/04.09.2023-04-Right_RE_test_corrected_ST_csv.csv

# version of the codebase
GIT_BRANCH=main

# -----------------------------
# Create virtual environment
# -----------------------------
module load miniconda

# Define an environment for each job in the
# temporary directory of the compute node
ENV_NAME=crabs-dev-$SLURM_JOB_ID
ENV_PREFIX=$TMPDIR/$ENV_NAME

# create environment
conda create \
    --prefix "$ENV_PREFIX" \
    -y \
    python=3.10

# activate environment
conda activate "$ENV_PREFIX"

# install crabs package in virtual env
python -m pip install "git+https://github.com/SainsburyWellcomeCentre/crabs-exploration.git@$GIT_BRANCH"


# log pip and python locations
echo "$ENV_PREFIX"
which python
which pip

# print the version of crabs package (last number is the commit hash)
echo "Git branch: $GIT_BRANCH"
conda list crabs
echo "-----"

# ------------------------------------
# GPU specs
# ------------------------------------
echo "Memory used per GPU before inference"
echo $(nvidia-smi --query-gpu=name,memory.total,memory.free,memory.used --format=csv) #noheader
echo "-----"

# -------------------
# Run detection and tracking on the video
# -------------------
detect-and-track-video \
    --trained_model_path "$TRAINED_MODEL_PATH" \
    --video_path "$VIDEO_PATH" \
    --config_file "$CONFIG_FILE" \
    --output_dir "$OUTPUT_DIR" \
    --gt_path "$GT_PATH"
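A note on the `PYTORCH_CUDA_ALLOC_CONF` setting in the script above: PyTorch reads this value from the process environment, so it only has an effect if the shell exports it to the processes it launches. The following is a minimal sketch that can be run on any machine (no GPU or cluster needed) to see the difference between a plain assignment and an exported one. The helper variable names `not_exported` and `exported` are illustrative only.

```bash
#!/bin/bash
# Start from a clean state so the demo does not depend on the caller's environment.
unset PYTORCH_CUDA_ALLOC_CONF

# Plain assignment: the value is visible in this shell only.
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
not_exported=$(bash -c 'echo "${PYTORCH_CUDA_ALLOC_CONF:-unset}"')

# After export, child processes (such as a Python interpreter) inherit it.
export PYTORCH_CUDA_ALLOC_CONF
exported=$(bash -c 'echo "${PYTORCH_CUDA_ALLOC_CONF:-unset}"')

echo "child sees before export: $not_exported"
echo "child sees after export:  $exported"
```

The first line printed reports `unset` and the second reports `expandable_segments:True`.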
@@ -0,0 +1,116 @@
#!/bin/bash

#SBATCH -p gpu # a100 # partition
#SBATCH --gres=gpu:1
#SBATCH -N 1 # number of nodes
#SBATCH --ntasks-per-node 8 # 2 # max number of tasks per node
#SBATCH --mem 64G # memory pool for all cores
#SBATCH -t 3-00:00 # time (D-HH:MM)
#SBATCH -o slurm.%A.%N.out
#SBATCH -e slurm.%A.%N.err
#SBATCH --mail-type=ALL
#SBATCH [email protected]

# ---------------------
# Source bashrc
# ----------------------
# Otherwise `which python` points to the miniconda module's Python
source ~/.bashrc

# memory
# see https://pytorch.org/docs/stable/notes/cuda.html#environment-variables
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True # exported so child processes inherit it

# -----------------------------
# Error settings for bash
# -----------------------------
# see https://wizardzines.com/comics/bash-errors/
set -e # do not continue after errors
set -u # throw error if variable is unset
set -o pipefail # make the pipe fail if any part of it fails

# ---------------------
# Define variables
# ----------------------

# video and inference config
VIDEO_DIR=/ceph/scratch/nikkna/crabs-exploration/crab_video
VIDEO_EXT=mp4
CONFIG_FILE=/ceph/zoo/users/sminano/cluster_tracking_config.yaml

# checkpoint
TRAINED_MODEL_PATH=/ceph/zoo/users/sminano/ml-runs-all/ml_runs-nikkna-copy/243676951438603508/8dbe61069f17453a87c27b4f61f6e681/checkpoints/last.ckpt

# output directory
OUTPUT_DIR=/ceph/scratch/nikkna/crabs-exploration/crabs_track_output

# version of the codebase
GIT_BRANCH=main

# Check if the target is not a directory
if [ ! -d "$VIDEO_DIR" ]; then
    exit 1
fi

# -----------------------------
# Create virtual environment
# -----------------------------
module load miniconda

# Define an environment for each job in the
# temporary directory of the compute node
ENV_NAME=crabs-dev-$SLURM_JOB_ID
ENV_PREFIX=$TMPDIR/$ENV_NAME

# create environment
conda create \
    --prefix "$ENV_PREFIX" \
    -y \
    python=3.10

# activate environment
conda activate "$ENV_PREFIX"

# install crabs package in virtual env
python -m pip install "git+https://github.com/SainsburyWellcomeCentre/crabs-exploration.git@$GIT_BRANCH"

# log pip and python locations
echo "$ENV_PREFIX"
which python
which pip

# print the version of crabs package (last number is the commit hash)
echo "Git branch: $GIT_BRANCH"
conda list crabs
echo "-----"

# ------------------------------------
# GPU specs
# ------------------------------------
echo "Memory used per GPU before inference"
echo $(nvidia-smi --query-gpu=name,memory.total,memory.free,memory.used --format=csv) #noheader
echo "-----"

# -------------------
# Run detection and tracking for each video file (extension VIDEO_EXT) in VIDEO_DIR
# -------------------

TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
PARENT_OUTPUT_DIR="${OUTPUT_DIR}_${TIMESTAMP}"
mkdir -p "$PARENT_OUTPUT_DIR"

for VIDEO_PATH in "$VIDEO_DIR"/*"$VIDEO_EXT"; do
    VIDEO_BASENAME=$(basename "$VIDEO_PATH" ."$VIDEO_EXT")

    echo "Processing video: $VIDEO_PATH"

    VIDEO_OUTPUT_DIR="$PARENT_OUTPUT_DIR/$VIDEO_BASENAME"

    mkdir -p "$VIDEO_OUTPUT_DIR"

    detect-and-track-video \
        --trained_model_path "$TRAINED_MODEL_PATH" \
        --video_path "$VIDEO_PATH" \
        --config_file "$CONFIG_FILE" \
        --output_dir "$VIDEO_OUTPUT_DIR"
done

Review comment on the `VIDEO_DIR` directory check: nice! ✨
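One thing to be aware of with the directory check and the `for` loop above: the script exits silently if `VIDEO_DIR` is missing, and if the glob matches no files bash passes the literal pattern on as if it were a video path. Below is a minimal sketch of one way to report both cases, assuming a hypothetical helper named `list_videos` (not part of this PR) and assuming the extension is given without its leading dot, as `VIDEO_EXT=mp4` is above.

```bash
#!/bin/bash
# Hypothetical helper (not in the PR): print one matching video path per line,
# or explain on stderr why there is nothing to process.
list_videos() {
    local video_dir=$1 video_ext=$2
    if [ ! -d "$video_dir" ]; then
        echo "Error: '$video_dir' is not a directory" >&2
        return 1
    fi
    # With nullglob set, an unmatched pattern expands to nothing
    # instead of staying as the literal string "dir/*.ext".
    # Note: this sketch switches nullglob off again unconditionally.
    shopt -s nullglob
    local videos=("$video_dir"/*."$video_ext")
    shopt -u nullglob
    if [ "${#videos[@]}" -eq 0 ]; then
        echo "Error: no .$video_ext files found in '$video_dir'" >&2
        return 1
    fi
    printf '%s\n' "${videos[@]}"
}

# Demo on a throwaway directory rather than the cluster paths
demo_dir=$(mktemp -d)
touch "$demo_dir/a.mp4" "$demo_dir/b.mp4" "$demo_dir/notes.txt"
list_videos "$demo_dir" mp4
```

Calling `list_videos "$demo_dir" mov` on the same directory prints an error and returns a non-zero status, which under `set -e` would stop the job before any environment is created.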
I like this one! ✨
Maybe going forward I can combine them to read either a directory or a single video, but this is a great starting point.
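As a rough illustration of what such a combined entry point might look like, here is a minimal sketch. The helper name `resolve_videos` and its default extension are hypothetical and not part of this PR; it only decides which paths to process and leaves the call to `detect-and-track-video` unchanged.

```bash
#!/bin/bash
# Hypothetical helper: accept either a single video file or a directory of
# videos, and print the paths to process, one per line.
resolve_videos() {
    local target=$1 video_ext=${2:-mp4}
    if [ -f "$target" ]; then
        # Single video: pass it through unchanged
        printf '%s\n' "$target"
    elif [ -d "$target" ]; then
        local path
        for path in "$target"/*."$video_ext"; do
            # An unmatched glob stays literal, so keep only real files
            if [ -f "$path" ]; then
                printf '%s\n' "$path"
            fi
        done
    else
        echo "Error: '$target' is neither a file nor a directory" >&2
        return 1
    fi
}

# Demo on a throwaway directory
demo_dir=$(mktemp -d)
touch "$demo_dir/clip1.mp4" "$demo_dir/clip2.mp4"
resolve_videos "$demo_dir/clip1.mp4"   # single file
resolve_videos "$demo_dir"             # every .mp4 file in the directory
```

The printed paths could then be consumed by a `while IFS= read -r VIDEO_PATH` loop that calls `detect-and-track-video` with the same flags used in the two scripts in this PR.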