code for extracting additional channel based on baseline++ #9
Conversation
Great job Nik!

I included some comments that I hope are useful. The two main ones are:

(1) I think there is an alternative way of computing the channels that may be a bit clearer and more memory efficient - basically I think we can do away with the batch parameter (I'm not sure it is entirely correct to use it actually - more on that in the review).

(2) I think there may also be an issue when computing the differences between frames for the motion channel: as it is now, I think they are computed between frames that are `delta` frames apart in the list of extracted frames, rather than `delta` frames apart in the actual video.

I suggested having two separate loops: one for computing the mean_blurred_frame (aka, the "background") and the max_abs_blurred_frame, and another one to compute, for each frame selected for labelling, the background-subtracted frame and the motion frame.

A smaller comment is that generally it is nice to include a brief description of the PR in the comment above (just saying what it does, sort of giving an overview). But the size of the PR was perfect! I always struggle with that 😅

Let me know if anything is not clear (or if I understood things wrong), happy to discuss!
```python
for vid_file, frame_idx in frame_dict.items():
    if not os.path.exists(vid_file):
        print(f"Video path not found: {vid_file}")
```
maybe we can mention in the message that in that case the video is skipped
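Something like this, for example (a rough sketch, assuming the loop just moves on to the next video):

```python
for vid_file, frame_idx in frame_dict.items():
    if not os.path.exists(vid_file):
        # hypothetical message: state explicitly that the video is skipped
        print(f"Video path not found, skipping video: {vid_file}")
        continue
```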
```python
frame_image = (frame_image * 255).astype(np.uint8)
out_fp = os.path.join(args.out_dir, file_name)
Image.fromarray(frame_image).save(out_fp, quality=95)
del frames
```
is the `del frames` bit needed? if so, maybe it'd be nice to add a comment to clarify why?
---
It helps with the OOM issue.
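Maybe a comment along these lines then (just a sketch):

```python
# free the large array of frames explicitly; otherwise holding on to it
# while the next video is processed can trigger out-of-memory errors
del frames
```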
```python
return None


def get_frames(args):
```
I'm a big fan of almost self-explanatory function names, and I feel maybe this one could be a bit more clear about what it does? Maybe something like `compute_additional_channels` or `compute_stacked_inputs`?
---
I think there is an alternative way of computing the channels that may be a bit clearer and more memory efficient - basically I think we can do away with the batch parameter.

I'm not sure it is entirely correct to use it actually. If I am interpreting it correctly, we are computing the mean over chunks of 1000 frames (whereas in the original implementation they compute it over the whole video)... is that correct?

I think there may also be an issue when computing the differences between frames for the motion channel: as it is now, I think they are computed between frames that are `delta` frames apart in the list of extracted frames, rather than `delta` frames apart in the actual video.

I would suggest having two loops: one for computing the mean_blurred_frame (aka, the "background") and the max_abs_blurred_frame, and another one to compute the background-subtracted frame and the motion frame for each frame selected for labelling.

For the first one (computing the mean and the max_abs arrays), I would do the following (in pseudocode):
```python
# initialise capture for this video
cap = cv2.VideoCapture(vid_file)

# get image size
# (note that OpenCV returns floats, and numpy arrays are indexed (height, width))
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

# initialise array for mean blurred frame
mean_blurred_frame = np.zeros((height, width))

# initialise array for max_abs_blurred_frame
max_abs_blurred_frame = np.zeros((height, width))

# loop thru frames in the video
frame_counter = 0
while True:
    # read a frame
    success_frame, frame = cap.read()
    if not success_frame:
        print("Can't receive frame. Exiting ...")
        break

    # apply transformations to that frame
    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    blurred_frame = cv2.GaussianBlur(frame, (5, 5), 0)

    # accumulate sum in mean_blurred_frame
    mean_blurred_frame += blurred_frame

    # accumulate max absolute values in max_abs_blurred_frame
    # (for each element of the array, np.maximum selects the one with the max value;
    # see https://numpy.org/doc/stable/reference/generated/numpy.maximum.html)
    max_abs_blurred_frame = np.maximum(max_abs_blurred_frame, abs(blurred_frame))

    # update frame counter
    frame_counter += 1

# compute the mean
mean_blurred_frame = mean_blurred_frame / frame_counter
```
Note that in this loop we go through all the frames in the video. We also compute the average across all frames, but we avoid holding a large 3D array in memory by accumulating the temporary values in `mean_blurred_frame` and `max_abs_blurred_frame`.
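As a quick sanity check that the running accumulation gives the same result as stacking everything first (a standalone toy example, not code from the PR):

```python
import numpy as np

rng = np.random.default_rng(0)
frames = [rng.integers(0, 255, size=(4, 4)).astype(float) for _ in range(10)]

# accumulate the running sum and the running max of absolute values
running_sum = np.zeros((4, 4))
running_max_abs = np.zeros((4, 4))
for f in frames:
    running_sum += f
    running_max_abs = np.maximum(running_max_abs, np.abs(f))

# same results as holding all frames in one 3d array
assert np.allclose(running_sum / len(frames), np.mean(frames, axis=0))
assert np.allclose(running_max_abs, np.max(np.abs(frames), axis=0))
```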
```python
# detecting motion by finding the differences between frames
# set the delta : frame[i+delta] - frame[i]
for i, frame_offset in enumerate(range(len(frames) - delta)):
```
would it be clearer if the loop goes from frame `f=delta` until `f=n_frames`, and at each iteration we compute the difference between frame `f` and frame `f-delta`?

If I understand correctly it should be equivalent, right? (maybe I'm missing something)
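A quick check of the equivalence on toy values (made-up `delta` and `n_frames`):

```python
delta, n_frames = 3, 10

# current formulation: difference between frame[i + delta] and frame[i]
pairs_current = [(i + delta, i) for i in range(n_frames - delta)]

# suggested formulation: difference between frame[f] and frame[f - delta]
pairs_suggested = [(f, f - delta) for f in range(delta, n_frames)]

assert pairs_current == pairs_suggested  # same pairs of frames, either way
```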
---
I would go about this loop slightly differently. This would be the second loop I was referring to earlier.

In pseudocode, I would do the following:
```python
frame_dict = read_json_file(args.json_path)

for vid_file, list_extracted_frame_idcs in frame_dict.items():
    # initialise video capture object
    cap = cv2.VideoCapture(vid_file)

    # for every frame extracted for labelling
    for frame_idx in list_extracted_frame_idcs:
        # read the frame from the video
        cap.set(cv2.CAP_PROP_POS_FRAMES, frame_idx)
        success, frame = cap.read()
        if not success:
            continue  # or break / raise a warning...

        # transform the frame
        # (since we do these operations quite a bit, we could have them
        # as separate functions)
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        blurred_frame = cv2.GaussianBlur(frame, (5, 5), 0)

        # -------------------------------
        # compute the background subtracted frame
        # (note that mean_blurred_frame and max_abs_blurred_frame were
        # computed in a previous separate loop)
        background_subtracted_frame = (
            ((frame - mean_blurred_frame) / max_abs_blurred_frame) + 1
        ) / 2

        # -------------------------------
        # compute the motion frame:
        # read frame f-delta, the frame delta frames before the current one
        # NOTE: we'd need a way to deal with the case in which
        # frame_idx - delta < 0. For example, we can force it to be at least 0?
        cap.set(cv2.CAP_PROP_POS_FRAMES, max(frame_idx - delta, 0))
        success_delta, frame_delta = cap.read()

        # transform the frame f-delta
        frame_delta = cv2.cvtColor(frame_delta, cv2.COLOR_BGR2GRAY)
        blurred_frame_delta = cv2.GaussianBlur(frame_delta, (5, 5), 0)

        # compute the motion channel for frame f
        # (cast to float to avoid uint8 wrap-around in the subtraction)
        motion_frame = np.abs(
            blurred_frame.astype(float) - blurred_frame_delta.astype(float)
        )

        # -------------------------------
        # stack the three channels
        final_frame = np.dstack(
            [
                frame,  # original grayscaled image?
                background_subtracted_frame,
                motion_frame,
            ]
        )

        # save final frame as file
        # (may need scaling to [0, 255] and casting to uint8 before saving)
        Image.fromarray(final_frame).save(path, quality=95)
```
The sections between dotted lines could be separate functions. Note that we could have a frame selected for labelling with index < `delta`, and therefore it doesn't have a frame that is `delta` frames before it. Maybe in that case we can cap it? Not sure if there is a better way, let me know your thoughts.
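To illustrate the separate-functions idea, a minimal sketch (function names are hypothetical, not from the PR):

```python
import cv2
import numpy as np


def transform_frame(frame, kernel_size=(5, 5), sigmax=0):
    """Convert a BGR frame to grayscale and apply Gaussian blurring."""
    gray_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return cv2.GaussianBlur(gray_frame, kernel_size, sigmax)


def compute_motion_frame(blurred_frame, blurred_frame_delta):
    """Absolute difference between two blurred frames, delta frames apart."""
    # cast to float to avoid uint8 wrap-around in the subtraction
    return np.abs(blurred_frame.astype(float) - blurred_frame_delta.astype(float))
```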
---
OpenCV is a bit odd when you want to fetch a specific frame in a video. Below is a snippet to show how it works (from this notebook):

```python
frame_idx = 25  # to read the frame with index 25 (in a zero-indexed system)

# this sets the index of the frame we will read next
cap.set(cv2.CAP_PROP_POS_FRAMES, frame_idx)

# this reads the frame at idx=25 and updates the index to idx=26
success, frame = cap.read()

# you can check that the index is 26 here
print(cap.get(cv2.CAP_PROP_POS_FRAMES))
```
```python
# detecting motion by finding the differences between frames
# set the delta : frame[i+delta] - frame[i]
for i, frame_offset in enumerate(range(len(frames) - delta)):
    if (i + (n_frame - batch_size)) in frame_idx:
```
As I mentioned above, I think there is an issue here with `frame_idx`. If I understand correctly, it is the list of frames extracted for labelling. But we want to compute the difference between frames that are separated by delta frames in the original video, not in the subset of extracted frames (right?).
---
Yes, `frame_idx` is the list of frames extracted for labelling. The original code does not have the condition there, and saves all the frames. This condition is to make sure we save the correct frames, i.e. the ones in the `frame_idx` list. We still compute the difference between `frame[i+delta] - frame[i]`. So, let's say:

```python
frame_idx = [270, ...]

if 270 in frame_idx:
    frame_images = np.dstack([
        frames[270],
        blurred_frames[270],
        blurred_frames[270 + delta] - blurred_frames[270],
    ])
```

Does that make sense?
---
But I will check again regarding the frame indexing you mentioned above.
---
aah ok I see!! Thanks for clarifying (I remember now you mentioned this earlier, sorry I forgot!)

I didn't realise `blurred_frames` is the array with all the frames, not only the ones selected for labelling.

I think maybe the frame indexing approach is more readable, but happy to discuss!
---
Good stuff!
I made some minor changes, so please can you check everything is still a-ok?
The changes are mainly:
- I changed the docstring format from Google to Numpy - we can chat about this if you want
- I split the larger function computing the stacked inputs into smaller ones
Happy to discuss any thoughts!
"""Function to apply transformation to the frame. | ||
Convert the frame to grayscale and apply Gaussian blurring | ||
|
||
Args: |
Is it possible that these docstrings are in Google format rather than Numpy format? See the difference here.
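For reference, this is roughly how the same Args section looks in the two styles (a generic toy example):

```python
def transform_frame(frame, kernel_size):
    """Google style.

    Args:
        frame (np.ndarray): input frame.
        kernel_size (tuple): size of the Gaussian kernel.
    """


def transform_frame_numpy_style(frame, kernel_size):
    """Numpy style.

    Parameters
    ----------
    frame : np.ndarray
        Input frame.
    kernel_size : tuple
        Size of the Gaussian kernel.
    """
```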
```python
# compute the mean
mean_blurred_frame = mean_blurred_frame / frame_counter

# save the mean
cv2.imwrite(f"{Path(vid_file).stem}_mean.jpg", mean_blurred_frame)
```
nice, fantastic for a sanity check!

I'm thinking in the future we could save it or not based on a flag - I added it as an enhancement issue #16
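Something like this, maybe (a hypothetical `--save_background` flag, exact name to be discussed in #16):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--save_background",
    action="store_true",
    help="if passed, save the mean blurred frame as an image for sanity checks",
)
args = parser.parse_args()

# later, inside the per-video loop from the PR:
if args.save_background:
    cv2.imwrite(f"{Path(vid_file).stem}_mean.jpg", mean_blurred_frame)
```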
```python
if not success_delta:
    # Break the loop if no more frames to read
    print(f"Cannot read frame {frame_idx}+{args.delta}. Exiting...")
```
I think this is good enough for now - basically we won't label any frame whose motion channel cannot be computed. But we need to make sure we don't label those frames by mistake (since it will be useless) - I added another issue for that #17
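A possible way to keep track of those frames in the meantime (a rough sketch of the inside of the per-frame loop, with a hypothetical `skipped_frame_idcs` list initialised beforehand):

```python
if not success_delta:
    # record the frame so it can be excluded from the labelling set later
    print(f"Cannot read frame {frame_idx}+{args.delta}. Skipping...")
    skipped_frame_idcs.append(frame_idx)
    continue  # move on to the next frame selected for labelling
```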
```python
background_subtracted_frame_delta = (
    ((blurred_frame_delta - mean_blurred_frame) / max_abs_blurred_frame) + 1
) / 2
```
since we use this a couple of times, maybe this could also be a separate function that receives the blurred frame and the mean blurred frame, and outputs the background-subtracted one (with the normalisation, +1 and division by 2)
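Something like this, perhaps (hypothetical name and signature):

```python
def compute_background_subtracted_frame(
    blurred_frame, mean_blurred_frame, max_abs_blurred_frame
):
    """Subtract the background and normalise the result to the [0, 1] range."""
    return (
        ((blurred_frame - mean_blurred_frame) / max_abs_blurred_frame) + 1
    ) / 2
```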
```python
(
    mean_blurred_frame,
    max_abs_blurred_frame,
) = compute_mean_and_max_abs_blurred_frame(cap, args.kernel_size, args.sigmax)
```
I moved the computation of `mean_blurred_frame` and `max_abs_blurred_frame` to their own separate function (to try to make functions a bit more compact and focused)
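For reference, based on the earlier pseudocode the function could look roughly like this (a sketch, not the actual implementation in the PR; `kernel_size` is assumed to be a tuple):

```python
import cv2
import numpy as np


def compute_mean_and_max_abs_blurred_frame(cap, kernel_size, sigmax):
    """Compute the mean and max-absolute blurred frames over a whole video."""
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    mean_blurred_frame = np.zeros((height, width))
    max_abs_blurred_frame = np.zeros((height, width))

    frame_counter = 0
    while True:
        success, frame = cap.read()
        if not success:
            break
        gray_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        blurred_frame = cv2.GaussianBlur(gray_frame, kernel_size, sigmax)
        mean_blurred_frame += blurred_frame
        max_abs_blurred_frame = np.maximum(
            max_abs_blurred_frame, np.abs(blurred_frame)
        )
        frame_counter += 1

    return mean_blurred_frame / frame_counter, max_abs_blurred_frame
```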
```python
import json


def read_json_file(file_path):
```
fantastic! ✨