
code for extracting additional channel based on baseline++ #9

Merged
11 commits merged from nikkna/extract_addchannel into main on Aug 7, 2023

Conversation

nikk-nikaznan (Collaborator)

No description provided.

@nikk-nikaznan requested a review from sfmig on July 31, 2023, 14:03
@sfmig (Collaborator) left a comment

Great job Nik!

I included some comments that I hope are useful. The two main ones are:

(1) I think there is an alternative way of computing the channels that may be a bit clearer and more memory efficient - basically I think we can do away with the batch parameter (I'm not sure if it is entirely correct to use it actually - more on that in the review).

(2) I think there may also be an issue when computing the differences between frames for the motion channel: as it is now, I think they are computed between frames that are delta frames apart in the list of extracted frames, rather than delta frames apart in the actual video.

I suggested having two separate loops: one for computing the mean_blurred_frame (aka the "background") and the max_abs_blurred_frame, and another one to compute, for each frame selected for labelling, the background-subtracted frame and the motion frame.

A smaller comment is that generally it is nice to include a brief description of the PR in the comment above (just saying what it does, sort of giving an overview). But the size of the PR was perfect! I always struggle with that 😅

Let me know if anything is not clear (or if I understood things wrong), happy to discuss!

bboxes labelling/additional_channels_extraction.py: resolved review comments (outdated)

for vid_file, frame_idx in frame_dict.items():
    if not os.path.exists(vid_file):
        print(f"Video path not found: {vid_file}")
Collaborator:

maybe we can add to the message that in that case the video is skipped
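For example, something like this (just a sketch based on the snippet above; the continue assumes we want to move on to the next video):

for vid_file, frame_idx in frame_dict.items():
    if not os.path.exists(vid_file):
        print(f"Video path not found: {vid_file}. Skipping this video.")
        continue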

bboxes labelling/additional_channels_extraction.py: resolved review comments (outdated)
frame_image = (frame_image * 255).astype(np.uint8)
out_fp = os.path.join(args.out_dir, file_name)
Image.fromarray(frame_image).save(out_fp, quality=95)
del frames
Collaborator:

is the del frames bit needed? if so, maybe it'd be nice to add a comment to clarify why?

Collaborator Author:

It helps with the OOM issue.
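Maybe a comment along these lines then (just a sketch)?

# explicitly free the array of frames for this video before moving on,
# otherwise we can run out of memory (OOM) when processing several long videos
del frames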

return None


def get_frames(args):
Collaborator:

I'm a big fan of almost self-explanatory function names, and I feel maybe this one could be a bit clearer about what it does? Maybe something like compute_additional_channels or compute_stacked_inputs?

Collaborator:

I think there is an alternative way of computing the channels that may be a bit clearer and more memory efficient - basically I think we can do away with the batch parameter.

I'm not sure if it is entirely correct to use it actually. If I am interpreting it correctly, we are computing the mean over chunks of 1000 frames (whereas in the original implementation they compute it over the whole video)... is that correct?

I think there may also be an issue when computing the differences between frames for the motion channel: as it is now, I think they are computed between frames that are delta frames apart in the list of extracted frames, rather than delta frames apart in the actual video.

I would suggest having two loops: one for computing the mean_blurred_frame (aka, the "background") and the max_abs_blurred_frame, and another one to compute the background subtracted frame and the motion frame for each frame selected for labelling.

For the first one (computing the mean and the max_abs arrays), I would do the following (in pseudocode):

import cv2
import numpy as np

# initialise capture for this video
cap = cv2.VideoCapture(vid_file)

# get image size
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

# initialise array for mean blurred frame
# (note: numpy arrays are indexed (rows, columns), i.e. (height, width))
mean_blurred_frame = np.zeros((height, width))

# initialise array for max_abs_blurred_frame
max_abs_blurred_frame = np.zeros((height, width))

# loop thru frames in the video
frame_counter = 0
while True:
    # read a frame
    success_frame, frame = cap.read()
    if not success_frame:
        print("Can't receive frame. Exiting ...")
        break

    # apply transformations to that frame
    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    blurred_frame = cv2.GaussianBlur(frame, (5, 5), 0)

    # accumulate sum in mean_blurred_frame
    mean_blurred_frame += blurred_frame

    # accumulate max absolute values in max_abs_blurred_frame
    # (for each element of the array, keep the one with the max absolute value;
    # see https://numpy.org/doc/stable/reference/generated/numpy.maximum.html)
    max_abs_blurred_frame = np.maximum(max_abs_blurred_frame, np.abs(blurred_frame))

    # update frame counter
    frame_counter += 1

# compute the mean
mean_blurred_frame = mean_blurred_frame / frame_counter

Note that in this loop we go through all the frames in the video. We still compute the average across all frames, but we avoid holding a large 3D array in memory by accumulating the running values in mean_blurred_frame and max_abs_blurred_frame.


# detecting motion by finding the differences between frame
# set the delta : frame[i+delta] - frame[i]
for i, frame_offset in enumerate(range(len(frames) - delta)):
Collaborator:

would it be clearer if the loop went from frame f=delta until f=n_frames, and at each iteration we computed the difference between frame f and frame f-delta?

If I understand correctly it should be equivalent, right? (maybe I'm missing something)
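In other words, something like this (a sketch, assuming blurred_frames holds all the frames of the video):

n_frames = len(blurred_frames)
for f in range(delta, n_frames):
    # difference between frame f and the frame delta frames before it
    motion_frame = np.abs(blurred_frames[f] - blurred_frames[f - delta])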

Collaborator:

I would go about this loop slightly differently. This would be the second loop I was referring to earlier.

In pseudocode, I would do the following:

import cv2
import numpy as np
from PIL import Image

frame_dict = read_json_file(args.json_path)

for vid_file, list_extracted_frame_idcs in frame_dict.items():
    # initialise video capture object
    cap = cv2.VideoCapture(vid_file)

    # for every frame extracted for labelling
    for frame_idx in list_extracted_frame_idcs:

        # read the frame from the video
        cap.set(cv2.CAP_PROP_POS_FRAMES, frame_idx)
        success, frame = cap.read()

        if not success:
            # break, or print a warning and skip this frame
            continue

        # transform the frame
        # (since we do these operations quite a bit, we could have them as separate functions)
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        blurred_frame = cv2.GaussianBlur(frame, (5, 5), 0)

        # -------------------------------
        # compute the background subtracted frame
        # (note that mean_blurred_frame and max_abs_blurred_frame were computed in a previous separate loop)
        background_subtracted_frame = (
            ((blurred_frame - mean_blurred_frame) / max_abs_blurred_frame) + 1
        ) / 2

        # -------------------------------
        # compute the motion frame
        # read frame f-delta, the frame delta frames before the current one
        # NOTE: we'd need a way to deal with the case in which frame_idx - delta < 0.
        # For example, we could force it to be at least 0?
        cap.set(cv2.CAP_PROP_POS_FRAMES, frame_idx - delta)
        success_delta, frame_delta = cap.read()

        # transform the frame f-delta
        frame_delta = cv2.cvtColor(frame_delta, cv2.COLOR_BGR2GRAY)
        blurred_frame_delta = cv2.GaussianBlur(frame_delta, (5, 5), 0)

        # compute the motion channel for frame f
        motion_frame = np.abs(blurred_frame - blurred_frame_delta)

        # -------------------------------
        # stack the three channels
        final_frame = np.dstack(
            [
                frame,  # the original grayscaled image
                background_subtracted_frame,
                motion_frame,
            ]
        )

        # save final frame as file
        Image.fromarray(final_frame).save(path, quality=95)

The sections between the dotted lines could be separate functions. Note that we could have a frame selected for labelling with index < delta, and therefore it wouldn't have a frame that is delta frames before it. Maybe in that case we can cap the index? Not sure if there is a better way, let me know your thoughts!
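For example, the capping could look like this (a sketch using the names from the pseudocode above; frame_idx_delta is a made-up name):

# cap the index so we never ask for a negative frame
frame_idx_delta = max(frame_idx - delta, 0)
cap.set(cv2.CAP_PROP_POS_FRAMES, frame_idx_delta)
success_delta, frame_delta = cap.read()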

Collaborator:

OpenCV is a bit odd when you want to fetch a specific frame in a video. Below is a snippet to show how it works (from this notebook):

frame_idx = 25  # to read frame with index 25 (in zero indexed system)

# this sets the index for the frame we will read next
cap.set(cv2.CAP_PROP_POS_FRAMES, frame_idx)

# this reads the frame at idx=25 and updates the index to idx=26
success, frame = cap.read()

# you can check that the index is 26 here
print(cap.get(cv2.CAP_PROP_POS_FRAMES))

# detecting motion by finding the differences between frame
# set the delta : frame[i+delta] - frame[i]
for i, frame_offset in enumerate(range(len(frames) - delta)):
    if (i + (n_frame - batch_size)) in frame_idx:
Collaborator:

As I mentioned above, I think there is an issue here with frame_idx. If I understand correctly, it is the list of frames extracted for labelling. But we want to compute the difference between frames that are separated by delta frames in the original video, not in the subset of extracted frames (right?).

Collaborator Author:

Yes, frame_idx is the list of frames extracted for labelling. The original code does not have that condition there, and saves all the frames. This condition is to make sure we save only the frames we want, as listed in frame_idx. We still compute the difference frame[i+delta] - frame[i]. So, for example:

frame_idx = [270, ..]
if 270 in frame_idx:
    frame_images = np.dstack([
        frames[270],
        blurred_frames[270],
        blurred_frames[270 + delta] - blurred_frames[270],
    ])

Does that make sense?

Collaborator Author:

But I will check again regarding the frame indexing you mentioned above.

Collaborator:

aah ok I see!! Thanks for clarifying (I remember now you mentioned this earlier, sorry I forgot!)
I didn't realise blurred_frames is the array with all the frames, not only the ones selected for labelling.

I think maybe the frame indexing approach is more readable, but happy to discuss!

@sfmig (Collaborator) left a comment

Good stuff!

I made some minor changes, so could you please check everything is still OK?

The changes are mainly:

  • I changed the docstring format from Google to NumPy - we can chat about this if you want
  • I split the larger function computing the stacked inputs into smaller ones

Happy to discuss any thoughts!

"""Function to apply transformation to the frame.
Convert the frame to grayscale and apply Gaussian blurring

Args:
Collaborator:

Is it possible that these docstrings are Google format rather than Numpy format?

See the difference here
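For reference, in NumPy format a docstring like the one above would look roughly like this (a sketch; the function and parameter names are only illustrative):

import cv2


def apply_grayscale_and_blur(frame, kernel_size, sigmax):
    """Convert the frame to grayscale and apply Gaussian blurring.

    Parameters
    ----------
    frame : np.ndarray
        frame to transform, in BGR format
    kernel_size : tuple
        size of the Gaussian kernel, e.g. (5, 5)
    sigmax : float
        standard deviation of the Gaussian kernel in the x direction

    Returns
    -------
    np.ndarray
        grayscaled and blurred frame
    """
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return cv2.GaussianBlur(gray, kernel_size, sigmax)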

# compute the mean
mean_blurred_frame = mean_blurred_frame / frame_counter
# save the mean
cv2.imwrite(f"{Path(vid_file).stem}_mean.jpg", mean_blurred_frame)
Collaborator:

nice, fantastic for a sanity check!

I'm thinking that in the future we could save it or not based on a flag - I added it as an enhancement issue #16
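That flag could be something along these lines (just a sketch; the flag name is made up, and it assumes the existing argument parser and the names in the snippet above):

# when defining the CLI arguments
parser.add_argument(
    "--save_background_frame",
    action="store_true",
    help="if passed, save the mean (background) frame as a jpg for sanity checking",
)

# where the mean is currently saved
if args.save_background_frame:
    cv2.imwrite(f"{Path(vid_file).stem}_mean.jpg", mean_blurred_frame)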

bboxes labelling/additional_channels_extraction.py: resolved review comments (outdated)

if not success_delta:
    # Break the loop if no more frames to read
    print(f"Cannot read frame {frame_idx}+{args.delta}. Exiting...")
Collaborator:

I think this is good enough for now - basically we won't label any frame whose motion channel cannot be computed. But we need to make sure we don't label those frames by mistake (since it would be useless) - I added another issue for that: #17

Comment on lines 132 to 134
background_subtracted_frame_delta = (
    ((blurred_frame_delta - mean_blurred_frame) / max_abs_blurred_frame) + 1
) / 2
Collaborator:

since we use this a couple of times, maybe this could also be a separate function that receives the blurred frame, the mean blurred frame and the max abs frame, and outputs the background-subtracted one (with the normalisation, +1 and division by 2)
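A minimal sketch of such a helper (the function name is made up):

import numpy as np


def get_background_subtracted_frame(
    blurred_frame, mean_blurred_frame, max_abs_blurred_frame
):
    """Subtract the mean (background) frame and normalise the result to roughly [0, 1]."""
    return (
        ((blurred_frame - mean_blurred_frame) / max_abs_blurred_frame) + 1
    ) / 2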

(
    mean_blurred_frame,
    max_abs_blurred_frame,
) = compute_mean_and_max_abs_blurred_frame(cap, args.kernel_size, args.sigmax)
Collaborator:

I moved the computation of mean_blurred_frame and max_abs_blurred_frame to their own separate function (to try to make the functions a bit more compact and focused)

import json


def read_json_file(file_path):
Collaborator:

fantastic! ✨
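(For context, this kind of helper can be as small as the following sketch, assuming the JSON maps each video path to a list of extracted frame indices:)

import json


def read_json_file(file_path):
    """Read a JSON file mapping video file paths to lists of frame indices."""
    with open(file_path) as f:
        return json.load(f)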

@nikk-nikaznan merged commit d399c16 into main on Aug 7, 2023
1 check passed
@nikk-nikaznan deleted the nikkna/extract_addchannel branch on August 7, 2023 at 16:19