Copy-Paste augmentation #12599

Open
wants to merge 79 commits into
base: master

@Arno1235 Arno1235 commented Jan 8, 2024

Currently, the Copy-Paste augmentation only flips the copied object and pastes it if it doesn't overlap existing objects too much.
This PR instead moves the copied object to a random position on the image and pastes it there if it doesn't overlap too much, as described in the cited paper (https://arxiv.org/abs/2012.07177).
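
As a rough illustration of the "place randomly, keep if it doesn't overlap too much" rule, here is a minimal sketch that works on bounding boxes only. The helper names `box_ioa` and `random_placement`, the `(x1, y1, x2, y2)` pixel box format, and the 0.30 threshold are assumptions made for this example, not code taken from the PR.

```python
import random


def box_ioa(box, other):
    """Intersection of `box` with `other`, divided by the area of `other`."""
    x1, y1, x2, y2 = box
    ox1, oy1, ox2, oy2 = other
    inter_w = max(0.0, min(x2, ox2) - max(x1, ox1))
    inter_h = max(0.0, min(y2, oy2) - max(y1, oy1))
    other_area = (ox2 - ox1) * (oy2 - oy1)
    return inter_w * inter_h / other_area if other_area > 0 else 0.0


def random_placement(box, existing, w, h, max_ioa=0.30, rng=random):
    """Draw one random position for `box` inside a w x h image.

    Returns the moved box, or None if it would cover more than `max_ioa`
    of any box in `existing` (assumed threshold) or does not fit at all.
    """
    x1, y1, x2, y2 = box
    bw, bh = x2 - x1, y2 - y1
    if bw > w or bh > h:
        return None
    new_x1 = rng.uniform(0, w - bw)
    new_y1 = rng.uniform(0, h - bh)
    candidate = (new_x1, new_y1, new_x1 + bw, new_y1 + bh)
    if all(box_ioa(candidate, other) < max_ioa for other in existing):
        return candidate
    return None
```

Passing a seeded `random.Random` instance as `rng` makes the placement reproducible, which helps when comparing augmentation variants.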

Possible improvements:

  • The copied object could also be augmented (flip, scale, ...) before placing it on the image.

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

📊 Key Changes

  • Added shift_array function to handle image translation.
  • Improved copy_paste augmentation method to include random translation with boundary checks and segment translation.
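
The "segment translation with boundary checks" item can be pictured with a small sketch. It assumes segments are `(n, 2)` arrays of pixel `xy` points and that an object is rejected when any point leaves the image; the PR may use a different policy (for example clipping), and `translate_segment` is a hypothetical name.

```python
import numpy as np


def translate_segment(segment, move_x, move_y, w, h):
    """Shift a polygon by (move_x, move_y) pixels.

    Returns (shifted_segment, xyxy_box), or None when the shifted polygon
    would cross the image border (assumed policy: reject, do not clip).
    """
    offset = np.array([move_x, move_y], dtype=np.float32)
    shifted = np.asarray(segment, dtype=np.float32) + offset
    x1, y1 = shifted.min(axis=0)
    x2, y2 = shifted.max(axis=0)
    if x1 < 0 or y1 < 0 or x2 > w or y2 > h:
        return None
    box = np.array([x1, y1, x2, y2], dtype=np.float32)
    return shifted, box
```

Deriving the box from the shifted points keeps labels and segments consistent without tracking the two separately.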

🎯 Purpose & Impact

The changes introduce a more diverse Copy-Paste augmentation which can enhance model robustness by training it on images with objects pasted in variable positions. It makes the training process closer to real-world scenarios where objects can appear anywhere in the frame, thus helping the model generalize better. This could potentially improve object detection accuracy in unseen data.

🌟 Summary

Implemented enhanced Copy-Paste augmentation for better object detection model training. 🎨✂️📌

Contributor

@github-actions github-actions bot left a comment

👋 Hello @Arno1235, thank you for submitting a YOLOv5 🚀 PR! To allow your work to be integrated as seamlessly as possible, we advise you to:

  • ✅ Verify your PR is up-to-date with ultralytics/yolov5 master branch. If your PR is behind you can update your code by clicking the 'Update branch' button or by running git pull and git merge master locally.
  • ✅ Verify all YOLOv5 Continuous Integration (CI) checks are passing.
  • ✅ Reduce changes to the absolute minimum required for your bug fix or feature addition. "It is not daily increase but daily decrease, hack away the unessential. The closer to the source, the less wastage there is." — Bruce Lee

@glenn-jocher
Member

@Arno1235 hello!

Thank you for your interest in YOLOv5 and for bringing up the Copy-Paste augmentation. Your suggestion to enhance the augmentation by including additional transformations like flipping and scaling is indeed in line with the cited paper and could potentially improve the robustness of the model.

We always welcome contributions from the community. If you're interested in implementing these improvements, feel free to fork the repo, make your changes, and submit a pull request. We'll be happy to review it. For guidelines on contributing, you can refer to our documentation.

Keep in mind that any changes should be thoroughly tested to ensure they benefit the model's performance without introducing unexpected behavior.

Thanks again for your input, and we look forward to any contributions you might make! 😊🚀

@glenn-jocher
Member

@Arno1235 this looks good, but speed may be a major issue. There appear to be two cv2.warpAffine() calls in the innermost for loop, so they will run many times and will likely add significant augmentation compute overhead.

@Arno1235
Author

Arno1235 commented Jan 9, 2024

Hi @glenn-jocher, thanks for the quick response!

You're right.
After some testing, I found that instead of using cv2.warpAffine() you can simply shift the arrays, which is just as fast as the original cv2.flip() call (even slightly faster).

The code for shifting the array looks like this:

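import numpy as np


def shift_array(im, move_x, move_y, fill_value=0):
    # Translate im by (move_x, move_y) pixels; vacated pixels are set to fill_value.
    result = np.empty_like(im)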

    if move_y > 0:
        result[:move_y, :] = fill_value
        if move_x > 0:
            result[:, :move_x] = fill_value
            result[move_y:, move_x:] = im[:-move_y, :-move_x]
        elif move_x < 0:
            result[:, move_x:] = fill_value
            result[move_y:, :move_x] = im[:-move_y, -move_x:]
        else:
            result[move_y:, :] = im[:-move_y, :]
    elif move_y < 0:
        result[move_y:, :] = fill_value
        if move_x > 0:
            result[:, :move_x] = fill_value
            result[:move_y, move_x:] = im[-move_y:, :-move_x]
        elif move_x < 0:
            result[:, move_x:] = fill_value
            result[:move_y, :move_x] = im[-move_y:, -move_x:]
        else:
            result[:move_y, :] = im[-move_y:, :]
    else:
        if move_x > 0:
            result[:, :move_x] = fill_value
            result[:, move_x:] = im[:, :-move_x]
        elif move_x < 0:
            result[:, move_x:] = fill_value
            result[:, :move_x] = im[:, -move_x:]
        else:
            result[:, :] = im[:, :]
    
    return result
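
The nine branches above all compute a source window and a destination window per axis. The sketch below expresses that idea with slices. It is an alternative written for illustration, not the PR's code; `shift_array_compact` is a hypothetical name, and because it fills the whole output first it may not match the timings reported in this thread.

```python
import numpy as np


def shift_array_compact(im, move_x, move_y, fill_value=0):
    """Translate `im` by (move_x, move_y) pixels, filling vacated pixels."""
    h, w = im.shape[:2]
    result = np.full_like(im, fill_value)
    # A shift at least as large as the image leaves nothing to copy.
    if abs(move_x) >= w or abs(move_y) >= h:
        return result
    # Destination and source windows along each axis.
    dst_y = slice(max(move_y, 0), h + min(move_y, 0))
    src_y = slice(max(-move_y, 0), h + min(-move_y, 0))
    dst_x = slice(max(move_x, 0), w + min(move_x, 0))
    src_x = slice(max(-move_x, 0), w + min(-move_x, 0))
    result[dst_y, dst_x] = im[src_y, src_x]
    return result


a = np.arange(1, 10).reshape(3, 3)
print(shift_array_compact(a, 1, 0))   # one pixel to the right
# [[0 1 2]
#  [0 4 5]
#  [0 7 8]]
print(shift_array_compact(a, 0, -1))  # one pixel up
# [[4 5 6]
#  [7 8 9]
#  [0 0 0]]
```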

I tested the functionality and speed with the following program:

import numpy as np
import cv2
import time
import random


def warp_affine(im, move_x, move_y, w, h):
    result = cv2.warpAffine(im, np.float32([[1, 0, move_x], [0, 1, move_y]]), (w, h))
    return result


def flip(im):
    result = cv2.flip(im, 1)
    return result


def shift_array(im, move_x, move_y, w, h, fill_value=0):
    result = np.empty_like(im)

    if move_y > 0:
        result[:move_y, :] = fill_value
        if move_x > 0:
            result[:, :move_x] = fill_value
            result[move_y:, move_x:] = im[:-move_y, :-move_x]
        elif move_x < 0:
            result[:, move_x:] = fill_value
            result[move_y:, :move_x] = im[:-move_y, -move_x:]
        else:
            result[move_y:, :] = im[:-move_y, :]
    elif move_y < 0:
        result[move_y:, :] = fill_value
        if move_x > 0:
            result[:, :move_x] = fill_value
            result[:move_y, move_x:] = im[-move_y:, :-move_x]
        elif move_x < 0:
            result[:, move_x:] = fill_value
            result[:move_y, :move_x] = im[-move_y:, -move_x:]
        else:
            result[:move_y, :] = im[-move_y:, :]
    else:
        if move_x > 0:
            result[:, :move_x] = fill_value
            result[:, move_x:] = im[:, :-move_x]
        elif move_x < 0:
            result[:, move_x:] = fill_value
            result[:, :move_x] = im[:, -move_x:]
        else:
            result[:, :] = im[:, :]
    
    return result


if __name__ == "__main__":

    iterations = 100_000

    im = cv2.imread("input.png")
    print(f"Image shape: {im.shape}")

    h, w, c = im.shape
    

    # Compare functionality

    moves_to_test = [
        (0, 0),
        (0, 10),
        (0, -10),

        (10, 0),
        (10, 10),
        (10, -10),

        (-10, 0),
        (-10, 10),
        (-10, -10),
    ]

    for move_x, move_y in moves_to_test:
        np.testing.assert_array_equal(warp_affine(im, move_x, move_y, w, h), shift_array(im, move_x, move_y, w, h))


    # Compare timings

    t1 = time.time_ns()

    for _ in range(iterations):

        flip(im)

    print(f"flip: {(time.time_ns() - t1)/1e9} s")


    t1 = time.time_ns()

    for _ in range(iterations):
        move_x = random.randint(-w, w)
        move_y = random.randint(-h, h)

        warp_affine(im, move_x, move_y, w, h)

    print(f"warp: {(time.time_ns() - t1)/1e9} s")


    t1 = time.time_ns()

    for _ in range(iterations):
        move_x = random.randint(-w, w)
        move_y = random.randint(-h, h)

        shift_array(im, move_x, move_y, w, h)

    print(f"shift: {(time.time_ns() - t1)/1e9} s")

This gives output:

Image shape: (640, 640, 3)
flip: 4.852631847 s
warp: 47.472902211 s
shift: 3.944873514 s
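
One caveat about the loops above: the warp and shift timings include the `random.randint` calls, and offsets are drawn from the full `[-w, w]` and `[-h, h]` ranges, so many shifts copy only a small part of the image. A possible tighter harness, sketched below with the hypothetical helpers `draw_offsets` and `best_time`, pre-draws the offsets once and reports the best of several passes using `time.perf_counter`.

```python
import random
import time


def draw_offsets(n, w, h, seed=0):
    """Pre-draw n (move_x, move_y) pairs so every candidate sees the same input."""
    rng = random.Random(seed)
    return [(rng.randint(-w, w), rng.randint(-h, h)) for _ in range(n)]


def best_time(fn, offsets, repeats=3):
    """Best wall-clock time in seconds over `repeats` passes of fn(move_x, move_y)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for move_x, move_y in offsets:
            fn(move_x, move_y)
        best = min(best, time.perf_counter() - start)
    return best
```

With the functions from the test program, a call would look like `best_time(lambda mx, my: shift_array(im, mx, my, w, h), draw_offsets(10_000, w, h))`, repeated for `warp_affine` with the same offsets.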

Do you think this is good enough?

If the for loop concerns you, I could also apply a single random translation, check which translated objects are inside the image and don't overlap with other objects, and copy only those (keeping the probability p in mind).

@glenn-jocher
Member

Hi @Arno1235,

Great work on optimizing the augmentation process! It's impressive to see that your shift_array function is not only functionally equivalent to warp_affine but also faster. This is a valuable improvement, as efficiency is key when training models.

Your benchmarking results are promising, and it seems like your approach could be a good fit for the YOLOv5 project. If you've ensured that the functionality is consistent and that there are no edge cases or bugs, this could indeed be good enough to consider integrating.

Regarding the for loop, your idea to perform a single random translation and then check for overlaps is a good one. It could further optimize the process by reducing the number of operations needed.

If you're ready, you might want to proceed by submitting a pull request with your changes. Make sure to include your test cases and performance benchmarks so that we can review the full impact of your contribution.

Thanks for your dedication to improving YOLOv5! 😊👍

@Arno1235
Author

Hi @glenn-jocher,

I implemented the array shifting and made it only do a single translation in the code.
How can I include my test cases and performance benchmarks in the code?

@glenn-jocher
Member

Hi @Arno1235,

Fantastic to hear that you've implemented the array shifting with a single translation! To include your test cases and performance benchmarks, you can follow these steps:

  1. Documenting in Code Comments: Include inline comments in your code explaining the purpose of each test case and the expected outcomes. For performance benchmarks, you can add comments on top of the functions or in a separate block to explain the performance gains observed.

  2. Unit Tests: If you've written unit tests, you can include them in the tests directory of the YOLOv5 repository. Make sure they follow the structure and style of existing tests.

  3. Performance Benchmarks: For performance benchmarks, you can create a markdown file or a section in the existing documentation that details your benchmarking methodology, the environment in which the tests were run (hardware, software versions, etc.), and the results you obtained.

  4. Pull Request Description: When you submit your pull request, use the description to provide a summary of the changes, the rationale behind them, and the impact on performance. You can include snippets of your benchmark results here as well.

  5. Commit Messages: Write clear and descriptive commit messages for each of your changes. This helps reviewers understand the context of each change and makes the revision history more informative.

Remember to ensure that your tests are reproducible and that your benchmarks accurately reflect the performance improvements. This will help the reviewers during the pull request process.
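
As one way to make such tests reproducible without an input image, the sketch below checks any shift function against a pad-and-crop reference on a seeded random array. `reference_shift` and `check_shift_fn` are hypothetical helpers written for this example; they are not part of the YOLOv5 repository.

```python
import numpy as np


def reference_shift(im, move_x, move_y, fill_value=0):
    """Reference translation: pad with fill_value, then crop a shifted window."""
    h, w = im.shape[:2]
    pad = max(abs(move_x), abs(move_y))
    pad_width = ((pad, pad), (pad, pad)) + ((0, 0),) * (im.ndim - 2)
    padded = np.pad(im, pad_width, mode="constant", constant_values=fill_value)
    return padded[pad - move_y:pad - move_y + h, pad - move_x:pad - move_x + w]


def check_shift_fn(shift_fn, shape=(6, 8, 3)):
    """Raise AssertionError if shift_fn(im, move_x, move_y) deviates from the reference."""
    rng = np.random.default_rng(0)
    # Values start at 1 so that the fill value 0 is always distinguishable.
    im = rng.integers(1, 255, size=shape, dtype=np.uint8)
    moves = [(0, 0), (2, 0), (-2, 0), (0, 3), (0, -3), (2, -3), (-2, 3), (8, 6), (-9, -7)]
    for move_x, move_y in moves:
        np.testing.assert_array_equal(shift_fn(im, move_x, move_y),
                                      reference_shift(im, move_x, move_y))
```

A call such as `check_shift_fn(shift_array)` would exercise zero, positive, negative, mixed, and larger-than-image offsets. A wrap-around implementation based on `np.roll` fails this check, which is the kind of edge case the test is meant to catch.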

Looking forward to seeing your contribution! 😊🚀

@Arno1235
Author

Hi @glenn-jocher

  1. I added comments to my code.
  2. I did not write any unit tests and don't see a tests directory in the repository.
  3. I don't see any performance benchmarks for other augmentations.

I think this pull request is ready to be reviewed and merged if it is approved.
Is there anything else you need from me?

Thanks

@glenn-jocher
Member

Hi @Arno1235,

Thank you for adding comments to your code and for preparing your pull request. Here's what you can do next:

  1. Pull Request (PR): Go ahead and submit your PR if you haven't already. Make sure to provide a clear and detailed description of your changes, the reasoning behind them, and any performance improvements you've observed.

  2. Unit Tests: While there may not be a dedicated tests directory, it's good practice to include tests for new functionality. You can create a new test file that follows the naming convention of existing files and includes tests for your new augmentation method.

  3. Performance Benchmarks: If there are no existing performance benchmarks for augmentations, you can still include your benchmark results in the PR description. This will provide evidence of the efficiency gains from your changes.

  4. Documentation: If your changes are significant, consider updating the relevant documentation to reflect the new augmentation behavior. This helps users understand and utilize the new feature correctly.

Once you've submitted your PR, the maintainers will review your changes. They may request additional changes or clarifications, so be prepared to engage in the review process.

It sounds like you've done a thorough job, and if everything is in order, there shouldn't be anything else you need to do for now. Just be responsive to any feedback you might receive during the review process.

Thanks for your contribution, and we're looking forward to reviewing your work! 😊👍
