Loss computation sometimes causes NaN values #13416

Open
2 tasks done
tobymuller233 opened this issue Nov 15, 2024 · 2 comments
Labels
bug Something isn't working

Comments

tobymuller233 commented Nov 15, 2024

Search before asking

  • I have searched the YOLOv5 issues and found no similar bug report.

YOLOv5 Component

Training

Bug

While fine-tuning a pruned model for several epochs, I found that the loss intermittently becomes NaN. After setting breakpoints and inspecting the values, I traced the problem to the CIoU computation in metrics.py.
When a predicted bounding box has a width or height of 0, the result is NaN, because h1 and h2 are used as divisors there.
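
To illustrate the failure mode described above, here is a minimal standard-library sketch of the CIoU aspect-ratio term, v = (4/π²)·(atan(w2/h2) − atan(w1/h1))². It is not the code from metrics.py: the names `ieee_div` and `aspect_term` are hypothetical, and `ieee_div` is only an assumption-laden stand-in for how floating-point tensor division behaves (x/0 gives inf, 0/0 gives NaN, no exception).

```python
import math


def ieee_div(a: float, b: float) -> float:
    """Divide the way float tensors do: x/0 -> +/-inf, 0/0 -> nan, no exception.

    Hypothetical helper for illustration; plain Python raises ZeroDivisionError.
    """
    try:
        return a / b
    except ZeroDivisionError:
        if a == 0 or math.isnan(a):
            return math.nan
        return math.copysign(math.inf, a)


def aspect_term(w1: float, h1: float, w2: float, h2: float) -> float:
    """CIoU aspect-ratio term with no protection against zero heights."""
    r1 = ieee_div(w1, h1)  # predicted box ratio
    r2 = ieee_div(w2, h2)  # target box ratio
    return (4 / math.pi ** 2) * (math.atan(r2) - math.atan(r1)) ** 2


# A fully collapsed prediction (w1 = h1 = 0) gives 0/0, which propagates as NaN.
print(aspect_term(0.0, 0.0, 2.0, 1.0))
# A zero height with non-zero width gives inf, and atan(inf) is still finite.
print(aspect_term(2.0, 0.0, 2.0, 1.0))
```

In this sketch the NaN appears only when width and height are both zero; a zero height alone yields an infinite ratio that `atan` maps to π/2. Other terms in the full loss may behave differently, so this is a starting point for a reproducible example rather than a diagnosis.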

Environment

No response

Minimal Reproducible Example

No response

Additional

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!
tobymuller233 added the bug label Nov 15, 2024
@UltralyticsAssistant
Member

👋 Hello @tobymuller233, thank you for your interest in YOLOv5 🚀! It seems like you're encountering NaN loss values during training, possibly caused by a bug in the metrics.py file. To assist, we'll need a bit more information.

If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us understand and debug the issue. This would include steps to replicate the bug, relevant sections of your code, and any specific error messages.

Additionally, it would be helpful to know more about your environment setup, such as the version of Python, PyTorch, and any other dependencies you are using.

If you have any further insights, like dataset characteristics or specific conditions that might trigger this issue, do share those as well.

Please note that this is an automated response, and an Ultralytics engineer will review your issue and provide further assistance soon. Thank you for your patience and help in improving YOLOv5! 🚀✨

@pderrenger
Member

@tobymuller233 thank you for reporting this potential issue with loss computation. You've identified an important edge case where predictions with zero width or height could cause NaN values during CIoU loss calculation.

Before proceeding with a PR, please verify this behavior using the latest version of YOLOv5 as there have been several loss computation improvements. If you can provide a minimal reproducible example (MRE) following our MRE guide, it would help us investigate the issue more effectively.

For now, you could add a small epsilon value to prevent division by zero in the height calculations. However, we should also investigate why the model is predicting zero-sized bounding boxes during training, as this may indicate other underlying issues with the training process or data.
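
As a rough sketch of the workaround suggested above, the height can be clamped to a small epsilon before dividing, and degenerate predictions can be flagged so the underlying cause can be investigated. The names `safe_aspect_ratio` and `degenerate_boxes` and the value `EPS = 1e-7` are assumptions made for illustration; they are not part of YOLOv5.

```python
import math

EPS = 1e-7  # assumed epsilon, chosen only for this illustration


def safe_aspect_ratio(w: float, h: float, eps: float = EPS) -> float:
    """Return w / h with the height clamped to at least eps."""
    return w / max(h, eps)


def degenerate_boxes(boxes, eps: float = EPS):
    """Return indices of (x1, y1, x2, y2) boxes whose width or height is below eps."""
    return [
        i
        for i, (x1, y1, x2, y2) in enumerate(boxes)
        if (x2 - x1) < eps or (y2 - y1) < eps
    ]


# Hypothetical predictions: one normal box, one zero-width box, one collapsed box.
preds = [(0.0, 0.0, 2.0, 1.0), (1.0, 1.0, 1.0, 3.0), (0.0, 0.0, 0.0, 0.0)]

print(degenerate_boxes(preds))  # indices worth logging during training
for x1, y1, x2, y2 in preds:
    ratio = safe_aspect_ratio(x2 - x1, y2 - y1)
    print(math.atan(ratio))  # always finite, even for the collapsed box
```

Clamping keeps the ratio finite, but it only hides the symptom. Logging the indices returned by a check like `degenerate_boxes` is one way to find out which batches or images produce zero-sized predictions.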

If you'd like to submit a PR, please ensure it includes:

  1. The MRE demonstrating the issue
  2. Your proposed fix
  3. Test cases verifying the solution

3 participants