
How to read yolo11 tflite output tensor #13396

Open
francesco-clementi-92 opened this issue Nov 1, 2024 · 18 comments
Labels
exports (Model exports: ONNX, TensorRT, TFLite, etc.), question (Further information is requested)

Comments

@francesco-clementi-92


Question

Hello, I have read every online post about how to decode the YOLO TFLite output tensor.
My current setup is react-native + react-native-vision-camera + vision-camera-resize-plugin + react-native-fast-tflite.

I trained a YOLO11n model on a custom dataset and generated first an int8 TFLite model with NMS, then an int8-only one, then a plain TFLite model, but with each of them my life got harder and harder. I cannot use ultralytics from JavaScript, so I need to decode the output manually.

Currently I'm trying to run my model, which has:
input tensor: float32[1,320,320,3]
output tensor: float32[1,5,2100]

So no quantization is applied.
Resizing the frame is as easy as doing:

resize(frame, {
  scale: {
    width: 320,
    height: 320
  },
  pixelFormat: 'rgb',
  dataType: 'float32'
});

Then I run the tflite model:
const outputTensor = model.runSync([data]);

Now the hardest part.

My model has only one class.

From the output, I understand that there is one batch (1), 5 detection attributes [x, y, width, height, confidence], and 2100 detection candidates.

I don't know why, but it seems the output is a one-dimensional array.

If I only focus on the confidence with this code:

for (let i = 0; i < output[0].length; i += 5) {
  const confidence = output[0][i + 4];
}

I get confidence scores bigger than 1, which is not possible.

Any help would be appreciated :)


@francesco-clementi-92 francesco-clementi-92 added the question Further information is requested label Nov 1, 2024
@UltralyticsAssistant UltralyticsAssistant added the exports Model exports (ONNX, TensorRT, TFLite, etc.) label Nov 1, 2024
@UltralyticsAssistant
Member

👋 Hello @francesco-clementi-92, thank you for reaching out about your YOLO11 inquiry 🚀! We appreciate your interest and your detailed description of the issue you're facing.

For your current situation working with the YOLO model outputs in a React Native environment, it can indeed be tricky to manually decode the TFLite output tensors. Since the output tensor you are referring to seems to be producing unexpected values, such as confidence scores greater than 1, this could be due to the tensor's interpretation or values being misaligned or improperly scaled.

If this is a 🐛 Bug Report, could you please provide a minimum reproducible example? This will help us to understand and potentially replicate the issue.

If this is related to your custom training setup or inference ❓ Question, including additional information like your model configuration, sample inputs, and any logs or errors you're encountering can be very helpful. Also, ensure you have followed best practices for dataset preparation and training.

Requirements

To ensure everything is set up correctly, please use Python>=3.8.0 with all dependencies installed. You can follow these basic steps to set up your environment:

git clone the repository
cd into the directory
pip install dependencies

Environments

Please verify that your environment is up-to-date and configured correctly. YOLO models can be run in various environments, including local machines with the required packages, or cloud-based solutions where everything is pre-configured.

Status

Ensure that you have the latest updates from the repository, and if you need more assistance, our community and contributors are here to help!

Lastly, we're excited about our latest models, which might bring significant improvements and added features to your workflow, so consider exploring them as well!

This is an automated response to guide you while an Ultralytics engineer will also assist soon. Feel free to continue providing any additional information that might help us assist you better. 😊

@francesco-clementi-92
Author

Just to give an example, these are the first values of the output:
0.62, 0.75, 0.84, 0.93, 0.63, 0.76, 0.83, 0.94, 0.61, 0.74, 0.84, 0.95, 0.62, 0.75, 0.85, 0.95, 0.62, 0.75, 0.85, 0.95, 0.62, 0.75, 0.86, 0.95, 0.62, 0.74, 0.87, 0.95, 0.62, 0.74, 0.87, 0.95, 0.62, 0.74, 0.87, 0.95, 0.62, 0.74, 0.87, 0.95, 0.62, 0.74, 0.88, 0.95, 0.62, 0.74, 0.88, 0.95, 0.62, 0.74, 0.88, 0.95, 0.62, 0.74, 0.88, 0.95, 0.62, 0.74, 0.88, 0.95, 0.62, 0.74, 0.88, 0.95, 0.62, 0.74, 0.87, 0.95, 0.62, 0.74, 0.87, 0.95, 0.62, 0.74, 0.87, 0.95, 0.62, 0.74, 0.87, 0.95, 0.62, 0.74, 0.87, 0.95, 0.62, 0.74, 0.87, 0.95, 0.62, 0.74, 0.87, 0.95, 0.62, 0.74, 0.87, 0.95, 0.62, 0.74, 0.87, 0.95, 0.62, 0.74, 0.87, 0.95, 0.62, 0.74, 0.87, 0.95, 0.62, 0.74, 0.87, 0.95, 0.62, 0.74, 0.87, 0.95, 0.62, 0.74, 0.87, 0.95, 0.62, 0.74, 0.87, 0.95, 0.62, 0.74, 0.87, 0.95, 0.61, 0.74, 0.87, 0.95, 0.61, 0.74, 0.87, 0.95, 0.61, 0.74, 0.87, 0.95, 0.62, 0.74, 0.87, 0.95, 0.61, 0.74, 0.86, 0.95, 0.61, 0.74, 0.86, 0.95, 0.62, 0.74, 0.86, 0.94, 0.62, 0.75, 0.86, 0.94, 0.62, 0.84, 0.62, 0.87, 0.62, 0.89, 0.63, 0.91, 0.64, 0.94, 0.64, 0.94, 0.64, 0.95, 0.64, 0.95, 0.64, 0.95, 0.64, 0.95, 0.64, 0.95, 0.64, 0.95, 0.64, 0.95, 0.64, 0.94, 0.63, 0.94, 0.63, 0.94, 0.63, 0.94, 0.63, 0.95, 0.67, 0.95, 0.72, 0.95, 0.71, 0.79, 0.84, 0.86, 0.87, 0.86, 0.86, 0.85, 0.84, 0.85, 0.61, 0.61, 0.61, 0.61, 0.61, 0.61, 0.61, 0.61, 0.63, 0.63, 0.64, 0.64, 0.64, 0.63, 0.64, 0.64, 0.66, 0.66, 0.66, 0.66, 0.66, 0.66, 0.66, 0.66, 0.68, 0.68, 0.68, 0.68, 0.68, 0.68, 0.69, 0.69, 0.71, 0.71, 0.71, 0.71, 0.71, 0.71, 0.71, 0.71, 0.73, 0.74, 0.74, 0.74, 0.74, 0.74, 0.74, 0.74, 0.76, 0.76, 0.76, 0.76, 0.76, 0.76, 0.76, 0.76, 0.78, 0.79, 0.79, 0.79, 0.79, 0.79, 0.79, 0.79, 0.81, 0.81, 0.81, 0.81, 0.81, 0.81, 0.81, 0.81, 0.83, 0.83, 0.83, 0.83, 0.83, 0.83, 0.84, 0.84, 0.87, 0.86, 0.86, 0.86, 0.86, 0.86, 0.87, 0.86, 0.90, 0.90, 0.90, 0.90, 0.90, 0.90, 0.90, 0.88, 0.91, 0.92, 0.92, 0.92, 0.92, 0.92, 0.92, 0.90, 0.93, 0.94, 0.94, 0.94, 0.94, 0.94, 0.94, 0.91, 0.96, 0.97, 0.97, 0.97, 0.97, 0.96, 0.96, 0.94, 0.99, 0.99, 0.99, 0.99, 0.99, 0.99, 0.99, 0.96, 0.60, 0.61, 0.60, 0.65, 0.66, 0.65, 0.64, 0.70, 0.71, 0.69, 0.68, 0.75, 0.76, 0.74, 0.71, 0.80, 0.81, 0.77, 0.75, 0.86, 0.86, 0.79, 0.80, 0.91, 0.91, 0.84, 0.98, 0.94, 0.94, 0.97, 0.77, 0.71, 0.64, 0.90, 0.78, 0.66, 0.93, 0.78, 0.65, 0.92, 0.78, 0.65, 0.91, 0.79, 0.64, 0.91, 0.79, 0.63, 0.89, 0.78, 0.62, 0.88, 0.77, 0.62, 0.87, 0.77, 0.61, 0.87, 0.77, 0.62, 0.87, 0.77, 0.62, 0.88, 0.77, 0.62, 0.88, 0.77, 0.62, 0.88, 0.78, 0.61, 0.88, 0.78, 0.61, 0.88, 0.77, 0.60, 0.88, 0.77, 0.88, 0.77, 0.85, 0.75, 0.74, 0.65, 0.99, 1.00, 1.00, 0.99, 0.99, 1.00, 1.00, 0.99, 1.01, 1.03, 0.60, 0.60, 0.60, 0.61, 0.78, 0.70, 0.90, 0.81, 0.93, 0.86, 0.94, 0.90, 0.95, 0.92, 0.66

@pderrenger
Member

@francesco-clementi-92 it looks like the output values you're seeing might be due to incorrect scaling or normalization. Ensure that your model's output is being interpreted correctly, particularly the confidence scores, which should be between 0 and 1. You might need to apply a sigmoid function to the confidence values if they aren't already scaled. If the issue persists, please verify with the latest version of the ultralytics package.
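For reference, a sigmoid is a one-liner in TypeScript (only needed if the exported model emits raw logits rather than probabilities):

// Squash a raw logit into the (0, 1) range.
const sigmoid = (x: number): number => 1 / (1 + Math.exp(-x));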

@francesco-clementi-92
Author

francesco-clementi-92 commented Nov 2, 2024

@pderrenger the biggest problem is that this is the raw output of the model, without any post-processing.
These are the first 100 values on a blank frame:

0.07709208875894547, 0.09165245294570923, 0.11451394855976105, 0.13115988671779633, 0.1484365314245224, 0.16014990210533142, 0.17435741424560547, 0.19580629467964172, 0.21989166736602783, 0.24521130323410034, 0.2714287042617798, 0.2972862124443054, 0.3223932087421417, 0.34756606817245483, 0.3730508089065552, 0.39814263582229614, 0.4229440689086914, 0.447511225938797, 0.4722057282924652, 0.4971911907196045, 0.5219627618789673, 0.5466349124908447, 0.5716182589530945, 0.597079873085022, 0.621817409992218, 0.6457564830780029, 0.6702650785446167, 0.6963977813720703, 0.7242867350578308, 0.7538096904754639, 0.7815120220184326, 0.8057592511177063, 0.8140174150466919, 0.8274385929107666, 0.8440864086151123, 0.8604170083999634, 0.8708576560020447, 0.887539267539978, 0.9101880788803101, 0.9339421987533569, 0.07525163143873215, 0.09528513252735138, 0.12499171495437622, 0.14248234033584595, 0.16264256834983826, 0.1723497211933136, 0.18563015758991241, 0.20670416951179504, 0.22407814860343933, 0.24771633744239807, 0.27366143465042114, 0.3006134033203125, 0.32692086696624756, 0.3519880175590515, 0.377312034368515, 0.40226417779922485, 0.4272139370441437, 0.4521748423576355, 0.4766745865345001, 0.5012761950492859, 0.525689959526062, 0.550513744354248, 0.5753107070922852, 0.6004764437675476, 0.6250371932983398, 0.6491496562957764, 0.6741708517074585, 0.7003508806228638, 0.730495810508728, 0.7592941522598267, 0.7832859754562378, 0.8056077361106873, 0.807020902633667, 0.8183192610740662, 0.8336712718009949, 0.8487650156021118, 0.8622002601623535, 0.8811202049255371, 0.9119153618812561, 0.9429823756217957, 0.05552614480257034, 0.07274599373340607, 0.10990569740533829, 0.1296156346797943, 0.147246316075325, 0.15442319214344025, 0.162235289812088, 0.18164685368537903, 0.20589706301689148, 0.23072099685668945, 0.25872546434402466, 0.2860266864299774, 0.31157827377319336, 0.3371202349662781, 0.3629078269004822, 0.3879757523536682, 0.4124123156070709, 0.4363778531551361, 0.4607257544994354, 0.48541396856307983

react-native-fast-tflite seems to use C++ to run TensorFlow Lite, and the output is a one-dimensional array.

If I use the ultralytics package to run the same model, it works really well, so I'm having a hard time understanding how to decode the output. It's a YOLO11n trained on a custom dataset, exported with:
!yolo export model=/content/runs/detect/train/weights/best.pt format=tflite imgsz=320

Any idea?

@pderrenger
Member

It seems like the issue might be related to the lack of post-processing on the raw model output. YOLO models typically require non-max suppression (NMS) to filter out overlapping boxes and extract meaningful detections. Since you're using a custom setup, you might need to implement NMS manually in your React Native environment. If the issue persists, try verifying with the latest version of the ultralytics package to ensure compatibility.

@francesco-clementi-92
Author

Applying NMS manually is not a problem, but the first step is to understand the output structure.

@pderrenger
Member

To understand the output structure, you should interpret the output tensor as a series of detection candidates, each consisting of bounding box coordinates and confidence scores. Typically, the format is [x_center, y_center, width, height, confidence] for each detection. You may need to reshape the one-dimensional array into this format to process it correctly.
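For a flat Float32Array, the correct indexing depends on which axis of the output shape comes first. A minimal sketch in TypeScript (the getDetection helper and attributeMajor flag are illustrative, not part of any library):

// Read detection i from a flat output buffer under either layout.
// Shape [1, N, 5] (detection-major): the attributes of detection i are interleaved.
// Shape [1, 5, N] (attribute-major): each attribute fills a contiguous block of N values.
function getDetection(
  flat: Float32Array,
  i: number,
  numDetections: number,
  attributeMajor: boolean // true for [1, 5, N]
) {
  const at = (attr: number) =>
    attributeMajor ? flat[attr * numDetections + i] : flat[i * 5 + attr];
  return { x: at(0), y: at(1), w: at(2), h: at(3), confidence: at(4) };
}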

@francesco-clementi-92
Author

francesco-clementi-92 commented Nov 4, 2024

I can try to reshape the array, but I think that looping over the one-dimensional array with

for (let i = 0; i < outputTensor[0].length; i += 5) {
  const x = outputTensor[0][i];
  const y = outputTensor[0][i + 1];
  const w = outputTensor[0][i + 2];
  const h = outputTensor[0][i + 3];
  const confidence = outputTensor[0][i + 4];
}

should be the same as reshaping the array, or am I missing something?

Also, as you can notice from the output, the values are incremental; is that normal?

In the meantime, thanks for your time :)

@pderrenger
Member

Looping through the array as you've shown is correct for extracting the values, but ensure you're interpreting them correctly. The incremental values might indicate an issue with the model's output or input preprocessing. Verify your input data is correctly normalized and matches the model's expected input format.

@Throws100

Hi Francesco,
I am currently facing the same problem. Did you manage to convert the raw output into a different format?
I would be super interested in your code, as I do not know exactly how to post-process the output data. Thanks :)

@pderrenger
Member

Hi, to convert the raw YOLO output into a usable format, you should first reshape or loop through the tensor to extract [x_center, y_center, width, height, confidence], then apply filtering such as confidence thresholds and post-processing like Non-Max Suppression (NMS). If you're working with TFLite, ensure normalization and input/output formats match the model's requirements. For guidance on exporting and interpreting YOLO models, check this resource for additional context.
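For example, a minimal greedy NMS sketch in TypeScript (corner-format boxes are assumed; the type and function names are illustrative):

type Box = { x1: number; y1: number; x2: number; y2: number; score: number };

// Intersection-over-union of two corner-format boxes.
function iou(a: Box, b: Box): number {
  const ix = Math.max(0, Math.min(a.x2, b.x2) - Math.max(a.x1, b.x1));
  const iy = Math.max(0, Math.min(a.y2, b.y2) - Math.max(a.y1, b.y1));
  const inter = ix * iy;
  const union =
    (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter;
  return union > 0 ? inter / union : 0;
}

// Greedy NMS: keep the highest-scoring box, drop overlaps above iouThreshold.
function nonMaxSuppression(boxes: Box[], iouThreshold = 0.45): Box[] {
  const sorted = [...boxes].sort((a, b) => b.score - a.score);
  const kept: Box[] = [];
  for (const box of sorted) {
    if (kept.every((k) => iou(k, box) < iouThreshold)) kept.push(box);
  }
  return kept;
}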

@BotiCb

BotiCb commented Nov 29, 2024

Hello! I'm facing the same issues. I tried processing the raw output of the tflite model as you described, but even with the camera sensor covered I get a lot of confidence scores above 0.5 and even higher than 1. I use exactly the same setup as Francesco. If I run the model in Python it works fine, and the React Native environment also works well with another tflite model.

@francesco-clementi-92
Author

francesco-clementi-92 commented Nov 29, 2024

YOLO11 didn't work for me at all. I ended up using YOLOv5 with int8 quantization and converting the raw output on Android.

This is the code I use to perform inference on a frame processor:

override fun callback(frame: Frame, arguments: Map<String, Any>?): Any? {
    val width = frame.width
    val height = frame.height
    val bmFrame = BitmapUtils.getBitmap(frame)
    val size = Pair(640, 640)
    val tensorImage = preprocessImage(bmFrame, size)

    val outputShape = interpreter.getOutputTensor(0).shape() // [1, 25200, 6]
    val numDetections = outputShape[1]
    //Log.i(TAG, "numDetections $numDetections")
    val numClasses = outputShape[2] - 5

    val outputBuffer = ByteBuffer.allocateDirect(numDetections * outputShape[2]).order(
      ByteOrder.nativeOrder()
    )

    interpreter.run(tensorImage, outputBuffer)

    val boxes = mutableListOf<RectF>()
    val scores = mutableListOf<Float>()
    val classIds = mutableListOf<Int>()
    val threshold = 0.6f

    for (i in 0 until numDetections) {
      val offset = i * (numClasses + 5)

      // Get raw values first
      val rawXCenter = outputBuffer.get(offset).toUByte().toFloat()
      val rawYCenter = outputBuffer.get(offset + 1).toUByte().toFloat()
      val rawWidth = outputBuffer.get(offset + 2).toUByte().toFloat()
      val rawHeight = outputBuffer.get(offset + 3).toUByte().toFloat()
      val rawScore = outputBuffer.get(offset + 4).toUByte().toFloat()  // This is objectness score

      // Dequantize: real = (raw - zeroPoint) * scale (the zero point matters for asymmetric uint8 quantization)
      val quantParams = interpreter.getOutputTensor(0).quantizationParams()
      val scale = quantParams.scale
      val zeroPoint = quantParams.zeroPoint.toFloat()

      val xCenter = (rawXCenter - zeroPoint) * scale
      val yCenter = (rawYCenter - zeroPoint) * scale
      val boxWidth = (rawWidth - zeroPoint) * scale
      val boxHeight = (rawHeight - zeroPoint) * scale
      val score = (rawScore - zeroPoint) * scale


      if (score > threshold) {
        // Convert normalized coordinates to image coordinates
        val xCenterPx = xCenter * width
        val yCenterPx = yCenter * height
        val boxWidthPx = boxWidth * width
        val boxHeightPx = boxHeight * height

        val xMin = (xCenterPx - boxWidthPx/2)
        val yMin = (yCenterPx - boxHeightPx/2)
        val xMax = (xCenterPx + boxWidthPx/2)
        val yMax = (yCenterPx + boxHeightPx/2)

        Log.d(TAG, """
            Valid detection found:
            Final Box: ($xMin, $yMin) to ($xMax, $yMax)
        """.trimIndent())

        if (xMin < xMax && yMin < yMax) {
          boxes.add(
            RectF(
              // Clamp to the model input bounds; coerceIn(0f, xMin) would throw when xMin < 0
              xMin.coerceIn(0f, size.first.toFloat()),
              yMin.coerceIn(0f, size.second.toFloat()),
              xMax.coerceIn(0f, size.first.toFloat()),
              yMax.coerceIn(0f, size.second.toFloat()),
            )
          )
          scores.add(score)
          //Log.d(TAG, "✅ Box added to results")
        }
      }
    }

    // After the loop, log summary
    Log.d(
      TAG, """
          Total detections processed: $numDetections
          Detections passed thresholds: ${boxes.size}
          Final boxes: ${boxes.joinToString()}
          Final scores: ${scores.joinToString()}
          Final classIds: ${classIds.joinToString()}
      """.trimIndent()
    )


    if (boxes.isNotEmpty()) {
      Log.i(TAG, "Found ${boxes.size} valid detections before NMS")

      // Apply NMS
      val nmsThreshold = 0.4f
      val nmsIndices = nonMaxSuppression(boxes, scores, nmsThreshold)

      val finalBoxes = nmsIndices.map { boxes[it] }

      Log.i(TAG, "After NMS: ${finalBoxes.size} detections")

The code is still a work in progress and needs to be optimized.

@pderrenger
Member

Thank you for sharing your detailed workflow! It seems like you’ve got a well-structured approach to handling YOLOv5 with INT8 quantization and processing the output manually in Android. For those looking to decode YOLO TFLite outputs, your implementation provides a great starting point. As you've highlighted, applying quantization scales and handling post-processing steps (e.g., NMS) are critical. If you encounter any specific issues or optimizations around TensorFlow Lite, feel free to consult Ultralytics documentation or discuss here. Great work adapting the model to your needs—credit to the community and Ultralytics for enabling versatile solutions!

@BotiCb

BotiCb commented Nov 29, 2024

I managed to solve the issue. Since the model output is float32[1,5,2100], this method will not work:

for (let i = 0; i < outputTensor[0].length; i += 5) {
  const x = outputTensor[0][i];
  const y = outputTensor[0][i + 1];
  const w = outputTensor[0][i + 2];
  const h = outputTensor[0][i + 3];
  const confidence = outputTensor[0][i + 4];
}

It would only work if the model output were [1,2100,5]. We have to read the values as if the output matrix had been transposed.
Consequently, the first 2100 values are the xc values of every detection, the next 2100 values are the yc values, and so on. Here is the implementation (numDetections = 2100):

for (let i = 0; i < numDetections; i++) {
  const xc = outputTensor[0][i];
  const yc = outputTensor[0][i + numDetections];
  const w = outputTensor[0][i + numDetections * 2];
  const h = outputTensor[0][i + numDetections * 3];
  const confidence = outputTensor[0][i + numDetections * 4];
  detections.push({ xc, yc, w, h, confidence });
}
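From there, a minimal sketch of the remaining post-processing: filter by confidence, convert normalized [xc, yc, w, h] centers to pixel-space corners, and run NMS (imgWidth, imgHeight, and the nonMaxSuppression helper sketched earlier in this thread are assumptions, not part of the model output):

const confThreshold = 0.5;

// Keep confident detections and convert normalized centers to pixel corners.
const candidates = detections
  .filter((d) => d.confidence >= confThreshold)
  .map((d) => ({
    x1: (d.xc - d.w / 2) * imgWidth,
    y1: (d.yc - d.h / 2) * imgHeight,
    x2: (d.xc + d.w / 2) * imgWidth,
    y2: (d.yc + d.h / 2) * imgHeight,
    score: d.confidence,
  }));

const finalBoxes = nonMaxSuppression(candidates, 0.45);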

@pderrenger
Member

Thank you for sharing your solution! You're absolutely correct—when the output format is float32[1,5,2100], the values are grouped by attribute (e.g., all xc first, then yc, etc.). Your updated implementation properly handles this transposed structure. This insight will be very useful to others working with similar outputs. Great job resolving the issue!

@francesco-clementi-92
Author

> Since we have this model output: float32[1,5,2100] ... we have to get those values as if the output matrix had been transposed. Consequently, the first 2100 values represent the xc values of every detection, the second 2100 values the yc values, and so on.

This also explains why the values are incremental! Thanks, that definitely makes sense!

@pderrenger
Member

Thank you for sharing your findings and implementation! Your explanation about handling the transposed output structure of float32[1,5,2100] will be immensely useful for others encountering the same issue. Great work troubleshooting and adapting your approach—credit goes to the community and Ultralytics for making this type of collaboration possible!
