Hacktoberfest 2024 | Google Vision OCR 🤝 Workflows #692

PawelPeczek-Roboflow · 2024-09-30T12:15:40Z

Google Vision OCR in Workflows

Are you ready to make a meaningful contribution this Hacktoberfest? We are looking to integrate Google Vision OCR into our Workflows ecosystem! This new OCR block will be a valuable addition, addressing a common challenge that many users face.

Join us in expanding our ecosystem and empowering users to effortlessly extract text and structure from their documents. Whether you’re a seasoned contributor or new to open source, your skills and ideas can help make this project a success. Let’s collaborate and bring this essential functionality to life!

🚧 Task description 🏗️

The task is to integrate OCR from the Google Vision API into the Workflows ecosystem
The API should be integrated in a way that allows sending the API key as a Workflow input parameter, rather than using Google service account credentials - see Google Vision auth docs
We prefer a light integration with the REST API through the requests library - 📖 REST API docs - in particular this may be useful - we only want to enable TEXT_DETECTION and DOCUMENT_TEXT_DETECTION
The output should be parsed into an sv.Detections(...) object - the recognised text should be the label, and additional metadata about structure (like the category of a region) should be added to the data field of sv.Detections(...)
Please raise any issues with the task in the discussion below
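
A minimal sketch of the light REST integration described above, assuming the API key is passed as the `key` query parameter and the image is sent as base64-encoded bytes. The helper names `build_annotate_request` and `call_google_vision`, and the mapping from `ocr_type` values to API feature types, are assumptions made for illustration and are not part of the task description.

```python
import base64
from typing import Any, Dict

GOOGLE_VISION_URL = "https://vision.googleapis.com/v1/images:annotate"

# Assumed mapping from the block's `ocr_type` values to API feature types.
FEATURE_TYPES = {
    "text_detection": "TEXT_DETECTION",
    "ocr_text_detection": "DOCUMENT_TEXT_DETECTION",
}


def build_annotate_request(image_bytes: bytes, ocr_type: str) -> Dict[str, Any]:
    """Build the JSON body for a single-image `images:annotate` call."""
    return {
        "requests": [
            {
                # Raw image bytes are sent base64-encoded in the `content` field.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": FEATURE_TYPES[ocr_type]}],
            }
        ]
    }


def call_google_vision(
    image_bytes: bytes, ocr_type: str, api_key: str, timeout: float = 30.0
) -> Dict[str, Any]:
    """Send one image to the API, authenticating with an API key only."""
    # Imported here so the payload builder works without the dependency.
    import requests

    response = requests.post(
        GOOGLE_VISION_URL,
        params={"key": api_key},  # API key instead of service account credentials
        json=build_annotate_request(image_bytes, ocr_type),
        timeout=timeout,
    )
    response.raise_for_status()
    # One request was sent, so one entry is expected under `responses`.
    return response.json()["responses"][0]
```

Keeping the payload builder separate from the HTTP call makes it possible to unit-test the request shape without network access.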

Cheatsheet

Contributor guide
Workflows docs
Creating Workflow block - tutorial
Workflow Kinds, in particular the kind wrapping sv.Detections(...) for object-detection predictions as a reference

Scaffolding for the block

💻 Code snippet

from typing import List, Literal, Optional, Type, Union

from pydantic import ConfigDict
import supervision as sv
import requests

from inference.core.workflows.execution_engine.entities.base import (
    OutputDefinition,
    WorkflowImageData,
)
from inference.core.workflows.execution_engine.entities.types import (
    StepOutputImageSelector,
    WorkflowImageSelector,
    OBJECT_DETECTION_PREDICTION_KIND,
)
from inference.core.workflows.prototypes.block import (
    BlockResult,
    WorkflowBlock,
    WorkflowBlockManifest,
)


class BlockManifest(WorkflowBlockManifest):
    model_config = ConfigDict(
        json_schema_extra={
            "name": "Google Vision OCR",
            "version": "v1",
            "short_description": "TODO",
            "long_description": "TODO",
            "license": "Apache-2.0",
            "block_type": "model",
        },
        protected_namespaces=(),
    )
    type: Literal["roboflow_core/google_vision_ocr@v1"]
    image: Union[WorkflowImageSelector, StepOutputImageSelector]
    ocr_type: Literal["text_detection", "ocr_text_detection"]

    @classmethod
    def describe_outputs(cls) -> List[OutputDefinition]:
        return [
            OutputDefinition(
                name="predictions", kind=[OBJECT_DETECTION_PREDICTION_KIND]
            ),
        ]

    @classmethod
    def get_execution_engine_compatibility(cls) -> Optional[str]:
        return ">=1.0.0,<2.0.0"


class GoogleVisionOCRBlockV1(WorkflowBlock):

    @classmethod
    def get_manifest(cls) -> Type[WorkflowBlockManifest]:
        return BlockManifest

    def run(
        self,
        image: WorkflowImageData,
        ocr_type: Literal["text_detection", "ocr_text_detection"]
    ) -> BlockResult:
        results = requests.post(...)
        return {
            "predictions": sv.Detections(...)
        }
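
To illustrate how the API key could arrive as a Workflow input parameter, here is a sketch of a Workflow definition that uses the block. The `api_key` field is hypothetical: it is not in the scaffold's manifest and would have to be added there, and the step name `ocr` is arbitrary.

```python
# Sketch of a Workflow definition, written as a Python dict.
WORKFLOW_DEFINITION = {
    "version": "1.0",
    "inputs": [
        {"type": "WorkflowImage", "name": "image"},
        # The Google API key is supplied at runtime as a plain parameter.
        {"type": "WorkflowParameter", "name": "api_key"},
    ],
    "steps": [
        {
            "type": "roboflow_core/google_vision_ocr@v1",
            "name": "ocr",
            "image": "$inputs.image",
            "ocr_type": "text_detection",
            # Hypothetical manifest field, not present in the scaffold above.
            "api_key": "$inputs.api_key",
        }
    ],
    "outputs": [
        {
            "type": "JsonField",
            "name": "predictions",
            "selector": "$steps.ocr.predictions",
        }
    ],
}
```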


brunopicinin · 2024-10-01T05:26:21Z

I forked the project and started to develop a new block, but one thing is not clear to me.

Given the following image: https://testsigma.com/blog/wp-content/uploads/What-is-the-OCR-Test-How-to-Create-Automate-It.png

Passing this image to Google API as such:

POST https://vision.googleapis.com/v1/images:annotate?key=[YOUR_API_KEY] HTTP/1.1

Authorization: Bearer [YOUR_ACCESS_TOKEN]
Accept: application/json
Content-Type: application/json

{
  "requests": [
    {
      "image": {
        "source": {
          "imageUri": "https://testsigma.com/blog/wp-content/uploads/What-is-the-OCR-Test-How-to-Create-Automate-It.png"
        }
      },
      "features": [
        {
          "type": "TEXT_DETECTION"
        }
      ]
    }
  ]
}

Results in the following response:

{
  "responses": [
    {
      "textAnnotations": [
        {
          "locale": "en",
          "description": "OCR test\nOCR",
          "boundingPoly": {
            "vertices": [
              {
                "x": 265,
                "y": 261
              },
              {
                "x": 940,
                "y": 261
              },
              {
                "x": 940,
                "y": 324
              },
              {
                "x": 265,
                "y": 324
              }
            ]
          }
        },
        {
          "description": "OCR",
          "boundingPoly": {
            "vertices": [
              {
                "x": 265,
                "y": 281
              },
              {
                "x": 382,
                "y": 282
              },
              {
                "x": 382,
                "y": 321
              },
              {
                "x": 265,
                "y": 320
              }
            ]
          }
        },
        {
          "description": "test",
          "boundingPoly": {
            "vertices": [
              {
                "x": 396,
                "y": 282
              },
              {
                "x": 505,
                "y": 283
              },
              {
                "x": 505,
                "y": 322
              },
              {
                "x": 396,
                "y": 321
              }
            ]
          }
        },
        {
          "description": "OCR",
          "boundingPoly": {
            "vertices": [
              {
                "x": 756,
                "y": 261
              },
              {
                "x": 940,
                "y": 262
              },
              {
                "x": 940,
                "y": 324
              },
              {
                "x": 756,
                "y": 323
              }
            ]
          }
        }
      ],
      "fullTextAnnotation": {
        ...
      }
    }
  ]
}

Should the block output sv.Detections(...) with the full text match only, the word matches only, or both?

PawelPeczek-Roboflow · 2024-10-01T08:51:34Z

Hi @brunopicinin,
First of all, thanks for taking on the challenge 💪

Regarding the question - good point. I believe it would be good for the Workflow block to have one output that simply dumps the whole recognised text, plus an output with sv.Detections(...) that denotes each parsed region
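
A sketch of that split, assuming (as in the response posted above) that the first `textAnnotations` entry holds the whole recognised text and the remaining entries are individual regions. The function names are hypothetical, and the code stays in plain Python so the conversion can be checked without extra dependencies.

```python
from typing import Any, Dict, List, Tuple

Box = Tuple[int, int, int, int]


def poly_to_xyxy(bounding_poly: Dict[str, Any]) -> Box:
    """Convert a `boundingPoly` into an axis-aligned (x_min, y_min, x_max, y_max) box."""
    vertices = bounding_poly.get("vertices", [])
    if not vertices:
        return (0, 0, 0, 0)
    # Coordinates may be missing from a vertex; treat a missing value as 0.
    xs = [v.get("x", 0) for v in vertices]
    ys = [v.get("y", 0) for v in vertices]
    return (min(xs), min(ys), max(xs), max(ys))


def split_text_annotations(
    response: Dict[str, Any],
) -> Tuple[str, List[Box], List[str]]:
    """Return (full_text, region_boxes, region_labels) for one API response."""
    annotations = response.get("textAnnotations", [])
    if not annotations:
        return "", [], []
    # Assumption: entry 0 is the full-text match, the rest are regions.
    full_text = annotations[0].get("description", "")
    regions = annotations[1:]
    boxes = [poly_to_xyxy(r.get("boundingPoly", {})) for r in regions]
    labels = [r.get("description", "") for r in regions]
    return full_text, boxes, labels
```

For the response above this yields the full text `"OCR test\nOCR"` and three regions, the first being `"OCR"` with box `(265, 281, 382, 321)`. When at least one region is found, the boxes and labels could then be wrapped as `sv.Detections(xyxy=np.array(boxes), data={"class_name": np.array(labels)})`; which `data` keys the block should use is left open here.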

brunopicinin · 2024-10-02T00:28:56Z

Created a PR for this issue: #709

PawelPeczek-Roboflow · 2024-10-02T07:25:21Z

Amazing 💪 reviewing it now

PawelPeczek-Roboflow · 2024-10-02T08:00:09Z

Posted a PR review, many thanks for the contribution

PawelPeczek-Roboflow added the "Hacktoberfest 2024" and "good first issue" labels on Sep 30, 2024

PawelPeczek-Roboflow mentioned this issue on Oct 1, 2024 in OCR Block v2 #706 (draft)
