OCR
Extract text from images
OCR, or Optical Character Recognition, is the process of extracting text from images. Most OCR models return the text in the image along with bounding boxes that indicate where the text is located in the image.
Running OCR with supercontrast is easy: all you have to do is provide the image URL and the provider you want to use.
In this example, we’ll specify that for Task.OCR, we want to use Provider.GCP, which uses Google’s Vision AI API.
Each task has its own request and response schema. For Task.OCR, the request schema is defined by OCRRequest and the response schema is defined by OCRResponse. The OCRResponse has a list of OCRBoundingBox objects, each representing a bounding box of text in the image.
image: a string that can be either a URL to an image or a local file path.
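Because the image field accepts either a URL or a local file path, calling code sometimes needs to tell the two apart, for example to check that a local file exists before sending a request. The sketch below is one way to do that with the Python standard library. The helper name classify_image_source is hypothetical and is not part of supercontrast; how the library itself distinguishes the two cases is not described here.

```python
from urllib.parse import urlparse


def classify_image_source(image: str) -> str:
    """Return "url" for http(s) strings and "path" for everything else.

    Hypothetical helper for illustration only; not part of supercontrast.
    """
    scheme = urlparse(image).scheme.lower()
    # Only http and https count as URLs here. Anything else, including
    # Windows drive letters such as "C:", is treated as a local path.
    return "url" if scheme in ("http", "https") else "path"


if __name__ == "__main__":
    for source in ("https://example.com/receipt.png", "./scans/receipt.png"):
        print(source, "->", classify_image_source(source))
```

Treating only http and https as URLs is a deliberately narrow choice, so that strings with an unexpected scheme fall through to the local-path branch.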
all_text: a string of all the text in the image
bounding_boxes: a list of OCRBoundingBox objects, each representing a bounding box of text in the image
text: a string of the text in the bounding box
coordinates: a list of tuples, each representing a coordinate in the bounding box
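To make the shape of the response concrete, here is a minimal sketch that mirrors the fields listed above using plain dataclasses. The classes BoundingBoxSketch and ResponseSketch are stand-ins written for this example, not the library's actual OCRBoundingBox and OCRResponse definitions. The sketch also assumes each coordinate is an (x, y) tuple, which the field description suggests but does not state.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class BoundingBoxSketch:
    """Stand-in for OCRBoundingBox: the text plus the box's corner points."""
    text: str
    coordinates: List[Tuple[float, float]]  # assumed to be (x, y) pairs


@dataclass
class ResponseSketch:
    """Stand-in for OCRResponse: the full text plus per-region boxes."""
    all_text: str
    bounding_boxes: List[BoundingBoxSketch] = field(default_factory=list)


def extent(box: BoundingBoxSketch) -> Tuple[float, float, float, float]:
    """Return (min_x, min_y, max_x, max_y) covering all of the box's points."""
    xs = [x for x, _ in box.coordinates]
    ys = [y for _, y in box.coordinates]
    return min(xs), min(ys), max(xs), max(ys)


if __name__ == "__main__":
    # Hand-made sample data, not real OCR output.
    response = ResponseSketch(
        all_text="TOTAL 12.50",
        bounding_boxes=[
            BoundingBoxSketch("TOTAL", [(10, 20), (110, 20), (110, 50), (10, 50)]),
            BoundingBoxSketch("12.50", [(130, 20), (200, 20), (200, 50), (130, 50)]),
        ],
    )
    for box in response.bounding_boxes:
        print(box.text, extent(box))
```

Reducing the corner points to a min/max extent is convenient for cropping or drawing a region, and it also works when a provider returns a rotated quadrilateral rather than an axis-aligned rectangle.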