Transloadit

Recognize text in images (OCR)

🤖/image/ocr recognizes text in images and returns it in a machine-readable format.

With this Robot you can detect and extract text from images using optical character recognition (OCR).

For example, you can use the results to read the content of traffic signs, name tags, package labels and more. You can also pass the text down to other Robots to filter images that contain (or do not contain) certain phrases. For images of dense documents, results may vary and be less accurate than for small pieces of text in photos.

Usage example

Recognize text in an uploaded image and save it to a text file:

{
  "steps": {
    "recognized": {
      "robot": "/image/ocr",
      "use": ":original",
      "provider": "gcp",
      "format": "text"
    }
  }
}
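
The same Step can return structured data instead of a text file. The variation below is a minimal sketch that requests a flat list of phrases as JSON, using the granularity and format parameters described under Parameters. It assumes the provider value for AWS is "aws", by analogy with "gcp" above:

```json
{
  "steps": {
    "recognized": {
      "robot": "/image/ocr",
      "use": ":original",
      "provider": "aws",
      "granularity": "list",
      "format": "json"
    }
  }
}
```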

Parameters

  • output_meta

    Record<string, boolean> | boolean

    Allows you to specify a set of metadata that is more CPU-intensive to calculate and is therefore disabled by default to keep your Assemblies processing fast.

    For images, you can add "has_transparency": true in this object to detect whether the image contains transparent parts and "dominant_colors": true to extract an array of hexadecimal color codes from the image.

    For videos, you can add "colorspace": true to extract the colorspace of the output video.

    For audio, you can add "mean_volume": true to get a single value representing the mean volume of the audio file.

    You can also set this to false to skip metadata extraction and speed up transcoding.
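
    As an illustration, the object form might look like the fragment below. It is a sketch that uses only the image keys named above. Whether a given key yields useful data depends on the type of file the Step outputs:

    ```json
    {
      "output_meta": {
        "has_transparency": true,
        "dominant_colors": true
      }
    }
    ```

    To skip the extraction entirely, the same key would instead be set to false.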

  • result

    boolean (default: false)

    Whether the results of this Step should be present in the Assembly Status JSON.

  • queue

    "batch"

    Setting the queue to "batch" manually downgrades the priority of jobs for this Step, so that jobs that don't need zero queue waiting times do not consume Priority job slots.

  • force_accept

    boolean (default: false)

    Force a Robot to accept a file type it would have ignored.

    By default Robots ignore files they are not familiar with. 🤖/video/encode, for example, will happily ignore input images.

    With the force_accept parameter set to true you can force Robots to accept all files thrown at them. This will typically lead to errors and should only be used for debugging or combatting edge cases.

  • use

    string | Array<string> | Array<object> | object

    Specifies which Step(s) to use as input.

    • You can pick any names for Steps except ":original" (reserved for user uploads handled by Transloadit).
    • You can provide several Steps as input with arrays:
      {
        "use": [
          ":original",
          "encoded",
          "resized"
        ]
      }
      
  • provider

    · required

    Which AI provider to leverage.

    Transloadit outsources this task and abstracts the interface, so you can expect the same data structures, but different latencies and different information being returned. Different cloud vendors shine in different areas, and we recommend trying them out to see which yields the best results for your use case.

    AWS supports detection for the following languages: English, Arabic, Russian, German, French, Italian, Portuguese and Spanish. GCP allows for a wider range of languages, with varying levels of support, which are listed in the official documentation.

  • granularity

    "full" | "list" (default: "full")

    Whether to return a full response including coordinates for the text ("full"), or a flat list of the extracted phrases ("list"). This parameter has no effect if the format parameter is set to "text".

  • format

    "json" | "meta" | "text" (default: "json")

    In what format to return the extracted text.

    • "json" returns a JSON file.
    • "meta" does not return a file. Instead, it stores the data inside Transloadit's file object under ${file.meta.recognized_text}, which is an array of strings. Because the file object is passed around between encoding Steps, you can use the values to burn the data into videos, filter on them, etc.
    • "text" returns the recognized text as a plain UTF-8 encoded text file.
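
To illustrate the "meta" format, the sketch below stores the recognized text on the file object and then keeps only files whose text contains a given phrase. It makes several assumptions: the Step name "filtered" and the phrase "STOP" are hypothetical, and it relies on 🤖/file/filter accepting an "includes" operator that matches against the ${file.meta.recognized_text} array. Check the 🤖/file/filter documentation before relying on it:

```json
{
  "steps": {
    "recognized": {
      "robot": "/image/ocr",
      "use": ":original",
      "provider": "gcp",
      "format": "meta"
    },
    "filtered": {
      "robot": "/file/filter",
      "use": "recognized",
      "accepts": [["${file.meta.recognized_text}", "includes", "STOP"]]
    }
  }
}
```

Moving the same condition from "accepts" to "declines" should invert the filter, so that only images without the phrase are passed on.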
