Asynchronous extraction

Extract data asynchronously

Most of the time, document processing is not time-critical.
That is why we also provide an asynchronous endpoint for processing documents.
Currently, processing follows the process-and-poll method, meaning you have to check at intervals whether the document processing has finished.

📘

Use webhooks to receive a notification when data extraction is finished

To optimize asynchronous document processing, implement webhooks and never poll for data again!
Check out the Webhook section to learn how to get started.

The request for asynchronous processing is the same as the synchronous extract data request. The only difference is that you immediately get a response containing the extraction_id of the process.
You then use this extraction_id to poll for the status and results of the extraction.

You can try out the async extraction with the following recipe. Currently there are only examples in Python; other languages will be added soon.



If the process was triggered successfully, you will get an HTTP 202 Accepted response whose body contains the extraction_id of the asynchronous process:
{
    "extraction_id": "0d14338251a6db69bfec36face27f7edcab7322"
}
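
A minimal Python sketch of handling this response is shown here. It assumes you already have the HTTP status code and the raw body text from whichever HTTP client you use; the helper name `parse_trigger_response` is hypothetical and not part of the API.

```python
import json


def parse_trigger_response(status_code: int, body: str) -> str:
    """Return the extraction_id from a 202 Accepted trigger response.

    Hypothetical helper: the name and the error handling are illustrative.
    """
    if status_code != 202:
        raise RuntimeError(f"Expected HTTP 202 Accepted, got {status_code}")
    payload = json.loads(body)
    extraction_id = payload.get("extraction_id")
    if not extraction_id:
        raise ValueError("Response body does not contain an extraction_id")
    return extraction_id
```

Keeping the parsing separate from the HTTP call makes it easy to reuse the same check no matter which client library sends the request.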

To poll the data, you can then use the extraction_id from the response:

You will always get a successful response from the poll endpoint (unless a catastrophe happens!).
The polled data response always has the same format, with the following properties:

  • error
  • result
  • status

Example:

{
  "error": {},
  "result": {
    "customer": "customer-id",
    "extracted_fields": [
      {
        "data_type": "AUTHOR",
        "name": "supplier_name",
        "values": [
          {
            "confidence_score": 0.958,
            "height": -1,
            "page_number": -1,
            "value": "ScaleGrid",
            "width": -1,
            "x": -1,
            "y": -1
          }
        ]
      }, ...
    ],
    "file_name": "invoice.pdf",
    "line_items": [],
    "object_id": "0d143385c4fb3ec7b73256be40c4ce02b01bf097",
    "vat_rates": []
  },
  "status": "SUCCESS"
}
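
As an illustration of reading this structure, the sketch below collects one value per extracted field from the `result` object. Picking the value with the highest `confidence_score` is an assumption made for the example, not a rule prescribed by the API, and the helper name `best_values` is hypothetical.

```python
from typing import Any, Dict


def best_values(result: Dict[str, Any]) -> Dict[str, Any]:
    """Map each extracted field name to its highest-confidence value."""
    best: Dict[str, Any] = {}
    for field in result.get("extracted_fields", []):
        values = field.get("values", [])
        if not values:
            # Skip fields for which nothing was extracted.
            continue
        top = max(values, key=lambda v: v.get("confidence_score", 0.0))
        best[field["name"]] = top["value"]
    return best
```

Called on the `result` object of the example above, this would give `supplier_name` the value `ScaleGrid`.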

The error property includes any errors that occur during processing. Most errors relate to an invalid input file. Errors have the standard error format, which is also used on all other endpoints, with the following properties:

  • code
  • message
  • details
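
A short sketch of turning the error object into a log-friendly message might look like this. The helper name `describe_error` and the output layout are hypothetical; the types of `code`, `message`, and `details` are not specified here, so the code only formats whatever it receives.

```python
from typing import Any, Dict, Optional


def describe_error(error: Dict[str, Any]) -> Optional[str]:
    """Return a one-line description of the error object, or None if it is empty."""
    if not error:
        return None
    code = error.get("code", "unknown")
    message = error.get("message", "")
    details = error.get("details")
    text = f"[{code}] {message}"
    if details:
        # Details are appended as-is because their structure is not known here.
        text += f" ({details})"
    return text
```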

The result property is an empty object if processing has not finished.
After the process is completed, it includes the results of the extraction in the same format as the synchronous endpoint. You can read more about the response here.

The status property includes the current status of the process. It has four predefined states:

  • IN_PROGRESS
  • SUCCESS
  • ERROR
  • EXPIRED
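
The four states suggest a simple polling loop: keep asking while the status is IN_PROGRESS and stop on any other state. The sketch below assumes a caller-supplied `fetch_status` function that performs the actual poll request and returns the decoded JSON body; that function, the name `poll_until_done`, and the default interval and attempt limit are all hypothetical choices for illustration.

```python
import time
from typing import Any, Callable, Dict

# States after which further polling cannot change the outcome.
TERMINAL_STATES = {"SUCCESS", "ERROR", "EXPIRED"}


def poll_until_done(
    fetch_status: Callable[[str], Dict[str, Any]],
    extraction_id: str,
    interval_s: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the extraction leaves IN_PROGRESS or the attempts run out."""
    for _ in range(max_attempts):
        payload = fetch_status(extraction_id)
        if payload.get("status") in TERMINAL_STATES:
            return payload
        sleep(interval_s)
    raise TimeoutError(
        f"Extraction {extraction_id} still in progress after {max_attempts} attempts"
    )
```

Passing the fetch function in as a parameter keeps the loop independent of any particular HTTP client and lets you exercise it with canned responses.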

📘

EXPIRED status

The process gets an EXPIRED status 48 hours after it has finished.
This means that you have 48 hours to poll the data and access the results. Afterwards, the data is deleted.
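
If you want to track the retention window on the client side, a rough sketch could look like the following. The poll response shown on this page does not include a completion timestamp, so the code assumes you record `finished_at` yourself, for example when you first see a SUCCESS status. Because that moment is later than the actual completion, treat the computed deadline as an optimistic estimate. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Retention window stated in the documentation: 48 hours after the process finishes.
RETENTION = timedelta(hours=48)


def results_available_until(finished_at: datetime) -> datetime:
    """Estimate when the results stop being available."""
    return finished_at + RETENTION


def is_probably_expired(finished_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True if the estimated retention window has already passed."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= results_available_until(finished_at)
```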