Invoice with metadata
Extracting additional metadata from invoices with examples in Python, Node, Java, and C#
Overview
This guide covers how to extract metadata from multiple supplier invoices with examples in Python, Node, Java, and C#.
You will extract the following fields:
- Name of the supplier
- Name of the receiver
- Invoice number
- Purchase order number
- Issue date
- Pay due date
- Total amount
This guide shows you how to:
- Create the metadata-invoice document type
- Add multiple suppliers
- Execute training
- Extract data from documents
- Continuously improve models after extraction
Get your API Key
The Authorization header for your API key is: <<apiKey>>
(Log in if you do not see one.)
You can also obtain the API key by visiting the Settings page.
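As a minimal sketch in Python, the header value can be kept out of source code and attached to every request. The environment variable name TYPLESS_AUTHORIZATION and both helper names are hypothetical, chosen for this example, and the JSON content type is an assumption about the request format.

```python
import os


def build_headers(authorization_value: str) -> dict[str, str]:
    """Return request headers; the Authorization value is passed through unchanged."""
    value = authorization_value.strip()
    if not value:
        raise ValueError("Authorization value is empty; copy it from the Settings page")
    return {
        "Authorization": value,
        # Assumption: request bodies are sent as JSON.
        "Content-Type": "application/json",
    }


def headers_from_env(var_name: str = "TYPLESS_AUTHORIZATION") -> dict[str, str]:
    """Read the header value from an environment variable (hypothetical name)."""
    return build_headers(os.environ.get(var_name, ""))
```

Reading the value from the environment keeps the key out of version control; any other secret store would work the same way.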
1. Create a new document type
Before you start extracting data, you need to define a document type. Navigate to the Dashboard page and click on the New document type button in the top right corner of the table. Next, select the Metadata invoice card. The wizard will already pre-fill all the needed extraction fields along with the document type configuration.
Click the Create document type button.
This creates a new document type named metadata-invoice with the following fields:
- supplier_name
- invoice_number
- purchase_order_number
- receiver_name
- issue_date
- pay_due_date
- total_amount
2. Add suppliers
typless is an automation tool, so you need to fill the dataset and train it first. To automate a new supplier, first add its invoices to the dataset. Download two invoices: one from Awesome Company and another from Good services.
To add a document to the dataset, use the add-document endpoint or the training room, where you can easily upload a file and fill out the necessary information.
The dataset is built by uploading each original file together with the correct value for every field defined in the document type.
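A request to the add-document endpoint could be sketched in Python as follows. Only the endpoint name, the document type name, and the field names come from this guide. The payload keys (file, file_name, document_type_name, learning_fields), the base64 encoding of the file, and the base URL parameter are assumptions to check against the API reference. The receiver, invoice number, and purchase order values are hypothetical placeholders.

```python
import base64
import json
import urllib.request


def build_add_document_request(base_url, authorization_value, file_bytes,
                               file_name, document_type_name, field_values):
    """Build (but do not send) a POST request that adds one document to the dataset."""
    payload = {
        # Assumed payload shape; verify the key names in the API reference.
        "document_type_name": document_type_name,
        "file_name": file_name,
        "file": base64.b64encode(file_bytes).decode("ascii"),
        "learning_fields": [
            {"name": name, "value": value} for name, value in field_values.items()
        ],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/add-document",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": authorization_value,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# One correct value per field of the metadata-invoice document type.
example_fields = {
    "supplier_name": "Amazing Company",
    "receiver_name": "Example Receiver",   # hypothetical placeholder
    "invoice_number": "INV-0001",          # hypothetical placeholder
    "purchase_order_number": "PO-0001",    # hypothetical placeholder
    "issue_date": "2021-02-01",
    "pay_due_date": "2021-03-31",
    "total_amount": "15.0000",
}
```

The returned object can be passed to urllib.request.urlopen once the base URL and Authorization value are filled in.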
To achieve high accuracy, typless only needs the values that appear on the document. Nevertheless, there are some rules to keep in mind when providing values.
Applying these rules to the Amazing Company example, we changed three fields:
- total_amount was converted with the number type rules from 15,00 to 15.0000
- issue_date was converted with the date type rules from the text Feb 1, 2021 to 2021-02-01
- pay_due_date was converted with the date type rules from the text Mar 31, 2021 to 2021-03-31
The same rules were also applied to the Good services example.
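The conversions listed above can be reproduced with a small Python helper. This is a sketch inferred only from the three examples in this guide: it assumes the comma is a decimal separator with no thousands separators, that amounts are written with four decimal places, and that dates are sent as YYYY-MM-DD. The function names are hypothetical, and the full type rules may cover more cases.

```python
from datetime import datetime
from decimal import Decimal


def normalize_amount(text: str) -> str:
    """Convert an amount such as '15,00' to a string with four decimal places."""
    # Assumption: ',' is the decimal separator and there are no thousands separators.
    value = Decimal(text.strip().replace(",", "."))
    return str(value.quantize(Decimal("0.0001")))


def normalize_date(text: str, source_format: str = "%b %d, %Y") -> str:
    """Convert a date such as 'Feb 1, 2021' to ISO format (YYYY-MM-DD)."""
    # %b matches abbreviated month names in the current locale (English by default).
    return datetime.strptime(text.strip(), source_format).date().isoformat()


print(normalize_amount("15,00"))       # 15.0000
print(normalize_date("Feb 1, 2021"))   # 2021-02-01
print(normalize_date("Mar 31, 2021"))  # 2021-03-31
```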
Once both documents have been added, your document type will have two suppliers.
3. Execute training
Training is executed automatically every day at 10 PM CET for all of your suppliers with new documents in the dataset, across all your document types. It is free of charge.
To see the results immediately, you can trigger the training process on the Dashboard page. Look for the metadata-invoice document type in the list and start the training from there.
The training for the example should finish in a matter of seconds.
Need more information? Read more about training.
4. Extract data from documents
After the training is finished, you can start precisely extracting data from documents from trained suppliers. Here you have two new invoices from the trained suppliers:
Download them and extract the data using the recipes:
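An extraction call could be sketched in Python as follows. The endpoint name extract-data, the payload keys, and the base URL parameter are assumptions not stated in this guide, so confirm them in the API reference before use. The helper names are hypothetical.

```python
import base64
import json
import urllib.request


def build_extract_request(base_url, authorization_value, file_bytes,
                          file_name, document_type_name,
                          endpoint="extract-data"):
    """Build a POST request asking for extraction from one document."""
    payload = {
        # Assumed payload shape; verify the key names in the API reference.
        "document_type_name": document_type_name,
        "file_name": file_name,
        "file": base64.b64encode(file_bytes).decode("ascii"),
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": authorization_value,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=60.0):
    """Send a prepared request and parse the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it makes the payload easy to inspect or log before any network call is made.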
Need a more in-depth explanation of the response?
You can read about it here.
5. Continuously improve models
typless embraces the fact that the world is changing all the time.
That's why you can improve models on the fly by providing correct data after extraction.
Let's say your company has a new partner, Best supplier. You don't need to start over with building the dataset. You can simply extract the data and send the correct values after your users have verified them.
You can learn more about providing feedback on the building dataset page.
Closed workflow loop - improve models live!
Use every action from your users to adapt and improve typless models without any extra costs.
To send feedback, use the add-document-feedback endpoint with the object_id.
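A feedback request could be sketched in Python like this. The endpoint name and object_id come from this guide. The payload keys, the assumption that object_id is taken from the earlier extraction response, and the base URL parameter are not confirmed here and should be checked against the API reference.

```python
import json
import urllib.request


def build_feedback_request(base_url, authorization_value, document_type_name,
                           object_id, verified_values):
    """Build a POST request that sends user-verified values for an extracted document."""
    payload = {
        # Assumed payload shape; verify the key names in the API reference.
        "document_type_name": document_type_name,
        # Assumption: object_id identifies the earlier extraction result.
        "object_id": object_id,
        "learning_fields": [
            {"name": name, "value": value} for name, value in verified_values.items()
        ],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/add-document-feedback",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": authorization_value,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending only values that a user has verified keeps wrong values out of the dataset.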
Running typless live
The only thing that you need to do to automate your manual data entry is to integrate those simple API calls into your system.
Have any questions or need some help? Contact us in chat.