Entity Extraction in KAPTO: Simple and Unbounded Fields

KAPTO can identify and extract entities from documents based on a predefined entity schema. This process, known as entity extraction, locates and categorizes specified entities within a document, such as names, dates, or custom-defined categories.

KAPTO's extraction model is trained to recognize these entities. Given an entity schema (a structured description of the entity types to identify), the model scans a document, recognizes matching entities, and extracts their values.
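To make the schema-driven flow concrete, here is a minimal sketch in which a schema lists entity types and an extractor scans the document for each one. This is not KAPTO's trained model. Simple regular expressions stand in for what the model learns, and the names `EntityType`, `SCHEMA`, `extract`, and the field names are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class EntityType:
    """One entry in a hypothetical entity schema."""
    name: str      # label assigned to the extracted value
    pattern: str   # regex stand-in for what a trained model would recognize


# Hypothetical schema for a shipping note; field names are illustrative only.
SCHEMA = [
    EntityType("supplier_name", r"Supplier:\s*(.+)"),
    EntityType("ship_date", r"Date:\s*(\d{4}-\d{2}-\d{2})"),
]


def extract(document: str, schema: list[EntityType]) -> dict[str, str]:
    """Scan the document once per entity type and keep the first match."""
    found = {}
    for entity in schema:
        match = re.search(entity.pattern, document)
        if match:
            found[entity.name] = match.group(1).strip()
    return found


doc = "Supplier: Acme Freight Ltd\nDate: 2024-03-01\n"
print(extract(doc, SCHEMA))
```

The schema is kept separate from the extraction logic, so adding a new entity type means adding a schema entry rather than changing the extractor.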

KAPTO exposes the extracted entities through an API that returns the entities identified in a document together with their values. A single API call retrieves the list of entities found in a document, along with their associated values.
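The following sketch shows how a client might turn such a response into a list of entity/value pairs. The endpoint, authentication, and response format of the KAPTO API are not described here, so the JSON shape below (`document_id`, `entities`, `name`, `value`) is an assumption for illustration only. Only the parsing step is shown, using Python's standard `json` module.

```python
import json

# Hypothetical response body; the real KAPTO API's field names and nesting may differ.
RAW_RESPONSE = """
{
  "document_id": "doc-123",
  "entities": [
    {"name": "supplier_name", "value": "Acme Freight Ltd"},
    {"name": "ship_date", "value": "2024-03-01"}
  ]
}
"""


def list_entities(raw: str) -> list[tuple[str, str]]:
    """Return (entity name, value) pairs from the assumed response shape."""
    payload = json.loads(raw)
    return [(item["name"], item["value"]) for item in payload.get("entities", [])]


for name, value in list_entities(RAW_RESPONSE):
    print(f"{name}: {value}")
```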

Within the entity schema, fields can be designated as either "simple" or "unbounded":

  • Simple Fields: A simple field has a single logical value ("instance") per document, even if that information appears several times within the document. For example, the supplier's name on a shipping note is typically a simple field.

  • Unbounded Fields: An unbounded field belongs to a structure that must be captured multiple times in the same document. Examples include the line items on a shipping note or invoice and the related parties in a writ of summons. Such structures can occur many times within a document, and each occurrence carries its own distinct value.
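One way to picture the difference between the two kinds of field is by the shape of the extracted result: a simple field maps to one value, while an unbounded field maps to a list with one record per occurrence. The sketch below assumes that representation; the schema format, the field names, and the helper `check_shape` are hypothetical and not taken from KAPTO.

```python
from typing import Any

# Hypothetical schema: each field is marked "simple" or "unbounded".
SCHEMA = {
    "supplier_name": "simple",
    "line_items": "unbounded",
}

# Hypothetical extraction result for one shipping note.
RESULT = {
    "supplier_name": "Acme Freight Ltd",  # one logical value per document
    "line_items": [                        # one record per occurrence
        {"description": "Pallet of bricks", "quantity": 2},
        {"description": "Bag of cement", "quantity": 10},
    ],
}


def check_shape(result: dict[str, Any], schema: dict[str, str]) -> list[str]:
    """Report fields whose shape does not match their simple/unbounded kind."""
    problems = []
    for field, kind in schema.items():
        value = result.get(field)
        if value is None:
            continue  # field not found in this document
        if kind == "simple" and isinstance(value, list):
            problems.append(f"{field}: expected a single value, got a list")
        elif kind == "unbounded" and not isinstance(value, list):
            problems.append(f"{field}: expected a list of occurrences")
    return problems


print(check_shape(RESULT, SCHEMA))
```

Modelling unbounded fields as lists of records keeps the values of each occurrence (for example, a line item's description and quantity) grouped together instead of flattened into separate, unrelated values.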

In summary, KAPTO's entity extraction, combined with the distinction between simple and unbounded fields, allows users to extract structured information from complex documents, covering both single-valued details and repeating structures such as line items.