Automate Document Classification

Manually selecting doc types is a thing of the past. Quickly regain control of sprawling, unstructured document collections by automatically organizing them into logical groups based on similarity rankings from models that you train and manage.

Feature Collection with ESP

Grooper's ESP engine identifies the distinguishing features of each page to group collections of images together as classified documents. ESP uses three key feature collection mechanisms:

Lexical

Natural language processing (NLP) examines the language of the complete document to understand its meaning and determine what type of document it is.

Rules-Based

Rules-based matching finds unique keywords or phrases that positively identify a document, such as a title or section heading.

Visual

Computer vision identifies structured forms by what they look like, without needing to read OCR text.
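To make the lexical and rules-based mechanisms more concrete, here is a minimal Python sketch. It is not Grooper's implementation or API. It assumes that plain bag-of-words term counts stand in for lexical features (real NLP features are far richer), builds one profile per document type from trained examples, ranks a page's text by cosine similarity against each profile, and lets a hypothetical key-phrase rule override the ranking. The visual mechanism is not modeled, and the names `train`, `rank`, and `rules` are illustrative only.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical illustration only; not Grooper's ESP API.


def term_vector(text: str) -> Counter:
    """Lowercase bag-of-words counts, a crude stand-in for lexical features."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def train(examples: Dict[str, List[str]]) -> Dict[str, Counter]:
    """Build one aggregate term profile per doc type from its example texts."""
    profiles = {}
    for doc_type, texts in examples.items():
        profile = Counter()
        for text in texts:
            profile.update(term_vector(text))
        profiles[doc_type] = profile
    return profiles


def rank(
    profiles: Dict[str, Counter],
    text: str,
    rules: Optional[Dict[str, str]] = None,
) -> List[Tuple[str, float]]:
    """Return (doc_type, score) pairs sorted best-first.

    Lexical: cosine similarity to each trained profile.
    Rules-based (hypothetical): if a doc type's key phrase appears in the
    text, that type gets a score of 1.0, a positive identification.
    """
    vec = term_vector(text)
    scores = {doc_type: cosine(vec, profile) for doc_type, profile in profiles.items()}
    if rules:
        lowered = text.lower()
        for doc_type, phrase in rules.items():
            if phrase.lower() in lowered:
                scores[doc_type] = 1.0
    return sorted(scores.items(), key=lambda s: s[1], reverse=True)
```

For example, after training on a few invoice and lease examples, `rank(profiles, "Amount due on this invoice")` places "Invoice" first because it shares the most weighted terms, while passing `rules={"Lease": "oil and gas lease"}` forces any page containing that heading to rank as "Lease". Keeping rules as an override on top of similarity mirrors the idea that a title or section heading is a positive identifier, whereas similarity is a graded judgment.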

ESP Separation

Train document examples and see how the ESP Separation engine interprets the content of each page, groups pages into documents, and simulates page breaking and classification.

  • A simple "train-by-example" interface lets you quickly teach ESP how to identify each document.
  • Real-time confidence scores show you both the document type and assumed page number for each page in a batch.
  • Estimated Page Index (EPI) identifies page numbers on your documents. ESP uses this information to determine whether an unknown page is likely part of a surrounding document.
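The interplay between per-page classification and EPI can be illustrated with a simple separation pass. This Python sketch is an assumption-laden illustration, not ESP's actual algorithm: each page is assumed to carry a best-guess doc type (`None` when unknown) and an estimated page index (`None` when no page number was found). A new document starts at EPI 1 or when the doc type changes, and an unknown page joins the preceding document only when its EPI continues that document's page sequence. `Page`, `Document`, and `separate` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical illustration only; not ESP's separation algorithm.


@dataclass
class Page:
    doc_type: Optional[str]  # best-scoring doc type, or None if unknown
    epi: Optional[int]       # estimated page index (1 = first page), or None


@dataclass
class Document:
    doc_type: Optional[str]
    pages: List[Page] = field(default_factory=list)


def separate(pages: List[Page]) -> List[Document]:
    """Group an ordered batch of pages into documents."""
    docs: List[Document] = []
    for page in pages:
        current = docs[-1] if docs else None
        expected_next = len(current.pages) + 1 if current else 1
        starts_new = (
            current is None
            or page.epi == 1
            or (page.doc_type is not None and page.doc_type != current.doc_type)
        )
        # An unknown page stays with the surrounding document only if its EPI
        # continues the sequence (or no EPI was found); otherwise it breaks.
        if not starts_new and page.doc_type is None and page.epi not in (None, expected_next):
            starts_new = True
        if starts_new:
            docs.append(Document(page.doc_type, [page]))
        else:
            current.pages.append(page)
    return docs
```

With the batch `[Page("Lease", 1), Page(None, 2), Page("Lease", 3), Page("Invoice", 1), Page(None, 5)]`, the unknown second page is absorbed into the lease because its EPI of 2 is the next expected page, while the final unknown page (EPI 5 after a one-page invoice) is split off on its own.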

Classification

Provide document examples and watch as Grooper learns the correct Doc Type for each instrument provided. During batch testing, unclassified items (those with low confidence scores) can be flagged and sent to a queue for additional training.
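Flagging low-confidence items for additional training amounts to a threshold check. In this minimal Python sketch, the 0.8 cutoff, the `Result` record, and the `triage` function are all hypothetical; Grooper's own confidence thresholds and review-queue mechanics may work differently.

```python
from typing import List, NamedTuple, Tuple

# Hypothetical illustration only; threshold and names are assumptions.


class Result(NamedTuple):
    item_id: str
    doc_type: str
    confidence: float  # classifier score in [0, 1]


def triage(results: List[Result], threshold: float = 0.8) -> Tuple[List[Result], List[Result]]:
    """Split results into accepted classifications and a review queue.

    Items scoring below the threshold are treated as unclassified and
    routed to the queue so they can be reviewed and used as new training examples.
    """
    accepted: List[Result] = []
    review_queue: List[Result] = []
    for r in results:
        (accepted if r.confidence >= threshold else review_queue).append(r)
    return accepted, review_queue
```

Calling `triage([Result("p1", "Invoice", 0.93), Result("p2", "Lease", 0.41)])` accepts the invoice and queues the lease page. Raising the threshold sends more items to review in exchange for fewer misclassifications slipping through.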
