Image Optimization

Grooper enhances images displayed to users throughout your workflow, removes image artifacts known to interfere with OCR, provides crisp versions for permanent archival, and analyzes page structure to assist with automated decision making downstream.

Our first goal was to

Prepare Images for Optimal OCR

Our R&D team has proven that good OCR starts with images free of non-text artifacts. Grooper has dozens of features that remove everything that isn’t text to ensure you get distraction-free OCR. Let's look at a few examples.

Safe and clean

Halftone Removal

Dithering and other halftone patterns are a direct result of legacy document imaging platforms poorly converting color images to black and white. These artifacts must be eliminated to prevent massive errors in OCR results, particularly with punctuation like periods and commas.

    Before

    Halftone artifacts completely surround text we'd like to capture. OCR stands very little chance at seeing these characters.

    After

    Grooper recognizes dithered patterns and safely removes them without eliminating legitimate punctuation that is close to letters on the page.
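To make the idea concrete, here is a minimal sketch of one generic way to strip dither dots: erase every connected blob of ink smaller than a chosen area. This is a plain despeckle filter, not Grooper's actual algorithm. The function name `despeckle`, the `min_area` parameter, and the list-of-lists image format (1 = ink, 0 = background) are all assumptions made for illustration.

```python
from collections import deque


def despeckle(image, min_area=3):
    """Return a copy of a binary image with tiny ink blobs erased.

    Illustrative only: blobs are 4-connected components, so dither dots
    that touch only diagonally are counted as separate blobs.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    cleaned = [row[:] for row in image]

    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Collect every ink pixel reachable from (y, x).
            component = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and image[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Blobs below the area threshold are treated as dither noise.
            if len(component) < min_area:
                for cy, cx in component:
                    cleaned[cy][cx] = 0
    return cleaned


# A vertical letter stroke, a 2x2 period, and four stray dither dots.
page = [
    [1, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [1, 0, 1, 0, 1, 1],
]
cleaned_page = despeckle(page, min_area=3)
```

In this toy page the single-pixel dots fall below the threshold and are erased, while the stroke and the 2x2 period survive. A pure area threshold like this would also erase punctuation that is as small as the dither dots, which is exactly the case a production tool has to handle more carefully.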

Seriously brilliant

Border Removal

Borders have historically been very tricky to remove when the black region doesn't extend all the way to the edge of the page. Grooper understands how to address a variety of uncommon border scenarios to cleanly remove them.

Wait, is this Photoshop?

Inpainted Removal

You work with full-color documents every day. Why shouldn't your image processing do the same? Our object removal strategies break out of the black and white color space and allow you to remove artifacts from color images like they were never there.

Pixel-perfect

Line Detection & Removal

Lines are used throughout standardized forms, table structures, and pages with "fill-in-the-blank" comb boxes to provide visual cues that increase legibility for readers. These lines, particularly the short, vertical ones, are commonly picked up by OCR engines as characters. Grooper can erase these with ease.

  • Dropout is performed using a very precise, pixel-by-pixel mask of lines rather than a generic point-to-point/thickness approach. This technique leaves no edge artifacts behind, providing a cleaner OCR image.
  • Works well with very short lines, even those smaller than many of the letters on your page. Grooper knows the difference between lines that should be removed and characters like "l", "I", and "1".
  • Characters connected to lines are detected and preserved, keeping valuable data for your OCR process.
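The mask-based dropout described in the list above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Grooper's implementation: it handles horizontal lines only, and the names `vertical_extent`, `line_mask`, `remove_lines`, `min_length`, and `max_thickness` are hypothetical. A pixel is masked only when it sits in a long horizontal run and the column of ink through it is thin, so a character stroke that crosses the line is left alone.

```python
def vertical_extent(image, y, x):
    """Count the unbroken column of ink pixels passing through (y, x)."""
    top = y
    while top > 0 and image[top - 1][x] == 1:
        top -= 1
    bottom = y
    while bottom < len(image) - 1 and image[bottom + 1][x] == 1:
        bottom += 1
    return bottom - top + 1


def line_mask(image, min_length=5, max_thickness=1):
    """Build a per-pixel mask of horizontal line pixels (1 = ink)."""
    height = len(image)
    width = len(image[0]) if height else 0
    mask = [[False] * width for _ in range(height)]
    for y in range(height):
        x = 0
        while x < width:
            if image[y][x] != 1:
                x += 1
                continue
            start = x
            while x < width and image[y][x] == 1:
                x += 1
            if x - start < min_length:
                continue  # too short to count as a line in this sketch
            for px in range(start, x):
                # Thick columns belong to a character crossing the line.
                if vertical_extent(image, y, px) <= max_thickness:
                    mask[y][px] = True
    return mask


def remove_lines(image, min_length=5, max_thickness=1):
    """Return a copy of the image with masked line pixels erased."""
    mask = line_mask(image, min_length, max_thickness)
    return [
        [0 if mask[y][x] else pixel for x, pixel in enumerate(row)]
        for y, row in enumerate(image)
    ]


# A form underline on the last row with a letter stroke running into it.
form = [
    [0, 0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1],
]
cleaned_form = remove_lines(form)
```

Here the underline is erased pixel by pixel while the stroke keeps its full height, including the pixel it shares with the line. Vertical lines could be handled the same way by transposing the image first. Telling a short vertical line apart from an "l" or a "1" takes more context than this sketch uses.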