Scan Technology

Our scanning systems are specially designed for large-format books. We scan double pages without distortion at the gutter. Our capacity is 250,000 pages per month.

The choice of the output format is determined by the original:

  • Black/white: daily newspapers printed in black and white
  • Grayscale: gravure-printed material such as magazines and catalogues
  • Color: newspapers, books and magazines printed in color

We use the right scanning technology for every medium.

The move to our new production facilities also involved converting the scanner drives from pneumatic to electronic operation. We now achieve a scanning quality that was previously unattainable. The contact pressure can be controlled more precisely, and the susceptibility to faults is extremely low. As a result, we have considerably increased both the volume and the quality of our output.
In practice, a resolution of 300 dpi has proven to give the best results for subsequent text recognition. The background and the spaces between the letters are rendered in pure white, free of dirt particles. The quality of the scans is decisive for all further steps in the workflow.

Scan quality is essential for every subsequent step, up to and including article recognition:

  • Distortion-free scans of double pages in bound books
  • Correct alignment of pages
  • Clean background, with stray pixels removed even between letters and lines
  • High-contrast reproduction of the characters
  • Visual inspection of the pages; manual correction of any existing errors
  • Software check for error-free naming and completeness
  • In our experience, a scan resolution of 300 dpi yields the best OCR results
  • Black-and-white pages are scanned as bitonal images, color pages in color
  • Our scanning throughput is currently around 250,000 pages per month
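
The software check for naming and completeness mentioned above could look roughly like the following Python sketch. It is an illustration only: the naming rule (a file name ending in an underscore and a page number, such as "..._0007.tif") and the function name check_issue are assumptions, not a description of the software actually in use.

```python
import re

# Hypothetical naming rule, assumed for illustration only:
# the file name ends in "_<page number>.tif" or "_<page number>.jpg".
PAGE_SUFFIX = re.compile(r"_(\d+)\.(?:tif|jpg)$", re.IGNORECASE)


def check_issue(file_names, expected_pages):
    """Return (badly_named, missing_pages, duplicate_pages) for one issue."""
    badly_named = []
    seen = {}  # page number -> how many files claim that page
    for name in file_names:
        match = PAGE_SUFFIX.search(name)
        if match is None:
            # The name does not follow the assumed rule.
            badly_named.append(name)
            continue
        page = int(match.group(1))
        seen[page] = seen.get(page, 0) + 1

    # Pages 1..expected_pages that no file covers.
    missing = [p for p in range(1, expected_pages + 1) if p not in seen]
    # Pages that more than one file claims.
    duplicates = sorted(p for p, count in seen.items() if count > 1)
    return badly_named, missing, duplicates
```

A batch passes this check when all three returned lists are empty. Anything else would go back for manual correction.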

How we process the pages

The scanned pages pass through our intelligent image processing (IBB), which removes dirt particles within the page and draws a cleanly defined border. The pages are delivered as TIF or JPG and then go through OCR with optimized layout recognition, which ensures good results when the individual articles are separated. We then deliver the pages as PDFs with an underlying text layer. So that the contents can be passed on in full, we store all page information as native XML files.

The metadata of each page is generated from the file name:

  • Issue number of the newspaper
  • Abbreviation of the newspaper
  • Publication date
  • Page number
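
As a minimal sketch of how these four fields could be read from a file name: the scheme used here (abbreviation, date as YYYYMMDD, issue number and page number, separated by underscores) is purely hypothetical, since the actual naming convention is not stated. The names PageMetadata and parse_page_metadata are likewise assumptions.

```python
import re
from dataclasses import dataclass
from datetime import date, datetime

# Hypothetical scheme, assumed for illustration only:
#   <abbreviation>_<YYYYMMDD>_<issue number>_<page number>.tif  (or .jpg)
FILE_NAME = re.compile(
    r"^(?P<abbrev>[A-Za-z]+)_(?P<date>\d{8})_(?P<issue>\d+)_(?P<page>\d+)"
    r"\.(?:tif|jpg)$",
    re.IGNORECASE,
)


@dataclass(frozen=True)
class PageMetadata:
    newspaper: str          # abbreviation of the newspaper
    issue_number: int
    publication_date: date
    page_number: int


def parse_page_metadata(file_name: str) -> PageMetadata:
    """Derive the page metadata from a file name that follows the assumed scheme."""
    match = FILE_NAME.match(file_name)
    if match is None:
        raise ValueError(f"file name does not follow the assumed scheme: {file_name!r}")
    return PageMetadata(
        newspaper=match["abbrev"],
        issue_number=int(match["issue"]),
        # strptime also rejects impossible dates such as 19541340.
        publication_date=datetime.strptime(match["date"], "%Y%m%d").date(),
        page_number=int(match["page"]),
    )
```

Under this assumed scheme, a file named ABC_19540312_061_0007.tif would be read as page 7 of issue 61 of the newspaper "ABC", published on 12 March 1954.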

This data is passed on to each article when the individual articles are separated, together with the coordinates of the article. Our layout recognition system marks and selects this information so that it is available for the subsequent article recognition and the PPS-Finder.
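
The following Python sketch shows one possible way to picture an article record that carries both the page metadata and its own coordinates. The field names, the pixel bounding box and the function attach_page_metadata are assumptions made for illustration. The format actually used by the layout recognition system is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArticleRegion:
    # Page-level metadata, copied unchanged from the page the article came from.
    newspaper: str
    issue_number: int
    publication_date: str  # ISO date string, e.g. "1954-03-12"
    page_number: int
    # Position of the article on the page (assumed: bounding box in pixels).
    left: int
    top: int
    width: int
    height: int


def attach_page_metadata(page: dict, boxes: list) -> list:
    """Create one ArticleRegion per (left, top, width, height) box on a page."""
    return [
        ArticleRegion(**page, left=left, top=top, width=width, height=height)
        for (left, top, width, height) in boxes
    ]
```

Because every record repeats the page metadata, a single article can later be traced back to its newspaper, issue, date and page without reopening the page file.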