Picture Extraction & Facsimile

We automatically cut out editorial pictures and photos from newspapers. The following metadata is generated for each picture:

  • Picture ID
  • Picture caption
  • Coordinates of the picture
  • Issue number
  • Page number
  • Publication date

Metadata can be delivered in HTML or XML, according to customer requirements. About 85-90% of advertising images are filtered out.
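As an illustration of the XML delivery, the metadata fields listed above could be serialized roughly as follows. This is only a sketch: the element and attribute names (picture, caption, coordinates, issue, page, date), the PictureRecord class, and the x/y/width/height coordinate layout are assumptions made for the example, not the actual delivery schema, which is adapted per customer.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class PictureRecord:
    """Hypothetical container for the metadata of one extracted picture."""
    picture_id: str
    caption: str
    x: int                 # left edge on the page (assumed pixel units)
    y: int                 # top edge on the page
    width: int
    height: int
    issue_number: str
    page_number: int
    publication_date: str  # assumed ISO 8601 date, e.g. "2020-05-17"


def picture_to_xml(rec: PictureRecord) -> str:
    """Serialize one record to an XML string (illustrative schema only)."""
    root = ET.Element("picture", id=rec.picture_id)
    ET.SubElement(root, "caption").text = rec.caption
    ET.SubElement(
        root,
        "coordinates",
        x=str(rec.x),
        y=str(rec.y),
        width=str(rec.width),
        height=str(rec.height),
    )
    ET.SubElement(root, "issue").text = rec.issue_number
    ET.SubElement(root, "page").text = str(rec.page_number)
    ET.SubElement(root, "date").text = rec.publication_date
    # ElementTree escapes characters such as & and < in text and attributes.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    example = PictureRecord(
        "p-001", "Harbour at dawn", 120, 340, 800, 600, "42", 3, "2020-05-17"
    )
    print(picture_to_xml(example))
```

An HTML delivery could reuse the same record and only swap the serializer.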

Pages without pictures:

The pictures are removed from the page facsimiles so that the photographers' copyright is not infringed.
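Conceptually, removing pictures from a page amounts to blanking the regions given by the picture coordinates. The sketch below shows the idea on a page modelled as a plain grid of grayscale values; the blank_regions function, the (x, y, width, height) box format, and the white fill value of 255 are assumptions for illustration and do not describe the production process.

```python
def blank_regions(page, boxes, fill=255):
    """Return a copy of `page` with every box painted over with `fill`.

    page  -- list of rows, each row a list of grayscale values (0-255)
    boxes -- iterable of (x, y, width, height) tuples, assumed pixel units
    """
    out = [row[:] for row in page]          # leave the original page untouched
    page_height = len(out)
    page_width = len(out[0]) if out else 0
    for x, y, width, height in boxes:
        # Clamp each box to the page so partially outside boxes are safe.
        for row in range(max(y, 0), min(y + height, page_height)):
            for col in range(max(x, 0), min(x + width, page_width)):
                out[row][col] = fill
    return out


if __name__ == "__main__":
    page = [[0] * 6 for _ in range(4)]      # tiny all-black "page"
    cleaned = blank_regions(page, [(1, 1, 3, 2)])
    for row in cleaned:
        print(row)
```

Working on a copy keeps the original scan available, which matters if the same page is also delivered with pictures to rights holders.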


For digiPaper and ePaper, we automatically cut out the articles and pictures and generate the article text in XML format. We adapt the XML schema individually to the customer’s requirements.