Article Segmentation

Automatic Article Segmentation (AAS), software developed by PPS, structures the blocks generated from the FR native XMLs using complex algorithms and various analysis methods and arranges them in the correct reading order. Advertisements are filtered out, and obituaries are recognized and tagged accordingly. Using typographic analysis, the AAS also recognizes uptitles, main titles, subtitles, opening credits, article text and picture captions, and tags them accordingly.
On request, we also deliver output compliant with msh Web:digiPaper, DC-X and fink & PARTNER huGO.
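
To illustrate what tagging by typographic analysis can look like, here is a minimal sketch in Python. It is not the AAS implementation: the Block class, the tag_blocks function and the rule itself (comparing each block's font size with the median size on the page) are assumptions chosen for illustration only. A real system would weigh many more typographic features.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Block:
    """Hypothetical text block as it might come out of an OCR layout XML."""
    text: str
    font_size: float  # in points
    top: float        # vertical position; smaller means higher on the page


def tag_blocks(blocks):
    """Tag each block by comparing its font size with the page's body size.

    Assumption: the median font size is the body text size, the largest
    size above it marks the title, and anything in between is a subtitle.
    """
    body_size = median(b.font_size for b in blocks)
    largest = max(b.font_size for b in blocks)
    tagged = []
    # Process blocks top to bottom so the result follows the page order.
    for block in sorted(blocks, key=lambda b: b.top):
        if block.font_size > body_size and block.font_size == largest:
            role = "title"
        elif block.font_size > body_size:
            role = "subtitle"
        else:
            role = "article text"
        tagged.append((role, block.text))
    return tagged
```

Called on a headline set in 28 pt, a subheading in 14 pt and several 9 pt paragraphs, this sketch tags the first block as the title, the second as a subtitle and the rest as article text.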

We analyse the following article elements:

  • uptitle
  • title
  • subtitles
  • description
  • picture caption
  • image
  • article text
  • authors
  • department
  • columns
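
As a rough sketch, the elements listed above could be held in a record like the following. The Article class, its field names and its types are hypothetical and do not describe the delivered format. In particular, treating "columns" as a column count and "image" as a list of file references are assumptions.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class Article:
    """Hypothetical container for the elements of one segmented article."""
    title: str
    article_text: str
    uptitle: Optional[str] = None
    subtitles: List[str] = field(default_factory=list)
    description: Optional[str] = None
    picture_captions: List[str] = field(default_factory=list)
    images: List[str] = field(default_factory=list)  # assumed: image file references
    authors: List[str] = field(default_factory=list)
    department: Optional[str] = None
    columns: Optional[int] = None  # assumed: number of text columns


# Usage: elements that were not found on the page simply stay empty.
article = Article(
    title="City council approves budget",
    article_text="The council met on Monday.",
    authors=["Jane Doe"],
    department="Local",
    columns=3,
)
record = asdict(article)  # plain dict, ready for export to another format
```

Keeping optional elements as None or empty lists lets an article without an uptitle or image use the same record as a fully illustrated one.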

Article types we recognize

Editorial articles 98%
Ads 65%
Obituaries 87%
Images and captions 90%

Average accuracy of our recognition in daily newspapers

Reading order of the article 80%
Layout analysis 87%
OCR recognition 99.9%
These figures depend on the layout of the page, the quality of the individual page elements and the print quality.