Newspaper Digitization

Why digitize?

Newspapers in particular document not only international but also regional and local history, politics, culture and social issues. Recently, young people's interest in local information has also been growing, which is one more reason to improve access to the archives of the regional press. General-purpose search engines such as Google find little of the information that originates in a regional newspaper's distribution area. Hardly any other source covers this as well as the local newspapers.

Reasons for the digitization of newspapers:

  • An often overlooked benefit is the time saved while still getting high-quality research results. A search in the digital archive takes seconds; a search through the bound volumes can take days before you find what you are looking for.
  • Marketing via birthday newspapers
  • Marketing via ePaper
  • Using the digital archive as an advertising medium: each page accessed can carry advertising, which generates additional advertising revenue.
  • Company histories, researched to order and produced as a gift book for board members, managing directors, etc.
  • Association histories, likewise produced and marketed to order as a book for association anniversaries
  • For genealogical research
  • For historical research
  • Enriching history teaching in schools: free use of the newspaper archive could broaden its use, strengthen reader loyalty and help win new subscribers.

XML & PDF

Information is like energy: it goes up in smoke if you don’t treat it right. And just like energy, information cannot be created from nothing. It is therefore important to preserve every possible piece of information throughout the entire workflow. This starts with scanning. High-quality text recognition (OCR) is only possible with carefully scanned documents that retain as much of the available information as possible while suppressing irrelevant detail such as dirt. But even after OCR, all available information should be preserved.
One format in particular promises this: XML. In XML, any component can be identified and stored using logical markup. In contrast to PDF, even seemingly unimportant elements that would otherwise be lost can be saved and reused later. This is especially important for article recognition: not only the text, but the entire context in which the article appears must be preserved. PDF is designed for optimal reproduction of appearance. It is almost impossible, however, to recover this context automatically from appearance alone.
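
To make the idea of logical markers concrete, here is a minimal sketch in Python. The element and attribute names (page, article, headline, byline, paragraph, continues-on) are invented for this illustration and do not follow any particular newspaper XML standard. The sketch only shows how an article can be read back together with its context (date, page, section, continuation) once that context has been stored as markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element and attribute names are assumptions made
# for this illustration, not taken from any real schema.
SAMPLE_PAGE = """\
<page date="1925-03-14" number="3" section="Local News">
  <article id="a1" continues-on="4">
    <headline>Town Council Approves New Bridge</headline>
    <byline>From our correspondent</byline>
    <paragraph>The council voted on Friday evening.</paragraph>
    <paragraph>Construction is to begin in May.</paragraph>
  </article>
  <article id="a2">
    <headline>Choir Celebrates Anniversary</headline>
    <paragraph>The choir marked fifty years with a concert.</paragraph>
  </article>
</page>
"""


def extract_articles(xml_text):
    """Return one dict per article, keeping the page-level context."""
    page = ET.fromstring(xml_text)
    articles = []
    for art in page.findall("article"):
        articles.append({
            "id": art.get("id"),
            "headline": art.findtext("headline", default=""),
            # Optional elements and attributes come back as None if absent.
            "byline": art.findtext("byline"),
            "continues_on": art.get("continues-on"),
            "text": " ".join(p.text or "" for p in art.findall("paragraph")),
            # Context inherited from the enclosing page element.
            "page_date": page.get("date"),
            "page_number": page.get("number"),
            "section": page.get("section"),
        })
    return articles


articles = extract_articles(SAMPLE_PAGE)
for a in articles:
    print(a["page_date"], a["section"], "-", a["headline"])
```

In a PDF, the same headline would typically be stored only as positioned text in a certain font size, so the link between headline, body, section and continuation page would have to be guessed from the layout.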