Image-walking

It is easy to underestimate how difficult it is to isolate the glyphs within a document. To start with, the variety of documents that can be presented for transcription is enormous. They can be dark, smudged, torn, with or without a border, straight or crooked, rows can be sloping or straight, writing can slope to the…
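As an illustration only (the excerpt does not say how glyphs are actually isolated here), a common first step is connected-component labelling: threshold the page into ink and background, then flood-fill each connected blob of ink and record its bounding box. The sketch below assumes a grayscale image given as a list of rows of 0 to 255 values with dark ink on a light background; the function name `isolate_glyphs` and the `threshold=128` default are hypothetical. On its own this approach splits broken characters and merges touching ones, which is exactly why the variety of documents described above makes the problem hard.

```python
from collections import deque


def isolate_glyphs(image, threshold=128):
    """Hypothetical helper: return (top, left, bottom, right) boxes of 8-connected ink blobs.

    Assumes `image` is a list of rows of grayscale values (0 = black, 255 = white)
    and that any pixel darker than `threshold` is ink.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    ink = [[pixel < threshold for pixel in row] for row in image]
    seen = [[False] * width for _ in range(height)]
    boxes = []

    # Raster-scan the page; each unvisited ink pixel starts a new component.
    for y in range(height):
        for x in range(width):
            if not ink[y][x] or seen[y][x]:
                continue
            queue = deque([(y, x)])
            seen[y][x] = True
            top, left, bottom, right = y, x, y, x
            # Breadth-first flood fill over the 8 neighbours of each pixel.
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and ink[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# A tiny synthetic "page" with three separate marks.
page = [
    [255, 0, 255, 255, 0],
    [255, 0, 255, 255, 0],
    [255, 255, 255, 255, 255],
    [0, 0, 255, 255, 255],
]
print(isolate_glyphs(page))  # [(0, 1, 1, 1), (0, 4, 1, 4), (3, 0, 3, 1)]
```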


Getting to a Trained Model

As mentioned previously, in order to train the model you need to identify the shapes (characters) that it will have to recognise. Essentially you are trying to deconstruct a document into its constituent character shapes and then to attach a meaning to each of those shapes. The result should be that if you have identified a sufficiently representative sample of…

Training and Transcribing

The choice of Machine Learning (ML) as the recognition method necessitates a specific approach to the recognition problem. 'Learning' refers to the ML model being provided with a set of examples and their meanings so that, when the model is later presented with new examples, it can successfully interpret them and ascribe the correct meaning to…
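To make the idea of "learning from examples and their meanings" concrete, here is a deliberately simple sketch using a 1-nearest-neighbour classifier. This is a swapped-in illustration, not the model the post uses (the excerpt does not name one). It assumes each glyph has already been flattened into a tuple of pixel values; the functions `train` and `classify` and the 3×3 sample glyphs are hypothetical. "Training" here is just storing the labelled examples, and interpreting a new glyph means returning the label of the stored example it is closest to.

```python
import math


def train(examples):
    """Hypothetical 1-nearest-neighbour 'training': keep the (features, label) pairs."""
    return list(examples)


def classify(model, features):
    """Return the label of the stored example with the smallest Euclidean distance."""
    best_label, best_dist = None, math.inf
    for stored, label in model:
        dist = math.dist(stored, features)  # requires Python 3.8+
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# Two labelled 3x3 glyphs, flattened row by row (1 = ink, 0 = background).
model = train([
    ((0, 1, 0,
      0, 1, 0,
      0, 1, 0), "|"),
    ((0, 0, 0,
      1, 1, 1,
      0, 0, 0), "-"),
])

# A new, slightly imperfect vertical stroke.
sample = (0, 1, 0,
          0, 1, 0,
          0, 1, 1)
print(classify(model, sample))  # |
```

The design choice is purely for clarity: a nearest-neighbour model shows the train-then-interpret cycle in a few lines, but in practice a trained model would generalise far better across the smudged, sloping and crooked samples real documents produce.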

Document vs. an Image of a Document

It may be obvious to many, but it will not be so obvious to many more. When you look at a scanned or photographed image of a document on a computer screen, you see much the same thing as when you look at the original document, and you have no trouble reading it, assuming the…

How Did It Start?

This whole idea started in 2015 with an interest in handwriting recognition. Since 2013 we had been engaged in digitising an archive of old documents, and the process of manually transcribing the documents, or even of extracting meaningful information from them, was daunting. I thought that if this could be done automatically, even if…