Developer Guide
Project design
The design of Xournal++ HTR aims to serve two goals at once: delivering a production-ready product and allowing contributors to experiment with new algorithms.
The project consists of a Lua plugin and a Python backend, as shown in the following figure. The production-ready product is delivered as an Xournal++ plugin. The plugin is fully integrated into Xournal++ and calls a Python backend that performs the actual transcription. The Python backend supports a selection of recognition models and can therefore be extended with new models.
sequenceDiagram
User in Xpp-->>Xpp HTR Plugin: starts transcription process using currently open file
Xpp HTR Plugin -->> Xpp HTR Lua Plugin: calls
Xpp HTR Lua Plugin -->>Xpp HTR Python Backend: constructs command using CLI
Xpp HTR Python Backend -->> Xpp HTR Python Backend: Does OCR & stores PDF
Xpp HTR Python Backend-->>User in Xpp: Gives back control to UI
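The backend side of this flow can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the flag names (`--input-file`, `--output-file`, `--model`), the `MODELS` registry and the `dummy` model are all hypothetical, and the sketch writes plain text rather than the PDF that the real backend stores.

```python
import argparse
from pathlib import Path
from typing import Callable, Dict

# Hypothetical registry mapping a model name to a transcription function.
# Neither the names nor the signatures are taken from the Xournal++ HTR code base.
MODELS: Dict[str, Callable[[Path], str]] = {}


def register_model(name: str):
    """Decorator that adds a transcription function to the registry."""
    def decorator(func: Callable[[Path], str]) -> Callable[[Path], str]:
        MODELS[name] = func
        return func
    return decorator


@register_model("dummy")
def dummy_model(input_file: Path) -> str:
    # Placeholder: a real model would run recognition on the pages of the file.
    return f"transcription of {input_file.name}"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Illustrative HTR backend CLI")
    parser.add_argument("--input-file", type=Path, required=True)
    parser.add_argument("--output-file", type=Path, required=True)
    # Only registered models can be selected on the command line.
    parser.add_argument("--model", choices=sorted(MODELS), default="dummy")
    return parser


def main(argv=None) -> int:
    args = build_parser().parse_args(argv)
    text = MODELS[args.model](args.input_file)
    args.output_file.write_text(text, encoding="utf-8")
    return 0
```

With this layout, adding a new recognition model only requires registering one more function. The plugin side would build a command line with the same three flags and run it.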
Developing a usable HTR system requires experimentation. The project structure is set up to accommodate this need. Ideas for an improved project structure are appreciated.
Experimentation is organised into "concepts". Each concept explores a different approach to HTR. A concept may improve on previous concepts, but it does not have to, which leaves room for risky experiments. Concept 1 is already implemented and uses a computer vision approach that is explained below.
Future concepts might explore:
- Retrain the computer vision models from concept 1 using the native online data representation of Xournal++
- Use sequence-to-sequence models to take advantage of the native online data representation of Xournal++; e.g. use OnlineHTR
- Use data augmentation to increase the effective size of the training data
- Use language models to correct spelling mistakes
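As a rough illustration of what data augmentation on an online representation could look like, the sketch below perturbs pen strokes with a small random rotation, scaling and jitter. It assumes, purely for illustration, that a page is available as a list of strokes and that each stroke is a list of `(x, y)` points. The function name `augment_strokes` and its parameters are hypothetical and not part of the project.

```python
import math
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Stroke = List[Point]


def augment_strokes(
    strokes: List[Stroke],
    max_rotation_deg: float = 5.0,
    max_scale: float = 0.1,
    max_jitter: float = 0.5,
    rng: Optional[random.Random] = None,
) -> List[Stroke]:
    """Return a randomly perturbed copy of the strokes (assumed data layout)."""
    rng = rng or random.Random()
    # One rotation and one scale factor are shared by all strokes of the sample,
    # so the relative layout of the writing is preserved.
    angle = math.radians(rng.uniform(-max_rotation_deg, max_rotation_deg))
    scale = 1.0 + rng.uniform(-max_scale, max_scale)
    cos_a, sin_a = math.cos(angle), math.sin(angle)

    augmented: List[Stroke] = []
    for stroke in strokes:
        new_stroke: Stroke = []
        for x, y in stroke:
            # Rotate about the origin, scale, then add independent per-point jitter.
            new_x = scale * (cos_a * x - sin_a * y) + rng.uniform(-max_jitter, max_jitter)
            new_y = scale * (sin_a * x + cos_a * y) + rng.uniform(-max_jitter, max_jitter)
            new_stroke.append((new_x, new_y))
        augmented.append(new_stroke)
    return augmented
```

Passing a seeded `random.Random` makes the augmentation reproducible, which helps when comparing training runs.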
Concept 1
This concept uses computer vision based algorithms to first detect words on a page and then to read those words.
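The detect-then-read structure can be sketched as below. The detector and the reader are passed in as plain functions because this sketch does not reproduce the API of the underlying algorithms. `WordBox`, `transcribe_page` and the crude line-grouping rule are illustrative assumptions only.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass(frozen=True)
class WordBox:
    """Axis-aligned bounding box of one detected word (hypothetical type)."""
    x: int
    y: int
    width: int
    height: int


def transcribe_page(
    page_image: Any,
    detect_words: Callable[[Any], List[WordBox]],
    read_word: Callable[[Any, WordBox], str],
    line_tolerance: int = 10,
) -> str:
    """Stage 1: detect word boxes. Stage 2: read each box in reading order."""
    boxes = detect_words(page_image)
    # Crude reading order: bucket boxes into lines by their top edge,
    # then sort left to right within each bucket.
    ordered = sorted(boxes, key=lambda box: (box.y // line_tolerance, box.x))
    return " ".join(read_word(page_image, box) for box in ordered)
```

Keeping the two stages behind separate callables means either one can be swapped out or retrained without touching the other.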
The following shows a video demo on YouTube using real-life handwriting data from a Xournal file:
The results are not perfect, but the main takeaway is that the performance is surprisingly good given that the underlying algorithm has not been optimised for Xournal++ data at all.
The performance is sufficiently good to be useful for the Xournal++ user base.
Feel free to play around with the demo yourself using this code after installing this project. Concept 1 is also what the plugin currently uses and what is shown in the 90-second demo.
Next steps to improve the performance of the handwritten text recognition even further could be:
- Retrain the algorithm on Xournal++-specific data, potentially using data augmentation.
- Use a language model to improve text encoding.
- Use a sequence-to-sequence algorithm that makes use of Xournal++'s data format. This translates into using online HTR algorithms.
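To make the post-correction idea concrete, here is a deliberately simple stand-in. It is not a language model, only a dictionary lookup with fuzzy matching from Python's standard `difflib` module. The function name `correct_words`, the vocabulary and the cutoff value are assumptions chosen for illustration.

```python
import difflib
from typing import List


def correct_words(text: str, vocabulary: List[str], cutoff: float = 0.8) -> str:
    """Replace each unknown word with its closest vocabulary entry, if any."""
    corrected = []
    for word in text.split():
        lowered = word.lower()
        if lowered in vocabulary:
            # Known word: keep it exactly as recognised.
            corrected.append(word)
            continue
        # get_close_matches returns the best matches whose similarity is >= cutoff.
        matches = difflib.get_close_matches(lowered, vocabulary, n=1, cutoff=cutoff)
        # Unknown words without a close match are left untouched.
        corrected.append(matches[0] if matches else word)
    return " ".join(corrected)
```

A real language model would also take the surrounding words into account, whereas this sketch corrects each word in isolation and lowercases any word it replaces.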
I would like to acknowledge Harald Scheidl for this concept: he wrote the underlying algorithms and made them easily usable through his HTRPipeline repository. After all, concept 1 just feeds his algorithms Xournal++ data. Go check out his great content!