Xournal++ HTR
Developing handwritten text recognition for Xournal++.
Your contributions are greatly appreciated!
TODO: Make this documentation the central part and adapt README for example like here. Replicate sections of current README here therefore.
Xournal++ HTR in 90 seconds
Why handwritten text recognition for Xournal++?
When moving from physical to digital note taking, one great benefit is searchability. Digital handwritten notes should therefore be searchable too, so that writing by hand does not compromise the benefits of digital note taking. The feature that makes this possible is called handwritten text recognition.
The handwritten text recognition feature has been part of many commercial note taking apps for ages.
However, no such handwritten text recognition feature exists for any open-source handwriting application.
Hence, the goal of Xournal++ HTR is to bring handwritten text recognition to Xournal++, which is one of the most popular open-source applications to take digital handwritten notes.
Content of these websites
These websites document Xournal++ HTR. In the navigation bar, you can find instructions on how to install and use the project, as well as more advanced topics such as how to contribute code and your own models. Many of the documents come with short videos to get you going more quickly.
To assist you in training your own models, Xournal++ HTR comes with many helper functions and convenience code infrastructure.
Cite
If you are using Xournal++ HTR for your research, I'd appreciate it if you could cite it. Use:
@software{Lellep_Xournalpp_HTR,
author = {Lellep, Martin},
title = {xournalpp_htr},
url = {https://github.com/PellelNitram/xournalpp_htr},
license = {GPL-2.0},
}
(Also consider starring the project on GitHub.)
Funding
TODO: Add a buy me a coffee link.
Project description
Taking handwritten notes digitally comes with many benefits but lacks searchability of your notes. Hence, there is a need to make your handwritten notes searchable. This can be achieved with "handwritten text recognition" (HTR), which is the process of assigning searchable text to written strokes.
While many commercial note taking apps feature great HTR systems to make your notes searchable and there are a number of existing open-source implementations of various algorithms, there is no HTR feature available in an open-source note taking application that is privacy aware and processes your data locally.
The purpose of the Xournal++ HTR project is to change that!
Xournal++ HTR strives to bring open-source on-device handwriting recognition to Xournal++ because it is one of the most widely adopted open-source note taking apps, so HTR can reach the largest possible number of users.
Training
Installation
Follow the above installation procedure and replace the step
    pip install -r requirements.txt
by both
    pip install -r requirements.txt
and
    pip install -r requirements_training.txt
to install both the inference and training dependencies.
Project design
The design of Xournal++ HTR tries to bridge the gap between delivering a production-ready product and allowing contributors to experiment with new algorithms.
The project design involves a Lua plugin and a Python backend, see the following figure. First, the production ready product is delivered by means of an Xournal++ plugin. The plugin is fully integrated in Xournal++ and calls a Python backend that performs the actual transcription. The Python backend allows selection of various recognition models and is thereby fully extendable with new models.
Design of xournalpp_htr.
An alternative figure is shown below: (todo: restructure readme)
sequenceDiagram
User in Xpp-->>Xpp HTR Lua Plugin: starts process using currently open file
Xpp HTR Lua Plugin-->>Xpp HTR Python Backend: constructs command using CLI
Xpp HTR Python Backend-->>Xpp HTR Python Backend: does OCR & stores PDF
Xpp HTR Python Backend-->>User in Xpp: gives back control to UI
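The hand-off in the diagram can be sketched from the backend side. The following is a minimal sketch and not the project's actual command-line interface: the flag names (`--input-file`, `--output-file`, `--model`), the registry `MODELS` and the placeholder model `dummy` are all assumptions made for illustration.

```python
import argparse
from pathlib import Path
from typing import Callable, Dict, List, Optional

# Hypothetical registry mapping a model name to a transcription function.
MODELS: Dict[str, Callable[[Path], List[str]]] = {}


def register_model(name: str):
    """Decorator that adds a transcription function to the registry."""
    def decorator(func):
        MODELS[name] = func
        return func
    return decorator


@register_model("dummy")
def dummy_model(input_file: Path) -> List[str]:
    # Placeholder: pretends the document has one page without any text.
    return [""]


def build_parser() -> argparse.ArgumentParser:
    # All flag names here are assumptions, not the real interface.
    parser = argparse.ArgumentParser(description="Sketch of an HTR backend CLI")
    parser.add_argument("--input-file", type=Path, required=True)
    parser.add_argument("--output-file", type=Path, required=True)
    parser.add_argument("--model", choices=sorted(MODELS), default="dummy")
    return parser


def main(argv: Optional[List[str]] = None) -> List[str]:
    args = build_parser().parse_args(argv)
    pages = MODELS[args.model](args.input_file)
    # A real backend would now write a searchable PDF to args.output_file.
    return pages
```

In this sketch, a new recognition model only needs to be registered under a name to become selectable from the command line, which is one possible way to keep a backend extendable.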
Developing a usable HTR system requires experimentation. The project structure is set up to accommodate this need. Note that ideas on improved project structures are appreciated.
The experimentation is organised into "concepts". Each concept explores a different approach to HTR. A concept may improve on previous concepts but does not have to, which leaves room for risky experiments. Concept 1 is already implemented and uses a computer vision approach that is explained below.
Future concepts might explore:
- Retrain computer vision models from concept 1 using native data representation of Xournal++
- Use sequence-to-sequence models to take advantage of native data representation of Xournal++
- Use data augmentation to increase effective size of training data
- Use of language models to correct for spelling mistakes
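To illustrate what "native data representation" refers to: Xournal++ stores pen strokes as sequences of coordinates rather than as pixels. The sketch below assumes the uncompressed XML layout of an .xopp file, in which each `stroke` element holds alternating x and y values as text. The sample document and the helper name `parse_strokes` are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

Point = Tuple[float, float]

# Made-up sample mimicking the assumed layout of an uncompressed .xopp file.
SAMPLE_XML = """
<xournal>
  <page width="595" height="842">
    <layer>
      <stroke tool="pen" color="#000000ff" width="1.41">10 20 11 22 13 25</stroke>
      <stroke tool="pen" color="#000000ff" width="1.41">30 40 32 41</stroke>
    </layer>
  </page>
</xournal>
"""


def parse_strokes(xml_text: str) -> List[List[Point]]:
    """Return one list of (x, y) points per stroke element."""
    root = ET.fromstring(xml_text)
    strokes = []
    for stroke in root.iter("stroke"):
        values = [float(v) for v in (stroke.text or "").split()]
        # Pair up alternating x and y values.
        strokes.append(list(zip(values[0::2], values[1::2])))
    return strokes


strokes = parse_strokes(SAMPLE_XML)
```

Real files are typically gzip-compressed, so they would need to be decompressed first, for instance with Python's `gzip` module. Point sequences like these are the kind of input a sequence-to-sequence model could consume directly.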
Concept 1
This concept uses computer vision based algorithms to first detect words on a page and then to read those words.
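The two stages can be sketched as follows. This is a structural sketch only and not the algorithm used in concept 1: `detect_words` and `read_word` are hypothetical stand-ins that return fixed values where a real system would run a word detector and a word reader on the page image.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BoundingBox:
    x: int
    y: int
    width: int
    height: int


@dataclass
class RecognisedWord:
    box: BoundingBox
    text: str


def detect_words(page_image: List[List[int]]) -> List[BoundingBox]:
    # Hypothetical stage 1: a real detector would locate words in the image.
    return [BoundingBox(0, 0, 40, 12), BoundingBox(50, 0, 30, 12)]


def read_word(page_image: List[List[int]], box: BoundingBox) -> str:
    # Hypothetical stage 2: a real reader would transcribe the cropped word.
    return "hello" if box.x == 0 else "world"


def transcribe_page(page_image, detector=detect_words, reader=read_word) -> List[RecognisedWord]:
    """Detect words first, then read each detected word."""
    boxes = detector(page_image)
    return [RecognisedWord(box, reader(page_image, box)) for box in boxes]


page = [[0] * 100 for _ in range(20)]  # blank placeholder image
words = transcribe_page(page)
```

Keeping each transcription together with its bounding box means the recognised text can later be associated with the position of the handwriting on the page.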
The following shows a video demo on YouTube using real-life handwriting data from a Xournal file:
Although the result is not perfect, the main takeaway is that the performance is surprisingly good given that the underlying algorithm has not been optimised for Xournal++ data at all.
The performance is sufficiently good to be useful for the Xournal++ user base.
Feel free to play around with the demo yourself using this code after installing this project. The "concept 1" is also what is currently used in the plugin and shown in the 90 seconds demo.
Next steps to improve the performance of the handwritten text recognition even further could be:
- Re-train the algorithm on Xournal++ specific data, while potentially using data augmentation.
- Use language model to improve text encoding.
- Use sequence-to-sequence algorithm that makes use of Xournal++'s data format. This translates into using online HTR algorithms.
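As a rough illustration of the language-based correction idea, the sketch below snaps each recognised word to its closest entry in a small vocabulary using `difflib` from the Python standard library. This is a much simpler stand-in for a language model, and the vocabulary, the cutoff value and the function name `correct_words` are arbitrary assumptions.

```python
import difflib
from typing import Iterable, List

# Arbitrary example vocabulary; a real system would use a far larger one.
VOCABULARY = ["handwritten", "text", "recognition", "notes"]


def correct_words(words: Iterable[str], vocabulary: List[str], cutoff: float = 0.8) -> List[str]:
    """Replace each word by its closest vocabulary entry, if one is similar enough."""
    corrected = []
    for word in words:
        matches = difflib.get_close_matches(word.lower(), vocabulary, n=1, cutoff=cutoff)
        # Keep the original word when nothing in the vocabulary is close.
        corrected.append(matches[0] if matches else word)
    return corrected


result = correct_words(["handwriten", "recognitoin", "xyz"], VOCABULARY)
```

Words without a sufficiently similar vocabulary entry are left untouched, so the correction step cannot replace unknown words such as names with unrelated vocabulary entries.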
I would like to acknowledge Harald Scheidl in this concept as he wrote the underlying algorithms and made them easily usable through his HTRPipeline repository. After all, I just feed his algorithm Xournal++ data in concept 1. Go check out his great content!
Code quality
We try to keep code quality as high as practically possible. For that reason, the following steps are implemented:
- Testing. Xournal++ HTR uses pytest for implementing unit, regression and integration tests.
- Linting. Xournal++ HTR uses ruff for linting and code best practises. ruff is implemented as git pre-commit hook. Since ruff as pre-commit hook is configured externally with pyproject.toml, you can use the same settings in your IDE if you wish to speed up the process.
- Formatting. Xournal++ HTR uses ruff-format for consistent code formatting. ruff-format is implemented as git pre-commit hook. Since ruff-format as pre-commit hook is configured externally with pyproject.toml, you can use the same settings in your IDE if you wish to speed up the process.
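As an illustration of the testing step, a unit test in pytest style could look like the sketch below. The helper `join_words` is hypothetical and only exists to give the tests something to check; it is not part of Xournal++ HTR.

```python
from typing import List


def join_words(words: List[str]) -> str:
    """Hypothetical helper: join recognised words into one searchable line."""
    return " ".join(word.strip() for word in words if word.strip())


def test_join_words_skips_empty_entries():
    assert join_words(["hello", "", " world "]) == "hello world"


def test_join_words_handles_empty_input():
    assert join_words([]) == ""
```

By default, pytest collects functions whose names start with `test_` from files named `test_*.py`, so saving this sketch under such a file name and running `pytest` would execute both tests.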