Xournal++ HTR

Developing handwritten text recognition for Xournal++.

Your contributions are greatly appreciated!

TODO: Make this documentation the central part and adapt README for example like here. Replicate sections of current README here therefore.

Xournal++ HTR in 90 seconds

Why handwritten text recognition for Xournal++?

When moving from physical to digital note taking, one great benefit is searchability. Hence, digital handwritten notes should be searchable as well, so as not to compromise on the benefits of digital note taking. The feature that makes this possible is called handwritten text recognition.

The handwritten text recognition feature has been part of many commercial note taking apps for ages.

However, there exists no such handwritten text recognition feature for any open source handwriting application.

Hence, the goal of Xournal++ HTR is to bring handwritten text recognition to Xournal++, which is one of the most popular open-source applications to take digital handwritten notes.

Content of this website

This website documents Xournal++ HTR. In the navigation bar, you can find instructions on how to install and use the project, as well as more advanced topics such as how to contribute code and your own models. Many of the documents come with short videos to get you going more quickly.

To assist you in training your own models, Xournal++ HTR comes with many helper functions and convenience code infrastructure.

Cite

If you are using Xournal++ HTR for your research, I'd appreciate it if you could cite it. Use:

@software{Lellep_Xournalpp_HTR,
  author = {Lellep, Martin},
  title = {xournalpp_htr},
  url = {https://github.com/PellelNitram/xournalpp_htr},
  license = {GPL-2.0},
}

(Also consider starring the project on GitHub.)

Funding

TODO: Add a buy me a coffee link.

Project description

Taking handwritten notes digitally comes with many benefits, but the notes themselves are not searchable by default. Hence, there is a need to make your handwritten notes searchable. This can be achieved with "handwritten text recognition" (HTR), which is the process of assigning searchable text to written strokes.

While many commercial note taking apps feature great HTR systems to make your notes searchable and there are a number of existing open-source implementations of various algorithms, there is no HTR feature available in an open-source note taking application that is privacy aware and processes your data locally.

The purpose of the Xournal++ HTR project is to change that!

Xournal++ HTR strives to bring open-source on-device handwriting recognition to Xournal++. As Xournal++ is one of the most widely adopted open-source note taking apps, HTR can thereby be delivered to the largest possible number of users.

Training

Installation

Follow the above installation procedure, but in place of the single step pip install -r requirements.txt run both pip install -r requirements.txt and pip install -r requirements_training.txt. This installs both the inference and the training dependencies.

Project design

The design of Xournal++ HTR tries to bridge the gap between delivering a production-ready product and allowing contributors to experiment with new algorithms.

The project design involves a Lua plugin and a Python backend, as shown in the following figure. The production-ready product is delivered by means of an Xournal++ plugin. The plugin is fully integrated in Xournal++ and calls a Python backend that performs the actual transcription. The Python backend allows selection of various recognition models and can therefore be extended with new models.

Design of xournalpp_htr.

An alternative figure is shown below: (todo: restructure readme)

sequenceDiagram
    User in Xpp-->>Xpp HTR Lua Plugin: starts process using currently open file
    Xpp HTR Lua Plugin-->>Xpp HTR Python Backend: constructs command using CLI
    Xpp HTR Python Backend-->>Xpp HTR Python Backend: Does OCR & stores PDF
    Xpp HTR Python Backend-->>User in Xpp: Gives back control to UI
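
The hand-off from the plugin to the backend described above can be sketched in Python as follows. This is a minimal sketch and not the project's actual backend: the flag names (--input-file, --output-file, --model), the MODEL_REGISTRY dictionary and the dummy model are all assumptions made for illustration, and the sketch writes plain text instead of a PDF to stay short.

```python
import argparse
from pathlib import Path
from typing import Callable, Dict, List, Optional

# Hypothetical registry: maps a model name to a function that transcribes a file.
MODEL_REGISTRY: Dict[str, Callable[[Path], List[str]]] = {}


def register_model(name: str):
    """Decorator that makes a transcription function selectable by name."""
    def decorator(fn: Callable[[Path], List[str]]):
        MODEL_REGISTRY[name] = fn
        return fn
    return decorator


@register_model("dummy")
def dummy_model(input_file: Path) -> List[str]:
    """Placeholder model: returns a marker line instead of a real transcription."""
    return [f"<no transcription for {input_file.name}>"]


def build_parser() -> argparse.ArgumentParser:
    # All flag names are assumptions for this sketch.
    parser = argparse.ArgumentParser(description="Hypothetical HTR backend CLI.")
    parser.add_argument("--input-file", type=Path, required=True)
    parser.add_argument("--output-file", type=Path, required=True)
    parser.add_argument("--model", choices=sorted(MODEL_REGISTRY), default="dummy")
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    """Parse the command built by the plugin, run the model, store the result."""
    args = build_parser().parse_args(argv)
    lines = MODEL_REGISTRY[args.model](args.input_file)
    args.output_file.write_text("\n".join(lines), encoding="utf-8")
    return 0
```

Under these assumptions, the plugin would construct a command line carrying the same three flags and the backend would be entered through main(). Keeping models behind a registry means that a new recognition model only has to register itself under a name to become selectable.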

Developing a usable HTR system requires experimentation. The project structure is set up to accommodate this need. Note that ideas on improved project structures are appreciated.

The experimentation is carried out in terms of "concepts". Each concept explores a different approach to HTR. A new concept may improve on previous ones, but it does not have to, which leaves room for risky experiments. Concept 1 is already implemented and uses a computer vision approach that is explained below.

Future concepts might explore:

  • Retrain computer vision models from concept 1 using native data representation of Xournal++
  • Use sequence-to-sequence models to take advantage of native data representation of Xournal++
  • Use data augmentation to increase effective size of training data
  • Use of language models to correct for spelling mistakes
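
To make the "native data representation" mentioned above more concrete, here is a minimal sketch of reading pen strokes from a Xournal++ file. It assumes that a .xopp file is gzip-compressed XML in which each stroke element stores alternating x and y coordinates as whitespace-separated text. The function names parse_strokes and load_strokes are hypothetical and not part of the project.

```python
import gzip
import xml.etree.ElementTree as ET
from typing import List, Tuple

Point = Tuple[float, float]


def parse_strokes(xml_bytes: bytes) -> List[List[Point]]:
    """Extract every stroke as a list of (x, y) points from decompressed XML."""
    root = ET.fromstring(xml_bytes)
    strokes: List[List[Point]] = []
    for stroke in root.iter("stroke"):
        values = [float(v) for v in (stroke.text or "").split()]
        # Pair up alternating x and y values; a trailing unpaired value is dropped.
        points = list(zip(values[0::2], values[1::2]))
        if points:
            strokes.append(points)
    return strokes


def load_strokes(path: str) -> List[List[Point]]:
    """Open a gzip-compressed file and return its strokes."""
    with gzip.open(path, "rb") as f:
        return parse_strokes(f.read())
```

Point sequences of this kind are what a sequence-to-sequence model would consume directly, whereas the computer vision approach of concept 1 works on rendered images.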

Concept 1

This concept uses computer vision based algorithms to first detect words on a page and then to read those words.
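
The two-stage idea, detect first and read second, can be sketched as follows. This is an illustrative skeleton only: the detector and reader are passed in as plain callables, and the names BoundingBox, RecognisedWord, crop and transcribe_page are hypothetical. It is not the API of the library used in concept 1.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Image = Sequence[Sequence[int]]  # grayscale page as rows of pixel values


@dataclass(frozen=True)
class BoundingBox:
    x: int
    y: int
    width: int
    height: int


@dataclass(frozen=True)
class RecognisedWord:
    box: BoundingBox
    text: str


def crop(image: Image, box: BoundingBox) -> List[List[int]]:
    """Cut the region covered by the box out of the page image."""
    rows = image[box.y : box.y + box.height]
    return [list(row[box.x : box.x + box.width]) for row in rows]


def transcribe_page(
    image: Image,
    detect_words: Callable[[Image], List[BoundingBox]],
    read_word: Callable[[List[List[int]]], str],
) -> List[RecognisedWord]:
    """Stage 1 finds word boxes, stage 2 reads the text inside each box."""
    words = [
        RecognisedWord(box, read_word(crop(image, box)))
        for box in detect_words(image)
    ]
    # Order words top-to-bottom, then left-to-right.
    return sorted(words, key=lambda w: (w.box.y, w.box.x))
```

Separating the two stages behind callables means that either one can be swapped for a different model without touching the other.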

The following shows a video demo on YouTube using real-life handwriting data from a Xournal file:

Xournal++ HTR - Concept 1 - Demo

Although the results are not perfect, the main takeaway is that the performance is surprisingly good given that the underlying algorithm has not been optimised for Xournal++ data at all.

The performance is sufficiently good to be useful for the Xournal++ user base.

Feel free to play around with the demo yourself using this code after installing this project. Concept 1 is also what is currently used in the plugin and shown in the 90-second demo.

Next steps to improve the performance of the handwritten text recognition even further could be:

  • Re-train the algorithm on Xournal++ specific data, while potentially using data augmentation.
  • Use language model to improve text encoding.
  • Use sequence-to-sequence algorithm that makes use of Xournal++'s data format. This translates into using online HTR algorithms.
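
As an illustration of the data augmentation mentioned above, the following sketch perturbs a single stroke, given as a list of (x, y) points, by a small random rotation, scaling and jitter. The function name augment_stroke, its parameters and their default values are assumptions chosen for this example and are not taken from the project.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def augment_stroke(
    points: List[Point],
    rng: random.Random,
    max_rotation_deg: float = 5.0,
    max_scale_change: float = 0.1,
    max_jitter: float = 0.5,
) -> List[Point]:
    """Return a randomly rotated, scaled and jittered copy of a stroke."""
    if not points:
        return []
    # Rotate and scale around the centroid so the stroke stays in place.
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    angle = math.radians(rng.uniform(-max_rotation_deg, max_rotation_deg))
    scale = 1.0 + rng.uniform(-max_scale_change, max_scale_change)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    augmented = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        new_x = cx + scale * (dx * cos_a - dy * sin_a)
        new_y = cy + scale * (dx * sin_a + dy * cos_a)
        # Independent per-point noise imitates small variations in pen movement.
        new_x += rng.uniform(-max_jitter, max_jitter)
        new_y += rng.uniform(-max_jitter, max_jitter)
        augmented.append((new_x, new_y))
    return augmented
```

Passing an explicit random.Random instance keeps the augmentation reproducible, which helps when comparing training runs.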

I would like to acknowledge Harald Scheidl in this concept, as he wrote the underlying algorithms and made them easily usable through his HTRPipeline repository. After all, I just feed his algorithm Xournal++ data in concept 1. Go check out his great content!

Code quality

We try to keep code quality as high as practically possible. For that reason, the following steps are implemented:

  • Testing. Xournal++ HTR uses pytest for implementing unit, regression and integration tests.
  • Linting. Xournal++ HTR uses ruff for linting and code best practices. ruff is implemented as a git pre-commit hook. Since ruff as a pre-commit hook is configured externally with pyproject.toml, you can use the same settings in your IDE if you wish to speed up the process.
  • Formatting. Xournal++ HTR uses ruff-format for consistent code formatting. ruff-format is implemented as a git pre-commit hook. Since ruff-format as a pre-commit hook is configured externally with pyproject.toml, you can use the same settings in your IDE if you wish to speed up the process.
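
For contributors who are new to pytest, a test in its style might look like the sketch below. The helper normalise_transcription is hypothetical and exists only to give the tests something to check. pytest collects functions whose names start with test_ and reports any failing assert.

```python
def normalise_transcription(text: str) -> str:
    """Collapse runs of whitespace and strip both ends (hypothetical helper)."""
    return " ".join(text.split())


def test_collapses_whitespace():
    # Unit test: one small behaviour, checked with a plain assert.
    assert normalise_transcription("  hello \n  world ") == "hello world"


def test_is_idempotent():
    # Property-style check: applying the helper twice changes nothing further.
    once = normalise_transcription("a\t b")
    assert normalise_transcription(once) == once
```

Saved in a file whose name starts with test_, such functions are picked up automatically when pytest is run from the project root.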