Roadmap

On this page, we outline the project's intended roadmap. The roadmap helps us strategically manage our time and resources.

Below, we present our roadmap. It may evolve over time, so we will preserve previous versions to maintain transparency.

Roadmap as of 2025-11-08

Our project currently faces two main challenges: deployment and prediction quality.

Deployment: Since the plugin runs on the edge, the deployment is really the installation on the user's device. This installation has been a problem because we only tested it on Linux and the installation process requires technical knowledge. These issues have recently been mitigated through a new online demo that allows users to try Xournal++ HTR without installation. Following the successful PyInstaller proof of concept, this setup will likely become the preferred installation method in the future.
Prediction Quality: We want to improve prediction quality. Improving prediction performance requires a way to measure real-life accuracy. To achieve this, two components are planned:
- A crowdsourced benchmark dataset, compiled from donated demo input data.
- An end-to-end evaluation metric to assess both text detection and transcription quality.

Additionally, we currently rely on a third-party library called htr_pipeline for model delivery. While effective, it complicates installation and model management. We plan to replace it with an in-house implementation. The first component, the WordDetectorNet, has already been internalized (see explanation of model, code and demo). This transition away from htr_pipeline will not be backward compatible.

Roadmap as of 2025-05-03

Visual Overview

flowchart LR
    A0(
        Conduct
        dataset
        research
    )
    A(
        Reimplement
        <a href="https://github.com/githubharald/HTRPipeline">htr_pipeline</a>
    )
    B(
        Classic algos w
        <a href="https://github.com/PellelNitram/OnlineHTR">OnlineHTR</a>
    )
    C(
        Start own
        modeling
    )
    D(
        Introduce
        quality
        measures
    )
    E(
        Graph NN w
        <a href="https://github.com/PellelNitram/OnlineHTR">OnlineHTR</a>
    )
    F(
        Make
        installation
        easier
    )
    G(
        Explore offline
        recognition models
        like <a href="https://arxiv.org/abs/1904.01941">CRAFT</a>
    )
    A --> F
    F --> D
    D --> A0
    A0 --> C
    C --> B
    C --> E
    C --> G

Explanation

This project has many potential directions, with the primary goal of delivering optimal value to users. While we are eager to implement advanced machine learning algorithms, we must first focus on usability improvements.

Our main mid-term objective is to simplify the installation process, as users have reported it is too complex.

Explanation of the steps:

Reimplement htr_pipeline:
We currently use the excellent htr_pipeline by Harald Scheidl for machine learning, but it being an external dependency complicates installation and them hosting model weights on Dropbox is not suitable for our needs. To address this, we plan to integrate these models directly into our project. Since the original repository lacks a license, we'll implement our own version, drawing inspiration from the existing work. This approach will deliver an easy-to-install product quickly, as we already know the requirements & model details. Additionally, it enhances our understanding of training models for both online and offline handwriting data. With our own models, we'll automate model retrieval and establish a model registry, likely using Hugging Face, as part of adhering to MLOps best practices. Experimentation with new algorithms will benefit from the model registry and will occur subsequently, as it is more time-consuming.
Make installation easier:
We aim to make the installation process seamless across platforms, including Linux and Windows, with future support for Mac if access becomes available to us. Implementing a model registry will streamline model management and deployment, aiding future model development and enhancing ease of use while aligning with best practices.
Introduce quality measures:
To identify the best model, we need to quantify performance. Ideally, one metric will suffice, but two may be necessary if recognition and transcription remain separate tasks.
Classic algos w OnlineHTR:
The plan is to use OnlineHTR for transcription alongside classical (non-data-driven) algorithms for recognition.
Graph NN w OnlineHTR:
We aim to use OnlineHTR for transcription and a graph neural network for recognition. This approach seeks to develop a high-performing model that operates on the native online representation of handwriting.