# Architecture
This page describes the architecture of Xournal++ HTR from the perspective of its three user-facing entry points: the CLI, the Xournal++ plugin, and the web demo.
## Overview

All three entry points feed into the same core HTR pipeline, which converts a Xournal++ document (`.xoj`/`.xopp`) into a searchable PDF:

```mermaid
graph LR
    A["Xournal++ Plugin<br/>(Lua)"] -->|subprocess| CLI
    CLI["CLI<br/>(run_htr.py)"] --> P["HTR Pipeline<br/>(pipeline.py)"]
    D["Web Demo<br/>(Gradio)"] --> P
    P --> PDF["Searchable PDF"]
```
## Entry Points
### CLI (`xournalpp_htr/run_htr.py`)

The command-line interface. It parses arguments via `utils.parse_arguments()` and delegates to `pipeline.export_xournalpp_to_pdf_with_htr()`:
```bash
python xournalpp_htr/run_htr.py \
    -if input.xopp \
    -of output.pdf \
    [-m MODEL_NAME] \
    [-pid PREDICTION_IMAGE_DIR] \
    [-sp]
```
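Since the CLI is a thin wrapper, the same conversion can in principle be driven from Python. The sketch below is hypothetical: the argument names mirror the CLI flags, but the real signature of `export_xournalpp_to_pdf_with_htr()` may differ.

```python
# Hypothetical sketch only: the argument names below mirror the CLI flags
# above, but the actual signature of export_xournalpp_to_pdf_with_htr()
# may differ.
from argparse import Namespace

from xournalpp_htr.pipeline import export_xournalpp_to_pdf_with_htr

args = Namespace(
    input_file="input.xopp",    # -if
    output_file="output.pdf",   # -of
    model=None,                 # -m, fall back to the default model
    prediction_image_dir=None,  # -pid, optional debug image output
    show_predictions=False,     # -sp, draw visible text boxes for debugging
)
export_xournalpp_to_pdf_with_htr(args)
```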
### Xournal++ Plugin (`plugin/main.lua`)

A Lua plugin that integrates into Xournal++ as a menu item (**Tools > Xournal++ HTR**, shortcut `Ctrl+F1`). It prompts the user for a save location and then calls `run_htr.py` via `os.execute`. Configuration (Python path, script path, model) is stored in `plugin/config.lua`.
### Web Demo (`scripts/demo.py`)

A Gradio app deployed as a HuggingFace Space (via `Dockerfile`). It provides a browser-based UI where users upload `.xoj`/`.xopp` files and step through the pipeline interactively. Unlike the CLI and plugin, the demo calls the pipeline's step functions directly rather than going through `pipeline.py`, and it displays the first page as a preview. It optionally logs interactions to Supabase for analytics and data donation.
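A minimal Gradio skeleton of this shape is sketched below. It is not the actual `demo.py`: the callback wiring and the `compute_predictions()` call signature are assumptions based on the pipeline described in the next section.

```python
# Minimal Gradio sketch of the demo's shape, not the actual demo.py.
# get_document/compute_predictions are the pipeline helpers described below;
# their exact signatures are assumed here.
import gradio as gr

from xournalpp_htr.documents import get_document
from xournalpp_htr.models import compute_predictions

def recognise(file):
    document = get_document(file.name)           # parse the uploaded .xoj/.xopp
    predictions = compute_predictions(document)  # run HTR on every page
    return str(predictions.get(0, []))           # preview page-0 predictions

with gr.Blocks() as demo:
    upload = gr.File(label="Upload a .xoj or .xopp file")
    output = gr.Textbox(label="Page 0 predictions")
    upload.upload(recognise, inputs=upload, outputs=output)

demo.launch()
```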
## Core Pipeline

The pipeline in `pipeline.py` runs three sequential steps:
```mermaid
graph TD
    INPUT[".xoj / .xopp file"] --> PIPELINE["export_xournalpp_to_pdf_with_htr()"]
    subgraph "Step 1: Export"
        PIPELINE --> S1["export_to_pdf_with_xournalpp()"]
        S1 -->|"xournalpp CLI"| TMP["Temporary PDF<br/>(no text layer)"]
    end
    subgraph "Step 2: Recognise"
        PIPELINE --> DOC["get_document()"]
        DOC -->|"Document object"| CP["compute_predictions()"]
        CP --> RENDER["save_page_as_image()<br/>(matplotlib, 150 DPI)"]
        RENDER --> HTR["read_page()<br/>(htr_pipeline)"]
        HTR --> PRED["Predictions dict"]
    end
    subgraph "Step 3: Embed"
        PIPELINE --> WRITE["write_predictions_to_PDF()"]
        TMP --> WRITE
        PRED --> WRITE
        WRITE -->|"PyMuPDF"| OUT["Output PDF<br/>(with text layer)"]
    end
```
### Step 1: Export to PDF

`utils.export_to_pdf_with_xournalpp()` shells out to the `xournalpp` CLI to convert the input file into a temporary PDF. This preserves the original visual layout (drawings, backgrounds, etc.).
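Conceptually this is a single subprocess call, roughly as sketched below; the exact invocation inside `export_to_pdf_with_xournalpp()` may differ, but `--create-pdf` is Xournal++'s standard headless export flag.

```python
# Sketch of the export step: shell out to the xournalpp CLI.
# The exact invocation inside export_to_pdf_with_xournalpp() may differ.
import subprocess
import tempfile
from pathlib import Path

def export_to_pdf(input_file: str) -> Path:
    """Convert a .xoj/.xopp file to a temporary PDF without a text layer."""
    tmp_pdf = Path(tempfile.mkdtemp()) / "no_text_layer.pdf"
    # --create-pdf exports the document headlessly, preserving the layout.
    subprocess.run(
        ["xournalpp", input_file, f"--create-pdf={tmp_pdf}"],
        check=True,
    )
    return tmp_pdf
```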
### Step 2: HTR Predictions

1. **Parse document** -- `documents.get_document()` decompresses the gzipped XML and parses it with BeautifulSoup into a `Document` containing `Page` > `Layer` > `Stroke` objects. A factory function dispatches to `XournalDocument` (`.xoj`) or `XournalppDocument` (`.xopp`).
2. **Render pages** -- Each page is rendered to a 150 DPI grayscale image via matplotlib using the stroke coordinates.
3. **Run HTR** -- The external `htr_pipeline` library processes each image:
    - **Word detection** -- An ONNX model locates word regions (scaled to 40%, with a 5 px margin).
    - **Line clustering** -- DBSCAN groups detected words into lines (discarding lines with fewer than 2 words).
    - **Text recognition** -- A second ONNX model recognises each word via CTC decoding.

The output is a dictionary mapping page indices to lists of predictions (text plus bounding-box coordinates in image pixels).
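Put together, the step looks roughly like this sketch for a single page. The `read_page()` call follows `htr_pipeline`'s public API with the settings named above; assembling the predictions dict here is illustrative rather than the actual `compute_predictions()` source.

```python
# Sketch of Step 2 for one rendered page image (150 DPI, grayscale).
# read_page() follows htr_pipeline's public API; building the predictions
# dict here is illustrative, not the actual compute_predictions() source.
import cv2
from htr_pipeline import DetectorConfig, LineClusteringConfig, read_page

img = cv2.imread("page_0.png", cv2.IMREAD_GRAYSCALE)

read_lines = read_page(
    img,
    detector_config=DetectorConfig(scale=0.4, margin=5),  # 40% scale, 5 px margin
    line_clustering_config=LineClusteringConfig(min_words_per_line=2),
)

predictions = {0: []}  # page index -> list of word predictions
for line in read_lines:
    for word in line:
        box = word.aabb  # axis-aligned bounding box in image pixels
        predictions[0].append({
            "page_index": 0,
            "text": word.text,
            "xmin": box.xmin, "xmax": box.xmax,
            "ymin": box.ymin, "ymax": box.ymax,
        })
```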
### Step 3: Embed Text in PDF

`xio.write_predictions_to_PDF()` uses PyMuPDF to add text boxes to the temporary PDF from Step 1. Coordinates are converted from 150 DPI image pixels to 72 DPI PDF points. Text is rendered invisibly (`render_mode=3`) by default, making the PDF searchable without visual clutter. In debug mode (`--show-predictions`), text and bounding boxes are drawn visibly.
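At its core this is a coordinate rescale plus an invisible `insert_textbox()` call, as in the simplified sketch below (error handling and the debug path are omitted):

```python
# Simplified sketch of Step 3; write_predictions_to_PDF() does more
# (debug drawing, error handling), but the core idea is the same.
import fitz  # PyMuPDF

SCALE = 72 / 150  # 150 DPI image pixels -> 72 DPI PDF points

predictions = {  # example output from Step 2
    0: [{"text": "hello", "xmin": 100.0, "xmax": 220.0,
         "ymin": 80.0, "ymax": 120.0}],
}

doc = fitz.open("tmp_no_text_layer.pdf")
for page_index, words in predictions.items():
    page = doc[page_index]
    for word in words:
        rect = fitz.Rect(
            word["xmin"] * SCALE, word["ymin"] * SCALE,
            word["xmax"] * SCALE, word["ymax"] * SCALE,
        )
        # render_mode=3 draws invisible text: searchable, no visual clutter.
        page.insert_textbox(rect, word["text"], render_mode=3)
doc.save("output.pdf")
```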
## Module Structure

```text
xournalpp_htr/
    run_htr.py       # CLI entry point
    pipeline.py      # Orchestrates Steps 1-3
    documents.py     # .xoj/.xopp parsing (Document ABC, Page, Layer, Stroke)
    models.py        # HTR inference wrapper (compute_predictions)
    utils.py         # Argument parsing, xournalpp CLI export
    xio.py           # PDF I/O (PyMuPDF), example loading (HuggingFace Hub)
scripts/
    demo.py          # Gradio web demo
plugin/
    main.lua         # Xournal++ plugin
    config.lua       # Plugin configuration
external/htr_pipeline/
    HTRPipeline/htr_pipeline/
        __init__.py      # read_page() -- main inference API
        reader/          # ONNX text recognition + CTC decoding
        word_detector/   # ONNX word detection + line clustering
        models/          # Pre-trained ONNX model files
```
## Key Data Structures

Document model (from `documents.py`):

- `Document` (ABC) -- holds `pages: list[Page]` and `DPI: int`; renders pages to images
    - `XournalDocument` -- for `.xoj` files
    - `XournalppDocument` -- for `.xopp` files
- `Page` -- `meta_data` (width, height), `background`, `layers: list[Layer]`
- `Layer` -- `strokes: list[Stroke]`
- `Stroke` -- `x: np.array`, `y: np.array`, `meta_data` (color, width, ...)
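As a rough sketch of these shapes (not the actual class definitions in `documents.py`, which also carry behaviour such as page rendering and the `Document` ABC machinery):

```python
# Simplified sketch of the document model's shape; the real classes in
# documents.py also implement behaviour such as rendering pages to images.
from dataclasses import dataclass, field

import numpy as np

@dataclass
class Stroke:
    x: np.ndarray
    y: np.ndarray
    meta_data: dict  # color, width, ...

@dataclass
class Layer:
    strokes: list[Stroke] = field(default_factory=list)

@dataclass
class Page:
    meta_data: dict   # width, height
    background: dict  # background description (type assumed)
    layers: list[Layer] = field(default_factory=list)

@dataclass
class Document:
    pages: list[Page] = field(default_factory=list)
    DPI: int = 150  # render resolution used in Step 2
```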
Predictions (from `models.py`):

```python
{
    page_index: [
        {"page_index": int, "text": str,
         "xmin": float, "xmax": float, "ymin": float, "ymax": float},
        ...
    ],
    ...
}
```
Bounding box coordinates are in image pixels at 150 DPI.
## External Dependencies
| Dependency | Purpose |
|---|---|
| `xournalpp` (CLI) | Exports `.xoj`/`.xopp` to PDF (Step 1) |
| `beautifulsoup4` + `lxml` | Parses document XML |
| `matplotlib` | Renders pages to images for HTR |
| `opencv-python` | Image loading and processing |
| `onnxruntime` | Runs word detection and text recognition models |
| `scikit-learn` | DBSCAN clustering for line detection |
| `pymupdf` | Embeds text into PDF (Step 3) |
| `gradio` | Web demo UI (demo only) |
| `huggingface_hub` | Downloads example files (demo only) |