ADR 006 – Model Registry and Training Environment

Date: 2026-05-09
Status: Accepted
PRD: None
Drivers: Martin Lellep (@PellelNitram)
Deciders: Martin Lellep (@PellelNitram)

Context

ADR 002 adopted the HuggingFace ecosystem and noted two possible strategies for hosting custom model architectures on the HF Hub: (a) subclassing PreTrainedModel for full transformers integration, or (b) using the Hub as plain artifact storage. ADR 003 established compute_predictions(document, pipeline) as the central inference API, with models downloaded from HF Hub on first use.

HF Hub is already used in this project for dataset distribution: snapshot_download in xio.py fetches the IAM-OnDB training dataset, the benchmark dataset, and the examples dataset. This ADR extends that existing infrastructure to cover trained model artifacts.

The project is now at the point of training its first custom model: Carbune, a bidirectional LSTM stack with CTC loss. Carbune operates on online ink strokes (x/y/t time series from a stylus), a non-standard input domain that the transformers library has no built-in support for. It is implemented as a PyTorch Lightning LightningModule (LitModule1) and uses a custom AlphabetMapper tokeniser and a greedy CTC decoder, none of which map onto standard transformers abstractions. This is the root reason why off-the-shelf PreTrainedModel integration is impractical.

This ADR decides:

How custom models are stored on and retrieved from HF Hub.
What format trained models are exported to for inference.
How training code and its dependencies are organised within the repository.
How HF Hub model repositories are named.

Constraints:

- The training framework is not fixed: different models may use Lightning, HF Trainer, or raw PyTorch.
- The inference artifact must work independently of the training framework used.
- The inference environment must be lean (PyInstaller compatibility is a future goal; see issues #66, #67).
- A pipeline may use multiple models (ADR 003); naming must reflect individual models, not pipelines.

Decision

1. HF Hub as plain artifact storage

Custom models are stored on HF Hub as raw files — not by subclassing PreTrainedModel or using PyTorchModelHubMixin. Each model repository contains the inference artifact(s) and whatever supporting files the model requires to run (e.g. config.json, alphabet, tokeniser). The model builder decides what to upload alongside the primary artifact; there is no enforced schema for supporting files. Files are downloaded at inference time via hf_hub_download.

A typical post-training upload looks like:

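from huggingface_hub import HfApi

REPO_ID = "PellelNitram/xournalpp-htr-carbune"

api = HfApi()
# Create the model repository on first upload; a no-op if it already exists.
api.create_repo(repo_id=REPO_ID, exist_ok=True)
# Upload the inference artifact and its supporting config file.
api.upload_file(path_or_fileobj="exports/model.onnx", path_in_repo="model.onnx",
                repo_id=REPO_ID)
api.upload_file(path_or_fileobj="exports/config.json", path_in_repo="config.json",
                repo_id=REPO_ID)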

To keep consumers free from knowing which files to fetch — or even which repository each model lives in — every model class implements the following abstract base class:

from abc import ABC, abstractmethod
from typing import ClassVar

class HFHubInferenceModel(ABC):
    HF_REPO_ID: ClassVar[str]

    def __init__(self, revision: str):
        self.revision = revision

    @classmethod
    @abstractmethod
    def from_pretrained(cls, revision: str = "main") -> "HFHubInferenceModel": ...

    def __repr__(self) -> str:
        return f"{type(self).__name__}(repo={self.HF_REPO_ID!r}, revision={self.revision!r})"

The HF_REPO_ID class attribute binds each subclass to its repository. revision is stored on the instance so callers can introspect which version is loaded (useful for logging and reproducibility), and the default __repr__ surfaces both pieces of information for debug output.
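
To illustrate the contract, here is a minimal sketch with a hypothetical DummyModel subclass that loads nothing from the Hub. The repository ID example-user/example-model and the payload attribute are made up for this illustration; only the base class mirrors the definition above.

```python
from abc import ABC, abstractmethod
from typing import ClassVar


class HFHubInferenceModel(ABC):
    HF_REPO_ID: ClassVar[str]

    def __init__(self, revision: str):
        self.revision = revision

    @classmethod
    @abstractmethod
    def from_pretrained(cls, revision: str = "main") -> "HFHubInferenceModel": ...

    def __repr__(self) -> str:
        return f"{type(self).__name__}(repo={self.HF_REPO_ID!r}, revision={self.revision!r})"


class DummyModel(HFHubInferenceModel):
    # Hypothetical repository ID, used only for this illustration.
    HF_REPO_ID = "example-user/example-model"

    def __init__(self, payload: dict, revision: str):
        super().__init__(revision)  # the base class records the revision
        self.payload = payload

    @classmethod
    def from_pretrained(cls, revision: str = "main") -> "DummyModel":
        # A real subclass would call hf_hub_download here; this stub loads nothing.
        return cls(payload={}, revision=revision)


model = DummyModel.from_pretrained(revision="v1.2.0")
print(repr(model))  # DummyModel(repo='example-user/example-model', revision='v1.2.0')
```

Because from_pretrained is abstract, instantiating HFHubInferenceModel directly raises a TypeError; only concrete subclasses can be constructed.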

Each concrete model encapsulates its own hf_hub_download calls and any supporting-file loading inside from_pretrained. A typical implementation:

import json
import onnxruntime as ort
from huggingface_hub import hf_hub_download

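class CarbuneModel(HFHubInferenceModel):
    HF_REPO_ID = "PellelNitram/xournalpp-htr-carbune"

    def __init__(self, session: ort.InferenceSession, config: dict, revision: str):
        super().__init__(revision)  # the base class stores the revision
        self.session = session
        self.config = config

    @classmethod
    def from_pretrained(cls, revision: str = "main") -> "CarbuneModel":
        # Both files are fetched from the same repository at the same revision.
        onnx_path = hf_hub_download(cls.HF_REPO_ID, "model.onnx", revision=revision)
        config_path = hf_hub_download(cls.HF_REPO_ID, "config.json", revision=revision)
        with open(config_path) as f:
            config = json.load(f)
        return cls(
            session=ort.InferenceSession(onnx_path),
            config=config,
            revision=revision,
        )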

Consumers then load any model with a single, parameter-free call:

model = CarbuneModel.from_pretrained()

The model class is the user-facing identity; consumers do not need to know the HF Hub repository ID or which files to fetch. Pinning to a specific version is opt-in via the revision argument, which accepts a branch name, tag, or commit hash (e.g. CarbuneModel.from_pretrained(revision="v1.2.0")).

This gives a uniform, from_pretrained-shaped loading interface without depending on transformers. The ABC deliberately does not define a predict() / __call__() method — the inference signature varies too much across models (different input domains, batch shapes, output types), and the central inference API is compute_predictions(document, pipeline) per ADR 003. The ABC's responsibility is model lifecycle (loading and version introspection), not inference shape.

Subsequent calls to hf_hub_download with the same arguments are served from the local cache (~/.cache/huggingface/hub/ by default) and do not re-download a file unless it has changed on the Hub.
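
For debugging or offline packaging it can help to know where a repository is expected to land in that cache. The sketch below is a simplified, hypothetical helper (expected_repo_cache_dir is not part of huggingface_hub). It assumes the documented defaults: the HF_HUB_CACHE and HF_HOME environment variables override the location, and model repositories are stored in folders named models--<owner>--<name>. It ignores other overrides such as XDG_CACHE_HOME, so the library itself remains the authority.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def expected_repo_cache_dir(repo_id: str, env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the directory where the Hub cache is expected to keep a model repo."""
    env = os.environ if env is None else env
    if "HF_HUB_CACHE" in env:
        hub_cache = Path(env["HF_HUB_CACHE"])
    elif "HF_HOME" in env:
        hub_cache = Path(env["HF_HOME"]) / "hub"
    else:
        hub_cache = Path.home() / ".cache" / "huggingface" / "hub"
    # Model repositories are stored as "models--<owner>--<name>".
    return hub_cache / ("models--" + repo_id.replace("/", "--"))


print(expected_repo_cache_dir("PellelNitram/xournalpp-htr-carbune"))
```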

2. ONNX as the inference export format

After training, models are exported to ONNX and this export is the canonical inference artifact. The training checkpoint (Lightning .ckpt, HF Trainer output, etc.) may also be uploaded to the same HF Hub repository for resuming training, but inference always uses the ONNX export.

Rationale: ONNX is training-framework-agnostic, works with onnxruntime (a lean dependency), and is more amenable to PyInstaller bundling than full PyTorch.
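
A minimal export sketch, under stated assumptions: the network is a plain torch.nn.Module that takes one tensor of shape (batch, time, channels) with three channels for x/y/t, and the tensor names ink and log_probs are hypothetical. The function name export_to_onnx is also made up for illustration. It uses the dynamic_axes argument of torch.onnx.export; exporter defaults differ between PyTorch versions, so treat this as a starting point rather than the project's actual export code.

```python
from typing import Any

# Hypothetical tensor names and layout: (batch, time, channels) in,
# (batch, time, classes) out. Axes 0 and 1 are declared as variable-length.
DYNAMIC_AXES = {
    "ink": {0: "batch", 1: "time"},
    "log_probs": {0: "batch", 1: "time"},
}


def export_to_onnx(network: Any, output_path: str, n_channels: int = 3) -> None:
    """Export a trained torch.nn.Module to ONNX with variable batch and sequence length."""
    import torch  # imported lazily: only training environments have PyTorch installed

    network.eval()
    example = torch.randn(1, 50, n_channels)  # one dummy sequence of 50 time steps
    torch.onnx.export(
        network,
        (example,),
        output_path,
        input_names=["ink"],
        output_names=["log_probs"],
        dynamic_axes=DYNAMIC_AXES,
    )
```

After exporting, it is worth loading the file with onnxruntime and comparing its outputs against the PyTorch model on sequences of several different lengths, since a missing dynamic axis only shows up when the input shape differs from the example.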

3. Per-model training extras and subfolders

Training code lives under xournalpp_htr/training/<model-name>/ and its dependencies are declared as a named optional extra in pyproject.toml:

[project.optional-dependencies]
training-carbune = ["lightning", "hydra-core", ...]
training-<next-model> = ["transformers", "datasets", ...]

Install options:

- uv add xournalpp_htr (inference only, lean)
- uv add xournalpp_htr[training-carbune] (inference plus Carbune training)

A [training] umbrella extra (installing every model's training dependencies at once) was considered and rejected. It would grow unboundedly as new models are added, conflate incompatible framework versions, and is unnecessary in practice — a developer typically works on one model at a time and can install multiple per-model extras explicitly (uv add xournalpp_htr[training-carbune,training-<other>]) on the rare occasions when more than one is needed at once.

Shared training utilities (CTC decoder, evaluation metrics, dataset loaders used across models) live in xournalpp_htr/training/shared/ with no extra dependencies beyond the base package.

Each training subfolder's __init__.py guards against missing dependencies:

try:
    import lightning
except ImportError as e:
    raise ImportError(
        "Carbune training requires additional dependencies. "
        "Install with: uv add xournalpp_htr[training-carbune]"
    ) from e
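
If several training subfolders need the same guard, the check could be factored into a small helper. The following is a sketch only: require_training_extra is a hypothetical name, and placing it in xournalpp_htr/training/shared/ is an assumption. It uses importlib.util.find_spec from the standard library, which reports whether a top-level module is installed without importing it.

```python
import importlib.util
from typing import Iterable


def require_training_extra(modules: Iterable[str], extra: str) -> None:
    """Raise ImportError with an install hint if any required module is missing.

    Only top-level module names are supported (e.g. "lightning", not "lightning.pytorch").
    """
    missing = [name for name in modules if importlib.util.find_spec(name) is None]
    if missing:
        raise ImportError(
            f"Missing training dependencies: {', '.join(missing)}. "
            f"Install with: uv add xournalpp_htr[{extra}]"
        )


# Passes silently because "json" is part of the standard library.
require_training_extra(["json"], "training-carbune")
```

A subfolder's __init__.py would then call require_training_extra(["lightning", "hydra"], "training-carbune") before importing anything else; hydra is the import name of the hydra-core distribution.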

4. HF Hub repository naming

Individual model repositories follow the convention:

PellelNitram/xournalpp-htr-<model-name>

Examples: PellelNitram/xournalpp-htr-carbune, PellelNitram/xournalpp-htr-word-detector.
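
The convention is easy to check mechanically, for instance in a unit test over every model class's HF_REPO_ID. The sketch below assumes that model names are lowercase alphanumeric words joined by hyphens, which is inferred from the two examples rather than stated as a rule; follows_naming_convention is a hypothetical helper name.

```python
import re

# Assumption: <model-name> consists of lowercase alphanumeric words joined by hyphens.
_REPO_ID_PATTERN = re.compile(r"PellelNitram/xournalpp-htr-[a-z0-9]+(?:-[a-z0-9]+)*")


def follows_naming_convention(repo_id: str) -> bool:
    """Return True if repo_id matches PellelNitram/xournalpp-htr-<model-name>."""
    return _REPO_ID_PATTERN.fullmatch(repo_id) is not None


print(follows_naming_convention("PellelNitram/xournalpp-htr-carbune"))  # True
print(follows_naming_convention("PellelNitram/carbune"))  # False
```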

The model card documents the dataset the model was trained on. A pipeline (as defined in ADR 003) may reference one or more model repositories; pipeline-to-model mapping is done inside each pipeline implementation directly — there is no global registry, lookup table, or algorithmic resolution. Each pipeline simply imports and instantiates the concrete model classes it needs (e.g. CarbuneModel.from_pretrained()).
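
As a sketch of what "imports and instantiates" could look like, the following uses stub classes in place of real HFHubInferenceModel subclasses so that it runs without Hub access. The pipeline name DetectThenRecognisePipeline, the WordDetectorModel class, and the model_versions method are hypothetical and serve only to show that the mapping is ordinary code inside the pipeline.

```python
class _StubModel:
    """Stand-in for an HFHubInferenceModel subclass; loads nothing from the Hub."""

    def __init__(self, revision: str):
        self.revision = revision

    @classmethod
    def from_pretrained(cls, revision: str = "main"):
        return cls(revision)


class WordDetectorModel(_StubModel):
    """Hypothetical detector model."""


class CarbuneModel(_StubModel):
    """Stub for the recogniser model."""


class DetectThenRecognisePipeline:
    """Hypothetical two-model pipeline; its model mapping is plain code, not a registry."""

    def __init__(self, detector_revision: str = "main", recogniser_revision: str = "main"):
        self.detector = WordDetectorModel.from_pretrained(detector_revision)
        self.recogniser = CarbuneModel.from_pretrained(recogniser_revision)

    def model_versions(self) -> dict:
        # Useful for logging which revision of each model a run used.
        return {type(m).__name__: m.revision for m in (self.detector, self.recogniser)}


pipeline = DetectThenRecognisePipeline(recogniser_revision="v1.2.0")
print(pipeline.model_versions())
```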

Rationale

Plain artifact storage over PreTrainedModel: subclassing PreTrainedModel would require a significant rewrite of the Carbune architecture and a fixed training framework, while providing little benefit for a model with a non-standard input domain (online ink strokes). The plain storage approach unblocks model sharing immediately with no model code changes.

ONNX over Lightning checkpoints at inference: load_from_checkpoint binds inference to Lightning and to the LitModule1 class definition. ONNX removes that binding entirely: any training framework can produce the export, and any runtime that supports onnxruntime can consume it. This is also the path of least resistance for future PyInstaller packaging.

Per-model extras with no umbrella: different models may need incompatible frameworks or framework versions; a union extra would bloat every training environment and grow unboundedly as new models are added. Named per-model extras keep environments minimal and make dependency intent explicit, and developers who need several model environments can combine extras explicitly.

PyTorchModelHubMixin deferred: this would give from_pretrained / push_to_hub on custom nn.Modules without a full PreTrainedModel rewrite, and is the preferred long-term path — the goal is a uniform from_pretrained interface across both off-the-shelf transformers models and custom architectures. The blocker is concrete: the Carbune network (Carbune2020NetAttempt1) is referenced in the Hydra config but has not been extracted into its own class — the LSTM layers and linear head live directly inside LitModule1. Until the network is separated from the Lightning training wrapper, the mixin cannot be applied. This upgrade should be revisited once the model architecture stabilises.

Consequences

Pros

Training framework is fully flexible — Lightning, HF Trainer, or raw PyTorch are all valid.
Inference has minimal dependencies: onnxruntime + huggingface_hub.
Model sharing is unblocked immediately without any model code changes.
Per-model extras keep training environments lean and explicit.
ONNX export is compatible with the PyInstaller packaging path (issues #66, #67).
Naming convention is simple and consistent across all models.
The HFHubInferenceModel.from_pretrained() interface gives consumers a uniform, parameter-free loading API across all custom models without requiring transformers. The class itself identifies the model; consumers never deal with repository IDs or file lists.

Cons

ONNX export must be written and validated for each model. The tracing-based torch.onnx.export path runs the model on example inputs and freezes the recorded operations into a static graph, so Python-level control flow that depends on tensor values at runtime is captured as whichever branch the trace happened to take. For Carbune this mostly affects the CTC decoder (greedy or beam-search loops over output probabilities), so the decoder will likely run in Python outside the ONNX graph rather than being exported. Variable-length sequences additionally require explicit dynamic axes declarations (the dynamic_axes argument of torch.onnx.export). Workarounds when control flow is needed inside the graph: apply torch.jit.script before export to preserve control flow, use the newer torch.onnx.export(..., dynamo=True) path, or restructure the model so that dynamic logic lives outside the exported portion.
Consumers cannot use the transformers from_pretrained entry points (e.g. AutoModel.from_pretrained) directly. HFHubInferenceModel.from_pretrained() gives a comparable interface, but consumers must still know which concrete model class to instantiate (there is no AutoModel equivalent that dispatches from a repo ID alone).
Supporting files (alphabet, config) alongside the ONNX are model-specific with no enforced schema — the model builder is responsible for documenting what is required.
Developers who need to work on multiple models simultaneously must combine per-model extras explicitly (uv add xournalpp_htr[training-a,training-b]); there is no single command to install every training environment at once.
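
The first con notes that the CTC decoder will likely run in Python outside the ONNX graph. A minimal sketch of greedy CTC decoding in pure Python follows: take the best class per time step, collapse consecutive repeats, and drop blanks. The alphabet layout (a plain list with the blank at index 0) is an assumption for illustration and is not the project's AlphabetMapper.

```python
from typing import List, Sequence


def greedy_ctc_decode(
    scores: Sequence[Sequence[float]],
    alphabet: Sequence[str],
    blank_index: int = 0,
) -> str:
    """Decode one sequence of per-timestep class scores with greedy CTC."""
    decoded: List[str] = []
    previous = blank_index
    for frame in scores:
        best = max(range(len(frame)), key=frame.__getitem__)  # argmax over classes
        # Emit a symbol only if it is not blank and not a repeat of the previous frame.
        if best != blank_index and best != previous:
            decoded.append(alphabet[best])
        previous = best
    return "".join(decoded)


alphabet = ["", "a", "b"]  # index 0 is the CTC blank
frames = [
    [0.80, 0.10, 0.10],  # blank
    [0.10, 0.80, 0.10],  # a
    [0.20, 0.70, 0.10],  # a again, collapsed into the previous one
    [0.90, 0.05, 0.05],  # blank separates the two a's
    [0.10, 0.80, 0.10],  # a
    [0.10, 0.20, 0.70],  # b
    [0.10, 0.30, 0.60],  # b again, collapsed
]
print(greedy_ctc_decode(frames, alphabet))  # aab
```

In practice the scores would come from the ONNX session output converted to nested lists or iterated as an array; the loop itself has no dependency on PyTorch.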

Related issues

Issue #71 (eval dataset storage) is largely resolved by existing infrastructure: the benchmark dataset already lives on HF Hub (PellelNitram/xournalpp_htr_benchmark) and is consumed via snapshot_download in xio.py. The existing dataset will be extended rather than replaced; no new eval dataset format decision is needed.
Issues #66, #67 (installation modes, PyInstaller): the per-model extras and ONNX inference format directly support the planned installation mode split. The lean base install (uv add xournalpp_htr) corresponds to the end-user installation mode.
Issues #62, #69 (CLI shape, HTR entry point): the inference loading pattern decided here (hf_hub_download + onnxruntime) is what the future CLI entry point will use internally.
Issue #73 (training delivery): training is run on a GPU-enabled machine with the per-model extras installed via uv. Cloud or CI training is out of scope for now and likely permanently; on-demand cloud training via Vertex AI or similar is a possible future direction, but no commitment is made here. Docker is not the default delivery mechanism: the decision is to start with a uv environment and introduce Docker only if a future model's training environment proves too painful to reproduce with uv alone.

Alternatives

Subclass PreTrainedModel: full transformers integration, inference via standard from_pretrained. Rejected: requires rewriting the model interface, fixes the training framework, and provides little benefit for a non-standard input domain.
PyTorchModelHubMixin: adds from_pretrained / push_to_hub to any nn.Module without transformers. Deferred: requires extracting the network from the Lightning wrapper first.
TorchScript instead of ONNX: also independent of the training wrapper (Lightning, HF Trainer, raw PyTorch), but requires full PyTorch at inference time and is therefore harder to bundle with PyInstaller. Rejected in favour of ONNX.
Single [training] extra: simpler at first but bloated when multiple incompatible training frameworks coexist, and grows unboundedly as new models are added. Rejected in favour of per-model named extras with no umbrella.
[training] umbrella extra alongside per-model extras: gives a one-shot "install everything" command. Rejected for the same growth reason: even as a convenience, an extra that pulls in every framework any model has ever required will eventually become unusable.