ADR 006 – Model Registry and Training Environment
- Date: 2026-05-09
- Status: Accepted
- PRD: None
- Drivers: Martin Lellep (@PellelNitram)
- Deciders: Martin Lellep (@PellelNitram)
Context
ADR 002 adopted the HuggingFace
ecosystem and noted two possible strategies for hosting custom model architectures on the
HF Hub: (a) subclassing
PreTrainedModel for full
transformers integration, or (b) using the Hub as
plain artifact storage. ADR 003 established
compute_predictions(document, pipeline) as the central inference API, with models downloaded from HF
Hub on first use.
HF Hub is already used in this project for dataset distribution:
snapshot_download
in xio.py fetches the
IAM-OnDB training dataset, the
benchmark dataset, and the
examples dataset. This ADR
extends that existing infrastructure to cover trained model artifacts.
The project is now at the point of training its first custom model:
Carbune, a bidirectional LSTM stack with CTC loss. Carbune
operates on online ink strokes (x/y/t time series from a stylus), a non-standard input domain that
the transformers library has no built-in support for. It is implemented as a
PyTorch Lightning LightningModule
(LitModule1) and
uses a custom AlphabetMapper tokeniser and a greedy CTC decoder — none of which map onto standard
transformers abstractions. This is the root reason why off-the-shelf PreTrainedModel integration is
impractical. This ADR decides:
- How custom models are stored on and retrieved from HF Hub.
- What format trained models are exported to for inference.
- How training code and its dependencies are organised within the repository.
- How HF Hub model repositories are named.
Constraints:
- The training framework is not fixed: different models may use Lightning, HF Trainer, or raw PyTorch.
- The inference artifact must work independently of the training framework used.
- The inference environment must be lean (PyInstaller compatibility is a future goal; see issues #66, #67).
- A pipeline may use multiple models (ADR 003); naming must reflect individual models, not pipelines.
Decision
1. HF Hub as plain artifact storage
Custom models are stored on HF Hub as raw files — not by subclassing PreTrainedModel or using
PyTorchModelHubMixin.
Each model repository contains the inference artifact(s) and whatever supporting files the model
requires to run (e.g. config.json, alphabet, tokeniser). The model builder decides what to upload
alongside the primary artifact; there is no enforced schema for supporting files. Files are downloaded
at inference time via
hf_hub_download.
A typical post-training upload looks like:
from huggingface_hub import HfApi

api = HfApi()
api.upload_file(path_or_fileobj="exports/model.onnx", path_in_repo="model.onnx",
                repo_id="PellelNitram/xournalpp-htr-carbune")
api.upload_file(path_or_fileobj="exports/config.json", path_in_repo="config.json",
                repo_id="PellelNitram/xournalpp-htr-carbune")
To keep consumers free from knowing which files to fetch — or even which repository each model lives in — every model class implements the following abstract base class:
from abc import ABC, abstractmethod
from typing import ClassVar

class HFHubInferenceModel(ABC):
    HF_REPO_ID: ClassVar[str]

    def __init__(self, revision: str):
        self.revision = revision

    @classmethod
    @abstractmethod
    def from_pretrained(cls, revision: str = "main") -> "HFHubInferenceModel": ...

    def __repr__(self) -> str:
        return f"{type(self).__name__}(repo={self.HF_REPO_ID!r}, revision={self.revision!r})"
The HF_REPO_ID class attribute binds each subclass to its repository. revision is stored on the
instance so callers can introspect which version is loaded (useful for logging and reproducibility),
and the default __repr__ surfaces both pieces of information for debug output.
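As a minimal, network-free sketch of how the ABC behaves, the snippet below repeats the base class and
adds a hypothetical DummyModel subclass. DummyModel and its repository ID are illustrative assumptions,
not part of the project; its from_pretrained skips the download step that a real subclass would perform.

```python
from abc import ABC, abstractmethod
from typing import ClassVar


class HFHubInferenceModel(ABC):
    HF_REPO_ID: ClassVar[str]

    def __init__(self, revision: str):
        self.revision = revision

    @classmethod
    @abstractmethod
    def from_pretrained(cls, revision: str = "main") -> "HFHubInferenceModel": ...

    def __repr__(self) -> str:
        return f"{type(self).__name__}(repo={self.HF_REPO_ID!r}, revision={self.revision!r})"


class DummyModel(HFHubInferenceModel):
    # Hypothetical repository ID, used only for illustration.
    HF_REPO_ID = "example-user/example-model"

    @classmethod
    def from_pretrained(cls, revision: str = "main") -> "DummyModel":
        # A real subclass would call hf_hub_download here before constructing itself.
        return cls(revision=revision)


model = DummyModel.from_pretrained(revision="v1.2.0")
print(repr(model))  # DummyModel(repo='example-user/example-model', revision='v1.2.0')
```

Because from_pretrained is abstract, instantiating HFHubInferenceModel directly raises a TypeError; only
subclasses that implement it can be constructed.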
Each concrete model encapsulates its own hf_hub_download calls and any supporting-file loading
inside from_pretrained. A typical implementation:
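import json
import onnxruntime as ort
from huggingface_hub import hf_hub_download

class CarbuneModel(HFHubInferenceModel):
    HF_REPO_ID = "PellelNitram/xournalpp-htr-carbune"

    def __init__(self, session: ort.InferenceSession, config: dict, revision: str):
        # The base class stores the revision; session and config are model-specific.
        super().__init__(revision)
        self.session = session
        self.config = config

    @classmethod
    def from_pretrained(cls, revision: str = "main") -> "CarbuneModel":
        # Both files are fetched from the same revision so they stay consistent.
        onnx_path = hf_hub_download(cls.HF_REPO_ID, "model.onnx", revision=revision)
        config_path = hf_hub_download(cls.HF_REPO_ID, "config.json", revision=revision)
        with open(config_path) as f:
            config = json.load(f)
        return cls(
            session=ort.InferenceSession(onnx_path),
            config=config,
            revision=revision,
        )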
Consumers then load any model with a single, parameter-free call:
model = CarbuneModel.from_pretrained()
The model class is the user-facing identity; consumers do not need to know the HF Hub repository ID
or which files to fetch. Pinning to a specific version is opt-in via the revision argument
(CarbuneModel.from_pretrained(revision="v1.2.0")).
This gives a uniform, from_pretrained-shaped loading interface without depending on transformers.
The ABC deliberately does not define a predict() / __call__() method — the inference signature
varies too much across models (different input domains, batch shapes, output types), and the central
inference API is compute_predictions(document, pipeline) per ADR 003.
The ABC's responsibility is model lifecycle (loading and version introspection), not inference shape.
Subsequent calls to hf_hub_download with the same arguments hit the local cache
(~/.cache/huggingface/hub/) and do not re-download unless the file changed on the Hub.
2. ONNX as the inference export format
After training, models are exported to ONNX and this export is the canonical
inference artifact. The training checkpoint (Lightning .ckpt, HF Trainer output, etc.) may also be
uploaded to the same HF Hub repository for resuming training, but inference always uses the ONNX export.
Rationale: ONNX is training-framework-agnostic, works with
onnxruntime (a lean dependency), and is more amenable to PyInstaller
bundling than full PyTorch.
3. Per-model training extras and subfolders
Training code lives under xournalpp_htr/training/<model-name>/ and
its dependencies are declared as a named
optional extra
in pyproject.toml:
[project.optional-dependencies]
training-carbune = ["lightning", "hydra-core", ...]
training-<next-model> = ["transformers", "datasets", ...]
Install options:
- uv add xournalpp_htr — inference only (lean)
- uv add xournalpp_htr[training-carbune] — inference + Carbune training
A [training] umbrella extra (installing every model's training dependencies at once) was considered
and rejected. It would grow unboundedly as new models are added, conflate incompatible framework
versions, and is unnecessary in practice — a developer typically works on one model at a time and can
install multiple per-model extras explicitly (uv add xournalpp_htr[training-carbune,training-<other>])
on the rare occasions when more than one is needed at once.
Shared training utilities (CTC decoder, evaluation metrics, dataset loaders used across models) live in
xournalpp_htr/training/shared/ with no extra dependencies beyond the base package.
Each training subfolder's __init__.py guards against missing dependencies:
try:
    import lightning
except ImportError as e:
    raise ImportError(
        "Carbune training requires additional dependencies. "
        "Install with: uv add xournalpp_htr[training-carbune]"
    ) from e
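If several training subfolders end up repeating this guard, it could be factored into a small helper.
The sketch below is one possible shape, not an existing project function: require_modules and its
parameters are hypothetical names, and it relies only on the standard library.

```python
import importlib


def require_modules(modules: list[str], extra: str, model_name: str) -> None:
    """Raise a helpful ImportError if a module from a training extra is missing."""
    for name in modules:
        try:
            importlib.import_module(name)
        except ImportError as e:
            raise ImportError(
                f"{model_name} training requires additional dependencies "
                f"(missing: {name!r}). "
                f"Install with: uv add xournalpp_htr[{extra}]"
            ) from e


# Demonstration with a standard-library module, which is always importable.
require_modules(["json"], extra="training-carbune", model_name="Carbune")
```

A Carbune __init__.py would then call it with the modules that extra provides, for example
require_modules(["lightning"], extra="training-carbune", model_name="Carbune").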
4. HF Hub repository naming
Individual model repositories follow the convention:
PellelNitram/xournalpp-htr-<model-name>
Examples: PellelNitram/xournalpp-htr-carbune, PellelNitram/xournalpp-htr-word-detector.
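The convention is simple enough to check mechanically, for instance in a unit test over each model
class's HF_REPO_ID. The following is a sketch under one assumption not stated in this ADR: that model
names consist of lowercase alphanumerics separated by single hyphens. The function name is hypothetical.

```python
import re

# Assumption: <model-name> is lowercase alphanumerics separated by single hyphens.
REPO_ID_PATTERN = re.compile(r"PellelNitram/xournalpp-htr-[a-z0-9]+(?:-[a-z0-9]+)*")


def follows_naming_convention(repo_id: str) -> bool:
    """Return True if repo_id matches PellelNitram/xournalpp-htr-<model-name>."""
    return REPO_ID_PATTERN.fullmatch(repo_id) is not None


print(follows_naming_convention("PellelNitram/xournalpp-htr-carbune"))        # True
print(follows_naming_convention("PellelNitram/xournalpp-htr-word-detector"))  # True
print(follows_naming_convention("PellelNitram/xournalpp_htr_benchmark"))      # False
```

The last call uses the benchmark dataset repository, which follows a different (underscore) naming
scheme and is therefore not matched. This is only a consistency check; it does not resolve models
from names.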
The model card documents the dataset the model was trained on. A pipeline (as defined in
ADR 003) may reference one or more model repositories;
pipeline-to-model mapping is done inside each pipeline implementation directly — there is no global
registry, lookup table, or algorithmic resolution. Each pipeline simply imports and instantiates the
concrete model classes it needs (e.g. CarbuneModel.from_pretrained()).
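The direct wiring described above might look like the following sketch. The model classes here are
network-free stand-ins, and WordDetectorModel, ExamplePipeline and model_revisions are hypothetical
names chosen for illustration; the real classes would download their artifacts in from_pretrained.

```python
class _StubModel:
    """Stand-in base: stores the revision instead of downloading anything."""

    HF_REPO_ID = ""

    def __init__(self, revision: str):
        self.revision = revision

    @classmethod
    def from_pretrained(cls, revision: str = "main"):
        return cls(revision)


class CarbuneModel(_StubModel):
    HF_REPO_ID = "PellelNitram/xournalpp-htr-carbune"


class WordDetectorModel(_StubModel):
    # Hypothetical class for the example repository named in this section.
    HF_REPO_ID = "PellelNitram/xournalpp-htr-word-detector"


class ExamplePipeline:
    """Hypothetical pipeline that instantiates the models it needs directly."""

    def __init__(self):
        self.detector = WordDetectorModel.from_pretrained()
        self.recogniser = CarbuneModel.from_pretrained(revision="v1.2.0")

    def model_revisions(self) -> dict[str, str]:
        # Useful for logging which model versions a prediction came from.
        return {m.HF_REPO_ID: m.revision for m in (self.detector, self.recogniser)}


print(ExamplePipeline().model_revisions())
```

The pipeline itself is the only place that knows which models it uses, so adding a model to a pipeline
is a local code change rather than an entry in a shared table.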
Rationale
Plain artifact storage over PreTrainedModel: subclassing PreTrainedModel would require a
significant rewrite of the Carbune architecture and a fixed training framework, while providing little
benefit for a model with a non-standard input domain (online ink strokes). The plain storage approach
unblocks model sharing immediately with no model code changes.
ONNX over Lightning checkpoints at inference: load_from_checkpoint binds inference to Lightning and
to the LitModule1 class definition. ONNX removes that binding entirely: any training framework can
produce the export, and any runtime that supports onnxruntime can consume it. This is also the path of
least resistance for future PyInstaller packaging.
Per-model extras with no umbrella: different models need incompatible frameworks; a union extra
would bloat every training environment and grow unboundedly as new models are added. Named per-model
extras keep environments minimal and make dependency intent explicit. No umbrella [training] extra is
provided — developers who need multiple model environments can combine extras explicitly.
PyTorchModelHubMixin deferred: this would give from_pretrained / push_to_hub on custom
nn.Modules without a full PreTrainedModel rewrite, and is the preferred long-term path — the goal is
a uniform from_pretrained interface across both off-the-shelf transformers models and custom
architectures. The blocker is concrete: the Carbune network (Carbune2020NetAttempt1) is referenced in
the Hydra config but has not been extracted into its own class — the LSTM layers and linear head live
directly inside LitModule1. Until the network is separated from the Lightning training wrapper, the
mixin cannot be applied. This upgrade should be revisited once the model architecture stabilises.
Consequences
Pros
- Training framework is fully flexible — Lightning, HF Trainer, or raw PyTorch are all valid.
- Inference has minimal dependencies: onnxruntime + huggingface_hub.
- Model sharing is unblocked immediately without any model code changes.
- Per-model extras keep training environments lean and explicit.
- ONNX export is compatible with the PyInstaller packaging path (issues #66, #67).
- Naming convention is simple and consistent across all models.
- The HFHubInferenceModel.from_pretrained() interface gives consumers a uniform, parameter-free loading
API across all custom models without requiring transformers. The class itself identifies the model;
consumers never deal with repository IDs or file lists.
Cons
- ONNX export must be written and validated for each model. ONNX export traces the model with example
inputs and freezes the operations into a static graph, so Python-level control flow whose path depends
on tensor values at runtime is captured as whichever branch the trace happened to take. For Carbune
this affects mostly the CTC decoder (greedy/beam-search loops over output probabilities), so the
decoder will likely run in Python outside the ONNX graph rather than being exported. Variable-length
sequence handling also requires explicit dynamic axes declarations. Workarounds when control flow is
needed inside the graph: use torch.jit.script before export to preserve control flow, use the newer
torch.onnx.export(..., dynamo=True) path, or restructure the model so that dynamic logic lives outside
the exported portion.
- Consumers cannot use transformers.from_pretrained directly. The HFHubInferenceModel.from_pretrained()
ABC gives a morally equivalent interface, but consumers must still know which concrete model class to
instantiate (there is no AutoModel equivalent that dispatches from a repo ID alone).
- Supporting files (alphabet, config) alongside the ONNX are model-specific with no enforced schema;
the model builder is responsible for documenting what is required.
- Developers who need to work on multiple models simultaneously must combine per-model extras
explicitly (uv add xournalpp_htr[training-a,training-b]); there is no single command to install every
training environment at once.
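The first drawback above expects the CTC decoder to run in Python outside the ONNX graph. A minimal
sketch of greedy CTC decoding is shown here, under assumptions that are not taken from the project: the
blank symbol sits at index 0, and alphabet[i] is the character for class i. The project's AlphabetMapper
may use a different layout, and greedy_ctc_decode is a hypothetical name.

```python
from typing import Sequence


def greedy_ctc_decode(
    logits: Sequence[Sequence[float]],
    alphabet: Sequence[str],
    blank_index: int = 0,  # assumption: blank is class 0
) -> str:
    """Greedy CTC decoding: argmax per time step, merge repeats, drop blanks."""
    decoded = []
    previous = None
    for frame in logits:
        # Pick the most likely class for this time step.
        best = max(range(len(frame)), key=frame.__getitem__)
        # Emit only when the class changes and is not the blank symbol.
        if best != previous and best != blank_index:
            decoded.append(alphabet[best])
        previous = best
    return "".join(decoded)


# Toy input: per-frame scores over (blank, "a", "b") for six time steps.
alphabet = ["", "a", "b"]
frames = [
    [0.1, 0.8, 0.1],    # a
    [0.1, 0.7, 0.2],    # a (repeat, merged)
    [0.9, 0.05, 0.05],  # blank (separates the two a's)
    [0.2, 0.6, 0.2],    # a
    [0.1, 0.2, 0.7],    # b
    [0.1, 0.3, 0.6],    # b (repeat, merged)
]
print(greedy_ctc_decode(frames, alphabet))  # aab
```

Keeping this loop in Python means the exported graph only has to produce the per-frame scores, which
avoids the data-dependent control flow that tracing cannot capture.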
Related Decisions
- Issue #71 (eval dataset storage) is largely resolved by existing infrastructure: the benchmark
dataset already lives on HF Hub (PellelNitram/xournalpp_htr_benchmark) and is consumed via
snapshot_download in xio.py. The existing dataset will be extended rather than replaced; no new eval
dataset format decision is needed.
- Issues #66, #67 (installation modes, PyInstaller): the per-model extras and ONNX inference format
directly support the planned installation mode split. The lean base install (uv add xournalpp_htr)
corresponds to the end-user installation mode.
- Issues #62, #69 (CLI shape, HTR entry point): the inference loading pattern decided here
(hf_hub_download + onnxruntime) is what the future CLI entry point will use internally.
- Issue #73 (training delivery): Training is run on a GPU-enabled machine with the per-model extras
installed via uv. Cloud or CI training is explicitly out of scope for now (and likely permanently); a
possible future direction is on-demand cloud training via Vertex AI or similar, but no commitment is
made here. Docker is not the default delivery mechanism: it will only be introduced if a future model's
training environment proves too painful to reproduce with uv alone. The decision is to start with a uv
environment and reach for Docker only when dependency hell forces the issue.
Alternatives
- Subclass PreTrainedModel: full transformers integration, inference via standard from_pretrained.
Rejected: requires rewriting the model interface, fixes the training framework, and provides little
benefit for a non-standard input domain.
- PyTorchModelHubMixin: adds from_pretrained / push_to_hub to any nn.Module without transformers.
Deferred: requires extracting the network from the Lightning wrapper first.
- TorchScript instead of ONNX: also framework-agnostic but harder to bundle with PyInstaller and
requires full PyTorch at inference time. Rejected in favour of ONNX.
- Single [training] extra: simpler at first but bloated when multiple incompatible training frameworks
coexist, and grows unboundedly as new models are added. Rejected in favour of per-model named extras
with no umbrella.
- [training] umbrella extra alongside per-model extras: gives a one-shot "install everything" command.
Rejected for the same growth reason: even as a convenience, an extra that pulls in every framework
version the project has ever supported will eventually become unusable.