RapidOcrOptions documentation example doesn't work with custom models from monkt/paddleocr-onnx #2742

@putassu

Bug

RapidOcrOptions documentation example doesn't work with custom models from monkt/paddleocr-onnx

Steps to reproduce

The documentation example for RapidOcrOptions in [docs/reference/models/ocr.md](https://docling-project.github.io/docling/examples/rapidocr_with_custom_models/) shows how to use custom RapidOCR models, but the example is incomplete and doesn't work as written:

from docling.datamodel.pipeline_options import RapidOcrOptions

ocr_options = RapidOcrOptions(
    det_model_path=det_path,
    rec_model_path=rec_path,
    rec_keys_path=dict_path,
    cls_model_path=cls_path,
)

With models from the monkt/paddleocr-onnx repository, the code fails with:

RuntimeError: No class found with the name 'rapidocr'
ConfigKeyError: Missing key dict_url
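Before digging into the configuration error, it may help to rule out simple path problems. The following is a minimal sketch using only the standard library; the helper name `missing_model_files` is hypothetical and not part of Docling or RapidOCR. It only checks that each path passed to RapidOcrOptions points to an existing file, and says nothing about whether the model format is compatible.

```python
from pathlib import Path


def missing_model_files(paths: dict[str, str]) -> list[str]:
    """Return the option names whose path is not an existing file.

    Hypothetical helper for illustration; keys are expected to be the
    RapidOcrOptions argument names used in the snippet above.
    """
    return [name for name, path in paths.items() if not Path(path).is_file()]
```

It could be called with a dict such as `{"det_model_path": det_path, "rec_model_path": rec_path, "rec_keys_path": dict_path, "cls_model_path": cls_path}`; an empty list means every file was found.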

Root Cause:

The RapidOcrModel class in Docling requires the rapidocr library (not rapidocr_onnxruntime), and rapidocr expects a specific configuration format. The models from monkt/paddleocr-onnx are incompatible with the current RapidOcrModel implementation.
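To confirm which of the two packages is importable in a given environment, here is a small diagnostic sketch using only the standard library. The helper name `installed_ocr_backends` is hypothetical and not part of Docling or RapidOCR; it reports only whether each module name can be located, not whether a given set of models will load.

```python
from importlib.util import find_spec


def installed_ocr_backends(
    names: tuple[str, ...] = ("rapidocr", "rapidocr_onnxruntime"),
) -> dict[str, bool]:
    """Map each top-level module name to whether it can be located.

    Hypothetical helper for illustration; find_spec returns None when a
    top-level module is not installed, so nothing is actually imported.
    """
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for module_name, available in installed_ocr_backends().items():
        print(f"{module_name}: {'found' if available else 'not found'}")
```

If `rapidocr` is reported as not found while `rapidocr_onnxruntime` is found, that would be consistent with the "No class found with the name 'rapidocr'" error above, though this is an assumption rather than a confirmed diagnosis.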

Expected Behavior:

Either:

- The documentation should clarify that RapidOcrOptions only works with models from RapidAI/RapidOCR repository
- Or support for rapidocr_onnxruntime should be added to use custom models like monkt/paddleocr-onnx

Current Workaround:

Docling version

2.64.0

Python version

3.12.5
