Description
Bug
RapidOcrOptions documentation example doesn't work with custom models from monkt/paddleocr-onnx
Steps to reproduce
The documentation example for RapidOcrOptions in [docs/reference/models/ocr.md](https://docling-project.github.io/docling/examples/rapidocr_with_custom_models/) shows how to use custom RapidOCR models, but the example is incomplete and doesn't work as written.
from docling.datamodel.pipeline_options import RapidOcrOptions
ocr_options = RapidOcrOptions(
    det_model_path=det_path,
    rec_model_path=rec_path,
    rec_keys_path=dict_path,
    cls_model_path=cls_path,
)
With models from the monkt/paddleocr-onnx repository, the code fails with:
RuntimeError: No class found with the name 'rapidocr'
ConfigKeyError: Missing key dict_url
Root Cause:
The RapidOcrModel class in Docling requires the rapidocr library (not rapidocr_onnxruntime), which has a specific configuration format. The models from monkt/paddleocr-onnx are incompatible with the current RapidOcrModel implementation.
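Because the first error depends on which package is installed, a quick environment check can help when reproducing this. The sketch below uses only the standard library. It assumes the two packages are importable under the top-level module names `rapidocr` and `rapidocr_onnxruntime`. The helper name `installed_ocr_backends` is hypothetical and is not part of Docling.

```python
from importlib.util import find_spec


def installed_ocr_backends(candidates=("rapidocr", "rapidocr_onnxruntime")):
    """Return a dict mapping each module name to True if it is importable.

    The default names are assumptions about how the two packages are
    exposed at import time; adjust them if your environment differs.
    """
    # find_spec returns None for a top-level module that cannot be found,
    # and it does not import the module itself.
    return {name: find_spec(name) is not None for name in candidates}


if __name__ == "__main__":
    for name, present in installed_ocr_backends().items():
        print(f"{name}: {'installed' if present else 'missing'}")
```

If `rapidocr` is reported as missing while `rapidocr_onnxruntime` is installed, that would be consistent with the `No class found with the name 'rapidocr'` error above, though this check alone does not confirm the cause.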
Expected Behavior:
Either:
The documentation should clarify that RapidOcrOptions only works with models from the RapidAI/RapidOCR repository
Or support for rapidocr_onnxruntime should be added so that custom models such as those from monkt/paddleocr-onnx can be used
Current Workaround:
Docling version
2.64.0
Python version
3.12.5