State-of-the-art Polish text recognition. Fine-tuned for correct handling of Polish diacritics (ą, ć, ę, ł, ń, ó, ś, ź, ż). First release for ongoing R&D.
Dataset: 10,000 synthetic Polish document images across 7 categories
Training: 1 epoch, AdamW optimizer, linear LR schedule
Framework: PEFT 0.18.0 + Transformers
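
To make the "linear LR schedule" concrete, here is a minimal pure-Python sketch of a linear schedule with optional warmup. It follows the usual shape of such schedules (ramp up, then decay linearly to zero); it is not the training script used for this model. The base learning rate, batch size, and the idea that all 10,000 images form a single training epoch are assumptions for illustration only, since the card does not state them.

```python
def linear_lr(step, base_lr, total_steps, warmup_steps=0):
    """Learning rate at `step` for a linear schedule with optional warmup."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical values, NOT taken from the model card.
BASE_LR = 2e-4
BATCH_SIZE = 8
NUM_IMAGES = 10_000  # assumes every image is used for training

# One epoch means one pass over the data.
total_steps = NUM_IMAGES // BATCH_SIZE

for step in (0, total_steps // 2, total_steps):
    print(step, linear_lr(step, BASE_LR, total_steps))
```

With a single epoch the learning rate reaches zero exactly when the one pass over the data ends, so the choice of batch size directly sets how fast the rate decays per step.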
| Metric | Baseline | Fine-tuned | Improvement |
|---|---|---|---|
| Character Error Rate (CER) | 5.58% | 1.60% | ↓ 71.3% |
| Word Error Rate (WER) | 13.37% | 7.21% | ↓ 46.1% |
| Exact Match | 74% | 76% | ↑ 2 pp |
Key improvement: Resolved Polish diacritic confusion (ł, ę, ś, etc.)
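
For readers who want to reproduce this kind of comparison, the sketch below shows one common way to compute CER and WER (Levenshtein distance divided by reference length) and the relative reduction reported in the table. The card does not say which evaluation tooling was used, so the function names and the sample strings here are illustrative assumptions, not the project's evaluation code.

```python
import unicodedata


def levenshtein(ref, hyp):
    """Edit distance between two sequences (strings or lists of tokens)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def _nfc(text):
    # Compose characters so that "ż" counts as one character, not "z" + dot.
    return unicodedata.normalize("NFC", text)


def cer(reference, hypothesis):
    """Character error rate: character edits / reference length."""
    reference, hypothesis = _nfc(reference), _nfc(hypothesis)
    return levenshtein(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word edits / number of reference words."""
    ref_words = _nfc(reference).split()
    hyp_words = _nfc(hypothesis).split()
    return levenshtein(ref_words, hyp_words) / len(ref_words)


def relative_reduction(baseline, fine_tuned):
    """Relative error reduction in percent."""
    return (baseline - fine_tuned) / baseline * 100


# Made-up example: four diacritics lost in a ten-character reference.
print(cer("żółta łódź", "zolta łódz"))            # 0.4
print(wer("żółta łódź", "zolta łódz"))            # 1.0
print(round(relative_reduction(5.58, 1.60), 1))   # 71.3
print(round(relative_reduction(13.37, 7.21), 1))  # 46.1
```

The example also shows why WER falls more slowly than CER on diacritic errors: a single wrong character is one error out of many at the character level, but it makes the entire word count as wrong.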
This is the first fine-tune in ongoing R&D. We need your help to push Polish OCR to the next level.
Real Polish documents needed: invoices, receipts, historical documents, handwritten notes, street signs.
Help us evaluate Rys OCR on more Polish-specific benchmarks and compare with other models.
Collaborate on next iterations: architecture experiments, training strategies, deployment optimization.