---
license: apache-2.0
library_name: ultralytics
tags:
- object-detection
- yolo
- yolov11
- text-detection
- ocr
- document-analysis
- ultralytics
datasets:
- DonkeySmall/Yolo-Text-Detection
metrics:
- precision
- recall
- map50
- map50-95
model-index:
- name: YOLO11n Text
  results:
  - task:
      type: object-detection
      name: Text Detection
    dataset:
      type: DonkeySmall/Yolo-Text-Detection
      name: YOLO Text Detection
      split: validation
    metrics:
    - type: precision
      value: 0.957
      name: Precision
    - type: recall
      value: 0.936
      name: Recall
    - type: map50
      value: 0.976
      name: mAP@50
    - type: map50-95
      value: 0.818
      name: mAP@50-95
---

# YOLO11n Text

A fine-tuned YOLO11n model for detecting text regions in images. It is optimized for detecting text bounding boxes in documents, screenshots, user interfaces, and natural scene images.

## Model Description

This model is based on [Ultralytics YOLO11n](https://docs.ultralytics.com/models/yolo11/) (nano variant) and has been fine-tuned specifically for text detection. It outputs text regions as bounding boxes, which can serve as input for OCR pipelines or UI automation tasks.

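To illustrate how the detected boxes might feed an OCR stage, here is a minimal sketch that pads each box, clamps it to the image, and sorts the results into an approximate reading order. The helper `prepare_ocr_regions` and its `pad` and `row_tol` parameters are hypothetical and not part of this model or of Ultralytics; the input is assumed to be pixel-space `(x1, y1, x2, y2)` boxes such as those the detector returns.

```python
import math
from typing import List, Sequence, Tuple

Region = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def prepare_ocr_regions(
    boxes: Sequence[Sequence[float]],
    img_w: int,
    img_h: int,
    pad: int = 2,
    row_tol: int = 10,
) -> List[Region]:
    """Pad, clamp, and order text boxes for OCR (hypothetical helper)."""
    regions: List[Region] = []
    for x1, y1, x2, y2 in boxes:
        # Grow the box by `pad` pixels and keep it inside the image.
        left = max(0, math.floor(x1) - pad)
        top = max(0, math.floor(y1) - pad)
        right = min(img_w, math.ceil(x2) + pad)
        bottom = min(img_h, math.ceil(y2) + pad)
        # Skip boxes that collapse to zero area.
        if right > left and bottom > top:
            regions.append((left, top, right, bottom))
    # Crude reading order: bucket by row (top // row_tol), then left to right.
    regions.sort(key=lambda r: (r[1] // row_tol, r[0]))
    return regions


if __name__ == "__main__":
    detections = [
        (300.4, 12.0, 420.9, 40.2),
        (10.2, 14.7, 120.5, 38.1),
        (8.0, 100.0, 200.0, 130.0),
    ]
    for region in prepare_ocr_regions(detections, 640, 480, row_tol=20):
        print(region)
```

Each returned tuple can be used to crop the source image before passing it to an OCR engine. The row bucketing is a simple heuristic: it assumes roughly horizontal lines of similar height, and boxes that straddle a bucket boundary may be ordered incorrectly.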
### Model Architecture

- **Base Model**: YOLO11n (nano)
- **Parameters**: 2,590,035
- **Layers**: 181
- **Input Size**: 640x640
- **Classes**: 1 (text)

## Training Details

### Dataset

- **Source**: [DonkeySmall/Yolo-Text-Detection](https://huggingface.co/datasets/DonkeySmall/Yolo-Text-Detection)
- **Training Images**: 22,661
- **Validation Images**: 2,518
- **Total Images**: 25,179
- **Format**: YOLO (normalized xywh)

### Training Configuration

| Parameter | Value |
|-----------|-------|
| Epochs | 50 |
| Batch Size | 16 |
| Image Size | 640 |
| Optimizer | SGD (auto) |
| Learning Rate | 0.01 → 0.0003 |
| Momentum | 0.937 |
| Weight Decay | 0.0005 |
| Warmup Epochs | 3.0 |
| AMP | Enabled |
| Workers | 8 |

### Augmentation

| Augmentation | Value |
|--------------|-------|
| HSV Hue | 0.015 |
| HSV Saturation | 0.7 |
| HSV Value | 0.4 |
| Translation | 0.1 |
| Scale | 0.5 |
| Horizontal Flip | 0.5 |
| Mosaic | 1.0 |
| Erasing | 0.4 |
| Auto Augment | randaugment |

### Hardware

- **GPU**: NVIDIA GeForce RTX 5070 Ti (16GB VRAM)
- **Training Time**: ~1.75 hours (6,267 seconds)
- **Framework**: Ultralytics 8.3.240, PyTorch 2.9.1+cu128

## Performance Metrics

### Final Results (Epoch 50)

| Metric | Value |
|--------|-------|
| **Precision** | 95.7% |
| **Recall** | 93.6% |
| **mAP@50** | 97.6% |
| **mAP@50-95** | 81.8% |
| Box Loss | 0.619 |
| Class Loss | 0.376 |
| DFL Loss | 0.828 |

### Training Progress

| Epoch | mAP@50 | mAP@50-95 | Precision | Recall |
|-------|--------|-----------|-----------|--------|
| 1 | 89.1% | 64.3% | 86.0% | 82.7% |
| 10 | 95.9% | 76.8% | 93.5% | 90.7% |
| 20 | 96.9% | 79.5% | 94.8% | 92.0% |
| 30 | 97.3% | 80.8% | 95.1% | 93.1% |
| 40 | 97.6% | 81.5% | 95.6% | 93.5% |
| 50 | 97.6% | 81.8% | 95.7% | 93.6% |

## Usage

### Installation

```bash
pip install ultralytics
```

### Inference

```python
from ultralytics import YOLO

# Load the model
model = YOLO("best.pt")

# Run inference
results = model.predict(
    source="image.jpg",
    conf=0.25,
    iou=0.7,
    imgsz=640
)

# Process results
for result in results:
    boxes = result.boxes
    for box in boxes:
        # Get bounding box coordinates (x1, y1, x2, y2)
        xyxy = box.xyxy[0].tolist()
        confidence = box.conf[0].item()
        print(f"Text box: {xyxy}, confidence: {confidence:.2f}")
```

### Batch Processing

```python
from ultralytics import YOLO

model = YOLO("best.pt")

# Process a folder of images
results = model.predict(
    source="path/to/images/",
    conf=0.25,
    save=True,      # Save annotated images
    save_txt=True   # Save YOLO format labels
)
```

### Export to Other Formats

```python
from ultralytics import YOLO

model = YOLO("best.pt")

# Export to ONNX
model.export(format="onnx", imgsz=640, simplify=True)

# Export to TensorRT (for NVIDIA GPUs)
model.export(format="engine", imgsz=640, half=True)

# Export to CoreML (for Apple devices)
model.export(format="coreml", imgsz=640)
```

## Model Files

| File | Description |
|------|-------------|
| `best.pt` | Best checkpoint (highest validation fitness) |
| `args.yaml` | Training configuration |
| `results.csv` | Training metrics per epoch |
| `results.png` | Training curves visualization |
| `confusion_matrix.png` | Confusion matrix |
| `BoxPR_curve.png` | Precision-Recall curve |

## Recommended Inference Parameters

| Parameter | Recommended | Description |
|-----------|-------------|-------------|
| `conf` | 0.25 | Confidence threshold |
| `iou` | 0.7 | NMS IoU threshold |
| `imgsz` | 640-1024 | Input image size |
| `max_det` | 300 | Maximum detections per image |

## Use Cases

- **OCR Preprocessing**: Detect text regions before applying OCR
- **Document Analysis**: Locate text areas in scanned documents
- **UI Automation**: Find text elements in application screenshots
- **Scene Text Detection**: Detect text in natural images
- **PDF Processing**: Extract text region locations

## Limitations

- Optimized for horizontal text; may have reduced accuracy on rotated text
- Trained primarily on document and UI images
- Single class (text): does not distinguish between text types
- Best performance at 640px input size

## Citation

```bibtex
@software{yolo11n_text,
  author = {Ultralytics},
  title = {YOLO11n Text},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/datasets/DonkeySmall/Yolo-Text-Detection}
}

@software{ultralytics_yolo,
  author = {Jocher, Glenn and Chaurasia, Ayush and Qiu, Jing},
  title = {Ultralytics YOLO},
  year = {2023},
  publisher = {GitHub},
  url = {https://github.com/ultralytics/ultralytics}
}
```

## License

This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).

## Acknowledgments

- [Ultralytics](https://ultralytics.com/) for the YOLO11 architecture
- [DonkeySmall](https://huggingface.co/datasets/DonkeySmall/Yolo-Text-Detection) for the training dataset
- HuggingFace for model hosting