---
license: apache-2.0
library_name: ultralytics
tags:
  - object-detection
  - yolo
  - yolov11
  - text-detection
  - ocr
  - document-analysis
  - ultralytics
datasets:
  - DonkeySmall/Yolo-Text-Detection
metrics:
  - precision
  - recall
  - map50
  - map50-95
model-index:
  - name: YOLO11n Text
    results:
      - task:
          type: object-detection
          name: Text Detection
        dataset:
          type: DonkeySmall/Yolo-Text-Detection
          name: YOLO Text Detection
          split: validation
        metrics:
          - type: precision
            value: 0.957
            name: Precision
          - type: recall
            value: 0.936
            name: Recall
          - type: map50
            value: 0.976
            name: mAP@50
          - type: map50-95
            value: 0.818
            name: mAP@50-95
---

# YOLO11n Text

A fine-tuned YOLO11n model that detects text regions as bounding boxes in documents, screenshots, user interfaces, and natural scene images.

## Model Description

This model is based on [Ultralytics YOLO11n](https://docs.ultralytics.com/models/yolo11/) (nano variant) and has been fine-tuned specifically for text detection tasks. It detects text regions as bounding boxes, which can be used as input for OCR pipelines or UI automation tasks.

### Model Architecture

- **Base Model**: YOLO11n (nano)
- **Parameters**: 2,590,035
- **Layers**: 181
- **Input Size**: 640x640
- **Classes**: 1 (text)
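
Since the input size is square, non-square images are typically resized with their aspect ratio preserved and then padded. The sketch below shows that general letterbox arithmetic for a 640x640 canvas. It is illustrative only: the function name `letterbox_shape` is hypothetical, and the exact preprocessing Ultralytics applies at inference time may differ (for example, it may pad less than a full square).

```python
def letterbox_shape(img_w, img_h, target=640):
    """Return (scale, resized (w, h), padding (x, y) per side) for a square canvas."""
    # Use the smaller ratio so the whole image fits inside the target square.
    scale = min(target / img_w, target / img_h)
    new_w, new_h = round(img_w * scale), round(img_h * scale)
    # Remaining space is split evenly between the two sides.
    pad_x = (target - new_w) / 2
    pad_y = (target - new_h) / 2
    return scale, (new_w, new_h), (pad_x, pad_y)


# A 1920x1080 screenshot shrinks to 640x360 with 140 px of padding above and below.
scale, size, pad = letterbox_shape(1920, 1080)
print(size, pad)
```

The scale factor is also what you would divide by to map a box from the padded canvas back to original pixel coordinates, after subtracting the padding.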

## Training Details

### Dataset

- **Source**: [DonkeySmall/Yolo-Text-Detection](https://huggingface.co/datasets/DonkeySmall/Yolo-Text-Detection)
- **Training Images**: 22,661
- **Validation Images**: 2,518
- **Total Images**: 25,179
- **Format**: YOLO (normalized xywh)
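
In the YOLO label format, each line holds a class index followed by a box center, width, and height, all normalized to the image size. As a minimal sketch, the conversion to pixel corner coordinates looks like this. The function name `yolo_to_xyxy` and the example label line are made up for illustration.

```python
def yolo_to_xyxy(line, img_w, img_h):
    """Convert one YOLO label line to (class_id, x1, y1, x2, y2) in pixels."""
    parts = line.split()
    class_id = int(parts[0])
    # Normalized center x, center y, width, height, each in [0, 1].
    xc, yc, w, h = (float(v) for v in parts[1:5])
    x1 = (xc - w / 2) * img_w
    y1 = (yc - h / 2) * img_h
    x2 = (xc + w / 2) * img_w
    y2 = (yc + h / 2) * img_h
    return class_id, x1, y1, x2, y2


# Hypothetical label: class 0 (text), centered, half the image wide.
print(yolo_to_xyxy("0 0.5 0.5 0.5 0.25", img_w=640, img_h=480))
# prints (0, 160.0, 180.0, 480.0, 300.0)
```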

### Training Configuration

| Parameter | Value |
|-----------|-------|
| Epochs | 50 |
| Batch Size | 16 |
| Image Size | 640 |
| Optimizer | SGD (auto) |
| Learning Rate | 0.01 → 0.0003 |
| Momentum | 0.937 |
| Weight Decay | 0.0005 |
| Warmup Epochs | 3.0 |
| AMP | Enabled |
| Workers | 8 |

### Augmentation

| Augmentation | Value |
|--------------|-------|
| HSV Hue | 0.015 |
| HSV Saturation | 0.7 |
| HSV Value | 0.4 |
| Translation | 0.1 |
| Scale | 0.5 |
| Horizontal Flip | 0.5 |
| Mosaic | 1.0 |
| Erasing | 0.4 |
| Auto Augment | randaugment |
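
The two tables above map onto Ultralytics training arguments roughly as follows. This is a sketch for reproducing a similar run, not the exact command used for this model: the dataset YAML path is a placeholder you must supply, and the `train` wrapper is a hypothetical helper. With `optimizer="auto"`, Ultralytics may choose the optimizer and learning rate itself. Its documentation also describes `erasing` and `auto_augment` as classification augmentations, so they may have no effect on detection training.

```python
# Values copied from the Training Configuration table.
TRAIN_ARGS = {
    "epochs": 50,
    "batch": 16,
    "imgsz": 640,
    "optimizer": "auto",
    "lr0": 0.01,
    "momentum": 0.937,
    "weight_decay": 0.0005,
    "warmup_epochs": 3.0,
    "amp": True,
    "workers": 8,
}

# Values copied from the Augmentation table.
AUGMENT_ARGS = {
    "hsv_h": 0.015,
    "hsv_s": 0.7,
    "hsv_v": 0.4,
    "translate": 0.1,
    "scale": 0.5,
    "fliplr": 0.5,
    "mosaic": 1.0,
    "erasing": 0.4,
    "auto_augment": "randaugment",
}


def train(data_yaml, weights="yolo11n.pt"):
    """Fine-tune from pretrained YOLO11n weights on a dataset YAML (path is an assumption)."""
    from ultralytics import YOLO  # imported lazily so the config can be inspected without it

    model = YOLO(weights)
    return model.train(data=data_yaml, **TRAIN_ARGS, **AUGMENT_ARGS)
```

Calling `train("path/to/data.yaml")` starts the run; results are written to the default Ultralytics `runs/` directory.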

### Hardware

- **GPU**: NVIDIA GeForce RTX 5070 Ti (16 GB VRAM)
- **Training Time**: ~1.75 hours (6,267 seconds)
- **Framework**: Ultralytics 8.3.240, PyTorch 2.9.1+cu128

## Performance Metrics

### Final Results (Epoch 50)

| Metric | Value |
|--------|-------|
| **Precision** | 95.7% |
| **Recall** | 93.6% |
| **mAP@50** | 97.6% |
| **mAP@50-95** | 81.8% |
| Box Loss | 0.619 |
| Class Loss | 0.376 |
| DFL Loss | 0.828 |
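
Precision and recall can be combined into a single F1 score, which the table does not list. The snippet below derives it from the reported epoch-50 values. The result is a derived figure, not a number taken from the training logs.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Epoch-50 precision and recall from the table above.
f1 = f1_score(0.957, 0.936)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.946"
```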

### Training Progress

| Epoch | mAP@50 | mAP@50-95 | Precision | Recall |
|-------|--------|-----------|-----------|--------|
| 1 | 89.1% | 64.3% | 86.0% | 82.7% |
| 10 | 95.9% | 76.8% | 93.5% | 90.7% |
| 20 | 96.9% | 79.5% | 94.8% | 92.0% |
| 30 | 97.3% | 80.8% | 95.1% | 93.1% |
| 40 | 97.6% | 81.5% | 95.6% | 93.5% |
| 50 | 97.6% | 81.8% | 95.7% | 93.6% |

## Usage

### Installation

```bash
pip install ultralytics
```

### Inference

```python
from ultralytics import YOLO

# Load the model
model = YOLO("best.pt")

# Run inference
results = model.predict(
    source="image.jpg",
    conf=0.25,
    iou=0.7,
    imgsz=640
)

# Process results
for result in results:
    boxes = result.boxes
    for box in boxes:
        # Get bounding box coordinates (x1, y1, x2, y2)
        xyxy = box.xyxy[0].tolist()
        confidence = box.conf[0].item()
        print(f"Text box: {xyxy}, confidence: {confidence:.2f}")
```

### Batch Processing

```python
from ultralytics import YOLO

model = YOLO("best.pt")

# Process folder of images
results = model.predict(
    source="path/to/images/",
    conf=0.25,
    save=True,  # Save annotated images
    save_txt=True  # Save YOLO format labels
)
```

### Export to Other Formats

```python
from ultralytics import YOLO

model = YOLO("best.pt")

# Export to ONNX
model.export(format="onnx", imgsz=640, simplify=True)

# Export to TensorRT (for NVIDIA GPUs)
model.export(format="engine", imgsz=640, half=True)

# Export to CoreML (for Apple devices)
model.export(format="coreml", imgsz=640)
```

## Model Files

| File | Description |
|------|-------------|
| `best.pt` | Best checkpoint (highest validation fitness, which Ultralytics weights mainly toward mAP@50-95) |
| `args.yaml` | Training configuration |
| `results.csv` | Training metrics per epoch |
| `results.png` | Training curves visualization |
| `confusion_matrix.png` | Confusion matrix |
| `BoxPR_curve.png` | Precision-Recall curve |
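
To inspect `results.csv` without plotting, a small standard-library reader is enough. The sketch below finds the epoch with the highest value in one column. It assumes the column names Ultralytics normally writes for detection runs (`epoch`, `metrics/mAP50-95(B)`); check the header of your file if it differs. The helper name `best_epoch` is hypothetical.

```python
import csv
from pathlib import Path


def best_epoch(rows, key="metrics/mAP50-95(B)"):
    """Return (epoch, score) for the row with the highest value in column `key`.

    `rows` is any iterable of CSV text lines, such as an open file.
    """
    best = None
    for row in csv.DictReader(rows):
        # Some Ultralytics versions pad header names with spaces, so strip them.
        clean = {k.strip(): v for k, v in row.items() if k}
        score = float(clean[key])
        if best is None or score > best[1]:
            best = (int(float(clean["epoch"])), score)
    return best


path = Path("results.csv")  # assumed to sit next to this script
if path.exists():
    with path.open(newline="") as f:
        print(best_epoch(f))
```

Passing `key="metrics/mAP50(B)"` ranks epochs by mAP@50 instead.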

## Recommended Inference Parameters

| Parameter | Recommended | Description |
|-----------|-------------|-------------|
| `conf` | 0.25 | Confidence threshold |
| `iou` | 0.7 | NMS IoU threshold |
| `imgsz` | 640-1024 | Input image size |
| `max_det` | 300 | Maximum detections per image |
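
The `iou` threshold controls non-maximum suppression: a lower-confidence box is dropped only when its overlap with a higher-confidence box exceeds the threshold. As a minimal sketch of the overlap measure itself (the helper `box_iou` is illustrative, not part of the Ultralytics API):

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Width and height of the overlap rectangle, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# Two 10x10 boxes offset by half their width overlap with IoU = 50 / 150.
iou = box_iou((0, 0, 10, 10), (5, 0, 15, 10))
print(f"{iou:.3f}")  # prints "0.333"
```

At the recommended `iou=0.7`, both boxes in this example would survive, which suits dense text where neighboring words or lines overlap slightly.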

## Use Cases

- **OCR Preprocessing**: Detect text regions before applying OCR
- **Document Analysis**: Locate text areas in scanned documents
- **UI Automation**: Find text elements in application screenshots
- **Scene Text Detection**: Detect text in natural images
- **PDF Processing**: Extract text region locations
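
For OCR preprocessing, detected boxes usually need two small steps before cropping: a little padding so characters are not clipped, and a reading-order sort. The sketch below shows one simple way to do both on `(x1, y1, x2, y2)` tuples such as those from `box.xyxy[0].tolist()`. The helper names, the padding amount, and the `row_tol` heuristic are assumptions to tune for your data, and the row grouping only suits roughly horizontal, single-column text.

```python
def pad_box(box, pad, img_w, img_h):
    """Grow a box by `pad` pixels on each side, clamped to the image bounds."""
    x1, y1, x2, y2 = box
    return (max(0, x1 - pad), max(0, y1 - pad),
            min(img_w, x2 + pad), min(img_h, y2 + pad))


def reading_order(boxes, row_tol=10):
    """Sort boxes top-to-bottom, then left-to-right within each row.

    Boxes whose top edge lies within `row_tol` pixels of the first box
    in the current row are treated as belonging to that row.
    """
    rows = []
    for box in sorted(boxes, key=lambda b: b[1]):
        if rows and abs(box[1] - rows[-1][0][1]) <= row_tol:
            rows[-1].append(box)
        else:
            rows.append([box])
    return [b for row in rows for b in sorted(row, key=lambda b: b[0])]


# Hypothetical detections: two words on one line, one word on the next.
boxes = [(200, 12, 300, 30), (10, 10, 100, 30), (10, 60, 100, 80)]
for box in reading_order(boxes):
    print(pad_box(box, pad=4, img_w=640, img_h=480))
```

Each padded box can then be passed to an image cropping call and on to the OCR engine of your choice.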

## Limitations

- Optimized for horizontal text; may have reduced accuracy on rotated text
- Trained primarily on document and UI images
- Single class (text) - does not distinguish between text types
- Trained at 640px, so performance is best near that input size

## Citation

```bibtex
@software{yolo11n_text,
  author = {Ultralytics},
  title = {YOLO11n Text},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/datasets/DonkeySmall/Yolo-Text-Detection}
}

@software{ultralytics_yolo,
  author = {Jocher, Glenn and Chaurasia, Ayush and Qiu, Jing},
  title = {Ultralytics YOLO},
  year = {2023},
  publisher = {GitHub},
  url = {https://github.com/ultralytics/ultralytics}
}
```

## License

This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).

## Acknowledgments

- [Ultralytics](https://ultralytics.com/) for the YOLO11 architecture
- [DonkeySmall](https://huggingface.co/datasets/DonkeySmall/Yolo-Text-Detection) for the training dataset
- Hugging Face for model hosting