MVTec LOCO Foreground Segmentation Tool
This tool uses U²-Net to generate binary foreground masks for the images in the MVTec LOCO anomaly detection dataset.
Overview
The mvtec_loco_fg_segmentation.py script processes the entire MVTec LOCO dataset and generates binary foreground masks for all images. It uses the U²-Net model to perform salient object detection and converts the probability maps to binary masks.
Features
- Complete Dataset Processing: Processes all categories (breakfast_box, screw_bag, juice_bottle, splicing_connectors, pushpins)
- Flexible Structure: Handles both test and train splits with all subdirectories (good, logical_anomalies, structural_anomalies)
- Binary Mask Output: Generates clean binary masks (0/255) in L mode (grayscale)
- Configurable Parameters: Customizable threshold, categories, splits, and processing options
- GPU/CPU Support: Automatic detection and utilization of available hardware
Requirements
Environment Setup
# Create conda environment
conda create -n u2net python=3.8 -y
conda activate u2net
# Install dependencies
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 --index-url https://download.pytorch.org/whl/cu116
pip install opencv-python scikit-image matplotlib numpy pillow
Model Weights
Option 1: Automatic Download (Recommended)
# Install HuggingFace Hub
pip install huggingface_hub
# The model will be automatically downloaded when you run the script
python mvtec_loco_fg_segmentation.py
Option 2: Manual Download
- Download u2net.pth (176.3 MB) from Google Drive
- Place it in: ./saved_models/u2net/u2net.pth
Option 3: Download from HuggingFace
# Download only the model
python download_from_hf.py --model-only
# Or download the complete repository
python download_from_hf.py --complete-repo
Dataset Structure
Ensure your MVTec LOCO dataset follows this structure:
mvtec_loco_anomaly_detection/
├── breakfast_box/
│ ├── test/
│ │ ├── good/
│ │ ├── logical_anomalies/
│ │ └── structural_anomalies/
│ └── train/
│ └── good/
├── screw_bag/
│ ├── test/
│ └── train/
└── ... (other categories)
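Before a long run, it can help to confirm that this layout is actually in place. The following is a minimal sketch, not part of the script: the helper name `find_missing_dirs` is hypothetical, and it assumes every category contains the same `test/` and `train/` subdirectories shown for breakfast_box.

```python
from pathlib import Path

CATEGORIES = ["breakfast_box", "screw_bag", "juice_bottle",
              "splicing_connectors", "pushpins"]

# Subdirectories expected under each category, per the tree above.
EXPECTED_SUBDIRS = [
    "test/good",
    "test/logical_anomalies",
    "test/structural_anomalies",
    "train/good",
]

def find_missing_dirs(dataset_root, categories=CATEGORIES):
    """Return the expected directories that do not exist under dataset_root."""
    root = Path(dataset_root)
    missing = []
    for category in categories:
        for subdir in EXPECTED_SUBDIRS:
            candidate = root / category / subdir
            if not candidate.is_dir():
                missing.append(candidate)
    return missing
```

An empty return value means the layout matches; otherwise each returned path names a directory to check after extraction.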
Usage
Basic Usage
# Process entire dataset with default settings
python mvtec_loco_fg_segmentation.py
# Show help
python mvtec_loco_fg_segmentation.py -h
Advanced Usage
# Specify custom dataset and model paths
python mvtec_loco_fg_segmentation.py \
--dataset_path /path/to/mvtec_loco \
--model_path /path/to/u2net.pth
# Process specific categories only
python mvtec_loco_fg_segmentation.py \
--categories breakfast_box juice_bottle
# Process only test split
python mvtec_loco_fg_segmentation.py \
--splits test
# Use different threshold for binary mask generation
python mvtec_loco_fg_segmentation.py \
--threshold 0.3
# Custom output directory name
python mvtec_loco_fg_segmentation.py \
--output_dir custom_masks
# Optimize processing with multiple workers
python mvtec_loco_fg_segmentation.py \
--num_workers 4 \
--batch_size 4
Command Line Arguments
| Argument | Type | Default | Description |
|---|---|---|---|
| --dataset_path | str | /root/hy-data/datasets/mvtec_loco_anomaly_detection | Path to MVTec LOCO dataset root |
| --model_path | str | ./saved_models/u2net/u2net.pth | Path to U2NET model weights |
| --output_dir | str | fg_mask | Output directory name for masks |
| --threshold | float | 0.5 | Threshold for binary mask generation |
| --categories | list | all 5 categories | Categories to process |
| --splits | list | ['test', 'train'] | Dataset splits to process |
| --batch_size | int | 1 | Batch size for processing |
| --num_workers | int | 1 | Number of data loading workers |
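These arguments map naturally onto an `argparse` parser. The sketch below is an assumption about how such a parser could be declared, not the script's actual source. In particular, `nargs="+"` for the list arguments is inferred from the space-separated values in the usage examples.

```python
import argparse

ALL_CATEGORIES = ["breakfast_box", "screw_bag", "juice_bottle",
                  "splicing_connectors", "pushpins"]

def build_parser():
    """Build a parser mirroring the documented arguments and defaults."""
    p = argparse.ArgumentParser(
        description="Generate foreground masks for MVTec LOCO (illustrative).")
    p.add_argument("--dataset_path", type=str,
                   default="/root/hy-data/datasets/mvtec_loco_anomaly_detection")
    p.add_argument("--model_path", type=str,
                   default="./saved_models/u2net/u2net.pth")
    p.add_argument("--output_dir", type=str, default="fg_mask")
    p.add_argument("--threshold", type=float, default=0.5)
    # nargs="+" accepts one or more space-separated values.
    p.add_argument("--categories", nargs="+", default=ALL_CATEGORIES)
    p.add_argument("--splits", nargs="+", default=["test", "train"])
    p.add_argument("--batch_size", type=int, default=1)
    p.add_argument("--num_workers", type=int, default=1)
    return p

# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(
    ["--categories", "breakfast_box", "juice_bottle", "--threshold", "0.3"])
print(args.categories, args.threshold, args.splits)
# → ['breakfast_box', 'juice_bottle'] 0.3 ['test', 'train']
```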
Output Structure
The script generates masks in the following structure:
mvtec_loco_anomaly_detection/
├── fg_mask/ # Generated masks directory
│ ├── breakfast_box/
│ │ ├── test/
│ │ │ ├── good/
│ │ │ │ ├── 000.png # Binary mask (0/255 values)
│ │ │ │ ├── 001.png
│ │ │ │ └── ...
│ │ │ ├── logical_anomalies/
│ │ │ └── structural_anomalies/
│ │ └── train/
│ │ └── good/
│ └── ... (other categories)
└── ... (original dataset)
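The mapping from an input image to its mask follows from this tree: the mask sits under `<dataset root>/<output_dir>/` at the same relative path as the source image. A minimal sketch, assuming masks keep the source file name (as `000.png` suggests); `mask_path_for` is a hypothetical helper, not a function from the script.

```python
from pathlib import Path

def mask_path_for(image_path, dataset_root, output_dir="fg_mask"):
    """Return the location where the mask for image_path would be stored."""
    root = Path(dataset_root)
    # Path of the image relative to the dataset root,
    # e.g. breakfast_box/test/good/000.png
    relative = Path(image_path).relative_to(root)
    return root / output_dir / relative

root = "/data/mvtec_loco_anomaly_detection"  # placeholder path for illustration
print(mask_path_for(f"{root}/breakfast_box/test/good/000.png", root))
```

`relative_to` raises `ValueError` when the image is not under the dataset root, which makes a misconfigured path fail early.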
Mask Properties
- Format: PNG images
- Mode: L (grayscale, single channel)
- Values: Binary (0 for background, 255 for foreground)
- Size: Same as original images
- Threshold: Configurable (default 0.5)
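These properties follow from a simple thresholding step. Here is a minimal NumPy sketch, assuming the model output has already been normalized to [0, 1] and resized to the source resolution. Whether the script uses `>` or `>=` at the threshold is not stated here, so the strict comparison is an assumption.

```python
import numpy as np

def to_binary_mask(prob_map, threshold=0.5):
    """Convert a float probability map in [0, 1] to a uint8 mask of 0/255."""
    prob_map = np.asarray(prob_map, dtype=np.float32)
    # Pixels strictly above the threshold become foreground (255),
    # everything else becomes background (0).
    return np.where(prob_map > threshold, 255, 0).astype(np.uint8)

probs = np.array([[0.1, 0.6],
                  [0.5, 0.9]])
print(to_binary_mask(probs).tolist())
# → [[0, 255], [0, 255]]
```

A 2-D `uint8` array like this can be written as a single-channel PNG; with Pillow, `Image.fromarray(mask)` yields an image in mode `L`.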
Performance Notes
- GPU Recommended: Processing is significantly faster with CUDA-enabled GPU
- Memory Usage: Each image requires ~200MB GPU memory during processing
- Processing Time: ~2-3 seconds per image on modern GPU
- Total Images: ~5000+ images in complete dataset
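Taken together, the per-image time and the image count give a rough budget for a full run. The arithmetic below simply multiplies the figures quoted above, using 5,000 as a round image count, and ignores model loading and disk I/O.

```python
def estimate_hours(num_images, seconds_per_image):
    """Estimated wall-clock hours for sequential processing."""
    return num_images * seconds_per_image / 3600

low = estimate_hours(5000, 2)   # lower bound of the quoted per-image time
high = estimate_hours(5000, 3)  # upper bound of the quoted per-image time
print(f"{low:.1f} to {high:.1f} hours")
# → 2.8 to 4.2 hours
```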
Troubleshooting
Common Issues
- CUDA Out of Memory: Reduce batch size or use CPU processing
- Model Not Found: Ensure u2net.pth is in the expected directory (./saved_models/u2net/ by default) or pass its location with --model_path
- Dataset Path Error: Verify MVTec LOCO dataset structure
- Permission Errors: Check write permissions for output directory
Error Messages
- ERROR: Dataset path not found: Check dataset path and extraction
- ERROR: Model path not found: Download and place u2net.pth correctly
- ERROR: Invalid categories: Use valid category names
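These three errors correspond to checks that can be run before the model is loaded. The sketch below is illustrative rather than the script's own validation code: `preflight` is a hypothetical name, and the messages only reuse the prefixes listed above.

```python
from pathlib import Path

VALID_CATEGORIES = {"breakfast_box", "screw_bag", "juice_bottle",
                    "splicing_connectors", "pushpins"}

def preflight(dataset_path, model_path, categories):
    """Return a list of error strings; an empty list means all checks passed."""
    errors = []
    if not Path(dataset_path).is_dir():
        errors.append(f"ERROR: Dataset path not found: {dataset_path}")
    if not Path(model_path).is_file():
        errors.append(f"ERROR: Model path not found: {model_path}")
    # Report unknown category names in a stable (sorted) order.
    invalid = sorted(set(categories) - VALID_CATEGORIES)
    if invalid:
        errors.append(f"ERROR: Invalid categories: {invalid}")
    return errors
```

Collecting all problems in one list, instead of stopping at the first, lets a single run report every misconfiguration.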
Example Output
The script provides detailed progress information:
Configuration:
Dataset path: /root/hy-data/datasets/mvtec_loco_anomaly_detection
Model path: ./saved_models/u2net/u2net.pth
Output directory: fg_mask
Binary threshold: 0.5
Categories: ['breakfast_box', 'screw_bag', 'juice_bottle', 'splicing_connectors', 'pushpins']
Splits: ['test', 'train']
...load U2NET---
Processing category: breakfast_box
Processing breakfast_box/test/good
Found 102 images
Processing 1/102: 000.png
Processing 20/102: 019.png
...
Citation
If you use this tool in your research, please cite the original U²-Net paper:
@article{Qin_2020_PR,
title = {U2-Net: Going Deeper with Nested U-Structure for Salient Object Detection},
author = {Qin, Xuebin and Zhang, Zichen and Huang, Chenyang and Dehghan, Masood and Zaiane, Osmar and Jagersand, Martin},
journal = {Pattern Recognition},
volume = {106},
pages = {107404},
year = {2020}
}
License
This tool extends the original U²-Net implementation. Please refer to the original repository for license information.