What Is Intersection over Union?
Picture two rectangles of transparent colored film laid on a table -- one red (your prediction) and one blue (the ground truth). Where they overlap, you see purple. IoU asks: what fraction of the total colored area is purple? If the rectangles are perfectly aligned, the answer is 1.0 (100% overlap). If they do not touch at all, the answer is 0.0. This single number captures how well a predicted bounding box matches the true object location.
Technically, Intersection over Union (also called the Jaccard index for sets) is defined as:
$$\text{IoU}(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
where $A$ and $B$ are two bounding box regions. IoU ranges from 0 (no overlap) to 1 (perfect overlap). It is symmetric: $\text{IoU}(A, B) = \text{IoU}(B, A)$.
How It Works
Computing IoU for Axis-Aligned Boxes
Given two boxes defined by their corners:
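- Box A: $(x_1^A, y_1^A, x_2^A, y_2^A)$
- Box B: $(x_1^B, y_1^B, x_2^B, y_2^B)$
Step 1: Compute intersection coordinates (assuming $x_1 < x_2$ and $y_1 < y_2$ for each box):
$$x_1^I = \max(x_1^A, x_1^B), \quad y_1^I = \max(y_1^A, y_1^B), \quad x_2^I = \min(x_2^A, x_2^B), \quad y_2^I = \min(y_2^A, y_2^B)$$
Step 2: Compute intersection area:
$$|A \cap B| = \max(0,\, x_2^I - x_1^I) \cdot \max(0,\, y_2^I - y_1^I)$$
Step 3: Compute union area:
$$|A \cup B| = |A| + |B| - |A \cap B|$$
where $|A| = (x_2^A - x_1^A)(y_2^A - y_1^A)$, and likewise for $|B|$.
Step 4: Compute IoU:
$$\text{IoU}(A, B) = \frac{|A \cap B|}{|A \cup B|}$$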
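The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `(x1, y1, x2, y2)` tuple layout, the names `Box`, `box_area`, and `iou`, and the choice to return 0.0 for a zero-area union are all assumptions made for the example.

```python
from typing import Tuple

# Assumed layout: (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
Box = Tuple[float, float, float, float]


def box_area(box: Box) -> float:
    """Area of an axis-aligned box; degenerate boxes get area 0."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def iou(a: Box, b: Box) -> float:
    # Step 1: corners of the intersection rectangle.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Step 2: intersection area, clamped to 0 when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    # Step 3: union area by inclusion-exclusion.
    union = box_area(a) + box_area(b) - inter
    # Step 4: ratio (assumption: treat an empty union as IoU 0).
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175, about 0.143
```

The clamp in Step 2 is what makes disjoint boxes score exactly 0 rather than producing a negative "area".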
IoU Thresholds in Evaluation
| Threshold | Name | Use Case |
|---|---|---|
| 0.5 | AP50 | Standard PASCAL VOC metric, lenient |
| 0.75 | AP75 | Strict localization quality |
| 0.5:0.95 | AP (COCO primary) | Average over 10 thresholds: 0.50, 0.55, ..., 0.95 |
A detection is a true positive if IoU with a matched ground-truth box exceeds the threshold and the class is correct; otherwise, it is a false positive.
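To make the true-positive rule concrete, here is a simplified sketch of threshold-based matching for a single class. It is not the exact protocol of any benchmark: the greedy highest-score-first order, the rule that each ground-truth box can be claimed only once, the `>=` comparison, and the names `label_detections` and `iou` are assumptions for illustration.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_detections(detections, gt_boxes, iou_threshold=0.5):
    """detections: list of (score, box) for one class. Returns 'TP'/'FP' per detection."""
    labels = [None] * len(detections)
    claimed = set()  # indices of ground-truth boxes already matched
    order = sorted(range(len(detections)), key=lambda i: detections[i][0], reverse=True)
    for i in order:  # highest-confidence detections get first pick
        box = detections[i][1]
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gt_boxes):
            if j in claimed:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j >= 0 and best_iou >= iou_threshold:
            claimed.add(best_j)
            labels[i] = "TP"
        else:
            labels[i] = "FP"  # too little overlap, or a duplicate of a claimed box
    return labels


gt = [(0, 0, 10, 10)]
dets = [(0.9, (1, 1, 10, 10)), (0.8, (0, 0, 10, 9)), (0.3, (20, 20, 30, 30))]
print(label_detections(dets, gt))  # ['TP', 'FP', 'FP']
```

The second detection overlaps the object well (IoU 0.9) but is still a false positive here, because the higher-scoring detection already claimed the only ground-truth box.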
IoU as a Loss Function
Standard IoU loss for bounding box regression:
$$\mathcal{L}_{\text{IoU}} = 1 - \text{IoU}(A, B)$$
This has a critical flaw: when boxes do not overlap ($\text{IoU} = 0$), the gradient is zero, providing no learning signal.
Generalized IoU (GIoU, 2019)
Rezatofighi et al. addressed the zero-gradient problem:
$$\text{GIoU}(A, B) = \text{IoU}(A, B) - \frac{|C \setminus (A \cup B)|}{|C|}$$
where $C$ is the smallest enclosing box of $A$ and $B$. GIoU ranges from $-1$ to $1$, providing a gradient even when boxes do not overlap.
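A small numeric sketch shows why the enclosing-box term helps. Two non-overlapping predictions, one near the target and one far from it, both have IoU 0, but their GIoU values differ. The function name `giou`, the `(x1, y1, x2, y2)` layout, and the assumption that both boxes have positive area are choices made for this example.

```python
def giou(a, b):
    """Generalized IoU of two (x1, y1, x2, y2) boxes with positive area."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    iou_val = inter / union
    # Smallest axis-aligned box C that encloses both a and b.
    c_area = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    # Penalize the part of C that is covered by neither box.
    return iou_val - (c_area - union) / c_area


target = (0, 0, 10, 10)
near = (12, 0, 22, 10)  # 2 units away, no overlap
far = (40, 0, 50, 10)   # 30 units away, no overlap

print(giou(target, near))  # -20/220, about -0.091
print(giou(target, far))   # -300/500 = -0.6
```

With a loss of the form $1 - \text{GIoU}$, the far box is penalized more than the near one, so moving toward the target lowers the loss even before the boxes touch.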
Distance-IoU (DIoU) and Complete-IoU (CIoU, 2020)
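$$\mathcal{L}_{\text{DIoU}} = 1 - \text{IoU} + \frac{\rho^2(\mathbf{b}, \mathbf{b}^{gt})}{c^2}$$
where $\rho(\mathbf{b}, \mathbf{b}^{gt})$ is the Euclidean distance between the predicted and ground-truth box centers $\mathbf{b}$ and $\mathbf{b}^{gt}$, and $c$ is the diagonal of the enclosing box.
CIoU adds an aspect ratio consistency term:
$$\mathcal{L}_{\text{CIoU}} = 1 - \text{IoU} + \frac{\rho^2(\mathbf{b}, \mathbf{b}^{gt})}{c^2} + \alpha v$$
where $v$ measures aspect ratio consistency and $\alpha$ is a balancing parameter.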
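The two losses can be sketched in plain Python as follows. The text above does not spell out the aspect ratio term or the balancing parameter, so this example assumes the forms given in Zheng et al. (2020): $v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2$ and $\alpha = \frac{v}{(1 - \text{IoU}) + v}$. The function name `diou_ciou_losses` and the `(x1, y1, x2, y2)` layout are also assumptions, and boxes are assumed to have positive width and height.

```python
import math


def diou_ciou_losses(pred, gt):
    """Return (DIoU loss, CIoU loss) for two (x1, y1, x2, y2) boxes."""
    pw, ph = pred[2] - pred[0], pred[3] - pred[1]
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]

    iw = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    ih = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = iw * ih
    iou_val = inter / (pw * ph + gw * gh - inter)

    # rho^2: squared distance between the two box centers.
    rho2 = ((pred[0] + pred[2]) / 2 - (gt[0] + gt[2]) / 2) ** 2 \
         + ((pred[1] + pred[3]) / 2 - (gt[1] + gt[3]) / 2) ** 2
    # c^2: squared diagonal of the smallest enclosing box.
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])
    c2 = cw ** 2 + ch ** 2

    diou_loss = 1 - iou_val + rho2 / c2

    # Aspect ratio term v and balancing weight alpha (forms assumed from the paper).
    v = (4 / math.pi ** 2) * (math.atan(gw / gh) - math.atan(pw / ph)) ** 2
    alpha = v / ((1 - iou_val) + v) if v > 0 else 0.0
    ciou_loss = diou_loss + alpha * v
    return diou_loss, ciou_loss


print(diou_ciou_losses((0, 0, 10, 10), (5, 5, 15, 15)))  # same aspect ratio: both equal
print(diou_ciou_losses((0, 0, 20, 10), (0, 0, 10, 10)))  # different aspect ratio: CIoU > DIoU
```

When the two boxes share an aspect ratio, $v = 0$ and CIoU reduces to DIoU; the extra term only matters when the shapes differ.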
Why It Matters
- IoU is the standard localization metric used in every major detection benchmark (PASCAL VOC, COCO, Open Images, LVIS).
- COCO's primary metric (AP averaged over IoU 0.5:0.95) incentivizes precise localization, not just approximate overlap.
- IoU-based losses (GIoU, DIoU, CIoU) consistently outperform $L_1$ and $L_2$ box regression losses by 1-3% AP because they directly optimize the evaluation metric.
- IoU thresholds define what counts as a detection, making them among the most consequential hyperparameters in the entire detection pipeline.
Key Technical Details
- Computation cost: IoU between two boxes requires ~10 arithmetic operations. Pairwise IoU for $N$ boxes is $O(N^2)$.
- Scale invariance: IoU is invariant to box scale -- a 50% overlap at $32 \times 32$ scores the same as at $512 \times 512$.
- GIoU loss improves Faster R-CNN by ~1% AP and YOLOv3 by ~2-3% AP compared to smooth $L_1$ loss.
- CIoU loss further improves over GIoU by ~0.5-1% AP by incorporating center distance and aspect ratio.
- PASCAL VOC uses AP50 (IoU threshold $0.5$); COCO uses AP (averaged over 0.5:0.05:0.95), which is much stricter.
- IoU 0.5 vs. 0.75: A detector scoring 50% AP50 might score only 30% AP75, revealing coarse localization.
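The cost and scale-invariance points can be checked with a short sketch. It builds a pairwise IoU matrix with nested loops, which makes the quadratic cost explicit, and then rescales every box by a factor of 16 to confirm the scores do not change. The helper names `pairwise_iou` and `scale_box` and the example boxes are hypothetical.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def pairwise_iou(boxes_a, boxes_b):
    """len(boxes_a) x len(boxes_b) matrix; one IoU call per pair."""
    return [[iou(a, b) for b in boxes_b] for a in boxes_a]


def scale_box(box, s):
    """Multiply every coordinate by s (uniform rescaling about the origin)."""
    return tuple(s * v for v in box)


small = [(0, 0, 32, 32), (8, 0, 40, 32)]       # 32 x 32 boxes, shifted by 8
large = [scale_box(b, 16) for b in small]      # 512 x 512 boxes, shifted by 128

m_small = pairwise_iou(small, small)
m_large = pairwise_iou(large, large)
print(m_small[0][1], m_large[0][1])  # both 0.6
```

Vectorized libraries compute the same matrix without Python-level loops, but the number of pairs is unchanged.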
Common Misconceptions
- "IoU 0.5 means the prediction is 50% correct." IoU 0.5 means 50% of the union area is shared, but the prediction may include significant background or miss part of the object. Visually, IoU 0.5 boxes can look quite misaligned.
- "IoU is always the best matching metric." For very small objects (e.g., $10 \times 10$ pixels), a shift of a few pixels causes a large IoU drop, even though the detection is essentially correct. Some benchmarks use pixel distance for very small objects.
- "L1 or L2 loss on box coordinates is equivalent to IoU." These losses treat each coordinate independently and are not scale-invariant. A 10-pixel error matters much more for a $30 \times 30$ box than a $300 \times 300$ box; IoU captures this naturally.
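The last point is easy to verify numerically. In this sketch a prediction is shifted 10 pixels to the right of a square ground-truth box, once at $30 \times 30$ and once at $300 \times 300$. The helper `l1_error` (sum of absolute coordinate differences) and the specific boxes are assumptions chosen for illustration.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def l1_error(a, b):
    """Sum of absolute differences over the four box coordinates."""
    return sum(abs(p - q) for p, q in zip(a, b))


for size in (30, 300):
    gt = (0, 0, size, size)
    pred = (10, 0, size + 10, size)  # same box, shifted 10 px to the right
    print(size, l1_error(pred, gt), round(iou(pred, gt), 3))
```

The L1 error is 20 in both cases, while IoU is 0.5 for the small box and about 0.935 for the large one, so only IoU reflects that the same shift is far more damaging at small scale.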
Connections to Other Concepts
- Non-Maximum Suppression: Uses IoU to determine which overlapping boxes to suppress.
- R-CNN: IoU thresholds determine positive/negative assignment during training (e.g., IoU $\geq 0.5$ for positives, IoU $< 0.3$ for negatives).
- DETR (Detection Transformer): Uses Generalized IoU in its matching cost and training loss.
- Focal Loss: Training sample assignment relies on IoU between anchors and ground-truth boxes.
- Sliding Window and Region Proposals: Proposal recall is evaluated at specific IoU thresholds.
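As an illustration of the first connection, here is a minimal sketch of greedy non-maximum suppression built on IoU. It is a plain-Python teaching version under assumed conventions: boxes are `(x1, y1, x2, y2)` tuples, a box is suppressed when its IoU with an already kept box is strictly greater than `iou_threshold`, and the function name `nms` is chosen for this example rather than taken from any library.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

The second box has IoU 81/119 (about 0.68) with the first, so it is suppressed; the third box is far away and survives.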
Further Reading
- Everingham et al., "The PASCAL Visual Object Classes (VOC) Challenge" (2010) -- Established IoU-based AP evaluation for detection.
- Lin et al., "Microsoft COCO: Common Objects in Context" (2014) -- Introduced the averaged AP metric over multiple IoU thresholds.
- Rezatofighi et al., "Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression" (2019) -- GIoU.
- Zheng et al., "Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression" (2020) -- DIoU and CIoU losses.