# Supported object detection evaluation protocols

The TensorFlow Object Detection API currently supports several evaluation
protocols, which can be configured in `EvalConfig` by setting `metrics_set` to
the corresponding value.
## PASCAL VOC 2010 detection metric

`EvalConfig.metrics_set='pascal_voc_detection_metrics'`

This is the commonly used mean average precision (mAP) metric for evaluating
the quality of object detectors, computed according to the protocol of the
PASCAL VOC Challenge 2010-2012. The protocol is described in the
[PASCAL VOC 2010 development kit documentation](http://host.robots.ox.ac.uk/pascal/VOC/voc2010/devkit_doc_08-May-2010.pdf).
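To make the protocol concrete, the following is a minimal sketch of an average precision computation in the VOC 2010 style: the precision/recall curve is made monotonically non-increasing, and AP is the area under it. The function name and its arguments are illustrative assumptions, not part of the Object Detection API, and the true/false positive labels are assumed to have been assigned already.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """Area under the monotone precision/recall envelope (hypothetical helper).

    scores: confidence of each detection of one class.
    is_true_positive: bool per detection, assumed to be precomputed.
    num_ground_truth: number of ground-truth boxes of that class (> 0).
    """
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")

    # Rank detections by descending confidence.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    true_positives = 0
    false_positives = 0
    recalls, precisions = [], []
    for i in order:
        if is_true_positive[i]:
            true_positives += 1
        else:
            false_positives += 1
        recalls.append(true_positives / num_ground_truth)
        precisions.append(true_positives / (true_positives + false_positives))

    # Replace each precision by the maximum precision at any higher recall.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])

    # Integrate: sum precision over each increase in recall.
    ap = 0.0
    previous_recall = 0.0
    for recall, precision in zip(recalls, precisions):
        ap += (recall - previous_recall) * precision
        previous_recall = recall
    return ap


# Three detections, two ground-truth boxes; the middle detection is a miss.
example_ap = average_precision([0.9, 0.8, 0.7], [True, False, True], 2)
print(example_ap)
```

The mAP is then the mean of this value over all classes.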
## Weighted PASCAL VOC detection metric

`EvalConfig.metrics_set='weighted_pascal_voc_detection_metrics'`

The weighted PASCAL metric computes the mean average precision as the average
precision obtained when treating all classes as a single class. In comparison,
the PASCAL metric computes the mean average precision as the mean of the
per-class average precisions.

For example, suppose the test set consists of two classes, "cat" and "dog", and
there are ten times more "cat" boxes than "dog" boxes. Under the PASCAL VOC 2010
metric, performance on each of the two classes contributes equally to the final
mAP value, while under the Weighted PASCAL VOC metric the final mAP value is
influenced by the frequency of each class.
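The cat/dog example can be worked through numerically. The sketch below uses made-up toy detections (ten "cat" boxes, one "dog" box) and a hypothetical `average_precision` helper; it assumes that true/false positive labels were assigned per class beforehand and that the weighted variant simply pools all labelled detections before computing a single AP.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """Interpolated AP: area under the monotone precision/recall envelope."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    ap, previous_recall = 0.0, 0.0
    for recall, precision in zip(recalls, precisions):
        ap += (recall - previous_recall) * precision
        previous_recall = recall
    return ap


# Toy data, invented for illustration only.
per_class = {
    # Ten cats, all found.
    "cat": {"scores": [0.90 - 0.01 * k for k in range(10)],
            "is_tp": [True] * 10,
            "num_gt": 10},
    # One dog, found only after a confident false positive.
    "dog": {"scores": [0.95, 0.50],
            "is_tp": [False, True],
            "num_gt": 1},
}

# PASCAL-style mAP: every class counts equally.
class_aps = {
    name: average_precision(c["scores"], c["is_tp"], c["num_gt"])
    for name, c in per_class.items()
}
mean_ap = sum(class_aps.values()) / len(class_aps)

# Weighted variant: pool all detections as if they were one class.
pooled_scores = [s for c in per_class.values() for s in c["scores"]]
pooled_is_tp = [t for c in per_class.values() for t in c["is_tp"]]
pooled_num_gt = sum(c["num_gt"] for c in per_class.values())
weighted_ap = average_precision(pooled_scores, pooled_is_tp, pooled_num_gt)

print(mean_ap)      # about 0.75
print(weighted_ap)  # about 0.917
```

The weak "dog" class pulls the unweighted mAP down to about 0.75, while the pooled value stays near 0.92 because "cat" boxes dominate the test set.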
## PASCAL VOC 2010 instance segmentation metric

`EvalConfig.metrics_set='pascal_voc_instance_segmentation_metrics'`

Similar to the PASCAL VOC 2010 detection metric, but computes the intersection
over union based on the object masks instead of the object boxes.
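The difference between box-based and mask-based overlap can be illustrated with a small sketch. Masks are represented here as nested lists of 0/1 values of equal shape; this representation and the function name are assumptions made for the example, not the API's own data format.

```python
def mask_iou(mask_a, mask_b):
    """Intersection over union of two binary masks of the same shape."""
    intersection = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            if a and b:
                intersection += 1
            if a or b:
                union += 1
    # Two empty masks have no defined overlap; report 0.0 in that case.
    return intersection / union if union else 0.0


# Two objects whose bounding boxes coincide (both span the full 2x2 grid),
# but whose pixels do not overlap at all.
diagonal = [[1, 0],
            [0, 1]]
anti_diagonal = [[0, 1],
                 [1, 0]]

print(mask_iou(diagonal, anti_diagonal))  # 0.0, although the box IoU would be 1.0
```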
## Weighted PASCAL VOC instance segmentation metric

`EvalConfig.metrics_set='weighted_pascal_voc_instance_segmentation_metrics'`

Similar to the weighted PASCAL VOC detection metric, but computes the
intersection over union based on the object masks instead of the object boxes.
## COCO detection metrics

`EvalConfig.metrics_set='coco_detection_metrics'`

The COCO metrics are the official detection metrics used to score the
[COCO competition](http://cocodataset.org/). They are similar to the PASCAL VOC
metrics but have a slightly different implementation and report additional
statistics, such as mAP averaged over IoU thresholds of .5:.95 and
precision/recall statistics for small, medium, and large objects.

See the
[pycocotools](https://github.com/cocodataset/cocoapi/tree/master/PythonAPI)
repository for more details.
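As a rough sketch of the two extra ingredients mentioned above, the snippet below builds the `.5:.95` threshold range and buckets objects by area. The 0.05 step and the 32x32 and 96x96 pixel boundaries follow the conventions published for the COCO evaluation and are not stated in this document; the exact handling of the boundaries is simplified here, and all names are hypothetical. For real evaluations, use pycocotools.

```python
# IoU thresholds 0.50, 0.55, ..., 0.95 (ten values).
COCO_IOU_THRESHOLDS = [round(0.5 + 0.05 * k, 2) for k in range(10)]


def coco_style_map(ap_at_iou):
    """Average a per-threshold AP over the ten IoU thresholds.

    ap_at_iou: callable mapping an IoU threshold to the AP measured at
    that threshold (assumed to be computed elsewhere).
    """
    values = [ap_at_iou(t) for t in COCO_IOU_THRESHOLDS]
    return sum(values) / len(values)


def size_bucket(area_in_pixels):
    """Assign an object to the small / medium / large group by area."""
    if area_in_pixels < 32 ** 2:
        return "small"
    if area_in_pixels <= 96 ** 2:
        return "medium"
    return "large"


print(COCO_IOU_THRESHOLDS)
print(size_bucket(20 * 20), size_bucket(50 * 50), size_bucket(200 * 200))
```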
## COCO mask metrics

`EvalConfig.metrics_set='coco_mask_metrics'`

Similar to the COCO detection metrics, but computes the intersection over union
based on the object masks instead of the object boxes.
## Open Images V2 detection metric

`EvalConfig.metrics_set='oid_V2_detection_metrics'`

This metric was originally defined for evaluating detector performance on the
[Open Images V2 dataset](https://github.com/openimages/dataset) and is fairly
similar to the PASCAL VOC 2010 metric mentioned above. It computes the
interpolated average precision (AP) for each class and averages it over all
classes (mAP).

It differs from the PASCAL VOC 2010 metric as follows: Open Images annotations
contain `group-of` ground-truth boxes (see the [Open Images data
description](https://github.com/openimages/dataset#annotations-human-bboxcsv)),
which are treated differently when deciding whether a detection is a "true
positive", "ignored", or a "false positive". The three cases are defined as
follows.
A detection is a "true positive" if there is a non-group-of ground-truth box,
such that:

*   The detection box and the ground-truth box are of the same class, and the
    intersection-over-union (IoU) between the detection box and the
    ground-truth box is greater than the IoU threshold (default value 0.5). \
    Illustration of handling non-group-of boxes: \
    ![alt groupof_case_eval](img/nongroupof_case_eval.png "illustration of handling non-group-of boxes: yellow box - ground truth bounding box; green box - true positive; red box - false positives.")

    *   yellow box - ground-truth box;
    *   green box - true positive;
    *   red boxes - false positives.

*   This is the highest scoring detection for this ground-truth box that
    satisfies the criteria above.
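The two conditions above can be sketched as a greedy matching over the detections of one class in one image. This is an illustration under stated assumptions, not the API's implementation: boxes are `(ymin, xmin, ymax, xmax)` tuples, each detection is compared against its best-overlapping ground-truth box, and the helper names are hypothetical.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two (ymin, xmin, ymax, xmax) boxes."""
    height = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    width = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = height * width
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def label_true_positives(detections, ground_truth_boxes, iou_threshold=0.5):
    """Return one bool per detection: True if it is a true positive.

    detections: list of (score, box) for a single class and image.
    ground_truth_boxes: non-group-of boxes of the same class and image.
    """
    # Visit detections from the highest to the lowest score, so that each
    # ground-truth box is claimed by its highest scoring detection.
    order = sorted(range(len(detections)),
                   key=lambda i: detections[i][0], reverse=True)
    claimed = [False] * len(ground_truth_boxes)
    labels = [False] * len(detections)
    for i in order:
        _, box = detections[i]
        best_iou, best_index = 0.0, None
        for g, gt_box in enumerate(ground_truth_boxes):
            iou = box_iou(box, gt_box)
            if iou > best_iou:
                best_iou, best_index = iou, g
        if (best_index is not None and best_iou > iou_threshold
                and not claimed[best_index]):
            claimed[best_index] = True
            labels[i] = True
    return labels


ground_truth = [(0, 0, 10, 10)]
detections = [
    (0.9, (0, 0, 10, 10)),    # exact hit
    (0.8, (1, 1, 10, 10)),    # good overlap, but the box is already claimed
    (0.7, (20, 20, 30, 30)),  # no overlap
]
print(label_true_positives(detections, ground_truth))
```

Detections left unlabelled by this step are not yet false positives under the Open Images protocol; they may still be ignored if they fall inside a `group-of` box.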
A detection is "ignored" if it is not a true positive, and there is a `group-of`
ground-truth box such that:

*   The detection box and the ground-truth box are of the same class, and the
    area of intersection between the detection box and the ground-truth box
    divided by the area of the detection is greater than 0.5. This is intended
    to measure whether the detection box is approximately inside the group-of
    ground-truth box. \
    Illustration of handling `group-of` boxes: \
    ![alt groupof_case_eval](img/groupof_case_eval.png "illustration of handling group-of boxes: yellow box - ground truth bounding box; grey boxes - two detections of cars, that are ignored; red box - false positive.")

    *   yellow box - ground-truth box;
    *   grey boxes - two detections on cars, that are ignored;
    *   red box - false positive.
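The overlap measure used for `group-of` boxes normalizes by the detection area only, unlike IoU. A minimal sketch, assuming `(ymin, xmin, ymax, xmax)` boxes and a hypothetical function name:

```python
def intersection_over_detection_area(detection_box, group_of_box):
    """Fraction of the detection box that lies inside the group-of box."""
    height = max(0.0, min(detection_box[2], group_of_box[2])
                 - max(detection_box[0], group_of_box[0]))
    width = max(0.0, min(detection_box[3], group_of_box[3])
                - max(detection_box[1], group_of_box[1]))
    detection_area = ((detection_box[2] - detection_box[0])
                      * (detection_box[3] - detection_box[1]))
    if detection_area <= 0:
        return 0.0
    return (height * width) / detection_area


group_of_box = (0, 0, 10, 10)

# A small detection fully inside the group-of box: its IoU with the large
# box would only be 4 / 100, yet it counts as lying inside the group.
inside = intersection_over_detection_area((2, 2, 4, 4), group_of_box)

# A detection that is exactly half inside does not exceed the 0.5 threshold.
half_inside = intersection_over_detection_area((0, 8, 10, 12), group_of_box)

print(inside > 0.5, half_inside > 0.5)
```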
A detection is a "false positive" if it is neither a "true positive" nor
"ignored".

Precision and recall are defined as:

* Precision = number-of-true-positives/(number-of-true-positives + number-of-false-positives)
* Recall = number-of-true-positives/number-of-non-group-of-boxes

Note that detections ignored as firing on a `group-of` ground-truth box do not
contribute to the number of true positives.
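A small sketch of these two formulas, showing that ignored detections affect neither value. The label strings and the function name are assumptions chosen for the example, and returning 0.0 for an empty denominator is a simplification.

```python
def precision_recall(labels, num_non_group_of_boxes):
    """Compute (precision, recall) from per-detection labels.

    labels: list of "tp", "fp" or "ignored", one per detection.
    num_non_group_of_boxes: number of non-group-of ground-truth boxes.
    """
    true_positives = labels.count("tp")
    false_positives = labels.count("fp")
    # Ignored detections appear in neither the numerator nor the denominator.
    scored = true_positives + false_positives
    precision = true_positives / scored if scored else 0.0
    recall = (true_positives / num_non_group_of_boxes
              if num_non_group_of_boxes else 0.0)
    return precision, recall


labels = ["tp", "tp", "fp", "ignored", "ignored"]
precision, recall = precision_recall(labels, num_non_group_of_boxes=4)
print(precision, recall)
```

Dropping the two ignored detections from `labels` leaves both values unchanged.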
The labels in Open Images are organized in a
[hierarchy](https://storage.googleapis.com/openimages/2017_07/bbox_labels_vis/bbox_labels_vis.html).
Ground-truth bounding boxes are annotated with the most specific class available
in the hierarchy. For example, "car" has two children, "limousine" and "van". Any
other kind of car (for example, a sedan) is annotated as "car". Given this
convention, the evaluation software treats all classes independently, ignoring
the hierarchy. To achieve high performance values, object detectors should
output bounding boxes labelled in the same manner.

The old metric name is DEPRECATED:

`EvalConfig.metrics_set='open_images_V2_detection_metrics'`
## OID Challenge Object Detection Metric 2018

`EvalConfig.metrics_set='oid_challenge_detection_metrics'`

The metric for the Object Detection track of the OID Challenge 2018. The
description is provided on the [Open Images Challenge
website](https://storage.googleapis.com/openimages/web/challenge.html).

The old metric name is DEPRECATED:

`EvalConfig.metrics_set='oid_challenge_object_detection_metrics'`
## OID Challenge Visual Relationship Detection Metric 2018

The metric for the Visual Relationship Detection track of the OID Challenge
2018. The description is provided on the [Open Images Challenge
website](https://storage.googleapis.com/openimages/web/challenge.html).

Note: this is currently a stand-alone metric that can be used only through the
`metrics/oid_vrd_challenge_evaluation.py` util.