## Run an Instance Segmentation Model

For some applications it isn't adequate to localize an object with a
simple bounding box. For instance, you might want to segment an object region
once it is detected. This class of problems is called **instance segmentation**.

<p align="center">
<img src="img/kites_with_segment_overlay.png" width=676 height=450>
</p>

### Materializing data for instance segmentation {#materializing-instance-seg}

Instance segmentation is an extension of object detection, where a binary mask
(i.e. object vs. background) is associated with every bounding box. This allows
for more fine-grained information about the extent of the object within the box.
To train an instance segmentation model, a groundtruth mask must be supplied for
every groundtruth bounding box. In addition to the proto fields listed in the
section titled [Using your own dataset](using_your_own_dataset.md), one must
also supply `image/object/mask`, which can either be a repeated list of
single-channel encoded PNG strings, or a single dense 3D binary tensor where
masks corresponding to each object are stacked along the first dimension. Each
is described in more detail below.

#### PNG Instance Segmentation Masks

Instance segmentation masks can be supplied as serialized PNG images.

```shell
image/object/mask = ["\x89PNG\r\n\x1A\n\x00\x00\x00\rIHDR\...", ...]
```

These masks are whole-image masks, one for each object instance. The spatial
dimensions of each mask must agree with the image. Each mask has only a single
channel, and the pixel values are either 0 (background) or 1 (object mask).
**PNG masks are the preferred parameterization since they offer considerable
space savings compared to dense numerical masks.**

#### Dense Numerical Instance Segmentation Masks

Masks can also be specified via a dense numerical tensor.

```shell
image/object/mask = [0.0, 0.0, 1.0, 1.0, 0.0, ...]
```

For an image with dimensions `H` x `W` and `num_boxes` groundtruth boxes, the
mask corresponds to a [`num_boxes`, `H`, `W`] float32 tensor, flattened into a
single vector of shape `num_boxes` * `H` * `W`. In TensorFlow, examples are read
in row-major format, so the elements are organized as:

```shell
... mask 0 row 0 ... mask 0 row 1 ... // ... mask 0 row H-1 ... mask 1 row 0 ...
```

where each row has W contiguous binary values.

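
The row-major ordering can be illustrated with a small pure-Python sketch. The function names `flatten_masks` and `flat_index` and the toy data are hypothetical, chosen only for illustration; real pipelines would typically flatten a NumPy array or a tensor instead.

```python
def flatten_masks(masks):
    """Flatten [num_boxes][H][W] nested lists into a row-major float vector."""
    flat = []
    for mask in masks:      # outermost: one mask per groundtruth box
        for row in mask:    # then each image row, top to bottom
            flat.extend(float(v) for v in row)  # W contiguous values per row
    return flat


def flat_index(box, y, x, height, width):
    """Position of pixel (y, x) of mask `box` inside the flattened vector."""
    return (box * height + y) * width + x


# Two masks (num_boxes = 2) for an image with H = 2 and W = 3.
masks = [
    [[0, 1, 1],
     [0, 1, 0]],
    [[1, 0, 0],
     [1, 1, 0]],
]
flat = flatten_masks(masks)

# Pixel (y=1, x=1) of mask 1 sits at offset (1 * 2 + 1) * 3 + 1 = 10.
value = flat[flat_index(1, 1, 1, height=2, width=3)]
```

The resulting vector has `num_boxes` * `H` * `W` = 12 entries, and the lookup at offset 10 returns the same value as `masks[1][1][1]`.
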
For example TFRecords with mask labels, see the
[Preparing Inputs](preparing_inputs.md) section.

### Pre-existing config files

We provide four instance segmentation config files that you can use to train
your own models:

1.  <a href="https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/mask_rcnn_inception_resnet_v2_atrous_coco.config" target=_blank>mask_rcnn_inception_resnet_v2_atrous_coco</a>
1.  <a href="https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/mask_rcnn_resnet101_atrous_coco.config" target=_blank>mask_rcnn_resnet101_atrous_coco</a>
1.  <a href="https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/mask_rcnn_resnet50_atrous_coco.config" target=_blank>mask_rcnn_resnet50_atrous_coco</a>
1.  <a href="https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/mask_rcnn_inception_v2_coco.config" target=_blank>mask_rcnn_inception_v2_coco</a>

For more details see the [detection model zoo](detection_model_zoo.md).

### Updating a Faster R-CNN config file

Currently, the only supported instance segmentation model is [Mask
R-CNN](https://arxiv.org/abs/1703.06870), which requires Faster R-CNN as the
backbone object detector.

Once you have a baseline Faster R-CNN pipeline configuration, you can make the
following modifications in order to convert it into a Mask R-CNN model.

1.  Within `train_input_reader` and `eval_input_reader`, set
    `load_instance_masks` to `True`. If using PNG masks, set `mask_type` to
    `PNG_MASKS`; otherwise you can leave it as the default, `NUMERICAL_MASKS`.
1.  Within the `faster_rcnn` config, use a `MaskRCNNBoxPredictor` as the
    `second_stage_box_predictor`.
1.  Within the `MaskRCNNBoxPredictor` message, set `predict_instance_masks` to
    `True`. You must also define `conv_hyperparams`.
1.  Within the `faster_rcnn` message, set `number_of_stages` to `3`.
1.  Add instance segmentation metrics to the set of metrics:
    `'coco_mask_metrics'`.
1.  Update the `input_path`s to point at your data.

Please refer to the section on [Running the pets dataset](running_pets.md) for
additional details.

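
As a rough illustration of the six modifications, the sketch below holds an abbreviated pipeline config fragment as a Python string together with a crude plain-text check that each setting is present. The fragment is an assumption modeled on the sample configs linked above: fields unrelated to the steps are omitted, the paths and hyperparameter values are placeholders, and the text-format field name `mask_rcnn_box_predictor` is assumed to correspond to the `MaskRCNNBoxPredictor` message. The `missing_settings` helper is hypothetical, uses regular expressions rather than the real proto parser, and is not part of the Object Detection API.

```python
import re

# Abbreviated, illustrative fragment; not a complete or trainable config.
MASK_RCNN_FRAGMENT = """
model {
  faster_rcnn {
    number_of_stages: 3                      # step 4
    second_stage_box_predictor {
      mask_rcnn_box_predictor {              # step 2
        predict_instance_masks: true         # step 3
        conv_hyperparams {                   # step 3 (placeholder values)
          op: CONV
          regularizer { l2_regularizer { weight: 0.0 } }
          initializer { truncated_normal_initializer { stddev: 0.01 } }
        }
      }
    }
  }
}
train_input_reader {
  load_instance_masks: true                  # step 1
  mask_type: PNG_MASKS                       # step 1, only for PNG masks
  tf_record_input_reader {
    input_path: "/path/to/train.record"      # step 6 (placeholder path)
  }
}
eval_input_reader {
  load_instance_masks: true                  # step 1
  mask_type: PNG_MASKS
  tf_record_input_reader {
    input_path: "/path/to/val.record"        # step 6 (placeholder path)
  }
}
eval_config {
  metrics_set: "coco_mask_metrics"           # step 5
}
"""

# (setting name, pattern, minimum number of occurrences expected)
CHECKS = [
    ("load_instance_masks", r"load_instance_masks:\s*true", 2),
    ("mask_rcnn_box_predictor", r"mask_rcnn_box_predictor\s*\{", 1),
    ("predict_instance_masks", r"predict_instance_masks:\s*true", 1),
    ("conv_hyperparams", r"conv_hyperparams\s*\{", 1),
    ("number_of_stages", r"number_of_stages:\s*3\b", 1),
    ("coco_mask_metrics", r'metrics_set:\s*"coco_mask_metrics"', 1),
]


def missing_settings(config_text):
    """Return the names of Mask R-CNN settings not found in the config text."""
    return [name for name, pattern, min_count in CHECKS
            if len(re.findall(pattern, config_text)) < min_count]


missing = missing_settings(MASK_RCNN_FRAGMENT)
```

For the fragment above `missing` is empty. Because the check only matches text, it cannot tell whether a setting sits inside the correct message, so treat it as a checklist aid rather than validation.
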
> Note: The mask prediction branch consists of a sequence of convolution layers.
> You can set the number of convolution layers and their depth as follows:
>
> 1.  Within the `MaskRCNNBoxPredictor` message, set the
>     `mask_prediction_conv_depth` to your value of interest. The default value
>     is 256. If you set it to `0` (recommended), the depth is computed
>     automatically based on the number of classes in the dataset.
> 1.  Within the `MaskRCNNBoxPredictor` message, set the
>     `mask_prediction_num_conv_layers` to your value of interest. The default
>     value is 2.