# Preparing Inputs

[TOC]

To use your own dataset in the TensorFlow Object Detection API, you must
convert it into the
[TFRecord file format](https://www.tensorflow.org/api_guides/python/python_io#tfrecords_format_details).
This document outlines how to write a script to generate the TFRecord file.

## Label Maps

Each dataset is required to have a label map associated with it. The label map
defines a mapping from string class names to integer class IDs, and should be a
`StringIntLabelMap` text protobuf. Sample label maps can be found in
`object_detection/data`. Label map IDs should always start from 1.
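
Once a label map is written, it can be loaded at conversion time to look up
class IDs by name. A minimal sketch using the API's `label_map_util` module
(the `cat_label_map.pbtxt` path is hypothetical):

```python
from object_detection.utils import label_map_util

# Hypothetical path; point this at your own label map file.
label_map_path = 'object_detection/data/cat_label_map.pbtxt'

# Returns a dict mapping class names to integer IDs, e.g. {'Cat': 1, 'Dog': 2}.
label_map_dict = label_map_util.get_label_map_dict(label_map_path)
```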

## Dataset Requirements

For every example in your dataset, you should have the following information:

1. An RGB image for the dataset encoded as jpeg or png.
2. A list of bounding boxes for the image. Each bounding box should contain:
    1. Bounding box coordinates (with origin in the top left corner) defined
       by 4 floating point numbers [ymin, xmin, ymax, xmax]. Note that we
       store the _normalized_ coordinates (x / width, y / height) in the
       TFRecord dataset; see the sketch after this list.
    2. The class of the object in the bounding box.
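
For instance, converting a box from absolute pixel coordinates to the
normalized form stored in the TFRecord could look like the sketch below (the
`normalize_box` helper and its values are illustrative, not part of the API):

```python
def normalize_box(ymin, xmin, ymax, xmax, height, width):
  """Converts absolute pixel coordinates to normalized [0, 1] coordinates."""
  return ymin / height, xmin / width, ymax / height, xmax / width

# A hypothetical box on a 1200x1032 image.
print(normalize_box(174.0, 322.0, 761.0, 1062.0, height=1032.0, width=1200.0))
```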

## Example Image

Consider the following image:

![Example Image](img/example_cat.jpg "Example Image")

with the following label map:

```
item {
  id: 1
  name: 'Cat'
}

item {
  id: 2
  name: 'Dog'
}
```

We can generate a tf.Example proto for this image using the following code:

```python
import tensorflow as tf

from object_detection.utils import dataset_util


def create_cat_tf_example(encoded_cat_image_data):
  """Creates a tf.Example proto from sample cat image.

  Args:
    encoded_cat_image_data: The jpeg encoded data of the cat image.

  Returns:
    example: The created tf.Example.
  """

  height = 1032
  width = 1200
  filename = b'example_cat.jpg'
  image_format = b'jpg'

  xmins = [322.0 / 1200.0]
  xmaxs = [1062.0 / 1200.0]
  ymins = [174.0 / 1032.0]
  ymaxs = [761.0 / 1032.0]
  classes_text = [b'Cat']
  classes = [1]

  tf_example = tf.train.Example(features=tf.train.Features(feature={
      'image/height': dataset_util.int64_feature(height),
      'image/width': dataset_util.int64_feature(width),
      'image/filename': dataset_util.bytes_feature(filename),
      'image/source_id': dataset_util.bytes_feature(filename),
      'image/encoded': dataset_util.bytes_feature(encoded_cat_image_data),
      'image/format': dataset_util.bytes_feature(image_format),
      'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
      'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
      'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
      'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
      'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
      'image/object/class/label': dataset_util.int64_list_feature(classes),
  }))
  return tf_example
```
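
To write this single example out, the resulting proto can be serialized into a
TFRecord file with the TF 1.x writer. A minimal sketch, assuming the
hypothetical paths below:

```python
# Hypothetical paths; substitute your own.
image_path = 'img/example_cat.jpg'
output_path = '/tmp/cat.record'

with tf.gfile.GFile(image_path, 'rb') as fid:
  encoded_cat_image_data = fid.read()

writer = tf.python_io.TFRecordWriter(output_path)
writer.write(create_cat_tf_example(encoded_cat_image_data).SerializeToString())
writer.close()
```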

## Conversion Script Outline {#conversion-script-outline}

A typical conversion script will look like the following:

```python
import tensorflow as tf

from object_detection.utils import dataset_util


flags = tf.app.flags
flags.DEFINE_string('output_path', '', 'Path to output TFRecord')
FLAGS = flags.FLAGS


def create_tf_example(example):
  # TODO(user): Populate the following variables from your example.
  height = None  # Image height
  width = None  # Image width
  filename = None  # Filename of the image. Empty if image is not from file
  encoded_image_data = None  # Encoded image bytes
  image_format = None  # b'jpeg' or b'png'

  xmins = []  # List of normalized left x coordinates in bounding box
              # (1 per box)
  xmaxs = []  # List of normalized right x coordinates in bounding box
              # (1 per box)
  ymins = []  # List of normalized top y coordinates in bounding box
              # (1 per box)
  ymaxs = []  # List of normalized bottom y coordinates in bounding box
              # (1 per box)
  classes_text = []  # List of string class name of bounding box (1 per box)
  classes = []  # List of integer class id of bounding box (1 per box)

  tf_example = tf.train.Example(features=tf.train.Features(feature={
      'image/height': dataset_util.int64_feature(height),
      'image/width': dataset_util.int64_feature(width),
      'image/filename': dataset_util.bytes_feature(filename),
      'image/source_id': dataset_util.bytes_feature(filename),
      'image/encoded': dataset_util.bytes_feature(encoded_image_data),
      'image/format': dataset_util.bytes_feature(image_format),
      'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
      'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
      'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
      'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
      'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
      'image/object/class/label': dataset_util.int64_list_feature(classes),
  }))
  return tf_example


def main(_):
  writer = tf.python_io.TFRecordWriter(FLAGS.output_path)

  # TODO(user): Write code to read in your dataset to examples variable

  for example in examples:
    tf_example = create_tf_example(example)
    writer.write(tf_example.SerializeToString())

  writer.close()


if __name__ == '__main__':
  tf.app.run()
```
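
After the script has run, it can be worth sanity-checking the output before
training. A minimal sketch using the TF 1.x record iterator (the path is
hypothetical):

```python
import tensorflow as tf

# Hypothetical path written by the conversion script above.
record_path = '/tmp/train.record'

for record in tf.python_io.tf_record_iterator(record_path):
  example = tf.train.Example.FromString(record)
  # Print the stored image dimensions as a quick consistency check.
  print(example.features.feature['image/height'].int64_list.value,
        example.features.feature['image/width'].int64_list.value)
```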

Note: You may notice additional fields in some other datasets. They are
currently unused by the API and are optional.

Note: Please refer to the section on [Running an Instance Segmentation
Model](instance_segmentation.md) for instructions on how to configure a model
that predicts masks in addition to object bounding boxes.

## Sharding datasets

When you have more than a few thousand examples, it is beneficial to shard your
dataset into multiple files:

* The tf.data.Dataset API can read input examples in parallel, improving
  throughput.
* The tf.data.Dataset API can shuffle the examples better with sharded files,
  which slightly improves the performance of the model.

Instead of writing all tf.Example protos to a single file as shown in the
[conversion script outline](#conversion-script-outline), use the snippet below.

```python
import contextlib2
from object_detection.dataset_tools import tf_record_creation_util

num_shards = 10
output_filebase = '/path/to/train_dataset.record'

with contextlib2.ExitStack() as tf_record_close_stack:
  output_tfrecords = tf_record_creation_util.open_sharded_output_tfrecords(
      tf_record_close_stack, output_filebase, num_shards)
  for index, example in enumerate(examples):
    tf_example = create_tf_example(example)
    output_shard_index = index % num_shards
    output_tfrecords[output_shard_index].write(tf_example.SerializeToString())
```

This will produce the following output files:

```bash
/path/to/train_dataset.record-00000-of-00010
/path/to/train_dataset.record-00001-of-00010
...
/path/to/train_dataset.record-00009-of-00010
```

which can then be used in the config file as below.

```bash
tf_record_input_reader {
  input_path: "/path/to/train_dataset.record-?????-of-00010"
}
```