# Quick Start: Distributed Training on the Oxford-IIIT Pets Dataset on Google Cloud

This page is a walkthrough for training an object detector using the Tensorflow
Object Detection API. In this tutorial, we'll be training on the Oxford-IIIT
Pets dataset to build a system to detect various breeds of cats and dogs. The
output of the detector will look like the following:

![](img/oxford_pet.png)

## Setting up a Project on Google Cloud

To accelerate the process, we'll run training and evaluation on [Google Cloud
ML Engine](https://cloud.google.com/ml-engine/) to leverage multiple GPUs. To
begin, you will have to set up Google Cloud via the following steps (if you have
already done this, feel free to skip to the next section):

1. [Create a GCP project](https://cloud.google.com/resource-manager/docs/creating-managing-projects).
2. [Install the Google Cloud SDK](https://cloud.google.com/sdk/downloads) on
   your workstation or laptop. This will provide the tools you need to upload
   files to Google Cloud Storage and start ML training jobs.
3. [Enable the ML Engine
   APIs](https://console.cloud.google.com/flows/enableapi?apiid=ml.googleapis.com,compute_component&_ga=1.73374291.1570145678.1496689256).
   By default, a new GCP project does not enable APIs to start ML Engine
   training jobs. Use the above link to explicitly enable them.
4. [Set up a Google Cloud Storage (GCS)
   bucket](https://cloud.google.com/storage/docs/creating-buckets). ML Engine
   training jobs can only access files on a Google Cloud Storage bucket. In
   this tutorial, we'll be required to upload our dataset and configuration to
   GCS. Please remember the name of your GCS bucket, as we will reference it
   multiple times in this document. Substitute `${YOUR_GCS_BUCKET}` with the
   name of your bucket in this document. For your convenience, you should
   define the environment variable below:

   ``` bash
   export YOUR_GCS_BUCKET=${YOUR_GCS_BUCKET}
   ```

It is also possible to run locally by following
[the running locally instructions](running_locally.md).

## Installing Tensorflow and the Tensorflow Object Detection API

Please run through the [installation instructions](installation.md) to install
Tensorflow and all of its dependencies. Ensure the Protobuf libraries are
compiled and the library directories are added to `PYTHONPATH`.

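As a quick reference, the two steps called out above look roughly like the
following (consult [installation.md](installation.md) for the complete,
up-to-date procedure):

``` bash
# From tensorflow/models/research/
# Compile the Protobuf message definitions used by the Object Detection API.
protoc object_detection/protos/*.proto --python_out=.
# Make the research and slim directories importable from Python.
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
```
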
## Getting the Oxford-IIIT Pets Dataset and Uploading it to Google Cloud Storage

In order to train a detector, we require a dataset of images, bounding boxes and
classifications. For this demo, we'll use the Oxford-IIIT Pets dataset. The raw
dataset for Oxford-IIIT Pets lives
[here](http://www.robots.ox.ac.uk/~vgg/data/pets/). You will need to download
both the image dataset [`images.tar.gz`](http://www.robots.ox.ac.uk/~vgg/data/pets/data/images.tar.gz)
and the groundtruth data [`annotations.tar.gz`](http://www.robots.ox.ac.uk/~vgg/data/pets/data/annotations.tar.gz)
to the `tensorflow/models/research/` directory and unzip them. This may take
some time.

``` bash
# From tensorflow/models/research/
wget http://www.robots.ox.ac.uk/~vgg/data/pets/data/images.tar.gz
wget http://www.robots.ox.ac.uk/~vgg/data/pets/data/annotations.tar.gz
tar -xvf images.tar.gz
tar -xvf annotations.tar.gz
```

After downloading the tarballs, your `tensorflow/models/research/` directory
should appear as follows:

```lang-none
- images.tar.gz
- annotations.tar.gz
+ images/
+ annotations/
+ object_detection/
... other files and directories
```

The Tensorflow Object Detection API expects data to be in the TFRecord format,
so we'll now run the `create_pet_tf_record` script to convert from the raw
Oxford-IIIT Pets dataset into TFRecords. Run the following commands from the
`tensorflow/models/research/` directory:

``` bash
# From tensorflow/models/research/
python object_detection/dataset_tools/create_pet_tf_record.py \
    --label_map_path=object_detection/data/pet_label_map.pbtxt \
    --data_dir=`pwd` \
    --output_dir=`pwd`
```

Note: It is normal to see some warnings when running this script. You may ignore
them.

Two sharded TFRecord datasets (10 shards each), named `pet_faces_train.record-*`
and `pet_faces_val.record-*`, should be generated in the
`tensorflow/models/research/` directory.

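If you'd like a quick sanity check that the conversion succeeded, you can count
the generated shards (20 files in total, assuming the 10-shard default):

``` bash
# From tensorflow/models/research/
# Expect 10 train shards plus 10 validation shards.
ls -1 pet_faces_train.record-* pet_faces_val.record-* | wc -l
```
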
Now that the data has been generated, we'll need to upload it to Google Cloud
Storage so the data can be accessed by ML Engine. Run the following command to
copy the files into your GCS bucket (substituting `${YOUR_GCS_BUCKET}`):

```bash
# From tensorflow/models/research/
gsutil cp pet_faces_train.record-* gs://${YOUR_GCS_BUCKET}/data/
gsutil cp pet_faces_val.record-* gs://${YOUR_GCS_BUCKET}/data/
gsutil cp object_detection/data/pet_label_map.pbtxt gs://${YOUR_GCS_BUCKET}/data/pet_label_map.pbtxt
```

Please remember the path where you upload the data to, as we will need this
information when configuring the pipeline in a following step.

## Downloading a COCO-pretrained Model for Transfer Learning

Training a state of the art object detector from scratch can take days, even
when using multiple GPUs! In order to speed up training, we'll take an object
detector trained on a different dataset (COCO), and reuse some of its
parameters to initialize our new model.

Download our [COCO-pretrained Faster R-CNN with Resnet-101
model](http://storage.googleapis.com/download.tensorflow.org/models/object_detection/faster_rcnn_resnet101_coco_11_06_2017.tar.gz).
Extract the contents of the archive and copy the `model.ckpt*` files into your
GCS bucket:

``` bash
wget http://storage.googleapis.com/download.tensorflow.org/models/object_detection/faster_rcnn_resnet101_coco_11_06_2017.tar.gz
tar -xvf faster_rcnn_resnet101_coco_11_06_2017.tar.gz
gsutil cp faster_rcnn_resnet101_coco_11_06_2017/model.ckpt.* gs://${YOUR_GCS_BUCKET}/data/
```

Remember the path where you uploaded the model checkpoint to, as we will need it
in the following step.

## Configuring the Object Detection Pipeline

In the Tensorflow Object Detection API, the model parameters, training
parameters and eval parameters are all defined by a config file. More details
can be found [here](configuring_jobs.md). For this tutorial, we will use some
predefined templates provided with the source code. In the
`object_detection/samples/configs` folder, there are skeleton object detection
configuration files. We will use `faster_rcnn_resnet101_pets.config` as a
starting point for configuring the pipeline. Open the file with your favourite
text editor.

We'll need to configure some paths in order for the template to work. Search the
file for instances of `PATH_TO_BE_CONFIGURED` and replace them with the
appropriate value (typically `gs://${YOUR_GCS_BUCKET}/data/`). Afterwards,
upload your edited file to GCS, making note of the path it was uploaded to
(we'll need it when starting the training/eval jobs).

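For reference, after substitution the path-related entries in the template
should look roughly like the following (an illustrative excerpt; the exact
field names and shard counts come from the template itself):

```lang-none
fine_tune_checkpoint: "gs://${YOUR_GCS_BUCKET}/data/model.ckpt"
train_input_reader: {
  tf_record_input_reader {
    input_path: "gs://${YOUR_GCS_BUCKET}/data/pet_faces_train.record-?????-of-00010"
  }
  label_map_path: "gs://${YOUR_GCS_BUCKET}/data/pet_label_map.pbtxt"
}
```
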
``` bash
# From tensorflow/models/research/
# Edit the faster_rcnn_resnet101_pets.config template. Please note that there
# are multiple places where PATH_TO_BE_CONFIGURED needs to be set.
sed -i "s|PATH_TO_BE_CONFIGURED|gs://${YOUR_GCS_BUCKET}/data|g" \
    object_detection/samples/configs/faster_rcnn_resnet101_pets.config

# Copy edited template to cloud.
gsutil cp object_detection/samples/configs/faster_rcnn_resnet101_pets.config \
    gs://${YOUR_GCS_BUCKET}/data/faster_rcnn_resnet101_pets.config
```

## Checking Your Google Cloud Storage Bucket

At this point in the tutorial, you should have uploaded the training/validation
datasets (including the label map), the COCO-pretrained Faster R-CNN fine-tuning
checkpoint and your job configuration to your Google Cloud Storage bucket. Your
bucket should look like the following:

```lang-none
+ ${YOUR_GCS_BUCKET}/
  + data/
    - faster_rcnn_resnet101_pets.config
    - model.ckpt.index
    - model.ckpt.meta
    - model.ckpt.data-00000-of-00001
    - pet_label_map.pbtxt
    - pet_faces_train.record-*
    - pet_faces_val.record-*
```

You can inspect your bucket using the [Google Cloud Storage
browser](https://console.cloud.google.com/storage/browser).

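Alternatively, you can list the uploaded files from the command line:

```bash
gsutil ls gs://${YOUR_GCS_BUCKET}/data/
```
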
## Starting Training and Evaluation Jobs on Google Cloud ML Engine

Before we can start a job on Google Cloud ML Engine, we must:

1. Package the Tensorflow Object Detection code.
2. Write a cluster configuration for our Google Cloud ML job.

To package the Tensorflow Object Detection code, run the following commands from
the `tensorflow/models/research/` directory:

```bash
# From tensorflow/models/research/
bash object_detection/dataset_tools/create_pycocotools_package.sh /tmp/pycocotools
python setup.py sdist
(cd slim && python setup.py sdist)
```

This will create the Python packages `dist/object_detection-0.1.tar.gz`,
`slim/dist/slim-0.1.tar.gz` and `/tmp/pycocotools/pycocotools-2.0.tar.gz`.

For the training Cloud ML job, we'll configure the cluster to use five training
workers and three parameter servers. The configuration file can be found at
`object_detection/samples/cloud/cloud.yml`.

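As a rough sketch, a cluster configuration of this shape looks like the
following (the field values here are illustrative; the file shipped with the
source code is authoritative):

```lang-none
trainingInput:
  runtimeVersion: "1.12"
  scaleTier: CUSTOM
  masterType: standard_gpu
  workerCount: 5
  workerType: standard_gpu
  parameterServerCount: 3
  parameterServerType: standard
```
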
Note: The code sample below is supported for use with the 1.12 runtime version.

To start training and evaluation, execute the following command from the
`tensorflow/models/research/` directory:

```bash
# From tensorflow/models/research/
gcloud ml-engine jobs submit training `whoami`_object_detection_pets_`date +%m_%d_%Y_%H_%M_%S` \
    --runtime-version 1.12 \
    --job-dir=gs://${YOUR_GCS_BUCKET}/model_dir \
    --packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz,/tmp/pycocotools/pycocotools-2.0.tar.gz \
    --module-name object_detection.model_main \
    --region us-central1 \
    --config object_detection/samples/cloud/cloud.yml \
    -- \
    --model_dir=gs://${YOUR_GCS_BUCKET}/model_dir \
    --pipeline_config_path=gs://${YOUR_GCS_BUCKET}/data/faster_rcnn_resnet101_pets.config
```

Users can monitor and stop training and evaluation jobs on the [ML Engine
Dashboard](https://console.cloud.google.com/mlengine/jobs).

## Monitoring Progress with Tensorboard

You can monitor progress of the training and eval jobs by running Tensorboard on
your local machine:

```bash
# This command needs to be run once to allow your local machine to access your
# GCS bucket.
gcloud auth application-default login

tensorboard --logdir=gs://${YOUR_GCS_BUCKET}/model_dir
```

Once Tensorboard is running, navigate to `localhost:6006` from your favourite
web browser. Make sure your Tensorboard release matches the minor version of
your Tensorflow installation (1.x). You should see something similar to the
following:

![](img/tensorboard.png)

You will also want to click on the images tab to see example detections made by
the model while it trains. After about an hour and a half of training, you can
expect to see something like this:

![](img/tensorboard2.png)

Note: It takes roughly 10 minutes for a job to get started on ML Engine, and
roughly an hour for the system to evaluate the validation dataset. It may take
some time to populate the dashboards. If you do not see any entries after half
an hour, check the logs from the [ML Engine
Dashboard](https://console.cloud.google.com/mlengine/jobs). Note that by default
the training jobs are configured to run for much longer than is necessary for
convergence. To save money, we recommend killing your jobs once you've seen
that they've converged.

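You can kill a job from the ML Engine Dashboard, or from the command line.
`${JOB_ID}` below is a placeholder for the job name shown on the dashboard:

```bash
# Cancel a running ML Engine job by name.
gcloud ml-engine jobs cancel ${JOB_ID}
```
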
## Exporting the Tensorflow Graph

After your model has been trained, you should export it to a Tensorflow graph
proto. First, you need to identify a candidate checkpoint to export. You can
search your bucket using the [Google Cloud Storage
Browser](https://console.cloud.google.com/storage/browser). The file should be
stored under `${YOUR_GCS_BUCKET}/model_dir`. The checkpoint will typically
consist of three files:

* `model.ckpt-${CHECKPOINT_NUMBER}.data-00000-of-00001`
* `model.ckpt-${CHECKPOINT_NUMBER}.index`
* `model.ckpt-${CHECKPOINT_NUMBER}.meta`

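You can also list candidate checkpoints directly from the command line, and set
`CHECKPOINT_NUMBER` for the commands below:

```bash
# List the available checkpoints in the bucket.
gsutil ls gs://${YOUR_GCS_BUCKET}/model_dir/model.ckpt-*.index
# Pick one of the listed step numbers (the value below is just an example).
export CHECKPOINT_NUMBER=200000
```
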
After you've identified a candidate checkpoint to export, run the following
command from `tensorflow/models/research/`:

```bash
# From tensorflow/models/research/
gsutil cp gs://${YOUR_GCS_BUCKET}/model_dir/model.ckpt-${CHECKPOINT_NUMBER}.* .
python object_detection/export_inference_graph.py \
    --input_type image_tensor \
    --pipeline_config_path object_detection/samples/configs/faster_rcnn_resnet101_pets.config \
    --trained_checkpoint_prefix model.ckpt-${CHECKPOINT_NUMBER} \
    --output_directory exported_graphs
```

Afterwards, you should see a directory named `exported_graphs` containing the
SavedModel and frozen graph.

## Configuring the Instance Segmentation Pipeline

Mask prediction can be turned on for an object detection config by adding
`predict_instance_masks: true` within the `MaskRCNNBoxPredictor`. Other
parameters, such as the mask size, the number of convolutions in the mask
layer, and the convolution hyperparameters, can be defined as well. We will use
`mask_rcnn_resnet101_pets.config` as a starting point for configuring the
instance segmentation pipeline. Everything said above about object detection
holds true for instance segmentation: setting training details aside, an
instance segmentation model is simply an object detection model with an
additional head that predicts an object mask inside each predicted box, as the
config sketch below illustrates.

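As a rough illustration, the relevant part of such a config looks like the
following (the field names follow the Object Detection API's box predictor
configuration; the values shown are examples, and the exact settings come from
`mask_rcnn_resnet101_pets.config` itself):

```lang-none
second_stage_box_predictor {
  mask_rcnn_box_predictor {
    predict_instance_masks: true
    mask_height: 33
    mask_width: 33
    mask_prediction_num_conv_layers: 4
    conv_hyperparams {
      ...
    }
  }
}
```
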
Please refer to the section on [Running an Instance Segmentation
Model](instance_segmentation.md) for instructions on how to configure a model
that predicts masks in addition to object bounding boxes.

## What's Next

Congratulations, you have now trained an object detector for various cats and
dogs! There are several things you can do now:

1. [Test your exported model using the provided Jupyter notebook.](running_notebook.md)
2. [Experiment with different model configurations.](configuring_jobs.md)
3. Train an object detector using your own data.