deepracer-for-cloud

Creates an AWS DeepRacer training environment that can be deployed in the cloud, or locally on Ubuntu Linux, Windows or Mac.

Installing Deepracer-for-Cloud

Requirements

Depending on your needs and the requirements of your cloud platform, you can configure your VM to your liking. Both CPU-only and GPU systems are supported.

AWS:

Azure:

Local:

Installation

The package comes with preparation and setup scripts that allow a turn-key setup on a fresh virtual machine.

git clone https://github.com/aws-deepracer-community/deepracer-for-cloud.git

For cloud setup execute:

cd deepracer-for-cloud && ./bin/prepare.sh

This will prepare the VM by partitioning additional drives and installing all prerequisites. After a reboot it will continue by running ./bin/init.sh, which sets up the full repository and downloads the core Docker images. Depending on your environment this may take up to 30 minutes. The scripts create a file named DONE once completed.
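Because the setup continues unattended after the reboot, one way to tell when it has finished is to poll for the DONE marker. The following is a minimal sketch and not part of the package: wait_for_done is a hypothetical helper, and the marker location, timeout and polling interval are assumptions to adjust to your setup.

```shell
# Hypothetical helper (not part of deepracer-for-cloud): wait until a marker
# file exists, or give up after a timeout.
# Usage: wait_for_done [marker_path] [timeout_seconds] [poll_interval_seconds]
wait_for_done() {
  wfd_marker="${1:-DONE}"     # assumption: DONE is in the current directory
  wfd_timeout="${2:-1800}"    # 30 minutes, the upper bound mentioned above
  wfd_interval="${3:-10}"
  wfd_waited=0
  while [ ! -e "$wfd_marker" ]; do
    if [ "$wfd_waited" -ge "$wfd_timeout" ]; then
      echo "timed out waiting for $wfd_marker" >&2
      return 1
    fi
    sleep "$wfd_interval"
    wfd_waited=$((wfd_waited + wfd_interval))
  done
  echo "found $wfd_marker"
}
```

Called without arguments from the directory where the scripts write the marker, it returns 0 once DONE appears and 1 if the timeout passes first.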

The installation script adapts .profile so that all settings are applied on login. Otherwise, activate the environment manually with source bin/activate.sh.

For a local install it is recommended not to run the bin/prepare.sh script, as it may make more changes than you want. Instead, ensure that all prerequisites are set up and run bin/init.sh directly.

See also the following article for guidance.

The init script takes a few parameters:

Variable      Description
-c <cloud>    Sets the cloud version to be configured, automatically updates the DR_CLOUD parameter in system.env. Options are azure, aws or local. Default is local.
-a <arch>     Sets the architecture to be configured. Either cpu or gpu. Default is gpu.
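As an illustration of how the two parameters combine, the sketch below validates the documented values and prints the resulting command line without running it. build_init_cmd is a hypothetical helper introduced here for illustration; only the -c and -a flags and their allowed values come from the table above.

```shell
# Hypothetical helper (not part of deepracer-for-cloud): check the values
# against the options listed above and print the matching init command.
build_init_cmd() {
  bic_cloud="${1:-local}"   # documented default
  bic_arch="${2:-gpu}"      # documented default
  case "$bic_cloud" in
    azure|aws|local) ;;
    *) echo "unknown cloud: $bic_cloud" >&2; return 1 ;;
  esac
  case "$bic_arch" in
    cpu|gpu) ;;
    *) echo "unknown arch: $bic_arch" >&2; return 1 ;;
  esac
  echo "./bin/init.sh -c $bic_cloud -a $bic_arch"
}

# Print the command for a CPU-only local setup.
build_init_cmd local cpu
```

Printing the command first makes it easy to review before pasting it into the shell in the repository directory.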

Environment Setup

The initialization script will attempt to auto-detect your environment (Azure, AWS or Local), and store the outcome in the DR_CLOUD parameter in system.env. You can also pass in a -c <cloud> parameter to override it, e.g. if you want to run the minio-based local mode in the cloud.
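To confirm what the auto-detection stored, you can read the value back from system.env. This is a minimal sketch that assumes system.env holds plain KEY=value lines and sits in the current directory; get_dr_cloud is a hypothetical helper, not something the package provides.

```shell
# Hypothetical helper (not part of deepracer-for-cloud): print the value of
# DR_CLOUD from an env file. Assumes one KEY=value pair per line.
# Usage: get_dr_cloud [path_to_env_file]
get_dr_cloud() {
  sed -n 's/^DR_CLOUD=//p' "${1:-system.env}" | head -n 1
}
```

Running get_dr_cloud after initialization should print azure, aws or local; if the value is not the one you intended, rerun the init script with -c <cloud>.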

The main differences between the modes are the authentication mechanism and the type of storage being configured. The next chapters review each type of environment in turn.

AWS

In AWS it is possible to set up authentication to S3 in two ways: Integrated sign-on using IAM Roles or using access keys.

IAM Role

To use IAM Roles:

Manual setup

For access with IAM user:

Azure

Minio has deprecated the gateway feature that exposed Azure Blob Storage as an S3 bucket. Azure mode now sets up minio in the same way as local mode.

If you want to use awscli (aws) to move files manually, use aws $DR_LOCAL_PROFILE_ENDPOINT_URL s3 ..., as the variable sets both the --profile and --endpoint-url parameters to match your configuration.
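The variable works because the shell splits its unquoted value into separate arguments. The sketch below shows that expansion without calling aws. The fallback value, including the profile name minio and the endpoint http://localhost:9000, is an assumption for illustration only; your activated environment supplies the real one, and show_aws_args is a hypothetical helper.

```shell
# Use the value from the activated environment; fall back to an ASSUMED
# illustrative value only when the variable is unset.
: "${DR_LOCAL_PROFILE_ENDPOINT_URL:=--profile minio --endpoint-url http://localhost:9000}"

# Hypothetical helper: print, one per line, the arguments that
# "aws $DR_LOCAL_PROFILE_ENDPOINT_URL s3 ls" would expand to.
show_aws_args() {
  # The variable is deliberately unquoted so it splits into four arguments.
  set -- aws $DR_LOCAL_PROFILE_ENDPOINT_URL s3 ls
  printf '%s\n' "$@"
}

show_aws_args
```

Quoting the variable would pass the whole value to aws as a single argument, so leave it unquoted on the command line.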

Local

Local mode runs a minio server that hosts the data in the docker/volumes directory. It is otherwise command-compatible with the Azure setup, as the data is accessible via Minio and not via native S3.

In Local mode the script-set requires the following:

First Run

For the first run the following final steps are needed. This creates a training run with all default values in

After a while you will see the SageMaker logs on the screen.

Troubleshooting

Here are some hints for troubleshooting specific issues you may encounter.

Local training troubleshooting

Issue: Get messages like "Sagemaker is not running"
Troubleshooting hint: Run docker ps -a to see if the containers are running or if they stopped due to errors. If this happens right after a fresh install, try restarting the system.

Issue: Check docker errors for a specific container
Troubleshooting hint: Run docker logs -f <containerid>

Issue: Get message "Error response from daemon: could not choose an IP address to advertise since this system has multiple addresses on interface <your_interface> ..." when running ./bin/init.sh -c local -a cpu
Troubleshooting hint: The system has multiple IP addresses and you need to specify one within ./bin/init.sh.
If you don't care which one to use, you can get the first one by running ifconfig | grep $(route | awk '/^default/ {print $8}') -a1 | grep -o -P '(?<=inet ).*(?= netmask)'.
Edit ./bin/init.sh, locate the line docker swarm init and change it to docker swarm init --advertise-addr <your_IP>.
Rerun ./bin/init.sh -c local -a cpu.

Issue: I don't have any of the dr-* commands
Troubleshooting hint: Run source bin/activate.sh.
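For the multiple-addresses hint above, the address lookup relies on grep -P, which is not available in every grep build. A minimal alternative using awk is sketched below. first_inet_addr is a hypothetical helper, and it assumes ifconfig prints lines of the form "inet <address> netmask ...", the same format the original command expects.

```shell
# Hypothetical helper (not part of deepracer-for-cloud): print the first IPv4
# address found in ifconfig-style text read from standard input.
# Assumes lines shaped like: "inet 192.0.2.10  netmask 255.255.255.0 ..."
first_inet_addr() {
  awk '$1 == "inet" { print $2; exit }'
}
```

A typical use is ifconfig <your_interface> | first_inet_addr, with the result passed to docker swarm init --advertise-addr. The helper takes the first inet line in whatever text it receives, so restrict the input to the interface you want.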