Cloud resources - race conditions

Introduction

There is a saying among Linux administrators: with great power comes great responsibility.

The same is true if you decide to use low-level tools (like Boto3) to manage cloud resources. You are in full control of your AWS services, and it is your responsibility to handle them properly.

Race condition

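A race condition occurs when the outcome of an operation depends on the timing or order of events that it does not control. It can be difficult to reproduce because the end result is nondeterministic and depends on the relative timing between interfering processes.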

In an automated deployment of cloud resources, race conditions often happen when services depend on each other.

The root cause of race condition issues is timing. If a script invokes a series of AWS API calls (via Boto3) one immediately after another, it may try to use or modify a service that is not yet fully deployed. This can leave your cloud environment in a corrupted state and cause issues that are sometimes hard to track down and reproduce.

Tools

You do not have to worry about race conditions when you deploy infrastructure using services like AWS CloudFormation. In that case, the provisioning service is responsible for resolving dependencies between managed resources.

But the deployment of some types of systems cannot be automated using CloudFormation. For instance, Internet of Things (IoT) systems require resources that are not (fully) supported by the CloudFormation service. In those cases, you need to use low-level tools like the AWS SDK for Python (Boto3).

Boto3 Clients provide a low-level interface to AWS (closely related to service APIs). Clients are generated from a JSON service definition file, so they can manage all aspects of a given service.

Examples

Typical cases when the race condition can appear:

Solution

To avoid most of the above problems, you should understand the relationships between AWS services and invoke API calls in the proper order, so that each resource is created only after the resources it depends on.
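
One way to make the order explicit is to record which resource depends on which and derive the creation order from that record. The sketch below uses Python's standard graphlib module (Python 3.9 or later). The resource names and the dependency map are hypothetical and only illustrate the idea; they do not come from a real deployment.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each key depends on the resources in its set.
DEPENDENCIES = {
    "iam_role": set(),
    "lambda_function": {"iam_role"},
    "iot_topic_rule": {"lambda_function"},
}


def creation_order(dependencies):
    """Return resource names so that every resource follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


def deletion_order(dependencies):
    """Tear down in reverse, so nothing is deleted while still in use."""
    return list(reversed(creation_order(dependencies)))
```

A deployment script could then loop over creation_order(DEPENDENCIES) and call the matching Boto3 client method for each name.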

At critical points, add artificial delays to your script. This gives AWS enough time to finish deploying the related services.
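
A fixed time.sleep() is the simplest delay, but it tends to wait either too long or not long enough. A small polling helper bounds the wait and stops as soon as the resource is ready. This is a minimal sketch, not part of Boto3: wait_until and is_ready are hypothetical names, and the readiness check is whatever describe or get call fits your service.

```python
import time


def wait_until(is_ready, timeout=60.0, interval=2.0, sleep=time.sleep):
    """Poll is_ready() until it returns True or the timeout expires.

    Returns True if the resource became ready, False on timeout.
    The sleep parameter exists only so the delay can be replaced in tests.
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_ready():
            return True
        # Give up if another full interval would overshoot the deadline.
        if time.monotonic() + interval > deadline:
            return False
        sleep(interval)
```

Boto3 also ships built-in waiters for some operations, for example s3.get_waiter('bucket_exists'); where a service provides one, it is usually the better choice.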

Carefully catch and handle exceptions raised by Boto3 Clients. Add automated retries where appropriate, and roll back changes in the AWS environment if you detect a corrupted state.
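
The retry and roll-back logic can be sketched as follows. Everything here is illustrative: retry, deploy, and the (create, undo) step pairs are hypothetical helpers, not Boto3 APIs. With Boto3 you would typically pass botocore.exceptions.ClientError as the exception type and inspect the error code before deciding whether a retry makes sense.

```python
import time


def retry(action, retry_on=(Exception,), attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call action(); on a listed exception, wait and try again.

    The delay doubles after every failed attempt (exponential backoff).
    The last exception is re-raised when all attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return action()
        except retry_on:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


def deploy(steps, **retry_options):
    """Run (create, undo) pairs in order; undo completed steps on failure."""
    completed = []
    try:
        for create, undo in steps:
            retry(create, **retry_options)
            completed.append(undo)
    except Exception:
        # Roll back in reverse order so dependants are removed first.
        for undo in reversed(completed):
            undo()
        raise
```

Each step pairs a creation call with the call that reverses it, so a failure in the middle of a deployment does not leave half-created resources behind.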

