Bearse Feature - Start Stop

April 18, 2023

The 0.0.1 release of the Start Stop feature introduces the basic implementation of the cfn-start-stop-stack gem as a Bearse feature.

The start stop feature allows resources within a CloudFormation stack to be stopped and started on a schedule, which the user defines in a configuration.

Infrastructure

Stacks

The start stop feature consists of a single stack that deploys the infrastructure the feature needs: the state machine, event triggers, roles and policies, and the lambda functions responsible for stopping and starting the resources.

Lambdas

Retrieve Stack

This is the initial lambda function that is called by the state machine at the beginning of each iteration. It traverses both the stack and all nested stacks and returns a list of all supported resources grouped by their priority. The resource groups are then sent to either the stop or start lambda depending on whether a stop or start event is in progress.
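As a rough illustration of the grouping step, the sketch below takes a flat list of resources that has already been collected from a stack and its nested stacks, and groups the supported ones by priority. The resource types, the priority values in PRIORITIES and the function name group_by_priority are assumptions made for this example, not the feature's actual mapping.

```python
from collections import defaultdict

# Hypothetical priority table: a lower number is acted on earlier.
# The real feature defines its own supported types and priorities.
PRIORITIES = {
    "AWS::RDS::DBInstance": 100,
    "AWS::EC2::Instance": 200,
    "AWS::AutoScaling::AutoScalingGroup": 200,
}


def group_by_priority(resources):
    """Group supported resources by priority, dropping unsupported types.

    `resources` is a list of dicts with "type" and "id" keys, assumed to
    have been collected from the stack and all of its nested stacks.
    Returns a list of (priority, [resource, ...]) in ascending priority.
    """
    groups = defaultdict(list)
    for resource in resources:
        priority = PRIORITIES.get(resource["type"])
        if priority is None:
            continue  # unsupported resource type, skip it
        groups[priority].append(resource)
    return sorted(groups.items())


if __name__ == "__main__":
    found = [
        {"type": "AWS::EC2::Instance", "id": "i-0abc"},
        {"type": "AWS::S3::Bucket", "id": "my-bucket"},
        {"type": "AWS::RDS::DBInstance", "id": "db-1"},
    ]
    for priority, group in group_by_priority(found):
        print(priority, [r["id"] for r in group])
```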

Start

The start lambda is called when a start event is invoked. It iterates over the resource groups returned from the Retrieve Stack lambda and starts each resource group in ascending order of priority. After a resource group is started, it is sent to the Check State lambda.

Stop

The stop lambda is called when a stop event is invoked. It iterates over the resource groups returned from the Retrieve Stack lambda and stops each resource group in ascending order of priority. After a resource group is stopped, it is sent to the Check State lambda.
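The ordering shared by the Start and Stop lambdas can be sketched as a simple loop. In the deployed feature this sequencing is driven by the state machine, so the loop below only mirrors the order of operations. The function run_event and the act and wait_until_done callables are hypothetical stand-ins for the Start/Stop and Check State lambdas.

```python
def run_event(event, groups, act, wait_until_done):
    """Apply a start or stop event to resource groups, one group at a time.

    `groups` is a list of (priority, resources) pairs. `act` and
    `wait_until_done` are hypothetical callables standing in for the
    Start/Stop lambda and the Check State lambda respectively.
    Returns the priorities in the order they were processed.
    """
    if event not in ("start", "stop"):
        raise ValueError(f"unknown event: {event!r}")
    processed = []
    # Ascending order of priority, as described for both lambdas.
    for priority, resources in sorted(groups, key=lambda g: g[0]):
        act(event, resources)              # start or stop the whole group
        wait_until_done(event, resources)  # block until the group settles
        processed.append(priority)
    return processed
```

Passing the action and the check in as callables keeps the ordering logic testable without touching any AWS resources.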

Check State

The check state lambda is invoked from the Stop or Start lambda and is responsible for validating whether all resources in the specified resource group have reached their expected final state, whether that is fully stopped or fully started. If not, it waits for 1 minute and tries again. Once the resource group has reached its expected state, the lambda function passes control back to either Stop or Start to begin operations on the next resource group. If there are no more resource groups to action, it instead passes on to the Notification lambda function, which sends the Slack message.
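The check-and-retry behaviour described here amounts to a polling loop. The sketch below is a minimal version of that idea, assuming a hypothetical get_state callable that returns a resource's current state. The max_attempts limit is also an assumption added so the example terminates; the source does not say whether the feature caps its retries.

```python
import time


def wait_for_state(resources, expected, get_state,
                   delay=60, max_attempts=30, sleep=time.sleep):
    """Poll until every resource reports `expected`.

    Waits `delay` seconds (1 minute by default) between checks and
    returns the number of checks it took. `get_state` is a hypothetical
    callable; `max_attempts` is an assumed safety limit.
    """
    for attempt in range(1, max_attempts + 1):
        if all(get_state(r) == expected for r in resources):
            return attempt
        if attempt < max_attempts:
            sleep(delay)  # not there yet, wait before checking again
    raise TimeoutError(
        f"resources did not reach {expected!r} after {max_attempts} checks"
    )
```

The sleep function is injectable so the loop can be exercised in tests without actually waiting.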

Notification

The notification lambda is invoked after the Check State lambda function deems that all resource groups have reached their final state. It sends a Slack notification to a specified Slack channel listing the resources that have been stopped or started.
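To give a feel for what such a notification might contain, the sketch below builds a message body in the {"text": ...} shape that Slack incoming webhooks accept. The function name build_slack_payload and the message wording are assumptions for illustration; the feature's real message format is not shown in this document, and the example deliberately stops short of sending anything.

```python
import json


def build_slack_payload(event, stack_name, resource_ids):
    """Build a JSON body listing the resources that were stopped or started.

    The wording of the message is hypothetical; only the top-level
    "text" field is part of Slack's incoming webhook format.
    """
    verb = {"start": "started", "stop": "stopped"}[event]
    lines = [f"{stack_name}: {len(resource_ids)} resource(s) {verb}"]
    lines += [f"- {rid}" for rid in resource_ids]
    return json.dumps({"text": "\n".join(lines)})
```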

Configuration

The start stop feature uses a config to define the stacks the feature will manage and the schedules on which it will do so. The config must contain a key called schedules, which in turn contains any number of environment keys, each with its associated schedule and stack details.

Each environment schedule must contain the following parameters:

account - The AWS Account ID of the environment
stacks - A list of stacks that will be managed by the feature

Lastly, each stack within the stacks list must contain the following parameters:

stack_name - The name of the stack that will be managed
start - The cron schedule that defines when the event for starting the stack should be triggered
stop - The cron schedule that defines when the event for stopping the stack should be triggered

An example is provided below.

{
  "config": {
    "schedules": {
      "dev": { #Environment Name
        "account": 123456789123, #AWS Account ID of env account
        "stacks": [ #List of stacks that will be managed
          {
            "stack_name": "Ec2InstanceStack", #The name of the Stack
            "start": "cron(0 5 * * ? *)",     #Cron schedule of when the stack should be started
            "stop": "cron(0 10 * * ? *)"      #Cron schedule of when the stack should be stopped
          }
        ]
      },
      "ops": {
        "account": 987654321987,
        "stacks": [
          {
            "stack_name": "Ec2InstanceStack",
            "start": "cron(0 5 * * ? *)",
            "stop": "cron(0 10 * * ? *)"
          }
        ]
      }
    }
  }
}

As we can observe in the above example, the config defines schedules for two different environments, dev and ops. When the feature is deployed, it conditionally creates the event triggers based on the environment it is deployed to. This means we can deploy the feature to different environments using the exact same config.

We can also observe the structure of each environment object above, where we specify the stack name and the start and stop cron schedules within the list of stacks for the given environment.
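The required keys described in this section can be checked before deploying. The sketch below is one possible validator, not part of the feature itself: the function name validate_config and the error messages are hypothetical. The cron check only confirms the six-field cron(...) shape used in the examples and does not validate the individual fields.

```python
import re

# Matches cron(<six whitespace-separated fields>), e.g. cron(0 5 * * ? *).
CRON_RE = re.compile(r"^cron\((\S+ ){5}\S+\)$")


def validate_config(payload):
    """Return a list of problems found in a start stop config.

    An empty list means the required keys are all present.
    """
    schedules = payload.get("config", {}).get("schedules")
    if not isinstance(schedules, dict):
        return ["config.schedules is missing or not an object"]
    problems = []
    for env, details in schedules.items():
        for key in ("account", "stacks"):
            if key not in details:
                problems.append(f"{env}: missing '{key}'")
        for i, stack in enumerate(details.get("stacks", [])):
            if "stack_name" not in stack:
                problems.append(f"{env}.stacks[{i}]: missing 'stack_name'")
            for key in ("start", "stop"):
                if key not in stack:
                    problems.append(f"{env}.stacks[{i}]: missing '{key}'")
                elif not CRON_RE.match(str(stack[key])):
                    problems.append(
                        f"{env}.stacks[{i}]: '{key}' is not a six-field cron()"
                    )
    return problems
```

Returning a list of problems rather than raising on the first one lets a user fix every issue in a config in a single pass.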

Example

We start by creating the config we will deploy our feature with. For this example we will use the following:

{
  "config": {
    "schedules": {
      "dev": {
        "account": 123456789123,
        "stacks": [
          {
            "stack_name": "TestStopStartResources",
            "start": "cron(0 5 * * ? *)",
            "stop": "cron(0 10 * * ? *)"
          },
          {
            "stack_name": "SomeOtherStack",
            "start": "cron(0 7 * * ? *)",
            "stop": "cron(0 9 * * ? *)"
          }
        ]
      },
      "ops": {
        "account": 987654321321,
        "stacks": [
          {
            "stack_name": "YetAnotherStack",
            "start": "cron(0 5 * * ? *)",
            "stop": "cron(0 10 * * ? *)"
          }
        ]
      }
    }
  }
}

This config specifies schedules for two separate environments, dev and ops. We will be deploying to dev, so we are only interested in that for now. In dev we can observe that schedules for two stacks, TestStopStartResources and SomeOtherStack, are being created.
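To make the per-environment selection concrete, the sketch below lists the start and stop triggers that a given account would end up with from a config of this shape. It assumes the environment is matched on the account value; the document does not state how the feature actually identifies the environment it is deployed to, and the function name triggers_for_account is hypothetical.

```python
def triggers_for_account(payload, account_id):
    """List (environment, stack, event, cron) tuples for one AWS account.

    Matching on the "account" value is an assumption about how the
    feature decides which environment's schedules apply.
    """
    triggers = []
    for env, details in payload["config"]["schedules"].items():
        if str(details["account"]) != str(account_id):
            continue  # schedules for another environment are ignored
        for stack in details["stacks"]:
            for event in ("start", "stop"):
                triggers.append((env, stack["stack_name"], event, stack[event]))
    return triggers
```

For the config above and the dev account, this yields four entries: a start and a stop trigger for each of TestStopStartResources and SomeOtherStack, and nothing for YetAnotherStack.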

We deploy the feature with the specified config and wait until the deployment is complete.

bearse deploy-stack --group test --feature start-stop --payload file://startstop_test_dev.json --profile dev

With our feature deployed, we can observe the events created. Notice that only the events for the dev environment have been created.

From here, our feature will trigger the state machine based on the cron schedules defined in the above triggers. For this example, however, we will simulate this behaviour by manually triggering the state machine with a stop event.

With our state machine triggered, we can observe that it begins stopping the resources in order of priority, waiting until each group is finished before moving on to the next.

Once all resources are stopped, we can observe the Slack notification in the specified channel signaling that the process is complete.

Further information on stacks, parameters, deployment details and detailed examples can be found in the documentation.