Introducing Washery Database Cleaner

Washery is the latest tool from the base2 R&D team to provide the capability of cleaning an RDS database dataset. This new tools provides the orchestration of resources required to clean a production RDS snapshot and convert it into a clean/anonymised RDS snapshot to restore into a non production environment or to dump into s3 for download by developers.

Washery is designed to run in your production or DataBunker AWS account making sure your raw data never leaves those confines. It creates resources in a self contained and isolated VPC independent of your production work load.

The tool works by inputing in a RDS snapshot along with a SQL script to clean and anonymised your dataset and outputting a new snapshot or a s3 dump of the dataset. A RDS instance is created from the inputted snapshot id inside the Washery VPC, then a Fargate task is run to execute the SQL script against the RDS instance and dump the dataset to S3. Access to the database is achieved by generating a new random password and modifying the master password after the RDS instance/cluster has launched so Washery has no knowledge of the original master password.

Washery can also be used to dump production RDS snapshots to S3 for cheaper backup storage, however Washery doesn’t support maintaining the lifecycle of the backups in S3.

Database Engine Support

Washery supports MySQL, Postgres and Microsoft SQL Server database engines on RDS as well as both Postgres and MySQL on Amazon Aurora.

AWS RDS Engines Supported
MySQL
Postgres
SQL Server
Aurora MySQL
Aurora Postgres

Jenkins Support

Washery is supported by the ciinabox-pipelines library providing the capability to automate the refresh of non production databases from a production database snapshot.

Here’s an example of taking the latest production snapshot and refreshing the dev database with the new snapshot outputted by Washery.

@Library('ciinabox') _

pipeline {

    environment {
        REGION = 'ap-southeast-2'
        DEV_ACCOUNT = '000000000000'
        PROD_ACCOUNT = '111111111111'
        WASHERY_ROLE = 'washery'
        DEV_CFN_ROLE = 'ciinabox'
    }

    agent {
        docker {
            image 'ghcr.io/base2services/washery:0.2.0'
        }
    }

    stages {

        stage('retrive latest snapshot') {
            steps {
                lookupSnapshot(
                    type: 'rds',
                    accountId: env.PROD_ACCOUNT,
                    region: env.REGION,
                    role: env.ROLE,
                    resource: 'production-db',
                    snapshotType: 'manual',
                    snapshot: 'latest'
                )
            }
        }

        stage('execute washery') {
            steps {
                washery(
                    region: env.REGION,
                    accountId: env.PROD_ACCOUNT,
                    role: env.WASHERY_ROLE,
                    snapshotId: env.SNAPSHOT_ID,
                    sqlScript: 'scrubber.sql',
                    scriptBucket: 'washery.demo.ap-southeast-2.111111111111',
                    instanceType: 'instance',
                    saveSnapshot: true
                )
            }
        }

        stage('refresh dev database') {
            steps {
                cloudformation(
                    stackName: 'dev',
                    action: 'update',
                    region: env.REGION,
                    parameters: [
                        'RDSSnapshotId' : env.WASHERY_FINAL_SNAPSHOT,
                    ],
                    accountId: env.DEV_ACCOUNT,
                    role: env.DEV_CFN_ROLE
                )
            }
        }

    }

}