Washery

Washery makes it easy to clean, obfuscate and dump an RDS dataset. It orchestrates the resources required to take a production RDS snapshot and convert it into a clean, anonymised RDS snapshot that can be restored into a non-production environment, or dumped to S3 for download by developers. Washery is designed to run in your production or DataBunker AWS account, so your raw data never leaves those confines. It creates its resources in a self-contained, isolated VPC, independent of your production workload.

The tool takes as input an RDS snapshot and a SQL script that cleans and anonymises your dataset, and outputs a new snapshot or an S3 dump of the dataset. An RDS instance is created from the supplied snapshot ID inside the Washery VPC, then a Fargate task runs the SQL script against the RDS instance and dumps the dataset to S3. To gain access to the database, Washery generates a new random password and modifies the master password after the RDS instance or cluster has launched, so it never needs to know the original master password.
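The restore-then-reset-password step described above can be sketched in Python. This is an illustration only, not Washery's own code: the function names `generate_master_password` and `restore_and_reset_password` are hypothetical, and `rds` is assumed to be a boto3 RDS client (`boto3.client("rds")`). The sketch covers a single RDS instance and leaves out the VPC and subnet placement, the cluster case, and the Fargate task.

```python
import secrets
import string


def generate_master_password(length: int = 32) -> str:
    """Return a random alphanumeric password.

    Letters and digits only, which avoids the characters RDS rejects
    in a master password ('/', '@', '"' and space).
    """
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


def restore_and_reset_password(rds, snapshot_id: str, instance_id: str) -> str:
    """Restore an instance from a snapshot, then replace its master password.

    `rds` is assumed to be a boto3 RDS client. The original master
    password is never read or needed.
    """
    # Create a new instance from the supplied snapshot.
    rds.restore_db_instance_from_db_snapshot(
        DBInstanceIdentifier=instance_id,
        DBSnapshotIdentifier=snapshot_id,
    )
    # The password can only be modified once the instance is available.
    rds.get_waiter("db_instance_available").wait(
        DBInstanceIdentifier=instance_id
    )
    # Overwrite the master password with a freshly generated one.
    password = generate_master_password()
    rds.modify_db_instance(
        DBInstanceIdentifier=instance_id,
        MasterUserPassword=password,
        ApplyImmediately=True,
    )
    return password
```

Because the password is generated after the restore and only returned to the caller, whatever runs the SQL script can connect without anyone having to supply or store the production credentials.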

Architecture Diagram

Source: https://github.com/base2Services/washery