Bearse Feature Account Health - Metrics Monitor v0.6.4
Introduction
We’re excited to announce the release of Bearse Feature Account Health v0.6.4, which introduces a powerful new Metrics Monitor capability. This release enhances our account health monitoring suite by automatically identifying AWS resources that lack proper CloudWatch monitoring coverage, ensuring no critical infrastructure goes unmonitored.
The Metrics Monitor proactively scans your AWS environment to detect resources without alarms configured, helping maintain comprehensive observability and preventing monitoring blind spots that could lead to service disruptions.
Features
Metrics Monitor - Automated Monitoring Gap Detection
The new Metrics Monitor automatically identifies AWS resources that lack CloudWatch alarms, helping keep monitoring coverage complete across your infrastructure.
Core Functionality
- Automated Resource Scanning: Scans AWS resources on an hourly schedule to identify those without CloudWatch alarms
- RDS Instance Monitoring: Currently supports Amazon RDS database instances with plans for expansion to other AWS services
- Tag-Based Exclusions: Respects resource-level monitoring preferences through the base2_monitor tag
- CloudWatch Integration: Publishes metrics about unmonitored resources directly to CloudWatch
- Critical Alerting: Triggers critical alerts when resources without monitoring are detected
Smart Resource Detection
The Metrics Monitor intelligently identifies resources that should be monitored by:
- Querying CloudWatch for all available metrics in supported namespaces (currently AWS/RDS)
- Cross-referencing metrics with existing alarm configurations
- Filtering out resources explicitly tagged for exclusion
- Validating resource existence to avoid false positives
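The detection steps above can be sketched as a single pure function. This is a minimal illustration under stated assumptions, not the shipped implementation: the Finding type, find_unmonitored, and its pre-fetched input sets are hypothetical. In a live Lambda those inputs would be filled from the CloudWatch list_metrics and describe_alarms_for_metric calls and the RDS describe_db_instances and list_tags_for_resource calls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical record for one unmonitored resource/metric pair."""
    resource_id: str
    namespace: str
    metric_name: str

    def label(self) -> str:
        # Same "<id> (<namespace>/<metric>)" shape as the sample log output
        return f"{self.resource_id} ({self.namespace}/{self.metric_name})"


def find_unmonitored(metrics, alarmed, excluded, existing,
                     dimension="DBInstanceIdentifier"):
    """Return Findings for metrics whose resource has no alarm (sketch).

    metrics:  dicts shaped like CloudWatch ListMetrics entries
    alarmed:  set of (resource_id, metric_name) pairs that already have an alarm
    excluded: resource ids tagged base2_monitor=false
    existing: resource ids that still exist; ListMetrics can keep returning
              metrics for recently deleted resources (false positives)
    """
    findings = []
    seen = set()
    for metric in metrics:
        dims = {d["Name"]: d["Value"] for d in metric.get("Dimensions", [])}
        resource_id = dims.get(dimension)
        if resource_id is None:
            continue  # aggregate metric without a per-resource dimension
        key = (resource_id, metric["MetricName"])
        if key in seen:
            continue  # the same metric can appear more than once
        seen.add(key)
        if resource_id in excluded or resource_id not in existing:
            continue  # opted out by tag, or resource no longer exists
        if key in alarmed:
            continue  # already covered by an alarm
        findings.append(Finding(resource_id, metric["Namespace"], metric["MetricName"]))
    return findings
```

Keeping the AWS calls outside the function makes the cross-referencing logic easy to unit test without credentials.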
Configurable Monitoring Scope
- Flexible Configuration: Easily enable or disable metrics monitoring through the EnableMetricsMonitoring parameter
- Namespace Support: Built with extensible architecture to support additional AWS service namespaces
- Dimension Awareness: Monitors specific metric dimensions (e.g., DBInstanceIdentifier for RDS)
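One way to picture the namespace and dimension configuration is a small registry keyed by namespace. The registry name, its layout, and the metric list are assumptions for illustration (CPUUtilization is taken from the sample log output); only the AWS/RDS entry reflects what v0.6.4 describes.

```python
# Hypothetical registry: namespace -> identifying dimension and metrics to check.
# Only AWS/RDS is populated, matching the services supported in v0.6.4.
MONITORED_NAMESPACES = {
    "AWS/RDS": {"dimension": "DBInstanceIdentifier", "metrics": ["CPUUtilization"]},
}


def resource_id_for(metric: dict, registry: dict = MONITORED_NAMESPACES):
    """Return the resource identifier for a ListMetrics entry, or None when the
    namespace is unsupported or the identifying dimension is missing."""
    config = registry.get(metric.get("Namespace"))
    if config is None:
        return None
    for dim in metric.get("Dimensions", []):
        if dim["Name"] == config["dimension"]:
            return dim["Value"]
    return None
```

Under this design, supporting another service would mean adding one entry, for example "AWS/EC2" keyed on its InstanceId dimension, without touching the scanning code.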
Tag-Based Resource Management
Resources can be excluded from monitoring checks using the base2_monitor tag:
- Set base2_monitor: false on any RDS instance to exclude it from monitoring requirements
- Useful for temporary resources, development environments, or legacy systems with alternative monitoring
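The exclusion rule can be expressed as a check over the TagList returned by RDS ListTagsForResource, which is a list of {"Key": ..., "Value": ...} pairs. The helper name is hypothetical, and treating the value case-insensitively is an assumption rather than documented behaviour.

```python
def is_excluded(tag_list, key="base2_monitor"):
    """Return True if the resource opts out of monitoring (sketch).

    tag_list: [{"Key": ..., "Value": ...}, ...] as returned by RDS
    ListTagsForResource. Case-insensitive matching of "false" is an assumption.
    """
    for tag in tag_list:
        if tag.get("Key") == key:
            return str(tag.get("Value", "")).strip().lower() == "false"
    return False  # untagged resources are monitored by default
```

Defaulting to "monitored" keeps the check fail-safe: a resource is only skipped when someone has explicitly opted it out.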
Comprehensive Alerting
- Critical Severity: Unmonitored resources trigger critical alerts to ensure immediate attention
- Actionable Notifications: Alerts include specific resource identifiers and monitoring gaps
- CloudWatch Integration: All alerts are delivered through existing CloudWatch alarm infrastructure
Enhanced Infrastructure Configuration
Lambda Function Architecture
- Efficient Execution: Lightweight Python-based Lambda function optimized for scanning large environments
- Resource Management: Configured with appropriate memory (128 MB) and timeout (900 seconds) for comprehensive scans
- Error Handling: Robust error handling with detailed logging for troubleshooting
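The logging and error-handling behaviour can be sketched as a handler factory. This is not the shipped entry point: make_handler, scan, and publish are hypothetical names, and the AWS-backed implementations of scan and publish are injected so the control flow can run without AWS.

```python
import logging
from typing import Callable, List

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def make_handler(scan: Callable[[], List[str]], publish: Callable[[int], None]):
    """Build a Lambda-style handler around injected scan/publish functions."""

    def handler(event, context):
        logger.info("Starting metrics monitoring check...")
        try:
            findings = scan()
        except Exception:
            # Record the traceback in CloudWatch Logs, then re-raise so the
            # invocation is reported as failed instead of silently passing.
            logger.exception("Metrics monitoring scan failed")
            raise
        if findings:
            logger.info("Found %d resources without monitoring alarms:", len(findings))
            for label in findings:
                logger.info("- %s", label)
        # Publish 0 too, so the alarm can return to OK once gaps are fixed.
        publish(len(findings))
        return {"resources_without_alarms": len(findings)}

    return handler
```

Publishing a zero count on clean runs, rather than publishing nothing, avoids leaving the alarm without data after every gap has been closed.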
IAM Security Model
Implements least-privilege access with specific permissions for:
- CloudWatch Operations: Read metrics, list alarms, and publish custom metrics
- RDS Access: Describe instances and read resource tags
- Logging: Standard AWS Lambda execution role permissions
Automated Scheduling
- Hourly Execution: Runs every hour using EventBridge (CloudWatch Events) scheduling
- Non-Disruptive: Designed to operate without impacting monitored resources
- Scalable Architecture: Handles environments with hundreds of resources efficiently
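The hourly schedule corresponds to an EventBridge rate expression. The helper below, with a hypothetical rule name, guards against a common mistake: EventBridge requires a singular unit when the value is 1, as in rate(1 hour), and a plural unit otherwise.

```python
def rate_expression(value: int, unit: str) -> str:
    """Build an EventBridge rate expression with correct singular/plural unit."""
    if value < 1:
        raise ValueError("rate value must be a positive integer")
    base = unit.rstrip("s")  # accept "hour" or "hours", "minute" or "minutes"
    return f"rate({value} {base if value == 1 else base + 's'})"


# Keyword arguments for boto3 EventBridge put_rule; the rule name is hypothetical.
SCHEDULE_RULE = {
    "Name": "metrics-monitor-hourly",
    "ScheduleExpression": rate_expression(1, "hour"),
    "State": "ENABLED",
}
```

With boto3 installed, the rule could be created with boto3.client("events").put_rule(**SCHEDULE_RULE), followed by put_targets to point it at the Lambda function.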
Examples
Enabling Metrics Monitoring
To enable the Metrics Monitor in your Bearse deployment, configure the following parameter:
EnableMetricsMonitoring: 'true'
Excluding Resources from Monitoring
To exclude specific RDS instances from monitoring requirements, add the following tag:
Tags:
  - Key: base2_monitor
    Value: 'false'
CloudWatch Metrics
The Metrics Monitor publishes the following custom metric:
Namespace: MetricsMonitor
MetricName: ResourcesWithoutAlarms
Dimensions:
  - Name: Service
    Value: MetricsMonitor
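Publishing this metric maps directly onto the CloudWatch PutMetricData API. The sketch below builds the keyword arguments for boto3's put_metric_data from the namespace, metric name, and dimension listed above; the helper name and the "Count" unit are assumptions.

```python
def metric_payload(count: int) -> dict:
    """Keyword arguments for boto3 CloudWatch put_metric_data (sketch).

    Namespace, metric name, and dimension mirror the configuration above;
    Unit="Count" is an assumption.
    """
    return {
        "Namespace": "MetricsMonitor",
        "MetricData": [
            {
                "MetricName": "ResourcesWithoutAlarms",
                "Dimensions": [{"Name": "Service", "Value": "MetricsMonitor"}],
                "Value": float(count),
                "Unit": "Count",
            }
        ],
    }
```

Usage with boto3 installed: boto3.client("cloudwatch").put_metric_data(**metric_payload(2)).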
Example Alert Configuration
The system automatically creates a CloudWatch alarm with the following configuration:
AlarmName: MetricsMonitoringAlarm
AlarmDescription: 'base2 - Resources without alarms detected'
MetricName: ResourcesWithoutAlarms
Threshold: 1
ComparisonOperator: GreaterThanOrEqualToThreshold
EvaluationPeriods: 1
Period: 3600 # 1 hour
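For readers who manage alarms in code, the configuration above translates roughly into boto3 put_metric_alarm arguments. Several details are assumptions: Statistic="Maximum", the SNS alarm action, and the Namespace and Dimensions (taken from the custom metric definition). The breaches helper only restates the GreaterThanOrEqualToThreshold comparison.

```python
def alarm_kwargs(sns_topic_arn: str) -> dict:
    """Keyword arguments for boto3 CloudWatch put_metric_alarm (sketch).

    Statistic and AlarmActions are assumptions; the topic ARN comes from the caller.
    """
    return {
        "AlarmName": "MetricsMonitoringAlarm",
        "AlarmDescription": "base2 - Resources without alarms detected",
        "Namespace": "MetricsMonitor",
        "MetricName": "ResourcesWithoutAlarms",
        "Dimensions": [{"Name": "Service", "Value": "MetricsMonitor"}],
        "Statistic": "Maximum",
        "Period": 3600,  # 1 hour, matching the hourly scan
        "EvaluationPeriods": 1,
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [sns_topic_arn],
    }


def breaches(datapoint: float, threshold: float = 1.0) -> bool:
    # GreaterThanOrEqualToThreshold: a single unmonitored resource is enough
    return datapoint >= threshold
```

With a threshold of 1 and one evaluation period, the first hourly datapoint that reports any unmonitored resource moves the alarm into ALARM.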
Sample Lambda Function Output
When the Metrics Monitor detects unmonitored resources, it produces detailed logging:
Starting metrics monitoring check...
Checking metric: CPUUtilization in namespace: AWS/RDS
Found 5 resources to check for CPUUtilization in AWS/RDS with dimension DBInstanceIdentifier
Skipping resource prod-db-backup in AWS/RDS with dimension DBInstanceIdentifier because tag base2_monitor is set to false
Found 2 resources without monitoring alarms:
- production-primary-db (AWS/RDS/CPUUtilization)
- staging-replica-db (AWS/RDS/CPUUtilization)
Integration with Existing Monitoring
The Metrics Monitor seamlessly integrates with your existing monitoring infrastructure:
- SNS Integration: Critical alerts are sent through your configured SNS topics
- Log Aggregation: All monitoring activity is logged to CloudWatch Logs with 365-day retention
- Metric Collection: Custom metrics are available for dashboarding and additional alerting
Implementation Details
Supported AWS Services
- Amazon RDS: Database instances and clusters
- Future Expansion: Architecture designed to easily support additional AWS services
Configuration Parameters
| Parameter | Default | Description |
|---|---|---|
| EnableMetricsMonitoring | true | Enable/disable the Metrics Monitor functionality |
| MonitoringNamespace | MetricsMonitor | CloudWatch namespace for custom metrics |
IAM Permissions Required
The Metrics Monitor requires the following AWS permissions:
cloudwatch:
- PutMetricData
- GetMetricStatistics
- ListMetrics
- DescribeAlarmsForMetric
rds:
- DescribeDBClusters
- DescribeDBInstances
- ListTagsForResource
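The permission list above can be turned into an IAM policy document. This is a sketch under assumptions: the statement Sids and the use of Resource "*" are illustrative simplifications, and resources should be scoped more tightly wherever the individual actions support it.

```python
import json

# Actions copied from the permission list above.
REQUIRED_ACTIONS = {
    "cloudwatch": ["PutMetricData", "GetMetricStatistics", "ListMetrics",
                   "DescribeAlarmsForMetric"],
    "rds": ["DescribeDBClusters", "DescribeDBInstances", "ListTagsForResource"],
}


def policy_document(actions=REQUIRED_ACTIONS) -> dict:
    """Build an IAM policy document with one Allow statement per service."""
    statements = [
        {
            "Sid": service.capitalize() + "Access",
            "Effect": "Allow",
            "Action": [f"{service}:{name}" for name in names],
            "Resource": "*",  # simplifying assumption; narrow where possible
        }
        for service, names in actions.items()
    ]
    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    print(json.dumps(policy_document(), indent=2))
```

Logging permissions are not included here; they would typically come from the AWS managed AWSLambdaBasicExecutionRole policy attached to the same role.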
Resource Tagging Strategy
For optimal results, implement a consistent tagging strategy:
- Monitoring Exclusions: Use base2_monitor: false for resources that don’t require monitoring
- Environment Tags: Consider environment-specific monitoring requirements
- Service Categories: Tag resources by service type for easier management
Conclusion
The Metrics Monitor in Bearse Feature Account Health v0.6.4 represents a significant advancement in automated infrastructure monitoring. By proactively identifying monitoring gaps, this feature helps ensure that no critical resources go unmonitored, reducing the risk of undetected service issues.
This release reinforces our commitment to comprehensive account health monitoring and provides the foundation for expanding monitoring coverage to additional AWS services. The intelligent, tag-aware approach ensures that monitoring requirements can be customized to match your specific infrastructure needs while maintaining complete visibility across your environment.
Next Steps for Users
- Deploy the Update: Update your Bearse deployment to version 0.6.4
- Configure Monitoring: Set EnableMetricsMonitoring: 'true' in your stack parameters
- Review Resources: Audit existing resources and apply base2_monitor tags as needed
- Monitor Alerts: Watch for critical alerts indicating unmonitored resources
- Expand Coverage: Consider the monitoring requirements for your specific environment
The Metrics Monitor is designed to grow with your infrastructure, providing ongoing visibility and assurance that your monitoring coverage remains comprehensive as your AWS environment evolves.