Setting Up Alerts
Alert configuration is crucial for effective monitoring. This guide walks you through setting up notification channels and creating alert rules using the 9n9s web interface so you’re notified immediately when issues occur.
Overview
9n9s alerting works in two parts:
- Notification Channels: Where alerts are sent (email, Slack, PagerDuty, etc.)
- Alert Rules: When alerts are triggered and which channels to use
Step 1: Create Notification Channels
Email Notifications
1. Navigate to Notification Channels
- Log into app.9n9s.com
- Go to Organization Settings in the sidebar
- Click “Notification Channels”
2. Add Email Channel
- Click “Add Channel” button
- Select “Email” from the channel types
- Fill in the configuration:
- Channel Name: “Operations Team”
- Email Address: [email protected]
- Template: Choose “Detailed” for comprehensive alerts
- Click “Create Channel”
The email channel will now appear in your list and can be used in alert rules.
Slack Integration
Method 1: Slack App (Recommended)
1. Install Slack App
- In 9n9s, go to Organization Settings > Notification Channels
- Click “Add Channel” → “Slack”
- Click “Install Slack App”
- You’ll be redirected to Slack to authorize the app
- Select the workspace and channels you want 9n9s to access
- Return to 9n9s and your Slack channels will be available
2. Configure Channel Settings
- Select the Slack channel for notifications
- Choose message formatting options
- Set up any channel-specific preferences
- Click “Create Channel”
Method 2: Webhook
1. Create Slack Webhook
- In your Slack workspace, go to your app settings
- Create an “Incoming Webhook” integration
- Copy the webhook URL
2. Add Webhook in 9n9s
- In 9n9s, click “Add Channel” → “Slack Webhook”
- Enter:
- Channel Name: “Alerts Channel”
- Webhook URL: Paste your Slack webhook URL
- Channel: #alerts (optional, for display)
- Username: 9n9s Monitor (optional)
- Click “Create Channel”
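If you want to sanity-check the webhook URL itself before relying on it for alerts, a minimal command-line test looks like the sketch below. The URL is a placeholder; substitute the one you copied from Slack.

```bash
# Send a test message to a Slack Incoming Webhook.
# Replace the placeholder URL with the webhook URL copied from Slack.
curl -X POST \
  -H 'Content-type: application/json' \
  --data '{"text": "Test message from 9n9s alert setup"}' \
  https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX
```

A successful request returns `ok` and the message appears in the channel the webhook is tied to.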
PagerDuty Integration
1. Get PagerDuty Integration Key
- In PagerDuty, go to Services → Select your service
- Go to Integrations tab
- Click “Add Integration” → “9n9s” (or “Generic API”)
- Copy the Integration Key
2. Add PagerDuty Channel in 9n9s
- Click “Add Channel” → “PagerDuty”
- Enter:
- Channel Name: “On-Call Team”
- Integration Key: Paste your PagerDuty key
- Routing Key: (if using Events API v2)
- Click “Create Channel”
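To confirm the key is valid independently of 9n9s, you can send a test event straight to the PagerDuty Events API v2 (assuming your service uses an Events API v2 integration). The routing key and text below are placeholders.

```bash
# Trigger a test incident via the PagerDuty Events API v2.
# Replace YOUR_ROUTING_KEY with the integration key copied from PagerDuty.
curl -X POST https://events.pagerduty.com/v2/enqueue \
  -H 'Content-Type: application/json' \
  -d '{
    "routing_key": "YOUR_ROUTING_KEY",
    "event_action": "trigger",
    "payload": {
      "summary": "Test alert from 9n9s setup",
      "source": "9n9s-alert-setup",
      "severity": "info"
    }
  }'
```

Remember to resolve the test incident in PagerDuty afterward so it doesn't page anyone.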
SMS Notifications
1. Add SMS Channel
- Click “Add Channel” → “SMS”
- Configure:
- Channel Name: “Critical Alerts SMS”
- Provider: Select Twilio
- Phone Numbers: Add recipient numbers (one per line)
- From Number: Your Twilio number
- Click “Create Channel”
2. Twilio Configuration
- You’ll need to provide your Twilio credentials in Organization Settings
- Go to Organization Settings > Integrations > Twilio
- Enter your Account SID and Auth Token
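You can verify the Account SID and Auth Token before saving them by fetching your account record from the Twilio REST API; a minimal check, using placeholder environment variables, is sketched below.

```bash
# Verify Twilio credentials by fetching the account resource.
# Set TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN to the values from the Twilio console.
curl -s -u "$TWILIO_ACCOUNT_SID:$TWILIO_AUTH_TOKEN" \
  "https://api.twilio.com/2010-04-01/Accounts/$TWILIO_ACCOUNT_SID.json"
```

A valid pair returns your account details as JSON; invalid credentials return a 401 error.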
Step 2: Create Alert Rules
Basic Alert Rule for a Specific Monitor
1. Navigate to Alert Rules
- Go to your project dashboard
- Click “Alert Rules” in the navigation
- Click “Create Alert Rule”
2. Configure Basic Rule
- Rule Name: “Database Backup Failure”
- Description: “Alert when daily backup fails”
3. Set Conditions
- Monitor Selection: Choose “Specific Monitors”
- Select your “Daily Database Backup” monitor
- Status Changes: Check “Down” and “Failed”
- Duration: Leave blank for immediate alerts
4. Configure Actions
- Click “Add Action”
- Notification Channel: Select “Operations Team” (email)
- Delay: 0 minutes (immediate)
- Priority: High
5. Save Rule
- Review your configuration
- Click “Create Alert Rule”
Tag-Based Alert Rules
Create rules that apply to multiple monitors based on tags:
1. Create Tag-Based Rule
- Rule Name: “Production Service Alerts”
- Description: “Alert for all production service failures”
2. Set Tag Conditions
- Monitor Selection: Choose “Monitors with tags”
- Tag Filters:
- environment equals production
- criticality equals high
- Status Changes: Select “Down” and “Degraded”
3. Configure Multiple Actions
- Action 1: Slack alert (immediate)
- Channel: “Alerts Channel”
- Delay: 0 minutes
- Action 2: PagerDuty escalation
- Channel: “On-Call Team”
- Delay: 0 minutes
Escalation Rules
Set up progressive alerting with increasing urgency:
1. Create Escalation Rule
- Rule Name: “Critical Service Escalation”
- Monitor Selection: Tags where criticality equals critical
2. Configure Escalation Chain
- Action 1: Slack notification
- Delay: 0 minutes
- Channel: “#alerts”
- Action 2: Email notification
- Delay: 5 minutes
- Channel: “Operations Team”
- Action 3: PagerDuty incident
- Delay: 15 minutes
- Channel: “On-Call Team”
- Action 4: SMS alerts
- Delay: 30 minutes
- Channel: “Critical Alerts SMS”
3. Set Conditions
- Only escalate if monitor is still down
- Skip escalation if incident is acknowledged
Step 3: Advanced Alert Configuration
Alert Grouping
Prevent alert spam by grouping related alerts:
1. Enable Grouping in Rule Settings
- In your alert rule, go to Advanced Settings
- Enable “Group similar alerts”
- Grouping Window: 5 minutes
- Maximum Alerts: 5 alerts per group
- Group By: Tags (environment, service)
2. Deduplication Settings
- Enable “Deduplicate alerts”
- Deduplication Window: 30 minutes
- This prevents the same alert from being sent multiple times
Maintenance Windows
Suppress alerts during planned maintenance:
1. Schedule Maintenance
- Go to Organization Settings > Maintenance Windows
- Click “Schedule Maintenance”
- Title: “Database Upgrade”
- Start Time: Select date and time
- Duration: 4 hours
- Affected Monitors: Select specific monitors or use tags
2. Maintenance Settings
- Suppress All Alerts: Yes
- Notify Subscribers: Yes (for status page subscribers)
- Automatic Status: Set monitors to “Maintenance” status
Conditional Alerting
Set up time-based and condition-based alerting:
1. Business Hours Alerting
- In alert rule settings, enable “Time Windows”
- Active Days: Monday-Friday
- Active Hours: 9:00 AM - 5:00 PM
- Timezone: America/New_York
2. After-Hours Critical Only
- Create separate rule for after-hours
- Time Windows: Evenings and weekends
- Additional Conditions: Only criticality equals critical
- Different Channels: Direct to on-call team
Step 4: Testing Your Alerts
Test Notification Channels
1. Test Individual Channels
- Go to Organization Settings > Notification Channels
- Find your channel in the list
- Click the “Test” button next to it
- Enter a test message
- Choose severity level
- Click “Send Test”
2. Verify Delivery
- Check that the test message arrives
- Verify formatting and content
- Ensure links work correctly
Simulate Monitor Failures
For Heartbeat Monitors:
1. Temporary Failure Simulation
- Go to your monitor’s detail page
- Click “Simulate” → “Failure”
- This sends a test failure pulse
- Watch for alerts in your channels
2. Late Pulse Simulation
- Click “Simulate” → “Late”
- This simulates missing a scheduled pulse
- Alerts should trigger after the grace period
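As an alternative to the Simulate buttons, you can exercise the full path by sending a pulse from the command line. The URL shape below is only a placeholder, and the existence of a `/fail` variant is an assumption rather than confirmed 9n9s behavior; copy the actual pulse URL from your monitor's detail page.

```bash
# Placeholder: copy the real pulse URL from the monitor's detail page in 9n9s.
PULSE_URL="https://example.9n9s.com/pulse/<monitor-id>"

# Report an explicit failure (if the monitor exposes a failure endpoint),
# then stop sending pulses and wait past the grace period to test "late" alerts.
curl -fsS "$PULSE_URL/fail" || echo "failure pulse could not be delivered"
```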
For Uptime Monitors:
1. Edit Monitor Temporarily
- Go to monitor settings
- Change URL to a non-existent endpoint
- Save changes
- Wait for next check to fail
- Restore correct URL afterward
2. Modify Assertions
- Temporarily change expected status code
- Or modify content assertions to fail
- Monitor next check results
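Before (and after) temporarily breaking the monitor, it helps to check from the command line what the uptime check will actually see. A quick status-code probe is sketched below, with example.com standing in for your endpoint.

```bash
# Print only the HTTP status code for the endpoint the monitor checks.
curl -s -o /dev/null -w "%{http_code}\n" https://example.com/health

# Include response headers if you also assert on content or headers.
curl -sI https://example.com/health
```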
Verify Alert Delivery and Recovery
1. Check Alert History
- Go to Project Dashboard > Alerts
- View recent alert activity
- Verify alerts were sent to correct channels
2. Test Recovery Notifications
- After simulating failure, restore monitor to working state
- Verify that recovery notifications are sent
- Check that alert is marked as resolved
Common Alert Patterns
Production Environment Setup
1. Critical Services
- Monitors: All monitors tagged with criticality=critical
- Immediate Actions: Slack + PagerDuty
- Escalation: SMS after 15 minutes if unacknowledged
2. Important Services
- Monitors: All monitors tagged with criticality=high
- Actions: Slack immediate, email after 5 minutes
- Business Hours: Different channels for business hours vs. after hours
Development and Staging
1. Development Environment
- Monitors: All monitors tagged with environment=development
- Actions: Email only, 10-minute delay
- Quiet Hours: No alerts during off-hours
2. Staging Environment
- Monitors: All monitors tagged with environment=staging
- Actions: Slack alerts during business hours
- Pre-production: Higher priority during testing periods
Team-Based Routing
1. Backend Team Alerts
- Tag Filter: team=backend or service=api
- Channel: Backend team Slack channel
- Escalation: Team lead email after 10 minutes
2. Infrastructure Team Alerts
- Tag Filter: team=infrastructure or component=database
- Channel: Infrastructure Slack + email
- Escalation: On-call rotation via PagerDuty
Best Practices
Alert Rule Organization
- Use Descriptive Names: “Production API Failures”, not “Rule 1”
- Add Descriptions: Explain what the rule does and why
- Regular Review: Monthly review of alert rules and their effectiveness
- Test Regularly: Quarterly testing of alert delivery
Preventing Alert Fatigue
- Start Conservative: Begin with critical monitors only
- Gradual Expansion: Add more monitors as you tune thresholds
- Use Escalation: Not everything needs immediate PagerDuty alerts
- Monitor Alert Volume: Review alert frequency in dashboard
Effective Alert Content
- Clear Subject Lines: Include service name and status
- Actionable Information: Include links to dashboards and runbooks
- Context: Environment, duration, affected users
- Next Steps: What should the recipient do first?
Troubleshooting Alerts
Common Issues
Alerts Not Being Sent
- Check Rule Status: Ensure alert rule is enabled
- Verify Conditions: Check if conditions match actual monitor state
- Test Channels: Use test feature to verify channel connectivity
- Review Maintenance: Check for active maintenance windows
Too Many Alerts
- Adjust Thresholds: Increase grace periods for flappy monitors
- Enable Grouping: Group similar alerts together
- Review Sensitivity: Check if monitors are too sensitive
- Use Filtering: Add more specific tag filters
Missing Critical Alerts
- Check Coverage: Ensure all critical monitors have alert rules
- Verify Tagging: Confirm monitors have correct tags
- Test Rules: Use simulation to test rule execution
- Review Channels: Ensure notification channels are working
Getting Help
If you’re still having issues with alerts:
- Check Status Page: Visit status.9n9s.com for service status
- Contact Support: Use the chat widget or email [email protected]
- Community: Join our Discord community for peer support
- Documentation: Check the Advanced Alerting Reference
CLI Reference
For automation and advanced configurations, you can also manage alerts via the CLI.
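The sketch below is illustrative only: the `9n9s` binary name, subcommands, and flags are assumptions, not documented syntax. Check your CLI's built-in help (or the Advanced Alerting Reference) for the commands your version actually supports.

```bash
# Illustrative only — subcommands and flags below are assumptions;
# run `9n9s --help` for the real syntax.

# List existing notification channels
9n9s channels list

# Send a test message through a channel
9n9s channels test "Operations Team" --message "Test alert"

# Create an alert rule from a local definition file
9n9s alert-rules create --file production-alerts.yaml

# Review recent alert activity for a project
9n9s alerts list --project my-project --since 24h
```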
Next Steps
Once your alerts are configured:
- Set up your team: Add team members and configure permissions
- Create status pages: Provide transparency to your users
- Explore integrations: Connect with your existing tools
- Advanced alerting: Learn about complex alert configurations
Effective alerting is the foundation of reliable monitoring. Start with simple rules for your most critical services and gradually expand your alerting strategy as your team’s needs evolve.