Creating Your First Monitor
Creating your first monitor in 9n9s is quick and straightforward. This guide walks you through the entire process, from choosing the right monitor type to receiving your first successful pulse or check result.
Before You Begin
Prerequisites:
- ✅ 9n9s account created and verified
- ✅ Organization and first project set up
- ✅ Basic understanding of your monitoring needs
Choose Your Monitor Type:
- Heartbeat Monitor: For processes that can actively signal (cron jobs, scripts, applications)
- Uptime Monitor: For checking external endpoints (websites, APIs, services)
If you’re unsure which type to use, check our Monitor Types Guide for detailed comparisons.
Creating a Heartbeat Monitor
Heartbeat monitors are perfect for monitoring scheduled tasks, background jobs, and any process that can send an HTTP request.
Step 1: Create the Monitor
Via Web Interface:
- Navigate to Your Project
  - Log into your 9n9s dashboard
  - Select your organization and project
  - Click “Add Monitor” or “Create Monitor”
- Choose Monitor Type
  - Select “Heartbeat Monitor”
  - This will open the heartbeat configuration form
- Basic Configuration
    Monitor Name: "Daily Database Backup"
    Description: "Monitors the daily PostgreSQL backup job"
- Schedule Settings
    Schedule Type: Cron Expression
    Cron Expression: 0 2 * * * (Daily at 2 AM)
    Timezone: UTC (or your preferred timezone)
    Grace Period: 30 minutes
- Optional Advanced Settings
    Expected Runtime:
    - Minimum: 5 minutes
    - Maximum: 60 minutes
    Tags:
    - environment: production
    - service: database
    - criticality: high
- Create Monitor
  - Click “Create Monitor”
  - Note the unique Monitor ID (e.g., mon_abc123xyz)
  - Copy the pulse endpoint URL
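To make the schedule settings concrete, the sketch below works out when the next pulse is due for a daily 02:00 UTC schedule with a 30-minute grace period. It is an illustration only: the function name `pulse_deadline` is hypothetical, the daily schedule is hard-coded instead of being parsed from the cron expression, and it assumes a monitor counts as late once the grace period after the scheduled time has elapsed, which is a reading of the settings above and not documented 9n9s behavior.

```python
from datetime import datetime, time, timedelta, timezone


def pulse_deadline(now: datetime, run_at: time, grace: timedelta) -> datetime:
    """Return the latest moment the next scheduled run may report in.

    Assumes a once-a-day schedule at `run_at` (UTC), like the cron
    expression `0 2 * * *`, plus a grace period.
    """
    scheduled = datetime.combine(now.date(), run_at, tzinfo=timezone.utc)
    if scheduled <= now:
        # Today's run time has already passed, so the next run is tomorrow.
        scheduled += timedelta(days=1)
    return scheduled + grace


if __name__ == "__main__":
    now = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
    deadline = pulse_deadline(now, time(2, 0), timedelta(minutes=30))
    print(deadline.isoformat())  # 2024-01-16T02:30:00+00:00
```

A grace period that is shorter than the job's normal runtime would make every run look late under this model, so it is worth sizing the grace period against how long the job usually takes.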
Via CLI:
    # Create a basic heartbeat monitor
    9n9s-cli heartbeat create \
      --name "Daily Database Backup" \
      --description "Monitors the daily PostgreSQL backup job" \
      --schedule "0 2 * * *" \
      --grace-period 1800 \
      --timezone "UTC" \
      --project-id "proj_your_project_id"

    # Create with tags and expected runtime
    9n9s-cli heartbeat create \
      --name "Daily Database Backup" \
      --schedule "0 2 * * *" \
      --grace-period 1800 \
      --expected-runtime-min 300 \
      --expected-runtime-max 3600 \
      --tags "environment:production,service:database,criticality:high"
Step 2: Integrate with Your Process
Now you need to modify your process to send signals to 9n9s. Here are examples for common scenarios:
Shell Script Example:
    #!/bin/bash
    # backup.sh - PostgreSQL backup script

    # Your monitor pulse endpoint
    PULSE_URL="https://pulse.9n9s.com/mon_abc123xyz"

    # Signal that backup is starting
    curl -fsS "$PULSE_URL/start"

    # Perform the backup
    echo "Starting database backup..."
    BACKUP_FILE="/backups/backup_$(date +%Y%m%d_%H%M%S).sql"
    if pg_dump -h localhost -U backup_user -d production_db -f "$BACKUP_FILE"; then
      # Backup succeeded
      echo "Backup completed successfully"

      # Send success signal with details (size of the file just written)
      BACKUP_SIZE=$(du -h "$BACKUP_FILE" | cut -f1)
      curl -fsS -X POST \
        -H "Content-Type: application/json" \
        -d "{\"message\": \"Backup completed successfully\", \"backup_size\": \"$BACKUP_SIZE\"}" \
        "$PULSE_URL"
    else
      # Capture pg_dump's exit code before another command overwrites $?
      EXIT_CODE=$?
      echo "Backup failed!"

      # Send failure signal
      curl -fsS -X POST \
        -H "Content-Type: application/json" \
        -d "{\"error\": \"pg_dump failed with exit code $EXIT_CODE\", \"timestamp\": \"$(date -Iseconds)\"}" \
        "$PULSE_URL/fail"

      exit 1
    fi
Python Script Example:
    #!/usr/bin/env python3
    # backup.py - Database backup with monitoring

    import datetime
    import os
    import subprocess
    import sys
    import time

    import requests

    # Your monitor pulse endpoint
    PULSE_URL = "https://pulse.9n9s.com/mon_abc123xyz"


    def send_pulse(action=None, data=None):
        """Send a pulse to 9n9s"""
        url = PULSE_URL
        if action:
            url += f"/{action}"

        try:
            if data:
                response = requests.post(url, json=data, timeout=10)
            else:
                response = requests.get(url, timeout=10)
            response.raise_for_status()
            print(f"✅ Pulse sent successfully: {action or 'success'}")
        except requests.RequestException as e:
            # A monitoring outage should not abort the backup itself
            print(f"⚠️ Failed to send pulse: {e}")


    def main():
        # Signal start
        send_pulse("start")
        started = time.monotonic()

        try:
            # Perform backup
            timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
            backup_file = f"/backups/backup_{timestamp}.sql"

            print("Starting database backup...")
            subprocess.run([
                "pg_dump",
                "-h", "localhost",
                "-U", "backup_user",
                "-d", "production_db",
                "-f", backup_file,
            ], check=True, capture_output=True, text=True)

            # Calculate backup size and the measured duration
            backup_size_mb = os.path.getsize(backup_file) / (1024 * 1024)
            duration_seconds = round(time.monotonic() - started, 1)

            # Send success pulse with details
            send_pulse(data={
                "message": "Backup completed successfully",
                "backup_file": backup_file,
                "backup_size_mb": round(backup_size_mb, 2),
                "duration_seconds": duration_seconds,
            })

            print(f"✅ Backup completed: {backup_file} ({backup_size_mb:.2f} MB)")

        except subprocess.CalledProcessError as e:
            # Send failure pulse
            send_pulse("fail", {
                "error": f"pg_dump failed with exit code {e.returncode}",
                "stderr": e.stderr,
                "timestamp": datetime.datetime.now().isoformat(),
            })
            print(f"❌ Backup failed: {e}")
            sys.exit(1)
        except Exception as e:
            # Send failure pulse for unexpected errors
            send_pulse("fail", {
                "error": f"Unexpected error: {str(e)}",
                "error_type": type(e).__name__,
                "timestamp": datetime.datetime.now().isoformat(),
            })
            print(f"❌ Unexpected error: {e}")
            sys.exit(1)


    if __name__ == "__main__":
        main()
Cron Job Setup:
    # Add to your crontab (crontab -e)
    # Run backup daily at 2 AM
    0 2 * * * /path/to/backup.sh >> /var/log/backup.log 2>&1
Step 3: Test Your Monitor
Manual Test:
    # Test the pulse endpoint manually
    curl -fsS "https://pulse.9n9s.com/mon_abc123xyz"

    # Test with payload data
    curl -fsS -X POST \
      -H "Content-Type: application/json" \
      -d '{"test": true, "message": "Manual test pulse"}' \
      "https://pulse.9n9s.com/mon_abc123xyz"

    # Test failure scenario
    curl -fsS -X POST \
      -H "Content-Type: application/json" \
      -d '{"test": true, "error": "Simulated failure"}' \
      "https://pulse.9n9s.com/mon_abc123xyz/fail"
Check Monitor Status:
    # View monitor status via CLI
    9n9s-cli heartbeat get mon_abc123xyz

    # View recent pulse logs
    9n9s-cli heartbeat logs mon_abc123xyz --limit 10
Creating an Uptime Monitor
Uptime monitors continuously check external endpoints to ensure they’re responding correctly.
Step 1: Create the Monitor
Via Web Interface:
- Navigate to Your Project
  - Select “Add Monitor” → “Uptime Monitor”
- Basic Configuration
    Monitor Name: "API Health Check"
    Description: "Monitors the main API endpoint"
    URL: https://api.example.com/health
- Check Configuration
    HTTP Method: GET
    Check Frequency: Every 1 minute
    Timeout: 10 seconds
- Assertions (Success Criteria)
    Status Code: 200
    Response Time: Less than 2000ms
    Response Body Contains: "status": "healthy"
- Optional Settings
    Custom Headers:
    - Authorization: Bearer your-api-token
    - User-Agent: 9n9s-Monitor/1.0
    Tags:
    - service: api
    - environment: production
    - criticality: high
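As a rough illustration of how the three success criteria above combine, the sketch below evaluates them against a recorded check result. The `CheckResult` structure and the `evaluate` function are hypothetical names invented for this example, and treating "Response Body Contains" as a literal substring match is an assumption about how 9n9s applies it.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CheckResult:
    """What a single HTTP check observed (hypothetical structure)."""
    status_code: int
    response_time_ms: float
    body: str


def evaluate(result: CheckResult) -> Dict[str, bool]:
    """Apply the three success criteria from the example configuration."""
    return {
        "status_code": result.status_code == 200,
        "response_time": result.response_time_ms < 2000,
        "body_contains": '"status": "healthy"' in result.body,
    }


if __name__ == "__main__":
    ok = CheckResult(200, 340.0, '{"status": "healthy"}')
    slow = CheckResult(200, 2500.0, '{"status": "healthy"}')
    print(all(evaluate(ok).values()))   # True
    print(evaluate(slow))               # response_time is False
```

A literal substring match is sensitive to whitespace: a body of `{"status":"healthy"}` (no space after the colon) would fail the check as written here. If the endpoint's JSON formatting can vary, an assertion on the parsed field is likely to be more robust than one on the raw text.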
Via CLI:
    # Create a basic uptime monitor
    9n9s-cli uptime create \
      --name "API Health Check" \
      --url "https://api.example.com/health" \
      --frequency 60 \
      --timeout 10 \
      --project-id "proj_your_project_id"

    # Create with assertions and headers
    9n9s-cli uptime create \
      --name "API Health Check" \
      --url "https://api.example.com/health" \
      --frequency 60 \
      --headers "Authorization:Bearer token123" \
      --assert-status 200 \
      --assert-response-time 2000 \
      --assert-content '"status":"healthy"' \
      --tags "service:api,environment:production"
Step 2: Configure Advanced Assertions
For more complex validation, you can configure multiple assertions:
Status Code Validation:
    assertions:
      - type: STATUS_CODE
        operator: EQUALS
        value: 200
Response Time Validation:
    assertions:
      - type: RESPONSE_TIME
        operator: LESS_THAN
        value: 2000 # milliseconds
Content Validation:
    assertions:
      - type: RESPONSE_BODY
        operator: CONTAINS
        value: '"status":"healthy"'

      - type: JSON_CONTENT
        path: "$.status"
        operator: EQUALS
        value: "healthy"
Header Validation:
    assertions:
      - type: RESPONSE_HEADER
        header: "Content-Type"
        operator: CONTAINS
        value: "application/json"
Step 3: Test Your Monitor
Manual Check:
    # Trigger a manual check
    9n9s-cli uptime check mon_xyz456abc

    # View check results
    9n9s-cli uptime get mon_xyz456abc --include-recent-checks

    # View check history
    9n9s-cli uptime logs mon_xyz456abc --limit 5
Verify Endpoint Manually:
    # Test the endpoint yourself
    curl -v -H "Authorization: Bearer your-token" \
      https://api.example.com/health

    # Time the response
    time curl -s https://api.example.com/health > /dev/null
Setting Up Your First Alert
Once your monitor is created, you’ll want to be notified when issues occur.
Step 1: Configure Notification Channel
Email Notification:
    # Add email notification channel
    9n9s-cli notification-channel create \
      --type email \
      --name "Operations Team" \

Slack Notification:
    # Add Slack webhook notification
    9n9s-cli notification-channel create \
      --type slack \
      --name "Alerts Channel" \
      --config '{"webhook_url": "https://hooks.slack.com/services/..."}'
Step 2: Create Alert Rule
Basic Alert Rule:
    # Create alert rule for your monitor
    9n9s-cli alert-rule create \
      --name "Database Backup Alerts" \
      --project-id "proj_your_project_id" \
      --conditions "monitor_ids:mon_abc123xyz,status_changes:DOWN" \
      --actions "channel_id:chan_email123"

    # Alert rule with multiple conditions
    9n9s-cli alert-rule create \
      --name "Critical Service Alerts" \
      --conditions "tags:criticality=high,status_changes:DOWN,DEGRADED" \
      --actions "channel_id:chan_slack456,delay:0"
Advanced Alert Rule Configuration:
    alert_rules:
      - name: "Production Service Alerts"
        conditions:
          monitor_tags:
            environment: production
            criticality: [high, critical]
          status_changes: [UP_TO_DOWN, UP_TO_DEGRADED]
          duration_threshold: 300 # 5 minutes
        actions:
          - channel_id: "chan_pagerduty"
            immediate: true
          - channel_id: "chan_slack"
            delay: 0
          - channel_id: "chan_email"
            delay: 300 # 5 minutes
Step 3: Test Alerting
Simulate Monitor Failure:
For heartbeat monitors:
    # Don't send a pulse within the grace period, or send a failure pulse
    curl -fsS -X POST \
      -H "Content-Type: application/json" \
      -d '{"error": "Test failure", "test": true}' \
      "https://pulse.9n9s.com/mon_abc123xyz/fail"
For uptime monitors:
- Temporarily change the URL to a non-existent endpoint
- Modify assertions to ensure they fail
- Use a test endpoint that returns different status codes
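One way to get a test endpoint that returns different status codes is a small throwaway HTTP server. The sketch below uses only the Python standard library; the `/status/<code>` path scheme and the names `StatusHandler`, `start_server`, and `fetch_status` are made up for this example and are not part of 9n9s.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class StatusHandler(BaseHTTPRequestHandler):
    """Reply to GET /status/<code> with that HTTP status code."""

    def do_GET(self):
        try:
            code = int(self.path.rsplit("/", 1)[-1])
        except ValueError:
            code = 200  # any other path behaves like a healthy endpoint
        body = b'{"status": "healthy"}' if code == 200 else b'{"status": "failing"}'
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def start_server(port: int = 0) -> HTTPServer:
    """Start the server on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), StatusHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_status(url: str) -> int:
    """Return the HTTP status code for a GET request to `url`."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


if __name__ == "__main__":
    server = start_server()
    base = f"http://127.0.0.1:{server.server_address[1]}"
    print(fetch_status(f"{base}/status/200"))
    print(fetch_status(f"{base}/status/503"))
    server.shutdown()
```

This example binds to 127.0.0.1 so it can be tried locally. An uptime monitor would only be able to reach such a server if it is exposed at an address the checks can connect to, so treat this as a starting point for a staging test endpoint.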
Verify Alert Delivery:
- Check your email/Slack/other notification channels
- Verify the alert contains relevant information
- Confirm recovery notifications work when the monitor returns to “Up”
Monitor Dashboard and Insights
Viewing Monitor Status
Web Dashboard:
- Navigate to your project dashboard
- View real-time monitor status
- Check recent pulse/check history
- Review performance trends
CLI Commands:
    # List all monitors in project
    9n9s-cli monitors list --project-id "proj_your_project_id"

    # Get detailed monitor information
    9n9s-cli heartbeat get mon_abc123xyz
    9n9s-cli uptime get mon_xyz456abc

    # View recent activity
    9n9s-cli heartbeat logs mon_abc123xyz --limit 20
    9n9s-cli uptime logs mon_xyz456abc --limit 20
Understanding Monitor Metrics
Heartbeat Monitor Metrics:
- Last Pulse: When the last pulse was received
- Status: Current monitor state (Up, Down, Late, etc.)
- Runtime: How long the last job took to complete
- Success Rate: Percentage of successful pulses
- Average Runtime: Typical job execution time
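To show what the Success Rate and Average Runtime figures mean, here is a minimal sketch that derives both from a list of past runs. The `Run` structure is hypothetical, and the exact formulas 9n9s uses (for example, over which time window) are not stated in this guide, so treat the arithmetic as an assumption.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class Run:
    """One job execution as reconstructed from its pulses (hypothetical)."""
    succeeded: bool
    runtime_seconds: Optional[float]  # None when no start pulse was sent


def success_rate(runs: List[Run]) -> float:
    """Percentage of runs that ended with a success pulse."""
    if not runs:
        return 0.0
    return 100.0 * sum(r.succeeded for r in runs) / len(runs)


def average_runtime(runs: List[Run]) -> Optional[float]:
    """Mean runtime over the runs where a runtime is known."""
    known = [r.runtime_seconds for r in runs if r.runtime_seconds is not None]
    return mean(known) if known else None


if __name__ == "__main__":
    runs = [Run(True, 300.0), Run(True, 360.0), Run(False, 120.0), Run(True, None)]
    print(success_rate(runs))     # 75.0
    print(average_runtime(runs))  # 260.0
```

Runtime can only be measured when the job sends a start pulse before its final pulse, which is why runs without one are left out of the average in this sketch.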
Uptime Monitor Metrics:
- Response Time: How quickly the endpoint responds
- Uptime Percentage: Availability over time
- Check Frequency: How often checks are performed
- Assertion Results: Which validations pass/fail
Troubleshooting Common Issues
Heartbeat Monitor Issues
“Monitor shows as Down but job is running”:
- Check if pulse URL is correct
- Verify network connectivity from job server
- Ensure job is actually sending pulses
- Check grace period settings
“Pulses not being received”:
    # Test pulse endpoint manually
    curl -v "https://pulse.9n9s.com/mon_abc123xyz"

    # Check for network issues
    nslookup pulse.9n9s.com
    curl -v https://pulse.9n9s.com/health
“Job running too long/short warnings”:
- Review expected runtime settings
- Analyze actual job performance
- Adjust expected runtime bounds if needed
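The steps above can be sketched in code: classify a runtime against the expected bounds, then derive candidate bounds from recent runs. Both function names are hypothetical, the defaults mirror the 5 and 60 minute bounds used earlier in this guide, and the 20% safety margin is an arbitrary choice for illustration.

```python
from typing import List, Tuple


def classify_runtime(runtime_seconds: float, minimum: float = 300, maximum: float = 3600) -> str:
    """Compare one runtime against the expected bounds (in seconds)."""
    if runtime_seconds < minimum:
        return "too_short"
    if runtime_seconds > maximum:
        return "too_long"
    return "ok"


def suggest_bounds(runtimes: List[float], margin: float = 0.2) -> Tuple[int, int]:
    """Propose (min, max) bounds from observed runtimes plus a safety margin."""
    if not runtimes:
        raise ValueError("need at least one observed runtime")
    low = round(min(runtimes) * (1 - margin))
    high = round(max(runtimes) * (1 + margin))
    return low, high


if __name__ == "__main__":
    print(classify_runtime(120))    # too_short
    print(classify_runtime(900))    # ok
    print(classify_runtime(4000))   # too_long
    print(suggest_bounds([280, 310, 295, 330]))  # (224, 396)
```

A job that finishes far faster than usual is often a sign that it did nothing, for example a backup of an empty database, so a "too short" warning is worth investigating before the bounds are loosened.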
Uptime Monitor Issues
“False positive failures”:
- Review assertion configuration
- Check if endpoint behavior changed
- Verify timeout settings are appropriate
- Test endpoint manually with same parameters
“SSL/TLS certificate errors”:
    # Check certificate validity
    openssl s_client -connect api.example.com:443 -servername api.example.com

    # Check certificate expiration
    echo | openssl s_client -connect api.example.com:443 2>/dev/null | openssl x509 -noout -dates
General Issues
“No alerts received”:
- Verify notification channels are configured correctly
- Check alert rule conditions match your monitor
- Test notification channels independently
- Review alert rule logs
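When checking whether a rule's conditions match a monitor, it can help to reason through the match by hand. The sketch below models a rule as a plain dictionary with the condition names used in the CLI examples earlier (`monitor_ids`, `tags`, `status_changes`). The dictionary layout, the `rule_matches` function, and the assumption that every listed condition must hold are all illustrative and not confirmed 9n9s semantics.

```python
from typing import Dict, List


def rule_matches(rule: Dict, monitor_id: str, tags: Dict[str, str], new_status: str) -> bool:
    """Return True when every condition present in `rule` is satisfied."""
    monitor_ids: List[str] = rule.get("monitor_ids", [])
    if monitor_ids and monitor_id not in monitor_ids:
        return False

    # Every tag named in the rule must be present on the monitor with the same value.
    for key, wanted in rule.get("tags", {}).items():
        if tags.get(key) != wanted:
            return False

    statuses: List[str] = rule.get("status_changes", [])
    if statuses and new_status not in statuses:
        return False

    return True


if __name__ == "__main__":
    rule = {"tags": {"criticality": "high"}, "status_changes": ["DOWN", "DEGRADED"]}
    backup_tags = {"environment": "production", "service": "database", "criticality": "high"}
    print(rule_matches(rule, "mon_abc123xyz", backup_tags, "DOWN"))             # True
    print(rule_matches(rule, "mon_abc123xyz", {"criticality": "low"}, "DOWN"))  # False
```

Under this model, a monitor created without the `criticality` tag would never match a tag-based rule, which is a common reason for an alert that silently never fires.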
“Too many alerts”:
- Adjust grace periods and thresholds
- Use alert grouping and deduplication
- Consider rate limiting for notifications
- Review monitor sensitivity settings
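As an illustration of deduplication and rate limiting, the sketch below suppresses repeat alerts for the same monitor and status inside a cooldown window. It is a generic pattern, for instance for a custom webhook receiver, and the `AlertThrottle` class is hypothetical and does not describe how 9n9s groups alerts internally.

```python
from typing import Dict, Tuple


class AlertThrottle:
    """Suppress repeat alerts for the same monitor and status within a cooldown."""

    def __init__(self, cooldown_seconds: float) -> None:
        self.cooldown_seconds = cooldown_seconds
        self._last_sent: Dict[Tuple[str, str], float] = {}

    def should_send(self, monitor_id: str, status: str, now: float) -> bool:
        """Return True if an alert for this monitor and status should go out."""
        key = (monitor_id, status)
        last = self._last_sent.get(key)
        if last is not None and now - last < self.cooldown_seconds:
            return False  # duplicate inside the cooldown window
        self._last_sent[key] = now
        return True


if __name__ == "__main__":
    throttle = AlertThrottle(cooldown_seconds=300)
    print(throttle.should_send("mon_abc123xyz", "DOWN", now=0))    # True
    print(throttle.should_send("mon_abc123xyz", "DOWN", now=120))  # False
    print(throttle.should_send("mon_abc123xyz", "DOWN", now=400))  # True
```

Keying on both monitor and status means a recovery notification is never held back by an earlier "Down" alert for the same monitor.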
Next Steps
Congratulations! You’ve successfully created your first monitor. Here’s what to do next:
Expand Your Monitoring
- Add More Monitors: Create monitors for other critical services
- Organize with Tags: Use consistent tagging for easy management
- Create Projects: Separate monitors by environment or service
- Monitor Dependencies: Add monitors for services your application depends on
Improve Your Setup
- Customize Alerts: Fine-tune alert rules and notification channels
- Add Team Members: Invite your team and configure appropriate permissions
- Set Up Status Pages: Create public status pages for transparency
- Use Infrastructure as Code: Manage monitors with Terraform or YAML
Learn Advanced Features
- Payload Analysis: Learn to extract metrics from pulse payloads
- Webhook Integration: Set up custom webhooks for automated responses
- API Usage: Automate monitor management with the REST API
- Advanced Scheduling: Use complex cron expressions and timezones
Resources
- Monitor Types Guide: Detailed comparison of monitor types
- Alerting Guide: Advanced alerting configurations
- Team Setup: Adding team members and permissions
- API Documentation: Complete API reference
You’re now ready to start monitoring your critical services with confidence. Good monitoring is iterative: start simple, then add more sophisticated checks as you learn what works best for your systems.