Heartbeat Monitors
Heartbeat monitors track processes that can actively signal their status to 9n9s. They work like a “dead man’s switch”: your systems send regular “pulses” to indicate they’re running correctly, and 9n9s alerts you if the pulses stop arriving.
How Heartbeat Monitors Work
- Create a Monitor: Configure when pulses are expected (schedule + grace period)
- Send Pulses: Your process sends HTTP requests to unique pulse endpoints (see the sketch after this list)
- Monitor Status: 9n9s tracks timing and status of these signals
- Get Alerts: Receive notifications if pulses are late or indicate failure
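In practice, sending a pulse is a single HTTP request. Here is a minimal Python sketch of the success/failure flow using the `requests` library (the monitor UUID is a placeholder, and `run_job` stands in for your own work):

```python
import requests

# Placeholder pulse URL; each monitor gets its own unique endpoint.
PULSE_URL = "https://pulse.9n9s.com/your-monitor-uuid"

def run_job():
    ...  # your actual work goes here

try:
    run_job()
    # A plain request to the pulse URL signals success.
    requests.get(PULSE_URL, timeout=10)
except Exception as exc:
    # Appending /fail signals failure; the body carries error context.
    requests.post(f"{PULSE_URL}/fail", data=str(exc), timeout=10)
    raise
```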
Perfect For
- Scheduled Tasks: Cron jobs, Windows Scheduled Tasks, Kubernetes CronJobs
- Background Workers: Queue processors, message consumers, data pipelines
- Serverless Functions: AWS Lambda, Google Cloud Functions, Azure Functions
- Data Processing: ETL jobs, backup scripts, batch processes
- CI/CD Pipelines: Build processes, deployment scripts, automated testing
Key Benefits
Rich Context Capture
- Send logs, metrics, and error details with each pulse (see the sketch after this list)
- Capture up to 1MB of payload data per pulse
- Automatic content indexing for searchability
- JSON payload parsing and metrics extraction
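For example, a job can attach its own metrics as a JSON body. A sketch with the `requests` library (field names are illustrative and the URL is a placeholder):

```python
import requests

PULSE_URL = "https://pulse.9n9s.com/your-monitor-uuid"  # placeholder

# JSON payloads are parsed for metrics and indexed for search;
# keep each pulse body under the 1 MB limit.
payload = {
    "records_processed": 1000,
    "duration_ms": 5432,
    "errors": 0,
}
requests.post(PULSE_URL, json=payload, timeout=10)
```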
Flexible Scheduling
- Simple intervals: `every 5 minutes`, `hourly`, `daily`
- Cron expressions: `0 2 * * *`, `*/15 * * * *`
- Timezone support for accurate scheduling
- Grace periods to handle natural variance (see the sketch after this list)
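To make the schedule-plus-grace semantics concrete, here is a sketch of the lateness logic using the third-party `croniter` package (an illustration only, not how 9n9s is implemented):

```python
from datetime import datetime, timedelta, timezone

from croniter import croniter  # pip install croniter

schedule = "0 2 * * *"          # pulse expected daily at 02:00 UTC
grace = timedelta(minutes=30)   # tolerate 30 minutes of natural variance

now = datetime.now(timezone.utc)
expected = croniter(schedule, now).get_prev(datetime)  # last expected pulse
deadline = expected + grace

# Between `expected` and `deadline` the monitor is Late;
# past `deadline` with no pulse it is Down.
print(f"Expected by {expected}, Down after {deadline}")
```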
Runtime Tracking
- Track execution time with start/completion pulses
- Set expected runtime bounds for performance monitoring
- Detect jobs that run too fast or too slow (see the sketch after this list)
- Historical runtime trend analysis
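The sketch below illustrates how expected runtime bounds map to the Degraded state (see the states table below). The bounds are hypothetical, and the check is a simplified model of the behavior, not 9n9s internals:

```python
from datetime import timedelta

# Hypothetical expected bounds for a nightly backup job.
MIN_RUNTIME = timedelta(minutes=5)
MAX_RUNTIME = timedelta(minutes=45)

def classify(runtime: timedelta) -> str:
    # A completed run outside the expected bounds is flagged as Degraded
    # even though a completion pulse arrived.
    if runtime < MIN_RUNTIME or runtime > MAX_RUNTIME:
        return "Degraded"
    return "Up"

print(classify(timedelta(minutes=2)))   # too fast  -> Degraded
print(classify(timedelta(minutes=30)))  # in bounds -> Up
```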
Security & Reliability
- Secret pulse URLs (no API keys needed)
- High availability pulse ingestion (99.99% uptime SLA)
- Automatic retry handling in SDKs
- Rate limiting protection
Monitor States
| State | Description |
|---|---|
| Up | Receiving pulses on schedule |
| Down | Missing expected pulses beyond grace period |
| Late | Pulse overdue but within grace period |
| Started | Job signaled start, awaiting completion |
| Degraded | Pulse received but runtime outside expected bounds |
| Paused | Monitoring temporarily disabled |
Getting Started
Quick Setup
- Create Monitor: Use the web interface to create your first heartbeat monitor
- Configure Schedule: Set when pulses are expected (e.g., `daily` or `0 2 * * *`)
- Set Grace Period: Allow buffer time for natural variance (e.g., `30m`)
- Get Pulse URL: Copy the unique pulse endpoint for your monitor
Basic Integration
Simple Pulse (Shell Script):

```bash
#!/bin/bash
# Run your job
echo "Processing data..."
python3 /path/to/your/script.py

# Signal success
curl -fsS https://pulse.9n9s.com/your-monitor-uuid
```

With Error Handling:

```bash
#!/bin/bash
set -e

PULSE_URL="https://pulse.9n9s.com/your-monitor-uuid"

# Trap errors
trap 'curl -fsS -X POST -d "Job failed at line $LINENO" "$PULSE_URL/fail"' ERR

# Run job
echo "Starting backup..."
/usr/local/bin/backup-database.sh

# Signal success
curl -fsS -X POST -d "Backup completed successfully" "$PULSE_URL"
```

SDK Integration
Python:

```python
from nines import Nines

nines = Nines("your-monitor-uuid")

@nines.time
def daily_backup():
    # Your backup logic here
    run_backup()

daily_backup()
```

Node.js:

```javascript
import { Nines } from "@9n9s/sdk";

const nines = new Nines("your-monitor-uuid");

await nines.time(async () => {
  // Your job logic here
  await processData();
});
```

Go:

```go
nines := nines.New("your-monitor-uuid")

err := nines.Time(func() error {
    // Your job logic here
    return processData()
})
```

Advanced Features
Payload Data Capture
Send rich context with your pulses:

```bash
# Send JSON payload
curl -fsS -X POST \
  -H "Content-Type: application/json" \
  -d '{"records_processed": 1000, "duration_ms": 5432, "errors": 0}' \
  https://pulse.9n9s.com/your-monitor-uuid
```

Runtime Tracking
Track job execution time:

```bash
# Signal job start
curl -fsS https://pulse.9n9s.com/your-monitor-uuid/start

# Run your job
run_long_process.sh

# Signal completion (9n9s calculates runtime)
curl -fsS https://pulse.9n9s.com/your-monitor-uuid
```

Keywords and Auto-Detection
Configure keyword scanning for automatic status detection (see the sketch after this list):
- Success Keywords: `["completed successfully", "finished", "done"]`
- Failure Keywords: `["error", "failed", "exception", "timeout"]`
Metrics Extraction
Extract time-series metrics from JSON payloads using JSONPath expressions:

```json
{
  "records_processed": 1000,
  "processing_time_ms": 5432,
  "memory_usage_mb": 256,
  "error_count": 0
}
```

Configure metrics (see the sketch after this list):
- `$.records_processed` → Track volume over time
- `$.processing_time_ms` → Monitor performance trends
- `$.error_count` → Alert on error spikes
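To see what those expressions select, here is a sketch using the third-party `jsonpath-ng` package (illustration only; 9n9s evaluates the expressions server-side):

```python
import json

from jsonpath_ng import parse  # pip install jsonpath-ng

payload = json.loads("""
{
  "records_processed": 1000,
  "processing_time_ms": 5432,
  "memory_usage_mb": 256,
  "error_count": 0
}
""")

# Each configured expression pulls one numeric value per pulse,
# which is recorded as a point in a time series.
for expr in ("$.records_processed", "$.processing_time_ms", "$.error_count"):
    match = parse(expr).find(payload)[0]
    print(expr, "->", match.value)
```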
Best Practices
Scheduling
- Use realistic grace periods (account for normal variance)
- Consider system load and resource availability
- Set appropriate expected runtime bounds
- Use tags for organization and alert routing
Error Handling
- Always include error handling in your scripts (see the sketch after this list)
- Send meaningful error messages with failure pulses
- Include relevant context (exit codes, stack traces)
- Use timeouts for external dependencies
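A minimal Python sketch combining these points (the pulse URL is a placeholder and `sync_external_api` is a hypothetical job step):

```python
import traceback

import requests

PULSE_URL = "https://pulse.9n9s.com/your-monitor-uuid"  # placeholder

def sync_external_api():
    # Hypothetical job step; always bound external calls with a timeout.
    requests.get("https://api.example.com/data", timeout=30)

try:
    sync_external_api()
    requests.get(PULSE_URL, timeout=10)
except Exception:
    # Send the full stack trace as context with the failure pulse.
    requests.post(f"{PULSE_URL}/fail", data=traceback.format_exc(), timeout=10)
    raise
```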
Performance
- Minimize pulse payload sizes where possible
- Use async/background pulse sending to avoid blocking
- Implement retry logic with exponential backoff (see the sketch after this list)
- Batch pulses for high-frequency operations when appropriate
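For the retry point in particular, here is a sketch of exponential backoff around pulse delivery (the parameters are illustrative):

```python
import time

import requests

def send_pulse_with_retry(url, attempts=4, base_delay=1.0):
    # Retry transient network failures with exponential backoff: 1s, 2s, 4s...
    for attempt in range(attempts):
        try:
            requests.get(url, timeout=10)
            return True
        except requests.RequestException:
            if attempt == attempts - 1:
                # Give up quietly; monitoring should never crash the job itself.
                return False
            time.sleep(base_delay * 2 ** attempt)

send_pulse_with_retry("https://pulse.9n9s.com/your-monitor-uuid")  # placeholder
```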
Security
- Keep pulse URLs confidential (they’re secret tokens; see the sketch after this list)
- Use HTTPS for all pulse requests
- Rotate monitor UUIDs if compromised
- Don’t log pulse URLs in plain text
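One common way to keep the URL out of source control and logs is to inject it through the environment, as in this sketch (the `PULSE_URL` variable name is a convention, not a 9n9s requirement; the Kubernetes example below uses the same pattern with a Secret):

```python
import os

import requests

# Read the secret pulse URL from the environment (or a secrets manager)
# instead of hard-coding it, and avoid printing it anywhere.
pulse_url = os.environ["PULSE_URL"]
requests.get(pulse_url, timeout=10)
```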
Common Patterns
ETL Pipeline Monitoring
```python
import requests
from datetime import datetime

def monitor_etl_stage(stage_name, monitor_uuid):
    pulse_url = f"https://pulse.9n9s.com/{monitor_uuid}"

    try:
        # Signal stage start
        requests.get(f"{pulse_url}/start")

        # Run ETL stage
        result = run_etl_stage(stage_name)

        # Signal success with metrics
        payload = {
            "stage": stage_name,
            "records_processed": result.count,
            "duration_seconds": result.duration,
            "timestamp": datetime.utcnow().isoformat(),
        }
        requests.post(pulse_url, json=payload)

    except Exception as e:
        # Signal failure
        error_payload = {
            "stage": stage_name,
            "error": str(e),
            "timestamp": datetime.utcnow().isoformat(),
        }
        requests.post(f"{pulse_url}/fail", json=error_payload)
        raise
```

Kubernetes CronJob
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: backup-job
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: backup
              image: backup-image:latest
              env:
                - name: PULSE_URL
                  valueFrom:
                    secretKeyRef:
                      name: monitoring-secrets
                      key: backup-pulse-url
              command:
                - /bin/sh
                - -c
                - |
                  # Signal start
                  curl -fsS "${PULSE_URL}/start"

                  # Run backup
                  if /usr/local/bin/backup.sh; then
                    curl -fsS -X POST -d "Backup completed" "${PULSE_URL}"
                  else
                    curl -fsS -X POST -d "Backup failed" "${PULSE_URL}/fail"
                    exit 1
                  fi
          restartPolicy: OnFailure
```

Next Steps
- Create your first heartbeat monitor
- Set up alerts for when monitors fail
- Use SDKs for easier integration
- Configure advanced features like payload scanning
- See real-world examples of heartbeat monitoring