
Heartbeat Monitors

Heartbeat monitors are designed to track processes that can actively signal their status to 9n9s. They work like a “dead man’s switch” - your systems send regular “pulses” to indicate they’re running correctly, and 9n9s alerts you if pulses stop arriving.

How It Works

  1. Create a Monitor: Configure when pulses are expected (schedule + grace period)
  2. Send Pulses: Your process sends HTTP requests to unique pulse endpoints (see the sketch below)
  3. Monitor Status: 9n9s tracks timing and status of these signals
  4. Get Alerts: Receive notifications if pulses are late or indicate failure
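
To make the flow concrete, here is a minimal sketch in Python, assuming the pulse URL format and /fail endpoint shown in the examples later on this page (do_work is a placeholder for your job):

import requests

PULSE_URL = "https://pulse.9n9s.com/your-monitor-uuid"  # secret per-monitor endpoint

def do_work():
    ...  # placeholder for the actual job

try:
    do_work()
    requests.get(PULSE_URL, timeout=10)  # success pulse
except Exception as exc:
    # Report the failure with context, then re-raise
    requests.post(f"{PULSE_URL}/fail", data=str(exc), timeout=10)
    raise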
Common Use Cases

  • Scheduled Tasks: Cron jobs, Windows Scheduled Tasks, Kubernetes CronJobs
  • Background Workers: Queue processors, message consumers, data pipelines
  • Serverless Functions: AWS Lambda, Google Cloud Functions, Azure Functions
  • Data Processing: ETL jobs, backup scripts, batch processes
  • CI/CD Pipelines: Build processes, deployment scripts, automated testing
Rich Payload Support

  • Send logs, metrics, and error details with each pulse
  • Capture up to 1MB of payload data per pulse
  • Automatic content indexing for searchability
  • JSON payload parsing and metrics extraction
Flexible Scheduling

  • Simple intervals: every 5 minutes, hourly, daily
  • Cron expressions: 0 2 * * *, */15 * * * *
  • Timezone support for accurate scheduling
  • Grace periods to handle natural variance
Runtime Tracking

  • Track execution time with start/completion pulses
  • Set expected runtime bounds for performance monitoring
  • Detect jobs that run too fast or too slow
  • Historical runtime trend analysis
Reliable Delivery

  • Secret pulse URLs - no API keys needed
  • High availability pulse ingestion (99.99% uptime SLA)
  • Automatic retry handling in SDKs
  • Rate limiting protection
Monitor States

State       Description
Up          Receiving pulses on schedule
Down        Missing expected pulses beyond grace period
Late        Pulse overdue but within grace period
Started     Job signaled start, awaiting completion
Degraded    Pulse received but runtime outside expected bounds
Paused      Monitoring temporarily disabled
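
To illustrate how the schedule and grace period drive the Up/Late/Down transitions, here is a sketch of the timing semantics in Python (an illustration, not 9n9s’s actual implementation):

from datetime import datetime, timedelta

def monitor_state(last_pulse, expected_interval, grace, now):
    # Classify a monitor by the time elapsed since its last pulse
    elapsed = now - last_pulse
    if elapsed <= expected_interval:
        return "Up"
    if elapsed <= expected_interval + grace:
        return "Late"  # overdue, but still within the grace period
    return "Down"      # grace period exhausted; alerts fire

# A daily job with a 30-minute grace period, checked 20 minutes after its slot
print(monitor_state(
    last_pulse=datetime(2024, 1, 1, 2, 0),
    expected_interval=timedelta(days=1),
    grace=timedelta(minutes=30),
    now=datetime(2024, 1, 2, 2, 20),
))  # -> "Late"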
Getting Started

  1. Create Monitor: Use the web interface to create your first heartbeat monitor
  2. Configure Schedule: Set when pulses are expected (e.g., daily or 0 2 * * *)
  3. Set Grace Period: Allow buffer time for natural variance (e.g., 30m)
  4. Get Pulse URL: Copy the unique pulse endpoint for your monitor

Simple Pulse (Shell Script):

your-job.sh
#!/bin/bash
# Run your job
echo "Processing data..."
python3 /path/to/your/script.py
# Signal success
curl -fsS https://pulse.9n9s.com/your-monitor-uuid

With Error Handling:

#!/bin/bash
set -e
PULSE_URL="https://pulse.9n9s.com/your-monitor-uuid"
# Trap errors
trap 'curl -fsS -X POST -d "Job failed at line $LINENO" "$PULSE_URL/fail"' ERR
# Run job
echo "Starting backup..."
/usr/local/bin/backup-database.sh
# Signal success
curl -fsS -X POST -d "Backup completed successfully" "$PULSE_URL"

Python:

from nines import Nines

nines = Nines("your-monitor-uuid")

@nines.time
def daily_backup():
    # Your backup logic here
    run_backup()

daily_backup()

Node.js:

import { Nines } from "@9n9s/sdk";

const nines = new Nines("your-monitor-uuid");

await nines.time(async () => {
  // Your job logic here
  await processData();
});

Go:

nines := nines.New("your-monitor-uuid")

err := nines.Time(func() error {
    // Your job logic here
    return processData()
})
if err != nil {
    log.Fatal(err)
}

Send rich context with your pulses:

# Send JSON payload
curl -fsS -X POST \
-H "Content-Type: application/json" \
-d '{"records_processed": 1000, "duration_ms": 5432, "errors": 0}' \
https://pulse.9n9s.com/your-monitor-uuid

Track job execution time:

# Signal job start
curl -fsS https://pulse.9n9s.com/your-monitor-uuid/start
# Run your job
run_long_process.sh
# Signal completion (9n9s calculates runtime)
curl -fsS https://pulse.9n9s.com/your-monitor-uuid
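
The same start/complete pattern is easy to wrap in a helper. Here is an illustrative Python sketch using the /start and /fail endpoints above (run_long_process stands in for your job):

import requests
from contextlib import contextmanager

@contextmanager
def timed_pulse(pulse_url):
    requests.get(f"{pulse_url}/start", timeout=10)  # start pulse
    try:
        yield
        requests.get(pulse_url, timeout=10)  # completion pulse; 9n9s computes runtime
    except Exception as exc:
        requests.post(f"{pulse_url}/fail", data=str(exc), timeout=10)
        raise

def run_long_process():
    ...  # placeholder for the actual work

with timed_pulse("https://pulse.9n9s.com/your-monitor-uuid"):
    run_long_process()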

Configure keyword scanning for automatic status detection:

  • Success Keywords: ["completed successfully", "finished", "done"]
  • Failure Keywords: ["error", "failed", "exception", "timeout"]
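
Since scanning runs against the pulse body, the simplest integration is to send your job’s output as the payload. A sketch, assuming keywords like those above are configured on the monitor (the backup.sh path is a placeholder):

import subprocess
import requests

PULSE_URL = "https://pulse.9n9s.com/your-monitor-uuid"

# Send the job's combined output as the pulse body so 9n9s can
# scan it for the configured success/failure keywords.
result = subprocess.run(["/usr/local/bin/backup.sh"],
                        capture_output=True, text=True)
requests.post(PULSE_URL, data=result.stdout + result.stderr, timeout=10)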

Extract time-series metrics from JSON payloads using JSONPath expressions:

{
  "records_processed": 1000,
  "processing_time_ms": 5432,
  "memory_usage_mb": 256,
  "error_count": 0
}

Configure metrics:

  • $.records_processed → Track volume over time
  • $.processing_time_ms → Monitor performance trends
  • $.error_count → Alert on error spikes
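
Each expression selects one field from the payload. The extraction itself happens server-side in 9n9s, but you can preview what an expression matches locally with the third-party jsonpath-ng package (an illustration only):

from jsonpath_ng import parse  # pip install jsonpath-ng

payload = {
    "records_processed": 1000,
    "processing_time_ms": 5432,
    "memory_usage_mb": 256,
    "error_count": 0,
}

for expr in ("$.records_processed", "$.processing_time_ms", "$.error_count"):
    print(expr, "->", parse(expr).find(payload)[0].value)
# $.records_processed -> 1000
# $.processing_time_ms -> 5432
# $.error_count -> 0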
Best Practices

Scheduling:

  • Use realistic grace periods (account for normal variance)
  • Consider system load and resource availability
  • Set appropriate expected runtime bounds
  • Use tags for organization and alert routing
Error Handling:

  • Always include error handling in your scripts
  • Send meaningful error messages with failure pulses
  • Include relevant context (exit codes, stack traces)
  • Use timeouts for external dependencies
Performance:

  • Minimize pulse payload sizes where possible
  • Use async/background pulse sending to avoid blocking (see the sketch below)
  • Implement retry logic with exponential backoff
  • Batch pulses for high-frequency operations when appropriate
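
For example, a background sender with exponential backoff might look like this (an illustrative sketch; the official SDKs ship their own retry handling):

import threading
import time
import requests

def send_pulse_async(pulse_url, payload="", retries=3):
    # Send the pulse off the hot path, retrying with exponential backoff
    def _send():
        for attempt in range(retries):
            try:
                requests.post(pulse_url, data=payload, timeout=5)
                return
            except requests.RequestException:
                time.sleep(2 ** attempt)  # 1s, 2s, 4s

    threading.Thread(target=_send, daemon=True).start()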
Security:

  • Keep pulse URLs confidential (they’re secret tokens; see the sketch below)
  • Use HTTPS for all pulse requests
  • Rotate monitor UUIDs if compromised
  • Don’t log pulse URLs in plain text
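
In practice this means treating the pulse URL like a credential: inject it from the environment or a secrets manager rather than hardcoding it. A sketch (the PULSE_URL variable name is an assumption, mirroring the CronJob example below):

import os
import requests

# Pulse URL injected via the environment (for example, from a
# Kubernetes Secret as in the CronJob example below)
pulse_url = os.environ["PULSE_URL"]
requests.get(pulse_url, timeout=10)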
ETL Pipeline Monitoring (Python):

import requests
from datetime import datetime

def monitor_etl_stage(stage_name, monitor_uuid):
    pulse_url = f"https://pulse.9n9s.com/{monitor_uuid}"
    try:
        # Signal stage start
        requests.get(f"{pulse_url}/start")
        # Run ETL stage
        result = run_etl_stage(stage_name)
        # Signal success with metrics
        payload = {
            "stage": stage_name,
            "records_processed": result.count,
            "duration_seconds": result.duration,
            "timestamp": datetime.utcnow().isoformat(),
        }
        requests.post(pulse_url, json=payload)
    except Exception as e:
        # Signal failure
        error_payload = {
            "stage": stage_name,
            "error": str(e),
            "timestamp": datetime.utcnow().isoformat(),
        }
        requests.post(f"{pulse_url}/fail", json=error_payload)
        raise
Kubernetes CronJob:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: backup-job
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: backup
              image: backup-image:latest
              env:
                - name: PULSE_URL
                  valueFrom:
                    secretKeyRef:
                      name: monitoring-secrets
                      key: backup-pulse-url
              command:
                - /bin/sh
                - -c
                - |
                  # Signal start
                  curl -fsS "${PULSE_URL}/start"
                  # Run backup
                  if /usr/local/bin/backup.sh; then
                    curl -fsS -X POST -d "Backup completed" "${PULSE_URL}"
                  else
                    curl -fsS -X POST -d "Backup failed" "${PULSE_URL}/fail"
                    exit 1
                  fi
          restartPolicy: OnFailure