Heartbeat Monitors
Heartbeat monitors track processes that can actively signal their status to 9n9s. They work like a “dead man’s switch”: your systems send regular “pulses” to indicate they’re running correctly, and 9n9s alerts you if the pulses stop arriving.
How Heartbeat Monitors Work
- Create a Monitor: Configure when pulses are expected (schedule + grace period)
- Send Pulses: Your process sends HTTP requests to unique pulse endpoints (see the sketch after this list)
- Monitor Status: 9n9s tracks timing and status of these signals
- Get Alerts: Receive notifications if pulses are late or indicate failure
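In practice, sending a pulse is a single HTTP request. Here is a minimal Python sketch of the success/failure flow using the `requests` library (the monitor UUID is a placeholder, and `run_job` stands in for your own work):

```python
import requests

# Placeholder pulse URL; each monitor gets its own unique endpoint.
PULSE_URL = "https://pulse.9n9s.com/your-monitor-uuid"

def run_job():
    ...  # your actual work goes here

try:
    run_job()
    # A plain request to the pulse URL signals success.
    requests.get(PULSE_URL, timeout=10)
except Exception as exc:
    # Appending /fail signals failure; the body carries error context.
    requests.post(f"{PULSE_URL}/fail", data=str(exc), timeout=10)
    raise
```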
Perfect For
- Scheduled Tasks: Cron jobs, Windows Scheduled Tasks, Kubernetes CronJobs
- Background Workers: Queue processors, message consumers, data pipelines
- Serverless Functions: AWS Lambda, Google Cloud Functions, Azure Functions
- Data Processing: ETL jobs, backup scripts, batch processes
- CI/CD Pipelines: Build processes, deployment scripts, automated testing
Key Benefits
Rich Context Capture
- Send logs, metrics, and error details with each pulse (see the sketch after this list)
- Capture up to 1MB of payload data per pulse
- Automatic content indexing for searchability
- JSON payload parsing and metrics extraction
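For example, a job can attach its own metrics as a JSON body. A sketch with the `requests` library (field names are illustrative and the URL is a placeholder):

```python
import requests

PULSE_URL = "https://pulse.9n9s.com/your-monitor-uuid"  # placeholder

# JSON payloads are parsed for metrics and indexed for search;
# keep each pulse body under the 1 MB limit.
payload = {
    "records_processed": 1000,
    "duration_ms": 5432,
    "errors": 0,
}
requests.post(PULSE_URL, json=payload, timeout=10)
```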
Flexible Scheduling
- Simple intervals: `every 5 minutes`, `hourly`, `daily`
- Cron expressions: `0 2 * * *`, `*/15 * * * *`
- Timezone support for accurate scheduling
- Grace periods to handle natural variance (see the sketch after this list)
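To make the schedule-plus-grace semantics concrete, here is a sketch of the lateness logic using the third-party `croniter` package (an illustration only, not how 9n9s is implemented):

```python
from datetime import datetime, timedelta, timezone

from croniter import croniter  # pip install croniter

schedule = "0 2 * * *"          # pulse expected daily at 02:00 UTC
grace = timedelta(minutes=30)   # tolerate 30 minutes of natural variance

now = datetime.now(timezone.utc)
expected = croniter(schedule, now).get_prev(datetime)  # last expected pulse
deadline = expected + grace

# Between `expected` and `deadline` the monitor is Late;
# past `deadline` with no pulse it is Down.
print(f"Expected by {expected}, Down after {deadline}")
```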
Runtime Tracking
- Track execution time with start/completion pulses
- Set expected runtime bounds for performance monitoring
- Detect jobs that run too fast or too slow (see the sketch after this list)
- Historical runtime trend analysis
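The sketch below illustrates how expected runtime bounds map to the Degraded state (see the states table below). The bounds are hypothetical, and the check is a simplified model of the behavior, not 9n9s internals:

```python
from datetime import timedelta

# Hypothetical expected bounds for a nightly backup job.
MIN_RUNTIME = timedelta(minutes=5)
MAX_RUNTIME = timedelta(minutes=45)

def classify(runtime: timedelta) -> str:
    # A completed run outside the expected bounds is flagged as Degraded
    # even though a completion pulse arrived.
    if runtime < MIN_RUNTIME or runtime > MAX_RUNTIME:
        return "Degraded"
    return "Up"

print(classify(timedelta(minutes=2)))   # too fast  -> Degraded
print(classify(timedelta(minutes=30)))  # in bounds -> Up
```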
Security & Reliability
- Secret pulse URLs (no API keys needed)
- High availability pulse ingestion (99.99% uptime SLA)
- Automatic retry handling in SDKs
- Rate limiting protection
Monitor States
| State | Description |
|---|---|
| Up | Receiving pulses on schedule |
| Down | Missing expected pulses beyond grace period |
| Late | Pulse overdue but within grace period |
| Started | Job signaled start, awaiting completion |
| Degraded | Pulse received but runtime outside expected bounds |
| Paused | Monitoring temporarily disabled |
Getting Started
Quick Setup
- Create Monitor: Use the web interface to create your first heartbeat monitor
- Configure Schedule: Set when pulses are expected (e.g., `daily` or `0 2 * * *`)
- Set Grace Period: Allow buffer time for natural variance (e.g., `30m`)
- Get Pulse URL: Copy the unique pulse endpoint for your monitor
Basic Integration
Simple Pulse (Shell Script):

```bash
#!/bin/bash
# Run your job
echo "Processing data..."
python3 /path/to/your/script.py

# Signal success
curl -fsS https://pulse.9n9s.com/your-monitor-uuid
```

With Error Handling:

```bash
#!/bin/bash
set -e

PULSE_URL="https://pulse.9n9s.com/your-monitor-uuid"

# Trap errors
trap 'curl -fsS -X POST -d "Job failed at line $LINENO" "$PULSE_URL/fail"' ERR

# Run job
echo "Starting backup..."
/usr/local/bin/backup-database.sh

# Signal success
curl -fsS -X POST -d "Backup completed successfully" "$PULSE_URL"
```

SDK Integration
Python:

```python
from nines import Nines

nines = Nines("your-monitor-uuid")

@nines.time
def daily_backup():
    # Your backup logic here
    run_backup()

daily_backup()
```

Node.js:

```javascript
import { Nines } from "@9n9s/sdk";

const nines = new Nines("your-monitor-uuid");

await nines.time(async () => {
  // Your job logic here
  await processData();
});
```

Go:

```go
nines := nines.New("your-monitor-uuid")

err := nines.Time(func() error {
    // Your job logic here
    return processData()
})
```

Advanced Features
Payload Data Capture
Send rich context with your pulses:

```bash
# Send JSON payload
curl -fsS -X POST \
  -H "Content-Type: application/json" \
  -d '{"records_processed": 1000, "duration_ms": 5432, "errors": 0}' \
  https://pulse.9n9s.com/your-monitor-uuid
```

Runtime Tracking
Track job execution time:

```bash
# Signal job start
curl -fsS https://pulse.9n9s.com/your-monitor-uuid/start

# Run your job
run_long_process.sh

# Signal completion (9n9s calculates runtime)
curl -fsS https://pulse.9n9s.com/your-monitor-uuid
```

Keywords and Auto-Detection
Configure keyword scanning for automatic status detection (see the sketch after this list):
- Success Keywords: `["completed successfully", "finished", "done"]`
- Failure Keywords: `["error", "failed", "exception", "timeout"]`
Metrics Extraction
Extract time-series metrics from JSON payloads using JSONPath expressions:

```json
{
  "records_processed": 1000,
  "processing_time_ms": 5432,
  "memory_usage_mb": 256,
  "error_count": 0
}
```

Configure metrics (see the sketch after this list):
- `$.records_processed` → Track volume over time
- `$.processing_time_ms` → Monitor performance trends
- `$.error_count` → Alert on error spikes
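To see what those expressions select, here is a sketch using the third-party `jsonpath-ng` package (illustration only; 9n9s evaluates the expressions server-side):

```python
import json

from jsonpath_ng import parse  # pip install jsonpath-ng

payload = json.loads("""
{
  "records_processed": 1000,
  "processing_time_ms": 5432,
  "memory_usage_mb": 256,
  "error_count": 0
}
""")

# Each configured expression pulls one numeric value per pulse,
# which is recorded as a point in a time series.
for expr in ("$.records_processed", "$.processing_time_ms", "$.error_count"):
    match = parse(expr).find(payload)[0]
    print(expr, "->", match.value)
```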
Best Practices
Scheduling
- Use realistic grace periods (account for normal variance)
- Consider system load and resource availability
- Set appropriate expected runtime bounds
- Use tags for organization and alert routing
Error Handling
- Always include error handling in your scripts (see the sketch after this list)
- Send meaningful error messages with failure pulses
- Include relevant context (exit codes, stack traces)
- Use timeouts for external dependencies
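A minimal Python sketch combining these points (the pulse URL is a placeholder and `sync_external_api` is a hypothetical job step):

```python
import traceback

import requests

PULSE_URL = "https://pulse.9n9s.com/your-monitor-uuid"  # placeholder

def sync_external_api():
    # Hypothetical job step; always bound external calls with a timeout.
    requests.get("https://api.example.com/data", timeout=30)

try:
    sync_external_api()
    requests.get(PULSE_URL, timeout=10)
except Exception:
    # Send the full stack trace as context with the failure pulse.
    requests.post(f"{PULSE_URL}/fail", data=traceback.format_exc(), timeout=10)
    raise
```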
Performance
- Minimize pulse payload sizes where possible
- Use async/background pulse sending to avoid blocking
- Implement retry logic with exponential backoff (see the sketch after this list)
- Batch pulses for high-frequency operations when appropriate
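For the retry point in particular, here is a sketch of exponential backoff around pulse delivery (the parameters are illustrative):

```python
import time

import requests

def send_pulse_with_retry(url, attempts=4, base_delay=1.0):
    # Retry transient network failures with exponential backoff: 1s, 2s, 4s...
    for attempt in range(attempts):
        try:
            requests.get(url, timeout=10)
            return True
        except requests.RequestException:
            if attempt == attempts - 1:
                # Give up quietly; monitoring should never crash the job itself.
                return False
            time.sleep(base_delay * 2 ** attempt)

send_pulse_with_retry("https://pulse.9n9s.com/your-monitor-uuid")  # placeholder
```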
Security
- Keep pulse URLs confidential (they’re secret tokens; see the sketch after this list)
- Use HTTPS for all pulse requests
- Rotate monitor UUIDs if compromised
- Don’t log pulse URLs in plain text
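One common way to keep the URL out of source control and logs is to inject it through the environment, as in this sketch (the `PULSE_URL` variable name is a convention, not a 9n9s requirement; the Kubernetes example below uses the same pattern with a Secret):

```python
import os

import requests

# Read the secret pulse URL from the environment (or a secrets manager)
# instead of hard-coding it, and avoid printing it anywhere.
pulse_url = os.environ["PULSE_URL"]
requests.get(pulse_url, timeout=10)
```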
Common Patterns
ETL Pipeline Monitoring
```python
import requests
from datetime import datetime

def monitor_etl_stage(stage_name, monitor_uuid):
    pulse_url = f"https://pulse.9n9s.com/{monitor_uuid}"

    try:
        # Signal stage start
        requests.get(f"{pulse_url}/start")

        # Run ETL stage
        result = run_etl_stage(stage_name)

        # Signal success with metrics
        payload = {
            "stage": stage_name,
            "records_processed": result.count,
            "duration_seconds": result.duration,
            "timestamp": datetime.utcnow().isoformat(),
        }
        requests.post(pulse_url, json=payload)

    except Exception as e:
        # Signal failure
        error_payload = {
            "stage": stage_name,
            "error": str(e),
            "timestamp": datetime.utcnow().isoformat(),
        }
        requests.post(f"{pulse_url}/fail", json=error_payload)
        raise
```

Kubernetes CronJob
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: backup-job
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: backup
              image: backup-image:latest
              env:
                - name: PULSE_URL
                  valueFrom:
                    secretKeyRef:
                      name: monitoring-secrets
                      key: backup-pulse-url
              command:
                - /bin/sh
                - -c
                - |
                  # Signal start
                  curl -fsS "${PULSE_URL}/start"

                  # Run backup
                  if /usr/local/bin/backup.sh; then
                    curl -fsS -X POST -d "Backup completed" "${PULSE_URL}"
                  else
                    curl -fsS -X POST -d "Backup failed" "${PULSE_URL}/fail"
                    exit 1
                  fi
          restartPolicy: OnFailure
```

Next Steps
- Create your first heartbeat monitor
- Set up alerts for when monitors fail
- Use SDKs for easier integration
- Configure advanced features like payload scanning
- See real-world examples of heartbeat monitoring