
Monitor Types Overview

9n9s provides two complementary types of monitors: Heartbeat Monitors for push-based monitoring and Uptime Monitors for pull-based monitoring. This page explains the differences between them, typical use cases for each, and how to choose the right approach for your needs.

Heartbeat Monitors

Heartbeat monitors work like a “dead man’s switch” - your systems actively signal to 9n9s that they’re running correctly. If 9n9s doesn’t receive a signal within the expected timeframe, it triggers an alert.

How it works:

  1. You configure a schedule (when signals are expected)
  2. Your process sends HTTP requests to unique pulse endpoints
  3. 9n9s tracks the timing and status of these signals
  4. Alerts trigger if signals are late or indicate failure
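
As a minimal sketch, a shell wrapper around a job might send these signals like so. The pulse URL and script path are placeholders, and the /fail path is an assumption; adapt them to your monitor:

#!/bin/sh
# Placeholder pulse URL for this monitor
PULSE_URL="https://pulse.9n9s.com/your-monitor-id"

# Signal that the job has started
curl -fsS "$PULSE_URL/start"

# Run the job and report the outcome
if /usr/local/bin/run-job.sh; then
  curl -fsS "$PULSE_URL"            # success pulse
else
  curl -fsS "$PULSE_URL/fail"       # failure signal (path assumed)
  exit 1
fi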

Perfect for:

  • Scheduled tasks and cron jobs
  • Background workers and queue processors
  • ETL and data processing pipelines
  • Serverless function executions
  • Deployment and CI/CD processes
  • Any process that can make HTTP requests

Key Benefits:

  • Rich Context: Send logs, metrics, and error details with pulses
  • Runtime Tracking: Monitor execution time and performance
  • Flexible Scheduling: Support for cron expressions and intervals
  • Low Overhead: Minimal impact on your applications
  • Secure: No credentials needed, uses secret URLs

Uptime Monitors

Uptime monitors proactively check your services from 9n9s infrastructure by making requests to your endpoints and validating the responses.

How it works:

  1. You configure endpoints and assertions
  2. 9n9s regularly makes requests from our infrastructure
  3. Responses are validated against your assertions
  4. Alerts trigger if checks fail or assertions don’t pass

Perfect for:

  • Website availability monitoring
  • API health checks
  • Service endpoint monitoring
  • SSL certificate expiration tracking
  • Response time monitoring
  • Third-party service integration checks

Key Benefits:

  • External Perspective: Monitor from outside your infrastructure
  • Comprehensive Assertions: Validate status codes, content, performance
  • Multi-Region Checks: Monitor from multiple geographic locations
  • SSL Monitoring: Automatic certificate expiration tracking
  • No Code Changes: Monitor existing endpoints without modification

When to Use Heartbeat Monitors

You Control the Process

  • You can modify the code or script to send signals
  • You want detailed context about execution
  • You need to track runtime performance
  • The process runs on a schedule or regular interval

Examples:

# Cron job with heartbeat monitoring
0 2 * * * /usr/local/bin/backup.sh && curl -fsS https://pulse.9n9s.com/backup-job

# Background worker with heartbeat
def process_queue():
    nines.start()
    try:
        job_count = process_jobs()
        nines.pulse(f"Processed {job_count} jobs")
    except Exception as e:
        nines.fail(str(e))

You Want Rich Context

  • Send logs and error messages with failures
  • Track custom metrics and performance data
  • Monitor complex multi-step processes
  • Debug issues with detailed payload information

When to Use Uptime Monitors

You Want External Validation

  • Monitor services from an outside perspective
  • Validate end-to-end functionality
  • Check public-facing endpoints
  • Monitor services you don’t control

Examples:

# Website monitoring
- name: "Homepage Availability"
  url: "https://example.com"
  assertions:
    - status_code: 200
    - response_time: < 2000ms
    - content_contains: "Welcome"

You Can’t Modify the Service

  • Third-party APIs and services
  • Legacy applications without modification access
  • Vendor-provided endpoints
  • Services owned by other teams

You Need Comprehensive Validation

  • Check response status, content, and performance
  • Validate API responses and data structure
  • Monitor SSL certificate expiration
  • Test complex user flows and transactions

Many services benefit from using both monitor types together for comprehensive coverage:

For example, a typical web application might use:

Heartbeat Monitors for:

  • Background job processing
  • Database backup scripts
  • Cache warming tasks
  • Log rotation processes

Uptime Monitors for:

  • Homepage availability
  • API endpoint health
  • User authentication flow
  • Payment processing endpoints

A microservices architecture might use:

Heartbeat Monitors for:

  • Inter-service health checks
  • Message queue consumers
  • Scheduled data synchronization
  • Circuit breaker recovery

Uptime Monitors for:

  • Public API gateways
  • Load balancer health
  • Database connection pools
  • External service dependencies

A typical heartbeat monitor configuration:

heartbeat_monitor:
  name: "Daily ETL Process"
  schedule: "0 3 * * *"          # 3 AM daily
  grace_period: "2h"             # Allow 2 hours to complete
  timezone: "America/New_York"
  expected_runtime:
    min: "30m"                   # Should take at least 30 minutes
    max: "90m"                   # Should complete within 90 minutes
  tags:
    environment: production
    team: data
    criticality: high

Key Configuration Options:

  • Schedule: Cron expression or simple interval (see the interval sketch after this list)
  • Grace Period: Buffer time before marking as down
  • Expected Runtime: Min/max execution time bounds
  • Timezone: For cron schedule interpretation
  • Tags: Metadata for organization and alerting
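
For a process that runs on a regular interval rather than a fixed cron schedule, the configuration might look like the sketch below. The "every 5m" interval syntax is an assumption; the other field names mirror the example above:

heartbeat_monitor:
  name: "Queue Consumer Liveness"
  schedule: "every 5m"   # simple interval instead of a cron expression
  grace_period: "2m"     # small buffer before the monitor is marked down
  tags:
    environment: production
    team: platform
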
A typical uptime monitor configuration:

uptime_monitor:
  name: "API Health Check"
  url: "https://api.example.com/health"
  method: "GET"
  frequency: "1m"                # Check every minute
  timeout: "10s"                 # 10 second timeout
  headers:
    Authorization: "Bearer {{env.API_TOKEN}}"
  assertions:
    - type: "status_code"
      operator: "equals"
      value: 200
    - type: "response_time"
      operator: "less_than"
      value: 1000                # Less than 1 second (milliseconds)
    - type: "json_content"
      path: "$.status"
      operator: "equals"
      value: "healthy"
  tags:
    service: api
    environment: production
    criticality: high

Key Configuration Options:

  • URL & Method: Target endpoint and HTTP method
  • Frequency: How often to run checks
  • Timeout: Maximum time to wait for response
  • Headers: Custom HTTP headers (supports variables)
  • Assertions: Validation rules for success/failure

Heartbeat monitor states:

State | Description | Triggers
New | Monitor created, no pulses received | Initial state
Up | Receiving pulses on schedule | Successful pulse within schedule + grace
Down | Missing expected pulses | No pulse within schedule + grace period
Late | Pulse overdue but within grace | Pulse overdue but grace period not expired
Started | Job signaled start, awaiting completion | /start pulse received
Degraded | Pulse received but runtime abnormal | Runtime outside expected bounds
Paused | Monitoring temporarily disabled | Manual user action

Uptime monitor states:

State | Description | Triggers
New | Monitor created, no checks run | Initial state
Up | All assertions passing | Successful check with all assertions met
Down | One or more assertions failing | Failed assertion or unreachable endpoint
Paused | Checking temporarily disabled | Manual user action

Payload Analysis

  • Capture up to 1MB of logs or JSON data per pulse
  • Automatic content indexing for searchability
  • Keyword scanning for automated status determination
  • Metrics extraction from JSON payloads
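
For example, a pulse carrying a JSON payload might be sent like this. The pulse URL and payload fields are placeholders; metric extraction depends on how your payload is structured:

curl -fsS -X POST "https://pulse.9n9s.com/your-monitor-id" \
  -H "Content-Type: application/json" \
  -d '{"status": "ok", "records_processed": 1250, "duration_ms": 83000, "log": "nightly sync completed"}'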

Runtime Tracking

  • Measure execution time between /start and completion pulses
  • Alert on jobs that run too fast or too slow
  • Track performance trends over time
  • Identify performance regressions
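
A rough sketch of runtime tracking, using the same SDK calls as the examples elsewhere on this page (the monitor ID and run_nightly_sync function are placeholders):

from nines import Nines

nines = Nines("monitor-id")

nines.start()                                   # runtime measurement begins with the /start pulse
result = run_nightly_sync()                     # placeholder for the actual job
nines.pulse(f"Synced {result.count} records")   # completion pulse ends the measurement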

Context Capture

  • Automatic capture of source IP and user agent
  • Support for custom metadata and tags
  • Correlation IDs for distributed tracing
  • Environment and deployment information

Multi-Region Checking

  • Run checks from multiple geographic locations
  • Compare performance across regions
  • Detect regional outages and issues
  • Validate global CDN performance

Advanced Assertions

  • JSONPath expressions for complex data validation
  • Regular expressions for content matching
  • Response header validation
  • Certificate expiration monitoring
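
The sketch below shows what such assertions might look like. The json_content type appears in the configuration example above; the body_content and response_header type names are assumptions:

assertions:
  - type: "json_content"
    path: "$.data.items[0].status"   # JSONPath into the response body
    operator: "equals"
    value: "active"
  - type: "body_content"             # assumed type name for regex matching
    operator: "matches_regex"
    value: "build [0-9a-f]{7}"
  - type: "response_header"          # assumed type name for header validation
    name: "Cache-Control"
    operator: "contains"
    value: "no-store"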

Authentication Support

  • Basic HTTP authentication
  • Bearer token authentication
  • Custom header-based authentication
  • OAuth and API key support
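
Because checks are ordinary HTTP requests, most authentication schemes can be expressed as headers, as in this sketch. The credential variables are placeholders, and the {{env.*}} syntax mirrors the configuration example above:

uptime_monitor:
  name: "Internal API Health"
  url: "https://internal.example.com/health"
  headers:
    Authorization: "Basic {{env.BASIC_AUTH_CREDENTIALS}}"   # basic auth
    X-Api-Key: "{{env.API_KEY}}"                            # API-key style header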

Use Descriptive Names

  • Include service and environment in monitor names
  • Use consistent naming conventions across your team
  • Include the monitored action or endpoint

Tag Effectively

  • Tag by environment (production, staging, development)
  • Tag by team or service owner
  • Tag by criticality level
  • Use tags for alert routing and filtering

Project Structure

  • Group related monitors in projects
  • Separate projects by environment or service
  • Use projects to manage team access and permissions

Set Appropriate Thresholds

  • Configure grace periods based on normal variance
  • Set realistic expected runtimes
  • Use different alert channels for different severities

Prevent Alert Fatigue

  • Use escalation rules for sustained issues
  • Configure maintenance windows for planned work
  • Group related alerts to reduce noise

Test Your Monitoring

  • Regularly test alert delivery
  • Verify monitor sensitivity to real issues
  • Practice incident response procedures

Minimize Monitor Overhead

  • Batch heartbeat signals when possible (see the sketch after this list)
  • Use appropriate check frequencies for uptime monitors
  • Optimize payload sizes for critical path operations
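
As a sketch of batching, a worker can send one pulse per processed batch instead of one per item. The handle() function and monitor ID are placeholders:

from nines import Nines

nines = Nines("monitor-id")

def process_batch(jobs):
    for job in jobs:
        handle(job)  # placeholder for the per-item work
    # One pulse per batch keeps monitoring overhead off the per-item hot path
    nines.pulse(f"Processed {len(jobs)} jobs in this batch")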

Scale Monitoring with Growth

  • Plan for increased monitor volume
  • Use tags and projects for organization at scale
  • Automate monitor creation and management

Integrate monitoring into your deployment pipeline:

# GitHub Actions example
- name: Deploy and Monitor
  run: |
    # Deploy application
    kubectl apply -f deployment.yaml
    kubectl rollout status deployment/app

    # Update monitoring
    9n9s-cli heartbeat update $MONITOR_ID \
      --tags "version=${{ github.sha }},deployed_by=${{ github.actor }}"

    # Test deployment
    9n9s-cli uptime create-temporary \
      --url "https://staging.example.com/health" \
      --duration "10m" \
      --assertions "status_code=200"

Define monitors alongside your infrastructure:

# Terraform example
resource "nines_heartbeat_monitor" "backup" {
  name       = "Database Backup - ${var.environment}"
  project_id = nines_project.main.id
  schedule   = "0 2 * * *"

  tags = {
    environment = var.environment
    service     = "database"
    terraform   = "true"
  }
}

resource "nines_uptime_monitor" "api" {
  name = "API Health - ${var.environment}"
  url  = "https://${var.api_domain}/health"

  assertions = [
    {
      type     = "status_code"
      operator = "equals"
      value    = "200"
    }
  ]
}

Connect monitoring with your observability stack:

# OpenTelemetry integration
from opentelemetry import trace
from nines import Nines

tracer = trace.get_tracer(__name__)
nines = Nines("monitor-id")

@tracer.start_as_current_span("background_job")
def background_job():
    span = trace.get_current_span()

    # Add trace ID to heartbeat context
    trace_id = span.get_span_context().trace_id
    nines.start(metadata={"trace_id": hex(trace_id)})

    try:
        # Job logic here
        result = process_data()

        # Success with metrics
        nines.pulse({
            "trace_id": hex(trace_id),
            "records_processed": result.count,
            "processing_time_ms": result.duration
        })
    except Exception as e:
        # Failure with context
        nines.fail({
            "trace_id": hex(trace_id),
            "error": str(e),
            "error_type": type(e).__name__
        })
        raise

Understanding the different monitor types and their appropriate use cases is crucial for building effective monitoring. By combining heartbeat and uptime monitors strategically, you can achieve comprehensive visibility into your systems while minimizing overhead and alert fatigue.