Core Concepts

9n9s is built around several core concepts that work together to provide comprehensive monitoring for your systems and processes. Understanding these concepts will help you make the most of the platform.

Organizations

An Organization is the top-level container for all your resources in 9n9s. When you sign up, you create your first organization. Organizations serve several purposes:

Billing & Subscriptions: Each organization has its own subscription plan and billing
Team Management: Invite team members and manage their access
Global Settings: Configure organization-wide notification channels and security policies
Resource Isolation: Keep different teams or environments completely separate

Organization Features

Custom Branding: Upload logos and customize the appearance of status pages
SSO Integration: Connect with SAML/OIDC providers for enterprise authentication
Audit Logging: Track all changes and access across the organization
API Keys: Generate organization-scoped API keys with configurable permissions

Projects

Projects are containers within an organization that group related monitors together. Think of projects as “folders” for organizing your monitoring based on:

Applications: Separate projects for different apps or services
Environments: Different projects for production, staging, development
Teams: One project per team or department
Infrastructure Components: Projects for databases, APIs, background services

Project Benefits

Organized Monitoring: Keep related monitors together for easier management
Scoped Alerting: Configure different alert rules and channels per project
Team Permissions: Grant team members specific access to individual projects
Logical Grouping: Filter and search monitors by project across the platform

Monitors

Monitors are the core entities that track the health and availability of your systems. 9n9s supports two primary types of monitors:

Heartbeat Monitors (Push-Based)

Heartbeat monitors work like a “dead man’s switch”: your systems actively signal to 9n9s that they are running correctly. If 9n9s does not receive a signal within the expected timeframe, it triggers an alert.

Perfect for:

Cron jobs and scheduled tasks
Background workers and queue consumers
ETL and data processing pipelines
Deployment scripts and automation
Serverless functions
Batch processes

Uptime Monitors (Pull-Based)

Uptime monitors proactively check your services from 9n9s infrastructure by making requests to your endpoints and validating the responses.

Perfect for:

Website availability monitoring
API health checks
Service endpoint monitoring
SSL certificate monitoring
Response time tracking

Monitor States

Monitors can be in one of several states that indicate their current health:

Heartbeat Monitor States

New: Monitor created but hasn’t received any pulses yet
Up: A successful pulse arrived within the expected schedule
Down: No pulse arrived within the schedule plus the grace period
Late: The pulse is overdue but still within the grace period
Started: The job has signaled that it started but has not completed yet
Degraded: A pulse arrived, but the runtime was outside the expected bounds
Paused: Monitoring temporarily disabled by user

Uptime Monitor States

New: Monitor created but hasn’t run any checks yet
Up: Latest check passed all configured assertions
Down: Latest check failed one or more assertions
Paused: Checking temporarily disabled by user

Schedules and Timing

Heartbeat Schedules

Heartbeat monitors use schedules to define when pulses are expected:

Interval Schedules: Simple time-based intervals

every 5 minutes (300 seconds)
hourly (3600 seconds)
daily (86400 seconds)

Cron Expressions: Precise scheduling using standard cron syntax

*/15 * * * * (every 15 minutes)
0 9 * * 1-5 (weekdays at 9 AM)
0 2 * * 0 (Sundays at 2 AM)
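
As a rough illustration of how a cron schedule determines when a pulse is expected, the sketch below checks whether a given minute matches a five-field cron expression. It is a simplified matcher written for this page, not the 9n9s scheduler: the function names are hypothetical, it supports only `*`, steps, ranges, and lists, and it requires the day-of-month and day-of-week fields to both match, which differs from standard cron when both fields are restricted.

```python
from datetime import datetime


def field_matches(field: str, value: int, lo: int, hi: int) -> bool:
    """Return True if `value` satisfies one cron field such as '*/15' or '1-5'."""
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = lo, hi
        elif "-" in base:
            first, last = base.split("-")
            start, end = int(first), int(last)
        else:
            start = int(base)
            # "5/10" means "from 5 to the maximum, every 10"; a bare "5" means exactly 5.
            end = hi if step_text else start
        if start <= value <= end and (value - start) % step == 0:
            return True
    return False


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if `when` (to the minute) matches a five-field cron expression."""
    minute, hour, day_of_month, month, day_of_week = expr.split()
    # Python counts Monday as 0; cron counts Sunday as 0.
    cron_weekday = (when.weekday() + 1) % 7
    return (
        field_matches(minute, when.minute, 0, 59)
        and field_matches(hour, when.hour, 0, 23)
        and field_matches(day_of_month, when.day, 1, 31)
        and field_matches(month, when.month, 1, 12)
        and field_matches(day_of_week, cron_weekday, 0, 6)
    )
```

For example, `cron_matches("0 9 * * 1-5", datetime(2024, 1, 1, 9, 0))` is true because 1 January 2024 was a Monday.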

Grace Periods

A grace period is buffer time after the expected pulse time before 9n9s marks a monitor as Down. While a pulse is overdue but still inside the grace period, the monitor is Late. This accounts for:

Natural variance in job execution times
System load and resource contention
Network delays and temporary issues
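
The relationship between an interval schedule, the grace period, and the Up, Late, and Down states can be sketched as follows. This is an illustrative model under simple assumptions (an interval schedule measured from the last pulse, and a hypothetical `heartbeat_state` function), not a description of how 9n9s evaluates monitors internally.

```python
from datetime import datetime, timedelta
from typing import Optional


def heartbeat_state(
    last_pulse: Optional[datetime],
    interval: timedelta,
    grace: timedelta,
    now: datetime,
) -> str:
    """Classify a heartbeat monitor from the time of its last successful pulse."""
    if last_pulse is None:
        return "New"   # no pulse has ever been received
    due = last_pulse + interval
    if now <= due:
        return "Up"    # the next pulse is not due yet
    if now <= due + grace:
        return "Late"  # overdue, but still inside the grace period
    return "Down"      # schedule plus grace period has elapsed
```

With a 5-minute interval and a 1-minute grace period, a monitor whose last pulse arrived at 00:00 is Up until 00:05, Late until 00:06, and Down after that.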

Expected Runtime Tracking

For processes where execution time matters, you can define expected minimum and maximum runtimes. If a job completes outside these bounds, the monitor enters a “Degraded” state, indicating potential issues even if the job succeeded.
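
A minimal sketch of the runtime check, assuming the runtime is measured between the start signal and the completion signal. The function name and the choice to treat either bound as optional are assumptions made for illustration.

```python
from datetime import timedelta
from typing import Optional


def completion_state(
    runtime: timedelta,
    min_runtime: Optional[timedelta] = None,
    max_runtime: Optional[timedelta] = None,
) -> str:
    """Return 'Degraded' when a successful job ran outside its expected bounds."""
    if min_runtime is not None and runtime < min_runtime:
        return "Degraded"  # finished suspiciously fast, e.g. nothing to process
    if max_runtime is not None and runtime > max_runtime:
        return "Degraded"  # finished, but took longer than expected
    return "Up"
```

A backup that normally takes 10 to 30 minutes but completes in 20 seconds would be reported as Degraded, which is often the first hint that the job skipped its real work.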

Pulse Endpoints and Signaling

Heartbeat monitors use unique, secret URLs for receiving signals:

Basic Signaling

GET/POST /{uuid} - Signals successful completion
GET/POST /{uuid}/start - Signals job start (enables runtime tracking)
GET/POST /{uuid}/fail - Signals job failure
GET/POST /{uuid}/{exit_code} - Signals completion with a specific exit code
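
The endpoints above can be called from any HTTP client. The sketch below builds the four URL forms and sends a pulse using only the Python standard library. The base URL is a placeholder (use the pulse URL shown for your monitor), and `pulse_url` and `send_pulse` are hypothetical helper names, not part of an official SDK.

```python
import urllib.request
from typing import Optional, Union

# Placeholder host: replace with the pulse URL of your monitor.
BASE_URL = "https://pulse.example.invalid"


def pulse_url(
    monitor_uuid: str,
    signal: Union[None, str, int] = None,
    base_url: str = BASE_URL,
) -> str:
    """Build the URL for a success (None), 'start', 'fail', or exit-code pulse."""
    url = f"{base_url.rstrip('/')}/{monitor_uuid}"
    if signal is None:
        return url
    if isinstance(signal, int) and not isinstance(signal, bool):
        if not 0 <= signal <= 255:
            raise ValueError("exit code must be between 0 and 255")
        return f"{url}/{signal}"
    if signal in ("start", "fail"):
        return f"{url}/{signal}"
    raise ValueError(f"unsupported signal: {signal!r}")


def send_pulse(
    monitor_uuid: str,
    signal: Union[None, str, int] = None,
    body: Optional[bytes] = None,
    timeout: float = 10.0,
) -> int:
    """Send a pulse and return the HTTP status code. POST is used when a body is given."""
    request = urllib.request.Request(
        pulse_url(monitor_uuid, signal),
        data=body,
        method="POST" if body is not None else "GET",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A typical job would call `send_pulse(uuid, "start")` before doing its work, then `send_pulse(uuid)` on success or `send_pulse(uuid, "fail", body=error_text)` on failure.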

Payload and Context

POST requests can include payloads of up to 1 MB containing:

Log Output: Capture stdout/stderr for debugging
Error Messages: Detailed failure information
Metrics Data: Custom metrics and performance data
Contextual Information: Environment details, parameters, etc.
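
Because payloads are capped, long log output needs trimming before it is sent. One possible approach keeps the end of the output, where errors usually appear. The sketch below assumes the limit is 1,000,000 bytes; the exact byte count behind "1 MB" is not stated here, and the helper name is hypothetical.

```python
MAX_PAYLOAD_BYTES = 1_000_000  # assumption: "1 MB" interpreted as 10^6 bytes


def tail_within_limit(output: str, limit: int = MAX_PAYLOAD_BYTES) -> bytes:
    """Encode `output` as UTF-8, keeping only the last `limit` bytes if it is too long."""
    data = output.encode("utf-8")
    if len(data) <= limit:
        return data
    tail = data[len(data) - limit:]
    # The cut may land inside a multi-byte character; drop leading
    # continuation bytes (10xxxxxx) so the result is still valid UTF-8.
    while tail and (tail[0] & 0xC0) == 0x80:
        tail = tail[1:]
    return tail
```

The returned bytes can be passed directly as the body of a POST pulse.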

Alerting and Notifications

Notification Channels

Configure how and where you receive alerts:

Email - Direct email notifications
SMS - Text message alerts via Twilio
Slack - Messages to channels or direct messages
Discord - Notifications to Discord channels
PagerDuty - Incident creation and escalation
Webhooks - Custom HTTP callbacks
Many more - Microsoft Teams, Telegram, Pushover, etc.

Alert Rules

Define when and how alerts are triggered:

Trigger Conditions: Which monitor states trigger alerts
Filtering: Route alerts based on tags, projects, or other criteria
Timing: Immediate alerts vs. waiting for sustained issues
Recovery Notifications: Get notified when issues resolve
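
To make the trigger and filtering ideas concrete, here is a small model of rule matching: a rule fires for certain monitor states and can be narrowed by project and tags. The `AlertRule` fields and the `channels_for` function are illustrative assumptions, not the 9n9s data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class AlertRule:
    channel: str                      # where to send the alert, e.g. "pagerduty"
    states: Set[str]                  # monitor states that trigger this rule
    tags: Dict[str, str] = field(default_factory=dict)  # all must match
    project: Optional[str] = None     # None means "any project"


def channels_for(
    rules: List[AlertRule], state: str, project: str, tags: Dict[str, str]
) -> List[str]:
    """Return the channels of every rule that matches a monitor's new state."""
    matched = []
    for rule in rules:
        if state not in rule.states:
            continue
        if rule.project is not None and rule.project != project:
            continue
        if any(tags.get(key) != value for key, value in rule.tags.items()):
            continue
        matched.append(rule.channel)
    return matched
```

Including "Up" in a rule's `states` is one way to model recovery notifications in this sketch.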

Smart Alert Management

Deduplication: Prevent spam from flapping monitors
Grouping: Consolidate related failures into single notifications
Escalation: Route to different channels based on severity or duration
Maintenance Windows: Temporarily suppress alerts during planned work
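
Deduplication can be pictured as a cooldown per monitor and state: a repeated transition into the same state within the window is not announced again. This is a simplified sketch with hypothetical names; the actual rules 9n9s applies to flapping monitors may differ.

```python
from datetime import datetime, timedelta
from typing import Dict, Tuple


class AlertDeduplicator:
    """Suppress repeat notifications for the same monitor and state within a cooldown."""

    def __init__(self, cooldown: timedelta) -> None:
        self.cooldown = cooldown
        self._last_sent: Dict[Tuple[str, str], datetime] = {}

    def should_notify(self, monitor_id: str, state: str, now: datetime) -> bool:
        key = (monitor_id, state)
        last = self._last_sent.get(key)
        if last is not None and now - last < self.cooldown:
            return False  # same alert was sent recently; stay quiet
        self._last_sent[key] = now
        return True
```

A monitor that flips between Up and Down every minute would, with a 10-minute cooldown, produce at most one Down notification per 10 minutes instead of one per flip.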

Tags and Organization

Tags are key-value pairs that help organize and filter monitors.

Common Tagging Strategies

By Environment:

environment: production
environment: staging

By Criticality:

criticality: high
criticality: medium
criticality: low

By Component:

component: database
component: api
component: frontend

By Team:

team: backend
team: infrastructure
team: data

Tag Benefits

Filtering: Quickly find monitors by tags in the UI
Alert Routing: Send alerts for specific tagged monitors to different channels
Bulk Operations: Apply changes to all monitors with certain tags
Reporting: Generate reports and analytics grouped by tags

Context and Payload Analysis

Log Capture and Analysis

9n9s automatically captures and indexes payload data from your heartbeat pulses:

Full Text Search: Search through logs and output across all monitors
Structured Data: JSON payloads are parsed and made searchable
Historical Retention: Logs retained based on your subscription plan
Export Capabilities: Download logs for external analysis

Payload Content Scanning

Configure 9n9s to scan payload content for specific patterns:

Success Keywords: Automatically mark pulses as successful based on content
Failure Keywords: Detect failures from error messages in logs
Custom Patterns: Use regex patterns for complex matching rules
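
The scanning rules above amount to matching patterns against the payload text. The sketch below shows one way this could work, assuming case-insensitive regular expressions and that a failure match takes precedence over a success match. Both are assumptions, and `scan_payload` is a hypothetical name.

```python
import re
from typing import Iterable, Optional


def scan_payload(
    text: str,
    failure_patterns: Iterable[str],
    success_patterns: Iterable[str] = (),
) -> Optional[str]:
    """Return 'fail', 'success', or None when no pattern matches the payload."""
    # Check failures first so that an error in the log wins over a success keyword.
    for pattern in failure_patterns:
        if re.search(pattern, text, re.IGNORECASE):
            return "fail"
    for pattern in success_patterns:
        if re.search(pattern, text, re.IGNORECASE):
            return "success"
    return None
```

Plain keywords work as patterns too, since a keyword without special characters is a valid regular expression.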

Metrics Extraction

Extract time-series metrics from JSON payloads:

{
    "files_processed": 1247,
    "processing_time_ms": 5432,
    "memory_usage_mb": 256,
    "error_count": 0
}

Define JSONPath expressions (for example, $.files_processed) to extract values and track them as metrics over time.
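
To show what such an extraction does, the sketch below resolves a small subset of JSONPath (`$`, `.name`, and `[index]`) against a parsed payload. It is a teaching aid with a hypothetical `extract` function. Full JSONPath has filters, wildcards, and slices that this does not handle, and the subset 9n9s supports is not specified here.

```python
import json
import re
from typing import Any

# One path step: either ".name" or "[123]".
_STEP = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def extract(payload: Any, path: str) -> Any:
    """Resolve a simple JSONPath such as '$.stats.items[0]'; return None if absent."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    current = payload
    position = 1
    while position < len(path):
        step = _STEP.match(path, position)
        if step is None:
            raise ValueError(f"unsupported path syntax at offset {position}")
        name, index = step.group(1), step.group(2)
        try:
            current = current[name] if name is not None else current[int(index)]
        except (KeyError, IndexError, TypeError):
            return None  # the path does not exist in this payload
        position = step.end()
    return current


payload = json.loads(
    '{"files_processed": 1247, "processing_time_ms": 5432,'
    ' "memory_usage_mb": 256, "error_count": 0}'
)
files_processed = extract(payload, "$.files_processed")  # 1247
```

Each extracted value, paired with the pulse timestamp, forms one point in a time series.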

Team Collaboration

Role-Based Access Control (RBAC)

Organization Roles:

Owner: Full control including billing and organization deletion
Admin: All permissions except billing and organization deletion
Member: Standard access to assigned projects
Viewer: Read-only access across the organization

Project Roles:

Admin: Full control over project and its monitors
Editor: Create, modify, and delete monitors and alert rules
Viewer: Read-only access to project monitors and logs
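
The project roles form a simple hierarchy, which can be modeled as a rank comparison. The action names below are hypothetical labels chosen to mirror the role descriptions above; they are not identifiers from the 9n9s API.

```python
# Higher rank includes everything a lower rank can do.
PROJECT_ROLE_RANK = {"Viewer": 1, "Editor": 2, "Admin": 3}

# Minimum project role needed for each (hypothetical) action.
REQUIRED_ROLE = {
    "view_monitors": "Viewer",
    "view_logs": "Viewer",
    "edit_monitor": "Editor",
    "delete_monitor": "Editor",
    "edit_alert_rule": "Editor",
    "manage_project": "Admin",
}


def can(role: str, action: str) -> bool:
    """Return True if a project role is allowed to perform an action."""
    return PROJECT_ROLE_RANK[role] >= PROJECT_ROLE_RANK[REQUIRED_ROLE[action]]
```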

Team Features

Member Invitations: Invite team members via email
Permission Management: Fine-grained control over access
Activity Tracking: See who made changes and when
Collaborative Workflows: Share monitors and alert configurations

API and Automation

REST API

Complete programmatic access to all platform features:

Monitor Management: Create, update, delete monitors
Organization Control: Manage projects, teams, settings
Data Access: Query monitor status, logs, and metrics
Alert Configuration: Automate notification setup

SDK Libraries

Official libraries for popular languages:

Python: pip install 9n9s
Node.js: npm install @9n9s/sdk
Go: go get github.com/9n9s-io/9n9s-go
More Coming: Java, .NET, PHP, Ruby

CLI Tool

Command-line interface for automation and CI/CD:

Monitor Management: Create and configure monitors
Pulse Sending: Signal from shell scripts and automation
Configuration Sync: Apply Infrastructure as Code definitions
Status Querying: Check monitor status in scripts

Configuration as Code

Infrastructure as Code Integration

Terraform Provider:

resource "nines_heartbeat_monitor" "backup_job" {
  name         = "Daily Backup"
  project_id   = nines_project.infrastructure.id
  schedule     = "0 2 * * *" # cron expression: every day at 02:00
  grace_period = "1h"        # one-hour grace period before the monitor is marked down
}

Pulumi Support:

const backupMonitor = new nines.HeartbeatMonitor("backup", {
    name: "Daily Backup",
    projectId: infraProject.id,
    schedule: "0 2 * * *", // cron expression: every day at 02:00
    gracePeriod: "1h", // one-hour grace period before the monitor is marked down
});

YAML Definitions:

projects:
    infrastructure:
        heartbeats:
            - name: "Daily Backup"
              schedule: "0 2 * * *" # cron expression: every day at 02:00
              grace_period: "1h" # one-hour grace period
              tags:
                  environment: production
                  criticality: high

GitOps Workflows

Version Control: Store monitor configurations in Git
Code Review: Review monitoring changes like code changes
Automated Deployment: Apply changes via CI/CD pipelines
Rollback Capability: Easily revert configuration changes
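
Applying configuration from Git usually comes down to comparing the desired definitions with what currently exists and deriving a plan. The sketch below illustrates that comparison for monitors keyed by name. It is a generic reconciliation pattern with a hypothetical `plan_changes` function, not the algorithm used by the 9n9s CLI or providers.

```python
from typing import Any, Dict, List


def plan_changes(
    desired: Dict[str, Dict[str, Any]],
    current: Dict[str, Dict[str, Any]],
) -> Dict[str, List[str]]:
    """Compare desired and current monitor definitions, keyed by monitor name."""
    return {
        # In the repository but not yet on the platform.
        "create": sorted(name for name in desired if name not in current),
        # Present in both, but with different settings.
        "update": sorted(
            name
            for name in desired
            if name in current and desired[name] != current[name]
        ),
        # On the platform but no longer in the repository.
        "delete": sorted(name for name in current if name not in desired),
    }
```

Reviewing such a plan in a pull request before it is applied is what makes monitoring changes auditable, and reverting the commit produces the inverse plan.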

Understanding these core concepts provides the foundation for effectively using 9n9s to monitor your critical systems and processes. Each concept builds on the others to create a comprehensive, flexible monitoring platform that scales with your needs.