Troubleshooting Guide

This guide covers common issues you might encounter when using 9n9s and provides step-by-step solutions.

Monitor Issues

Heartbeat Monitor Problems

Monitor Shows as Down but Process is Running

Symptoms:

Your process/job is running successfully
9n9s shows the monitor as “Down”
No pulses appear in the monitor logs

Common Causes:

Incorrect pulse URL: Verify you’re using the correct URL from your monitor dashboard
Network connectivity: Check if your server can reach pulse.9n9s.com
Firewall restrictions: Ensure outbound HTTPS (port 443) is allowed
Script errors: Your pulse request might be failing silently

Solutions:

# Test pulse endpoint manually
curl -v "https://pulse.9n9s.com/your-monitor-id"

# Check network connectivity
nslookup pulse.9n9s.com
curl -v https://pulse.9n9s.com/health

# Add error handling to your script
if ! curl -fsS "https://pulse.9n9s.com/your-monitor-id"; then
    echo "Failed to send pulse" >&2
fi
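
If your job is written in Python rather than shell, the same idea (never let a pulse fail silently) can be sketched as below. This is a minimal illustration that uses only the standard library, not the 9n9s SDK. The `send_pulse` name, the retry count, and the backoff delays are assumptions chosen for the example.

```python
import sys
import time
import urllib.error
import urllib.request


def send_pulse(url, attempts=3, timeout=10,
               opener=urllib.request.urlopen, sleep=time.sleep):
    """Send a pulse, retrying with backoff. Returns True on success."""
    for attempt in range(attempts):
        try:
            with opener(url, timeout=timeout) as response:
                if 200 <= response.status < 300:
                    return True
                print(f"Pulse returned HTTP {response.status}", file=sys.stderr)
        except (urllib.error.URLError, OSError) as exc:
            # Covers DNS failures, refused connections, timeouts and HTTP errors
            print(f"Pulse attempt {attempt + 1} failed: {exc}", file=sys.stderr)
        if attempt < attempts - 1:
            sleep(2 ** attempt)  # wait 1s, then 2s, between attempts
    return False
```

Call it with the pulse URL from your monitor dashboard and log or exit non-zero when it returns `False`, so the failure is visible in your job output.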

Pulses Not Being Received

Symptoms:

Network is working
Curl commands succeed
Monitor still shows as down

Solutions:

Check monitor status: Ensure the monitor isn’t paused
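Verify grace period: Compare the pulse arrival time with the schedule and grace period; a pulse that arrives after the grace period has elapsed may not stop the monitor from being marked down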
Check schedule: Ensure your pulse timing matches the expected schedule
Monitor logs: Check the monitor’s pulse history for any received pulses
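
To reason about how schedule and grace period interact, the following sketch classifies a heartbeat from its last pulse time. It illustrates the general concept only: the state names and the exact rules are assumptions, not the 9n9s implementation.

```python
from datetime import datetime, timedelta, timezone


def pulse_state(last_pulse, interval, grace, now):
    """Classify a heartbeat as 'up', 'late' (inside grace), or 'down'."""
    expected = last_pulse + interval   # when the next pulse is due
    if now <= expected:
        return "up"
    if now <= expected + grace:
        return "late"                  # overdue, but still within grace
    return "down"


last = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
every = timedelta(minutes=10)
grace = timedelta(minutes=5)
for minutes in (9, 12, 16):
    print(minutes, pulse_state(last, every, grace, last + timedelta(minutes=minutes)))
```

If your job regularly lands in the overdue window, lengthening the grace period or correcting the schedule is usually the fix.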

Runtime Warnings

Symptoms:

Monitor shows “Degraded” status
Job completed successfully
Runtime was outside expected bounds

Solutions:

# Adjust expected runtime in monitor settings
expected_runtime:
    min: "5m" # Lower if jobs can legitimately finish faster
    max: "30m" # Raise if jobs naturally take longer
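
One way to choose these values is to derive them from runtimes you have already observed. The sketch below is a rough heuristic, not a 9n9s feature: the function names and the 25% margin are assumptions.

```python
import math


def suggest_runtime_bounds(runtimes_seconds, margin=0.25):
    """Suggest (min, max) bounds in seconds from observed job runtimes."""
    if len(runtimes_seconds) < 2:
        raise ValueError("need at least two observed runtimes")
    lower = min(runtimes_seconds) * (1 - margin)  # headroom below the fastest run
    upper = max(runtimes_seconds) * (1 + margin)  # headroom above the slowest run
    return lower, upper


def as_minutes(lower, upper):
    """Format bounds as minute strings, rounding outward."""
    return f"{math.floor(lower / 60)}m", f"{math.ceil(upper / 60)}m"


low, high = suggest_runtime_bounds([600, 900, 1200])
print(as_minutes(low, high))
```

Review the suggested values before applying them; a single unusually slow run can stretch the upper bound more than you want.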

Uptime Monitor Problems

False Positive Failures

Symptoms:

Service is accessible manually
Monitor reports failures
Intermittent up/down status

Common Causes:

Assertion configuration: Check if assertions match actual responses
Timeout settings: Response time exceeding timeout
Rate limiting: Your service might be rate-limiting 9n9s checks
SSL certificate issues: Certificate validation failures

Solutions:

# Test endpoint manually with same parameters
curl -v -H "User-Agent: 9n9s-Monitor" https://your-endpoint.com

# Check SSL certificate
echo | openssl s_client -connect your-domain.com:443 -servername your-domain.com

# Verify response content
curl -s https://your-endpoint.com | jq .  # For JSON responses
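
When manual checks pass but the monitor fails, it helps to evaluate one response against the same assertions the monitor applies. The sketch below shows the idea with three common assertion types. The function name and parameters are hypothetical and do not reflect the 9n9s assertion syntax.

```python
def evaluate_check(status, elapsed_seconds, body, *,
                   expected_status=200, timeout_seconds=10.0, must_contain=None):
    """Return a list of assertion failures (empty when everything passes)."""
    failures = []
    if status != expected_status:
        hint = " (possible rate limiting)" if status == 429 else ""
        failures.append(f"status {status} != expected {expected_status}{hint}")
    if elapsed_seconds > timeout_seconds:
        failures.append(f"response took {elapsed_seconds:.1f}s, timeout is {timeout_seconds:.1f}s")
    if must_contain is not None and must_contain not in body:
        failures.append(f"body does not contain {must_contain!r}")
    return failures


print(evaluate_check(200, 0.4, '{"status": "ok"}', must_contain='"status"'))
print(evaluate_check(429, 12.0, "", must_contain='"status"'))
```

You can take the status, timing, and body from a `curl -v` run and compare the result with what the monitor reports.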

SSL/TLS Certificate Errors

Symptoms:

Monitor fails with SSL errors
Certificate is valid when checked manually

Solutions:

Check certificate chain: Ensure intermediate certificates are installed
Verify certificate expiration: Update certificates before expiry
Check certificate validity:

# Check certificate expiration
echo | openssl s_client -connect example.com:443 2>/dev/null | openssl x509 -noout -dates

# Check certificate chain
echo | openssl s_client -connect example.com:443 -servername example.com -showcerts
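
The expiration check can also be scripted. The sketch below uses Python's standard `ssl` module; the helper names are illustrative. Because `ssl.create_default_context()` validates the full chain, a missing intermediate certificate should surface as a verification error from `fetch_not_after`, even when a browser accepts the site.

```python
import socket
import ssl
from datetime import datetime, timezone


def fetch_not_after(host, port=443, timeout=10):
    """Return the server certificate's notAfter string (validates the chain)."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after, now=None):
    """Whole days between now and a notAfter string such as 'Jan  5 09:34:43 2018 GMT'."""
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (expires - now).days

# Example (requires network access):
#   print(days_until_expiry(fetch_not_after("example.com")))
```

A negative result means the certificate has already expired.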

Notification Issues

Emails Not Received

Symptoms:

Alerts should have triggered
No emails received
Other notification channels work

Solutions:

Check spam/junk folders: 9n9s emails might be filtered
Verify email address: Ensure no typos in configuration
Whitelist senders: Add [email protected] to safe senders
Test email channel:

# Test email delivery via CLI
9n9s-cli notification-channel test email-channel-id

Email Server Configuration:

Add to your mail server's allowlist:
- [email protected]
- [email protected]

SPF records are configured for 9n9s sending IPs

Slack Notifications Not Working

Symptoms:

Slack channel configured
No messages appearing
Other integrations work

Common Issues:

Channel permissions: Bot not added to channel
Webhook URL expired: Regenerate webhook
Workspace restrictions: Admin disabled external apps

Solutions:

# Test Slack webhook manually
curl -X POST \
  -H 'Content-type: application/json' \
  -d '{"text":"Test from 9n9s"}' \
  YOUR_WEBHOOK_URL

# Verify bot permissions in Slack
# Check #apps channel for integration messages

PagerDuty Integration Issues

Symptoms:

Incidents not created
Integration configured correctly
API key valid

Solutions:

Check service configuration: Ensure service accepts Events API v2
Verify routing key: Must match PagerDuty service integration
Check escalation policy: Ensure policy is active

# Test PagerDuty integration
curl -X POST \
  -H 'Content-Type: application/json' \
  -d '{
    "routing_key": "YOUR_ROUTING_KEY",
    "event_action": "trigger",
    "payload": {
      "summary": "Test from 9n9s",
      "source": "9n9s-test",
      "severity": "error"
    }
  }' \
  https://events.pagerduty.com/v2/enqueue
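
If the test request is rejected, the event body is often missing a required field. The sketch below validates an event locally before sending. The field rules reflect the Events API v2 as commonly documented (a `trigger` needs `summary`, `source`, and `severity`); treat them as assumptions and confirm against the PagerDuty documentation.

```python
VALID_ACTIONS = {"trigger", "acknowledge", "resolve"}
VALID_SEVERITIES = {"critical", "error", "warning", "info"}


def validate_event(event):
    """Return a list of problems found in an Events API v2 style event dict."""
    problems = []
    if not event.get("routing_key"):
        problems.append("routing_key is missing")
    action = event.get("event_action")
    if action not in VALID_ACTIONS:
        problems.append(f"event_action {action!r} is not one of {sorted(VALID_ACTIONS)}")
    if action == "trigger":
        payload = event.get("payload") or {}
        for field in ("summary", "source", "severity"):
            if not payload.get(field):
                problems.append(f"payload.{field} is missing")
        severity = payload.get("severity")
        if severity and severity not in VALID_SEVERITIES:
            problems.append(f"payload.severity {severity!r} is not valid")
    return problems


event = {
    "routing_key": "YOUR_ROUTING_KEY",
    "event_action": "trigger",
    "payload": {"summary": "Test from 9n9s", "source": "9n9s-test", "severity": "error"},
}
print(validate_event(event))
```

An empty list means the structure looks right, so the next things to check are the routing key value and the service's escalation policy.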

Authentication and Access Issues

API Key Problems

Symptoms:

401 Unauthorized errors
API key recently created
CLI authentication fails

Solutions:

Check key format: Ensure complete key copied
Verify key permissions: Check scope in organization settings
Key expiration: Regenerate if expired

# Test API key
curl -H "Authorization: Bearer YOUR_API_KEY" \
  https://api.9n9s.com/v1/user

# Verify key permissions
9n9s-cli auth whoami
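
Many 401 errors come from a key that was damaged while being copied. The sketch below runs a few generic checks on the key string. It knows nothing about the real 9n9s key format, so it only catches common copy/paste mistakes. The `check_api_key` helper is hypothetical; `NINES_API_KEY` is the variable name used in the CLI section of this guide.

```python
import os


def check_api_key(raw):
    """Return a list of likely copy/paste problems with an API key string."""
    if not raw or not raw.strip():
        return ["key is empty or unset"]
    problems = []
    key = raw.strip()
    if key != raw:
        problems.append("leading or trailing whitespace (often a stray newline)")
    if key[0] in "\"'" or key[-1] in "\"'":
        problems.append("wrapped in quote characters")
    if key.lower().startswith("bearer "):
        problems.append("includes the 'Bearer ' prefix; pass only the key")
    elif any(ch.isspace() for ch in key):
        problems.append("contains whitespace in the middle")
    return problems


print(check_api_key(os.environ.get("NINES_API_KEY")))
```

An empty list only means the string looks clean; the key can still be expired or lack the required scope.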

Permission Denied Errors

Symptoms:

403 Forbidden responses
User has access to organization
Specific resources inaccessible

Solutions:

Check user role: Verify organization and project permissions
Resource ownership: Ensure resource exists and user has access
API key scope: Check if API key has required permissions

Two-Factor Authentication Issues

Symptoms:

Cannot log in with correct password
2FA code rejected
Lost authenticator device

Solutions:

Time synchronization: Ensure device clock is accurate
Backup codes: Use saved backup codes
Contact support: For device recovery assistance
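
To see why clock accuracy matters, the sketch below implements the standard time-based one-time password algorithm (TOTP, RFC 6238). It assumes 9n9s uses standard TOTP with 30-second steps, which this guide does not confirm. The secret is the RFC's published test value, not a real key.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 TOTP code (HMAC-SHA1) for the given Unix time."""
    counter = unix_time // step                      # number of elapsed time steps
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                       # dynamic truncation (RFC 4226)
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


SECRET = b"12345678901234567890"  # RFC 6238 test secret, not a real key
server_time = 1111111109
print(totp(SECRET, server_time))       # code the server expects
print(totp(SECRET, server_time - 90))  # code from a device running 90 s slow
```

The code depends only on the secret and the current 30-second window, so a device whose clock has drifted by more than a step or two generates codes for the wrong window. Many servers tolerate one adjacent step, but not more.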

Performance Issues

Slow Dashboard Loading

Common Causes:

Large number of monitors: Filter views to reduce load
Network connectivity: Check internet connection
Browser issues: Clear cache and cookies

Solutions:

Clear browser data
Use project filters to reduce monitor count
Check browser console for errors

API Rate Limiting

Symptoms:

429 Too Many Requests errors
Scripts failing with rate limit errors
Delayed responses

Solutions:

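# Implement exponential backoff
import time
import requests

def make_request_with_retry(url, headers, max_retries=3, timeout=10):
    """GET a URL, backing off exponentially while the API returns 429."""
    for attempt in range(max_retries):
        response = requests.get(url, headers=headers, timeout=timeout)
        if response.status_code != 429:
            return response
        # Prefer the server's Retry-After hint (in seconds) when present
        retry_after = response.headers.get("Retry-After", "")
        wait_time = int(retry_after) if retry_after.isdigit() else 2 ** attempt
        if attempt < max_retries - 1:
            time.sleep(wait_time)
    raise RuntimeError(f"Still rate limited after {max_retries} attempts")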

Delayed Alert Delivery

Symptoms:

Alerts arrive late
Monitor status changes not reflected quickly

Common Causes:

Grace periods: Alerts delayed by configured grace period
Rate limiting: High volume causing delays
External service issues: Notification service delays

Solutions:

Adjust grace periods: Reduce if alerts need to be faster
Check service status: Verify 9n9s and external service status
Use multiple channels: Configure backup notification methods

SDK and Integration Issues

Python SDK Problems

Common Issues:

# Import errors
from nines import Nines  # Correct import

# Environment variable issues
import os
monitor_id = os.getenv("MONITOR_ID")
if not monitor_id:
    raise ValueError("MONITOR_ID environment variable not set")

# Network timeout issues
nines = Nines(monitor_id, timeout=30)  # Increase timeout

Node.js SDK Problems

Common Issues:

// Async/await usage
await nines.pulse(); // Don't forget await

// Error handling
try {
    await nines.pulse();
} catch (error) {
    console.error("Pulse failed:", error);
}

// Environment variables
const monitorId = process.env.MONITOR_ID;
if (!monitorId) {
    throw new Error("MONITOR_ID environment variable required");
}

CLI Issues

Common Problems:

# Authentication
9n9s-cli login  # Re-authenticate if expired

# Configuration
export NINES_API_KEY="your-key"  # Set environment variable

# Project context
9n9s-cli --project proj_123 monitors list  # Specify project

Getting Help

Before Contacting Support

Check status page: Visit status.9n9s.com
Review recent changes: Any recent configuration changes?
Test with simple case: Try with minimal configuration
Gather logs: Collect relevant error messages and logs

Support Channels

Documentation: Search this documentation for specific topics
Community Discord: Join for peer support and discussions
GitHub Issues: Report bugs and request features
Email Support: [email protected] for direct assistance

Information to Include

When contacting support, include:

Monitor ID: Specific monitor experiencing issues
Error messages: Exact error text and codes
Timeline: When did the issue start?
Environment: Operating system, SDK version, etc.
Configuration: Relevant configuration (remove sensitive data)
Steps taken: What troubleshooting you’ve already tried
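
Before attaching configuration to a support request, you can mask sensitive values automatically. The sketch below redacts any value whose key name looks sensitive. The marker list and the `redact` helper are assumptions for illustration; adjust them to your own configuration.

```python
import json

SENSITIVE_MARKERS = ("key", "token", "secret", "password", "webhook")


def redact(value, key_name=""):
    """Recursively replace values stored under sensitive-looking keys."""
    if isinstance(value, dict):
        return {k: redact(v, str(k)) for k, v in value.items()}
    if isinstance(value, list):
        return [redact(item, key_name) for item in value]
    if any(marker in key_name.lower() for marker in SENSITIVE_MARKERS):
        return "[REDACTED]"
    return value


config = {
    "monitor": {"name": "nightly-backup", "schedule": "every 10 minutes"},
    "api_key": "example-value",
    "channels": [{"type": "slack", "webhook_url": "https://example.invalid/hook"}],
}
print(json.dumps(redact(config), indent=2))
```

Check the output by eye before sending it, since a secret stored under an unusual key name will not be caught.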

Emergency Contacts

For critical production issues:

Status page: Check status.9n9s.com for known issues
Priority support: Available for Enterprise customers
Community: Discord for urgent community assistance

Prevention and Best Practices

Monitoring Health Checks

Set up monitoring for your monitoring:

# Monitor your critical monitors
9n9s-cli heartbeat create \
  --name "Monitor Health Check" \
  --schedule "every 10 minutes" \
  --grace 600

#!/bin/bash
# Script to check monitor status
set -euo pipefail  # Stop if the CLI or jq fails instead of comparing an empty value
FAILED_MONITORS=$(9n9s-cli monitors list --status down --output json | jq length)
if [ "$FAILED_MONITORS" -gt 0 ]; then
    echo "Warning: $FAILED_MONITORS monitors are down" >&2
    exit 1
fi

Regular Maintenance

Update API keys: Rotate keys quarterly
Review alert rules: Ensure they’re still relevant
Test notification channels: Monthly verification
Update documentation: Keep runbooks current
Monitor usage: Track against subscription limits

Backup Strategies

Multiple notification channels: Don’t rely on single channel
Export configurations: Backup monitor configurations
Documentation: Maintain external documentation
Alternative monitoring: Have fallback monitoring systems

This troubleshooting guide covers the most common issues. For specific problems not covered here, please check our community Discord or contact [email protected].