
Troubleshooting Guide

This guide covers common issues you might encounter when using 9n9s and provides step-by-step solutions to resolve them.

Monitor Shows as Down but Process is Running


Symptoms:

  • Your process/job is running successfully
  • 9n9s shows the monitor as “Down”
  • No pulses appear in the monitor logs

Common Causes:

  1. Incorrect pulse URL: Verify you’re using the correct URL from your monitor dashboard
  2. Network connectivity: Check if your server can reach pulse.9n9s.com
  3. Firewall restrictions: Ensure outbound HTTPS (port 443) is allowed
  4. Script errors: Your pulse request might be failing silently

Solutions:

Terminal window
# Test pulse endpoint manually
curl -v "https://pulse.9n9s.com/your-monitor-id"
# Check network connectivity
nslookup pulse.9n9s.com
curl -v https://pulse.9n9s.com/health
# Add error handling to your script
if ! curl -fsS "https://pulse.9n9s.com/your-monitor-id"; then
echo "Failed to send pulse" >&2
fi
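If your job runs from Python rather than a shell script, the same idea applies; the sketch below is one way to avoid silent pulse failures (the URL format follows the curl examples above, and the retry counts and timeouts are illustrative assumptions to adjust for your environment):

import sys
import time
import requests

PULSE_URL = "https://pulse.9n9s.com/your-monitor-id"  # copy from your monitor dashboard

def send_pulse(url, retries=3, timeout=10):
    """Send a pulse, retrying briefly so transient network errors don't look like downtime."""
    for attempt in range(retries):
        try:
            response = requests.get(url, timeout=timeout)
            response.raise_for_status()
            return True
        except requests.RequestException as exc:
            print(f"Pulse attempt {attempt + 1} failed: {exc}", file=sys.stderr)
            time.sleep(2 ** attempt)
    return False

if not send_pulse(PULSE_URL):
    # Log loudly instead of failing silently, so missing pulses show up in your job logs
    print("Failed to send pulse after retries", file=sys.stderr)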

Pulses Succeed but Monitor Still Shows Down

Symptoms:

  • Network is working
  • Curl commands succeed
  • Monitor still shows as down

Solutions:

  1. Check monitor status: Ensure the monitor isn’t paused
  2. Verify grace period: Your pulse might be within the grace period
  3. Check schedule: Ensure your pulse timing matches the expected schedule
  4. Monitor logs: Check the monitor's pulse history for any received pulses (see the sketch after this list)
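You can also check the monitor's state and recent pulses programmatically. The sketch below uses the REST API with a bearer token as in the API examples later in this guide; the /v1/monitors/{id} and /v1/monitors/{id}/pulses paths are assumptions for illustration, so confirm the exact endpoints in the API reference before relying on them:

import os
import requests

API_BASE = "https://api.9n9s.com/v1"
API_KEY = os.environ["NINES_API_KEY"]
MONITOR_ID = "your-monitor-id"

headers = {"Authorization": f"Bearer {API_KEY}"}

# Assumed endpoint: fetch the monitor to confirm it is not paused and to review schedule/grace settings
monitor = requests.get(f"{API_BASE}/monitors/{MONITOR_ID}", headers=headers, timeout=10)
monitor.raise_for_status()
print("Status:", monitor.json().get("status"))

# Assumed endpoint: list recent pulses to confirm whether anything was actually received
pulses = requests.get(f"{API_BASE}/monitors/{MONITOR_ID}/pulses", headers=headers, timeout=10)
pulses.raise_for_status()
for pulse in pulses.json()[:5]:  # assumes the response body is a JSON list of pulses
    print(pulse)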

Degraded Status Due to Runtime Bounds

Symptoms:

  • Monitor shows “Degraded” status
  • Job completed successfully
  • Runtime was outside expected bounds

Solutions:

# Adjust expected runtime in monitor settings
expected_runtime:
min: "5m" # Increase if jobs naturally take longer
max: "30m" # Adjust based on actual performance

Uptime Monitor False Failures

Symptoms:

  • Service is accessible manually
  • Monitor reports failures
  • Intermittent up/down status

Common Causes:

  1. Assertion configuration: Check if assertions match actual responses
  2. Timeout settings: Response time exceeding timeout
  3. Rate limiting: Your service might be rate-limiting 9n9s checks
  4. SSL certificate issues: Certificate validation failures

Solutions:

Terminal window
# Test endpoint manually with same parameters
curl -v -H "User-Agent: 9n9s-Monitor" https://your-endpoint.com
# Check SSL certificate
echo | openssl s_client -connect your-domain.com:443 -servername your-domain.com
# Verify response content
curl -s https://your-endpoint.com | jq . # For JSON responses

SSL Certificate Errors

Symptoms:

  • Monitor fails with SSL errors
  • Certificate is valid when checked manually

Solutions:

  1. Check certificate chain: Ensure intermediate certificates are installed
  2. Verify certificate expiration: Update certificates before expiry (a scripted check is sketched below)
  3. Check certificate validity:
Terminal window
# Check certificate expiration
echo | openssl s_client -connect example.com:443 2>/dev/null | openssl x509 -noout -dates
# Check certificate chain
openssl s_client -connect example.com:443 -showcerts
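If you want to track expiry from a script instead of running openssl by hand, the Python standard library can read the certificate dates directly; this is a minimal sketch, with example.com standing in for your own domain:

import socket
import ssl
import time

hostname = "example.com"  # replace with your monitored domain

# Open a TLS connection and read the server certificate's expiry date
context = ssl.create_default_context()
with socket.create_connection((hostname, 443), timeout=10) as sock:
    with context.wrap_socket(sock, server_hostname=hostname) as tls:
        cert = tls.getpeercert()

expires = ssl.cert_time_to_seconds(cert["notAfter"])
days_left = int((expires - time.time()) // 86400)
print(f"{hostname} certificate expires in {days_left} days ({cert['notAfter']})")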

Email Notifications Not Received

Symptoms:

  • Alerts should have triggered
  • No emails received
  • Other notification channels work

Solutions:

  1. Check spam/junk folders: 9n9s emails might be filtered
  2. Verify email address: Ensure no typos in configuration
  3. Whitelist senders: Add [email protected] to safe senders
  4. Test email channel:
Terminal window
# Test email delivery via CLI
9n9s-cli notification-channel test email-channel-id

Email Server Configuration:

If your mail server filters senders with a DNS-based allow list, note that SPF records are configured for the 9n9s sending IPs, so standard SPF checks should pass; you can verify the published record as shown below.
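The sketch below queries the record directly. It uses the third-party dnspython package and assumes the SPF record is published on 9n9s.com; check the domain that your alert emails are actually sent from:

import dns.resolver  # third-party: pip install dnspython

# Assumed sending domain; use the domain in the From: header of your alert emails
domain = "9n9s.com"

for record in dns.resolver.resolve(domain, "TXT"):
    text = b"".join(record.strings).decode()
    if text.startswith("v=spf1"):
        print(f"SPF record for {domain}: {text}")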

Slack Notifications Not Appearing

Symptoms:

  • Slack channel configured
  • No messages appearing
  • Other integrations work

Common Issues:

  1. Channel permissions: Bot not added to channel
  2. Webhook URL expired: Regenerate webhook
  3. Workspace restrictions: Admin disabled external apps

Solutions:

Terminal window
# Test Slack webhook manually
curl -X POST \
-H 'Content-type: application/json' \
-d '{"text":"Test from 9n9s"}' \
YOUR_WEBHOOK_URL
# Verify bot permissions in Slack
# Check #apps channel for integration messages

PagerDuty Incidents Not Created

Symptoms:

  • Incidents not created
  • Integration configured correctly
  • API key valid

Solutions:

  1. Check service configuration: Ensure service accepts Events API v2
  2. Verify routing key: Must match PagerDuty service integration
  3. Check escalation policy: Ensure policy is active
Terminal window
# Test PagerDuty integration
curl -X POST \
-H 'Content-Type: application/json' \
-d '{
"routing_key": "YOUR_ROUTING_KEY",
"event_action": "trigger",
"payload": {
"summary": "Test from 9n9s",
"source": "9n9s-test",
"severity": "error"
}
}' \
https://events.pagerduty.com/v2/enqueue

API Key Authentication Failures

Symptoms:

  • 401 Unauthorized errors
  • API key recently created
  • CLI authentication fails

Solutions:

  1. Check key format: Ensure complete key copied
  2. Verify key permissions: Check scope in organization settings
  3. Key expiration: Regenerate if expired
Terminal window
# Test API key
curl -H "Authorization: Bearer YOUR_API_KEY" \
https://api.9n9s.com/v1/user
# Verify key permissions
9n9s-cli auth whoami

Permission Errors

Symptoms:

  • 403 Forbidden responses
  • User has access to organization
  • Specific resources inaccessible

Solutions:

  1. Check user role: Verify organization and project permissions
  2. Resource ownership: Ensure resource exists and user has access
  3. API key scope: Check if the API key has the required permissions (see the sketch below)
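A quick way to separate an invalid key from a scope problem is to compare status codes from an endpoint you know works against the resource you need. The /v1/user path appears in the API examples above; the /v1/monitors path here is an assumption for illustration:

import os
import requests

API_KEY = os.environ["NINES_API_KEY"]
headers = {"Authorization": f"Bearer {API_KEY}"}

# 401 here means the key itself is invalid or expired
me = requests.get("https://api.9n9s.com/v1/user", headers=headers, timeout=10)
print("/v1/user:", me.status_code)

# 403 here (with a 200 above) points at missing permissions for this resource,
# not a broken key. The /v1/monitors path is an assumed endpoint for illustration.
monitors = requests.get("https://api.9n9s.com/v1/monitors", headers=headers, timeout=10)
print("/v1/monitors:", monitors.status_code)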

Login and Two-Factor Authentication Issues

Symptoms:

  • Cannot log in with correct password
  • 2FA code rejected
  • Lost authenticator device

Solutions:

  1. Time synchronization: Ensure device clock is accurate
  2. Backup codes: Use saved backup codes
  3. Contact support: For device recovery assistance

Dashboard Loading Slowly

Common Causes:

  1. Large number of monitors: Filter views to reduce load
  2. Network connectivity: Check internet connection
  3. Browser issues: Clear cache and cookies

Solutions:

  1. Clear browser data: Clear the browser cache and cookies
  2. Use project filters: Reduce the number of monitors loaded at once
  3. Check the browser console: Look for errors

API Rate Limiting

Symptoms:

  • 429 Too Many Requests errors
  • Scripts failing with rate limit errors
  • Delayed responses

Solutions:

# Implement exponential backoff
import time
import requests

def make_request_with_retry(url, headers, max_retries=3):
    for attempt in range(max_retries):
        response = requests.get(url, headers=headers)
        if response.status_code == 429:
            # Back off exponentially (1s, 2s, 4s, ...) before retrying
            wait_time = 2 ** attempt
            time.sleep(wait_time)
            continue
        return response
    raise Exception("Max retries exceeded")

Delayed Alerts

Symptoms:

  • Alerts arrive late
  • Monitor status changes not reflected quickly

Common Causes:

  1. Grace periods: Alerts delayed by configured grace period
  2. Rate limiting: High volume causing delays
  3. External service issues: Notification service delays

Solutions:

  1. Adjust grace periods: Reduce if alerts need to be faster
  2. Check service status: Verify 9n9s and external service status
  3. Use multiple channels: Configure backup notification methods

Python SDK

Common Issues:

# Import errors
from nines import Nines  # Correct import

# Environment variable issues
import os

monitor_id = os.getenv("MONITOR_ID")
if not monitor_id:
    raise ValueError("MONITOR_ID environment variable not set")

# Network timeout issues
nines = Nines(monitor_id, timeout=30)  # Increase timeout

JavaScript SDK

Common Issues:

// Async/await usage
await nines.pulse(); // Don't forget await

// Error handling
try {
  await nines.pulse();
} catch (error) {
  console.error("Pulse failed:", error);
}

// Environment variables
const monitorId = process.env.MONITOR_ID;
if (!monitorId) {
  throw new Error("MONITOR_ID environment variable required");
}

CLI

Common Problems:

Terminal window
# Authentication
9n9s-cli login # Re-authenticate if expired
# Configuration
export NINES_API_KEY="your-key" # Set environment variable
# Project context
9n9s-cli --project proj_123 monitors list # Specify project

General Debugging Steps

  1. Check status page: Visit status.9n9s.com
  2. Review recent changes: Any recent configuration changes?
  3. Test with simple case: Try with minimal configuration
  4. Gather logs: Collect relevant error messages and logs

Getting Help

  1. Documentation: Search this documentation for specific topics
  2. Community Discord: Join for peer support and discussions
  3. GitHub Issues: Report bugs and request features
  4. Email Support: [email protected] for direct assistance

When contacting support, include:

  • Monitor ID: Specific monitor experiencing issues
  • Error messages: Exact error text and codes
  • Timeline: When did the issue start?
  • Environment: Operating system, SDK version, etc.
  • Configuration: Relevant configuration (remove sensitive data)
  • Steps taken: What troubleshooting you’ve already tried

For critical production issues:

  • Status page: Check status.9n9s.com for known issues
  • Priority support: Available for Enterprise customers
  • Community: Discord for urgent community assistance

Set up monitoring for your monitoring:

Terminal window
# Monitor your critical monitors
9n9s-cli heartbeat create \
  --name "Monitor Health Check" \
  --schedule "every 10 minutes" \
  --grace 600

# Script to check monitor status
#!/bin/bash
FAILED_MONITORS=$(9n9s-cli monitors list --status down --output json | jq length)
if [ "$FAILED_MONITORS" -gt 0 ]; then
  echo "Warning: $FAILED_MONITORS monitors are down"
  exit 1
fi

Regular Maintenance

  1. Update API keys: Rotate keys quarterly
  2. Review alert rules: Ensure they're still relevant
  3. Test notification channels: Monthly verification
  4. Update documentation: Keep runbooks current
  5. Monitor usage: Track against subscription limits

Redundancy and Backups

  1. Multiple notification channels: Don't rely on a single channel
  2. Export configurations: Back up monitor configurations (see the sketch below)
  3. Documentation: Maintain external documentation
  4. Alternative monitoring: Have fallback monitoring systems
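For the configuration export in particular, a small script can snapshot your monitor definitions on a schedule. This is a sketch only: the /v1/monitors endpoint is an assumption for illustration, and redirecting JSON output from the CLI (as in the monitors list example above) would work just as well:

import json
import os
from datetime import date

import requests

API_KEY = os.environ["NINES_API_KEY"]
headers = {"Authorization": f"Bearer {API_KEY}"}

# Assumed endpoint: list all monitor definitions for backup purposes
response = requests.get("https://api.9n9s.com/v1/monitors", headers=headers, timeout=30)
response.raise_for_status()

backup_path = f"9n9s-monitors-{date.today():%Y-%m-%d}.json"
with open(backup_path, "w") as handle:
    json.dump(response.json(), handle, indent=2)
print(f"Wrote {backup_path}")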

This troubleshooting guide covers the most common issues. For specific problems not covered here, please check our community Discord or contact [email protected].