Troubleshooting Guide
This guide covers common issues you might encounter when using 9n9s and provides step-by-step solutions to resolve them.
Monitor Issues
Heartbeat Monitor Problems
Monitor Shows as Down but Process is Running
Symptoms:
- Your process/job is running successfully
- 9n9s shows the monitor as “Down”
- No pulses appear in the monitor logs
Common Causes:
- Incorrect pulse URL: Verify you’re using the correct URL from your monitor dashboard
- Network connectivity: Check if your server can reach pulse.9n9s.com
- Firewall restrictions: Ensure outbound HTTPS (port 443) is allowed
- Script errors: Your pulse request might be failing silently
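The network-related causes above can be told apart with a short script. The following is a minimal sketch using only the Python standard library; the function name `diagnose` and its injectable `resolve` and `connect` parameters are illustrative assumptions, not part of any 9n9s tooling.

```python
import socket


def diagnose(host, port=443, resolve=socket.getaddrinfo,
             connect=socket.create_connection, timeout=5.0):
    """Return a short description of the first connectivity problem found."""
    # Step 1: DNS. A failure here points at name resolution, not the firewall.
    try:
        resolve(host, port)
    except socket.gaierror as exc:
        return f"DNS lookup failed for {host}: {exc}"
    # Step 2: TCP. A refusal or timeout here suggests a firewall or routing issue.
    try:
        conn = connect((host, port), timeout=timeout)
    except OSError as exc:
        return f"Cannot open TCP connection to {host}:{port}: {exc}"
    conn.close()
    return f"{host}:{port} is reachable; check the pulse URL and script errors next"
```

Calling `diagnose("pulse.9n9s.com")` from the affected server narrows the problem to DNS, outbound port 443, or the script itself.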
Solutions:
    # Test pulse endpoint manually
    curl -v "https://pulse.9n9s.com/your-monitor-id"
    # Check network connectivity
    nslookup pulse.9n9s.com
    curl -v https://pulse.9n9s.com/health
    # Add error handling to your script
    if ! curl -fsS "https://pulse.9n9s.com/your-monitor-id"; then
      echo "Failed to send pulse" >&2
    fi
Pulses Not Being Received
Symptoms:
- Network is working
- Curl commands succeed
- Monitor still shows as down
Solutions:
- Check monitor status: Ensure the monitor isn’t paused
- Verify grace period: Your pulse might be within the grace period
- Check schedule: Ensure your pulse timing matches the expected schedule
- Monitor logs: Check the monitor’s pulse history for any received pulses
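To reason about schedule and grace-period mismatches, it can help to model the timing explicitly. The sketch below assumes a simple "expected interval plus grace period" model; the function `pulse_state` and its three labels are hypothetical and may not match how 9n9s evaluates a monitor internally.

```python
from datetime import datetime, timedelta, timezone


def pulse_state(last_pulse, interval, grace, now):
    """Classify a heartbeat under an assumed 'interval + grace' model."""
    due = last_pulse + interval          # when the next pulse is expected
    if now <= due:
        return "on time"
    if now <= due + grace:               # late, but still inside the grace period
        return "late, within grace"
    return "overdue"                     # past the grace period
```

For example, with a 10-minute interval and a 600-second grace period, a job whose last pulse was 15 minutes ago is late but still within grace, while one whose last pulse was 21 minutes ago is overdue.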
Runtime Warnings
Symptoms:
- Monitor shows “Degraded” status
- Job completed successfully
- Runtime was outside expected bounds
Solutions:
    # Adjust expected runtime in monitor settings
    expected_runtime:
      min: "5m"   # Increase if jobs naturally take longer
      max: "30m"  # Adjust based on actual performance
Uptime Monitor Problems
False Positive Failures
Symptoms:
- Service is accessible manually
- Monitor reports failures
- Intermittent up/down status
Common Causes:
- Assertion configuration: Check if assertions match actual responses
- Timeout settings: Response time exceeding timeout
- Rate limiting: Your service might be rate-limiting 9n9s checks
- SSL certificate issues: Certificate validation failures
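The first two causes can be reproduced locally by applying the same checks to a response you fetch yourself. This is a rough sketch under stated assumptions: `failed_assertions` and its parameters (`expected_status`, `must_contain`, `timeout_s`) are hypothetical stand-ins for whatever assertions your monitor is configured with.

```python
def failed_assertions(status, body, elapsed_s, *, expected_status=200,
                      must_contain=None, timeout_s=10.0):
    """Return a list of human-readable reasons a check would fail."""
    failures = []
    if status != expected_status:
        failures.append(f"status {status} != expected {expected_status}")
    if must_contain is not None and must_contain not in body:
        failures.append(f"body does not contain {must_contain!r}")
    if elapsed_s > timeout_s:
        failures.append(
            f"response took {elapsed_s:.1f}s, timeout is {timeout_s:.1f}s"
        )
    return failures
```

An empty list means the response satisfies every assumed assertion. A 429 status in the result is a hint that the service is rate-limiting the checks.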
Solutions:
    # Test endpoint manually with same parameters
    curl -v -H "User-Agent: 9n9s-Monitor" https://your-endpoint.com
    # Check SSL certificate
    echo | openssl s_client -connect your-domain.com:443 -servername your-domain.com
    # Verify response content
    curl -s https://your-endpoint.com | jq .  # For JSON responses
SSL/TLS Certificate Errors
Symptoms:
- Monitor fails with SSL errors
- Certificate is valid when checked manually
Solutions:
- Check certificate chain: Ensure intermediate certificates are installed
- Verify certificate expiration: Update certificates before expiry
- Check certificate validity:
    # Check certificate expiration
    echo | openssl s_client -connect example.com:443 2>/dev/null | openssl x509 -noout -dates
    # Check certificate chain
    openssl s_client -connect example.com:443 -showcerts
Notification Issues
Emails Not Received
Symptoms:
- Alerts should have triggered
- No emails received
- Other notification channels work
Solutions:
- Check spam/junk folders: 9n9s emails might be filtered
- Verify email address: Ensure no typos in configuration
- Whitelist senders: Add [email protected] to safe senders
- Test email channel:
    # Test email delivery via CLI
    9n9s-cli notification-channel test email-channel-id
Email Server Configuration:
Add to DNS whitelist:
SPF records are configured for 9n9s sending IPs
Slack Notifications Not Working
Symptoms:
- Slack channel configured
- No messages appearing
- Other integrations work
Common Issues:
- Channel permissions: Bot not added to channel
- Webhook URL expired: Regenerate webhook
- Workspace restrictions: Admin disabled external apps
Solutions:
    # Test Slack webhook manually
    curl -X POST \
      -H 'Content-type: application/json' \
      -d '{"text":"Test from 9n9s"}' \
      YOUR_WEBHOOK_URL
    # Verify bot permissions in Slack
    # Check #apps channel for integration messages
PagerDuty Integration Issues
Symptoms:
- Incidents not created
- Integration configured correctly
- API key valid
Solutions:
- Check service configuration: Ensure service accepts Events API v2
- Verify routing key: Must match PagerDuty service integration
- Check escalation policy: Ensure policy is active
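Before sending a test event, it can be useful to validate the payload locally, since a malformed event is rejected regardless of the routing key. The sketch below builds a trigger event in the Events API v2 shape; the helper name `build_trigger_event` is an illustrative assumption, and the severity set reflects the values PagerDuty documents for that API.

```python
VALID_SEVERITIES = {"critical", "error", "warning", "info"}


def build_trigger_event(routing_key, summary, source, severity="error"):
    """Build an Events API v2 'trigger' event, validating required fields."""
    if not routing_key:
        raise ValueError("routing_key is required")
    if severity not in VALID_SEVERITIES:
        raise ValueError(f"severity must be one of {sorted(VALID_SEVERITIES)}")
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "payload": {"summary": summary, "source": source, "severity": severity},
    }
```

Serializing the result with `json.dumps` gives a request body suitable for a POST to https://events.pagerduty.com/v2/enqueue.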
    # Test PagerDuty integration
    curl -X POST \
      -H 'Content-Type: application/json' \
      -d '{
        "routing_key": "YOUR_ROUTING_KEY",
        "event_action": "trigger",
        "payload": {
          "summary": "Test from 9n9s",
          "source": "9n9s-test",
          "severity": "error"
        }
      }' \
      https://events.pagerduty.com/v2/enqueue
Authentication and Access Issues
API Key Problems
Symptoms:
- 401 Unauthorized errors
- API key recently created
- CLI authentication fails
Solutions:
- Check key format: Ensure complete key copied
- Verify key permissions: Check scope in organization settings
- Key expiration: Regenerate if expired
    # Test API key
    curl -H "Authorization: Bearer YOUR_API_KEY" \
      https://api.9n9s.com/v1/user
    # Verify key permissions
    9n9s-cli auth whoami
Permission Denied Errors
Symptoms:
- 403 Forbidden responses
- User has access to organization
- Specific resources inaccessible
Solutions:
- Check user role: Verify organization and project permissions
- Resource ownership: Ensure resource exists and user has access
- API key scope: Check if API key has required permissions
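Scripts that call the API can surface these distinctions directly instead of failing with a bare status code. The following is a small sketch; the `HINTS` table and `hint_for` helper are hypothetical, with wording paraphrased from this guide.

```python
# Hints paraphrased from this guide; the mapping itself is illustrative.
HINTS = {
    401: "Check that the full API key was copied and has not expired.",
    403: "Check the user's role and the API key's scope for this resource.",
    429: "Slow down and retry with exponential backoff.",
}


def hint_for(status_code):
    """Return a troubleshooting hint for an HTTP status code."""
    return HINTS.get(status_code, "No specific hint; see the response body.")
```

Logging `hint_for(response.status_code)` next to the error makes it clearer whether the key (401) or the permissions (403) need attention.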
Two-Factor Authentication Issues
Symptoms:
- Cannot log in with correct password
- 2FA code rejected
- Lost authenticator device
Solutions:
- Time synchronization: Ensure device clock is accurate
- Backup codes: Use saved backup codes
- Contact support: For device recovery assistance
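Clock accuracy matters because time-based one-time codes are derived from the current time. The sketch below implements standard TOTP (RFC 6238, HMAC-SHA1, 30-second step) with the Python standard library; that 9n9s uses this exact scheme is an assumption, since this guide does not say.

```python
import hashlib
import hmac
import struct


def totp(secret, unix_time, step=30, digits=6):
    """Compute an RFC 6238 TOTP code (HMAC-SHA1) for the given Unix time."""
    counter = int(unix_time) // step                      # 30-second time window
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                               # dynamic truncation
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)
```

Because the counter changes every 30 seconds, a device whose clock is off by a minute or more computes a code for a different window than the server expects, and the code is rejected.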
Performance Issues
Slow Dashboard Loading
Common Causes:
- Large number of monitors: Filter views to reduce load
- Network connectivity: Check internet connection
- Browser issues: Clear cache and cookies
Solutions:
- Clear browser data
- Use project filters to reduce monitor count
- Check browser console for errors
API Rate Limiting
Symptoms:
- 429 Too Many Requests errors
- Scripts failing with rate limit errors
- Delayed responses
Solutions:
    # Implement exponential backoff
    import time
    import requests

    def make_request_with_retry(url, headers, max_retries=3):
        for attempt in range(max_retries):
            response = requests.get(url, headers=headers)
            if response.status_code == 429:
                wait_time = 2 ** attempt
                time.sleep(wait_time)
                continue
            return response
        raise Exception("Max retries exceeded")
Delayed Alert Delivery
Symptoms:
- Alerts arrive late
- Monitor status changes not reflected quickly
Common Causes:
- Grace periods: Alerts delayed by configured grace period
- Rate limiting: High volume causing delays
- External service issues: Notification service delays
Solutions:
- Adjust grace periods: Reduce if alerts need to be faster
- Check service status: Verify 9n9s and external service status
- Use multiple channels: Configure backup notification methods
SDK and Integration Issues
Python SDK Problems
Common Issues:
    # Import errors
    from nines import Nines  # Correct import
    # Environment variable issues
    import os
    monitor_id = os.getenv("MONITOR_ID")
    if not monitor_id:
        raise ValueError("MONITOR_ID environment variable not set")
    # Network timeout issues
    nines = Nines(monitor_id, timeout=30)  # Increase timeout
Node.js SDK Problems
Common Issues:
    // Async/await usage
    await nines.pulse(); // Don't forget await
    // Error handling
    try {
      await nines.pulse();
    } catch (error) {
      console.error("Pulse failed:", error);
    }
    // Environment variables
    const monitorId = process.env.MONITOR_ID;
    if (!monitorId) {
      throw new Error("MONITOR_ID environment variable required");
    }
CLI Issues
Common Problems:
    # Authentication
    9n9s-cli login  # Re-authenticate if expired
    # Configuration
    export NINES_API_KEY="your-key"  # Set environment variable
    # Project context
    9n9s-cli --project proj_123 monitors list  # Specify project
Getting Help
Before Contacting Support
- Check status page: Visit status.9n9s.com
- Review recent changes: Any recent configuration changes?
- Test with simple case: Try with minimal configuration
- Gather logs: Collect relevant error messages and logs
Support Channels
- Documentation: Search this documentation for specific topics
- Community Discord: Join for peer support and discussions
- GitHub Issues: Report bugs and request features
- Email Support: [email protected] for direct assistance
Information to Include
When contacting support, include:
- Monitor ID: Specific monitor experiencing issues
- Error messages: Exact error text and codes
- Timeline: When did the issue start?
- Environment: Operating system, SDK version, etc.
- Configuration: Relevant configuration (remove sensitive data)
- Steps taken: What troubleshooting you’ve already tried
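Gathering this information by script reduces the chance of pasting a secret into a support request. The following is a minimal sketch; `redact`, `support_report`, and the keyword pattern are illustrative assumptions, and the pattern should be extended to cover whatever your configuration treats as sensitive.

```python
import platform
import re

# Config keys whose names match this pattern are masked (assumed naming scheme).
SENSITIVE = re.compile(r"(key|token|secret|password)", re.IGNORECASE)


def redact(config):
    """Return a copy of a flat config dict with sensitive values masked."""
    return {
        k: ("[REDACTED]" if SENSITIVE.search(k) else v)
        for k, v in config.items()
    }


def support_report(monitor_id, error, config):
    """Collect the details listed above into one dictionary."""
    return {
        "monitor_id": monitor_id,
        "error": error,
        "environment": {
            "os": platform.platform(),
            "python": platform.python_version(),
        },
        "configuration": redact(config),
    }
```

Printing the result of `support_report(...)` with `json.dumps(..., indent=2)` gives a block that can be pasted into a support message after a quick manual review.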
Emergency Contacts
For critical production issues:
- Status page: Check status.9n9s.com for known issues
- Priority support: Available for Enterprise customers
- Community: Discord for urgent community assistance
Prevention and Best Practices
Monitoring Health Checks
Set up monitoring for your monitoring:
    # Monitor your critical monitors
    9n9s-cli heartbeat create \
      --name "Monitor Health Check" \
      --schedule "every 10 minutes" \
      --grace 600

    #!/bin/bash
    # Script to check monitor status
    FAILED_MONITORS=$(9n9s-cli monitors list --status down --output json | jq length)
    if [ "$FAILED_MONITORS" -gt 0 ]; then
      echo "Warning: $FAILED_MONITORS monitors are down"
      exit 1
    fi
Regular Maintenance
- Update API keys: Rotate keys quarterly
- Review alert rules: Ensure they’re still relevant
- Test notification channels: Monthly verification
- Update documentation: Keep runbooks current
- Monitor usage: Track against subscription limits
Backup Strategies
- Multiple notification channels: Don’t rely on a single channel
- Export configurations: Backup monitor configurations
- Documentation: Maintain external documentation
- Alternative monitoring: Have fallback monitoring systems
This troubleshooting guide covers the most common issues. For specific problems not covered here, please check our community Discord or contact [email protected].