Troubleshooting Playbook

Start every investigation with the Diagnostics panel and relevant logs. This section catalogs common failure patterns, how to isolate them, and recommended tools.

1. Registration / signaling

| Symptom | Investigation | Resolution |
| --- | --- | --- |
| Extension refuses to register | Diagnostics → SIP → Locator registry to confirm bindings and expiry | Verify password, SIP port, firewall rules; reset the password or clear stale bindings when required |
| Trunk status degraded | Run Diagnostics → Trunks probes or OPTIONS probe | Confirm peer IP and auth mode; enable backup trunks in config/trunks |
| INVITE has no response | Use sngrep or Diagnostics → Routing Evaluate to confirm rule hits | Double-check routing matches and ACL permissions |
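
When the Diagnostics panel itself is unavailable, an OPTIONS probe can be sent by hand. The following is a minimal sketch, not the product's own probe: the function names are hypothetical, the default port 5060 is the standard SIP port rather than anything read from this system's configuration, and only UDP transport is covered.

```python
import socket
import uuid


def build_options(target_host, target_port, local_host, local_port):
    """Build a minimal SIP OPTIONS request (RFC 3261) as bytes."""
    # The branch parameter must start with the RFC 3261 magic cookie.
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]
    call_id = uuid.uuid4().hex
    tag = uuid.uuid4().hex[:8]
    lines = [
        f"OPTIONS sip:{target_host}:{target_port} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_host}:{local_port};branch={branch};rport",
        "Max-Forwards: 70",
        f"From: <sip:probe@{local_host}>;tag={tag}",
        f"To: <sip:{target_host}:{target_port}>",
        f"Call-ID: {call_id}@{local_host}",
        "CSeq: 1 OPTIONS",
        f"Contact: <sip:probe@{local_host}:{local_port}>",
        "Accept: application/sdp",
        "Content-Length: 0",
        "",
        "",  # the two empty entries produce the blank line that ends the headers
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(response):
    """Return (status_code, reason) from the first line of a SIP response."""
    first_line = response.split(b"\r\n", 1)[0].decode("ascii", errors="replace")
    version, code, reason = first_line.split(" ", 2)
    if version != "SIP/2.0":
        raise ValueError(f"not a SIP response: {first_line!r}")
    return int(code), reason


def probe(target_host, target_port=5060, timeout=2.0):
    """Send one OPTIONS over UDP; return (code, reason), or None if nothing came back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.connect((target_host, target_port))  # fixes the local address for the Via header
        local_host, local_port = sock.getsockname()
        sock.send(build_options(target_host, target_port, local_host, local_port))
        try:
            return parse_status_line(sock.recv(4096))
        except OSError:  # timeout or ICMP port unreachable
            return None
```

Any SIP response, including 401 or 404, shows that the peer is reachable and speaking SIP; `None` points toward a firewall, a wrong peer IP, or a stopped service.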

2. Media & quality

  1. One-way / no audio:
    • Inspect NAT/port mappings between server and peers.
    • Ensure rtp_start_port / rtp_end_port ranges are open in config.toml and firewalls.
    • Reproduce via Diagnostics → Web Dialer or a handset, then capture RTP with tcpdump/sngrep to verify return packets.
  2. Noise or jitter:
    • Switch to lower bitrate codecs.
    • Enable the denoise models from fixtures/ or turn on echo cancellation at the endpoint.
    • Check QoS policies and link bandwidth.
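
Once RTP has been captured, the one-way audio check reduces to counting packets per direction. The sketch below assumes the capture has already been reduced to (source IP, destination IP) pairs, for example from tcpdump output; that parsing step is omitted, and the function names are hypothetical.

```python
from collections import Counter


def rtp_direction_counts(packets, server_ip):
    """Count packets sent by and received at server_ip.

    packets: iterable of (src_ip, dst_ip) pairs taken from a capture.
    """
    counts = Counter()
    for src, dst in packets:
        if src == server_ip:
            counts["outbound"] += 1
        elif dst == server_ip:
            counts["inbound"] += 1
    return counts


def diagnose(counts):
    """Turn direction counts into a short verdict string."""
    sent, received = counts["outbound"], counts["inbound"]
    if sent == 0 and received == 0:
        return "no RTP seen"
    if received == 0:
        return "one-way: server sends but receives nothing"
    if sent == 0:
        return "one-way: server receives but sends nothing"
    return "two-way"


# Example: the peer never answers, so only outbound packets appear.
capture = [("10.0.0.5", "203.0.113.7")] * 3
print(diagnose(rtp_direction_counts(capture, "10.0.0.5")))
```

A "server sends but receives nothing" verdict usually points back to the NAT mappings and the rtp_start_port / rtp_end_port firewall rules listed above.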

3. Routing & billing

  • Routing changes have no effect: confirm that Reload ran, and validate config/routes syntax via tomlcheck or CI.
  • Wrong route selected: Diagnostics → Routing Evaluate shows the hit rule/trunk; adjust priority or match filters accordingly.
  • Billing mismatch: export CDRs from Call Records, compare billing templates, and look for no_rate alerts caused by missing prefixes.
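
To find which destinations are likely behind no_rate alerts, the exported CDRs can be checked against the rate prefixes with a longest-prefix match. This is a sketch under assumptions: the `callee` field name and the shape of the prefix list are hypothetical, and the real billing template format may differ.

```python
def longest_prefix(number, prefixes):
    """Return the longest rate prefix that matches number, or None."""
    best = None
    for prefix in prefixes:
        if number.startswith(prefix) and (best is None or len(prefix) > len(best)):
            best = prefix
    return best


def unrated_destinations(cdr_rows, rate_prefixes):
    """Return the sorted called numbers that match no rate prefix.

    cdr_rows: iterable of dicts, e.g. from csv.DictReader over a CDR export.
    The 'callee' key is an assumed column name.
    """
    missing = set()
    for row in cdr_rows:
        number = row["callee"].lstrip("+")
        if longest_prefix(number, rate_prefixes) is None:
            missing.add(number)
    return sorted(missing)


rows = [{"callee": "+442071234567"}, {"callee": "8613800000000"}]
print(unrated_destinations(rows, ["44", "4420", "1"]))
```

Each number in the result needs either a new prefix in the rate table or a routing rule that blocks the destination.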

4. Console / API

  • Cannot log in: inspect the [console] config and DB connection; make sure the browser clock is accurate, since clock skew can make valid tokens appear expired.
  • API returns 500: read logs/console (or stdout) stack traces; most errors stem from missing config or unfinished DB migrations.
  • Diagnostics blank page: typically the SIP server is down or the user lacks permission; confirm that /health reports ok and grant diagnostics access.
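
The /health check can be scripted for use in monitoring or before opening the console. The sketch below assumes that a healthy server answers HTTP 200 with a body containing "ok"; the exact response format is not specified here, so the matching is deliberately loose, and the function names and base URL are hypothetical.

```python
import urllib.error
import urllib.request


def looks_healthy(status_code, body):
    """True when the response is HTTP 200 and the body mentions 'ok'.

    This is a loose substring check; tighten it once the real body format is known.
    """
    return status_code == 200 and "ok" in body.lower()


def check_health(base_url, timeout=3.0):
    """Fetch <base_url>/health and return (healthy, detail)."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read(4096).decode("utf-8", errors="replace")
            return looks_healthy(resp.status, body), body.strip()
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        return False, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # DNS failure, refused connection, or timeout.
        return False, f"unreachable: {exc}"
```

Calling `check_health("http://localhost:8080")` (the address is a placeholder) separates "server down" from "server up but the user lacks permission", which are the two causes named above.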

5. Performance & stability

  • High CPU: use top and thread backtraces (bt) to locate hot threads, lower concurrency or scale out, and check for excessive transcoding.
  • Growing memory: see whether large recording buffers are enabled; verify the cleanup plan in callrecord/storage.rs.
  • Crashes / restarts: consult journalctl or container logs; configuration syntax errors and unreachable dependencies (DB/Redis) are common causes.
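
Unreachable dependencies can be ruled in or out before reading through logs. The following sketch only tests TCP reachability, which does not prove that authentication or the database schema is correct; the function names, hosts, and ports are placeholders to be replaced with the values from your own configuration.

```python
import socket


def tcp_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or name resolution failed
        return False


def check_dependencies(deps, timeout=2.0):
    """Map each dependency name to its reachability.

    deps: dict of name -> (host, port).
    """
    return {
        name: tcp_reachable(host, port, timeout)
        for name, (host, port) in deps.items()
    }


# Placeholder endpoints; substitute the DB and Redis addresses actually in use.
print(check_dependencies({"db": ("127.0.0.1", 5432), "redis": ("127.0.0.1", 6379)}))
```

A `False` entry that coincides with the restart timestamps in journalctl is a strong hint that the crash loop is caused by the dependency rather than by the configuration.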

6. Incident workflow

  1. Gather evidence: screenshots from Diagnostics, log exports, precise timestamps.
  2. Roll back quickly: if caused by configuration, revert config/ in Git and reload.
  3. Validate fix: place test calls and confirm CDRs/alerts return to normal.
  4. Document: record root cause, impact, and remediation steps in the internal wiki for future reference.