Use this page as a triage hub, not as one giant checklist.
If you already know the failure area, jump straight to the matching guide:
- Node does not connect or finish enrollment
- Gateway looks unhealthy or keeps dropping heartbeats
- Policy changed but traffic path did not
- DNS records changed but lookups stay stale
- Authentication or browser session keeps failing
- Build or CI hits too many open files
- GRE-specific runtime problems
Start with the outcome you see
Node does not connect or finish enrollment
Choose Troubleshooting node enrollment when:
- the node never appears online,
- onboarding completes but the runtime never becomes healthy,
- generated config or key material might be stale.
Gateway looks unhealthy or keeps dropping heartbeats
Choose Troubleshooting gateway health when:
- the gateway looks offline,
- heartbeat timestamps stop moving,
- runtime prerequisites on the host may be broken.
Policy changed but traffic path did not
Choose Troubleshooting policy and routing when:
- access policy changed but traffic still follows the old path,
- runtime status says `pending` or `error`,
- route ownership or device attachments might be stale.
DNS records changed but lookups stay stale
Choose Troubleshooting DNS runtime when:
- DNS changes are visible in the UI but not in lookups,
- CoreDNS runtime files might be missing or old,
- the gateway DNS runtime reports an error.
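To confirm that a lookup really is stale before opening the DNS guide, you can compare what a client resolves with what the UI shows. The sketch below assumes IPv4 records and uses Python's standard `socket.getaddrinfo`, which asks the system resolver. That resolver may consult local hosts files or OS caches, so treat the result as what this host sees rather than as a direct query against the gateway DNS runtime. The helper names and the example hostname are hypothetical.

```python
import socket


def resolve_ipv4(name, resolver=socket.getaddrinfo):
    """Return the set of IPv4 addresses the resolver reports for name."""
    infos = resolver(name, None, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is (address, port).
    return {info[4][0] for info in infos}


def lookup_matches(name, expected, resolver=socket.getaddrinfo):
    """Compare resolved addresses with the addresses shown in the UI."""
    try:
        actual = resolve_ipv4(name, resolver)
    except socket.gaierror:
        actual = set()  # name did not resolve at all
    expected = set(expected)
    return {
        "name": name,
        "expected": expected,
        "actual": actual,
        "stale": actual != expected,
    }


if __name__ == "__main__":
    # Hypothetical record; replace with the name and address from the UI.
    print(lookup_matches("app.internal.example", ["10.0.0.5"]))
```

The `resolver` parameter exists only so the comparison can be exercised without network access; in normal use, leave it at its default.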
Authentication or browser session keeps failing
Choose Troubleshooting auth and session failures when:
- login works for some users but not others,
- cookies or reverse-proxy headers look wrong,
- browser auth keeps looping or expiring unexpectedly.
Build or CI hits too many open files
Choose Troubleshooting open file limits when:
- `next build`, Playwright, Docker builds, or CI fail with `ENFILE` or `EMFILE`,
- the issue is clearly host-level file descriptor exhaustion instead of Nanami policy/runtime state.
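A quick way to tell whether descriptor limits are plausibly the cause is to read the limit that build processes inherit. The sketch below uses Python's standard `resource` module (Unix only) and reports on the process that runs it. Child processes inherit these limits, so run it from the same shell or CI step that launches the failing build. The function name `fd_report` and the 0.8 warning ratio are assumptions made for this example.

```python
import os
import resource


def fd_report(warn_ratio=0.8):
    """Summarize open-file limits and usage for the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # /proc/self/fd exists on Linux; /dev/fd is the usual macOS/BSD equivalent.
    fd_dir = "/proc/self/fd" if os.path.isdir("/proc/self/fd") else "/dev/fd"
    try:
        # Listing the directory briefly opens one extra descriptor,
        # so the count can be high by one.
        open_fds = len(os.listdir(fd_dir))
    except OSError:
        open_fds = None  # descriptor directory not available on this host
    near_limit = (
        open_fds is not None
        and soft != resource.RLIM_INFINITY
        and open_fds >= warn_ratio * soft
    )
    return {
        "soft_limit": soft,
        "hard_limit": hard,
        "open_fds": open_fds,
        "near_limit": near_limit,
    }


if __name__ == "__main__":
    print(fd_report())
```

A low soft limit points toward the open file limits guide. A generous limit that is nowhere near exhausted suggests the failure lies elsewhere.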
Fast public triage order
- Pick the smallest guide that matches the visible outcome.
- Run only the checks listed for that guide.
- If the guide points back to deployment or security posture, fix that first.
- Escalate only after you capture the guide-specific evidence.
Capture the management-service logs, the affected gateway logs, screenshots of the relevant Nanami page, and the exact guide you already followed before escalating.
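The evidence can be gathered into a single archive so nothing is lost between capture and escalation. The following is a minimal sketch that assumes you have already exported the logs and screenshots to local files. Every path in the example is hypothetical, and `bundle_evidence` is an illustrative helper, not a Nanami tool.

```python
import io
import tarfile
from pathlib import Path


def bundle_evidence(paths, guide, out_path):
    """Pack existing evidence files plus a short note into a .tar.gz archive.

    Returns the requested paths that were missing, so gaps are visible
    before you escalate.
    """
    missing = []
    with tarfile.open(out_path, "w:gz") as tar:
        for path in map(Path, paths):
            if path.is_file():
                # Stored under the bare file name; same-named files would collide.
                tar.add(str(path), arcname=path.name)
            else:
                missing.append(str(path))
        # Record which guide was followed alongside the evidence.
        note = f"guide followed: {guide}\n".encode("utf-8")
        info = tarfile.TarInfo(name="GUIDE.txt")
        info.size = len(note)
        tar.addfile(info, io.BytesIO(note))
    return missing


if __name__ == "__main__":
    # Hypothetical file names; substitute the files you actually captured.
    gaps = bundle_evidence(
        ["management-service.log", "gateway.log", "nanami-page.png"],
        "Troubleshooting gateway health",
        "evidence.tar.gz",
    )
    print("missing:", gaps)
```

Review the logs for secrets or key material before sharing the archive.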