Troubleshooting

Common issues and their solutions for both systems.

Boot Issues

Laptop: Secure Boot Failure

Symptom: System won't boot after NixOS rebuild; UEFI reports Secure Boot violation.

Cause: Lanzaboote failed to sign a boot component, or Secure Boot keys were not enrolled.

Fix:

bash
# 1. Disable Secure Boot in BIOS temporarily
# 2. Boot into NixOS
# 3. Check signing status
sbctl verify

# 4. If unsigned entries exist, rebuild
sudo nixos-rebuild switch --flake ~/nixos-config/laptop#laptop

# 5. Re-enable Secure Boot in BIOS

If keys need re-enrollment:

bash
sudo sbctl enroll-keys --microsoft

Server: Stuck at LUKS Prompt

Symptom: Server is unreachable after reboot; it's waiting for the LUKS passphrase in the initrd.

Fix: Connect via SSH to the initrd and unlock:

bash
ssh -p 22 root@192.168.1.20
# In the initrd shell:
cryptsetup-askpass
# Enter LUKS passphrase

Network Requirements

The initrd SSH server requires the r8169 network driver (loaded via boot.initrd.availableKernelModules) and obtains its address over DHCP via udhcpc. Ensure the server's Ethernet port is connected and the router provides DHCP.

Server: Boot Loop (Panic)

Symptom: Server reboots repeatedly; never becomes reachable.

Cause: A critical boot service failed, triggering the kernel parameters boot.panic_on_fail and panic=1 (auto-reboot after 1 second).

Fix:

  1. Connect a monitor and keyboard (or serial console)
  2. At the systemd-boot menu, select a previous generation
  3. Once booted, check logs: journalctl -b -1 -p err
  4. Fix the issue and rebuild

If no previous generations work, boot from a live USB:

bash
# Mount the BTRFS volume
cryptsetup open /dev/nvme0n1p2 crypted
mount /dev/mapper/crypted /mnt -o subvol=nix

# Chroot and debug
nixos-enter --root /mnt

Impermanence Issues

Missing Files After Reboot

Symptom: A file or directory you created is gone after reboot.

Cause: The root filesystem is wiped on every boot. Only paths under /persist survive.

Fix: Add the path to the impermanence configuration in server/modules/impermanence.nix:

nix
environment.persistence."/persist" = {
  directories = [
    "/your/directory"
  ];
  files = [
    "/your/file"
  ];
};

Then rebuild:

bash
sudo nixos-rebuild switch --flake /home/nixos/nixos-homelab#homelab

Container Data Missing

Symptom: A container lost its data after reboot.

Cause: Container data is stored under /var/lib/nixos-containers/<name>/. This path must be persisted.

Verify: Check that the container's data directory exists under /persist:

bash
ls -la /persist/var/lib/nixos-containers/

The impermanence module persists /var/lib/nixos-containers as a directory, so all container state should survive. If a specific container's data is missing, check if it stores data outside this path.

Container Networking

Container Can't Reach the Internet

Symptom: A container can't download packages or connect to external services.

Check NAT configuration:

bash
# Verify NAT is active
sudo iptables -t nat -L -n

# Check if the container's veth interface exists
ip link show | grep ve-

# Check container's IP
machinectl shell <container> /run/current-system/sw/bin/ip addr

Common causes:

  1. NAT not configured for the interface: Ensure networking.nat.internalInterfaces = [ "ve-+" ] is set
  2. DNS not resolving: The container should use the host as its DNS server or have its own DNS config
  3. Firewall blocking: Check iptables -L -n for relevant FORWARD rules
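
To tell the first two causes apart, one option is to test raw IP connectivity before name resolution. The sketch below is only an illustration: check_container_net is a hypothetical helper, its argument is any command prefix that runs a command inside the container, the probe targets 1.1.1.1 and example.org are placeholders, and ping and getent are assumed to be available inside the container.

```shell
# check_container_net is a hypothetical helper, not part of NixOS.
# $1: command prefix that runs its arguments inside the container.
check_container_net() {
  local run="$1"
  # Step 1: reach a public IP directly, so DNS is not involved.
  if ! $run ping -c 1 -W 2 1.1.1.1 >/dev/null 2>&1; then
    echo "no-ip-connectivity"   # suspect NAT or FORWARD rules
    return 1
  fi
  # Step 2: resolve a name through the container's configured resolver.
  if ! $run getent hosts example.org >/dev/null 2>&1; then
    echo "dns-failure"          # suspect the container's DNS settings
    return 2
  fi
  echo "ok"
}

# Example (the container name "traefik" is an assumption):
#   check_container_net "sudo nixos-container run traefik --"
```

A result of no-ip-connectivity points at causes 1 and 3, while dns-failure points at cause 2.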

Container Can't Reach Host Services

Symptom: A container can't connect to services on the host (e.g., AdGuard on port 53).

Fix: Containers use their hostAddress to reach the host. Verify the addressing:

bash
# Inside the container
ping <hostAddress>  # e.g., 192.168.100.10

If pinging fails, check that the container's network config matches:

nix
containers.<name> = {
  hostAddress = "192.168.100.10";
  localAddress = "192.168.100.11";
};

Service Unreachable via Domain Name

Symptom: https://service.nemnix.site doesn't work from the LAN.

Causes:

  1. DNS not pointing to server: Ensure your router uses 192.168.1.20 as DNS, or configure AdGuard DNS rewrites
  2. AdGuard DNS rewrite missing: Check AdGuard Home UI for a rewrite rule mapping service.nemnix.site to 192.168.1.20
  3. Traefik route missing: Check Traefik logs: journalctl -M traefik -u traefik
  4. TLS certificate not issued: Check Traefik ACME logs for DNS-01 challenge failures
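
The first two causes can be checked from any LAN client by asking the server directly what it answers for the name. This is a minimal sketch, assuming dig is installed on the client; resolve_with and check_dns_rewrite are hypothetical helper names, and the addresses are the ones used in this guide.

```shell
# Hypothetical helpers; assumes dig is available on the client.
# resolve_with NAME SERVER: print the first A record SERVER returns for NAME.
resolve_with() {
  dig +short A "$1" "@$2" | head -n 1
}

# check_dns_rewrite NAME SERVER EXPECTED_IP
check_dns_rewrite() {
  local name="$1" server="$2" expected="$3"
  local actual
  actual="$(resolve_with "$name" "$server")"
  if [ "$actual" = "$expected" ]; then
    echo "ok: $name -> $actual"
  else
    echo "mismatch: $name -> ${actual:-<no answer>} (expected $expected)"
    return 1
  fi
}

# Example:
#   check_dns_rewrite service.nemnix.site 192.168.1.20 192.168.1.20
```

If the name resolves correctly but the page still fails, the problem is more likely in causes 3 or 4. Running curl -v against the URL shows whether the TLS handshake or the HTTP routing is the step that breaks.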

Secrets Issues

agenix Decryption Failure

Symptom: nixos-rebuild switch fails with an agenix error saying that it cannot decrypt secrets.

Cause: The SSH host key doesn't match the public keys in secrets.nix.

Fix:

bash
# Print the host's public key (secrets.nix lists full public keys, not fingerprints)
cat /persist/etc/ssh/ssh_host_ed25519_key.pub

# Compare with the key in secrets.nix
cat server/secrets/secrets.nix
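
The comparison can also be scripted. This is a minimal sketch, assuming secrets.nix contains the full public key string; host_key_in_secrets is a hypothetical helper name.

```shell
# host_key_in_secrets PUBKEY_FILE SECRETS_FILE (hypothetical helper)
# Succeeds if the base64 body of the public key appears in the secrets file.
host_key_in_secrets() {
  local pubkey_file="$1" secrets_file="$2"
  local key_body
  # A .pub file has the form "<type> <base64-body> [comment]"; take field 2.
  key_body="$(awk '{print $2}' "$pubkey_file")"
  [ -n "$key_body" ] && grep -qF -- "$key_body" "$secrets_file"
}

# Example:
#   if host_key_in_secrets /persist/etc/ssh/ssh_host_ed25519_key.pub \
#        server/secrets/secrets.nix; then
#     echo "host key is listed"
#   else
#     echo "host key is NOT listed"
#   fi
```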

If they don't match, you need to either:

  1. Restore the correct host key from backup
  2. Or re-key all secrets with the new key:

bash
# Add the new public key to secrets.nix
# Then re-encrypt all secrets
cd server/secrets
agenix -r

Secret File Not Found at Runtime

Symptom: A service fails to start, logging that its secret file doesn't exist.

Check:

bash
# List decrypted secrets
ls -la /run/agenix/

# Check the service's expected path
systemctl show <service> | grep -i secret

Common causes:

  1. Secret not defined in the module: Ensure age.secrets.<name> is defined
  2. Identity path wrong: Verify age.identityPaths points to /persist/etc/ssh/ssh_host_ed25519_key
  3. File permission issue: Check that the secret's owner matches the service's User

Backup Issues

Restic Backup Failing

bash
# Check the timer and last run
systemctl status restic-backups-backup.timer
journalctl -u restic-backups-backup.service --since today

# Test manually
sudo restic -r /backup --password-file /run/agenix/restic_password snapshots

Common causes:

  1. Backup disk full: Check df -h /backup
  2. Password file missing: Check ls -la /run/agenix/restic_password
  3. Repository corruption: Run restic -r /backup --password-file /run/agenix/restic_password check
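
The first two causes can be ruled out in one step before digging into the repository itself. This is a rough sketch: disk_usage_percent and restic_preflight are hypothetical helpers, and the 95% threshold is an arbitrary assumption, not a value from the backup configuration.

```shell
# Hypothetical helpers; the default threshold of 95% is an assumption.
# disk_usage_percent PATH: print the used percentage of PATH's filesystem.
disk_usage_percent() {
  df -P "$1" 2>/dev/null | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# restic_preflight REPO_PATH PASSWORD_FILE [MAX_USED_PERCENT]
restic_preflight() {
  local repo="$1" password_file="$2" max_used="${3:-95}"
  local used
  if [ ! -r "$password_file" ]; then
    echo "password file missing or unreadable: $password_file"
    return 1
  fi
  used="$(disk_usage_percent "$repo")"
  if [ -z "$used" ]; then
    echo "cannot read disk usage for: $repo"
    return 1
  fi
  if [ "$used" -ge "$max_used" ]; then
    echo "repository disk is ${used}% full"
    return 1
  fi
  echo "preflight ok (${used}% used)"
}

# Example:
#   sudo bash -c '. ./preflight.sh; restic_preflight /backup /run/agenix/restic_password'
```

The example assumes the functions were saved to a file named preflight.sh; root is needed because the agenix secret is normally not world-readable.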

Restic Repository Repair

bash
# Check for errors
restic -r /backup --password-file /run/agenix/restic_password check

# If pack files may be damaged, verify their contents too (slow: reads the whole repository)
restic -r /backup --password-file /run/agenix/restic_password check --read-data

# Repair index
restic -r /backup --password-file /run/agenix/restic_password repair index

# Repair snapshots
restic -r /backup --password-file /run/agenix/restic_password repair snapshots

Auto-Upgrade Issues

Upgrade Failed

bash
# Check the upgrade log
journalctl -u nixos-upgrade.service -b

# Common issues:
# - Network failure during flake input fetch
# - Build failure in updated packages
# - Git commit failure (dirty working tree)

If the upgrade broke the system:

bash
# Roll back to the previous generation
sudo nixos-rebuild switch --rollback

# Or select a previous generation at boot
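
To see which generations exist before rolling back or picking one at boot, the system profile links can be listed. This is a small sketch, assuming the default profile directory /nix/var/nix/profiles; list_generation_numbers is a hypothetical helper name.

```shell
# list_generation_numbers [PROFILES_DIR] (hypothetical helper)
# Prints the system generation numbers in ascending order, one per line.
list_generation_numbers() {
  local profiles_dir="${1:-/nix/var/nix/profiles}"
  # Generations are symlinks named system-<N>-link; keep only the number.
  ls "$profiles_dir" \
    | sed -n 's/^system-\([0-9][0-9]*\)-link$/\1/p' \
    | sort -n
}

# Example: show the two most recent generations
#   list_generation_numbers | tail -n 2
```

The same information, with build dates, is available from sudo nix-env --list-generations --profile /nix/var/nix/profiles/system.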

Git Amend Failed

The ExecStartPost git amend ends with || true, so it cannot fail the upgrade. If commits aren't appearing, inspect the repository:

bash
cd /home/nixos/nixos-homelab
git log --oneline -5
git status

Common cause: the working tree has uncommitted changes that prevent the amend.
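
One way to confirm that cause is to ask Git for pending changes to tracked files. This is a minimal sketch; tree_has_changes is a hypothetical helper name, and untracked files are ignored on the assumption that only staged or modified tracked files matter here.

```shell
# tree_has_changes REPO_DIR (hypothetical helper)
# Succeeds if tracked files in REPO_DIR have staged or unstaged changes.
tree_has_changes() {
  local repo="$1"
  # --porcelain output is stable for scripts; empty output means a clean tree.
  [ -n "$(git -C "$repo" status --porcelain --untracked-files=no)" ]
}

# Example:
#   if tree_has_changes /home/nixos/nixos-homelab; then
#     git -C /home/nixos/nixos-homelab status --short
#   fi
```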

Performance Issues

High Memory Usage

bash
# Check per-process memory
btop  # or: ps aux --sort=-%mem | head

# Check container memory (declarative NixOS containers run as container@<name>.service)
for c in $(machinectl list --no-legend | awk '{print $1}'); do
  echo "$c: $(systemctl show "container@$c.service" --property=MemoryCurrent)"
done

Disk I/O Issues

bash
# Check I/O scheduler per device
for dev in /sys/block/*/queue/scheduler; do
  echo "$dev: $(cat $dev)"
done

# Check if fstrim has run recently
journalctl -u fstrim.service --since "1 week ago"

Getting Help

bash
# System overview
btop

# Security audit
sudo lynis audit system

# Check all failed services
systemctl --failed

# Full system journal (errors only)
journalctl -p err -b