Systemd Sandboxing

The server applies systemd-level hardening to both host services and system daemons. This page covers Cloudflared tunnel sandboxing, journald hardening, emergency mode, disabled services, and shutdown timeouts.

Source: server/containers/cloudflared.nix, server/modules/systemd.nix

Cloudflared Sandboxing

Cloudflared runs as a systemd service on the host (not in a container), so its sandboxing is critical. It has one of the most restrictive systemd configurations in the setup.

Source: server/containers/cloudflared.nix

Full Configuration

nix
systemd.services.cloudflared-tunnel = {
  after = [ "network-online.target" ];
  wants = [ "network-online.target" ];
  wantedBy = [ "multi-user.target" ];

  serviceConfig = {
    ExecStart = "${pkgs.cloudflared}/bin/cloudflared tunnel run --token-file ${config.age.secrets.cloudflare_tunnel_token.path}";
    Restart = "on-failure";
    RestartSec = 5;

    User = "cloudflared";
    Group = "cloudflared";

    # ... (full sandboxing below)
  };
};
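The unit above assumes that a cloudflared system user and an encrypted tunnel token already exist. A minimal sketch of those declarations, assuming the secret is managed with agenix (suggested by the config.age.secrets reference); the .age file path is hypothetical, and the owner and mode values are an assumption about how the token is made readable to the service:

```nix
{ ... }:
{
  # Dedicated unprivileged account referenced by User= and Group= in the unit.
  users.groups.cloudflared = { };
  users.users.cloudflared = {
    isSystemUser = true;
    group = "cloudflared";
  };

  # agenix secret; the file location below is a placeholder, not the real path.
  age.secrets.cloudflare_tunnel_token = {
    file = ./secrets/cloudflare_tunnel_token.age; # hypothetical path
    owner = "cloudflared";
    group = "cloudflared";
    mode = "0400"; # readable by the service user only
  };
}
```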

Sandboxing Breakdown

User and Process Isolation

| Directive | Value | Effect |
| --- | --- | --- |
| User / Group | cloudflared | Runs as a dedicated unprivileged system user |
| PrivateUsers | true | Runs in a user namespace; only root and the service's own user and group are mapped, and every other host UID/GID appears as nobody |
| PrivatePIDs | true | Separate PID namespace; cannot see or signal other processes |
| UMask | 0077 | Files created by the service are owner-only |
| KeyringMode | private | Gets its own session keyring, not linked to the user keyring, so keys are not shared with other services |
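Expressed in Nix, the process-isolation directives from this table might look like the sketch below. It assumes they sit in the same serviceConfig as the unit shown earlier; the exact layout of the real file is not shown on this page.

```nix
{ ... }:
{
  systemd.services.cloudflared-tunnel.serviceConfig = {
    PrivateUsers = true;
    PrivatePIDs = true; # only honoured by systemd releases that know this directive
    UMask = "0077"; # written as a string so the leading zeros are preserved
    KeyringMode = "private";
  };
}
```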

Filesystem Isolation

| Directive | Value | Effect |
| --- | --- | --- |
| ProtectSystem | strict | Entire file system hierarchy is read-only, apart from /dev, /proc, /sys and any explicit ReadWritePaths |
| ProtectHome | true | /home, /root, /run/user are inaccessible |
| PrivateTmp | true | Separate /tmp and /var/tmp; cleaned on service stop |
| PrivateMounts | true | Runs in its own mount namespace; mounts the service creates or changes do not propagate to the host |
| PrivateDevices | true | Only pseudo-devices (/dev/null, /dev/zero, etc.) are available |
| ProcSubset | pid | Only PID-related entries visible in /proc |
| ProtectProc | invisible | Processes owned by other users are hidden in /proc |
| ReadOnlyPaths | [token path] | The tunnel token path is read-only for the service; access to other paths is governed by the other directives in this table |
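The filesystem directives could be written along these lines. This is a sketch rather than the actual file contents: it assumes the token path in ReadOnlyPaths is the same agenix path used in ExecStart.

```nix
{ config, ... }:
{
  systemd.services.cloudflared-tunnel.serviceConfig = {
    ProtectSystem = "strict";
    ProtectHome = true;
    PrivateTmp = true;
    PrivateMounts = true;
    PrivateDevices = true;
    ProcSubset = "pid";
    ProtectProc = "invisible";
    # Assumed to be the same secret path that ExecStart passes to --token-file.
    ReadOnlyPaths = [ config.age.secrets.cloudflare_tunnel_token.path ];
  };
}
```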

Kernel Protection

| Directive | Value | Effect |
| --- | --- | --- |
| ProtectKernelTunables | true | /proc/sys, /sys, etc. are read-only |
| ProtectKernelModules | true | Cannot load/unload kernel modules |
| ProtectKernelLogs | true | Cannot read kernel ring buffer (dmesg) |
| ProtectControlGroups | true | Cgroup hierarchy is read-only |
| ProtectHostname | true | Cannot change the system hostname |
| ProtectClock | true | Cannot change the system clock |
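As a Nix fragment, the kernel-protection rows above would plausibly reduce to a set of booleans. The sketch assumes the same serviceConfig attribute path as the rest of the unit:

```nix
{ ... }:
{
  systemd.services.cloudflared-tunnel.serviceConfig = {
    ProtectKernelTunables = true;
    ProtectKernelModules = true;
    ProtectKernelLogs = true;
    ProtectControlGroups = true;
    ProtectHostname = true;
    ProtectClock = true;
  };
}
```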

Capability Restrictions

nix
AmbientCapabilities = null;
CapabilityBoundingSet = null;

All Linux capabilities are dropped, so the service has no elevated privileges. Cloudflared only needs to make outbound connections to Cloudflare's edge (QUIC over UDP by default, HTTP/2 over TCP as a fallback), and neither requires any capabilities.

Network Restrictions

nix
RestrictAddressFamilies = [ "AF_INET" "AF_INET6" ];

Only IPv4 and IPv6 sockets are allowed. Unix domain sockets (AF_UNIX), Netlink (AF_NETLINK), and all other socket families are blocked.

System Call Filtering

nix
SystemCallArchitectures = "native";
SystemCallFilter = [
  "~@clock"
  "~@cpu-emulation"
  "~@debug"
  "~@module"
  "~@mount"
  "~@obsolete"
  "~@privileged"
  "~@raw-io"
  "~@reboot"
  "~@resources"
  "~@swap"
];

| Blocked Group | Syscalls Blocked |
| --- | --- |
| @clock | clock_settime, settimeofday, etc. |
| @cpu-emulation | modify_ldt, subpage_prot, etc. |
| @debug | ptrace, process_vm_readv, etc. |
| @module | init_module, finit_module, delete_module |
| @mount | mount, umount2, pivot_root, etc. |
| @obsolete | Legacy syscalls (sysfs, uselib, etc.) |
| @privileged | chroot, setuid, capset, etc. |
| @raw-io | ioperm, iopl, raw device access |
| @reboot | reboot, kexec_load |
| @resources | setrlimit, sched_setscheduler, etc. |
| @swap | swapon, swapoff |

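SystemCallArchitectures = "native" restricts the service to the system's native syscall ABI, so 32-bit compatibility syscalls are blocked on a 64-bit system. This also prevents a secondary ABI from being used to circumvent the syscall filter.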

Other Restrictions

| Directive | Value | Effect |
| --- | --- | --- |
| NoNewPrivileges | true | Cannot gain privileges via SUID/SGID binaries |
| LockPersonality | true | Cannot change execution domain (blocks personality-based attacks) |
| MemoryDenyWriteExecute | true | Cannot create memory mappings that are both writable and executable (blocks many code-injection techniques) |
| RestrictRealtime | true | Cannot acquire realtime scheduling |
| RestrictSUIDSGID | true | Cannot create SUID/SGID files |
| RestrictNamespaces | true | Cannot create new namespaces (blocks container escape techniques) |
| RemoveIPC | true | System V and POSIX IPC objects owned by the service's user and group are removed when the service stops |
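The remaining restrictions map directly onto boolean serviceConfig entries. A minimal sketch, again assuming they belong to the same cloudflared-tunnel unit:

```nix
{ ... }:
{
  systemd.services.cloudflared-tunnel.serviceConfig = {
    NoNewPrivileges = true;
    LockPersonality = true;
    MemoryDenyWriteExecute = true;
    RestrictRealtime = true;
    RestrictSUIDSGID = true;
    RestrictNamespaces = true;
    RemoveIPC = true;
  };
}
```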

Journald Hardening

Source: server/modules/systemd.nix

nix
systemd.services.systemd-journald.serviceConfig = {
  UMask = "0077"; # quoted: Nix has no octal literals, so a bare 0077 is the integer 77
  PrivateNetwork = true;
  ProtectHostname = true;
  ProtectKernelModules = true;
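};

| Directive | Value | Effect |
| --- | --- | --- |
| UMask | 0077 | Files created by journald default to owner-only permissions |
| PrivateNetwork | true | Runs in a network namespace with only a loopback interface, so journald cannot send logs off-host |
| ProtectHostname | true | Cannot change hostname |
| ProtectKernelModules | true | Cannot load kernel modules |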

Emergency Mode

nix
boot.initrd.systemd.suppressedUnits = [ "emergency.service" "emergency.target" ];
systemd.enableEmergencyMode = false;

Emergency mode is completely disabled. On a headless server with no physical access, an emergency shell is an attack vector, not a recovery tool. If the system fails to boot, it should reboot (via panic=1) and attempt to recover automatically.
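The panic=1 setting mentioned above is a kernel command-line parameter. Where it is set is not shown on this page; a minimal sketch, assuming it is passed through boot.kernelParams:

```nix
{ ... }:
{
  # Reboot one second after a kernel panic instead of halting.
  boot.kernelParams = [ "panic=1" ];
}
```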

Disabled Services

Unnecessary services are explicitly disabled to reduce attack surface:

nix
systemd.services = {
  pre-sleep.enable = false;
  prepare-kexec.enable = false;
  systemd-rfkill.enable = false;
  systemd-hibernate-clear.enable = false;
  systemd-networkd-wait-online.enable = false;
};

| Service | Reason for Disabling |
| --- | --- |
| pre-sleep | Server never sleeps |
| prepare-kexec | kexec is disabled via sysctl (kexec_load_disabled = 1) |
| systemd-rfkill | No wireless hardware |
| systemd-hibernate-clear | Hibernation is disabled (hibernate=no boot param) |
| systemd-networkd-wait-online | Static IP; no need to wait for DHCP |
Shutdown Timeouts

nix
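systemd.settings.Manager = {
  # Hardware watchdog timeouts, using the system.conf key names
  RuntimeWatchdogSec = "15s";
  RebootWatchdogSec = "30s";
  KExecWatchdogSec = "1m";
};

These are hardware watchdog timeouts, so they bound how long a hung system can stay hung rather than how long an individual service may take to stop. On hardware that provides a watchdog, the machine is reset if systemd stops responding for 15 seconds during normal operation, a reboot that hangs is forced after 30 seconds, and a hung kexec after 1 minute. Per-service stop timeouts are governed separately by TimeoutStopSec (or DefaultTimeoutStopSec in the Manager section).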
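How long a stopping service may take before systemd escalates to SIGKILL is controlled by DefaultTimeoutStopSec. A minimal sketch, assuming systemd.settings.Manager is available in the NixOS release in use; the 15s value is illustrative and not taken from the repository:

```nix
{ ... }:
{
  # Default time a unit gets to stop before systemd sends SIGKILL.
  systemd.settings.Manager.DefaultTimeoutStopSec = "15s";
}
```

An individual unit can still override this default with its own TimeoutStopSec in serviceConfig.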
