Home Lab
GitHub: https://github.com/ricmatsui/home
Overview
The project provides infrastructure for running services accessed via an ingress. Services can also communicate with each other using shared networks.
Infrastructure
The infrastructure consists of a cluster of nodes. Each node runs WireGuard to create a private overlay network over which the nodes communicate. The nodes are clustered using Docker Swarm, which makes services easy to manage and provides virtual networking that spans the entire cluster. For storage, each node has local block storage attached, and Gluster combines this storage into a distributed file system so that services requiring persistence can mount storage that is resilient and available on every node. Datadog is used for monitoring and log aggregation.
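As an illustration of the monitoring piece, an agent like Datadog's is typically run as a global Swarm service so that every node gets one. This is a minimal sketch only; the image tag, environment variables, and how the API key is supplied are assumptions, not taken from the repo.

```yaml
version: "3.8"

services:
  datadog-agent:
    image: datadog/agent:7
    environment:
      DD_API_KEY: ${DD_API_KEY}                   # hypothetical: supplied via environment or a secret
      DD_LOGS_ENABLED: "true"
      DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL: "true"
    volumes:
      # Lets the agent discover containers and collect their logs and metrics.
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /proc:/host/proc:ro
      - /sys/fs/cgroup:/host/sys/fs/cgroup:ro
    deploy:
      mode: global   # one agent instance per node in the cluster
```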
Ingress
External access to the cluster arrives at a dedicated ingress node over WireGuard. Cloudflare DNS holds the public IP of the router, and the router is configured to forward WireGuard traffic to the ingress node. Traefik also runs on this node with published ports and provides TLS termination, then routes the traffic to the appropriate service in the cluster.
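A minimal sketch of what the Traefik service on the ingress node could look like in a stack file; the node label, entrypoint, ports, and image tag are assumptions for illustration rather than the project's actual configuration.

```yaml
version: "3.8"

services:
  traefik:
    image: traefik:v2.10
    command:
      - --providers.docker.swarmMode=true        # read services and labels from the Swarm
      - --entrypoints.websecure.address=:443
    ports:
      # Published on the ingress node so traffic arriving over WireGuard
      # reaches Traefik for TLS termination.
      - "443:443"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    deploy:
      placement:
        constraints:
          - node.labels.ingress == true          # hypothetical label marking the ingress node
```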
DNS
A dynamic DNS service running in the cluster automatically updates the Cloudflare DNS record with the public IP of the router. Additional DNS records are created for individual services and point to the WireGuard IP of the ingress node.
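Service records of this kind could be set with Ansible's community.general.cloudflare_dns module; the zone, record name, and WireGuard IP below are hypothetical.

```yaml
# Hypothetical example: point a service record at the ingress node's WireGuard IP.
- name: Set DNS record for a service
  community.general.cloudflare_dns:
    zone: example.com                        # hypothetical zone
    record: service                          # creates service.example.com
    type: A
    value: 10.10.0.1                         # hypothetical WireGuard IP of the ingress node
    api_token: "{{ cloudflare_api_token }}"
    state: present
```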
VPN
WireGuard is configured dynamically through a WireGuard UI service running on the ingress node, which is accessed via Traefik like other services. The WireGuard UI updates the configuration file; a systemd path unit detects the change and syncs the configuration into the running WireGuard service.
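A rough sketch of a systemd path/service pair that could watch the configuration file and apply changes to the running interface, expressed here as Ansible tasks; the unit names, interface name, and file paths are assumptions.

```yaml
- name: Install path unit that watches the WireGuard config
  ansible.builtin.copy:
    dest: /etc/systemd/system/wg-sync.path
    content: |
      [Path]
      # Fires wg-sync.service whenever the UI rewrites the config.
      PathChanged=/etc/wireguard/wg0.conf

      [Install]
      WantedBy=multi-user.target

- name: Install service unit that syncs the config into the running interface
  ansible.builtin.copy:
    dest: /etc/systemd/system/wg-sync.service
    content: |
      [Service]
      Type=oneshot
      # Apply peer changes without tearing down the interface.
      ExecStart=/bin/bash -c 'wg syncconf wg0 <(wg-quick strip wg0)'

- name: Enable the path unit
  ansible.builtin.systemd:
    name: wg-sync.path
    enabled: true
    state: started
    daemon_reload: true
```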
Routing
Traefik is given access to the underlying Docker Swarm so that it can automatically reconfigure itself as services in the cluster change. The configuration is provided through Docker labels set on the services. Incoming requests are routed to the corresponding service in the cluster based on port and host header. TLS certificates are managed automatically using Let’s Encrypt with DNS challenges, which Traefik answers by updating Cloudflare DNS. A forward auth service is configured as a middleware for some services, providing delegated authentication with Google OAuth; the auth service sets a client-side cookie to establish a session.
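A sketch of what the routing labels on a service could look like; the service name, host, certificate resolver name, middleware name, and forward auth address are all hypothetical.

```yaml
services:
  whoami:
    image: traefik/whoami                      # hypothetical example service
    networks:
      - traefik
    deploy:
      labels:
        - traefik.enable=true
        # Route by host header to this service.
        - traefik.http.routers.whoami.rule=Host(`whoami.example.com`)
        - traefik.http.routers.whoami.entrypoints=websecure
        # Let's Encrypt via a DNS challenge resolver configured in Traefik.
        - traefik.http.routers.whoami.tls.certresolver=cloudflare
        # Delegated authentication through a forward auth middleware.
        - traefik.http.routers.whoami.middlewares=google-auth
        - traefik.http.middlewares.google-auth.forwardauth.address=http://forward-auth:4181
        # Port Traefik should use to reach the service inside the Swarm.
        - traefik.http.services.whoami.loadbalancer.server.port=80

networks:
  traefik:
    external: true
```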
Storage
Gluster is used to create a resilient, distributed file system. A cluster of servers forms a trusted storage pool that provides a volume to clients. The volume is configured as replica 2 for redundancy, so one storage device can fail without data loss. Each node mounts the volume using the Gluster client, which communicates with the storage pool. Services running on a node can then access the volume through a bind mount, allowing them to run on any node. To protect against data corruption, a quorum majority must be maintained for writes to the volume, and bitrot detection is enabled to automatically heal corrupted files.
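The kind of Gluster setup described here could be expressed with commands like the following, shown as Ansible tasks; the volume name, hosts, and brick paths are hypothetical, and the exact quorum options the project uses are not shown in this document.

```yaml
- name: Create a replica 2 volume across two nodes (hypothetical names and paths)
  # A non-interactive run may need confirmation or a trailing `force`,
  # since Gluster warns about split-brain risk for replica 2 volumes.
  ansible.builtin.command: >
    gluster volume create gv0 replica 2
    node1:/data/glusterfs/brick node2:/data/glusterfs/brick

- name: Start the volume
  ansible.builtin.command: gluster volume start gv0

- name: Require server-side quorum before allowing writes
  ansible.builtin.command: gluster volume set gv0 cluster.server-quorum-type server

- name: Enable bitrot detection so corrupted files can be healed
  ansible.builtin.command: gluster volume bitrot gv0 enable
```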
Service Networking
The infrastructure nodes are clustered using Docker Swarm, which provides overlay networking that spans all nodes in the cluster along with DNS-based service discovery. Services are attached to these overlay networks, enabling them to resolve and connect to related services in the cluster by service name.
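For example, two services that share an overlay network can reach each other by service name alone; the service names, images, and environment variable below are hypothetical.

```yaml
version: "3.8"

services:
  app:
    image: ghcr.io/example/app:latest   # hypothetical image
    networks:
      - backend
    environment:
      # The app resolves the database simply by its service name.
      DATABASE_HOST: db

  db:
    image: postgres:15
    networks:
      - backend

networks:
  backend:
    driver: overlay
```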
Deployment
Ansible is used to apply changes to the infrastructure. It reaches the cluster nodes using the ingress node as a jump host, and it may also make some changes directly, for example to DNS records.
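One common way to express jump-host access in Ansible is through SSH ProxyJump options in the inventory; the host names, addresses, and user below are hypothetical.

```yaml
all:
  children:
    ingress:
      hosts:
        ingress-node:
          ansible_host: ingress.example.com      # hypothetical public name
    cluster:
      hosts:
        node-1:
          ansible_host: 10.10.0.2                # hypothetical WireGuard IPs
        node-2:
          ansible_host: 10.10.0.3
      vars:
        # Reach cluster nodes through the ingress node as a jump host.
        ansible_ssh_common_args: '-o ProxyJump=admin@ingress.example.com'
```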
Stacks
Services are deployed using Docker Swarm stacks. Stacks define services, including their placement constraints, mounts, and networks. The services are then scheduled across the cluster by Docker Swarm.
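A minimal stack sketch showing the pieces a stack typically defines; the service name, image, mount path, and constraint are hypothetical. A file like this would be deployed with something along the lines of `docker stack deploy -c media.yml media`.

```yaml
version: "3.8"

services:
  media:
    image: ghcr.io/example/media:latest          # hypothetical image
    volumes:
      # Bind mount into the Gluster volume mounted on every node,
      # so the service can be scheduled anywhere without losing state.
      - /mnt/gluster/media:/data
    networks:
      - traefik
    deploy:
      placement:
        constraints:
          - node.platform.arch == x86_64         # hypothetical constraint

networks:
  traefik:
    external: true
```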