Hardware Infrastructure
Overview
The HomeLab runs on virtualized infrastructure using Proxmox VE as the hypervisor, with dedicated VMs for Kubernetes nodes.
Cluster Composition
Kubernetes Nodes
| Role | Count | Purpose |
|---|---|---|
| Master | 1 | Control plane, etcd, API server |
| Worker | 4 | Application workloads |
Node Specifications
Each Kubernetes node runs as a VM with:
- OS: Linux (optimized for containers)
- Runtime: containerd
- Kubernetes: K3s v1.34.3
Virtualization Layer
Proxmox VE
- Role: Hypervisor for all VMs
- Features:
    - Live migration
    - Snapshot management
    - Resource pooling
Storage Architecture
Longhorn CSI
Longhorn provides distributed block storage across worker nodes:
```mermaid
graph LR
    subgraph Worker 1
        L1[Longhorn Replica]
    end
    subgraph Worker 2
        L2[Longhorn Replica]
    end
    subgraph Worker 3
        L3[Longhorn Replica]
    end
    subgraph Worker 4
        L4[Longhorn Replica]
    end
    PVC[Persistent Volume Claim]
    PVC --> L1
    PVC --> L2
    PVC --> L3
    PVC --> L4
```
Features:
- Automatic replication (default: 3 replicas)
- Snapshot and backup support
- Dynamic provisioning
- Volume expansion
Storage Classes
| Class | Provisioner | Reclaim Policy |
|---|---|---|
| `longhorn` | Longhorn | Delete |
| `longhorn-retain` | Longhorn | Retain |
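As an illustration of how the two classes differ, the following sketch builds equivalent StorageClass manifests in Python and prints them as JSON. It is not taken from this cluster's configuration: the provisioner string `driver.longhorn.io`, the `numberOfReplicas` parameter, and `allowVolumeExpansion` are assumptions based on how Longhorn is commonly deployed, and the helper name `longhorn_storage_class` is hypothetical.

```python
import json


def longhorn_storage_class(name: str, reclaim_policy: str, replicas: int = 3) -> dict:
    """Build a StorageClass manifest as a plain dict (assumed field values)."""
    if reclaim_policy not in ("Delete", "Retain"):
        raise ValueError(f"unsupported reclaim policy: {reclaim_policy}")
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        # Assumption: the Longhorn CSI driver is registered under this name.
        "provisioner": "driver.longhorn.io",
        "reclaimPolicy": reclaim_policy,
        "allowVolumeExpansion": True,
        # StorageClass parameters are always strings.
        "parameters": {"numberOfReplicas": str(replicas)},
    }


# The two classes from the table; only the reclaim policy differs.
storage_classes = [
    longhorn_storage_class("longhorn", "Delete"),
    longhorn_storage_class("longhorn-retain", "Retain"),
]

manifest = {"apiVersion": "v1", "kind": "List", "items": storage_classes}
print(json.dumps(manifest, indent=2))
```

The JSON output can be piped to `kubectl apply -f -`, which accepts JSON as well as YAML. With `Retain`, the backing volume survives deletion of the PVC and has to be cleaned up by hand.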
Resource Allocation
Typical Workload Distribution
| Namespace | CPU Request | Memory Request |
|---|---|---|
| monitoring | 500m | 2Gi |
| databases | 1000m | 4Gi |
| hub | 200m | 512Mi |
| argocd | 250m | 512Mi |
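To see what these requests add up to, the sketch below sums the table's values. The parsing helpers are hypothetical and handle only the quantity suffixes that appear in the table (`m` for CPU, `Mi` and `Gi` for memory).

```python
# Requests copied from the table above: namespace -> (CPU, memory).
REQUESTS = {
    "monitoring": ("500m", "2Gi"),
    "databases": ("1000m", "4Gi"),
    "hub": ("200m", "512Mi"),
    "argocd": ("250m", "512Mi"),
}


def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as '500m' or '2' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def memory_to_mib(quantity: str) -> int:
    """Convert a memory quantity with an Mi or Gi suffix to MiB."""
    if quantity.endswith("Gi"):
        return int(quantity[:-2]) * 1024
    if quantity.endswith("Mi"):
        return int(quantity[:-2])
    raise ValueError(f"unsupported memory quantity: {quantity}")


total_cpu_m = sum(cpu_to_millicores(cpu) for cpu, _ in REQUESTS.values())
total_mem_mib = sum(memory_to_mib(mem) for _, mem in REQUESTS.values())

print(f"{total_cpu_m}m CPU, {total_mem_mib / 1024:g}Gi memory")
# prints "1950m CPU, 7Gi memory"
```

The listed namespaces therefore request just under 2 CPU cores and 7 GiB of memory in total, which the four workers must be able to schedule alongside system pods.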
High Availability Considerations
Control Plane
- Single master node (acceptable for homelab)
- etcd data backed up regularly
Worker Nodes
- 4 worker nodes provide redundancy
- Pod anti-affinity spreads workloads
- Longhorn replicates data across nodes
Failure Scenarios
| Failure | Impact | Recovery |
|---|---|---|
| 1 worker down | Minimal, pods reschedule | Automatic |
| 2 workers down | Degraded, some PVCs unavailable | Manual intervention |
| Master down | API unavailable; running pods keep running, but no new deployments or rescheduling | Restore from backup |
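The worker rows can be sanity-checked by counting how many replicas of a volume survive a given number of worker failures. The sketch below enumerates every placement and every failure combination. It assumes 3 replicas per volume with at most one replica per worker, and the worker names are hypothetical.

```python
from itertools import combinations

WORKERS = ["worker-1", "worker-2", "worker-3", "worker-4"]  # hypothetical names
REPLICAS = 3  # default replica count stated above


def min_surviving_replicas(workers, replicas, failed_count):
    """Worst-case number of replicas left after `failed_count` workers fail.

    Assumes each replica of a volume sits on a different worker.
    """
    worst = replicas
    for placement in combinations(workers, replicas):
        for failed in combinations(workers, failed_count):
            surviving = len(set(placement) - set(failed))
            worst = min(worst, surviving)
    return worst


worst_case = {n: min_surviving_replicas(WORKERS, REPLICAS, n) for n in range(4)}
for failed_count, surviving in worst_case.items():
    print(f"{failed_count} worker(s) down: at least {surviving} replica(s) left")
```

Under these assumptions a volume keeps at least two replicas with one worker down and at least one with two workers down. A surviving replica is necessary but not sufficient for availability: a volume attached to a failed node still has to be reattached elsewhere, and volumes configured with fewer replicas can lose every copy, which is where the manual intervention in the table comes in.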
Monitoring Hardware Health
Metrics Collected
- CPU utilization
- Memory usage
- Disk I/O
- Network throughput
- Node conditions
Alerts
- Node NotReady
- High CPU/Memory usage
- Disk pressure
- Network unavailable
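The alert list above maps fairly directly onto Kubernetes node conditions plus a usage threshold. In practice this logic would normally live in the monitoring stack's alert rules; the Python below only illustrates the decision logic. The function name `alerts_for_node` and the 90% usage threshold are hypothetical, while the condition types (`Ready`, `DiskPressure`, `NetworkUnavailable`) are the standard ones reported in a node's `status.conditions`.

```python
def alerts_for_node(conditions, cpu_ratio=0.0, memory_ratio=0.0, usage_threshold=0.9):
    """Return the alerts that would fire for one node.

    conditions: list of {"type": ..., "status": ...} dicts, shaped like
    `status.conditions` in `kubectl get node -o json`.
    cpu_ratio / memory_ratio: current usage as a fraction of capacity.
    usage_threshold: hypothetical cut-off for "high" usage.
    """
    status = {c["type"]: c["status"] for c in conditions}
    alerts = []
    # A missing or "Unknown" Ready condition is treated as NotReady.
    if status.get("Ready") != "True":
        alerts.append("Node NotReady")
    if cpu_ratio >= usage_threshold or memory_ratio >= usage_threshold:
        alerts.append("High CPU/Memory usage")
    if status.get("DiskPressure") == "True":
        alerts.append("Disk pressure")
    if status.get("NetworkUnavailable") == "True":
        alerts.append("Network unavailable")
    return alerts


# Example: a node that is Ready but reports disk pressure and high memory use.
sample_conditions = [
    {"type": "Ready", "status": "True"},
    {"type": "DiskPressure", "status": "True"},
    {"type": "NetworkUnavailable", "status": "False"},
]
sample_alerts = alerts_for_node(sample_conditions, cpu_ratio=0.35, memory_ratio=0.93)
print(sample_alerts)
# prints ['High CPU/Memory usage', 'Disk pressure']
```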
Capacity Planning
Current Utilization
Monitor via Grafana dashboards:
- Cluster CPU usage
- Cluster memory usage
- Storage capacity
- Network bandwidth
Scaling Options
- Vertical: Increase VM resources
- Horizontal: Add more worker nodes
- Storage: Expand Longhorn pool
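When weighing the storage and horizontal options, it helps to remember that replication divides raw disk space. The following is a rough estimate only: the 500 GiB per-worker disk size is a made-up figure, the helper name is hypothetical, and the calculation ignores reserved space and over-provisioning settings.

```python
def usable_capacity_gib(disk_gib_per_worker: float, workers: int, replicas: int) -> float:
    """Approximate usable volume capacity: raw pool size divided by replica count."""
    if replicas < 1 or workers < replicas:
        raise ValueError("need at least one replica and one worker per replica")
    raw_gib = disk_gib_per_worker * workers
    return raw_gib / replicas


# Hypothetical disk size; substitute the real per-worker Longhorn disk size.
current = usable_capacity_gib(500, workers=4, replicas=3)
with_fifth_worker = usable_capacity_gib(500, workers=5, replicas=3)

print(f"4 workers: ~{current:.0f} GiB usable")
print(f"5 workers: ~{with_fifth_worker:.0f} GiB usable")
```

With 3 replicas, each added worker contributes only about a third of its disk as usable capacity, so growing the existing disks and adding a node have similar effects on storage per GiB purchased.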