High availability (HA) in virtualization is no longer a luxury reserved for large enterprises. With Proxmox VE (Virtual Environment), you can build a redundant, stable, enterprise-grade hyperconverged infrastructure without prohibitive licensing costs. In this technical guide, we walk through, step by step, how to set up a highly available Proxmox cluster.
1. Prerequisites and Network Architecture
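To build a cluster with real fault tolerance, you need at least three physical nodes. Proxmox does allow two-node clusters, but with only two votes the surviving node cannot hold a majority when its peer fails, so the cluster loses quorum. In that case you need an external vote provider (a QDevice, running on a separate machine) to act as tie-breaker. Quorum is what prevents the split-brain problem, where separated nodes each act on their own and corrupt shared storage.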
- Homogeneous hardware: Similar CPUs, RAM, and network cards across all three nodes.
- Dedicated Corosync network: A switch and port dedicated exclusively to cluster traffic (low latency is crucial).
- Shared storage: Ceph (built-in hyperconverged) or a high-speed external NFS/iSCSI storage array.
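The three-node recommendation follows from majority voting. As a minimal sketch, assuming one vote per node (and one for a QDevice, if present), the arithmetic looks like this; the function names are illustrative and not part of any Proxmox tooling:

```python
def votes_needed(total_votes: int) -> int:
    """Smallest strict majority of the expected votes."""
    return total_votes // 2 + 1


def has_quorum(votes_present: int, total_votes: int) -> bool:
    """True when the surviving partition still holds a strict majority."""
    return votes_present >= votes_needed(total_votes)


# Assumption: every node (and the optional QDevice) contributes exactly one vote.
# Each entry is (votes still present, total expected votes).
scenarios = {
    "2 nodes, 1 fails": (1, 2),
    "2 nodes + QDevice, 1 node fails": (2, 3),
    "3 nodes, 1 fails": (2, 3),
    "3 nodes, 2 fail": (1, 3),
}

for name, (present, total) in scenarios.items():
    print(f"{name}: quorum = {has_quorum(present, total)}")
```

Only the second and third scenarios keep quorum, which is why a plain two-node cluster cannot tolerate a node failure on its own.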
2. Creating the Proxmox Cluster
Access the web interface of the first node (Node 1) and go to Datacenter -> Cluster -> Create Cluster. Name it (e.g., nodosfera-cluster) and select the network interface assigned for internal communication (Corosync).
Once the cluster is created, click Join Information and copy the encoded join string. On Node 2 and Node 3, go to Datacenter -> Cluster -> Join Cluster, paste the string, enter the root password of Node 1, and complete the join. Your three-node cluster is now active!
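The same steps can also be performed from a shell with `pvecm`, the Proxmox VE cluster manager. The sketch below only builds and prints the commands as a dry run rather than executing them. The cluster name is the one from the example above, while the IP addresses, node names, and the helper `cluster_plan` are placeholders assumed for illustration:

```python
import shlex


def cluster_plan(cluster_name, first_node_ip, corosync_ips):
    """Return (node, command) pairs: what to run, and on which node.

    corosync_ips maps node name -> address on the dedicated Corosync network.
    The first entry is treated as the node that creates the cluster.
    """
    nodes = list(corosync_ips)
    first = nodes[0]
    plan = [(first, ["pvecm", "create", cluster_name, "--link0", corosync_ips[first]])]
    for node in nodes[1:]:
        # Run on the joining node; pvecm asks for the first node's root password.
        plan.append((node, ["pvecm", "add", first_node_ip, "--link0", corosync_ips[node]]))
    # Check membership and quorum from any node afterwards.
    plan.append((first, ["pvecm", "status"]))
    return plan


# Hypothetical addresses; replace them with your own management and Corosync IPs.
plan = cluster_plan(
    "nodosfera-cluster",
    first_node_ip="192.0.2.11",
    corosync_ips={"node1": "10.10.10.11", "node2": "10.10.10.12", "node3": "10.10.10.13"},
)
for node, cmd in plan:
    print(f"[{node}] {shlex.join(cmd)}")
```

Printing the plan first makes it easy to review which address each node will use for Corosync before anything is changed.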
3. Configuring Shared Storage (Ceph)
For a virtual machine (VM) to be restarted automatically on a healthy node after its host fails, its data must be accessible to all nodes. Ceph is the ideal hyperconverged solution for this:
- Install Ceph on each node from the Ceph -> Install Ceph tab.
- Create a Monitor (Mon) and a Manager (Mgr) on each of the three nodes. The monitors form the Ceph quorum; the managers run as one active instance with the others on standby.
- Configure OSDs (physical disks dedicated to Ceph storage on each node).
- Create a Ceph storage Pool and assign it as storage for your VMs.
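When sizing the OSDs, keep in mind that a replicated pool stores every object several times. The rough estimate below assumes a replicated pool with three copies and a target of filling the cluster to at most 80%. Both values, along with the disk sizes, are assumptions for illustration and should be checked against your own pool settings:

```python
def usable_capacity_tb(osd_sizes_tb, replicas=3, max_fill=0.8):
    """Rough usable capacity of a replicated Ceph pool, in TB.

    Ignores metadata overhead and uneven data distribution.
    """
    raw = sum(osd_sizes_tb)
    return raw / replicas * max_fill


# Hypothetical layout: three nodes with two 4 TB OSDs each.
osds = [4.0] * 6
print(f"raw: {sum(osds):.1f} TB")
print(f"usable (3 replicas, 80% fill): {usable_capacity_tb(osds):.1f} TB")
```

With these assumed numbers, 24 TB of raw disk yields roughly 6.4 TB for VM data, so plan disk purchases around usable rather than raw capacity.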
4. Configuring High Availability (HA) and Fencing
The high availability service is managed in Datacenter -> HA. To activate protection:
- HA Groups: Create a group (e.g., all-nodes) that includes your three nodes. You can prioritize a specific node if desired.
- Resources: Add the VM or container (CT) you want to protect and set its requested state to started.
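- Fencing (Watchdog): Proxmox fences nodes through a watchdog timer. A node that loses quorum stops resetting its watchdog and is rebooted, ensuring it does not access the disks at the same time as another node (preventing data corruption). The Linux software watchdog (softdog) is used by default; a hardware watchdog can be configured instead.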
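To illustrate how a group with node priorities influences placement, here is a simplified model: among the nodes that are still online, the one with the highest priority is chosen. This is a sketch of the idea only, not the actual scheduling logic of the Proxmox HA manager (which also distributes services across nodes of equal priority); the group layout and the function name are assumptions:

```python
def pick_node(group, online):
    """Return the online group member with the highest priority, or None.

    group maps node name -> priority (higher wins). Ties are broken by name
    here purely to keep the example deterministic.
    """
    candidates = [(prio, name) for name, prio in group.items() if name in online]
    if not candidates:
        return None
    best = max(prio for prio, _ in candidates)
    return min(name for prio, name in candidates if prio == best)


# Hypothetical group: node1 is preferred, node2 and node3 are fallbacks.
all_nodes = {"node1": 2, "node2": 1, "node3": 1}

print(pick_node(all_nodes, online={"node1", "node2", "node3"}))  # node1
print(pick_node(all_nodes, online={"node2", "node3"}))           # node2
print(pick_node(all_nodes, online=set()))                        # None
```

If every node in the group has the same priority, no node is preferred and the VM may run on any of them.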
Conclusion
With this cluster active, if Node 1 suffers a hardware failure, the remaining nodes will detect the loss in less than a minute, wait until the failed node has been fenced, and then automatically restart your HA-managed VMs on Node 2 or 3. The affected VMs go through a short outage while they reboot, but no manual intervention is required. At Nodosfera, we design and deploy these high-availability architectures with Proxmox to guarantee continuity for your business.