How to Set Up a High-Availability (HA) Proxmox VE Cluster Step-by-Step

High availability (HA) in virtualization is no longer a luxury reserved for large enterprises. With Proxmox VE (Virtual Environment), you can build a redundant, stable, enterprise-grade hyperconverged infrastructure without prohibitive licensing costs. In this technical guide, we show you step by step how to set up a highly available Proxmox cluster.

1. Prerequisites and Network Architecture

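To build a cluster with real fault tolerance, you need at least three physical nodes. Proxmox does allow two-node clusters, but quorum requires a strict majority of votes: if one of two nodes fails, the survivor holds only one of two votes, loses quorum, and cannot recover HA resources. An external tie-breaker vote (QDevice) restores a majority and guards against split-brain, where two isolated partitions each act as the active side and write to the same storage.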
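The majority rule behind the three-node requirement can be checked with a few lines of arithmetic. This is a minimal sketch of the general strict-majority idea, not Corosync's actual implementation; the function names are made up for illustration.

```python
def votes_needed(total_votes: int) -> int:
    """Strict majority: more than half of all expected votes."""
    return total_votes // 2 + 1


def has_quorum(total_votes: int, reachable_votes: int) -> bool:
    """True if the reachable partition holds a strict majority."""
    return reachable_votes >= votes_needed(total_votes)


# Two nodes, one fails: the survivor holds 1 of 2 votes.
print(has_quorum(total_votes=2, reachable_votes=1))  # False

# Two nodes plus one QDevice vote, one node fails: 2 of 3 votes remain.
print(has_quorum(total_votes=3, reachable_votes=2))  # True

# Three nodes, one fails: 2 of 3 votes remain.
print(has_quorum(total_votes=3, reachable_votes=2))  # True
```

With three votes, any single failure still leaves a majority, which is why three nodes (or two nodes plus a QDevice) is the practical minimum.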

  • Homogeneous hardware: Similar CPUs, RAM, and network cards across all three nodes.
  • Dedicated Corosync network: A dedicated switch, plus a network port on each node, used exclusively for cluster traffic (low latency is crucial).
  • Shared storage: Ceph (built-in hyperconverged) or a high-speed external NFS/iSCSI storage array.

2. Creating the Proxmox Cluster

Access the web interface of the first node (Node 1) and go to Datacenter -> Cluster -> Create Cluster. Name it (e.g., nodosfera-cluster) and select the network interface assigned for internal communication (Corosync).

Once the cluster is created, click Join Information and copy the encoded join string. On Node 2 and Node 3, go to Datacenter -> Cluster -> Join Cluster, paste the string, enter the root password of Node 1, and finish the process. Your three-node cluster is now active!
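The same steps can also be run from a shell with the pvecm tool. The sketch below only assembles and prints the commands; it does not execute anything. The IP addresses are hypothetical, and it assumes each node's Corosync address is also the address used to reach that node. The helper name cluster_commands is invented for this example.

```python
import shlex


def cluster_commands(cluster_name, first_node_ip, joining_node_ips):
    """Return (host, command) pairs mirroring the web UI steps.

    Assumption: every IP passed in is that node's address on the
    dedicated Corosync network (used as link0).
    """
    steps = [
        # Run on Node 1: create the cluster on the Corosync link.
        (first_node_ip,
         ["pvecm", "create", cluster_name, "--link0", first_node_ip]),
    ]
    for ip in joining_node_ips:
        # Run on each joining node: join via Node 1, using its own link0 address.
        steps.append((ip, ["pvecm", "add", first_node_ip, "--link0", ip]))
    # Run on any node afterwards to check membership and quorum.
    steps.append((first_node_ip, ["pvecm", "status"]))
    return steps


for host, cmd in cluster_commands(
    "nodosfera-cluster", "10.10.10.1", ["10.10.10.2", "10.10.10.3"]
):
    print(f"[{host}] {shlex.join(cmd)}")
```

Printing the commands first makes it easy to review them before running anything on a real node; pvecm add will prompt for the root password of the existing node, just as the web interface does.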

3. Configuring Shared Storage (Ceph)

For the HA stack to restart a virtual machine (VM) from a failed node on a surviving one, the VM's disks must be accessible to all nodes. Ceph is a natural hyperconverged choice for this:

  1. Install Ceph on each node from the Ceph -> Install Ceph tab.
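  2. Create a Monitor (Mon) on each of the three nodes so that Ceph keeps monitor quorum if one node fails, and create Managers (Mgr) alongside them so a standby can take over from the active one.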
  3. Configure OSDs (physical disks dedicated to Ceph storage on each node).
  4. Create a Ceph storage Pool and assign it as storage for your VMs.
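When sizing the OSDs and the pool, keep in mind that a replicated pool stores every object several times. The following is a rough planning sketch, assuming a replicated pool with three copies and a self-imposed 85% fill target to leave headroom; the disk counts and sizes are hypothetical, and the function name is invented for this example.

```python
def usable_capacity_tb(nodes, osds_per_node, osd_size_tb,
                       replicas=3, fill_target=0.85):
    """Estimate usable capacity of a replicated Ceph pool in TB.

    Assumptions: identical OSDs on every node, `replicas` copies of
    each object, and a planning headroom given by `fill_target`.
    """
    raw_tb = nodes * osds_per_node * osd_size_tb
    return raw_tb * fill_target / replicas


# Three nodes with four 2 TB OSDs each: 24 TB raw,
# roughly 6.8 TB usable under the assumptions above.
print(round(usable_capacity_tb(nodes=3, osds_per_node=4, osd_size_tb=2.0), 2))
```

The point of the estimate is that usable space is only a fraction of the raw disk total, so plan disk purchases from the usable figure, not the raw one.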

4. Configuring High Availability (HA) and Fencing

The high availability service is managed in Datacenter -> HA. To activate protection:

  • HA Groups: Create a group (e.g., all-nodes) that includes your three nodes. You can prioritize a specific node if desired.
  • Resources: Add the VM or container (CT) you want to protect and set its requested state to started.
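  • Fencing (Watchdog): Proxmox relies on watchdog-based self-fencing. A node that loses quorum stops resetting its watchdog timer and is rebooted, ensuring it does not access the disks at the same time as another node (preventing data corruption). The Linux software watchdog (softdog) is used by default; a hardware watchdog can be configured instead.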
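The group and resource settings above also have command-line equivalents in ha-manager. The sketch below only builds and prints the commands, without running them. The node names, priorities, and VM/CT IDs are hypothetical, and the helper name ha_commands is invented here. It assumes a release that still uses HA groups; newer Proxmox VE releases may express the same intent through HA rules instead.

```python
import shlex


def ha_commands(group, node_priorities, resources):
    """Build ha-manager commands for one HA group and its resources.

    node_priorities maps node name -> priority (higher is preferred).
    resources is a list of service IDs such as "vm:100" or "ct:101".
    """
    nodes = ",".join(f"{name}:{prio}" for name, prio in node_priorities.items())
    cmds = [["ha-manager", "groupadd", group, "--nodes", nodes]]
    for sid in resources:
        # Request the "started" state so the HA stack keeps the guest running.
        cmds.append(
            ["ha-manager", "add", sid, "--state", "started", "--group", group]
        )
    # Show the resulting HA state.
    cmds.append(["ha-manager", "status"])
    return cmds


for cmd in ha_commands(
    "all-nodes",
    {"node1": 2, "node2": 1, "node3": 1},  # prefer node1 when it is available
    ["vm:100", "ct:101"],
):
    print(shlex.join(cmd))
```

Giving node1 a higher priority mirrors the "prioritize a specific node" option in the web interface: guests prefer that node while it is online and fall back to the others when it is not.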

Conclusion

With this cluster active, if Node 1 suffers a hardware failure, the cluster will detect the loss, wait until the node has been fenced, and then automatically restart your HA-managed VMs on Node 2 or 3, typically within a couple of minutes (the VMs are rebooted, not live-migrated). At Nodosfera, we design and deploy these high-availability architectures with Proxmox to guarantee total continuity for your business.
