High Availability of Postgres-XC using the Corosync Pacemaker LinuxHA stack
As with any product built on a distributed architecture, Postgres-XC has many components that interact with each other across multiple physical nodes. High availability (HA) of these components is very important: the failure of a subset of them should not impair cluster functionality. Equally important is zeroing in on an HA infrastructure that provides the necessary bells and whistles to manage complex events such as node deaths, network split brain, and ordered resource stop/start scenarios. I am sure you get the drift: yes, it can get very complicated very quickly. To give you an idea, without getting too much into specifics, the following were some of the considerations that we had to be mindful of:
* If a node in the cluster goes down or becomes unreachable, all the components running on that node should be automatically promoted elsewhere. This should happen within seconds of the node going down. Obviously this assumes that we have standby components made available to this HA infrastructure.
* Only one of the "backup" components should get promoted.
* Components can have dependencies, so resources should be stopped in a well-defined order, one after the other, and started again in the reverse of that order.
* Some components should be preferred to start on specific nodes. The HA infrastructure should allow for this.
* The HA infrastructure should allow for Master/Slave kinds of resource monitoring and automatic promotion of slave resources if the master goes down.
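To make the ordering requirement concrete, here is a minimal Python sketch that derives a start order from declared dependencies and stops components in the reverse order. The component names and the dependency map are hypothetical, chosen only for illustration. This is not how Pacemaker implements ordering, just the underlying idea.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map: each key depends on the components in its set.
# The names are illustrative and do not describe a real deployment layout.
DEPENDS_ON = {
    "gtm": set(),
    "datanode1": {"gtm"},
    "coordinator1": {"gtm", "datanode1"},
}

def start_order(depends_on):
    """Return components so that every dependency starts before its dependents."""
    return list(TopologicalSorter(depends_on).static_order())

def stop_order(depends_on):
    """Stopping is the reverse of starting: dependents go down first."""
    return list(reversed(start_order(depends_on)))

if __name__ == "__main__":
    print("start:", start_order(DEPENDS_ON))
    print("stop: ", stop_order(DEPENDS_ON))
```

A topological sort also fails loudly (graphlib raises CycleError) if two components are declared as depending on each other, which is a useful sanity check on any such configuration.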
In my research, I landed upon the Linux HA framework, which has been providing reliable high-availability software on *nix platforms for more than a decade. Its two main components are:
Cluster Messaging Layer:
This layer is responsible for providing node membership information. It provides cluster infrastructure (communication and membership) services to its clients. It gives member nodes a platform for easily exchanging messages, and it immediately notifies them when participating nodes appear or disappear.
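As a rough mental model of what membership tracking means, the sketch below marks a node as lost when no heartbeat has been seen from it within a timeout. It is a deliberate simplification: real messaging layers use considerably more sophisticated protocols, and the class name, method names and timeout value here are all assumptions made for illustration.

```python
class Membership:
    """Toy membership tracker: a node is live if it was heard from recently."""

    def __init__(self, timeout):
        self.timeout = timeout   # seconds of silence tolerated (assumed value)
        self.last_seen = {}      # node name -> timestamp of last heartbeat

    def heartbeat(self, node, now):
        """Record that `node` was heard from at time `now`."""
        self.last_seen[node] = now

    def live_nodes(self, now):
        """Nodes whose last heartbeat is within the timeout window."""
        return sorted(n for n, t in self.last_seen.items()
                      if now - t <= self.timeout)

    def lost_nodes(self, now):
        """Nodes that have been silent for longer than the timeout."""
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.timeout)
```

Timestamps are passed in explicitly rather than read from the clock, which keeps the sketch deterministic and easy to test.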
Cluster Resource Manager:
The cluster resource manager uses the messaging and membership capabilities of the cluster messaging layer described above to provide maximum availability of your cluster resources. In addition, it has built-in intelligence to support redundancy, scriptability, node- and resource-level recovery, and features such as resource ordering and co-location.
We settled on Corosync as the messaging layer, coupled with Pacemaker as the cluster resource manager. We also looked at Heartbeat for the messaging layer, but Corosync has seen much more active work and adoption in recent times. Together, Corosync and Pacemaker addressed most of the concerns raised above. The fact that resource agents for base Postgres were already available on this framework made the decision easier.
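The recovery decision a resource manager has to make can be sketched in a few lines of Python: given the nodes that host a standby, the nodes currently alive, and a per-node preference score, pick exactly one promotion target. The function name, the score values and the tie-breaking rule are assumptions for illustration only and are not Pacemaker's actual algorithm.

```python
def choose_promotion_target(standby_nodes, live_nodes, scores):
    """Pick exactly one live standby to promote, or None if there is none.

    standby_nodes: nodes that host a standby copy of the failed component
    live_nodes:    set of nodes the membership layer currently reports as alive
    scores:        hypothetical per-node preference (higher is preferred)
    """
    candidates = [n for n in standby_nodes if n in live_nodes]
    if not candidates:
        return None
    # Highest score wins; ties are broken by node name so the choice is
    # deterministic and only one standby is ever selected.
    return min(candidates, key=lambda n: (-scores.get(n, 0), n))
```

For example, with standbys on node2 and node3, both alive, and a score of 100 for node3, the function selects node3. If node3 is down, it falls back to node2.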
We have worked on it for a while now and while it does have a bit of a steep learning curve, going ahead with this professional, rock-solid, open-source HA platform has been the right choice for us!