Postgres-XC Components And High Availability
In the last blog I talked about why we chose Corosync/Pacemaker as the infrastructure to base our HA offerings on. We now move on to the specifics of Postgres-XC and the various components that make its distributed architecture work.
To give a complete picture, let me quickly walk through all the components:
GTM (Global Transaction Manager)
GTM is a key component of Postgres-XC. It provides consistent transaction management across the entire cluster and is also responsible for maintaining sequence values. All transaction IDs and transaction snapshots are issued by the GTM, giving every node a consistent view of the data. There is a single (active) GTM per Postgres-XC cluster.
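To make the GTM's role concrete, here is a toy Python sketch of a single authority handing out transaction IDs and snapshots. The class and method names are purely illustrative, not Postgres-XC internals:

```python
# Toy model of the GTM's job: one authority issues transaction IDs and
# snapshots, so every node agrees on which transactions are in progress.

class ToyGTM:
    def __init__(self):
        self.next_xid = 1
        self.running = set()   # XIDs of in-progress transactions

    def begin(self):
        """Issue a new global transaction ID."""
        xid = self.next_xid
        self.next_xid += 1
        self.running.add(xid)
        return xid

    def snapshot(self):
        """A snapshot: the set of transactions in progress right now."""
        return frozenset(self.running)

    def commit(self, xid):
        self.running.discard(xid)

gtm = ToyGTM()
t1 = gtm.begin()        # XID 1
t2 = gtm.begin()        # XID 2
snap = gtm.snapshot()   # both t1 and t2 are in progress here
gtm.commit(t1)
# A node still holding `snap` keeps treating t1 as in progress, which is
# what makes visibility decisions consistent cluster-wide.
print(sorted(snap))     # [1, 2]
```

Because the IDs and snapshots come from one place, two Coordinators can never disagree about transaction ordering.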
Coordinator
Applications and clients connect to the Coordinator. The Coordinator typically stores no actual table data; it maintains metadata about the nodes in the Postgres-XC cluster (IP addresses, ports, etc.). It receives SQL queries, interacts with the GTM, routes queries (or parts of them) to the appropriate Datanodes (described below), and assembles the final result. There can be multiple Coordinators, and applications can talk to any of them to get the same global view of the data.
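A minimal sketch of that fan-out-and-merge behaviour, with Datanodes modelled as plain lists of rows (all names here are hypothetical, not real Postgres-XC code):

```python
# Toy Coordinator: holds only node metadata, forwards a query to the
# relevant Datanodes, and merges their local results.

class ToyCoordinator:
    def __init__(self, datanodes):
        # Metadata only: node name -> handle (here, a plain list of rows).
        self.datanodes = datanodes

    def query_all(self, predicate):
        """Fan the query out to every Datanode and collect the results."""
        results = []
        for node in self.datanodes.values():
            results.extend(row for row in node if predicate(row))
        return results

nodes = {
    "dn1": [{"id": 1, "city": "Pune"}],
    "dn2": [{"id": 2, "city": "Tokyo"}],
}
coord = ToyCoordinator(nodes)
print(coord.query_all(lambda r: r["id"] > 0))
```

The important property is that the Coordinator itself is stateless with respect to table data, which is exactly why any Coordinator can serve any client.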
Datanode
The complete table data is stored across the Datanodes, and each Datanode is responsible only for its locally stored data. The Coordinator identifies which Datanodes a given query involves and sends them the query (or parts of it) along with the global transaction ID and snapshot for execution. Each involved Datanode sends its local results back to the Coordinator. You can have as many Datanodes as your data distribution strategy requires.
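One common distribution strategy is hashing on a distribution key; the sketch below (illustrative names, not Postgres-XC's actual hash function) shows how that lets the Coordinator prune a point query down to a single Datanode:

```python
# Toy hash distribution: the distribution-key value decides which Datanode
# stores a row, so an equality query needs to visit only one node.

NUM_DATANODES = 3

def target_datanode(dist_key):
    """Map a distribution-key value to a Datanode index."""
    return hash(dist_key) % NUM_DATANODES

datanodes = [[] for _ in range(NUM_DATANODES)]
for emp_id in range(10):
    datanodes[target_datanode(emp_id)].append({"id": emp_id})

# For `WHERE id = 7` the Coordinator only has to ask one node:
node = target_datanode(7)
rows = [r for r in datanodes[node] if r["id"] == 7]
print(node, rows)
```

Queries that do not filter on the distribution key, by contrast, have to be fanned out to every Datanode.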
So, to reiterate: each of these components needs its own HA strategy to ensure that the Postgres-XC cluster keeps functioning when individual nodes fail.
Looking at the components above, the GTM stands out as a single point of failure (SPOF). Postgres-XC accounts for this with the GTM Standby component, which acts as a backup of the GTM. A proper HA solution therefore has to ensure automatic failover to the GTM Standby (after promoting it) when the GTM node runs into trouble.
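The failover logic an HA agent has to implement boils down to a monitor-then-promote loop. The sketch below uses placeholder callbacks for the monitoring and promotion steps (they are assumptions, not real Postgres-XC calls):

```python
# One monitoring pass of a hypothetical GTM failover agent: if the active
# GTM is down, promote the GTM Standby and start serving from it.

def failover_if_needed(gtm_alive, promote_standby):
    """Return which node is serving after one monitoring pass."""
    if gtm_alive():
        return "gtm"
    promote_standby()   # placeholder: e.g. the agent would promote the standby here
    return "gtm_standby"

promoted = []
serving = failover_if_needed(lambda: False, lambda: promoted.append("promoted"))
print(serving, promoted)   # gtm_standby ['promoted']
```

In our setup it is Pacemaker that runs this kind of monitor action periodically and triggers the promote, via the custom resource agents mentioned at the end.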
Moving on to the Coordinator: if one Coordinator goes down, applications should be able to talk to another, and it is the HA solution's job to detect the failure and redirect clients accordingly. Additionally, since multiple Coordinators can be made available, load can be balanced across them. Please refer to this earlier blog post for a possible load balancing solution.
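Since every Coordinator gives the same global view, spreading clients across them can be as simple as round-robin. A minimal sketch (in practice a proxy or load balancer in front of the Coordinators would do this; the addresses are made up):

```python
# Round-robin assignment of client connections across Coordinators.
import itertools

coordinators = ["coord1:5432", "coord2:5432"]
next_coord = itertools.cycle(coordinators).__next__

# Each new client connection picks the next Coordinator in turn:
assignments = [next_coord() for _ in range(4)]
print(assignments)
```

If a Coordinator fails, the balancer simply drops it from the rotation until the HA solution brings it back.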
The Datanodes contain all the data and present the biggest HA challenge. Clearly we need an up-to-date copy of each Datanode to fail over to. Thankfully Postgres 9.x supports streaming replication, so replicas can be created for each Datanode; we use synchronous replicas in our case. With synchronous replication a transaction can hang waiting for a replica's acknowledgement if that replica goes down, so to handle this case we keep two replicas of each Datanode. Again, this provides the HA pieces to build a solution on, but it is up to the HA solution to monitor each Datanode and its replicas and to promote one of the replicas if the Datanode goes down.
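The reason for keeping two replicas can be seen in a toy model of synchronous commit: the commit returns once some live replica acknowledges, so losing one replica does not block writes, but losing both would. (A deliberately simplified sketch, not how Postgres actually waits on WAL acknowledgements.)

```python
# Toy Datanode with synchronous replicas: a commit completes only when a
# live replica acknowledges it.

class ToyDatanode:
    def __init__(self, replicas):
        self.replicas = replicas   # replica name -> alive?

    def commit(self):
        """Synchronous commit: succeed on the first live replica's ack."""
        for name, alive in self.replicas.items():
            if alive:
                return f"acked by {name}"
        raise TimeoutError("no live replica: commit would hang")

dn = ToyDatanode({"replica1": True, "replica2": True})
print(dn.commit())               # acked by replica1
dn.replicas["replica1"] = False
print(dn.commit())               # acked by replica2 -- writes keep flowing
```

With a single replica, that second commit would have hung instead, which is exactly the failure mode the second replica is there to avoid.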
Using the Corosync/Pacemaker Linux HA infrastructure and the components described above, we have been able to build custom resource agents that provide an HA solution for our product, and yes, we do sleep a bit easier because of it! :)