Load Balancing Postgres-XC
One of the nicest things about the Postgres-XC architecture is that you can have multiple coordinators that are all peers of one another. There is no master or slave: what you can do on one coordinator, you can do on all coordinators. The problem comes when you want to evenly distribute load across all of the coordinators. You have several options, but usually this is handled through some sort of intermediary, like HAProxy, routing connections to the least loaded server. This works, but the downside is that an extra layer is added between the client and the server, adding latency. When your goal is a high-throughput transactional database, you want to remove as much latency as possible.
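For reference, the intermediary approach might look something like this HAProxy fragment (a sketch only; the coordinator names, addresses, and ports are assumptions, not from the original post):

```
# Hypothetical haproxy.cfg fragment: TCP proxying across two
# Postgres-XC coordinators, sending each new connection to the
# server with the fewest active connections.
listen postgres-xc
    bind *:5432
    mode tcp
    balance leastconn
    server coord1 10.0.0.11:5432 check
    server coord2 10.0.0.12:5432 check
```

Every connection then passes through the proxy host, which is exactly the extra hop the DNS approach below avoids.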
Another thought came to me while talking with some of the folks over at VMware about how they handle high availability for vPostgres. They are essentially using an LDAP server for connection string resolution to route the connection to the correct virtual machine. It's a pretty simple concept and works well, but to really leverage it, modifications to the connectors need to be made, so it's a non-starter for the general use case. However, changing this concept a little, the same principles can be used at the DNS level.
There is a BIND contrib module for a simple database API (SDB) that allows a zone to be dynamically loaded from a database call. This gives us the flexibility we need to easily load balance traffic across a number of coordinators, and as an added benefit, we get high availability as well, thanks to the highly available nature of DNS. To leverage this, I wrote an extension module for Postgres-XC that plugs easily into an SDB call.
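Wiring a zone to an SDB driver happens in named.conf; the sketch below shows the general shape, assuming the stock PostgreSQL SDB driver. The zone name, database name, table, host, and credentials here are all placeholders, not the actual pgxc_dns configuration:

```
// Hypothetical named.conf fragment: instead of a static zone file,
// the zone is answered by an SDB driver that queries a Postgres
// database on every lookup.
zone "xc.stormdb.us" {
    type master;
    database "pgsql pgxc_dns_db dns_records 127.0.0.1 bind bindpw";
};
```

Because the answer comes from a query rather than a file, the record returned can change from one lookup to the next, which is what makes the load-balancing trick possible.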
Check it out:
With it configured, you can connect to the least loaded, available coordinator just by hitting the name:
psql -h coord.xc.stormdb.us postgres
The key to effectively load balancing with this method is that the TTL of the A record is set to 0. This is something you never want to do for your public www A record, but for an internal database connection, it works great. With a TTL of 0, the record is not cached, so it is looked up every time a connection is made, and since pgxc_dns returns the IP of the least loaded coordinator, the connections are balanced. The downside is that this adds about 20% to the time it takes to connect, but with connection pooling, the extra time may never be noticed.
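The selection logic behind each lookup can be sketched in a few lines of Python. This is a toy model of the idea, not the pgxc_dns implementation: the addresses and the "active backends" load metric are assumptions for illustration.

```python
# Toy sketch of what a TTL-0 record backed by pgxc_dns accomplishes:
# every DNS lookup re-evaluates coordinator load and answers with the
# address of the least loaded one, so connections spread themselves out.

# Hypothetical coordinator addresses mapped to a load metric
# (e.g. active backend count).
COORDINATORS = {
    "10.0.0.11": 12,
    "10.0.0.12": 3,
    "10.0.0.13": 7,
}

def resolve_least_loaded(coords):
    """Return the address of the coordinator with the lowest load."""
    return min(coords, key=coords.get)

print(resolve_least_loaded(COORDINATORS))  # -> 10.0.0.12
```

With caching disabled by the zero TTL, this decision is made fresh for every new connection; a connection pool then keeps connections open so the per-connect resolution cost is paid rarely.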