Back in the early days of PostgreSQL, the name was officially changed from Postgres to Postgres95 and finally to PostgreSQL. While the name certainly has its challenges in pronunciation for people new to PostgreSQL, the community has regularly debated a formal name change. The topic pops up every now and then, but it always comes back to the same result: we're keeping the name PostgreSQL, but the nickname Postgres is perfectly acceptable.
Over the last month, I've attended several conferences and had an opportunity to talk with a bunch of developers, and to some of them, SQL is just toxic. I was able to wrangle a few of them into talking a bit and learned a little about their first impressions. It wasn't just PostgreSQL; MySQL and SQL Server fell into the same boat simply because of their names. When I told them about Christophe Pettus's benchmark of Postgres vs. MongoDB and Mason Sharp's benchmark of Postgres-XC vs. MongoDB, some seemed intrigued.
As a community, we're not going to win over everyone, and there isn't much we can do about irrational, toxic perceptions, but we can recognize them. The next time I'm at a booth at a conference dominated by web developers, many of whom think SQL is a four-letter word, I'm dropping the QL and using the nickname Postgres a lot more. Maybe that's just enough for a few more people to stop and find out what Postgres is all about.

Link: view
Title: US PG Days

Over the course of a month, there will be three PG Days in the US, and I'm excited to be going to all of them. It's always a blast to interact with the local communities and see how they differ in different parts of the country.
The first one up is next Friday at PGDay SoCal, which is run in conjunction with SCALE. I attended last year and I'm looking forward to returning.
The next one up is PyPgDay in Santa Clara on March 13th. I'm a noob when it comes to Python, but seeing how passionate Django developers are about PostgreSQL, it should be a fun time.
The final one is NYC PGDay on March 22nd. It's the only one that isn't tied to a bigger show, but it's shaping up to be a pretty big day. We already have almost as many registrations as we did last year, and the early bird isn't over yet. The early bird ends Friday, so if you're thinking of attending, you can register at
If you're a PostgreSQL user and you're in SoCal, the Bay Area or the Northeast Corridor, I encourage you to check out your local PG Day.

Link: view

We've seen a lot of interest in running PostGIS on Postgres-XC recently, but unfortunately, setting things up is not as straightforward as it is in PostgreSQL. In PostgreSQL, you simply enter:

CREATE EXTENSION postgis;

Postgres-XC does support extensions, but PostGIS is not your average extension. There are a number of dependencies, as well as the spatial_ref_sys table that needs to be properly created on all of the datanodes. To help people get up and going with PostGIS on Postgres-XC, we've created a tutorial on how to set it up.
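To give a feel for the per-node setup involved, here is a sketch, assuming a small cluster with datanodes named datanode1 and datanode2 (the node names are hypothetical), of checking each datanode individually with Postgres-XC's EXECUTE DIRECT:

```sql
-- Hypothetical node names; EXECUTE DIRECT runs a statement on a specific node
EXECUTE DIRECT ON (datanode1) 'SELECT count(*) FROM spatial_ref_sys';
EXECUTE DIRECT ON (datanode2) 'SELECT count(*) FROM spatial_ref_sys';
```

This kind of per-node check is useful for verifying that spatial_ref_sys ended up populated consistently everywhere.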

Link: view

One of the nicest things about the Postgres-XC architecture is the fact that you can have multiple coordinators and they are all peers of one another. There is no master or slave. What you can do on one coordinator, you can do on all coordinators. The problem comes in when you want to evenly distribute load across all of the coordinators. You have several options on how to do this, but usually the way this is handled is through some sort of intermediary, like HAProxy, routing connections to the least loaded server. This works, but the downside is that an extra layer is added between the client and the server, adding to latency. When your goal is to have a high throughput transactional database, you want to remove as much latency as possible.
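As an illustration of the intermediary approach, a minimal HAProxy TCP listener balancing across two coordinators might look like this (the hostnames are hypothetical):

```
listen pgxc_coordinators
    bind *:5432
    mode tcp
    balance leastconn
    server coord1 coord1.example.com:5432 check
    server coord2 coord2.example.com:5432 check
```

Every connection still makes an extra hop through the proxy, which is the latency cost described above.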

Another thought came to me when talking with some of the folks over at VMware about how they handle high availability for vPostgres. They are essentially using an LDAP server for connection string resolution to route the connection to the correct virtual machine. It's a pretty simple concept and works well, but to really leverage it, modifications to the connectors need to be made, so it's a non-starter for the general use case. However, changing this concept a little, the same principles can be used at the DNS level.

There is a BIND contrib module for a simple database API (SDB) that allows a zone to be dynamically loaded from a database call. This gives us the flexibility we need to easily load balance traffic across a number of coordinators, and as an added benefit, we get high availability as well, thanks to the highly available nature of DNS. To leverage this, I wrote an extension module for Postgres-XC that easily plugs into an SDB call.

Check it out:

With it configured, you can connect to the least loaded, available coordinator just by hitting the name.
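For a sense of how the pieces fit together, a named.conf fragment along these lines would tell BIND to serve the zone from the database (the zone name, database name, table, and credentials here are all hypothetical, and the exact driver string depends on how the SDB PostgreSQL driver was built):

```
// Hypothetical: serve the zone "" via BIND's SDB PostgreSQL driver
zone "" {
    type master;
    database "pgsql zonedata records localhost pgxc pgxc";
};
```

Each lookup of a name in the zone then triggers the database call, which can return the address of the least loaded coordinator.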

Link: view

Typically, when I give talks about PostgreSQL, Stado or Postgres-XC, I like to use examples as much as possible to get my points across to the audience. This usually presents a problem in deciding what data to use in the examples. The old stand-by is to use pgbench to generate a bunch of sample data, but the values are meaningless to the audience, and you're a bit limited in what you can show. I've also used the sample databases from pgFoundry, but found them lacking as well.
Last week, I attended the State of the Map conference out in Portland and the answer became blatantly obvious: use the OpenStreetMap data set. The data size can be as big or as small as you'd like. You can load in just a small subset of the data covering your city or state, or load in the whole planet. This lets me target the example queries to the location where I'm giving the talk, making it much more meaningful for the audience. Another great attribute of the data is that it contains both PostGIS data and hstore data, so we can show off some of the coolest PostgreSQL features that developers love.
Take, for example, the following query that I would use when presenting in New Jersey. It finds the schools in Monmouth County that are near the Garden State Parkway.

SELECT DISTINCT, ST_AsText(ST_Transform(p.way, 4326)) AS pt_lonlattext
FROM planet_osm_point p, planet_osm_roads r
WHERE p.tags @> 'gnis:state_id=>34'::hstore
AND p.tags @> 'gnis:county_id=>025'::hstore
AND p.amenity = 'school'
AND = 'Garden State Parkway'
AND ST_Distance(p.way, r.way) < 400;
If you'd like to start using the OpenStreetMap data, loading it into PostgreSQL is really straightforward thanks to the...
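A minimal load, assuming the commonly used osm2pgsql tool (the database name and extract file below are hypothetical), might look like:

```
# Hypothetical invocation: load a regional OSM extract with hstore tags enabled
osm2pgsql --database osm --hstore new-jersey-latest.osm.pbf
```

The --hstore flag is what populates the tags column that the example query filters on.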

Link: view

Reading through Google's Spanner paper, I can't help but see the parallels to Postgres-XC. Although the two solutions tackle different problem sets, the fact that they are both distributed databases with the goal of providing consistency out to the client results in very similar architectures. Take, for example, the first paragraph in the abstract of the paper. If you change just a few words, it does a great job of describing what Postgres-XC can do:

Postgres-XC is PostgreSQL’s scalable, multi-version, distributed, and synchronously-replicated database. It is the first system to distribute data at scale and support externally consistent distributed transactions. This paper describes how Postgres-XC is structured, its feature set, the rationale underlying various design decisions, and a novel Global Transaction Manager. This Manager and its implementation are critical to supporting external consistency and a variety of powerful features: non-blocking reads in the past, lock-free read-only transactions, and atomic schema changes, across all of Postgres-XC.

One of the key differences is how concurrency control is managed. In Postgres-XC, this is all handled by the Global Transaction Manager, which supplies things like transaction IDs. Since Spanner is globally distributed and contacting a single process to get the necessary transaction IDs would be a real drain on performance, it takes the route of using timestamps. The clocks are kept synchronized geographically through the use of GPS and atomic clocks. This method isn't exact; Google claims less than 10ms of uncertainty, which is fine for many applications, but not all. The really interesting part is how Spanner handles multi-version concurrency control. Spanner uses the concepts of TT.before and TT.after to determine which version of a tuple the client will see. That sounds a lot like the concept of xmin and xmax...
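To see PostgreSQL's side of that parallel, the xmin and xmax system columns expose which transactions created and deleted each tuple version (the table and values here are just for illustration):

```sql
CREATE TABLE parcels (id int);
INSERT INTO parcels VALUES (1);
-- xmin is the inserting transaction's ID; xmax is 0 while the row is live
SELECT xmin, xmax, id FROM parcels;
```

Visibility rules compare those transaction IDs against a snapshot, much as Spanner compares commit timestamps against a read timestamp.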

Link: view

Since first adding PostGIS support to Stado last year, I've heard from a number of people needing to scale their write workload in addition to reads. The architecture of Stado doesn't really lend itself to a write-intensive application, but luckily, Postgres-XC's does. Write-intensive PostGIS applications have a few interesting challenges that many applications don't experience. First is just the size of a typical row. A geometry column typically contains a lot of rich information, but that adds up to more bytes on disk. Take, for example, the nj_edges table contained in the TIGER dataset. On average, each row in that table consumes over 500 bytes on disk. Not enormous, but it does fill up an 8K page fairly quickly. Another challenge is the constraints on the geometry column. Looking at the same nj_edges table, we have the following constraints on the geometry column:

CONSTRAINT enforce_dims_the_geom CHECK ((public.st_ndims(the_geom) = 2)),
CONSTRAINT enforce_geotype_the_geom CHECK (((public.geometrytype(the_geom) = 'MULTILINESTRING'::text) OR (the_geom IS NULL))),
CONSTRAINT enforce_srid_the_geom CHECK ((public.st_srid(the_geom) = 4269)),

Each constraint is fairly straightforward and necessary if you want to ensure your data quality, but their presence adds 5-10% extra overhead to load times compared to a simple integer check, i.e., (gid > 0). This adds up when you're loading a lot of data. This is exactly the type of application that shines when running on Postgres-XC. The very simple example of cloning a table shows how a 4-node Postgres-XC cluster can scale over a single PostgreSQL instance.
postgres=# create table nj_edges_clone as select * from nj_edges;
SELECT 883537
Time: 3861.257 ms
j1345569270=> create table nj_edges_clone as select * from nj_edges;
INSERT 0 883537
Time: 1372.802 ms
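For the Postgres-XC timing to scale like that, the source table needs to be spread across the datanodes. A sketch of a distributed definition (the column list is abbreviated, and using gid as the distribution key is an assumption):

```sql
-- Hypothetical abbreviated version of the TIGER nj_edges table,
-- hash-distributed across the datanodes by its key column
CREATE TABLE nj_edges (
    gid serial,
    the_geom geometry
    -- remaining TIGER columns omitted
) DISTRIBUTE BY HASH (gid);
```

With hash distribution, each datanode writes only its share of the rows, which is where the parallel write throughput comes from.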
Once you have your...

Link: view

We've scheduled our first meeting of the Philly PostgreSQL User Group, so if you're in or around Philly on September 12, come join us. Bruce will be talking about what's new in 9.2.

Link: view

Being in Central New Jersey, I'm on the dividing line between the suburbs of New York and Philadelphia. Not only is that great during football season, where I get a ton of games over the air without having to spring for Sunday Ticket, I also get to check out the tech scene in both cities pretty easily. When I'm in Philly, which is at least a few times a month at this point, I routinely get asked about a Philly PostgreSQL User Group. There is a great, albeit unorganized, PostgreSQL community in the city, so now it's time to get organized.
We're starting a PostgreSQL User Group in the city of Philadelphia, so if you're in or around the city, check it out. We'll schedule the first meeting sometime in early to mid September, once I can arrange for a venue. If you're interested, sign up for the Meetup group so I have an idea of how big of a place we'll need.

Link: view
© 2013 StormDB. All rights reserved.