Scaling PostGIS Location Data
Since first adding PostGIS support to Stado last year, I've heard from a number of people needing to scale their write workload in addition to reads. The architecture of Stado doesn't really lend itself to a write intensive application, but luckily Postgres-XC does. Write intensive PostGIS applications have a few interesting challenges that many applications don't experience. First is just the size of a typical row. A geometry column typically contains a lot of rich information, but that adds up to move size on disk. Take, for example, the nj_edges table contained in the TIGER dataset. On average, each row in that table consumes over 500 bytes on disk. Not enormous, but it does fill up an 8K page fairly quickly. Another challenge is the constraints on the geometry column. Looking at the same nj_edges table, we have the following constraint on the geometry column:
CONSTRAINT enforce_dims_the_geom CHECK ((public.st_ndims(the_geom) = 2)),
CONSTRAINT enforce_geotype_the_geom CHECK (((public.geometrytype(the_geom) = 'MULTILINESTRING'::text) OR (the_geom IS NULL))),
CONSTRAINT enforce_srid_the_geom CHECK ((public.st_srid(the_geom) = 4269)),
Each constraint is fairly straightforward and necessary if you want to ensure your data quality, but their presence adds 5-10% extra overhead to load times compared to simple integer check, ie (gid > 0). This adds up when you're loading a lot of data. This is exactly the type of application that shines running Postgres-XC. The very simple example of cloning a table shows how a 4 node Postgres-XC cluster can scale over a single PostgreSQL instance.
postgres=# create table nj_edges_clone as select * from nj_edges;
Time: 3861.257 ms
j1345569270=> create table nj_edges_clone as select * from nj_edges;
INSERT 0 883537
Time: 1372.802 ms
Once you have your data in a Postgres-XC cluster, you can run the same PostGIS queries that you would on a stand-alone PostgreSQL database including calling PostGIS functions in the WHERE clause.
j1345569270=> SELECT sum(ST_Length_Spheroid(e.the_geom, 'SPHEROID["WGS 84",6378137,298.257223563]'))
j1345569270-> FROM nj_edges e, nj_county c
j1345569270-> WHERE c.name = 'Monmouth'
j1345569270-> AND e.roadflg = 'Y'
j1345569270-> AND ST_Intersects(c.the_geom, e.the_geom);
More information can be found about Postgres-XC at http://postgres-xc.sourceforge.net/ or if you just want to quickly experiment with what Postgres-XC can do without setting up a full cluster, create a StormDB database and you're be able to see what PostGIS can do inside a Postgres-XC cluster in a few seconds.