Playing with JSON in Postgres

I can fetch a JSON list of subscribers like this:

curl -u $API_LOGIN:$API_PASSWORD https://example.org/subscribers

Then there’s jq, which is basically AWK for JSON.

The following invocation simply breaks a top-level JSON array into an object per line.

cat subscribers.json | jq '.[]' -c

And then there’s Postgres, which has a COPY command that will (by default) accept a row per line, with columns separated by tabs. In my case I only care about one column.

(I specify the port number because I have both Postgres 9.1 and 9.3 servers running)

cat subscribers.txt | psql -p 5433 -c 'DELETE FROM scratch; COPY scratch (subscriber) FROM STDIN;'

Put it all together:

curl -u $API_LOGIN:$API_PASSWORD https://example.org/subscribers 
| jq '.[]' -c 
| psql -p 5433 -c 'DELETE FROM scratch; COPY scratch (subscriber) FROM STDIN;'

Bam, I’ve refreshed my entire local subscriber database in one shell command and 4 seconds (most of which was the initial HTTP request). This could easily go in a cron job.

OK, so now I have a bunch of rows with raw JSON data in them, what’s the big deal? Ah, but it isn’t raw. I declared the subscriber column as type json.

Let’s count active subscriptions.

psql -p 5433 -c "SELECT COUNT(*) 
from scratch 
where subscriber#>>'{subscription, status}' = 'ACTIVE'"

Thanks to Postgres’ JSON querying operators, I just selected into the JSON data in the subscriber column in order to query on a nested property of that data.

How cool is that?!

3 comments

  1. Sweet one-liner. I wish PG would use friendlier json “selectors”.. this hurts my brain:

    subscriber#>>'{subscription, status}’ = ‘ACTIVE’

    This would be more what I am used to:

    subscriber[‘subscription’][‘status’] = ‘ACTIVE’

Comments are closed.