trvrm.github.io

Postgres JSON Aggregation

Thu 01 January 2015


I've been using the new JSON functionality in Postgres a lot recently: I'm fond of saying that Postgresql is the best NoSQL database available today. I'm quite serious about this: having used key-value and JSON stores such as CouchDB in the past, it's amazing to me how the Postgres developers have managed to marry the best of traditional relational technology with the flexibility of schema-free JSON documents.

As of version 9.3, postgres allows you to create JSON column types, and provides a number of functions to access and iterate through the data stored in them.

This week I discovered another hidden gem - json_agg() . This function lets you take the results from an aggregation operation and convert them into a JSON block - very helpful if you're then going to work with the returned data in a language like Python

To demonstrate this, we'll first set up some simple tables.

%load_ext sql
%config SqlMagic.feedback=False
%%sql
postgresql://testuser:password@localhost/test
u'Connected: testuser@test'
%%sql
CREATE TABLE IF NOT EXISTS person (
    name text primary key
);

INSERT INTO person (name) VALUES
('emily'),('arthur'),('nicki'),('oliver')
;
[]

We can query this in the usual way:

%sql SELECT * FROM person;
name
emily
arthur
nicki
oliver

But we can also use json_agg()

%sql SELECT json_agg(name) FROM person
json_agg
[u'emily', u'arthur', u'nicki', u'oliver']

Which gives us a single object to work with. So far, this isn't particularly helpful, but it becomes very useful when we start doing JOINS

%%sql
CREATE TABLE IF NOT EXISTS action(
    id serial primary key,
    created timestamp with time zone default now(),
    person_name text references person,
    type text not null
);

INSERT INTO action(person_name, type) VALUES ('emily','login');
INSERT INTO action(person_name, type) VALUES ('emily','pageview');
INSERT INTO action(person_name, type) VALUES ('arthur','login');
INSERT INTO action(person_name, type) VALUES ('emily','logout');
INSERT INTO action(person_name, type) VALUES ('nicki','password_change');
INSERT INTO action(person_name, type) VALUES ('nicki','createpost');
[]

If we want to ask Postgres to give us every user and every action they've performed, we could do it this way:

%sql SELECT person.name,  action.type , action.created FROM action JOIN person ON action.person_name=person.name
name type created
emily login 2014-11-08 17:45:05.963569-05:00
emily pageview 2014-11-08 17:45:05.964663-05:00
arthur login 2014-11-08 17:45:05.965214-05:00
emily logout 2014-11-08 17:45:05.965741-05:00
nicki password_change 2014-11-08 17:45:05.966274-05:00
nicki createpost 2014-11-08 17:45:05.966824-05:00

But then iterating through this recordset is a pain - I can't easily construct a nested for loop to iterate through each person and then through each action.

Enter json_agg()

%sql SELECT person.name,  json_agg(action) FROM action JOIN person ON action.person_name=person.name GROUP BY person.name
name json_agg
arthur [{u'person_name': u'arthur', u'type': u'login', u'id': 3, u'created': u'2014-11-08 17:45:05.965214-05'}]
emily [{u'person_name': u'emily', u'type': u'login', u'id': 1, u'created': u'2014-11-08 17:45:05.963569-05'}, {u'person_name': u'emily', u'type': u'pageview', u'id': 2, u'created': u'2014-11-08 17:45:05.964663-05'}, {u'person_name': u'emily', u'type': u'logout', u'id': 4, u'created': u'2014-11-08 17:45:05.965741-05'}]
nicki [{u'person_name': u'nicki', u'type': u'password_change', u'id': 5, u'created': u'2014-11-08 17:45:05.966274-05'}, {u'person_name': u'nicki', u'type': u'createpost', u'id': 6, u'created': u'2014-11-08 17:45:05.966824-05'}]

Which becomes much more usable in Python:

people = %sql SELECT person.name,  json_agg(action) FROM action JOIN person ON action.person_name=person.name GROUP BY person.name
for name, actions in people:
    print name
arthur
emily
nicki
for name, actions in people:
    print name
    for action in actions:
        print '\t',action['type']
arthur
    login
emily
    login
    pageview
    logout
nicki
    password_change
    createpost

So now we've managed to easily convert relational Postgres data into a hierarchical Python data structure. From here we can easily continue to XML, JSON, HTML or whatever document type suits your need.