trvrm.github.io
Thu 01 January 2015
I've been using the new JSON functionality in Postgres a lot recently: I'm fond of saying that Postgresql is the best NoSQL database available today. I'm quite serious about this: having used key-value and JSON stores such as CouchDB in the past, it's amazing to me how the Postgres developers have managed to marry the best of traditional relational technology with the flexibility of schema-free JSON documents.
As of version 9.3, postgres allows you to create JSON column types, and provides a number of functions to access and iterate through the data stored in them.
This week I discovered another hidden gem -
json_agg()
. This
function lets you take the results from an aggregation operation and
convert them into a JSON block - very helpful if you're then going to
work with the returned data in a language like Python
To demonstrate this, we'll first set up some simple tables.
%load_ext sql
%config SqlMagic.feedback=False
%%sql
postgresql://testuser:password@localhost/test
u'Connected: testuser@test'
%%sql
CREATE TABLE IF NOT EXISTS person (
name text primary key
);
INSERT INTO person (name) VALUES
('emily'),('arthur'),('nicki'),('oliver')
;
[]
We can query this in the usual way:
%sql SELECT * FROM person;
name |
---|
emily |
arthur |
nicki |
oliver |
But we can also use
json_agg()
%sql SELECT json_agg(name) FROM person
json_agg |
---|
[u'emily', u'arthur', u'nicki', u'oliver'] |
Which gives us a single object to work with. So far, this isn't
particularly helpful, but it becomes very useful when we start doing
JOINS
%%sql
CREATE TABLE IF NOT EXISTS action(
id serial primary key,
created timestamp with time zone default now(),
person_name text references person,
type text not null
);
INSERT INTO action(person_name, type) VALUES ('emily','login');
INSERT INTO action(person_name, type) VALUES ('emily','pageview');
INSERT INTO action(person_name, type) VALUES ('arthur','login');
INSERT INTO action(person_name, type) VALUES ('emily','logout');
INSERT INTO action(person_name, type) VALUES ('nicki','password_change');
INSERT INTO action(person_name, type) VALUES ('nicki','createpost');
[]
If we want to ask Postgres to give us every user and every action they've performed, we could do it this way:
%sql SELECT person.name, action.type , action.created FROM action JOIN person ON action.person_name=person.name
name | type | created |
---|---|---|
emily | login | 2014-11-08 17:45:05.963569-05:00 |
emily | pageview | 2014-11-08 17:45:05.964663-05:00 |
arthur | login | 2014-11-08 17:45:05.965214-05:00 |
emily | logout | 2014-11-08 17:45:05.965741-05:00 |
nicki | password_change | 2014-11-08 17:45:05.966274-05:00 |
nicki | createpost | 2014-11-08 17:45:05.966824-05:00 |
But then iterating through this recordset is a pain - I can't easily construct a nested for loop to iterate through each person and then through each action.
Enter
json_agg()
%sql SELECT person.name, json_agg(action) FROM action JOIN person ON action.person_name=person.name GROUP BY person.name
name | json_agg |
---|---|
arthur | [{u'person_name': u'arthur', u'type': u'login', u'id': 3, u'created': u'2014-11-08 17:45:05.965214-05'}] |
emily | [{u'person_name': u'emily', u'type': u'login', u'id': 1, u'created': u'2014-11-08 17:45:05.963569-05'}, {u'person_name': u'emily', u'type': u'pageview', u'id': 2, u'created': u'2014-11-08 17:45:05.964663-05'}, {u'person_name': u'emily', u'type': u'logout', u'id': 4, u'created': u'2014-11-08 17:45:05.965741-05'}] |
nicki | [{u'person_name': u'nicki', u'type': u'password_change', u'id': 5, u'created': u'2014-11-08 17:45:05.966274-05'}, {u'person_name': u'nicki', u'type': u'createpost', u'id': 6, u'created': u'2014-11-08 17:45:05.966824-05'}] |
Which becomes much more usable in Python:
people = %sql SELECT person.name, json_agg(action) FROM action JOIN person ON action.person_name=person.name GROUP BY person.name
for name, actions in people:
print name
arthur emily nicki
for name, actions in people:
print name
for action in actions:
print '\t',action['type']
arthur login emily login pageview logout nicki password_change createpost
So now we've managed to easily convert relational Postgres data into a hierarchical Python data structure. From here we can easily continue to XML, JSON, HTML or whatever document type suits your need.