trvrm.github.io

Postgres Timestamps

Thu 01 January 2015


At my company, I maintain a large distributed. data collection platform . Pretty much every record we collect needs to be stamped with a created field. But because the incoming data comes from sources on various devices in multiple countries and timezones, making sure that the timestamps are precise and meaningful can be a challenge. Postgres can do this very elegantly, but can also trip you up in subtle ways.

Postgres has two subtly different timestamp data types: TIMESTAMP and TIMESTAMP WITH TIMEZONE .

The former stores year/month/day/hour/minutes/second/milliseconds, as you’d expect, and the later ALSO stores a timezone offset, expressed in hours.

We can switch between the two using the AT TIMEZONE syntax, but, and here is the tricky bit the function goes BOTH WAYS, and you can easily get confused if you don’t know what type you’re starting with.

Furthermore, Postgres will sometimes sneakily convert one to the other without you asking.

%load_ext sql
%config SqlMagic.feedback=False
%%sql
postgresql://testuser:password@localhost/test
u'Connected: testuser@test'
%sql SELECT NOW();
now
2014-08-18 22:33:58.998549-04:00

now() returns a TIMESTAMP WITH TIME ZONE . It shows the current local time, and the offset between that time and UTC (http://time.is/UTC)

But if we put the output from now() into a field that has type TIMESTAMP we will get a silent conversion:

%sql SELECT NOW():: timestamp
now
2014-08-18 22:33:58.998549

Which is not the current UTC time. We have stripped the timezone offset right of it. However, if we explicitly do the conversion, we get:

%sql SELECT NOW() AT TIME ZONE 'UTC';
timezone
2014-08-19 02:33:58.998549

Which is the current UTC time: (http://time.is/UTC)

It's worth reviewing the Postgresql documentation on this construct at this point.

Expression

Return Type

Description

timestamp without time zone AT TIME ZONE zone

timestamp with time zone

Treat given time stamp without time zone as located in the specified time zone

timestamp with time zone AT TIME ZONE zone

timestamp without time zone

Convert given time stamp with time zone to the new time zone, with no time zone designation

The danger here is that the AT TIMEZONE construct goes both ways. If you don't know what type you're feeding in, you won't know what type you're getting out. I've been bitten by this in the past; ending up with a timestamp that is wrong by several hours because I wasn't clear about my inputs.

Specifically, consider a table that looks like this:

%%sql
DROP TABLE IF EXISTS test;
CREATE TABLE test(name TEXT, created TIMESTAMP DEFAULT NOW());

Which I then populate:

%%sql
INSERT INTO test (name) VALUES ('zaphod beeblebrox');
INSERT INTO test(name,created) VALUES('ford prefect',now() at time zone 'utc');
SELECT * FROM test;
name created
zaphod beeblebrox 2014-08-18 22:34:03.620583
ford prefect 2014-08-19 02:34:03.621957

Note that the second record contains the current UTC time, but the first contains the current time local to the database server. This seems a good idea, and tends to work fine in local testing. But when you try to maintain a system where the database may be in one province, the data collected in another, and then reviewed in a third, you start to understand why this is too simplistic.

The fact that it's 10:12 now in Toronto isn't very helpful for a record that's getting created for a user in Halifax and is monitored from Vancouver.

So it's probably best to save timestamps WITH their timezone so as to avoid any ambiguity. This is the recommendation given here.

In our above example, the simplest approach is to change the table definition:

%%sql
DROP TABLE IF EXISTS test;
CREATE TABLE test(name TEXT, created TIMESTAMP WITH TIME ZONE DEFAULT (NOW() ));
%%sql
INSERT INTO test (name) VALUES ('zaphod beeblebrox');
INSERT INTO test(name,created) VALUES('ford prefect',now() );
SELECT * FROM test;
name created
zaphod beeblebrox 2014-08-18 22:35:15.988764-04:00
ford prefect 2014-08-18 22:35:15.989726-04:00

So now the dates are globally meaningful. But I still have to be careful, because if I use the wrong date format to populate this table, it'll still get messed up.

%sql INSERT INTO test(name,created) VALUES ('arthur dent',now() at time zone 'utc')
%sql SELECT * FROM test;
name created
zaphod beeblebrox 2014-08-18 22:35:15.988764-04:00
ford prefect 2014-08-18 22:35:15.989726-04:00
arthur dent 2014-08-19 02:35:15.990308-04:00

Note how arthur dent has completely the wrong created time.

Now, if I want to report on this data, I'm going to now have to specify which timezone I want the dates formatted too:

%sql delete from test WHERE name='arthur dent';
%sql select name, created FROM test;
name created
zaphod beeblebrox 2014-08-18 22:35:15.988764-04:00
ford prefect 2014-08-18 22:35:15.989726-04:00

gives me timestamps formatted in the timezone of the database server, which isn't necessarily particularly helpful, which may be helpful, but will be less so if the actual users of the data are in a different time zone.

%sql  SELECT name, created at time zone 'utc' FROM test;
name timezone
zaphod beeblebrox 2014-08-19 02:35:15.988764
ford prefect 2014-08-19 02:35:15.989726

gives me the time formatted in the UTC timezone, and

%sql select CREATED at time zone 'CST' FROM test;
timezone
2014-08-18 20:35:15.988764
2014-08-18 20:35:15.989726

gives me the time formatted for central standard time.

external data

Now so far we've been letting the database create the timestamps, but sometimes we want to save data provided to us from an external source. In this case it's very important the we know what timezone the incoming data comes from. So our middleware should require that all dates include a timestamp. Fortunately, if we're writing javascript applications, we get this automatically:

%%html
<div id="js-output"></div>
%%javascript
var d = JSON.stringify(new Date())
"2014-08-19T02:41:12.872Z"
import psycopg2,pandas
def execute(sql,params={}):
    with psycopg2.connect(database='test') as connection:
        with connection.cursor() as cursor:
            cursor.execute(sql,params)

So let's imagine that we got this string submitted to us by a client, and we're going to store it in the database via some Python code.

sql="INSERT INTO test (name, created) VALUES ( 'externally created date', %(date)s)"
params=dict(date="2014-08-19T02:35:24.321Z")
execute(sql,params)
%sql SELECT * FROM test
name created
zaphod beeblebrox 2014-08-18 22:35:15.988764-04:00
ford prefect 2014-08-18 22:35:15.989726-04:00
externally created date 2014-08-18 22:35:24.321000-04:00

And now we're getting to the point where all our timestamp data is both stored and displayed unambiguously.