Monthly Archives

May 2011

Bypassing DISTINCT’s ORDER BY Requirement with Sub Query Wrapper

When using DISTINCT in a query to weed out duplicates (which is necessary often in PostgreSQL, since it has the very tiresome rule of making a simple GROUP BY pointless by requiring that all fields being pulled be included in the GROUP BY expression), its important to use smart, efficient methodology.

Using DISTINCT can be very simple and very powerful, but can quickly get annoying when you start off the beaten path unless you employ some simple SQL techniques.

Lets say we have a table called employees that has several fields including (but not limited to) id, name, and clearance_level (whether id is a primary key or not is un-important, as well as whether or not the other fields are integers or strings or blah blah blah.)

SELECT DISTINCT ON (E.id) id, name, clearance_level FROM employees E ORDER BY id

Which is fine and dandy, if you want the query to return them by the order of their id. However, I’m sure you’ve found that when you attempt something like say:

SELECT DISTINCT ON (E.id) id, name, clearance_level FROM employees E ORDER BY clearance_level

or

SELECT DISTINCT ON (E.id) id, name, clearance_level FROM employees E ORDER BY name

…then you’ll find yourself in sad or dire straits.

Fear not! We can bypass this nasty little requirement of DISTINCT by making the original query a sub query and then re-order that data. Not only that, it is crazy simple:

SELECT * FROM
(SELECT DISTINCT ON (E.id) id, name, clearance_level FROM employees E) xyzzy
ORDER BY clearance_level;

Voila!

You can see, we simply turned our original query into a sub-query, from which we are collecting everything, and then referring to it as xyzzy (of course you can name it anything you please) and then ordering that data by the clearance_level.

I would recommend that delimiters that can be used in the initial query be used so, so as to allow the secondary (outer) ORDER BY statement to run as fast as possible.