Posts Tagged ‘sql’

Bypassing DISTINCT’s ORDER BY Requirement with Sub Query Wrapper

// May 12th, 2011 // No Comments » // Digital Walkabout

When using DISTINCT in a query to weed out duplicates (which is necessary often in PostgreSQL, since it has the very tiresome rule of making a simple GROUP BY pointless by requiring that all fields being pulled be included in the GROUP BY expression), its important to use smart, efficient methodology.

Using DISTINCT can be very simple and very powerful, but can quickly get annoying when you start off the beaten path unless you employ some simple SQL techniques.

Lets say we have a table called employees that has several fields including (but not limited to) id, name, and clearance_level (whether id is a primary key or not is un-important, as well as whether or not the other fields are integers or strings or blah blah blah.)

SELECT DISTINCT ON (E.id) id, name, clearance_level FROM employees E ORDER BY id

Which is fine and dandy, if you want the query to return them by the order of their id. However, I’m sure you’ve found that when you attempt something like say:

SELECT DISTINCT ON (E.id) id, name, clearance_level FROM employees E ORDER BY clearance_level

or

SELECT DISTINCT ON (E.id) id, name, clearance_level FROM employees E ORDER BY name

…then you’ll find yourself in sad or dire straits.

Fear not! We can bypass this nasty little requirement of DISTINCT by making the original query a sub query and then re-order that data. Not only that, it is crazy simple:

SELECT * FROM
(SELECT DISTINCT ON (E.id) id, name, clearance_level FROM employees E) xyzzy
ORDER BY clearance_level;

Voila!

You can see, we simply turned our original query into a sub-query, from which we are collecting everything, and then referring to it as xyzzy (of course you can name it anything you please) and then ordering that data by the clearance_level.

I would recommend that delimiters that can be used in the initial query be used so, so as to allow the secondary (outer) ORDER BY statement to run as fast as possible.

Updating PostgreSQL fields using multiple tables

// April 29th, 2011 // 2 Comments » // Digital Walkabout

I was involved in a small project where I had to crawl through several systems to mine data, organize it, and filter it down so it could be placed into a database. The actual crawl took many days to complete and unfortunately, a small irregularity occurred during the process. Simply put, INSERTs were being used when in fact UPDATEs with a check should have been implemented.

Imagine my dismay when I found that, although the information was all in the database, it was floating around jumbled and would be difficult to re-organize, so much so that I almost feared that the a re-crawl could be necessary.

Of course, that would have been insane; after all, once you have your data in a database, there should always be some fancy, efficient way of re-organizing it.

At one point during the process, I needed to able to take data from one field and assign it to a similar field in a different table. To do so I required an UPDATE statement of a caliber I had never used (or needed) before.

At first I tried funny (and terrible) variations like:

UPDATE table_a AS a, table_b AS b SET a.this_field = a.thatfield WHERE a.some_id = b.some_id

I’m sure you’re laughing if you can see I was under the delusion that I could use an UPDATE statement like a normal SELECT statement.

Anyhow, after doing a little digging the solution is still elegant and simple simply requiring something still closely resembling an INSERT query.

I ended up using this format instead:

UPDATE private."MyData" MD SET thisid = C.thisid FROM private."ComparisonData" C WHERE MD.thatid = C.thatid AND (other statements..)

As you can see I’m using a Schema called private which you can happily ignore, and some of the tables and fields’ names have been changed to protect the innocent (or perhaps in this case, the guilty :)

Code Snippets: PHP Function sqldate2timestamp

// October 2nd, 2010 // No Comments » // Code Codex

There are times when I have Dates variables which I’ve retrieved through SQL and they need manipulation sometimes later on in rare circumstances. In an environment where pulling too much data from the SQL DB (extra variations of a field) is bad, but pulling the data in its most basic form (timestamp from the start) costs too many CPU cycles (nobody ever said scripting languages were fast), a function like this comes in handy.

An example:

$my_ts = sqldate2timestamp($sql_date);
echo $my_ts;
1989348943830 (or whatever ;-)

Here’s the Code:

//////////////////////////////////////////////////////////
// FUNCTION: sqldate2timestamp
// ///////////////////////////////////////////////////////
// Description:
//
// This function takes an SQL date (e.g. 12/31/2001) and
// converts it into a timestamp.
//
// Note: None
// ///////////////////////////////////////////////////////
// Returns: long integer
//////////////////////////////////////////////////////////
function sqldate2timestamp($date)
{
    return mktime(1, 1, 1,
                  substr($date, 0, 2),
                  substr($date, 3, 2),
                  substr($date, 6, 4));
}

Disgustingly simple, but elegant at the same time. (Like all code should be IMHO)