Sunday, January 31, 2010

Going ...

I'm going to FOSDEM, the Free and Open Source Software Developers' European Meeting

See you there! Or maybe even there.

Wait ... I have the last slot on Saturday and the first slot on Sunday?!? Great! :^)

Monday, January 25, 2010

PostgreSQL: The Universal Database Management System

I'm glad you asked, since I've been pondering this for a while.  $subject is my new project slogan.  Now I'm not sure whether we can actually use it, because a) it's stolen from Debian, and b) another (commercial, proprietary) database product already uses the "universal database" line.

I have come to appreciate that the "universality" of a software proposition can be a killer feature.  For example, Debian GNU/Linux, the "universal operating system", might not be the operating system that is the easiest to approach or use, but once you get to know it, the fact that it works well and the same way on server, desktop, and embedded ensures that you never have to worry about what operating system to use for a particular task.  Or Python, it's perhaps not the most geeky nor the most enterprisy programming language, but you can use it for servers, GUIs, scripting, system administration, like few other languages.  It might as well be the "universal programming language".  A lot of other software is not nearly universal, which means that whenever you move into a new area, you have to learn a lot of things from scratch and cannot easily apply and extend past experiences.  And often the results are then poor and expensive.

The nice thing about PostgreSQL is that you never have to worry about whether to use it, because you can be pretty sure that it will fit the job.  Even if you don't care whether something is "open source" or "most advanced".  But it will fit the job.  The only well-known exception is embedded databases, and frankly I think we should try to address that.

Tuesday, January 12, 2010

Procedural Languages in PostgreSQL 8.5: The One That Works!

While much of the PostgreSQL hacker world is abuzz over two-letter acronyms (HS, SR, VF), I will second Andrew's post and will generalize this to say, partially tooting my own horn, of course, that the next PostgreSQL release will be a great one for procedural languages. Behold:
  • PL/pgSQL is installed by default.
  • New DO statement allows ad hoc execution of PL code.
  • PL/pgSQL finally got a sane parser.
  • PL/Perl got a shot in the arm.
  • PL/Python got saner data type handling, Unicode support, and Py3k support.
  • Not directly related, but the coming PL/Proxy features are looking promising as well.
  • (Meanwhile, language historians will be interested to know that PL/Tcl has received exactly zero feature or bug-fix commits since 8.4.)
This will be a great boost for PostgreSQL the development platform.

Monday, January 4, 2010

Remove and Purge

Debian's package manager dpkg has the perhaps unique feature that it distinguishes between removing and purging a package. Removing it removes the program files but keeps the configuration files (and sometimes the logs) around, purging it really removes everything. While this distinction undoubtedly has some uses, I have found that I almost never make use of it. I think in about six years of using Debian I have actually needed a remove-but-not-purge functionality about five times, during some really tricky upgrades (and using Aptitude instead of APT might have helped, not sure) and once when I wanted to build a package that had a build dependency that conflicted with a package I had installed (cowbuilder came later).

I think many people don't fully realize this distinction, and thus aged systems will often contain dozens or hundreds of removed-but-not-purged packages lying around. Great fun cleaning that up. And therefore, at some point in the distant past I have switched all my APTs to purge by default, using the configuration setting Apt::Get::Purge "true";. At the time I thought this would be daring, but I have never looked back. The one time a year that I don't want to purge I override this by hand.

Later, APT actually got an apt-get purge command, but there is no apt-get autopurge and no apt-get dist-upgrade-and-purge (or whatever) to purge the packages it wants to remove. This can be worked around by carefully adding --purge to all invocations of apt-get, but who will remember that. And of course apt-get remove is hardwired into my fingers.

How do other people handle this? Are there undiscovered reasons removing is the better default? How do you clean up packages that were forgotten to be purged?

Sunday, January 3, 2010

Missing Features for PostgreSQL SQL Conformance

A thought to start the new year: Perhaps it's time for the final push to complete the core SQL conformance for PostgreSQL.

Where do we stand? The PostgreSQL documentation lists in its appendix the currently supported and unsupported SQL features. As explained there, a certain subset of these features represents the "Core" features, which every conforming SQL implementation must supply, while the rest is purely optional. The unsupported features page currently lists 14 remaining Core features and subfeatures that are missing from PostgreSQL. Two of those are about client-side module support that is actually not mandatory if the implementation provides an embedded language (e.g., ECPG), so there are 12 items left.

So that's not so bad. Here's a list of the missing features:

E081-09 USAGE privilege

This would mean adding a USAGE privilege to domains.

Maybe this isn't very useful, although perhaps those working on SELinux support might have a more qualified opinion on it.  But let's say if we get all the other things done and this is left, this would be a fairly straightforward and well-defined feature to add.

(This would then complete feature E081 Basic Privileges.)

E153 Updatable queries with subqueries

This presupposes updatable views and requires views to be updatable even if their WHERE clause contains a subquery.

This is probably the big one. In the current PostgreSQL architecture, updatable views are apparently quite difficult to implement correctly. The mailing list archives contain plenty of details.

F311-04 CREATE VIEW: WITH CHECK OPTION

This also presupposes updatable views and requires the CHECK OPTION feature. See above.

(This would then complete feature F311 Schema definition statement.)

F812 Basic flagging

This feature means that there should be some implementation-specific facility that raises a notice or warning when a not standard-conforming SQL statement or clause is used. Or in other words a facility that warns when a PostgreSQL extension is used.

A naive implementation might consist of just adding something like elog(WARNING, "not SQL standard") in about five hundred places, but the trick would be to implement it in a way that is easy to maintain in the future. The mailing list archives also contain some discussions about this, key word "SQL flagger".

S011 Distinct data types

This is a way to define user-defined types based on existing types, like
CREATE TYPE new AS old;
Unlike domains, this way the new type does not inherit any of the functions and operators from the old type. This might sound useless at first, but it can actually create better type safety. For example, you could create a type like
CREATE TYPE order_number AS int;
while preventing that someone tries to, say, multiply order numbers.

The implementation effort would probably be similar to that for domains or enums. Also, search the mailing list archives for "distinct types".

(This includes feature S011-01 USER_DEFINED_TYPES view.)

T321 Basic SQL-invoked routines

There are a number of bits missing from fully SQL-compatible SQL function definitions, besides the specific subfeatures mentioned below.
  • Instead of a routine body like AS $$ ... $$, allow one unquoted SQL statement as routine body (see example below under RETURN).
  • LANGUAGE SQL is the default.
  • SPECIFIC xyz clause, allowing the assignment of an explicit "specific routine name" that can be used to refer to the function even when overloaded. Probably not terribly useful for PostgreSQL.
  • DETERMINISTIC / NOT DETERMINISTIC clause. DETERMINISTIC means the same as IMMUTABLE in PostgreSQL; NOT DETERMINSTIC is then STABLE or VOLATILE.
  • CONTAINS SQL / READS SQL DATA / MODIFIES SQL DATA clause. These also appear to overlap with the volatility property in PostgreSQL: MODIFIES would make the function volatile, READS would make it
    STABLE.
Also, for DROP FUNCTION the ability to drop a function by its "specific name" is required:
DROP SPECIFIC FUNCTION specific_name;
There are probably some more details missing, so part of finishing this item would also be some research.

T321-02 User-defined stored procedures with no overloading

Add a new command CREATE PROCEDURE that does that same thing as CREATE FUNCTION .. RETURNS void, and a DROP PROCEDURE command.

T321-04 CALL statement

Add a new command CALL procname() that does the same thing as SELECT procname() but requires procname() to not return a value, meaning it has to be a procedure in the above sense.

T321-05 RETURN statement

Add a new command RETURN callable only from within SQL functions. Then, instead of writing a function like
CREATE FUNCTION name(args) RETURNS type LANGUAGE SQL
AS $$ SELECT something $$;
write
CREATE FUNCTION name(args) RETURNS type LANGUAGE SQL
RETURN something;

That's it! Plus all the stuff I missed, of course. We only have about 2 weeks left(!) until the final commit fest for the 8.5 release, so it's a bit late to tackle these issues now, but maybe for the release after that?