Peter Eisentraut's Blog: 2008

Monday, July 21, 2008

New Job at Sun

On July 22nd, 2008, I will be joining Sun Microsystems as PostgreSQL software engineer. Sun has been a valuable contributor to the PostgreSQL project for a number of years now, and I am looking forward to joining them in this effort. I am glad that I will be able to continue my personal role in the PostgreSQL project with the support of the great resources that Sun provides.

I would like to take this opportunity as well to thank my former colleagues at credativ GmbH for their support of the PostgreSQL project and my own career. I wish them much success in their continued development.

So, I expect that I will have more time to contribute to PostgreSQL development from now on, and both Sun and I have a sizeable backlog of projects and ideas that we would like to realize. Time to get started!

Thursday, June 19, 2008

Schema design and ID fields

I'm sure everyone has created something like this once:

CREATE TABLE a (id int PRIMARY KEY, value text);

CREATE TABLE b (id int PRIMARY KEY, data text, a_id int REFERENCES a (id));

CREATE TABLE c (id int PRIMARY KEY, a_id int REFERENCES a (id), moredata text);

They all have id columns, but they are all different sets of IDs. Moreover, the same set of IDs is sometimes knows as "id", sometimes as "a_id". Some time ago I come to the conclusion that this naming scheme is bad. It makes sense locally within the table, because a.a_id would be redundant. But I think it has advantages to be redundant here anyway.

The first advantage is that you can use this join syntax:

a JOIN b USING (a_id) JOIN c USING (a_id)

If you build denormalized intermediate tables (materialized views) of this type

CREATE TABLE a_b AS SELECT * FROM a, b, c WHERE a.id = b.a_id AND a.id = c.a_id

it will fail because the join will have multiple columns named a_id. Unless using the USING join syntax, it won't know they are actually the same. So you would have to write out the column list instead of the asterisk in the select list.

Also, once your SQL queries become more complicated, say using table aliases and subselects, it is easy to lose track of what foo.id means in a particular context.

For all of these reasons, the general rule arises that a column name should be globally unique within a database schema. This should actually be checkable automatically if you allow duplicates only for columns connected by foreign key constraints.

There is another idea you can play with while developing your database schema. You start by creating (conceptually) one big table containing all the columns that are needed to store your data, which obviously requires globally unique column names. Then you use the decomposition algorithm to normalize this table down to the individual tables that you actually implement. Some might find this preferrable to converting ER diagrams.

Friday, June 13, 2008

Debian Release Goals

I have been tracking the Debian release goals progress for a while now, through bug-squashing parties and using various iterations of wiki pages. It's time to write down some thoughts.

First of all, I think the concept of release goals is a fantastic idea. It may be one of the most important conceptual moves in Debian of late. It allows developers to implement distribution-wide changes without having to negotiate with every package maintainer individually and without having to go through the vicious no-change-without-policy/no-policy-without-established-practice maze. It moves the focus back on operating system development instead of package hoarding.

In the future, I think we should come up with some "sexier" release goal descriptions. If you look over the current list, many of these tasks are implementation-level, without any obvious benefit to users: double-build support? no unmet recommends? Or they have incomprehensible descriptions. The pretty much only release goal that has a real user benefit, namely the piuparts cleanness, ensuring clean distribution upgrades, is both the most cryptic and actually the most neglected one in terms of development attention. Don't get me wrong, all these goals are very valid. For better or worse, the press has started to pick up these release goals to put in their various Debian release candidate timeline pre-announcement articles, and next time around we should have something cool in there. If armel support or KDE 4 integration had been release goals, the marketing effect would be better.

The next thing is tracking bugs. If you have some time on your hands, how do you find a release goal bug that you can help with? I had worked up this wiki page to help myself, but it is still too weird. The fact that bts.turmzimmer.net exists shows that the BTS is lacking features or is too hard to use or both. I'm thinking, we should regularly copy the whole BTS into an SQL database and make queries from there. Think aggregation, grouping, full-text search! I'm not familiar with the BTS internals or the LDAP gateway, but I know my SQL. So if anyone knows the other end of this deal, I'd love to chat about setting something like this up.

So then, what happens to the release goals after the release? Has this been thought through?

Obviously, some release goals become technically self-enforcing. Python 2.5 and GCC 4.3 are now the default, and anything that doesn't work with these will have permanent trouble in the future.

Release goals that implement a distribution-wide feature, such as the LSB-based init system enhancements, will really need to become part of policy after the release. Otherwise they would revert to being normal bugs without any NMU powers and the other special attention given to release goals. So over time, the support for the former release goals across packages would again decline. To keep it up, you'd either have to repeat the release goal indefinitely, which would be silly, especially when there are no problems right now, or you'd have to make it a part of policy to enforce it in the future. Essentially, you'd have to consider most release goals trial policy changes, which become permanent if successful.

Near-examples of the above are the eternal release goals of IPv6 support and LFS support. We really have these under control quite well now -- although I still wouldn't dare to call Debian or any general-purpose Linux distribution fully IPv6-ready -- but nowadays these problems are almost all upstream code problems that Debian can't really fix very well on the packaging side. So under the current approach these release goals will exist forever, or worse, will be dropped from release goal status.

And then of course, some release goals fail to be reached. These should reapply for the next release, if interest persists. The dash goal comes to mind.

By the way, we had 13 release goals. 3 are completely done, 5 are pretty much done (less than 10 bugs), 1 more looks like it could get pretty much done before the release, and 4 will most likely fail to be accomplished.

Saturday, May 31, 2008

VirtualBox on lenny

I'm glad there is so much interest in KDE 4.1. I'm going to give this another try on a fresh system without any previously accumulated cruft.

In the meantime, I notice that the Debian lenny installer beta 1 does not boot in VirtualBox OSE as included in Debian. It freezes right after the "OK, booting the kernel ..." message. I had to run "install noapic nolapic acpi=off" from the boot prompt. (Perhaps you don't need all of these.) Does anyone know the reason for this? It'd be great if this could be fixed; otherwise using VirtualBox will become very annoying in the future, in my estimate.

Friday, May 30, 2008

KDE 4.1 experience

I tried the installation of KDE 4.1 beta 1 on Debian, as described here. I ended up undoing this about an hour later. The environment isn't really usable yet, so my advice to the public is: don't do it, on a system you plan to actually use.

Here is a random assortment of problems I encountered:

You can't configure size and appearance of the panel. This is a long-standing problem, but I thought it was fixed.
In general, there don't appear to be all that many configuration options available to replicate various behaviors that I have become accustomed to.
Rendering problems in the system tray; icons appear on top of each other.
Various kicker widgets don't appear to be available for KDE 4 (yet?).
Most applications don't copy their old settings, but it appears that some do?
KMail doesn't interact well with encrypted IMAP accounts. I didn't try much beyond that, because I have no accounts that are not affected by that.
Clicking on an RSS link in Konqueror doesn't link to Akregator anymore. Probably, various MIME type associations are missing.
I couldn't get the environment to remember to open terminal windows maximized. Very annoying.
I couldn't add any application icons to the panel. I'm not sure where on is supposed to store one's favorites now.

So it was quite obvious that that various pieces don't really fit together all that well yet. On the upside, I noticed many really good ideas that I would look forward to using if the overall environment worked.

The good news is that the packaging appears to be really robust (I found one packaging bug with a file conflict) and the upgrade and downgrade works well (you have to use aptitude, as described in Ana's post; apt craps out completely).
So when KDE 4 is actually usable, Debian will be ready. But not for lenny; that would be suicide in my opinion.

Monday, May 26, 2008

Mission Accomplished

To conclude my expedition to PGCon in Ottawa, today I completed the Ottawa Marathon. A very well organized race to go along with a very well organized conference. Until next year.

Saturday, May 24, 2008

PGCon Day Four

PGCon 2008 is over, and I couldn't help feeling somewhat sad.

The second day of talks also featured many good presentations. My own also went quite well. We concluded the day with a meeting of the core team members who were present in Ottawa, and there will probably be a few actions coming out of that. Don't expect anything too dramatic, though.

Thanks to everyone for this great conference, and I'll see all of you next year if not sooner.

Friday, May 23, 2008

PGCon Day Three

I welcome back my faithful readership to their regularly scheduled program. Thursday was the first day of talks at PGCon, as evidenced by the rise of blog posts on planetpostgresql under a "day 1" label. Welcome newbies! :-)

The talks were really good. I won't go into the details here; the slides should be available on the conference web site. For those not here I will summarize the sentiments of the day as Everyone is hiring. If to you, "vacuum" means cleaning the disk rather than cleaning the house, there appear to be literally dozens of jobs just waiting to be grabbed.

In the evening we had the EnterpriseDB dinner. I think I caught a cold in the draft. Must save my voice for my presentation.

Thursday, May 22, 2008

PGCon Day Two

Wednesday morning, I went geocaching again, attempting to complete the second UO cache, but ended up at a pile of rubble again. Hmm. Maybe this game works differently here in Canada. I hope I get to drop my travel bug before I leave though.

The main event of the day, at least for the major developers, was the developer meeting, which I incidentally managed to locate with my GPS device. I was skeptical about this meeting beforehand, but I have to say, it was extremely useful and enjoyable. The wiki page linked to above contains the meeting minutes. So I guess I'm on the hook now to kill off some PostgreSQL mailing lists, and perhaps we'll have a prototype cmake-based build system for PostgreSQL sometime.

In the evening, everyone met at the Yahoo! drinks+food event. Since I arrived an hour late because I was training for the run on Sunday, I didn't get to see any Yahoo things except a few Yahoo-branded napkins. But hey, I got a free beer and I was able to see the rerun of the Champions League final. Someone came up to me later to thank me for my previous blog entry reminding him to watch the game as well. So FYI: As far as I know, there is nothing important on TV today. :-)

Wednesday, May 21, 2008

PGCon Day One

I went geocaching this morning, because I have a Travel Bug to drop in Canada. But I only found a pile of rubble. Ran into Bruce just getting out of his cab from the airport on the way back.

I presented my tutorial on porting Oracle applications to PostgreSQL in the afternoon. This went quite well, and I received encouraging feedback afterwards. The slides of the presentation should appear on the PGCon web site at the linked URL. For those who were confused by the somewhat cynical tone of the presentation: I have been traumatized a bit by the issue. By and large many applications are quite easy and straightforward to port. I certainly do encourage these efforts.

One thing that came up after the presentation that I have not considered in great depth is the issue of performance of the ported result. In the discussion, a few possibilities were mentioned:

The Oracle application is so carefully tuned with optimizer hints, it will never perform on PostgreSQL. We probably can't/won't port it.

Half the time of a porting project will be required to tune the PostgreSQL port, because Oracle optimizes bad queries much better.

Some things perform better in PostgreSQL, some worse. It probably averages out.

With the money you save with PostgreSQL, you can afford better hardware.

More insight on this issue would be welcome.

I think I'll go on a second geocaching attempt now and hit the Royal Oak pub later with the rest of the group.

Note for those coming from across the pond: The Champions League final is televised on TSN and RDS beginning at 14:00 tomorrow. You get both of these channels in the university residence. But there's the developer meeting ...

Tuesday, May 20, 2008

PGCon Day Zero

We're back in town, Ottawa! I had to go through special interrogation twice at the airport (once for immigration, once for customs), but I made it. Now I'm sitting at the desk in the university residence, staring out the window down onto the city. Not much has changed; it feels like I was here just yesterday. How funny.

Tuesday, April 29, 2008

Lenny release goals wiki page

I created the wiki page http://wiki.debian.org/LennyReleaseGoals to have some clickable and more selective links to bugs pertaining to lenny release goals. I found this helpful for selecting a bug to work on every day; perhaps this is useful for others as well. If someone has a clue about writing proper URLs into the bug tracking system, please contribute any improvements.

Thursday, April 24, 2008

The wiki is great!

Well, I asked for a wiki almost three years ago, but it is finally here in official capacity. Thanks! In the few weeks of its existence, we have already seen great progress in documenting many development issues that were previously undocumented or hidden in strange places. I have started to keep my personal notes and todo items, which were previously spread around various files and pieces of papers, under User:Petere.

I have been observing the wikification in the Debian project for a few years now. I expect that in the near future, all of the contents relevant to developers will have migrated from the main web site to the wiki.

A Sporty PGCon!

I have signed up for the Ottawa Marathon on the weekend after PGCon. If anyone is still around on the Sunday, join me on or at the side of the track.

Monday, April 14, 2008

News on the PostgreSQL RPM packages for SUSE

Reinhard Max has revived the PostgreSQL RPM packages for SUSE and has given them a permanent place at the openSUSE Build Service. This builds on and supercedes my earlier offering, so please go to the new URL in the future. And please send your suggestions and contributions; it is "open" after all. By the way, the packages developed there are integrated into the official SUSE distributions, so it is the "real thing".

Wednesday, March 19, 2008

PostgreSQL RPM packages for SUSE

The PostgreSQL RPM packages for SUSE have been neglected a bit recently. I have put up some packages for 8.3.0 and 8.3.1 at the openSUSE Build Service. Use the search interface to search for "postgresql". This won't be a permanent project of mine, but I'll probably continue as long as I need the packages myself and SUSE or someone else doesn't organize this job more sustainably.

Friday, March 7, 2008

Notification about available package upgrades

I'm sure most people have one of these, but here is my yet-another email-me-when-I-need-to-upgrade-something crontab entry, after I found cron-apt too complex:

0 */8 * * *     apt-get -qq update && apt-get -dqq dist-upgrade && apt-get -qq --simulate dist-upgrade | grep ^Inst

If your root mail goes to some place you read (probably a good idea), you will get a list of packages it wants to upgrade. When running stable, this will also effectively send you alerts about security updates.

Tuesday, March 4, 2008

Readding implicit casts in PostgreSQL 8.3

Obviously, a lot of people are having trouble with the removal of many implicit casts in PostgreSQL 8.3. While this will lead to more robust applications in the future, it will prevent many people from moving to 8.3 altogether at the moment. The problem why you can't simply make the casts implicit again is that in a somewhat unrelated development, 8.3 will generate a cast from and to text automatically (well, implicitly) for any data type, which is why most of the casts in question have been removed from the pg_cast catalog altogether and you can't simply run an UPDATE command to put them back the way you want.

I have used a shell script to regenerate the removed casts including the required cast functions. The result is pg83-implicit-casts.sql. I have tested this against some of the recent "bug reports" I found on the mailing lists, so it should work for some people at least. If someone wants to use this, I suggest you only add the casts you really need, not the whole file.

(Note: This blog entry was recovered after a server crash and does not include any of the original comments. Those comments contained additional important insights about the restoration of the casts, which have unfortunately been lost now. The gist was, only restore the casts you need, not all of them.)

Wednesday, February 27, 2008

Announcing git.postgresql.org

A few helpful people and myself have been working on a Git hosting site for PostgreSQL and related projects at http://git.postgresql.org/, which we hereby let loose on the daring public. It serves mirrors of selected projects as Git pull and web access, and it offers project hosting with push access as well as user branch hosting. My earlier blog entry on patch review explains some of the rationale. We hope you will find this service useful.

Friday, February 22, 2008

On patch review

Improving the patch review process was one of the much discussed topics for the 8.4 release cycle. The recurring commit fests are one idea that will be tried out. Here is what I think. If you want more, better, and earlier review happening, you need to facilitate reviewing in many ways.

We currently have a well-established and arguably successful process for maintaining existing features. We have a public source code repository, test infrastructure, and many people willing and able to contribute. Committers can fix small issues immediately and larger issues with small overhead. Others can work on small issues with the small overhead of sending a patch that will soon get committed, and they can work on large issues, well, that is the debate. If more than one person wants to contribute, they obviously can. We have many hands and eyes working on everything, and that works well.

We don't have any of that for new feature patches, the staging area of development. The source code repository is a set of possibly related multiple 10k patches spread around the mailing list archives. The only way for others to get involved in small or large ways is by chance learning about the feature proposal, fetching the patch, which will usually no longer apply cleanly, alter it and send an equally huge patch back. If more than one person does that, it's impossible, unless you want to get involved in interdiff madness. Meanwhile the original developer has to sit still waiting for review, or more likely, continues development, which invalidates all the review work.

Now we also know that reviewing the core logic of patches is in fact hard and needs skilled people of which we would always like to have more. But much of our CVS-based development works with many little hands, and so should feature patch review.

Here is how I think patch review could work better. Someone writing a feature patch should use a distributed version control system that ties into our currently-CVS trunk. He publishes that repository either via web or via email. Now the little hands can get involved immediately. They can pull local copies and review the commit history. They can send small bug fixes, formatting, wording improvements. They can update the patch for new developments in the mainline or investigate collisions with other features in development. Someone else can start writing documentation. The original developer can merge those changes back and continue based on them. Meanwhile, a useful revision history is always available for more people to get involved. Additionally, by a simple link published in a wiki, say, you have obsoleted all the needs for patch trackers or the like. Other people, perhaps more skilled reviewers can offer early comments and easily review development progress.

That is how I would like to work. I have started to use the Git repository for my development work now because I believe that could make it happen. But you can replace your own technology if you think it can achieve similar effects. (In fact, Git is in my mind so ridiculously better than our current anonymous CVS setup and replaces CVSup as well, there is no reason -- other than the need to learn some new tools -- for anyone who has no CVS write access to even bother with CVS anymore.)

Incidentally, quite a few people have shown interest during the mailing list discussion in trying out source code repositories mirrored off CVS to something of their liking. I hope that way we can find more optimal development methods in the near future.

Thursday, February 21, 2008

Debian PostgreSQL Packaging Project

We have launched the Debian PostgreSQL Packaging Project at http://pkg-postgresql.alioth.debian.org/. We are a group of people interested in maintaining Debian packages related to PostgreSQL.

Obviously, Debian already contains a large number of packages related to PostgreSQL. The idea behind this project is to get the involved maintainers together, concentrate resources, exchange ideas, and allow more people to get involved in small ways. See our web site about participating.

Friday, February 8, 2008

When non ASCII names will finally work? (part 2)

no comment

Thursday, January 31, 2008

CDBS usage statistics

Here is a fun little statistic for CDBS lovers and haters: Percent of packages (source, binary) using CDBS over time:


Sources-sid-main-2005-04        9.76    10.02
Sources-sid-main-2005-05        10.20   10.62
Sources-sid-main-2005-06        10.36   10.76
Sources-sid-main-2005-07        10.76   11.16
Sources-sid-main-2005-08        11.05   11.36
Sources-sid-main-2005-09        11.48   12.32
Sources-sid-main-2005-10        12.08   13.12
Sources-sid-main-2005-11        12.66   13.68
Sources-sid-main-2005-12        13.13   14.13
Sources-sid-main-2006-01        13.29   14.37
Sources-sid-main-2006-02        13.59   15.00
Sources-sid-main-2006-03        13.97   15.34
Sources-sid-main-2006-04        14.38   15.73
Sources-sid-main-2006-05        14.53   16.17
Sources-sid-main-2006-06        14.71   16.30
Sources-sid-main-2006-07        15.17   16.54
Sources-sid-main-2006-08        15.84   17.04
Sources-sid-main-2006-09        16.23   17.07
Sources-sid-main-2006-10        16.60   17.24
Sources-sid-main-2006-11        16.97   17.56
Sources-sid-main-2006-12        17.46   17.90
Sources-sid-main-2007-01        17.66   17.97
Sources-sid-main-2007-02        17.87   18.12
Sources-sid-main-2007-03        18.06   18.38
Sources-sid-main-2007-04        18.27   18.68
Sources-sid-main-2007-05        18.86   19.39
Sources-sid-main-2007-06        19.64   20.66
Sources-sid-main-2007-07        19.79   20.93
Sources-sid-main-2007-08        20.13   21.15
Sources-sid-main-2007-09        20.32   21.33
Sources-sid-main-2007-10        20.81   21.95
Sources-sid-main-2007-11        21.16   22.02
Sources-sid-main-2007-12        21.48   22.17
Sources-sid-main-2008-01        22.12   22.67

World domination in about 2020. :)

Tuesday, January 8, 2008

A Sporty PgCon?

Those who stuck around one or two days after PgCon last year in Ottawa might have noticed that there were a lot of runners in town. As it happens the Ottawa Race Weekend will take place again this year the Saturday and Sunday after the conference. They offer everything from a short walk up to a marathon. I am considering signing up for one of the races this year. If any fellow conference goers want to join up in a type of PostgreSQL team effort, feel free to contact me.

Monday, January 7, 2008

Problems with newer kernels

It has become apparent to me that the Linux kernels in Debian testing and unstable (2.6.2x) have all kinds of problems compared to the one in stable (2.6.18), especially when it comes to working with virtualization. I have been trouble booting newer kernels as guest systems in VirtualBox (bug #434723) and QEMU (and bug #449085). There is some chatter in the Gentoo forums to the same effect. I have also had trouble compiling the host system kernel modules of VMPlayer and VirtualBox with newer kernels. If your job requires regular juggling of virtual machines and operating systems, this creates big problems. Plus my UMTS card stopped working (bug #433750), also confirmed by Gentoo chatter. :) I have also tried to build pure upstream kernels myself, but they have the same problems. So my advice to those running Debian post-stable is to stick to kernel 2.6.18 from stable if you are seeing funny issues.

Saturday, January 5, 2008

Configuration files with Git

A while ago I put the configuration files in my home directory (.bashrc, .psqlrc, etc.) into a Subversion repository so I could keep track of what I changed and distribute the changes to different machines. This worked well, but the choice of Subversion created a number of shortcomings: It was difficult to work with unconnected hosts (no local commits), it is difficult to make changes that apply only to some machines (branches, changesets?), and you need a separate directory for the repository.

Now I have converted this to Git, which appears to address these problems. The repository information is entirely contained in the .git directory in the top level of the working tree. I haven't transferred the tree to a different host yet, so that will be next week's project.

Now I'll go ahead and commit the .gitconfig file into the GIT repository. :)