PostgreSQL is goodness
A recent post by my old friend Sandy reminded me that I haven't posted much of anything about PostgreSQL on my blog. About 10 months ago I did some extensive research on open source databases and quickly zeroed in on PostgreSQL. Here in summary form are what's to like about PostgreSQL (PG).
- It's true open sources and free. MySQL is sorta, kinda open source but not to same level as PG.
- PG is very flexible and agnostic. There is nothing special about the function SUM as used in PG versus writing your own SUM function in C and hooking it in. This is also true of datatypes and nearly everything else (like indexes). You can create new datatypes, functions, index types etc. with absolutely no speed penalty and without requiring a recompile of PG.
- PG is very close to the features of Oracle. I use Oracle every day at work and have learned a lot of the ins and outs of that product. No doubt about it, Oracle is very quick, especially when you've got more than 4 processors, but PG is even faster in most cases if you have fewer processors. It's also close because they both use a write-ahead log (thereby supporting MVCC unlikes most other databases which use row or table locking), not to mention stored procedures, updatable views, triggers, basically all types of referential integrity constraints, sub-selects and so on. If ever you needed to migrate your database from PostgreSQL to Oracle you'd be far better off with PG than you would with MySQL, in my opinion.
- Excellent documentation. I once saw a review of databases and the reviewer rated the 'community' by doing a google search and seeing how many hits he got. He got a lot more for Oracle than he did for PG, therefore he said that Oracle is 'better'. The way I read it, is that Oracle's documentation is so poor that you need to access dozens of sites to understand anything, and with PG the source documentation is so good you don't have to go very far. MySQL's docs aren't too bad as well, but PG's is better. I especially enjoyed (!) reading about the internals of PG. You can see the obvious pride that the developers have with PG and they show it through their detailed documentation of how it all works. Understanding the actually algorithms and data structures used can really help when writing programs. When you write a query you can pretty much imagine in your mind's eye how PG is going to handle it. When learning about Oracle, it was all about rules-of-thumb and best practices, you had no idea why you needed to do things a certain way.
- Support for your favorite back-end language. There are certain benefits of having some business rules performed in stored procedures instead of in you client code. It's perhaps even more true in larger companies where one group writes the web pages, another the sales force program, and another the telephone interface. Each is
written in a different language and the business rules are replicated all over the place. With PG you can write these rules in your favorite language (PL/pgSQL, PL/Perl, plPHP, PL/Python,
PL/Ruby, PL/sh, PL/Tcl, PL/R, PL/Java or in C and C++) and store it in the database. An added bonus
is that PL/pgSQL looks a lot like Oracle's PL/SQL, if you're used to
- PG really has no limits. I follow the email@example.com mailing list and am regularly blown away with the size of the machines these people run PG on and how many rows they add every day, like this post:
There are some additional tables that have a few million rows / day of activity, so call it 60M rows/day added and removed.
- PG closely follows the ANSI SQL standard. It does this far better than older versions of MySQL or even Oracle.
- PG has only one type of storage engine. This is in comparison to MySQL which has InnoDB and MyISAM and some others. With a simple test I did in Python I found MySQL and MyISAM to be quite a bit faster than PostgreSQL. But when I tried MySQL with InnoDB it was slower. Basically, I think if you need the features that are included in the InnoDB you'd be better off going with PostgreSQL directly. Some may say that MySQL's multiple storage engines an important feature, but I think it just complicates things and makes you make difficult decisions earlier.
- PG scales well with lots of users. This has to do with the fact that it hardly ever needs to lock a row because of MVCC.
- PG used to be hard to install in Windows, not anymore. I think this is one big reason that MySQL surged ahead of PG historically it was a little tricky to install.
- Select count(*) can be slow. It's still true, but there are ways around it (basically using triggers).
- Some drivers (like the PHP driver) aren't as fast as they could be, but it's getting better all the time.
- PostgreSQL has a funny name.