MySQL vs. PostgreSQL comparison
Contents
Support is often asked for web hosting which of the two (MySQL and PostgreSQL) is better. As always, the answer is that it'll depend on your requirements.
That said, Anchor uses and recommends PostgreSQL.
Rough comparison of features
MySQL
Pros
- Somewhat lighter-weight
- Has commercial support if wanted, like redhat
- Widespread, plenty of apps that target it
- Blazing fast performance, particularly when predominantly SELECT queries, especially on MyISAM tables
- "Enterprise-grade" clustering and stuff like replication is available
Cons
- MyISAM tables don't support transactions and have little idea about ACID properties. If a MyISAM table breaks (frequent), you need to run myisamchk. This will take multiple aeons on a large table
- MyISAM tables only do table-level locking, so concurrency is bollocks
- Last I checked, subquery performance in MySQL was pretty awful
Triggers are perverse, see Triggers in MySQL for the gory details
PostgreSQL
Pros
- Very feature rich
- Uses multi-version concurrency control, so concurrent performance rocks
Fully ACID compliant
- Well-documented
- Strong access-control framework
Cons
- Frequently perceived as being slower than mysql, but this tends to happen only when doing heavy SELECT queries
- No commercial support
- Less-mature replication software
Glossary
Feel free to add more here as needed. This area is to define DB-specific terms and help answer questions that customer may (ie. should be) asking.
RDBMS
Relational Database Management System
"Lightweight"
Subjective. A lightweight application is one that doesn't use much resources for the features/speed it provides. On a sizeable database, performance may well be blazingly fast, but it's not lightweight if it eats several gig of memory to do it.
Commercial support
As well as the free MySQL that everyone knows and loves, there's Enterprise versions available that include more features, scalability, clustering, support and management tools. There's a feature comparison here, prices start at $600 per server, per year (probably USD).
MyISAM
MyISAM is a lightweight database storage engine used by MySQL. It's the default for new MySQL databases, meaning most hapless users are using it without even knowing it. Well-supported by applications that use it, but definitely not the best thing since sliced bread. MyISAM data is machine and OS-independent, which is nice. However, it doesn't have support for foreign keys (used for inter-table constraints and checks), and no transaction support (these are Very Good things to have). Performs well with mostly SELECT-based loads. Concurrency is poor if UPDATES are performed, due to table-level locking.
InnoDB
Included with MySQL, InnoDB claims to be ACID-compliant and offers row-level locking, transactions, foreign-key constraints and crash-recovery abilities. Concurrency is greatly improved with row-level locking, allowing non-overlapping SELECTs to take place at the same time as UPDATEs. InnoDB is said to be a bit slower and heavier than MyISAM, but this needs to be considered in light of usage requirements. MyISAM is probably unmatched for pure reads, but InnoDB offers much in the way of flexibility and capability. It's worth noting that the InnoDB engine is sometimes disabled in my.cnf by default sometimes. Anchor *strongly* recommends the usage of InnoDB over MyISAM, as when stuff breaks, it's much easier to work with and recovery is much easier.
Replication
To provide greater performance or reliability, MySQL can be replicated from master servers to slave servers. There is much flexibility in how this is done, and MySQL offers funky stuff that can make your brain hurt. As things go, it's also "pretty easy" to setup, probably no more than an hour of remote-hands charges for a customer.
ACID
Stands for Atomicity, Consistency, Isolation, Durability. Atomicity means that all tasks in a transaction (a single statement, or an explicit batch of statements) are either fully completed, or not completed at all. An example would be a money transfer between two bank accounts. If one account is credited, it's vital that the other is debited. Consistency refers to all "rules" being abided by, such as data-type constraints on a column, or assertions on the allowed range of values on a column. Isolation means that events within a transaction are isolated from other transactions until it is completed. It should not be possible to read the database and see an intermediate state of a transaction. Durability ensures that once a transaction has been completed and the application notified, the results will not be lost. This usually means writing to disk.
Multi-version Concurrency Control (MVCC)
PostgreSQL uses a different form of concurrency management, along with its own secret-recipe storage engine. MVCC works by taking a "snapshot" in time at the start of each transaction. This basically means that each transaction is fully isolated from the others. In conjunction with finely-grained row-level locking, this allows for a high level of concurrency and performance. While this in theory can lead to contention and deadlocks, PostgreSQL is able to detect and gracefully handle such situations. See "Isolation" for more details.
Isolation
While the ANSI SQL standard specified four levels of transaction isolation, PostgreSQL offers two, namely READ COMMITTED and SERIALIZABLE. READ COMMITTED ensures that one transaction will only see data that has been committed by other transation. This in theory means the data being read could change while you're in the middle of reading. This is usually not a big problem, however, and is suitable for most applications while allowing good concurrency and speed. SERIALIZABLE ensures that all transactions are executed in serial, guaranteed to be unaffected by others. In practise most transactions will be parallelised where possible, but if any overlap occurs, PostgreSQL will abort one transaction and issue an error.
Row/Table-level locking
Locks are used to prevent concurrent access when one transaction needs to ensure that data will not be changed by other transactions. The MyISAM storage engine can only perform table-level locking, meaning that even a small UPDATE on a large table makes the rest of the table unusable by other transactions. This can have profound effects on usability if operations such as a repair/check are taking place on a live system. InnoDB and PostgreSQL both support row-level locking, meaning only transactions with overlapping areas will affect each other. This is a very good reason to not use MyISAM.
Subquery
An SQL query made as part of another "parent" query to fetch data that will be processed/used in the parent. Most queries can be rewritten to avoid a subquery, which is preferable as subqueries tend to be a bit slow in MySQL.
Access control
MySQL and PostgreSQL both offer forms of access control to connecting clients, and both work a little bit differently. It roughly boils down to "user X can connect from IP address Y and has privileges A,B,C on object Z". Some will find MySQL's manner of doing this a little bit perverse and not entirely straightforward, due to the way it splits this metadata over a number of system tables. PostgresSQL does it a little more simply, and access control is done via a conf-file and standard GRANT/REVOKE statements. The conf-file specifies restrictions on how a given user (now knows as ROLEs) may connect to the database, after which GRANT statements are used to control exactly what they have access to.
