Migrate from MySQL to PostgreSQL (Rails)

25th October 2023 – 752 words

Our oldest project, the Empfehlungsbund user dashboard, was running on MySQL until recently. For various reasons that I will go into below, we decided to migrate to PostgreSQL. This is a short write-up of the steps we took to migrate the database.

Why PostgreSQL for us?

  • Almost all of our other projects use PostgreSQL, so having a second SQL dialect in the mix is annoying and confusing whenever we dive into vendor-specific SQL features.
  • Transactional DDL: migrations themselves run inside a transaction. If a migration fails, the database is rolled back to its previous state – especially useful when a migration contains more than one change and a later one fails. With MySQL that is usually a pain, because you have to manually revert (or comment out) the changes that were already applied. With PostgreSQL, we frequently pair a DDL migration (e.g. adding a column) with a data migration (e.g. backfilling the column) in the same transaction, so we can be sure the data is in a consistent state after the migration.
  • We like to use PostgreSQL-specific SQL features, such as JSONB with indexed columns and full-text search with tsvector/tsquery (including reverse tag search).
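To illustrate the transactional-DDL point, here is a hypothetical Rails migration (table and column names are made up) that pairs a schema change with a data backfill. On PostgreSQL the whole thing runs in one transaction, so a failure in the backfill also rolls back the new column:

```ruby
# Hypothetical sketch: DDL and data migration in one step.
# On PostgreSQL, a failure anywhere rolls back the entire migration;
# on MySQL, the add_column would survive a failed backfill.
class AddPreferredLocaleToUsers < ActiveRecord::Migration[7.0]
  def up
    add_column :users, :preferred_locale, :string

    # Data migration in the same transaction as the DDL above.
    execute <<~SQL
      UPDATE users SET preferred_locale = 'de' WHERE preferred_locale IS NULL
    SQL

    # Safe to tighten the constraint only because the backfill above
    # is guaranteed to have run in the same transaction.
    change_column_null :users, :preferred_locale, false
  end

  def down
    remove_column :users, :preferred_locale
  end
end
```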

Migration vs pgloader

Fortunately, there is already an awesome tool, pgloader, that can migrate from various databases to PostgreSQL. It creates tables, indices etc. and copies over the data.

Preparation

pgloader is available in most package managers, but we needed to compile it from source because we ran into issues with the old packaged version.

# remove the packaged version and compile pgloader from source
apt remove pgloader -y
apt-get install -y sbcl unzip libsqlite3-dev make curl gawk freetds-dev libzip-dev
cd /root
git clone https://github.com/dimitri/pgloader.git
cd pgloader
make pgloader
./build/bin/pgloader --help   # smoke-test the fresh build
cp build/bin/pgloader /usr/local/bin/pgloader

Now, also make sure that your database users have access to both databases:

# MySQL side: a user that pgloader can read from
mysql -u root
mysql> CREATE USER 'admin'@'localhost';
mysql> GRANT ALL PRIVILEGES ON *.* TO 'admin'@'localhost' WITH GRANT OPTION;

# PostgreSQL side: a role that may write to the target database
sudo su - postgres
createuser root
psql
postgres=# GRANT ALL ON SCHEMA public TO root;
postgres=# GRANT ALL PRIVILEGES ON DATABASE "YOUR_DB" TO root;

Migration

pgloader can be run directly from the command line, but it also supports a config file in an SQL-like format. In our case:

LOAD DATABASE
        FROM mysql://admin:yourpassword@127.0.0.1/your_db
        INTO postgresql:///target_db
WITH preserve index names, prefetch rows = 100
ALTER SCHEMA 'dbname' RENAME TO 'public';

Run it with:

/usr/local/bin/pgloader -v pgloader.load

The options that we used:

  • preserve index names – keeps index names as-is; otherwise pgloader generates index names with a numeric suffix like index_foo_on_bar_12311
  • prefetch rows = 100 – limits how many rows pgloader fetches into memory at a time, which fixed a heap exhaustion error during import for us
  • ALTER SCHEMA … RENAME TO 'public' – pgloader puts the MySQL database into a PostgreSQL schema named after it, so you probably want to rename that schema to public. This only works if your access rights allow it.
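After the import, a quick spot check that row counts match on both sides can be reassuring. A minimal sketch (database, table, and credential names below are placeholders, not from our setup):

```shell
# Hypothetical spot check: compare row counts for one table on both sides.
# -N/-B suppress headers in mysql; -t/-A print bare values in psql.
mysql -u admin -p'yourpassword' your_db -N -B -e "SELECT COUNT(*) FROM users;"
sudo -u postgres psql target_db -t -A -c "SELECT COUNT(*) FROM users;"
```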

Differences in Dialect

The effort to change SQL access patterns depends entirely on your codebase. In our case, we had to change a few things, especially around date/time handling and JSON access:

| Description | MySQL | PostgreSQL |
| --- | --- | --- |
| Random order | order(Arel.sql("rand()")) | order(Arel.sql("random()")) |
| Group by year of a datetime column | where('year(created_at) = ?', year) | where('extract(year from created_at) = ?', year) |
| Format month in DB | group("DATE_FORMAT(senddate, '%Y-%m')") | group("to_char(senddate, 'YYYY-MM')") |
| Extract JSON keys | where("json_column->>\"$.json_key_boolean\"") | where("(json_column->>'json_key_boolean')::boolean") |
| Truncate to date to group by date | date(senddate) | date_trunc('day', senddate) AS senddate |
| Date arithmetic | where day > DATE_SUB(CURDATE(), INTERVAL 6 month) | where day > (CURRENT_DATE - INTERVAL '6 month') |
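While both databases are still in play (e.g. during a staged rollout), it can help to keep dialect-specific fragments in one place instead of scattering them across scopes. A minimal sketch of that idea – module and method names here are made up for illustration, not part of our codebase:

```ruby
# Hypothetical helper: return the right SQL fragment for the active adapter.
module DialectSql
  def self.random_order(adapter)
    adapter == "mysql2" ? "rand()" : "random()"
  end

  def self.year_of(column, adapter)
    if adapter == "mysql2"
      "year(#{column})"
    else
      "extract(year from #{column})"
    end
  end
end

# In a Rails scope this would be used roughly like:
#   order(Arel.sql(DialectSql.random_order(connection.adapter_name.downcase)))
```

Once the MySQL instance is gone for good, the helper can be inlined back to the plain PostgreSQL variants from the table above.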