How to set up a large Magento MySQL database locally with Docker

August 31, 2016 · Magento

Yeah, it's been a while, I know. Recently I needed to export and import the database of a fairly large Magento website. The database dump files were around 15GB in size. The database isn't tremendously large, but things can get messy when you try to set up a local development environment with Magento and Docker. I will cover how we set up Magento on Docker in a different article; here we'll focus on exporting and importing the database and how we achieved it.

The Setup

It's no secret that I love what Mahmoudz has built, so my docker-compose file is largely based on his laradock setup. My local environment essentially runs on Docker and consists of several containers that together power a Magento site: nginx, php-fpm (5.6), redis, and a MySQL server (5.6). We will be focusing on exporting a large MySQL DB from a remote RDS instance, then importing it into our mysql container. Focusing on the mysql container for a minute, we have three main ways to work properly with persistent data:

Option 1: Hold the data inside the container

Import the DB into MySQL and keep it inside the container image; we then have to keep track of that image and make sure we don't accidentally delete it. The main pro here is that there isn't much we need to do and it works out of the box: just import the DB into the mysql container. The biggest con is that we now have to keep track of our mysql container image and, frankly, the container itself; every time we run docker-compose down and docker-compose up we lose the DB changes made inside the container. Performance would also theoretically suffer, since Docker's union file system sits between MySQL and the disk.

 

Option 2: Volume Container

Use a volume container as defined in the laradock files. We essentially point the mysql service at the volumes of a dedicated volumes container (via volumes_from) and define the folders to map inside that volume container. The pros are the simplicity of use, and our docker-compose file stays super easy to read and organized. The downside is that I found this method isn't the most stable; Docker seemed to crash with large DB inserts. Another issue is that CPU usage was super high with this approach.
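A rough sketch of what this pattern looks like in a compose file (service names, image tags and host paths are illustrative, not laradock's exact configuration):

```yaml
version: '2'

services:
    # Dedicated data-volume container; tianon/true is a tiny no-op image
    volumes_data:
        image: tianon/true
        volumes:
            - ./data/mysql:/var/lib/mysql

    mysql:
        image: mysql:5.6
        # Share the volumes declared on the data container
        volumes_from:
            - volumes_data
```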

 

Option 3: Named Volume

Use Docker named volumes. This method was suggested to me directly by @justincormack, who has a presentation about Docker for Mac/Windows (osxfs). I reached out with the issue and he recommended named volumes. The pros: this seems to be the proper way to set up a persistent data volume, and it's one of the new features introduced in version 2 of the docker-compose yml format. Once it's set up in your yml file, it's super easy to share the volume with other containers. It's definitely persistent; despite what some documentation out there suggests, it actually stays on your HD even after you run docker-compose down. It was stable and used very little CPU power. The one and only con I was able to find is that it was surprisingly slower than volume containers. I'm not sure if I can optimize it for performance, but for now it was slower.
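A minimal sketch of the named-volume approach in the compose v2 file format (the volume name mysql-data is arbitrary):

```yaml
version: '2'

services:
    mysql:
        image: mysql:5.6
        volumes:
            # Mount the named volume at MySQL's data directory
            - mysql-data:/var/lib/mysql

# Named volumes are managed by Docker and survive docker-compose down
volumes:
    mysql-data:
        driver: local
```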

 

From the amount of information above you can guess that I tried all three of these methods. You are right. I tried making the volume container method work, but it always failed at some point and crashed. It also used a ton of CPU power for no apparent reason. I ended up working with named volumes: while it performed relatively slowly and overall took a while to import all that DB, it did work and never failed. Even when MySQL reported errors, the container itself didn't crash. I'm not sure why option 2 performed much better than option 3, but perhaps it's only an indication of an early feature, and near-future releases will bring performance enhancements. Only time will tell.

The Tools

Ok, up until now I described our dev environment and how we set it up. However, in order to export and import the larger-than-usual DB I had to dump mysqldump in favor of mydumper – an open-source tool, maintained with help from the Percona team, that's available for free on GitHub. The cool thing with mydumper is that you no longer need to compile it locally; instead I used a publicly available Docker image, which made my life so much easier (go Docker!).

Exporting on the remote:

Exporting turned out to be a bit easier than expected. I installed Docker on the production web node, started the daemon, and issued a single mydumper command.
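A rough sketch of that command (the image name, RDS endpoint, credentials and paths here are placeholders; the mydumper flags themselves are standard):

```bash
# Run mydumper from a container on the prod web node, writing the dump
# to ./dump on the host while connecting to the remote RDS endpoint
docker run --rm \
    -v "$(pwd)/dump:/backup" \
    some/mydumper-image \
    mydumper \
        --host=your-db.abc123.us-east-1.rds.amazonaws.com \
        --user=magento \
        --password=secret \
        --database=magento \
        --threads=4 \
        --outputdir=/backup
```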

We then get a folder full of files: two files per table, plus a metadata file. I gzipped the folder and copied it over to my local dev environment.

Importing locally:

I ran the mysql container through docker-compose in daemon (detached) mode and had it listening on 127.0.0.1:3306. Exposing the port isn't necessary for this tutorial, but it is helpful if you want to check on the status of the server with tools like MySQL Workbench. I then added mydumper as a service inside the same docker-compose file so that both services share the same network, and used myloader, which ships in the same mydumper Docker image, complements the mydumper tool and imports the data. Notice the cool docker-compose run below, which allows you to run one-off commands in containers listed inside the docker-compose file.
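A sketch of that one-off invocation (the service name mydumper, credentials, database name and dump directory are placeholders; the myloader flags themselves are standard):

```bash
# One-off run of the mydumper/myloader service defined in docker-compose.yml;
# --host points at the mysql service name, which resolves over the shared network
docker-compose run --rm mydumper \
    myloader \
        --host=mysql \
        --user=root \
        --password=secret \
        --database=magento \
        --directory=/backup \
        --threads=8 \
        --overwrite-tables
```

The --threads flag here is the knob discussed in the next section, and /backup would be the folder of mydumper output files mounted into the container.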

 

The Process


The process is relatively simple: using the mydumper Docker image, we dump the DB on the remote production server. Notice we're using the prod main web node to write the dump files locally while connecting to the remote DB, just as the mysql client allows with a remote host/user/pass. We gzip the folder and use wget to download it onto the dev machine. We then run mysql as a daemon container and run myloader inside the mydumper container.

Here is where it gets interesting: Docker containers are built to run one process at a time, however both mysql and mydumper/myloader are built for multi-threading. So we can in fact scale up the number of threads used by myloader (using the --threads command line argument), and mysql will automatically adjust and spawn new threads to support the incoming traffic, even inside the Docker container.
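If you want to watch this happen without Workbench, you can query the mysql container directly with standard MySQL status commands (service name and credentials are the same placeholders as above):

```bash
# Peek at the threads MySQL spawns inside the container while myloader is running
docker-compose exec mysql \
    mysql -uroot -psecret -e "SHOW STATUS LIKE 'Threads_%'; SHOW PROCESSLIST;"
```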

MySQL Workbench showing the current threads running on the MySQL server: multi-threaded, even while running inside a Docker container.

Success!

A couple of hours later we had all the data in our Docker MySQL setup. The interesting part is that I did encounter an issue: it seems we have DB corruption somewhere, because one of the threads inserting data into one of the bigger tables halted and later reported a semaphore error. However, since myloader is multi-threaded (unlike a plain mysqldump-style import), the rest of the tables completed successfully. Luckily, that specific table wasn't too important for what we were trying to do. Mission accomplished!

Thoughts & Tips

  • The conclusion here is to use named volumes. However, if you are working with a smaller MySQL DB or starting a new project from scratch, a volume container will work just fine, and for some odd reason it will be slightly faster.
  • The biggest gain in total import time comes from truncating certain large tables at the source. If you know Magento well enough, you know that certain tables are huge but only hold indexing, log, or report data that can be regenerated later. I cut the total size by 50% by truncating such unused tables on the live RDS (see the sketch after this list).
  • One of the biggest takeaways here is that while Docker's philosophy is single-process containers, we can still leverage multi-threading for a huge overall performance boost. I think this is also why nginx is so much easier to deploy than Apache and will probably have a better performance footprint.
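As referenced in the second tip above, here is a sketch of the kind of truncation I mean for a Magento 1 database. The endpoint and credentials are placeholders, and the listed log/report tables are ones Magento repopulates on its own; double-check against your installation before truncating anything:

```bash
# Truncate Magento 1 log/report tables on the source before dumping;
# these tend to be huge and hold data Magento regenerates on its own
mysql -h your-db.abc123.us-east-1.rds.amazonaws.com -u magento -psecret magento -e "
    TRUNCATE TABLE log_customer;
    TRUNCATE TABLE log_visitor;
    TRUNCATE TABLE log_visitor_info;
    TRUNCATE TABLE log_visitor_online;
    TRUNCATE TABLE log_url;
    TRUNCATE TABLE log_url_info;
    TRUNCATE TABLE log_quote;
    TRUNCATE TABLE report_event;
    TRUNCATE TABLE report_viewed_product_index;
"
```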

 
