Category Archives: MySQL

MySQL Performance and Scalability Notes

In Play Problem with MariaDB JDBC Driver Client org.mariadb.jdbc.Driver

My latest Play problem was actually a JDBC driver bug.

With all the fixing (and learning) I’ve been doing the last few days trying to get the hang of Ebean in Play 2.4.x, the last thing I expected was for one of my biggest hangups to be a JDBC bug.  As anyone who has been here knows, I am a huge fan of MariaDB, so don’t take this to mean otherwise.  However, if you are using the MariaDB client 1.3.2 in Play with Ebean, or likely with many other common ORMs, and you get an error like:

[PersistenceException: ERROR executing DML bindLog[] error[getGeneratedKeys error. Statement.RETURN_GENERATED_KEYS must be precised on connection.prepareStatement(String sql, int autoGeneratedKeys) or statement.executeUpdate(String sql, int autoGeneratedKeys)]]

It isn’t you. It is actually a bug. A sharp developer, timowest, pointed this out on GitHub 10 days ago:

Looks like a bug in the driver

The flag Statement.RETURN_GENERATED_KEYS is missing.

I switched to the Oracle driver in application.conf and all works great.  Good find, Tim!  I’ll switch back as soon as this is resolved in the MariaDB client binary.  I’m hoping this has been reported.
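For reference, this is roughly what the driver swap looks like in application.conf (the datasource name "default" and the database name "mydb" here are hypothetical placeholders, not taken from my actual project):

```
# Workaround: use Oracle's Connector/J until the MariaDB client bug is fixed
db.default.driver = "com.mysql.jdbc.Driver"
db.default.url    = "jdbc:mysql://localhost:3306/mydb"

# What I had before, with the MariaDB 1.3.2 client:
# db.default.driver = "org.mariadb.jdbc.Driver"
# db.default.url    = "jdbc:mariadb://localhost:3306/mydb"
```

Since the MariaDB client speaks the MySQL wire protocol, only the driver class and the URL scheme need to change.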

While I’m giving shout outs, thanks also to Matthieu Guillermin for his great post on getting around Ebean optimistic locking in Play.  You can find it here.


Disable Automatic Startup of MySQL on Ubuntu

After installing MySQL or MariaDB on your Ubuntu machine, you will notice that it is configured to start automatically.  The install sets up rc.d entries for it on all runtime levels.  While an idle database server isn’t a huge strain on the machine, why waste memory when you don’t have to?  I generally disable it the way you would any init.d service that you don’t use all the time.  Simply type this at a command prompt:

sudo update-rc.d mysql disable

and your development machine will no longer start the database automatically.  If you are going to be using MySQL a lot and want to re-enable automatic start, simply type the same command, changing disable to enable.  With auto-start disabled, you can start/stop your database service by typing:

sudo /etc/init.d/mysql start

sudo /etc/init.d/mysql stop

sudo /etc/init.d/mysql restart

Simple enough but I had to look it up the first time 🙂

Problems Installing MariaDB 5.5, 10.1 on Ubuntu 12.04, 13.04, 14.04, etc.

Just a quick note to help anyone who runs into the same thing I usually do when installing MariaDB on a dev machine.  Without going into all the hows and whys of the problem, when installing MariaDB on Ubuntu it is highly likely that you will run into problems related to the MySQL client packages already on your machine.  As you go back and forth updating and upgrading, trying to get your machine back to a state in which it will install any software at all, you will likely end up with apt reporting duplicate sources or some sort of conflict between the MariaDB packages you’ve tried to install and the MySQL packages that were already there.  In the end, the whole upgrade process is broken and you still don’t have a working database.  For me, the simple way out, given that I’ve managed to do this to myself again, is to purge both client libraries and try the install once more.  Hopefully this will work for you too.

First, completely get rid of the two conflicting clients:

sudo apt-get --purge remove libmariadbclient1

sudo apt-get --purge remove libmysqlclient18

and then try the MariaDB install again.  It should work now.

sudo apt-get update

sudo apt-get install mariadb-server-10.1

Finally try a

sudo apt-get upgrade

to see that everything is back in order.

All should be back to normal on your Ubuntu 14.04 (Trusty) machine.  Have fun with the best MySQL fork!

MySQL Performance and Scalability Notes Introduction

The MySQL category of this blog consists of a collection of my notes on MySQL Administration for Performance and Scalability.
I am starting with information related to single node systems. Some of the things I intend to address include:
1. Basic scaling concepts
2. Various MySQL forks and engines
3. How to use logs and tools to understand your database’s performance
4. The configuration of your single node MySQL database for optimal performance
5. Raw and logical backups
6. Disaster recovery procedures and considerations
7. The generation of sample data and the de-identification of data for testing and demo purposes.
My goal here is to present effective MySQL-based solutions to the problems that arise when working with extremely large data sets. I will discuss how to eliminate bottlenecks and tune for maximum performance and scalability in single node environments. In addition, I intend to cover preparing for and dealing with failures, including high performance methods of moving and recovering very big databases.

I will note that I’m not much of a web developer, but hopefully this information, while not very artfully delivered, will be of use to someone.  If you are new to MySQL or databases in general, stop reading now!  There are much better ways to spend your weekends.  Oh, and keep in mind that these are basically just notes.  I was not writing a book; I was just compiling a big list of helpful stuff for myself and am now sharing it.  As such, I have plagiarized liberally using copy/paste and done very little in the way of proofreading.  If something is technically wrong or debatable, please point it out and I will fix or qualify it.  If something needs credits, or even removal, please let me know that too.

MySQL Scaling

Types of scaling:
1. Scaling down
2. Scaling up
3. Scaling out

The scalability of a database system, in its most basic sense, is how much work the database can handle.  This, of course, is a measure that depends heavily upon what you are actually doing with the database.  For our discussion, making a database more scalable means increasing its ability to handle the job at hand on very large data sets, whether that job is highly transactional or a data warehouse built only for running giant reports.  Increasing your MySQL database’s ability to do its job with these large data sets can be approached in several different ways without resorting to more nodes and machines.  Next we will discuss, in general terms, the most common approaches to database scaling.


Methods of scaling your database can be grouped into three categories: 1. scaling down, 2. scaling up, and 3. scaling out, which has become very popular as of late.

Scaling Down

Eliminating data from the database.
1. Unused data
2. Old data

Decreasing load on the server
1. Optimizing queries
2. Limiting access


Scaling down is a software solution which involves understanding your goals and current implementation well enough to repair code and eliminate unnecessary data, putting less of a load on your system.  Smart query benchmarking and design (which we will not discuss here, but most likely in another post) can drastically decrease the load on your database server in both transactional and data warehouse scenarios.  We have all experienced the poorly written report (often created by a guy using Access and an ODBC connection) that brings the new $50k server to its knees for hours.  Eliminating this type of mistake is critical.  Often, simply limiting access can be a game changer.  In some cases, users may be running reports against a production server and competing for resources with customers.  Here, creating a data warehouse to offload this function to another machine, or simply saying “don’t do that between these hours,” can scale down the load on your system, increasing its capacity for its intended purpose.  Also, limiting the amount of data in your database can provide great performance improvements.  Very often I have found production databases on large systems to be loaded with data that is not being used or is too old to be relevant.  Intelligently reviewing what is actually stored in your database relative to its purpose, and culling out the unnecessary stuff, will go a long way toward providing acceptable performance with your one-node system.

Scaling Up

1. Upgrading Storage
2. Upgrading Processing
3. Better Servers
4. Optimize Database Configuration
5. Optimize Database Maintenance


Usually when people use this term they are talking about a hardware solution.  Often the most cost effective approach initially, hardware upgrades are a good first step toward scalability.  Buying newer hardware, faster hard drives, more memory, and more or better processors will often be the most cost effective way to increase the capacity of your single node database.  Once you have the best hardware you can afford (or get approved), it is time to turn to software solutions.  Along with hardware upgrades, I also consider software changes performed on a single node to be part of “scaling up.”  Reviewing your database’s configuration and optimizing it can help eliminate bottlenecks in much the same way hardware upgrades do.  Using higher performance maintenance solutions, like the enterprise level tools offered by Oracle and Percona, can also be critical to maximizing the scalability of your database.

Scaling Out or Horizontal Scaling

1. Separating data into different databases
2. Placing data on different servers
3. Clustering
4. Sharding
5. Distributed processing
6. Document databases and key-value stores


Once everything that can be done in terms of scaling down or up has been considered or exhausted, and it is clear that a one node solution just won’t cut it, it is time to consider a multi-node configuration, or scaling out.  Solutions of this type include logically separating related data onto different databases and servers, clustering, sharding, distributed processing solutions like Hadoop and Spark, and switching to NoSQL data stores like document databases and key-value stores.  This is not a decision that should be taken lightly.  Multi-node solutions are inherently complex and difficult to use and maintain.  Very often I see people jump to scaling out in situations where scaling down and up are more than enough for current and all expected future capacity.  It makes the contractors more money, provides interesting challenges to the team, and satisfies the customer by checking off items in their bucket list of buzzwords.  Often, the customer will insist on a multi-node solution with no real understanding of whether or not it is necessary.  While giving these customers what they want can be fun and educational, not to mention profitable, I feel more ethical when given the opportunity to provide the right solution to the problem.  There is definitely a time and place for scaling out, but it can be a very expensive and time-consuming mistake for the larger share of projects out there.  For this reason, it is important to completely understand single node scaling before committing to an expensive, complicated horizontally scaled solution.

That being said, the primary focus of these posts will be the software side of scaling up.  Scaling down (moving lesser used data, splitting data, and eliminating unnecessary data from your database) is a fairly obvious task which doesn’t require much more explanation than has already been given, and scaling up your hardware is equally obvious: when a resource related bottleneck is located, and you have room to upgrade hardware in such a way as to resolve it, it is usually a good idea to do so.  Scaling out is a topic for later posts.

MySQL Forks

1. Oracle
2. Drizzle
3. MariaDB
4. Percona
5. WebScaleSQL
(these are the main ones at least)

MySQL is open source.  People, usually former MySQL employees, have branched off of the open source MySQL codebase with their own versions, usually offering some degree of compatibility with the original product.  In my experience, MariaDB and Percona Server are forks that can be considered drop-in replacements for MySQL while adding functionality and performance.

MySQL (Oracle)

This is the original MySQL database, as well as new development since Oracle’s purchase of the product. There is still a “community” edition; however, Oracle seems committed to differentiating the community (free for non-embedded uses) and enterprise (the one you always pay for) licenses by concentrating its development of advanced features in the enterprise version. The community edition is as good as it’s ever been; however, do not expect modern “big data” style enhancements to come from Oracle for free in the future.  Basically, if you are dealing with 20 million rows or fewer in your largest tables and don’t expect to need advanced features such as clustering or raw backups, the original community edition of this product as Oracle offers it is a fine way to go.  Anything beyond this will require either an investment in Oracle enterprise technology or consideration of another fork.


Drizzle

Drizzle is open source and was forked from the 6.0 development branch of MySQL.  It has a client/server architecture and works with standard SQL.  Drizzle is distributed under the GNU General Public License, with certain drivers and modules issued under the BSD license.


MariaDB

MariaDB is a fork designed to offer a higher performance drop-in replacement for the MySQL product offered by Oracle. I would say it is best known for its advanced query optimizer, which really speeds up certain types of queries without the user having to worry about what has changed.  It was forked by some of the original developers of MySQL who were worried about how the Oracle acquisition would turn out.  The plan for MariaDB is to strive for as high a level of compatibility as possible, including all commands and functions.  While you can use InnoDB with MariaDB, the default storage engine is Percona’s XtraDB, which offers quite a few performance enhancements and fixes for particular types of queries that InnoDB did not seem to like.

Percona Server

Percona Server is another fork aiming for high compatibility with official MySQL releases.  In addition, Percona has focused on performance enhancements and on monitoring tools that aid in administration and query optimization.  Like MariaDB, Percona Server uses XtraDB.  Percona generously offers for free lots of scaling, monitoring, and backup tools rivaling Oracle’s enterprise level products, which are, of course, not free.   My usual choice as of late for a one-node setup is to install MariaDB with XtraDB tables and use Percona tools against it when needed for optimization and replication.


WebScaleSQL

WebScaleSQL is a branch developed in a partnership between Facebook, Google, LinkedIn, and Twitter for the purpose of handling very large data sets horizontally scaled across ridiculous amounts of commodity hardware.   All of these companies, being faced with similar challenges, were wise to pool their efforts in the development of this branch.   WebScaleSQL is also GPL-licensed, and you can access the source on GitHub.


MySQL Engines

1. MyISAM
2. InnoDB and XtraDB
3. Memory
4. Merge
5. Archive
6. Federated
7. NDB

MySQL supports several different types of table engines, also known as “table types.” A database can have its tables be a mix of different table engine types, or all of the same type.
The most common by far are MyISAM and InnoDB.
MyISAM is the older and less sophisticated of the two.  In some ways it’s actually downright primitive; however, I still run into MyISAM tables a lot, so it’s good to be familiar with them.
InnoDB improves on MyISAM by offering modern features such as:
1. Full transactional support
2. Automated crash recovery using logged transactions, which we will discuss later
3. And, most importantly (at least to me), row level locking. MyISAM is limited to table level locking.


MySQL uses a pluggable storage engine architecture that enables storage engines to be loaded into and unloaded from a running MySQL server.

The default storage engine for MySQL as of 5.5 is InnoDB.  It is ACID compliant, which basically means that it is transactionally safe.  It has configurable rollback as well as crash recovery capability.  It performs much better than earlier architectures such as MyISAM in data warehouse and reporting use due to its non-locking reads.  It supports foreign key constraints, which is of course important (just don’t overdo them, because they are also a pain), and uses clustered indexes to speed up queries based on primary keys.

Percona XtraDB is an enhanced version of the InnoDB storage engine, designed to scale better on modern hardware and including a variety of other features useful in high performance environments. It is completely backward compatible with InnoDB.  I have yet to run into an issue in this regard and would swap engines in production without even running a bunch of tests.  It just works.  Like I mentioned before, XtraDB is the default in Percona Server and MariaDB.   However, you can still use InnoDB if you want to in MariaDB (I’m not sure about Percona).

To skip XtraDB in MariaDB and use stock InnoDB instead, add this to my.cnf:

[mysqld]
ignore_builtin_innodb
plugin_load=ha_innodb

# The following should not be needed if you are using a mariadb package:
plugin_dir=/usr/lib/mysql/plugin
MyISAM: The default storage engine prior to MySQL 5.5.  If you run into MyISAM tables, you might want to consider upgrading them.  You can do this one table at a time, since you can run MyISAM and InnoDB/XtraDB tables in the same database.  In some cases MyISAM actually seems to perform better, but this is not usually the case (table level locking, yuck), and furthermore, when things go wrong with MyISAM, they can really go wrong.  Minor corruption/failure events which more modern engines handle in stride will send you searching for backups with MyISAM.
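A rough sketch of that per-table conversion, using made-up table names (in practice you would pull the list of MyISAM tables from information_schema.tables and pipe the generated statements into the mysql client):

```shell
# Generate ALTER statements to upgrade MyISAM tables to InnoDB,
# one table at a time. The table names here are hypothetical placeholders.
for t in legacy_orders legacy_customers; do
  echo "ALTER TABLE $t ENGINE=InnoDB;"
done
```

Each ALTER rebuilds the table in the new engine, so schedule large tables for off-peak hours.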

Memory engine: This used to be called the Heap engine.  Basically, as you would expect, it stores everything in RAM, similar to something you would write yourself using hashmaps in Java.  This can really speed things up in certain situations; however, it limits the amount of data you can deal with in one table quite a bit.

Merge: Allows grouping of identical MyISAM tables so they can be referenced at the same time.  I haven’t really run into a need for this (it has some uses in data warehousing, and I guess transactional safety is less of an issue in reporting).

Archive: Designed for large data sets that do not change often.

Federated: Allows databases spread across many machines to be viewed virtually as one.  I’ve never tried it.

NDB (NDBCLUSTER):  Another engine designed to spread a database across multiple machines.

Understanding MySQL Performance Bottlenecks and Logs

1. Hardware
2. Software
a. Configuration
b. Tools and Procedures

Informational Logs
1. Slow query log
2. General log
3. Error log

In deciding the best approach to scaling your system, one of the most important concepts is that of bottlenecks.  Bottlenecks are the limitations of your current system that are reached first and prevent the maximization of other resources as the load on your system increases.  So, obviously, identifying bottlenecks is critical to knowing what improvements, if any, will allow your system to perform faster and handle more data.  In this section we will discuss how to analyze your system to identify these bottlenecks and the types of solutions that certain bottlenecks suggest to us.

Informational Logs

MySQL has a few different logs that can be utilized to identify bottlenecks and monitor server status.  Each of these is enabled and managed through the MySQL configuration file, my.cnf.  On Linux, this file will most likely be located in /etc/mysql or just /etc.  On Windows, look instead for my.ini in the Windows directory, c:\, or the MySQL installation directory.  MySQL looks for the file in this order, so make sure you modify the one that it finds first.  The first log we will discuss is the slow query log.  To enable it, add this to the [mysqld] section of your my.cnf file:

log_slow_queries = 1

slow_query_log_file = <path to slow log>/mysql-slow.log

Additionally, you can specify long_query_time=<some number of seconds> to tell it how long a query has to run for you to consider it slow.  Other options are available in the MySQL documentation.  One of the most helpful is log-queries-not-using-indexes.  Monitor this log over a period of time to identify indexes that need to be added and queries that need to be optimized.  MySQL is fast so long as the amount of data being operated on at any given time stays within memory; once swapping to disk begins, performance declines dramatically.  The size of your available RAM in relation to the amount of data being processed is a common source of bottlenecks.  Using this log to identify the queries that are taking a lot of time will give you hints as to whether this type of paging to disk is going on.  Increasing RAM and adjusting queries to minimize paging will allow you to process much larger tables in a reasonable amount of time.
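As a toy illustration of what you can do with the slow log, here is a sketch that pulls out the worst offenders from a fabricated slow-log excerpt (the log lines below are made up for the example; real slow-log entries carry extra header lines):

```shell
# Write a tiny fabricated slow-log sample, then print any query
# whose Query_time exceeds one second.
cat <<'EOF' > /tmp/mysql-slow-sample.log
# Query_time: 12.345678  Lock_time: 0.000123 Rows_sent: 10  Rows_examined: 5000000
SELECT * FROM orders WHERE customer_id = 42;
# Query_time: 0.004321  Lock_time: 0.000045 Rows_sent: 1  Rows_examined: 1
SELECT name FROM customers WHERE id = 42;
EOF

# Remember each Query_time header, then print slow statements with their time.
awk '/^# Query_time/ { qt = $3 } /^SELECT/ && qt > 1 { print qt, $0 }' /tmp/mysql-slow-sample.log
```

For real analysis of a busy server’s slow log, Percona’s pt-query-digest does this properly, aggregating by query fingerprint.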

The general log is enabled by adding general_log_file=<path to general log>/mysql.log, or whatever file name you wish.  It contains information about all queries that have been run, including selects, and who (that is, which database user) attempted to execute them.  Whatever directory you choose for your logs, it is usually good practice to put them all in the same place, and make sure that MySQL has the permissions necessary to write to that directory.  Otherwise, it is not critical to name or place these files in any particular manner; just choose a strategy that works for you.

Finally, you can log errors with the error log by adding log-error=<path to error log>/mysql-error.log.  This, as you would expect, logs errors and, if configured to, warnings generated by the database.  Otherwise, errors will be logged to the syslog (/var/log/syslog) on Linux.  The error log helps in diagnosing startup and shutdown issues as well as failures and warnings that impact performance and stability.  When dealing with large tables, disk space limitations, and slow queries due to memory issues, failures resulting in logged errors will be common until your system is properly tuned.
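Pulling the three logs together, a my.cnf excerpt might look like the following (the paths are just examples; note that on MySQL/MariaDB 5.5 and later the slow log is switched on with slow_query_log rather than the older log_slow_queries, and the general log needs general_log = 1 in addition to the file name):

```
[mysqld]
# slow query log
slow_query_log        = 1
slow_query_log_file   = /var/log/mysql/mysql-slow.log
long_query_time       = 2
log-queries-not-using-indexes

# general query log (expensive -- enable only while investigating)
general_log           = 1
general_log_file      = /var/log/mysql/mysql.log

# error log
log-error             = /var/log/mysql/mysql-error.log
```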

Use these logs to identify the source of trouble; once issues have been resolved and everything is running smoothly, you can turn them off if you like.  After all, logging is an expensive operation too.  With MySQL, disk space issues related to operational and informational logs are more common than those due to actual data storage, so only enable the logging that you need. Due to the wide range of errors and warnings that can be logged in these files, I will not go into detail regarding the interpretation of any particular one.  When looking for the cause of an error or performance problem, check the logs and search Google for more information.  MySQL is open source, and the internet will have answers to most any problem you encounter.


MySQL Configuration (the my.cnf file)

my.cnf (Linux) or my.ini (Windows)
Linux: /etc/mysql or /etc
Windows: c:\windows, c:\, or the MySQL installation directory

innodb_buffer_pool_size
innodb_buffer_pool_instances
innodb_buffer_pool_dump_at_shutdown
innodb_buffer_pool_load_at_startup

These are the settings that matter most when configuring your InnoDB/XtraDB engine.  There are a lot more available, but most are not worth mentioning, as the defaults are usually just fine.  This material was sourced directly from Muhammad Irfan’s blog post; as there would be little value in rewriting it in my own words, I have copied it here.

innodb_buffer_pool_size: InnoDB relies heavily on the buffer pool, so be sure to allocate enough memory to it. Typically a good value is 70%-80% of available memory. More precisely, if you have more RAM than your dataset, setting it a bit larger than the dataset is appropriate; keep your database’s growth in mind and re-adjust the buffer pool size accordingly. Further, there are improvements to InnoDB buffer pool scalability in Percona Server 5.1 and Percona Server 5.5. You can read more about it here.
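To make the 70%-80% rule concrete, here is a quick back-of-the-envelope calculation for a hypothetical dedicated server with 16 GB of RAM (on a real Linux box you would read MemTotal from /proc/meminfo instead of hard-coding it):

```shell
# 16 GB = 16777216 kB; take 75% of it and express the result in megabytes.
mem_kb=16777216
awk -v kb="$mem_kb" 'BEGIN { printf "innodb_buffer_pool_size = %dM\n", kb * 0.75 / 1024 }'
# prints: innodb_buffer_pool_size = 12288M
```

That leaves roughly 4 GB for the OS, connection buffers, and everything else on this hypothetical machine.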

innodb_buffer_pool_instances: Multiple InnoDB buffer pools were introduced in InnoDB 1.1 and MySQL 5.5. In MySQL 5.5 the default value was 1, which was changed to 8 as the new default in MySQL 5.6. innodb_buffer_pool_instances should lie between 1 (minimum) and 64 (maximum). Multiple buffer pool instances are useful in highly concurrent workloads, as they may reduce contention on the global mutexes.

Dump/Restore Buffer Pool: This feature speeds up restarts by saving and restoring the contents of the buffer pool. It was first introduced in Percona Server 5.5 (you can read about it here), and Vadim benchmarked it (you can read more about that in this post). Oracle MySQL also introduced it in version 5.6. To automatically dump the buffer pool at shutdown and load it at startup, set the innodb_buffer_pool_dump_at_shutdown and innodb_buffer_pool_load_at_startup parameters to ON.
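In my.cnf that is simply:

```
[mysqld]
innodb_buffer_pool_dump_at_shutdown = ON
innodb_buffer_pool_load_at_startup  = ON
```

The dump records only page identifiers, not the pages themselves, so it is small and fast; the warm-up happens on load.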

innodb_log_file_size: Large enough InnoDB transaction logs are crucial for good, stable write performance, but larger log files also mean that the recovery process will be slower in case of a crash (though this is not such a big issue since the great improvements in 5.5). The default value was changed in MySQL 5.6 to 50 MB from 5 MB (the old default), but it’s still too small for many workloads. Also, in MySQL 5.6, if innodb_log_file_size is changed between restarts, MySQL will automatically resize the logs to match the new desired size during startup. The maximum combined log file size was increased to almost 512 GB in MySQL 5.6 from 4 GB. To get the optimal log file size, please check this blog post.
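The usual way to pick a log size (described in the Percona post referenced above) is to sample the “Log sequence number” from SHOW ENGINE INNODB STATUS twice, 60 seconds apart, and size the combined logs to hold roughly an hour of writes. The LSN values below are fabricated for illustration:

```shell
# Two hypothetical LSN readings (in bytes) taken 60 seconds apart.
lsn1=3836410803
lsn2=3838334638

# MB of redo log written per minute, and a rough hourly total to size for.
awk -v a="$lsn1" -v b="$lsn2" 'BEGIN {
  per_min = (b - a) / 1048576
  printf "%.1f MB/min, ~%d MB of log per hour\n", per_min, per_min * 60
}'
# prints: 1.8 MB/min, ~110 MB of log per hour
```

So for this hypothetical workload, two 64 MB files (innodb_log_file_size = 64M) would comfortably hold an hour of redo.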

innodb_log_buffer_size: InnoDB writes changed data records into its log buffer, which is kept in memory. This saves disk I/O for large transactions, as they do not need to write the log of changes to disk before the transaction commits. 4 MB - 8 MB is a good start unless you write a lot of huge blobs.

innodb_flush_log_at_trx_commit: When innodb_flush_log_at_trx_commit is set to 1, the log buffer is flushed to the log file on disk on every transaction commit, which provides maximum data integrity but also has a performance impact. Setting it to 2 means the log buffer is flushed only to the OS file cache on every transaction commit. A value of 2 is optimal and improves performance if you are not concerned about strict ACID compliance and can lose the last second or two of transactions in case the OS crashes.

innodb_thread_concurrency: With improvements to the InnoDB engine, it is recommended to let the engine control the concurrency by keeping this at the default value (which is zero). If you see concurrency issues, you can tune this variable. A recommended value is two times the number of CPUs plus the number of disks. It’s a dynamic variable, meaning it can be set without restarting the MySQL server.

innodb_flush_method: Direct I/O relieves I/O pressure because it is not cached; setting this to O_DIRECT avoids double buffering between the buffer pool and the filesystem cache. This is best used when you have a hardware RAID controller with a battery-backed write cache.

innodb_file_per_table: innodb_file_per_table is ON by default as of MySQL 5.6. This is usually recommended, as it avoids having a huge shared tablespace and allows you to reclaim space when you drop or truncate a table. A separate tablespace per table also benefits the Xtrabackup partial backup scheme.
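Putting the settings above together, an illustrative [mysqld] section for a hypothetical dedicated 16 GB server might look like this (starting points only; benchmark your own workload before adopting any of them):

```
[mysqld]
innodb_buffer_pool_size        = 12G
innodb_buffer_pool_instances   = 8
innodb_log_file_size           = 256M
innodb_log_buffer_size         = 8M
innodb_flush_log_at_trx_commit = 2   # trade a second of durability for speed
innodb_thread_concurrency      = 0   # let InnoDB manage concurrency
innodb_flush_method            = O_DIRECT
innodb_file_per_table          = 1
```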

Oh, by the way…

Don’t let Linux swap your database cache out of memory.  Lower the kernel’s tendency to swap by setting vm.swappiness to a small value (1-5) in /etc/sysctl.conf:

vm.swappiness = 1

MySQL Optimization Tools: my.cnf Tuning


MySQLTuner
Percona Configuration Wizard

Perhaps the most commonly used my.cnf tuning tool available is MySQLTuner.  This program consists of a simple Perl script, so if your server is not currently set up to run Perl, you will need to do that first.  It can be downloaded or, in the case of Ubuntu Server, installed directly using apt-get.  Make sure your server has been up and used as it typically is for at least 24 hours, as this usage data is important to the script.  Upon running this simple program, a report will be generated suggesting my.cnf configuration changes.  Use this report with caution.  While it will effectively identify some problems and take care of the capacity estimations you might normally calculate by hand, its advice could potentially degrade performance in some cases.  Make sure that you understand the changes you are making, regardless of whether or not they have been suggested by an optimization tool.

Another interesting tool is the Percona Configuration Wizard.  This online tool generates a fairly well optimized my.cnf file for your database based on your answers to a sequence of questions.  It could not get any easier than that 🙂