Finally after going to bed really early all week I woke up this morning at a decent hour without the alarm.  For the past 6 days I’ve laid down by 9 each night.  For the first few, I woke up a lot, only getting back to sleep with the help of a few programming books.  Regardless of how early I crashed, I could not get myself up until around 9 each day. This night however, I slept through solidly until 7 am and awoke feeling ready to go.  While not all of it was sleep, I seem to have wasted at least 12 hours a day all week trying.  I am sure that I need more sleep than I used to but am hoping it will be no more than 6-7 hours a day as who wants to spend their life in bed?  I shutter to think that I may have become one of those people who simply don’t function right without 10 hours unconscious a day.  What a waste of time right?  Regardless of what my personal, optimal sleep amount is I plan to shorten it with the help of naps.  Tim Ferris discussed this in his book “the four hour body” where he had managed to reduce his required sleep to just 2 hours a day broken up into 6 evenly placed 20 minute naps spaced evenly 4 hours apart.  Everything about this was critical, the amount and the spacing.  While I do not have his flexibility or discipline, I believe I can still benefit from his research.  I plan to take 2 naps a day ongoing.   I estimate Tim was able to reduce his total required sleep approximately 75% with his new schedule.  As you cannot go without sleep, I am assuming some sort of diminishing returns curve which crosses at a total sleep time line at some minimally optimal point, in his case 2 hours.  Making a lot of assumptions about the shape of these curves, and assuming the worst about my required sleep (10 hours a night) I conservatively am planning to reduce my required sleep by 4 house total with 2 evenly spaced 1 hour naps a day.  My schedule will be roughly as follows.

  • Wake at 6am
  • Nap from 1pm to 2pm
  • Nap from 6pm to 7pm
  • Sleep at 2am
  • Repeat

I figure that lunch time will be probably my best shot at a nap during the work week and the evening nap is early enough to not interfere with band practice.  As afternoons are my least productive time of day, the 1pm nap should get me back on track.  I normally waste this time anyway so might as well be asleep.  The 6pm nap should allow me to stay up till 2am so as to get in a productive evening push at the end of every day.  As a sleep aid, I will continue to use programming textbooks.   I am leaving an hour for each nap rather than the 20 minutes Tim slept as I just do not see myself falling asleep as quickly.  If I do, I will try and shorten these as well.  I will stick to the schedule and deprive myself of sleep if necessary until I adapt.  In the end, I expect to be much more productive in the afternoon and evening than I am currently and will gain 4-6 hours of total time awake each day.  With the increased productivity I expect a net gain of at least 10 hours of productivity a day as compared with my current, horribly inefficient schedule.   I will have 3 “mornings” a day and morning has clearly become my best time for getting things done.  Detailing it out roughly, this 10 hours will come from having an additional 3 hours of morning time, 4 hours regained in the afternoon that are normally wasted and 4 house in the evening programming before I go to bed which I’ve not been good about in the last year.  I will still be spending 6 hours in bed total which for many people is standard.  Realistically, this would be impossible for me if lumped in to say the 12-6am range every day.  Broken in to 3 sections, however, it just may be enough.  I am cautiously optimistic regarding both the sleep time and the productivity gains.  The whole plan is open to adjustment of course.  I may even find it possible to try 3 naps a day depending upon my schedule.  Regardless, if this works even in part, it will be nothing short of life changing.  For more information on Tim’s self experimentation and findings check these out:

Relax Like A Pro: 5 Steps to Hacking Your Sleep
The Secrets of a Fantastic Night’s Rest

Books by self help author Hung Pham


I just finished reading these 3 books by self help author Hung Pham.  I found each to be well worth the time and money (3-4$ kindle).  His bio states “Life is short, do stuff that matters.”  This is something most people agree with and yet still find themselves at a loss as to how to move forward in their own lives.  Pham’s advice is short, to the point, and easy to follow.  His focus is on “actionable steps” rather than broad philosophical change.  I would imagine most anyone could find something useful in each of these easy reads and really encourage you to check them out.  I left a review on Amazon and rather than write it all over, I will post a copy of it here.

“A search for self help in books on Amazon yields over 400k results. There are millions of people searching for answers and thousands upon thousands responding to and profiting from this demand. I myself, among those millions, have read more than my share of approaches to self help. Through this, over time, I have graduated from envy of those so blessed as to be so in touch with the answers to life’s biggest questions as to be able to coach others, to an almost bitter scepticism of the whole concept of self improvement. I believe my disdain for the whole industry peaked a few weeks ago when I was finally talked into reading and watching “The Secret”. These people had helped themselves to millions from the pockets of lost, despairing individuals who had reached such a point of hopelessness as to buy in to a solution that could perhaps best be described as magic? Rather than simply give up at this point, I doubled down deciding to once and for all figure things out with or without the help of the self helpers. It was at this time I stumbled across Hung’s books. I immediately identified with his journey not because it was crafted so as to be easily identified with but because, in it’s simple transparency and honest personal approach, it truly mirrors the life experience of so many of us in today’s age. Just pages into this book I realized this was not going to be another collection of life stories in which belief in magic or some type of quantum theory that could only have been developed in complete isolation from actual science and math had transformed others into shiny happy folk. His refreshingly pragmatic approach is hopeful and inspiring while being simultaneously logically undeniable. You just know what he is saying will work because it makes so much sense. in fact, much of what he addresses, I already knew and believed. My problem, as with most people still searching, is, for whatever reason, a lack of action. Faced with such truths its no wonder so many people turn to quackery and figure the lottery into their retirement. Hung recognizes all of this. He has been through it himself, solving his own problems in a logical, proactive manner. Rather than the hundreds of pages of repetitive fluff that mark the style of most self help writers, every line seems to have a purpose, to be necessary. In direct, efficient prose he details actionable steps which, if followed, could only lead to the promised land. Where I had a question, he was there with a clear, specific answer. Where I felt doubt, he responded with reassurance couched in logic. His approach is not exhaustively exhaustive but definitely sufficient and complete. One cannot help but be convinced that the author sincerely desires to help others achieve success and meaning in their own lives. Read his books, and before moving on to another writer, read them a few more times.”


Distributed Processing Test Server

My SUN server is back up.  It is an x4600 m2 with 8 – 4 core opterons and 64 gig of ram.  Just 2 146 gig 10k SAS drives so will be adding more local storage soon.  Last week I played around with doing a bare metal install of openstack.  After running into a few annoying problems with the UAR i have finally decided to just use virtualbox on top of Solaris 11.3 for distributed processing tests.  Probably not near the performance of a bare metal install but should be plenty adequate to test systems before deploying into production elsewhere.  These machines are an incredible bargain nowadays.  You can grab one off ebay for under 500$ all day long.  The SUN software repository is gone now.  If you want to run linux native you will have to scam the appropriate drivers somewhere else or purchase an oracle support contract.  Seems kinda lame that they want to charge the new owner for this.  A paid support contract makes sense when you are buying hardware like this new at 50k+ a pop.  Not so much when grabbing off of ebay for the price of a crappy laptop.  Fortunately, this doesn’t force the thing into retirement as Solaris installs just fine.  I will typically set up 6 virtualboxes running Ubuntu 14.04 lts each with 8 gigs to play with.  This is perfect for testing clustered projects (Galera, SPARK, Scattersphere).  While it won’t keep up with my i-7 desktop for single core, it is amazingly quick even running virtuals for problems distributed correctly across it’s resources.  Its awesome having something like this to work with at home.  Would have made a hell of a game server back in the day.  Additionally I might add some NAS and host me some opensim.  The little server on top running FreeNAS will do the trick.







In 1945, Isaac Asimov wrote Escape!, a short story featuring a character known as “The Brain”, a self-aware supercomputer.  Such machines have played a prominent role in science fiction ever since.  At the same time, real computer technology has advanced at an astounding pace, often surpassing even the wildest speculation of anterior futurists.   While some areas of artificial intelligence technology have shown great progress,  others, including many critical to the development of sentient silicon, have lagged.  Even today, whether or not a machine will ever possess the sense of self-awareness shown by an average house cat is a matter of serious debate.  It may never happen, or, it may one day be man’s greatest accomplishment.

Really it has been kinda disappointing watching the progress of this technology.  Only now do things really seem to be taking off.  As a little kid, i created an Applesoft basic game of pick-up sticks using lo-res graphics and a simple AI.  In college philosophy as an undergrad I wrote papers on theories about free-will and their implications for the development of artificial personalities.  Fresh out of school I started looking into chatterbots and the turing test.  I was fond of checking out every bot I could find, spending lots of time on the Simon Laven page and pondering the problem of context.  After another 15 years of programming, java, big data, medical software, etc. I have come back to the problem with a few ideas at a point where available hardware and software might finally be up to the task.  Or at least some of the task.  Most significantly, concepts such as horizontal scaling, neural networks, contextual thesauruses, etc may finally allow computers to replace humans in many decision making roles and create the appearance of self awareness, consciousness or even fondness.  Will  this consciousness be real or just appear so?  Does it matter?  After all, are we not as people judged entirely by what can be perceived through observation?  Is the unobservable quality we are looking for really the soul/magic?

MySQL Performance and Scalability Notes Introduction

The MySQL category of this blog consists of a collection of my notes on MySQL Administration for Performance and Scalability.
I am starting with information related to single node systems. Some of the things I intend to address include:
1. Basic scaling concepts
2. Various MySQL forks and engines
3. How to use logs and tools to understand your database’s performance
4. The configuration of your single node MySQL database for optimal performance
5. Raw and logical backups
6. Disaster recovery procedures and considerations
7. The generation of sample data and the de-identification of data for testing and demo purposes.
My goal here is to present effective MySQL based solutions to the problems that arise when working with extremely large data sets. I will discuss how to eliminate bottlenecks and tune for maximum performance and scalability in single node environments. In addition, I intend to cover preparing for and dealing with failures including high performance methods of moving and recovering very big databases.  I will note that I’m not much of a web developer but hopefully this information, while not very artfully delivered, will be of use to someone.  If you are new to MySQL or databases in general, stop reading now!  There are much better ways to spend your weekends.  Oh, and keep in mind that these are basically just notes.  I was not writing a book but just compiling a big list of helpful stuff for myself and am now sharing it.  As such, I have plagiarized liberally using copy/paste and done very little in the way of proofreading.  If something is technically wrong or debatable, please point it out and I will fix or qualify it.  If something needs credits or even removal please let me know that too.

MySQL Scaling

Types of scaling:
1. Scaling down
2. Scaling up
3. Scaling out

The scalability of a database system in its most basic sense is how much the database can handle.  This of course is a measure that is very dependent upon what you are actually doing with the database.  For our discussion, making a database more scalable means increasing its ability to handle the job at hand, whether it be highly transactional or a data warehouse built only for running giant reports, on very large data sets.  Increasing your MySQL databases ability to do its job with these large data sets can be approached in several different ways without resorting to more nodes and machines.  Next we will discuss in general the most common approaches to database scaling.


Methods of scaling your database can be grouped into 3 categories.  1. scaling down 2. scaling up and 3. scaling  out which is becoming very popular as of late.

Scaling Down

Eliminating data from the database.
1. Unused data
2. Old data

Decreasing load on the server
1. Optimizing queries
2. Limiting access


Scaling down is a software solution which involves understanding your goals and current implementation enough to repair code and eliminate unnecessary data to put less of a load on your system.  Smart query benchmarking and design (which we will not discuss in this here but most likely in another post) can drastically decrease the load on your database server in both transactional and data warehouse scenarios.  We have all experienced the poorly written report (often created by a guy using Access and and odbc connection) that brings the new 50k dollar server to its knees for hours.  Eliminating this type of mistake is critical.  Often, simply limiting access can be a game changer.  In some cases, users may be running reports against a production server and competing for resources with customers.  In this case, creation of a data warehouse to offload this function to another machine or simply saying don’t do that between these hours can scale down the load on your system thus increasing its capacity for intended purposes.  Also, limiting the amount of data in your database can provide great performance improvements.  Very often I have found production databases on large systems to be loaded with data that is not being used or too old to be relevant.  Intelligently reviewing what is actually stored in your database relative to its purpose and culling out the unnecessary stuff will go a long way towards providing acceptable performance with your one node system.

Scaling Up

1. Upgrading Storage
2. Upgrading Processing
3. Better Servers
4. Optimize Database Configuration
5. Optimize Database Maintenance


Usually when people use this term they are  talking about a hardware solution.  Often the most cost effective initially, hardware upgrades are a good first approach to scalability.  Buying newer hardware, faster hard drives, more memory and more or better processors will often be the most cost effective way to approach increasing the capacity of your single node database.  Once you have the best hardware you can afford or get approved it is time to turn to software solutions.  Along with hardware upgrades, I also like to consider software upgrades performed on a single node as part of “scaling up”.  Reviewing your database’s configuration and optimizing it can help eliminate bottlenecks in much the same way hardware upgrades do.  Using higher performance maintenance solutions, like the enterprise level tools offered by Oracle and Percona, can also critical to maximizing the scalability of your database.

Scaling Out or Horizontal Scaling

1. Separating data into different databases
2. Placing data on different servers
3. Clustering
4. Sharding
5. Distributed processing
6. Document databases and key-value stores


Once everything that can be done in terms of scaling down or up has been considered or exhausted and it is clear that a one node solution just won’t cut it, it is time to consider a multi-node configuration or scaling out.  Solutions of this type include placing more related data into different databases and servers logically, clustering, sharding, distributed processing solutions like hadoop and spark, and switching to NoSQL data stores like document databases and key value stores.  This is not a decision that should be taken lightly.  Multi node solutions are inherently complex and difficult to use and maintain.  Very often I see people jump to scaling out in situations where scaling down and up are more than enough for current and all expected future capacity.  It makes the contractor’s more money, provides interesting challenges to the team and satisfies the customer by checking off stuff in their bucketlist of buzzwords.  Often, the customer will insist on a multi-node solution with no real understanding of whether or not it is necessary.  While giving these customers what they want can be fun and educational, not to mention profitable, I feel more ethical when given the opportunity to provide the right solution to the problem.  There is definitely a time and place for scaling out, but it can be a very expensive and time consuming mistake for the larger share of projects out there.  For this reason, it is important to completely understand single node scaling before making the decision to go with an expensive, complicated horizontally scaled solution.  That being said, the primary focus of these posts will be the software side of scaling up.  Scaling down, moving lesser used data, splitting data, and eliminating unnecessary data from your database, is a fairly obvious task which doesn’t require much more explanation than has already been given.  Scaling out is a topic later on.  Scaling up your hardware is equally as obvious as scaling down.  Basically, when a resource related bottleneck is located, and you have room to upgrade hardware in such a way as to resolve it, it is usually a good idea to do so.

MySQL Forks

1. Oracle
2. Drizzle
3. MariaDB
4. Percona
5. Webscale
(these are the main ones at least)

MySQL is open-source.  People, usually former MySQL employees, have branched off of the open source MySQL codebase with their own versions, usually offering some degree of compatibility with the original product.  From my experience, MariaDB and PerconaDB are forks that can be considered drop in replacements for MySQL while adding functionality and performance.

MySQL (Oracle)

This is the original MySQL database as well as new development since Oracle’s purchase of the product. There is still a “community” edition however Oracle seems committed to differentiating community(free for non-embeded uses) and enterprise(the one you always pay for) licenses by concentrating its development of advanced features included in the enterprise version. The community edition is just as good if not better than its ever been however do not expect modern “big-data” style enhancements to come from Oracle for free in the future.  Basically, if you are dealing with 20 million rows or less in your largest tables and don’t expect to need advanced features such as clustering or raw backups, the original community edition of this product as Oracle offers it is a fine way to go.  Anything beyond this will require either an investment in Oracle Enterprise technology or consideration of another fork.


Drizzle is open source and was forked from the  6.0 development branch of the MySQL.  It has a client/server architecture and works with standard SQL.  Drizzle is distributed under the the GNU General Public License with certain drivers and such issued under the BSD license.


MariaDB is a fork designed to offer a higher performance drop in replacement for the MySQL product offered by Oracle. I would say it is best know for its advanced query optimizer which really speeds things up on certain types of queries without the user really having to worry about what has changed.  It was forked by some of the original developers of MySQL who were worried about how the Oracle acquisition would turn out.  The plan for Maria is to strive for as high of a level of compatibility as possible including all commands and functions.  While you can use InnoDB with Maria, the default storage engine is Percona’s XtraDB which offers quite a few performance enhancements and fixes for particular types of queries that InnoDB did not seem to like.

Percona Server

Percona Server is another fork aiming for high compatibility with official MySQL releases.  In addition, they have focused on performance enhancements and monitoring tools to aid in administration and query optimization.  Like MariaDB, Percona server uses XtraDB.  Percona generously offers for free lots of scaling, monitoring and backup tools rivalling Oracle’s enterprise level products which are, of course, not free.   My usual product choice as of late for a 1-node product is to install MariaDB with XtraDB tables and using Percona enterprise tools against it when needed for optimization and replication.


WebScaleSQL is a branch developed in a partnership between Facebook, Google, LinkedIn and Twitter for the purpose of handling very large scale data sets horizontally scaled across ridiculous amounts of commodity hardware.   All of these companies being faced with similar challenges were wise to pool their efforts in the development of this branch.   WebScale is also GNU and you can access the source on github.


MySQL Engines

2. InnoDB and XtraDB
3. Memory
4. Merge
5. Archive
6. Federated
7. NDB

MySQL supports several different types of Table Engines also known as “Table Types”. A database can have its tables being a mix of different table engine types or all of the same type.
Most common by far are MyISAM vs InnoDB
MyISAM is older and less sophisticated of the two.  In some ways its actually down right primitive, however, you I still run in to MyISAM tables a lot so it’s good to be familiar with them.
InnoDB improves on MyISAM by offering modern features such as:
1. It is fully transactional
2. Has automated crash recovery using logged transactions which we will discuss later
3. And, most importantly, (at least to me) row level locking. MyISAM is limited to table level locking.


MySQL uses a pluggable storage engine architecture that enables storage engines to be loaded into and unloaded from a running MySQL server.

The default storage engine for MySQL as of 5.5 was InnoDB.  It is ACID compliant which basically means that it is transactionally safe.  It has configurable rollback as well as crash recovery capability.  It functions much better than earlier architectures such as MyISAM in dw and reporting uses due to it’s non-locking reads.  It supports foreign key constraints which is of course important (just don’t over-do them because they are also a pain) and uses clustered indexes speed up queries based on primary keys.

Percona XtraDB is an enhanced version of the InnoDB storage engine, designed to better scale on modern hardware, and including a variety of other features useful in high performance environments. It is completely backwards compatible with InnoDB.  I have yet to run in to an issue in this regard and would swap engines in production without even running a bunch of tests.  It just works.  Like I mentioned before, XtraDB is the default in Percona and Maria.   However, you can still use InnoDB if you want to in Maria (not sure about Percona)

To skip XtraDB in MariaDB add this to my.cnf



# The following should not be needed if you are using a mariadb package:


MyISAM: The default storage engine prior to MySQL 5.5.  If you run in to MyISAM tables, you might want to consider upgrading them.  You can do this a table at a time as you can run MyISAM and InnoDB/XtraDB tables in the same database.  In some cases, MyISAM actually seems to perform better, however this is not usually the case (table level locking yuck) and furthermore, when things go wrong with MyISAM, they can really go wrong.  Minor corruption/failure events which more modern engines handle in stride will send you searching for backups with MyISAM.

Memory Engine: This used to be called the Heap engine.  Basically, as you would expect, it stores everything in RAM similar to something you would write yourself using hashmaps in java.  This can really really speed things up in certain situations, however, limits the amount of data you can deal with in one table quite a bit.

Merge: Allows grouping of identical MyISAM tables so they can be referenced at the same time.  I haven’t really run in to a need for this (some uses in datawarehousing and I guess transactional safety is less of an issue in reporting)

Archive: Designed for large data sets that do not change often.

Federated: Allows for databases spread across many machines to be viewed virtually as one.  I’ve never tried it this way.

NDB (NDBCLUSTER):  Another engine designed to spread a database across multiple machines.

Understanding MySQL Performance Bottlenecks and Logs

1. Hardware
2. Software
a. Configuration
b. Tools and Procedures

Informational Logs
1. Slow query log
2. General log
3. Error log

In deciding the best approach to scaling your system, one of the most important concepts is that of bottlenecks.  Bottlenecks are the limitations of your current system that have been reached first and prevent the maximization of other resources as the load on your system increases.  So, obviously, the identification of bottlenecks is critical to knowing what improvements, if any, will allow your system to perform faster and handle more data.  In this section we will discuss how to analyse your system to identify these bottlenecks and the types of solutions that certain bottlenecks suggest to us.

Informational Logs

MySQL has a few different logs that can be utilized to identify bottlenecks and monitor server status.  Each of these is enabled and managed through the MySQL configuration file, my.cnf.  This file in linux will most likely be located in /etc/mysql or just /etc.  On windows, look for my.ini rather in the windows directory, c:, or the MySQL instalation directory.  MySQL actually looks for the file in this order so make sure you modify the one that it finds first.  The first type of log we will discuss is the slow query log.  To enable this log in your my.cnf file add this in the [mysqld] section.log_slow_queries = 1;

slow_query_log_file = <path to slow log>/mysql-slow.log

Additionally you can specify long_query_time=<some number of seconds> to tell it how long a query has to run for you to consider it slow.  Other options are available in the mysql documentation.  One of the most helpful is log-queries-not-using-indexes.  Monitor this log over a period of time to identify indexes that need to be added and queries being submitted that need to be optimized.  MySQL is fast so long as the amount of data being operated upon at any given time stays within memory.  Once swapping to disk begins, it’s performance declines dramatically.  The size of your available ram in relation to the amount of data being processed is a common source of bottlenecks.  Using this log to identify the queries that are taking a lot of time will give you hints as to whether this type of paging to disk is going on.  Increasing ram and adjusting queries to minimize paging will allow you to process much larger tables in a reasonable amount of time.

The general log is enabled by adding general_log_file=<path to general log/mysql.log or whatever file name you wish.  This contains information about all queries that have been run including selects and who, that is, which database user, attempted to execute them.  Whatever directory you chose for your logs, it is usually a good practice to put them all in the same place make sure that mysql has the correct permissions necessary to write to that directory.  Otherwise, it is not critical to name or place these files in any particular manner.  Just choose a strategy that works for you.

Finally, you can log errors with the error log by adding log-error=<path to error log>/mysql-error.log  This, as can be expected, logs errors and if configured to, warnings generated by the database.  Otherwise, errors will be logged to /var/syslog in linux.  The error log helps in diagnosing startup and shutdown issues as well as failures and warnings that impact performance and stability.  When dealing with large tables, disk space limitations and slow queries due to memory issues, failures resulting in logged errors will be common until your system is properly tuned.

Use these logs to identify the source of trouble and once issues have been resolved and everything is running smoothly, you can turn them off if you like.  After all, logging is an expensive operation too.  With MySQL, disk space issues related to operational and informational logs are more common than those due to actual data storage.  Only enable logging that you need. Due to the wide range of errors and warnings that can be logged in these files, I will not go into detail regarding the interpretation of any particular one.  When looking for the cause of an error or performance problem, check the logs and search on google for more information.  MySQL is open source and the internet will have answers to most any problem you encounter.


MySQL Configuration (the my. file)

my.cnf (linux)
or my.ini (windows)
/etc/mysql or /etc linux
c:/windows, just c: or mysql installation directory windows

innodb buffer_pool_instances
innodb_buffer_pool_dump_at_shutdown innodb_buffer_pool_load_at_startup

These are the settings that matter most when configuring your InnoDB/XtraDB engine.  There are a lot more available but most not worth mentioning as defaults are usually just fine.  This was all sourced directly from Muhammad Irfan’s blog post and as there would be little value in rewriting the stuff in my own words, I have just copied them here.

innodb_buffer_pool_size: InnoDB relies heavily on the buffer pool and should be set correctly, so be sure to allocate enough memory to it. Typically a good value is 70%-80% of available memory. More precisely, if you have RAM bigger than your dataset setting it bit larger should be appropriate with that keep in account of your database growth and re-adjust innodb buffer pool size accordingly. Further, there is improvement in code for InnoDB buffer scalability if you are using Percona Server 5.1 or Percona Server 5.5 You can read more about it here.

innodb_buffer_pool_instances: Multiple innodb buffer pools introduced in InnoDB 1.1 and MySQL 5.5. In MySQL 5.5 the default value for it was 1 which is changed to 8 as new default value in MySQL 5.6. Minimum innodb_buffer_pool_instances should be lie between 1 (minimum) & 64 (maximum). Enabling innodb_buffer_pool_instances is useful in highly concurrent workload as it may reduce contention of the global mutexes.

Dump/Restore Buffer Pool: This feature speed up restarts by saving and restoring the contents of the buffer pool. This feature is first introduced in Percona Server 5.5 you can read about it here. Also Vadim benchmark this feature You can read more about it in this post. Oracle MySQL also introduced it in version 5.6, To automatically dump the database at startup and shutdown set innodb_buffer_pool_dump_at_shutdown & innodb_buffer_pool_load_at_startup parameters to ON.

innodb_log_file_size: Large enough InnoDB transaction logs are crucial for good, stable write performance. But also larger log files means that recovery process will slower in case of crash. However this is not such big issue since great improvements in 5.5. Default value has been changed in MySQL 5.6 to 50 MB from 5 MB (old default), but it’s still too small size for many workloads. Also, in MySQL 5.6, if innodb_log_file_size is changed between restarts then MySQL will automatically resize the logs to match the new desired size during the startup process. Combined log file size is increased to almost 512 GB in MySQL 5.6 from 4 GB. To get the optimal logfile size please check this blog post.

innodb_log_buffer_size: Innodb writes changed data record into lt’s log buffer, which kept in memory and it saves disk I/O for large transactions as it not need to write the log of changes to disk before transaction commit. 4 MB – 8 MB is good start unless you write a lot of huge blobs.

innodb_flush_log_at_trx_commit: When innodb_flush_log_at_trx_commit is set to 1 the log buffer is flushed on every transaction commit to the log file on disk and provides maximum data integrity but it also has performance impact. Setting it to 2 means log buffer is flushed to OS file cache on every transaction commit. The implication of 2 is optimal and improve performance if you are not concerning ACID and can lose transactions for last second or two in case of OS crashes.

innodb_thread_concurrency: With improvements to the InnoDB engine, it is recommended to allow the engine to control the concurrency by keeping it to default value (which is zero). If you see concurrency issues, you can tune this variable. A recommended value is 2 times the number of CPUs plus the number of disks. It’s dynamic variable means it can set without restarting MySQL server.

innodb_flush_method: DIRECT_IO relieves I/O pressure. Direct I/O is not cached, If it set to O_DIRECT avoids double buffering with buffer pool and filesystem cache. Given that you have hardware RAID controller and battery-backed write cache.

innodb_file_per_table: innodb_file_per_table is ON by default from MySQL 5.6. This is usually recommended as it avoids having a huge shared tablespace and as it allows you to reclaim space when you drop or truncate a table. Separate tablespace also benefits for Xtrabackup partial backup scheme.

Oh by the way…

Don’t cache in linux


vm.swappiness = 1-5