Types of scaling:
1. Scaling down
2. Scaling up
3. Scaling out
The scalability of a database system in its most basic sense is how much the database can handle. This of course is a measure that is very dependent upon what you are actually doing with the database. For our discussion, making a database more scalable means increasing its ability to handle the job at hand, whether it be highly transactional or a data warehouse built only for running giant reports, on very large data sets. Increasing your MySQL databases ability to do its job with these large data sets can be approached in several different ways without resorting to more nodes and machines. Next we will discuss in general the most common approaches to database scaling.
Methods of scaling your database can be grouped into 3 categories. 1. scaling down 2. scaling up and 3. scaling out which is becoming very popular as of late.
Eliminating data from the database.
1. Unused data
2. Old data
Decreasing load on the server
1. Optimizing queries
2. Limiting access
Scaling down is a software solution which involves understanding your goals and current implementation enough to repair code and eliminate unnecessary data to put less of a load on your system. Smart query benchmarking and design (which we will not discuss in this here but most likely in another post) can drastically decrease the load on your database server in both transactional and data warehouse scenarios. We have all experienced the poorly written report (often created by a guy using Access and and odbc connection) that brings the new 50k dollar server to its knees for hours. Eliminating this type of mistake is critical. Often, simply limiting access can be a game changer. In some cases, users may be running reports against a production server and competing for resources with customers. In this case, creation of a data warehouse to offload this function to another machine or simply saying don’t do that between these hours can scale down the load on your system thus increasing its capacity for intended purposes. Also, limiting the amount of data in your database can provide great performance improvements. Very often I have found production databases on large systems to be loaded with data that is not being used or too old to be relevant. Intelligently reviewing what is actually stored in your database relative to its purpose and culling out the unnecessary stuff will go a long way towards providing acceptable performance with your one node system.
1. Upgrading Storage
2. Upgrading Processing
3. Better Servers
4. Optimize Database Configuration
5. Optimize Database Maintenance
Usually when people use this term they are talking about a hardware solution. Often the most cost effective initially, hardware upgrades are a good first approach to scalability. Buying newer hardware, faster hard drives, more memory and more or better processors will often be the most cost effective way to approach increasing the capacity of your single node database. Once you have the best hardware you can afford or get approved it is time to turn to software solutions. Along with hardware upgrades, I also like to consider software upgrades performed on a single node as part of “scaling up”. Reviewing your database’s configuration and optimizing it can help eliminate bottlenecks in much the same way hardware upgrades do. Using higher performance maintenance solutions, like the enterprise level tools offered by Oracle and Percona, can also critical to maximizing the scalability of your database.
Scaling Out or Horizontal Scaling
1. Separating data into different databases
2. Placing data on different servers
5. Distributed processing
6. Document databases and key-value stores
Once everything that can be done in terms of scaling down or up has been considered or exhausted and it is clear that a one node solution just won’t cut it, it is time to consider a multi-node configuration or scaling out. Solutions of this type include placing more related data into different databases and servers logically, clustering, sharding, distributed processing solutions like hadoop and spark, and switching to NoSQL data stores like document databases and key value stores. This is not a decision that should be taken lightly. Multi node solutions are inherently complex and difficult to use and maintain. Very often I see people jump to scaling out in situations where scaling down and up are more than enough for current and all expected future capacity. It makes the contractor’s more money, provides interesting challenges to the team and satisfies the customer by checking off stuff in their bucketlist of buzzwords. Often, the customer will insist on a multi-node solution with no real understanding of whether or not it is necessary. While giving these customers what they want can be fun and educational, not to mention profitable, I feel more ethical when given the opportunity to provide the right solution to the problem. There is definitely a time and place for scaling out, but it can be a very expensive and time consuming mistake for the larger share of projects out there. For this reason, it is important to completely understand single node scaling before making the decision to go with an expensive, complicated horizontally scaled solution. That being said, the primary focus of these posts will be the software side of scaling up. Scaling down, moving lesser used data, splitting data, and eliminating unnecessary data from your database, is a fairly obvious task which doesn’t require much more explanation than has already been given. Scaling out is a topic later on. Scaling up your hardware is equally as obvious as scaling down. Basically, when a resource related bottleneck is located, and you have room to upgrade hardware in such a way as to resolve it, it is usually a good idea to do so.