MySQL Database Design for Scalability and Performance

Resource Usage and Size Limits

Operating System File-size Limit
Win32 w/ FAT/FAT32 2GB/4GB
Win32 w/ NTFS 2TB (possibly larger)
Linux 2.2-Intel 32-bit 2GB (LFS: 4GB)
Linux 2.4+ (using ext3 file system) 4TB
Solaris 9/10 16TB (who does file systems better than the guys at Sun?)
MacOS X w/ HFS+ 2TB
NetWare w/NSS file system 8TB

In this section I will be discussing database design for large data sets.  First, it is good to be aware of what the actual limits imposed upon your database by the hardware and operating system it is running on are.  MySQL in most cases on modern machines is capable of managing very large data sets, often times, much larger than most applications ever call for.  In general, the hard limits to be aware of are imposed by the file system.  If you are using a 32 bit linux os or a fat file system, table sizes will be limited by this to between 2 and 4 gig.  A newer 64 bit linux os with an ext3-4 file system expands this to 4 terrabytes.  On solaris, table file sizes can reach as much as 16 terrabytes without being limited by the file system.  This is quite a lot of data for a single table.  Be sure to start new projects on a newer 64 bit OS and use a modern file system with higher file size limits if at all possible to minimize the impact of this type of limit.  Even if you do not see a current need, database applications are usually better than most at taking advantage of such newer technologies as 64 bit processors and operating systems as well as multi-core processors and large ram allocations.  In cases in which this hard limit is reached, partitioning data sets in different tables by such logical attributes as date is possible.  Rarely will this be the case in a single node system though.  Having enough disk space is another important limitation to be aware of.  The amount of disk space needed for your particular implementation will depend upon the amount of data you have, indexing schemes, logging needs, backup methods and data population requirements.  In today’s environments, disk space is now inexpensive enough that it is rarely lacking due to its cost but more often due to the administrator’s failure to specify enough.

First, the database engine and tools themselves require space on your node.  This will vary based upon your choice of tools but is minimal relative to the space used for data, index’s, backups etc.  To understand these requirements, check disk usage before installing MySQL and tools(df -h works in linux) and then directly after installation.

The space requirements for you data and indexes a related to the sum of data length and index length.  To look at a particular table try:

SELECT data_length+index_length FROM information_schema.tables WHERE table_schema='mydb' AND table_name='mytable';

To look at usage across your entire database you can use something like this which I ripped off the web somewhere:

SELECT IFNULL(B.engine,'Total') "Storage Engine", CONCAT(LPAD(REPLACE(FORMAT(B.DSize/POWER(1024,pw),3),',',''),17,' '),' ',SUBSTR(' KMGTP',pw+1,1),'B') "Data Size", CONCAT(LPAD(REPLACE(FORMAT(B.ISize/POWER(1024,pw),3),',',''),17,' '),' ',SUBSTR(' KMGTP',pw+1,1),'B') "Index Size", CONCAT(LPAD(REPLACE(FORMAT(B.TSize/POWER(1024,pw),3),',',''),17,' '),' ',SUBSTR(' KMGTP',pw+1,1),'B') "Table Size" FROM (SELECT engine,SUM(data_length) DSize,SUM(index_length) ISize,SUM(data_length+index_length) TSize FROM information_schema.tables WHERE table_schema NOT IN ('mysql','information_schema','performance_schema') AND engine IS NOT NULL GROUP BY engine WITH ROLLUP) B,(SELECT 3 pw) A ORDER BY TSize;

Or you could just check the disk usage in your data partition with df -h again or file manager in windows.  You can also just count bytes per row based on known/specified lengths and expected volume.

Indexes can take a bunch of space too.  Some ways to alleviate this include:

Avoid creating secondary indexes on the really big tables if you can.  The way they are stored, the primary index is duplicated along with them.  This would usually mean one entry per row even if there were only a few distinct values in the second field.

Fragmentation in indexes is bad…worse when using secondary indexes.  Dump and reload tables periodically when index fragmentation is suspected.  I do believe more recent versions of MySQL have done a bit to address this though.

In addition, you must have space available for backing up your entire database and for raw files used in load procedures.  Your particular implementation  will dictate the size of these files.  Logical backups take a lot of room initially as will be discussed later.  These are full backups of your data in text format either using insert statements or delimited rows.  Raw backups or snapshots just keep track of changes so they are small at first but do grow rapidly on a busy system.  Informational and operational logs can also take up a lot of space and vary depending upon your implementation.  This space and managing it will be discussed later in the course and will depend greatly upon your specific requirements.  As noted earlier, informational logs should usually be turned off when not being used to diagnose a problem but there should still be space available to use them when needed.  Operational log size dictates the level of information safety you have relative to your ability to roll back to an earlier state and avoid data loss in the event of a failure.  In most cases it is better to be safe than sorry here.  Leave plenty of space for your binlogs.  This will also be discussed later.  It is a good idea to specify innodb_file_per_table in your config file as separating your tables into individual files allows for future flexibility, especially during disaster recovery and disk space recovery when a table’s size decreases(scaling down). MySQL’s create database command is fairly simple and doesn’t offer a lot of options.  Most configuration is done at the table level as table’s using different engines such as MyISAM vs InnoDB can exist in the same database.  The majority of specification options are at the table level.

Leave a Reply

Your email address will not be published. Required fields are marked *