MySQL: MyISAM or INNODB?

Shamil

Well-known member
Recently, vBulletin suggested that we convert our tables and engines to INNODB, slating many performance improvements, such as: faster query times and fewer table-locking issues.

Now, INNODB doesn't support FULLTEXT searching, which means that an external searching mechanism needs to be used.

What do you recommend and what schema will XF be using?
 
Out of interest is there technical need for them to be InnoDB? I converted our VB board over to InnoDB once as a test and it was really, really slow.

InnoDB wins when it comes to row level locking, I agree, but myISAM is simply so much faster, not to mention, easier to backup / restore and transfer between servers.

I'm currently doing a test migration of our forum (86k users, 4m posts, 370k threads) and bloody hell its slow! Took nearly an hour just to do the users.

I may well do some tests to compare myISAM vs InnoDB and see what the difference is.
 
One should never arbitrarily convert all tables over to InnoDB from MyISAM - the latter has some distinct benefits, including very very fast write performance. However, it doesn't support row-level locking or transactions, both of which are supported by InnoDB, so careful choices need to be made as to which table suits what purpose. Anyone looking at the XenForo schema will see that we use both.
 
Could BLOB hold attachments then?
BLOB can hold any binary data, but it's not typically advisable to store that data in a database. There is no reason, for example, to store images in a database, but for some reason a lot of people like to do that. Store your binary data in directories and use the database to store the path(s) (pointers, references, whatever you want to call them) to that data. Your database will thank you.
 
One should never arbitrarily convert all tables over to InnoDB from MyISAM - the latter has some distinct benefits, including very very fast write performance. However, it doesn't support row-level locking or transactions, both of which are supported by InnoDB, so careful choices need to be made as to which table suits what purpose. Anyone looking at the XenForo schema will see that we use both.

Interesting.. I haven't looked into the tables properties yet. Will check it out, but I haven't really played with different formats.

Learned something new. :)

That reminds me. What is the difference between mysql and mysqli? I think mysqli was used for XF correct?
 
BLOB can hold any binary data, but it's not typically advisable to store that data in a database. There is no reason, for example, to store images in a database, but for some reason a lot of people like to do that. Store your binary data in directories and use the database to store the path(s) (pointers, references, whatever you want to call them) to that data. Your database will thank you.
Interestingly, I had a debate with a person about this a few months ago. They claimed that when sites get big, there is actually a lot of benefit to be had from storing large binary BLOBs in the database. Here's an excerpt from that conversation:

Attachments need to be able to be stored on the filesystem or in the database. Most people don't know it, but having your attachments in the database is HIGHLY preferable in most scaling hosting situations, for many reasons.
The first major reason is cluster filesystems. If you have 10 webservers accessing a GFS share on an iSCSI SAN via gigabit LAN, and they are all running DLM (also via gigabit LAN), you want to keep the number of files down(or at least not intentionally make hundreds of thousands of them), and DLM introduces overhead to the network/systems for every webserver in the cluster. Efficiency isn't always the goal though. In the case of my guild's forum, it has a 20GB attachment table and the whole cluster just performs much better when it's in the database, even if the database server is spending more cycles, storage(indexes) and memory/caching to retrieve that data than all of the webservers would via GFS/DLM. It usually takes me about 8 webservers to overwhelm 1 database server, so I'm interested in being able to shift that load to the database servers. On the topic of load, I've had 5,000 people viewing a thread on a forum that typically has 50-100MB of attachments per page and the load from the attachment table is still literally negligible, aside from network bandwidth - but at least it's coming from 1 server, not flying all over the place with DLM and the SAN.
Database servers typically not only have local storage versus network storage on a webserver cluster, but it's usually extremely fast local storage. This is great for situations such as when 5,000 people are hitting an attachment thread. It's also easy to load balance multiple network adapters on the database server via host entries on the webservers, whereas it's quite a bit more complicated(and expensive) to make a multipath SAN fabric or expand beyond gigabit ethernet to 10GbE or FC/infiniband, and you have to expand your DLM network to a faster fabric along with it.

So it's something we have in the back of our minds for the future.
 
S/he lost me at DLM, because I have no idea what that is.

Coming from the shared hosting perspective, I don't generally work with a lot of people (meaning zero) running clustered servers for one forum, and my advice for the typical database user has always been to avoid unnecessary binary data in the db. But then again, maybe the old keep yer BLOBS outta the database! idea is quaint and antiquated, like books or integrity, and it isn't necessary anymore. The MySQL servers we work with today are certainly a wee bit more efficient than those we worked with a dozen years ago, that's for sure.

But even in the scenario above, for web servers (even a cluster, which I have had the misfortune to work with in the past), I would prefer file storage like NetApp to hundreds of gigs of BLOB data fed from a MySQL box. We never do file storage on the web servers, so the data "flying all over the place" is not really an issue I've dealt with. I'm not the network guy anyway, I'm just the company buffoon and gadfly. So what do I know (well, I know that not everyone can run out and buy NetApps to solve their file storage problems, so maybe I'm just talking out of my ass).

But I can't argue with anything above, because s/he's doing PRETTY LARGE SCALE in real life, so that's valuable front-line knowledge. That kind of scale for a single site is way outside of my experience.
 
I went over a few of the points either in a blog reply to a sitepoint piece I read :

http://www.jeremyhutchings.com/2010/11/top-10-improvements-for-php-developers.html

As with any complex system, focusing on just 1 aspect is typically wrong; it's all a balancing act. Given how much 32-64 gig of RAM is these days, what ever the storage system is - be that a file based or dB (on top of file based) one - setting up the caches/buffers/etc correctly so for the most part the live app is running out of ram, and disk is just long term storage, is the way to go.

In the PHP/LAMP world, memcache and Gearman basically protect most apps from dropping to disk speeds.
 
Something strange happened and I don't know how. I remember that most of my database tables were MyISAM and now lot of them turned to InnoDB. How could such thing happen, I have no idea what triggered it or caused it. How could I reverse this, would it cause damage to my database ? Thanks.
 
My host replied that they don't support InnoDB at the moment, and I can't use the repair function which returns with this error:

Code:
dbname_.EWRcarta_cache
dbname_.EWRcarta_history
dbname_.EWRcarta_pages
dbname_.EWRcarta_search                        OK
dbname_.EWRcarta_templates
dbname_.EWRmedio_categories
dbname_.EWRmedio_comments
dbname_.EWRmedio_keylinks
dbname_.EWRmedio_keywords
dbname_.EWRmedio_media
dbname_.EWRmedio_playlists
dbname_.EWRmedio_search                        OK
dbname_.EWRmedio_services
dbname_.EWRporta_cache
dbname_.EWRporta_modules
dbname_.EWRporta_settings
dbname_.dark_azucloud_terms
dbname_.dark_azucloud_terms_pages
dbname_.dark_taigachat
dbname_.kingk_bbcm                             OK
dbname_.xf_addon
dbname_.xf_admin
dbname_.xf_admin_navigation
dbname_.xf_admin_permission
dbname_.xf_admin_permission_entry
dbname_.xf_admin_template
dbname_.xf_admin_template_compiled
dbname_.xf_admin_template_include
dbname_.xf_admin_template_phrase
dbname_.xf_attachment
dbname_.xf_attachment_data
dbname_.xf_attachment_view
dbname_.xf_ban_email
dbname_.xf_ban_ip
dbname_.xf_bb_code_media_site
dbname_.xf_code_event
dbname_.xf_code_event_listener
dbname_.xf_content_type
dbname_.xf_content_type_field
dbname_.xf_conversation_master
dbname_.xf_conversation_message
dbname_.xf_conversation_recipient
dbname_.xf_conversation_user
dbname_.xf_cron_entry
dbname_.xf_data_registry
dbname_.xf_deletion_log
dbname_.xf_email_template
dbname_.xf_email_template_compiled
dbname_.xf_email_template_phrase
dbname_.xf_error_log
dbname_.xf_flood_check
dbname_.xf_forum
dbname_.xf_forum_read
dbname_.xf_identity_service
dbname_.xf_import_log                          OK
dbname_.xf_ip
dbname_.xf_language
dbname_.xf_liked_content
dbname_.xf_link_forum
dbname_.xf_login_attempt
dbname_.xf_moderation_queue
dbname_.xf_moderator
dbname_.xf_moderator_content
dbname_.xf_mood
dbname_.xf_news_feed
dbname_.xf_node
dbname_.xf_node_type
dbname_.xf_option
dbname_.xf_option_group
dbname_.xf_option_group_relation
dbname_.xf_page
dbname_.xf_permission
dbname_.xf_permission_cache_content
dbname_.xf_permission_cache_content_type
dbname_.xf_permission_cache_global_group
dbname_.xf_permission_combination
dbname_.xf_permission_combination_user_group
dbname_.xf_permission_entry
dbname_.xf_permission_entry_content
dbname_.xf_permission_group
dbname_.xf_permission_interface_group
dbname_.xf_phrase
dbname_.xf_phrase_compiled
dbname_.xf_phrase_map
dbname_.xf_poll
dbname_.xf_poll_response
dbname_.xf_poll_vote
dbname_.xf_post
dbname_.xf_profile_post
dbname_.xf_profile_post_comment
dbname_.xf_report
dbname_.xf_report_comment
dbname_.xf_route_prefix
dbname_.xf_search
dbname_.xf_search_index                        OK
dbname_.xf_session                             OK
dbname_.xf_session_activity
dbname_.xf_session_admin                       OK
dbname_.xf_smilie
dbname_.xf_spam_cleaner_log
dbname_.xf_style
dbname_.xf_style_property
dbname_.xf_style_property_definition
dbname_.xf_style_property_group
dbname_.xf_template
dbname_.xf_template_compiled
dbname_.xf_template_include
dbname_.xf_template_map
dbname_.xf_template_phrase
dbname_.xf_thread
dbname_.xf_thread_read
dbname_.xf_thread_redirect
dbname_.xf_thread_user_post
dbname_.xf_thread_view
dbname_.xf_thread_watch
dbname_.xf_trophy
dbname_.xf_trophy_user_title
dbname_.xf_upgrade_log
dbname_.xf_user
dbname_.xf_user_alert
dbname_.xf_user_alert_optout
dbname_.xf_user_authenticate
dbname_.xf_user_ban
dbname_.xf_user_confirmation
dbname_.xf_user_external_auth
dbname_.xf_user_follow
dbname_.xf_user_group
dbname_.xf_user_group_change
dbname_.xf_user_group_relation
dbname_.xf_user_identity
dbname_.xf_user_news_feed_cache
dbname_.xf_user_option
dbname_.xf_user_privacy
dbname_.xf_user_profile
dbname_.xf_user_status
dbname_.xf_user_trophy
dbname_.xf_user_upgrade
dbname_.xf_user_upgrade_active
dbname_.xf_user_upgrade_expired
dbname_.xf_user_upgrade_log

note: The storage engine for the table doesn't support repair
 
I went over a few of the points either in a blog reply to a sitepoint piece I read :

http://www.jeremyhutchings.com/2010/11/top-10-improvements-for-php-developers.html

As with any complex system, focusing on just 1 aspect is typically wrong; it's all a balancing act. Given how much 32-64 gig of RAM is these days, what ever the storage system is - be that a file based or dB (on top of file based) one - setting up the caches/buffers/etc correctly so for the most part the live app is running out of ram, and disk is just long term storage, is the way to go.

In the PHP/LAMP world, memcache and Gearman basically protect most apps from dropping to disk speeds.

Just a quick question. I've got a table that stores content for a CMS. For each page call, I'm doing a very evil SELECT *. Out of the 6 columns in the table, 5 are actually used. Would you suggest I still call all or just those 5 columns? The unused column returns an integer.
 
Just a quick question. I've got a table that stores content for a CMS. For each page call, I'm doing a very evil SELECT *. Out of the 6 columns in the table, 5 are actually used. Would you suggest I still call all or just those 5 columns? The unused column returns an integer.
Shamil, it probably makes little difference to be honest. Although its worth bearing in mind that mySQL / PHP stores all the rows in the return set when the query returns, so from a memory use point of view, if you don't need it, don't return it.

Not really too much to worry about with a small result set. However, with a result set containing millions of rows, its a bit more of an issue ;)
 
Shamil, it probably makes little difference to be honest. Although its worth bearing in mind that mySQL / PHP stores all the rows in the return set when the query returns, so from a memory use point of view, if you don't need it, don't return it.

Not really too much to worry about with a small result set. However, with a result set containing millions of rows, its a bit more of an issue ;)

thought so, thanks for confirming :)
 
It's not a stupid limitation.

Hosts have certain responsibilities, things that customers expect, and one of those is database backups. When you are backing up a database that uses MyISAM tables, you can do a simple file copy. Proper InnoDB backup requires a table lock, and that's expensive when you are backing up tens of thousands of databases (there is a "hot" backup method, but that consumes even more system resources). And depending on the method the host uses to make backups, one InnoDB table that's in use - one row locked - can stop the backup process for the entire server.

Transaction abilities are nice to have from a user standpoint, but they do introduce challenges in a shared hosting environment. Which is why they have traditionally been available only on databases built for enterprise use, like MSSQL or Oracle, because an enterprise user can suspend access to the database during a backup. If your host suspends access to your database at some random time when you don't expect it - even briefly - you're going to scream bloody murder.

Most hosts do not impose limitations for no reason. Maintaining thousands of sites or databases or email accounts or whatever is much different than maintaining one or two or 10. Think about why they might not offer something next time you complain about those limitations. There is usually a valid reason.
 
Yeah, the backup aspects of InnoDB are a concern as backing up myISAM tables, as you say mjp, is very simple by comparasion.

I presume the only way to do a backup on an InnoDB is to do it via an SQL dump?

I did setup one copy of XF & imported the VB forum into myISAM tables and it appeared to work fine (didn't do much testing though).

But I seriously would not advise XF to require InnoDB, as you've found on here some hosts won't enable it.

As long as speed isn't an issue (somehow we need to test this) and I can find a solution to the backup, I don't mind using InnoDB, but myISAM does have quite a bit of an advantage, in some areas.
 
Hosts have certain responsibilities, things that customers expect, and one of those is database backups. When you are backing up a database that uses MyISAM tables, you can do a simple file copy. Proper InnoDB backup requires a table lock, and that's expensive when you are backing up tens of thousands of databases (there is a "hot" backup method, but that consumes even more system resources). And depending on the method the host uses to make backups, one InnoDB table that's in use - one row locked - can stop the backup process for the entire server.
That's really backwards. If someone has a pending transaction, the backup could simply not backup the pending transaction. It depends on isolation level, though yes you could see a row/page-level lock here.

However, your backup certainly doesn't have to deal with it at that level. Here, I'm referring to this: http://www.percona.com/docs/wiki/percona-xtrabackup:start
It's an extension of the built-in innobackupex. It allows backups of InnoDB tables to be done--consistently--without locking. It's actually done at the file level.

MyISAM, on the other hand, requires table level locking to get a consistent backup. While in theory you can simply copy the myi/myd (and frm) files, you run the risk of partially flushed data. You can do a flush, write lock, and cp though (and of course there's mysqldump).

As for comments regarding speed, InnoDB can be slower for writes. This is mostly because it actually flushes to disk before it tells you it's done (MyISAM doesn't). This is one part of ACID compliance. There are configuration variables in MySQL that can be tuned to make this process more efficient (eg, only flushing once per second). In terms of reads, depending on the situation, InnoDB can be faster than MyISAM due to clustering on the primary key. I would say this difference isn't gigantic.

While they are not strictly required, XF uses transactions throughout to ensure better data consistency. (And for better write performance over auto-commit.)
 
Top Bottom