digitalpoint
I'm curious whether any "big boards" have run into issues with how avatars and attachments are stored on the web servers' file systems.
Currently I run a setup with 4 load-balanced web servers that are all independent of each other (that way there is no single point of failure, such as a SAN failing). Let's say I had 200,000 users with avatars plus 5,000,000 attachments. Those 200k avatars become 600k files, because 3 sizes are stored for each one.
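Spelled out, the file counts in that example look like this, assuming each attachment is a single file and every web server holds a full copy of everything:

```latex
\begin{aligned}
\text{avatar files} &= 200{,}000 \times 3 = 600{,}000 \\
\text{files per server} &= 600{,}000 + 5{,}000{,}000 = 5{,}600{,}000 \\
\text{file copies across 4 servers} &= 4 \times 5{,}600{,}000 = 22{,}400{,}000
\end{aligned}
```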
Of course you can use something like csync2 to keep files in sync across multiple servers, but I wonder if it's really worth it to try and keep millions of attachment/avatar files in sync every time someone uploads a new one. You also start having issues that need to be addressed if you have a web server that is down (either failed or taken down intentionally for maintenance).
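To make the down-server problem concrete, here is a minimal sketch of the bookkeeping a push-on-upload scheme ends up needing. It is not how csync2 works internally; the class, node names, and file path are hypothetical, and adding a path to a set stands in for the actual file copy.

```python
from collections import defaultdict


class ReplicationBacklog:
    """Hypothetical bookkeeping for push-on-upload replication.

    Each upload is pushed to every node that is currently up; nodes that
    are down get the path queued so it can be replayed when they return.
    """

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.online = set(self.nodes)
        self.pending = defaultdict(list)  # node -> paths it missed while down
        self.stored = defaultdict(set)    # node -> paths it currently holds

    def mark_down(self, node):
        self.online.discard(node)

    def upload(self, path):
        for node in self.nodes:
            if node in self.online:
                self.stored[node].add(path)      # stand-in for the real file copy
            else:
                self.pending[node].append(path)  # remember what it missed

    def mark_up(self, node):
        # Replay everything the node missed before it serves traffic again.
        for path in self.pending.pop(node, []):
            self.stored[node].add(path)
        self.online.add(node)


backlog = ReplicationBacklog(["web1", "web2", "web3", "web4"])
backlog.mark_down("web3")                    # taken down for maintenance
backlog.upload("avatars/large/12345.jpg")    # hypothetical path
print(dict(backlog.pending))                 # {'web3': ['avatars/large/12345.jpg']}
backlog.mark_up("web3")
```

The catch is that a returning node has to replay its whole backlog before it can safely take requests, and the backlog itself becomes one more piece of state that must survive failures.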
On top of that, I'm probably going to be moving to a bigger cluster of 12 web servers sometime soon. Storage of attachments and avatars just doesn't seem to scale well as you bring on more web servers. Not only does it take more time to keep everything in sync every time someone uploads something, but at the extreme you also start running into disk space issues. For the sake of argument, let's say I had 100 web servers... do I really want to store every attachment/avatar 100 times? Csync2 is extremely fast, but even if it took "only" 1 second to sync millions of files to a server (it would be longer), with 12 servers you are talking about 12 seconds of delay every time someone uploaded something.
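A back-of-the-envelope version of that scaling argument, as a sketch: it reuses the 5,600,000 files per server from the earlier example (600k avatar files plus 5M attachments) and the hypothetical 1 second per server, and it assumes uploads are synced to the servers one after another, which is what makes the delay grow linearly.

```python
def replication_cost(servers, files_per_server, seconds_per_server_sync):
    """Cost of keeping a full copy of every file on every web server.

    Assumes each upload is synced to the servers sequentially, so the
    total delay is the per-server time multiplied by the server count.
    """
    total_copies = servers * files_per_server
    serial_delay = servers * seconds_per_server_sync
    return total_copies, serial_delay


for n in (4, 12, 100):
    copies, delay = replication_cost(n, 5_600_000, 1.0)
    print(f"{n:>3} servers: {copies:,} file copies, {delay:.0f} s serial sync delay")

#   4 servers: 22,400,000 file copies, 4 s serial sync delay
#  12 servers: 67,200,000 file copies, 12 s serial sync delay
# 100 servers: 560,000,000 file copies, 100 s serial sync delay
```

Syncing to servers in parallel would cut the delay, but not the number of stored copies, which is the part that keeps growing with every server added.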
On a massive scale, imagine if Facebook was trying to store user avatars locally and sync all their web servers with those files every time someone uploaded a new one.
In addition to the scalability issues, there are a few other things I don't like about how avatars are stored... When using ImageMagick, you can't reject animated avatars. And all avatars are stored in the file system as .jpeg files, even GIFs and PNGs, which makes the web server send the wrong MIME type for GIF and PNG avatars... so you are more or less relying on the web browser to assume site owners are stupid and correct the invalid file types.
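Both complaints could be handled at upload time without depending on the image library: read the file's magic bytes to get its real MIME type, and walk a GIF's block structure to count its frames. This is a standard-library-only sketch with hypothetical function names, not XenForo code; it assumes well-formed input (a truncated file raises `IndexError`) and does not detect animated PNG (APNG).

```python
def sniff_image_type(data: bytes):
    """Return the MIME type implied by the file's magic bytes, or None."""
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "image/gif"
    return None


def _skip_sub_blocks(data: bytes, pos: int) -> int:
    """Skip a chain of GIF data sub-blocks; return the offset after the terminator."""
    while True:
        size = data[pos]
        pos += 1
        if size == 0:
            return pos
        pos += size


def gif_frame_count(data: bytes) -> int:
    """Count image descriptors (frames) in a GIF by walking its blocks."""
    pos = 13                          # 6-byte header + 7-byte logical screen descriptor
    packed = data[10]
    if packed & 0x80:                 # global color table present
        pos += 3 * (2 ** ((packed & 0x07) + 1))
    frames = 0
    while pos < len(data):
        introducer = data[pos]
        if introducer == 0x3B:        # trailer: end of file
            break
        if introducer == 0x21:        # extension: label byte, then sub-blocks
            pos = _skip_sub_blocks(data, pos + 2)
        elif introducer == 0x2C:      # image descriptor: 9 bytes follow the introducer
            frames += 1
            local = data[pos + 9]
            pos += 10
            if local & 0x80:          # local color table present
                pos += 3 * (2 ** ((local & 0x07) + 1))
            pos = _skip_sub_blocks(data, pos + 1)  # skip LZW minimum code size byte
        else:
            raise ValueError(f"unexpected GIF block 0x{introducer:02x} at {pos}")
    return frames


def is_animated_gif(data: bytes) -> bool:
    return sniff_image_type(data) == "image/gif" and gif_frame_count(data) > 1
```

An upload handler could reject files where `is_animated_gif` is true and store the sniffed MIME type alongside the image, so the server never has to guess from a `.jpeg` extension.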
I'm thinking I may just need to rewrite parts of the Model_Avatar and Model_Attachment classes to address all my issues and store everything in the database rather than the file system (one query to read a file from the database really isn't much overhead, as long as you send the proper HTTP headers so browsers aren't checking whether it has changed since they last downloaded it).
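A minimal sketch of that idea, not XenForo's actual Model_Avatar code: `sqlite3` stands in for the forum's real database, and the table layout and function names are hypothetical. The year-long `Cache-Control` with `immutable` assumes avatar URLs carry a version (for example, the upload time), so a new avatar gets a new URL; the `ETag` covers clients that revalidate anyway.

```python
import hashlib
import sqlite3

# Hypothetical schema: one row per stored avatar size, with its real MIME type.
SCHEMA = """
CREATE TABLE avatar (
    user_id   INTEGER NOT NULL,
    size      TEXT    NOT NULL,
    mime_type TEXT    NOT NULL,
    data      BLOB    NOT NULL,
    PRIMARY KEY (user_id, size)
)
"""


def save_avatar(conn, user_id, size, mime_type, data):
    conn.execute(
        "INSERT OR REPLACE INTO avatar (user_id, size, mime_type, data) VALUES (?, ?, ?, ?)",
        (user_id, size, mime_type, data),
    )


def serve_avatar(conn, user_id, size, if_none_match=None):
    """Return (status, headers, body) for an avatar request using one indexed query."""
    row = conn.execute(
        "SELECT mime_type, data FROM avatar WHERE user_id = ? AND size = ?",
        (user_id, size),
    ).fetchone()
    if row is None:
        return 404, {}, b""
    mime_type, data = row
    etag = '"' + hashlib.sha256(data).hexdigest()[:16] + '"'
    cache_headers = {
        # Let browsers reuse their copy for a year without re-checking it.
        "Cache-Control": "public, max-age=31536000, immutable",
        "ETag": etag,
    }
    if if_none_match == etag:  # simplified: a single ETag, no list or weak matching
        return 304, cache_headers, b""
    headers = {**cache_headers, "Content-Type": mime_type, "Content-Length": str(len(data))}
    return 200, headers, data


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
save_avatar(conn, 1, "m", "image/png", b"\x89PNG\r\n\x1a\nfake")
status, headers, body = serve_avatar(conn, 1, "m")
```

Because every web server reads from the same database, an upload is visible to all of them immediately, with no per-server sync step.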