Partial fix xf_ip table & cookie_login & IP check

Xon

Well-known member
With the cookie_login being logged to the xf_ip table, checking the IP of content or running shared IP checks now takes massively longer.

Since XenForo 1.5.8 (2015-02 or so), Spacebattles has accumulated some 18 million records for cookie_login*. Checking the IP of content often takes unexpectedly long since the registration & confirmation IP are actions of the content_type user instead of just a field off the user table.

Adding the registration/confirmation IP id's directly on a user table would also simplify lookups to have the same efficiency regardless of the table size.

This is perhaps something for XenForo 2.x, but adding a summary table that the shared ip function hits would dramatically reduce the need for bulky indexes on the xf_ip table. With gradually of a day, userid/date/ip/event count would be a vastly simplier structure for displaying IP information in the admincp and running shared ip checks against.

*Pruning this many records will require a script it iterate over records to prevent downtime.
 
Due to this, just opening the IP tab on a member's profile in the AdminCP caused the entire site to go down while the query took 580 seconds to complete.
 
Just recently had the entire SpaceBattles forum lock for ~580 seconds because an admin looked at the IP tab for a user with an abnormal number of IP's recorded against it. 'Copy to temp' was blocking writes :(

Due to this, just opening the IP tab on a member's profile in the AdminCP caused the entire site to go down while the query took 580 seconds to complete.
It didn't complete, I had to kill -9 (hard-killed) MySQL to stop the query.
 
This was a concern with the cookie login thing when we added it, but equally, not logging was is definitely a gap in the logged IPs for a user. This will effectively ensure we have a log just from a user browsing which can be important in specific situations.

Interesting that it blocked writes though. I don't think I would have expected that.

I'm not really sure of an obvious (immediate) solution for your situation while still keeping a basic login record at least.
 
This was a concern with the cookie login thing when we added it, but equally, not logging was is definitely a gap in the logged IPs for a user. This will effectively ensure we have a log just from a user browsing which can be important in specific situations.

Interesting that it blocked writes though. I don't think I would have expected that.

I'm not really sure of an obvious (immediate) solution for your situation while still keeping a basic login record at least.

Between the cookie logins and the history of the site, our xf_ip table has over 50 million records. Compressed it is 4.5gb, and ~70% of that space is taken up by indexes. :|
 
It seems like it is not necessary to maintain the login IPs permanently. Perhaps a cutoff setting can be added, to config.php even? It can default to 1 year or something, and older login IPs are pruned by a cron job. That would reduce a large portion of Xon's table right there.

I don't see there would be much use for login IPs that are so old. Even if your forum gets hacked, it's unlikely to need to go back that far to find the breach.
 
That's been an option for a while: "Delete IP usage data after X days". It defaults to 1 year as well.

That doesn't necessarily mitigate this particular issue on a very large installation.
 
Based off a dump of the xf_ip table loaded into another instance (with ~51 million records):
- 4.3gb with 2 indexes. (uncompressed)
- 4.4gb with 5 indexes. (compressed)
- 6.8gb with 5 indexes. (uncompressed)

Removing the user login events from xf_ip (to their own table), shrinks xf_ip to ~33 million rows (2.3gb, with 2 indexes, uncompressed). Note; I summarize cookie-login events to the nearest day, since most users appear to hit it a number of times a day and I don't have a need for more detail and all the UI display is in whole days anyway.

I've dropped the following indexes:
  • `content_type_content_id` (`content_type`,`content_id`)
  • `ip_log_date` (`ip`,`log_date`)
  • `log_date` (`log_date`)
content_type_content_id is only used to find the user registration/confirmation record, ip_log_date for the shared ip function, and log_date for pruning. The log pruning can be rewritten to take advantage of the fact that ip_id increasing nearly always corresponds to log_date increasing:

Code:
select ip_id from xf_ip where log_date > ? order by ip_id limit 1;
delete from xf_ip where ip_id < ?

I've written an add-on & migration script which extracts the following info into separate tables:
  • xf_ip_user_login - records user logins, xf_ip doesn't receive these records since they are expected to churn while xf_ip has more static content ip's.
  • xf_ip_user_reg - the first user registration/confirmation record, ensures an O(1) lookup times.
  • xf_ip_summary - per-day summary of user/ip/event count, allows shared ip detection to run vastly faster.
I plan to fiddle with how data is inserted into xf_ip_summary to reduce the possibility of locking occurring during the user's request and queries hitting xf_ip_summary ('copy to temp' appears to be a wonky state which can hold ranged row locks for the duration). Probably a deferred task which checks a xf_data_registry entry for the last submitted xf_ip.ip_id.
 
Last edited:
Even if your forum gets hacked, it's unlikely to need to go back that far to find the breach.
Just wanted to throw this in: It's not always because of a breach... there are some other factors in play for some sites that need to have a more definitive tracking system. Sure, we have web server logs and using tools like Splunk, we can pull out raw data in custom models very quickly. Having the IP data readily available and tied to specific accounts, saves time.

Sites that don't have strict requirements about IP logging can trim their databases to keep the sizes down.

@Xon - Appreciate your continuous work on improving the IP data handling methods. I think we are handling about 2m IP records right now, haven't bothered to calculate a growth rate yet though.
 
By current working theory is that the VM's where experiencing periodic very high CPU steal that showed up poorly on the averaged daily graphs of our monitoring.

We had a couple VM's moved to new hosts, after discovering CPU steal would jump massively for a very short period of time which triggered death-spiral like behavior for a short period of time which the system would then gradually recover from.

The reason we fingered the high amounts of data in the xf_ip table as problematic, was it was just the most likely thing to be a victim since everything was suddenly vastly more costly. Especially touching large amounts of compressed data.
 
I'm going to call this a partial fix. A total redesign of the IP table and storage methods would be a totally different thing.

XF2 has for a while limited the number of cookie_login events that we log. Essentally, we don't log cookie_login entries unless we haven't seen the IP in the last 6 hours. This ensures that we still have a log for the usage of a particular IP even if they didn't do anything else on it (such as post), but it should limit overzealous logging. Unless the user is changing IP constantly, though in that case, having the cookie_login event trigger is intentional as otherwise there would be holes in the IP logs (which is what the cookie_login logging was added to prevent).

I've also these changes to XF 1.5.
 
Top Bottom