XF 2.2 Compressing the xenForo Database?

chris p

Member
Our xenForo database will have thousands of extra id "holes" where we've imported users, posts, and threads from two different forum sites. In other words, in users for example, there will be a post_id gap between the imports from one forum site, and the imports from another. Aside from this, within the table there are thousands of gaps where content was deleted in the past.

For tidying up, I was wondering if xenForo has a utility (cmd.php?) that can compress the database back to contiguous autoincrement id numbers?
 
You shouldn't ever try to do this as the IDs are referenced in lots of other tables and it will almost certainly just completely break everything. It's just an ID, don't put too much importance on it.
 
I figured that, which was why I was wondering if they had put together a organized compression function that took into account all the relational links. You're correct of course, it's not important. I had just sweated through 15+ years of worrying that phpbb2 ids were going to exceed the field size with our busy database coupled with replication skips in ids (and they are close, but we'll be migrating to xenForo long before they can overflow). The field definition is larger in xenForo. I will have a several million "holes" but it probably isn't significant.

Thank you very much for the reply, it's one less thing to worry about in going live.
 
There definitely isn't an official script at least, I don't think it would ever be encouraged officially for XF to make one. Not sure if anyone else has ever attempted it.

I suppose if one was hell bent on it, you might be able to import one XF forum database into a clean install and opt not to retain IDs. But honestly I wouldn't think it's worth it and wouldn't recommend it.

As it happens regarding ID overflow if that's a concern, there are some things to be mindful of: https://xenforo.com/community/threads/xf_thread_read-auto_increment-out-of-range.203415/

Some tables might have issues if you have IDs incrementing by more than 1, but as the data that caused issues here won't exist from phpBB you probably won't ever have an issue.
 
I suppose if one was hell bent on it, you might be able to import one XF forum database into a clean install and opt not to retain IDs. But honestly I wouldn't think it's worth it and wouldn't recommend it.
This is literally the only method we would ever recommend. And even then we agree with you that it’s highly unimportant, even more so for a well established site because of links to existing content, IDs and links referenced in existing content etc.

You’d have to maintain redirection scripts which would only work to an extent but would completely break things when the new content overlaps existing IDs.

In short, life’s too short. Don’t worry about it 😂
 
That is some interesting stuff. I've been worrying only about post_id's growing over the years. The thread_read concept takes that to a new level. I'm only bringing in about 6 millions posts initially (though a few million more will be coming from archives and consolidation of other forums). Having a ceiling of 420 million on reads is a little distressing, but perhaps I should study and understand that more before getting all stressed out.

I will take Chris' advice, and life's way more short for an old guy like me than you all. I'll see if I have any of those chill pills left. ;-)
 
You really only need to worry about it if the highest post_id starts to approach 4,000,000,000 (they are stored as 32-bit numbers, so they can range between 0 and 4,294,967,295). Even then, the solution isn't going to be to reorganize post_id and fill the gaps (for reasons others have already given). If you start approaching the upper limit of a 32-bit number (either because of huge gaps or your site has billions of posts), it's really just a matter of changing the post_id column in the respective tables to be 64-bit numbers instead (then you can have post_ids up to 18,446,744,073,709,551,615).

Side note: I'm actually in a situation where I have a couple tables that have billions of records (without ID gaps), and I'm not particularly looking forward to running out of 32-bit numbers for the primary keys of those tables. I never really expected the tables to get that big, so I didn't define them as 64-bit out of the gate, but now they are getting big. 😕
 
You really only need to worry about it if the highest post_id starts to approach 4,000,000,000 (they are stored as 32-bit numbers, so they can range between 0 and 4,294,967,295). Even then, the solution isn't going to be to reorganize post_id and fill the gaps (for reasons others have already given). If you start approaching the upper limit of a 32-bit number (either because of huge gaps or your site has billions of posts), it's really just a matter of changing the post_id column in the respective tables to be 64-bit numbers instead (then you can have post_ids up to 18,446,744,073,709,551,615).

Side note: I'm actually in a situation where I have a couple tables that have billions of records (without ID gaps), and I'm not particularly looking forward to running out of 32-bit numbers for the primary keys of those tables. I never really expected the tables to get that big, so I didn't define them as 64-bit out of the gate, but now they are getting big. 😕
This clarifies a confusion I had from the link above. If it is just the post_id, then there would never be a worry given the field size. The post talked about a thread_read id which I took to mean records generated on read of the posts which would be many times the number of threads. We only will have about 12M posts and only grow by a few hundred thousand a year (though the plan is to grow that with what we are doing). Far past my opportunity to retire (again) ;-).

Worst case, it sounds like a solution would be mysqldump->vi->mysql <loadfile

Thank you for the input.
 
This clarifies a confusion I had from the link above. If it is just the post_id, then there would never be a worry given the field size. The post talked about a thread_read id which I took to mean records generated on read of the posts which would be many times the number of threads. We only will have about 12M posts and only grow by a few hundred thousand a year (though the plan is to grow that with what we are doing). Far past my opportunity to retire (again) ;-).

Worst case, it sounds like a solution would be mysqldump->vi->mysql <loadfile

Thank you for the input.
Yep... same general principle. If you are approaching a value of ~4B xf_thread_read.thread_read_id, just make it a 64-bit number for that column in the table.

Although just skimming the source (so don't take this as a definitive answer), I don't see anywhere where that column is used. If it was problematic because there aren't enough 32-bit numbers for it's primary key, you could probably drop that column completely if you had to (again, don't do anything like that without asking someone with more insight to why it's there, like a XF dev). But from the looks of it, I don't see anywhere that the column is used/referenced and originally it wasn't even there (looks like it was added in 1.0.0 beta 5). I assume there must have been a reason it was added, but I don't see it with a quick skim.
 
This clarifies a confusion I had from the link above. If it is just the post_id, then there would never be a worry given the field size. The post talked about a thread_read id which I took to mean records generated on read of the posts which would be many times the number of threads. We only will have about 12M posts and only grow by a few hundred thousand a year (though the plan is to grow that with what we are doing). Far past my opportunity to retire (again) ;-).

Worst case, it sounds like a solution would be mysqldump->vi->mysql <loadfile

Thank you for the input.
It’s primarily an issue for data that is created at larger scale. The main ones I know of are thread reads and alerts. Thread reads obviously have a row per thread/user combination, and while the table gets pruned the IDs will be used up quickly on an active forum with a larger auto increment increment. The forum we had this issue with had ~6m rows in the table when I reset the IDs (so they were used up to ~60m), and we hit the overflow again just under 2 years later. Alerts are prone to it too as there could be hundreds or thousands created for a single reply if lots of people are watching a thread, and there’s the admin communication feature too, all of which adds up over time.

Although just skimming the source (so don't take this as a definitive answer), I don't see anywhere where that column is used. If it was problematic because there aren't enough 32-bit numbers for it's primary key, you could probably drop that column completely if you had to (again, don't do anything like that without asking someone with more insight to why it's there, like a XF dev). But from the looks of it, I don't see anywhere that the column is used/referenced and originally it wasn't even there (looks like it was added in 1.0.0 beta 5). I assume there must have been a reason it was added, but I don't see it with a quick skim.
I wondered this too, I think there are some tables that don’t have an auto incrementing column and have a unique index from other columns.
 
Back
Top Bottom