Speed up / Allow --jobs on php cmd.php xf-rebuild:attachment-optimization

ekool

I've been optimizing the attachments on one of our sites for over 24 hours (currently at 89,240 seconds), and the process isn't even halfway done.

It would be nice if there were some way to speed up this process, or to specify --jobs for it on the command line. I don't feel this process should take nearly this long. The load on the machine is very low, and there is clearly far more hardware that could be put to use: the site is live, and the server isn't breaking a sweat.

Code:
CPU: AMD EPYC 7251 (16) @ 2.10 GHz
Memory: 27.23 GiB / 62.27 GiB (44%)
Swap: 0 B / 4.00 GiB (0%)
 
Jobs are only run by a single process, so they can use only one core.

There is no built-in way to run multiple processes, so you'll have to wait until all attachments have been processed.

This may take a loooong time, but it can easily be run in the background, or stopped and resumed at any time.

It would certainly be nice, though, if multiple processes could be used.
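If a rebuild could be pointed at a range of attachment IDs, the usual way to use more cores would be to split the full ID range into disjoint chunks and hand each chunk to its own worker process. The Python sketch below only illustrates that idea under that assumption: `split_id_range`, `optimize_range` and `run_parallel` are hypothetical names, nothing here is part of XenForo or cmd.php, and XenForo itself offers no such option.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import List, Tuple


def split_id_range(first_id: int, last_id: int, workers: int) -> List[Tuple[int, int]]:
    """Split the inclusive range [first_id, last_id] into at most `workers`
    contiguous, non-overlapping chunks whose sizes differ by at most one."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    total = last_id - first_id + 1
    if total <= 0:
        return []
    workers = min(workers, total)  # never create empty chunks
    base, extra = divmod(total, workers)
    chunks = []
    start = first_id
    for i in range(workers):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first chunks
        chunks.append((start, start + size - 1))
        start += size
    return chunks


def optimize_range(bounds: Tuple[int, int]) -> int:
    """Hypothetical worker: a real one would optimize every attachment whose ID
    falls inside `bounds`. Here it only reports how many IDs it was given."""
    lo, hi = bounds
    return hi - lo + 1


def run_parallel(first_id: int, last_id: int, workers: int) -> int:
    """Run one worker process per chunk and return the total number of IDs covered."""
    chunks = split_id_range(first_id, last_id, workers)
    if not chunks:
        return 0
    with ProcessPoolExecutor(max_workers=len(chunks)) as pool:
        return sum(pool.map(optimize_range, chunks))
```

For example, `split_id_range(1, 10, 3)` gives `[(1, 4), (5, 7), (8, 10)]`. Because chunks never overlap, no two workers touch the same attachment. Call `run_parallel` from under an `if __name__ == "__main__":` guard, since process pools on some platforms re-import the main module in each worker.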
 
There are some options in cmd.php that have --jobs support... or perhaps those are just add-ons from other authors? Some Digital Point add-ons come to mind, but I could be wrong.
 
Optimizing... Attachments (907750)

Almost halfway done, and it's been running for 3.2 days. Only another ~4 days or so to go, maybe. Nice.
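The roughly four-day estimate is a straight-line extrapolation. A minimal Python sketch of that arithmetic, assuming the number in `Optimizing... Attachments (N)` counts processed attachments and taking the final figure reported later in the thread (2,002,297) as the total; both are assumptions, and `eta_days` is a hypothetical helper:

```python
def eta_days(done: int, elapsed_days: float, total: int) -> float:
    """Estimate the remaining days, assuming the processing rate stays constant."""
    if done <= 0 or elapsed_days <= 0:
        raise ValueError("need some progress to estimate a rate")
    rate_per_day = done / elapsed_days
    return (total - done) / rate_per_day


remaining = eta_days(907_750, 3.2, 2_002_297)
print(f"{remaining:.1f} days left")  # 3.9 days left
```

That lands close to the guess above, yet the job actually ran from January 19th to February 9th, about 21 days in total, so the rate clearly did not stay constant. A linear ETA taken early in a long job can be far too optimistic.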
 
Still slogging away... Stopping and restarting the process did work, though. It picked up where it left off, after first running back through the ones that were already done, which it got through pretty fast.
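The restart behaviour described here (re-walking the list but racing through items already finished) is what you get when each item's "done" state is recorded as soon as it is processed, so a rerun can skip it with a cheap check. A minimal Python sketch of that pattern; `optimize_pending`, the `state` dict and the `limit` used to simulate stopping the job are all hypothetical, and this is not XenForo's actual job code:

```python
from typing import Callable, Dict, Iterable, Optional, Tuple


def optimize_pending(
    attachment_ids: Iterable[int],
    state: Dict[int, bool],
    optimize: Callable[[int], None],
    limit: Optional[int] = None,
) -> Tuple[int, int]:
    """Optimize every attachment not yet marked as done in `state`.

    Returns (skipped, optimized). `limit` simulates the job being stopped
    after that many optimizations in this run.
    """
    skipped = optimized = 0
    for att_id in attachment_ids:
        if state.get(att_id):
            skipped += 1  # cheap check: finished in an earlier run, skip it
            continue
        if limit is not None and optimized >= limit:
            break  # simulate the job being stopped mid-run
        optimize(att_id)
        state[att_id] = True  # mark done only after the work succeeded
        optimized += 1
    return skipped, optimized


state: Dict[int, bool] = {}
ids = range(1, 11)
print(optimize_pending(ids, state, lambda att_id: None, limit=4))  # (0, 4): stopped early
print(optimize_pending(ids, state, lambda att_id: None))           # (4, 6): resumed
```

In a real job the "done" marker would need to live somewhere that survives a restart, such as the database, rather than in an in-memory dict.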

Optimizing... Attachments (1059834)
 
sigh.... still truckin.

Optimizing... Attachments (1297104)

The idea that there's no way to speed up this process, and that we should just accept it might take weeks or months, is kind of ridiculous.
 
@ekool Are you using Media Gallery? I'm curious: do you allow Google to index these attachments? Thx!

Yes, and yes. Here is our robots.txt; open to suggestions/feedback.

Edit, and might as well update: Optimizing... Attachments (1836384)

Code:
User-agent: PetalBot
User-agent: AspiegelBot
User-agent: AhrefsBot
User-agent: SemrushBot
User-agent: SemRush
User-agent: DotBot
User-agent: MauiBot
User-agent: MJ12bot
Disallow: /


User-agent: Amazonbot
Disallow: /threads/*/reply

User-agent: *
Disallow: /whats-new/
Disallow: /whats-new/*
Disallow: /conversations/
Disallow: /find-threads/
Disallow: /*/create-thread
Disallow: /*/post-thread
Disallow: /login/
Disallow: /logout/
Disallow: /lost-password/
Disallow: /misc/
Disallow: /online/
Disallow: /profile-posts/
Disallow: /*/add-reply
Disallow: /*/approve
Disallow: /*/draft
Disallow: /*/latest
Disallow: /*/post
Disallow: /*/reply
Disallow: /*/unread
Disallow: /account/
Disallow: /members/
Disallow: /attachments/
Disallow: /goto/
Disallow: /help/
Disallow: /posts/
Disallow: /login/
Disallow: /search/
Allow: /

Sitemap: https://www.nnn.com/sitemap.xml
 
Still chugging along... I feel sorry for you guys that have even more attachments.

Optimizing... Attachments (1958461)
 
Finally... this process ran from January 19th to February 9th, and it's done. The last of the Mohicans right here...

Optimizing... Attachments (2002297)
 
There are some issues here...

The first major one is you say you want Google to index attachments, but in robots.txt you are specifically disallowing them from crawling the /attachments/ directory.
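You can confirm this with Python's standard `urllib.robotparser`. The sketch below feeds it an excerpt of the `User-agent: *` group from the robots.txt above (trimmed, order preserved) and asks whether Googlebot may fetch a thread and an attachment; both URLs are made-up examples. This parser matches plain path prefixes in file order and does not understand Google's `*` wildcards, so it is only reliable for literal rules like `/attachments/`.

```python
from urllib.robotparser import RobotFileParser

# Excerpt of the current "User-agent: *" group (trimmed; order preserved).
CURRENT_RULES = """\
User-agent: *
Disallow: /whats-new/
Disallow: /members/
Disallow: /attachments/
Disallow: /search/
Allow: /
"""

parser = RobotFileParser()
parser.parse(CURRENT_RULES.splitlines())

thread_url = "https://www.nnn.com/threads/example-thread.123/"     # hypothetical URL
attachment_url = "https://www.nnn.com/attachments/photo-jpg.456/"  # hypothetical URL

print(parser.can_fetch("Googlebot", thread_url))      # True
print(parser.can_fetch("Googlebot", attachment_url))  # False: blocked by /attachments/
```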

You have numerous references to directories/links that only logged-in users see. Search engines will never see these unless you specifically give them login credentials, which is uncommon.

I recommend putting the directories in alphabetical order; this makes duplicates easy to spot and eliminate.
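Sorting also makes it easy to catch duplicates mechanically. A small Python sketch, with `tidy_disallows` as a hypothetical helper, that pulls the `Disallow` paths out of a group, sorts them, and reports any that appear more than once; the sample is an excerpt of the file above:

```python
from collections import Counter
from typing import List, Tuple


def tidy_disallows(group: str) -> Tuple[List[str], List[str]]:
    """Return (sorted unique Disallow paths, paths listed more than once)."""
    paths = []
    for line in group.splitlines():
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow":
            paths.append(value.strip())
    counts = Counter(paths)
    duplicates = sorted(path for path, n in counts.items() if n > 1)
    return sorted(counts), duplicates


EXCERPT = """\
User-agent: *
Disallow: /whats-new/
Disallow: /login/
Disallow: /logout/
Disallow: /search/
Disallow: /login/
"""

unique, dupes = tidy_disallows(EXCERPT)
print(unique)  # ['/login/', '/logout/', '/search/', '/whats-new/']
print(dupes)   # ['/login/']
```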

Your current robots.txt with notes...

Code:
User-agent: PetalBot
User-agent: AspiegelBot
User-agent: AhrefsBot
User-agent: SemrushBot
User-agent: SemRush
User-agent: DotBot
User-agent: MauiBot
User-agent: MJ12bot
Disallow: /

User-agent: Amazonbot
Disallow: /threads/*/reply    unnecessary, these urls only show if logged in

User-agent: *
Disallow: /whats-new/
Disallow: /whats-new/*    unnecessary, exact same effect as /whats-new/
Disallow: /conversations/    old reference should be /direct-messages/
Disallow: /find-threads/    unnecessary, these urls only show if logged in
Disallow: /*/create-thread    unnecessary, these urls only show if logged in
Disallow: /*/post-thread    unnecessary, these urls only show if logged in
Disallow: /login/
Disallow: /logout/    unnecessary, these urls only show if logged in
Disallow: /lost-password/
Disallow: /misc/
Disallow: /online/
Disallow: /profile-posts/    not sure this directory exists (likely now a subdirectory of /whats-new/ or /members/ which are already covered)
Disallow: /*/add-reply    unnecessary, these urls only show if logged in
Disallow: /*/approve    unnecessary, these urls only show if logged in
Disallow: /*/draft    unnecessary, these urls only show if logged in
Disallow: /*/latest    not positive, but I think this is all covered by /whats-new/
Disallow: /*/post    unnecessary, these urls only show if logged in
Disallow: /*/reply    unnecessary, these urls only show if logged in
Disallow: /*/unread     unnecessary, these urls only show if logged in
Disallow: /account/
Disallow: /members/
Disallow: /attachments/    remove if you want attachments indexed
Disallow: /goto/
Disallow: /help/    if you have any custom terms, privacy policy, or help pages you may want to remove this
Disallow: /posts/
Disallow: /login/    this is a duplicate
Disallow: /search/
Allow: /    unnecessary, this is already implied

Sitemap: https://www.nnn.com/sitemap.xml

I would recommend you start with this and then make any additional adjustments:
Code:
Sitemap: https://www.nnn.com/sitemap.xml

User-agent: PetalBot
User-agent: AspiegelBot
User-agent: AhrefsBot
User-agent: SemrushBot
User-agent: SemRush
User-agent: DotBot
User-agent: MauiBot
User-agent: MJ12bot
Disallow: /

User-agent: *
Disallow: /admin.php
Disallow: /account/
Disallow: /direct-messages/
Disallow: /goto/
Disallow: /help/
Disallow: /login/
Disallow: /lost-password/
Disallow: /members/
Disallow: /misc/
Disallow: /online/
Disallow: /posts/
Disallow: /register/
Disallow: /search/
Disallow: /whats-new/
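
A quick way to sanity-check the recommended file before deploying it is to run it through Python's standard `urllib.robotparser`. This is a minimal sketch: the URLs are hypothetical examples, and it works here because this file uses only literal path prefixes, which that parser handles (it has no wildcard support).

```python
from urllib.robotparser import RobotFileParser

RECOMMENDED = """\
Sitemap: https://www.nnn.com/sitemap.xml

User-agent: PetalBot
User-agent: AspiegelBot
User-agent: AhrefsBot
User-agent: SemrushBot
User-agent: SemRush
User-agent: DotBot
User-agent: MauiBot
User-agent: MJ12bot
Disallow: /

User-agent: *
Disallow: /admin.php
Disallow: /account/
Disallow: /direct-messages/
Disallow: /goto/
Disallow: /help/
Disallow: /login/
Disallow: /lost-password/
Disallow: /members/
Disallow: /misc/
Disallow: /online/
Disallow: /posts/
Disallow: /register/
Disallow: /search/
Disallow: /whats-new/
"""

parser = RobotFileParser()
parser.parse(RECOMMENDED.splitlines())

base = "https://www.nnn.com"
checks = [
    ("Googlebot", "/attachments/photo-jpg.456/"),   # attachments are now crawlable
    ("Googlebot", "/members/someone.1/"),           # member pages stay blocked
    ("AhrefsBot", "/threads/example-thread.123/"),  # listed bots are blocked everywhere
]
for agent, path in checks:
    print(agent, path, parser.can_fetch(agent, base + path))
print(parser.site_maps())  # ['https://www.nnn.com/sitemap.xml']
```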
 