Good insight and that sounds like a reasonable explanation. XF should not use a lightbox for a direct image resource.
I don't think the Lightbox explains it. When you use URL Inspection in Google for that direct URL, it comes back with a robots block error for me, even though those files aren't robots-blocked.
hmmm I don't get that, just a Failed: Soft 404
Use the Live Test button at the top right.
I guess my question for you would be, have all of the Soft 404s had any negative repercussions on your search engine rankings? If it's true that these image "pages" are actually pages, they would certainly be considered thin. According to pretty much everyone, thin pages are bad for rankings. I would be curious to get your perspective on this. Do you suspect that these pages have dampened your search engine position at all?
Good question, hard for me to tell. All I know is Google thinks it's an issue. Seems like a good idea to believe them. I agree they could be considered thin pages. Maybe the question is how to allow Google to crawl and index your image files, but not these lightbox pages. Is there a way?
Since the source of the image file is visible in the page code, and since the actual physical images are stored in the /data/attachments/ directory, blocking the /attachments/ directory would have no effect on the indexing of the images themselves. They are two completely different directories. I have the /attachments/ directory blocked on my site and the images are still being indexed by Google.
I don't want to be the one to recommend that you go ahead and block the /attachments/ directory, which in turn would cause another section of the Google Search Console to flare up. Instead of having "Soft 404" errors, you'd have "Blocked by robots.txt" errors. Which one is worse is the big question. As I mentioned in a previous post, from my personal experience, I've seen blocked pages drop out of the index after three months, which would be a good thing.
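As a concrete sketch of that approach (paths follow the default XenForo layout described above; verify against your own install), the robots.txt would look something like this:

```text
User-agent: *
Disallow: /attachments/
# /data/attachments/ doesn't match the rule above, but an explicit
# Allow makes the intent obvious to anyone reading the file later.
Allow: /data/attachments/
```

Because Disallow only matches paths starting with /attachments/, the physical image files under /data/attachments/ stay crawlable either way.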
I don't understand this because when I copy an image address or look at the src in devtools it's not /data/, it's /attachments/. So how is Google finding this /data/ folder?
Also it would help with crawl budget. Google wouldn't be wasting their time crawling these pages. I either need to block or noindex these lightbox pages, but I can't find anything in templates.
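For the noindex route, one option that doesn't involve templates at all: since these URLs serve the image resource rather than a normal HTML page, an X-Robots-Tag response header is the standard way to noindex them. A sketch for Apache 2.4+ with mod_headers enabled (the /attachments/ path prefix is an assumption from this thread; adapt for nginx or a different URL scheme):

```apache
# Hypothetical .htaccess / vhost fragment: noindex every /attachments/ URL.
# Requires Apache 2.4+ with mod_headers.
<If "%{REQUEST_URI} =~ m#^/attachments/#">
    Header set X-Robots-Tag "noindex"
</If>
```

Note that Google only sees this header if it can actually fetch the URL, so this approach and a robots.txt block on the same paths are mutually exclusive.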
This depends on which version of the image is showing in a post. If you inspect the thumbnail image, you'll see the /data/attachments/ directory. If the images are showing at full size in the post, you'll only see the /attachments/ directory.
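Illustratively, the two cases look something like this in the rendered page source (markup simplified, file names invented):

```html
<!-- Thumbnail embed: the <img> src points at /data/attachments/,
     wrapped in a link to the /attachments/ page -->
<a href="/attachments/photo-jpg.12345/">
    <img src="/data/attachments/12/12345-abc123.jpg" alt="photo.jpg" />
</a>

<!-- Full-size embed: the image is served from /attachments/ directly -->
<img src="/attachments/photo-jpg.12345/" alt="photo.jpg" />
```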
The thing is, I don't want to be internally linking to thousands and thousands of essentially dead pages in my website. To deal with these internal links, I purchased two very awesome add-ons that remove them:
With these add-ons in place and the permissions set the way I have them set, the attachment pages as well as member pages don't exist to the search engine crawlers (they're not linked to anymore). The only thing that needs to happen now is for the pages that have been seen, to fall out of the index over the next few months.
This is an issue because the vast majority are full image. I need to try some SQL statements to change all thumbnails to full.
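For what it's worth, a sketch of that SQL against a default XenForo 2 schema might look like the following (the [ATTACH] BBCode forms and the xf_post table name are assumptions; back up the database and test on a copy first):

```sql
-- Rewrite thumbnail attach tags to full-size in post bodies.
-- BACK UP xf_post BEFORE RUNNING; verify the BBCode forms on your install.
UPDATE xf_post
SET message = REPLACE(message, '[ATTACH]', '[ATTACH=full]')
WHERE message LIKE '%[ATTACH]%';
```

After a change like this you'd typically also need to rebuild the post cache so the rendered HTML picks up the new BBCode.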