Yeah, Amazon can probably only be solved using their API or something. I remember WordPress used to have some add-ons, probably paid, that generated a very nice product box for blog posts.
From my understanding, this has always been the case with XenForo attachments out of the box. Mostly because attachments have permissions and guests might not have access to them.
https://xenforo.com/community/threads/disabling-xenforos-cache-control-header-for-better-caching.195564/post-1523011
Worth mentioning here that Apple confirmed this week that they are using Applebot for their own AI training.
ai.txt seems like a proposed standard. Are existing AI companies supporting it?
It works here, so I am guessing it was fixed as part of the recursion bug. Still posting for reference.
Embedding itself does work but if I post a fresh link, it is not converted into the relevant code that enables embedding. It is posted with regular unfurl code. Embed code works on external sites...
Yeah, it's likely an old account was taken over by a spammer through one of the password dump leaks. There are ongoing discussions about this. Running the spam cleaner on these accounts is probably not the best idea (without verifying existing content) as you might end up deleting useful content from...
I cannot find a definitive list of IPs used by ByteDance. They also seem to use Amazon AWS for their spider, which means you are going to block non-Bytespider IPs as well. In the end, it should not matter in almost all cases.
Bytespider went crazy for a while. PetalBot too. And at one point I had to block Applebot because it was everywhere in the logs. Then there are smaller companies like AhrefsBot which would hammer your server randomly. One has to decide if they are going to get any value from these bots...
Yeah, the only option is to block using IP ranges, if available. But it is going to be constant work to keep adding new IPs.
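For reference, a rough sketch of what an IP-range check could look like, using Python's standard `ipaddress` module. The ranges in `BLOCKED_RANGES` are reserved documentation prefixes, not real crawler IPs, and the function names are made up for illustration. In practice you would do this at the firewall or web server level, and the list would need the ongoing upkeep mentioned above.

```python
import ipaddress

# Hypothetical blocklist. These are reserved documentation ranges
# (RFC 5737 / RFC 3849), NOT real crawler IPs. Replace with ranges
# you have verified yourself.
BLOCKED_RANGES = ["192.0.2.0/24", "198.51.100.0/25", "2001:db8::/32"]


def build_blocklist(cidrs):
    """Parse CIDR strings into network objects once, up front."""
    return [ipaddress.ip_network(cidr, strict=True) for cidr in cidrs]


def is_blocked(ip, networks):
    """Return True if the address falls inside any blocked network."""
    addr = ipaddress.ip_address(ip)
    # Only compare IPv4 with IPv4 and IPv6 with IPv6.
    return any(addr.version == net.version and addr in net for net in networks)


NETWORKS = build_blocklist(BLOCKED_RANGES)

if __name__ == "__main__":
    for candidate in ["192.0.2.55", "198.51.100.200", "2001:db8::1"]:
        print(candidate, "blocked" if is_blocked(candidate, NETWORKS) else "allowed")
```

Since the blocking is by whole range, anything else hosted in the same range gets blocked too, which is the trade-off with shared cloud IPs.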
Google and OpenAI both have spiders that can be blocked through the robots.txt file (a list here). Worth doing it. But Google has mentioned that their AI Overviews can use data from...
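For anyone who wants to sanity-check their rules, here is a minimal sketch using Python's standard `urllib.robotparser`. The `GPTBot` and `Google-Extended` tokens are not named in the post above; they are the ones I am assuming for OpenAI's crawler and Google's AI opt-out. The `example.com` URL and the `can_fetch` helper are placeholders too. It only shows what the rules say, so it does nothing about bots that ignore robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Sample rules: block the assumed AI-related tokens, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(user_agent, url, robots_txt=ROBOTS_TXT):
    """Check whether the given rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/threads/1"  # placeholder URL
    for agent in ["GPTBot", "Google-Extended", "Googlebot"]:
        print(agent, "allowed" if can_fetch(agent, url) else "disallowed")
```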