This is a very important topic and I don't think forum owners are paying enough attention to it. We are just four days away from many Imgur images being deleted and it will affect many of our sites. I am surprised there is not significantly more concern here.
So, if I have my proxy settings adjusted like this:
...is this good enough to preserve the images?
Yes, this will effectively cache the images indefinitely, as you want.
However, as you have guessed, this will only cache images indefinitely from the point in time when you change this setting. Images that were purged from the cache before this time will not be cached indefinitely until they are accessed again.
I have been using this setting since December 2018 and have nearly 2 million images cached in my image proxy, and I would personally recommend that all professional forum owners use this setting unless they have extreme budget limitations. In my case, I'm now going to trust that all of the Imgur images served on my forum over the past five years have been saved by the proxy as expected.
I wonder if I need to somehow spider my own site to trigger the rest of the embedded images to be proxied.
This is the only way to guarantee all the images will be saved.
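You can find out how many posts are using embedded Imgur links with this SQL query. Note the leading and trailing % wildcards: without them, LIKE only matches posts whose entire message begins with [img] and ends with [/img].
SQL:
SELECT COUNT(*) FROM xf_post WHERE LOWER(message) LIKE '%[img]%i.imgur.com%[/img]%';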
Here is a custom PHP script to output all of the post IDs which contain those Imgur links into a single-column CSV file.
PHP:
<?php
// Prompt for output file name
echo "Enter the output file name: ";
$output_file = trim(fgets(STDIN));
if ($output_file === '') {
    echo "No output file name given.\n";
    exit(1);
}
// MySQL connection details (replace with your own)
$host = 'localhost';
$username = 'username';
$password = 'password';
$dbname = 'database_name';
// Return errors instead of throwing them (PHP 8.1+ enables mysqli exceptions
// by default), so the error checks below are actually reached
mysqli_report(MYSQLI_REPORT_OFF);
// Connect to the MySQL database
$mysqli = new mysqli($host, $username, $password, $dbname);
// Check for connection errors
if ($mysqli->connect_errno) {
    echo "Failed to connect to MySQL: " . $mysqli->connect_error . "\n";
    exit(1);
}
// Hard-coded MySQL query; the leading and trailing % let the [img] tag appear anywhere in the post
$query = "SELECT post_id FROM xf_post WHERE LOWER(message) LIKE '%[img]%i.imgur.com%[/img]%'";
// Execute the query and fetch the result
if ($result = $mysqli->query($query)) {
    // Open the output file for writing
    $file = fopen($output_file, 'w');
    if ($file === false) {
        echo "Could not open $output_file for writing.\n";
        exit(1);
    }
    // Write the CSV header
    fputcsv($file, ['post_id']);
    // Write the CSV data, one post ID per row
    while ($row = $result->fetch_assoc()) {
        fputcsv($file, $row);
    }
    // Close the output file
    fclose($file);
    // Free the result set
    $result->free();
} else {
    echo "Error: " . $mysqli->error . "\n";
}
// Close the MySQL connection
$mysqli->close();
?>
You can then use these post IDs to construct URLs of the format https://www.example.com/forum/posts/XXX, where XXX is the post ID. Each of those URLs does a 301 redirect to the thread page that contains the post and its Imgur images. This means you can write another script that visits all of those pages, parses each page, and then sends additional requests to download each of the images.
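If you'd rather not build those URLs by hand, here is a minimal sketch that reads the single-column CSV from the script above and turns each post ID into a post URL. The function name `postUrlsFromCsv` and the base URL are hypothetical; substitute your own forum's address.

```php
<?php
// Sketch: turn the single-column CSV of post IDs into post URLs.
// $baseUrl is a placeholder for your own forum's base URL.
// The "post_id" header row and blank lines are skipped.
function postUrlsFromCsv(string $csvPath, string $baseUrl): array
{
    $lines = file($csvPath, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        throw new RuntimeException("Cannot read $csvPath");
    }
    $urls = [];
    foreach ($lines as $line) {
        // Strip stray whitespace, carriage returns, and CSV quotes
        $id = trim($line, " \t\r\"");
        // Only purely numeric values are post IDs; this skips the header
        if (preg_match('/^\d+$/', $id)) {
            $urls[] = rtrim($baseUrl, '/') . '/posts/' . $id;
        }
    }
    return $urls;
}

// Usage (hypothetical file name and base URL):
// foreach (postUrlsFromCsv('imgur_posts.csv', 'https://www.example.com/forum') as $url) {
//     echo $url, "\n";
// }
```

Each URL can then be fed to whatever fetcher you use; PHP's `file_get_contents()` follows the redirect to the thread page by default.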
Does the proxy retain images that were posted prior to the proxy being enabled, as the posts/threads are viewed by visitors?
Yes.
Are dormant threads, once they are viewed in the future, going to try to contact the image source to retrieve the image? I would think so, which is why I might want a way to spider the forum so that those images are retrieved and proxied while they are still available.
Yes. If the images aren't saved in the proxy, it will attempt to connect to the origin server, and in the case of Imgur links, that might not work.
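For the spidering itself, a rough sketch of the "visit the page, parse it, request each image" step might look like this (PHP 8+). It assumes that, with the image proxy enabled, the hotlinked images in a thread page's HTML point at your forum's proxy.php, so requesting each one should make the proxy fetch and store the original while Imgur still serves it. `imageUrlsFromHtml` and `warmImages` are hypothetical helper names, and relative-URL handling is simplified (any `<base href>` is ignored).

```php
<?php
// Sketch: list the image URLs on one fetched thread page so each can be requested once.
function imageUrlsFromHtml(string $html, string $pageUrl): array
{
    if (trim($html) === '') {
        return []; // DOMDocument::loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();

    $p = parse_url($pageUrl);
    $origin = $p['scheme'] . '://' . $p['host'] . (isset($p['port']) ? ':' . $p['port'] : '');
    $path = $p['path'] ?? '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1); // page's "directory"

    $urls = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        // data-src is checked too, in case images are lazy-loaded (assumption)
        foreach (['src', 'data-src'] as $attr) {
            $src = trim($img->getAttribute($attr));
            if ($src === '' || str_starts_with($src, 'data:')) {
                continue;
            }
            if (preg_match('#^https?://#i', $src)) {
                $abs = $src;                         // already absolute
            } elseif (str_starts_with($src, '//')) {
                $abs = $p['scheme'] . ':' . $src;    // protocol-relative
            } elseif (str_starts_with($src, '/')) {
                $abs = $origin . $src;               // root-relative
            } else {
                $abs = $origin . $dir . $src;        // simple relative path
            }
            $urls[$abs] = true; // array keys de-duplicate repeated images
        }
    }
    return array_keys($urls);
}

// Request each URL once so it passes through (and is stored by) the proxy.
function warmImages(array $urls): void
{
    $ctx = stream_context_create(['http' => ['timeout' => 30]]);
    foreach ($urls as $url) {
        $ok = @file_get_contents($url, false, $ctx) !== false;
        echo ($ok ? 'OK   ' : 'FAIL ') . $url . "\n";
        usleep(250000); // pause 0.25 s between requests to go easy on the server
    }
}

// Usage (hypothetical thread URL):
// $page = 'https://www.example.com/forum/threads/example.123/';
// warmImages(imageUrlsFromHtml((string) @file_get_contents($page), $page));
```

Deduplicating per page keeps the script from hammering the proxy with the same image several times; across many pages you may still want a global "already requested" set.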
Do search engine page views cause images to be proxied? I wouldn't think so, but on the other hand... maybe they do? Or what about archive.org scraping the site? (I normally block archive.org because it's unauthorized use of my content, but if I enabled it long enough for the images to be scraped and proxied, I could block it again.)
I think so.
According to Google, Googlebot does fully render the page, which implies that images are downloaded as well.
I'm not convinced converting to attachments is the answer either, so I haven't looked into it. The process seems too prone to errors, or to images silently being left out when something goes wrong.
I agree with you; XF's Image Proxy is excellent, and I don't understand why people bother with the image-proxy-to-attachment conversion process or plugins. The only way I can see something like that being useful is if it actually parses all of the existing hotlinked images in your forum, fetches all of them, and then saves them as attachments, so you have a permanent snapshot without having to write your own fetcher/parser.
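If you do want your own snapshot outside the proxy, a minimal sketch of the parsing and fetching part might look like this. It assumes Imgur images are stored in xf_post.message as [IMG]url[/IMG] BBCode, optionally with attributes such as [IMG width="300px"]; `imgurUrlsFromBbcode` and `saveImage` are hypothetical names, and the sketch only saves files to a directory. It does not create XenForo attachments.

```php
<?php
// Sketch: pull hotlinked Imgur URLs out of a post's BBCode.
// Assumption: images use [IMG]url[/IMG] (tag case-insensitive, optional attributes);
// bare links and media embeds are not matched.
function imgurUrlsFromBbcode(string $message): array
{
    preg_match_all(
        '#\[img(?:\s[^\]]*)?\]\s*(https?://i\.imgur\.com/[^\s\[\]]+)\s*\[/img\]#i',
        $message,
        $m
    );
    // Keep the first occurrence of each URL, reindexed from 0
    return array_values(array_unique($m[1]));
}

// Save one image into $dir under its original file name; returns false on failure.
function saveImage(string $url, string $dir): bool
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path) || basename($path) === '') {
        return false;
    }
    $data = @file_get_contents($url);
    if ($data === false) {
        return false;
    }
    return file_put_contents(rtrim($dir, '/') . '/' . basename($path), $data) !== false;
}

// Usage (hypothetical): for each xf_post.message returned by a query like the one above,
// foreach (imgurUrlsFromBbcode($message) as $url) { saveImage($url, '/path/to/snapshot'); }
```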