XF 2.2 Settings to keep Imgur attachments forever

Wildcat Media

Well-known member
This month, with Imgur removing images that are not tied to an account, I wondered if our image proxy could hold onto these images so we do not have a lot of broken images in forum posts.

In addition to explicit images, Imgur is also removing old and unused photos that are not tied to an account. As a Twitter user noted, Imgur has been used for years to upload photos without an account and post links to different kinds of sites. Removing those images might result in a lot of dead links all over the internet.

Source:

If I am reading it correctly, our proxy allows us to set 1) an expiration time for proxied images and 2) a way to set the amount of time before the proxy system will try to refresh the image by retrieving it again.

So, if I have my proxy settings adjusted like this:

1682021018748.webp

...is this good enough to preserve the images?

I might also try to find a way to turn images into attachments (via an addon) in the near future but I won't be around until mid May to take on that type of project.
 
Can't speak from the point of a developer, but from reading the descriptors, it should be.
I agree--I feel the settings should be OK as well, but was hoping someone on XF staff could confirm it for us. It's the TTL setting that has the clout--it is apparently what determines how often a linked resource is checked to update it. I actually prefer a thread or post remain as a snapshot of the day it was posted, complete with whatever resources (images) were originally included in the post, so leaving the TTL at zero seems to be the best choice for us.

Now, I just had another thought. I wonder if I need to somehow spider my own site to trigger the rest of the embedded images to be proxied. I only turned on the proxy about five months ago, and I am not sure how or when the proxy kicks in. These are the questions I'm considering at the moment.
  • The proxy obviously handles resources (images) as they are included in new posts to the forum.
  • Does the proxy retain images that were posted prior to the proxy being enabled, as the posts/threads are viewed by visitors?
  • Are dormant threads, once they are viewed in the future, going to try to contact the image source to retrieve the image? I would think so, which is why I might want a way to spider the forum so it can retrieve and proxy those images so they are stored while they are still available.
  • Do search engine page views cause images to be proxied? I wouldn't think so but on the other hand...maybe they do? Or what about archive.org scraping the site? (I normally block archive.org because it's unauthorized use of my content but, if I enabled it long enough for images to be scraped and proxied, I could deny it once again.)
I'm not convinced converting to attachments is the answer either, so I haven't looked into it. The process seems too prone to error and/or exclusion when something goes wrong with the process.
 
We need a solution by May 15th.
There is no legally unproblematic full/real/automatic solution, as you (as the board owner) haven't acquired any rights to images hosted on Imgur; maybe not even the user who posted them has any rights to those images.
(Assuming that the images are protected by copyright laws, which might or might not be the case, but IMHO it is reasonable to assume that most images are protected by copyright laws.)

While it is okay (at least by German "Urheberrechtsgesetz") to take and deliver "temporary copies" (this is what the image proxy does), it is IMHO & IANAL not legal to keep those "temporary copies" indefinitely, especially not after the original source is gone.
 
While it is okay (at least by German "Urheberrechtsgesetz") to take and deliver "temporary copies" (this is what the image proxy does), it is IMHO & IANAL not legal to keep those "temporary copies" indefinitely, especially not after the original source is gone.
That is why I'm not comfortable changing them all to attachments. 👍
 
There is no legally unproblematic full/real/automatic solution, as you (as the board owner) haven't acquired any rights to images hosted on Imgur; maybe not even the user who posted them has any rights to those images.
Let us be real, nobody cares. Public domain.
 
This is a very important topic and I don't think forum owners are paying enough attention to it. We are just four days away from many Imgur images being deleted and it will affect many of our sites. I am surprised there is not significantly more concern here.

So, if I have my proxy settings adjusted like this:

1682021018748.png


...is this good enough to preserve the images?

Yes, this will effectively cache the images indefinitely, as you want.

However, as you have guessed, this will only cache images indefinitely from the point in time when you change this setting. Images which were purged from the cache before this time will not be cached indefinitely until they are accessed again.

I have been using this setting since December of 2018 and have nearly 2 million images cached in my image proxy, and I would personally recommend all professional forum owners use this setting unless you have extreme budget limitations. In this case, I'm now going to trust that all of the Imgur images served on my forum over the past five years have been saved by the proxy as expected.

I wonder if I need to somehow spider my own site to trigger the rest of the embedded images to be proxied.

This is the only way to guarantee all the images will be saved.

You can find out how many posts are using embedded Imgur links with this SQL query:

SQL:
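-- The leading and trailing % let the [img] tag appear anywhere in the message,
-- not only in posts that consist of a single image tag.
SELECT COUNT(*) FROM xf_post WHERE LOWER(message) LIKE '%[img]%i.imgur.com%[/img]%';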
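
A LIKE pattern can only tell you that a post probably contains such a link. If you want the actual image URLs (for example, to count distinct images rather than posts), a per-message check in PHP could look like the sketch below. The function name `extractImgurUrls` is made up for this example and is not part of XenForo, and the regex assumes the links are written as plain `[img]...[/img]` BB code pointing at `i.imgur.com`.

```php
<?php
// Hypothetical helper (not a XenForo API): returns the distinct i.imgur.com
// URLs found inside [img]...[/img] tags in one post's BB code.
function extractImgurUrls(string $message): array
{
    // Case-insensitive match; optional attributes after "img" are allowed,
    // and the URL capture stops at whitespace or the next bracket.
    $pattern = '~\[img(?:\s[^\]]*)?\]\s*(https?://i\.imgur\.com/[^\[\]\s]+)\s*\[/img\]~i';
    preg_match_all($pattern, $message, $matches);

    // Drop duplicates and reindex so the result is a plain list.
    return array_values(array_unique($matches[1]));
}
```

You would call this on the `message` column of each row returned by the query, so the SQL acts as a coarse filter and the regex as the precise one.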

Here is a custom PHP script to output all of the post IDs which contain those Imgur links into a single-column CSV file.

PHP:
<?php
// Prompt for output file name
echo "Enter the output file name: ";
$output_file = trim(fgets(STDIN));

// MySQL connection details
$host = 'localhost';
$username = 'username';
$password = 'password';
$dbname = 'database_name';

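// Since PHP 8.1 mysqli throws exceptions by default; turn that off so the
// explicit error checks below are actually reached
mysqli_report(MYSQLI_REPORT_OFF);

// Connect to the MySQL database
$mysqli = new mysqli($host, $username, $password, $dbname);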

// Check for connection errors
if ($mysqli->connect_errno) {
    echo "Failed to connect to MySQL: " . $mysqli->connect_error . "\n";
    exit();
}

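// Hard-coded MySQL query; the leading and trailing % match the [img] tag
// anywhere in the message, not only in posts made of a single image tag
$query = "SELECT post_id FROM xf_post WHERE LOWER(message) LIKE '%[img]%i.imgur.com%[/img]%'";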

// Execute the query and fetch the result
if ($result = $mysqli->query($query)) {
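    // Open the output file for writing and stop if that fails
    $file = fopen($output_file, 'w');
    if ($file === false) {
        echo "Failed to open output file: " . $output_file . "\n";
        exit(1);
    }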

    // Write the CSV header
    fputcsv($file, ['post_id']);

    // Write the CSV data
    while ($row = $result->fetch_assoc()) {
        fputcsv($file, $row);
    }

    // Close the output file
    fclose($file);

    // Free the result set
    $result->free();
} else {
    echo "Error: " . $mysqli->error . "\n";
}

// Close the MySQL connection
$mysqli->close();
?>

You can then use these post IDs to construct URLs of the format https://www.example.com/forum/posts/XXX where XXX is the post ID. Each of those URLs does a 301 redirect to the thread page that contains the post and its Imgur images. This means you can write another script which visits all those pages, parses each page, and then sends additional requests to download each of the images through the proxy.
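
Here is a rough sketch of what such a second script might look like. Treat it as a starting point, not a tested tool. It assumes the single-column CSV from the script above, that guests can view the threads (the script does not log in), that PHP's curl extension is installed, and that proxied images show up in the page HTML as `proxy.php?image=...&hash=...` links relative to the forum root. Check your own page source to confirm that last point. All function names here are made up for the example.

```php
<?php
// Build the permalink for one post ID (format taken from the post above).
function postUrl(string $baseUrl, int $postId): string
{
    return rtrim($baseUrl, '/') . '/posts/' . $postId;
}

// Pull the image-proxy links for Imgur images out of a rendered page.
// Assumption: links look like proxy.php?image=<encoded URL>&hash=<hash>.
function proxyImageUrls(string $html, string $baseUrl): array
{
    preg_match_all('~proxy\.php\?image=[^"\'\s<>]+~i', $html, $m);
    $urls = [];
    foreach ($m[0] as $link) {
        $link = html_entity_decode($link); // turn &amp; back into &
        if (stripos(rawurldecode($link), 'imgur.com') !== false) {
            $urls[] = rtrim($baseUrl, '/') . '/' . $link;
        }
    }
    return array_values(array_unique($urls));
}

// Fetch a URL, following redirects; returns the body or null on failure.
function fetchUrl(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true, // follow the 301 to the thread page
        CURLOPT_TIMEOUT        => 30,
        CURLOPT_USERAGENT      => 'imgur-proxy-warmer (example script)',
    ]);
    $body = curl_exec($ch);
    $ok = $body !== false && curl_getinfo($ch, CURLINFO_RESPONSE_CODE) < 400;
    return $ok ? $body : null;
}

// Visit every post in the CSV and request each proxied Imgur image once,
// which should make the proxy fetch and store it.
function warmProxy(string $csvFile, string $baseUrl): void
{
    $lines = file($csvFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        echo "Could not read $csvFile\n";
        return;
    }
    foreach ($lines as $line) {
        $id = trim($line);
        if (!ctype_digit($id)) {
            continue; // skips the "post_id" header row
        }
        $html = fetchUrl(postUrl($baseUrl, (int) $id));
        if ($html === null) {
            continue;
        }
        foreach (proxyImageUrls($html, $baseUrl) as $imageUrl) {
            fetchUrl($imageUrl);
            usleep(200000); // small pause to go easy on the server
        }
    }
}
```

To run it, you would add a call such as `warmProxy('post_ids.csv', 'https://www.example.com/forum');` at the bottom, with your own file name and forum URL. Several posts from the same thread page will cause that page to be fetched more than once, which is wasteful but harmless.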

Does the proxy retain images that were posted prior to the proxy being enabled, as the posts/threads are viewed by visitors?

Yes.

Are dormant threads, once they are viewed in the future, going to try to contact the image source to retrieve the image? I would think so, which is why I might want a way to spider the forum so it can retrieve and proxy those images so they are stored while they are still available.

Yes. If an image isn't already saved in the proxy, the proxy will try to fetch it from the origin server, and in the case of Imgur links that may fail once Imgur has deleted the image.

Do search engine page views cause images to be proxied? I wouldn't think so but on the other hand...maybe they do? Or what about archive.org scraping the site? (I normally block archive.org because it's unauthorized use of my content but, if I enabled it long enough for images to be scraped and proxied, I could deny it once again.)

I think so. According to Google, Googlebot does fully render the page, which implies that images are downloaded as well.

I'm not convinced converting to attachments is the answer either, so I haven't looked into it. The process seems too prone to error and/or exclusion when something goes wrong with the process.

I agree with you, as XF's Image Proxy is excellent and I don't understand why people bother with the image-proxy-to-attachment conversion process or plugins. The only way I see something like that could be useful is if it actually parses all of the existing hotlinked images in your forum, fetches all of them, and then saves them as attachments, so you have a permanent snapshot without having to write your own fetcher/parser.
 
Solution
I agree with you, as XF's Image Proxy is excellent and I don't understand why people bother with the image-proxy-to-attachment conversion process or plugins. The only way I see something like that could be useful is if it actually parses all of the existing hotlinked images in your forum, fetches all of them, and then saves them as attachments, so you have a permanent snapshot without having to write your own fetcher/parser.
My issue with converting proxies to attachments is that we've had copyright complaints, and at least if the image was originally hotlinked, we only have to remove the link from the post to satisfy the complaint, vs. having to explain a member uploaded it to our server and we are hosting it now. (They don't need to know about the proxy, in other words; they wouldn't even understand a proxy for that matter.)

I like the idea of a "snapshot" of the post/thread being preserved as attachments but in reality, with greedy attorneys looking for ways to earn their retainer, it's better not to do it.

I may not have time to look into scripting something to spider the site, but in the event Imgur delays things, I have to thank you for the notes above as it's something I can look into and try so I can save as much in the proxy as I can.
 
For our part, I have it in our rules that we expect members will not post inappropriately, have permission to post what they do, and that we will remove anything for whatever reason necessary, including copyright claims. We also mention the same thing regarding member-generated content. Granted, nobody probably reads the rules. But the rule is there, and when we get a copyright notice, we remove the link or image and point the claimant to our rules, and removing the image satisfies them enough to leave us alone. Our forum's topic is in an area that can have a lot of sensitivity to use of copyrighted images, so we've taken the path of least resistance to handle it when it comes up.

It's nothing that keeps me awake at night, honestly.

If anything, I wish we could control the viewing of attachments and embedded images (via the [img] tag) and limit both to logged-in users. So far we can only do that with attachments, and the forum's owner really doesn't want "broken" posts with unviewable images for visitors who are not logged in.
 
Any forms of copyright/trademark infringement require a legal process,
I’m not totally sure what you mean. Most infringements I have experienced only involved a legal process once there was a dispute, and most were settled.

I’m under the impression in most jurisdictions all forms of copyright/trademark infringement are illegal and can be prosecuted under either criminal or civil laws depending on the nature of the infringement. Most often civil in my experience.
 
I’m not totally sure what you mean. Most infringements I have experienced only involved a legal process once there was a dispute, and most were settled.

I’m under the impression in most jurisdictions all forms of copyright/trademark infringement are illegal and can be prosecuted under either criminal or civil laws depending on the nature of the infringement. Most often civil in my experience.
I will clarify: when I said "nobody cares", it was tongue in cheek.
 