Question: Does the add-on properly handle URLs containing Unicode characters?
Answer: Admins of boards that use Unicode or other non-URL-compatible characters in links should understand how URLs containing special characters work. A URL itself is not allowed to contain Unicode or special characters; every such character is converted to its URL-encoded (percent-encoded) form. You can confirm this by editing any post that has a Unicode-containing link and viewing the source code: the link there is fully encoded and contains no Unicode characters. Our product simply takes the URLs from the post and processes them exactly as they appear in the text. When a user follows the URL, the browser shows the text in Unicode characters, but this does not mean that the URL actually contains these characters. To avoid confusion for admins, however, we have decided to show decoded characters on the link list and on the Batch Update page. When you edit a link, we show the real URL, and any URL you paste as a replacement should be properly encoded. Do not type Unicode characters into the field directly; instead, copy the link from a browser, which does the conversion for you automatically.
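The conversion described above can be reproduced with Python's standard library. This is only a minimal sketch to illustrate what "URL-encoded" means; the example link is hypothetical, and the add-on's own conversion code may differ.

```python
from urllib.parse import quote, unquote

# Hypothetical link as a browser would display it (not taken from the product).
displayed = "https://example.com/wiki/Zürich"

# Percent-encode non-ASCII characters while leaving URL delimiters
# (and already-encoded "%" sequences) untouched.
encoded = quote(displayed, safe=":/?#[]@!$&'()*+,;=%")

# Decode again, which is what a browser does for display purposes.
decoded = unquote(encoded)

print(encoded)  # https://example.com/wiki/Z%C3%BCrich
print(decoded)  # https://example.com/wiki/Zürich
```

The encoded form is what is actually stored in the post; the decoded form is only a friendlier way of displaying it.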
The following processes involve URL-encoding conversion:
- On the Manage Links page, you will see the Unicode characters inside URLs.
- When editing a link, you will see the Unicode characters in the Full Code section, but the field for editing the URL shows the encoded version.
- On the Batch Update page, the Search Link URL content is automatically URL-encoded before the search. The text you type there can therefore be in Unicode or contain special characters; it is converted to URL format on the fly.
- The Process URL field content is converted to its URL-encoded version only in "Replace found part" mode. In that mode, you should therefore use Unicode characters in both the search and the replacement field. In "Replace Fully" mode, however, the URL you enter must be fully valid, meaning all Unicode characters must already be converted to their URL-compatible form.
- Regular expression matching is done after the search term is URL-encoded.
- Subpattern replacement is done before the replacement term is converted to its URL-encoded version.
- The replacement preview shows the Unicode characters, but the actual replacement uses valid URL-encoded strings.
All admins dealing with Unicode URLs are strongly advised to back up the database before using the tool. We recommend testing batch replacement with only one or two URLs first, checking the actual post content, and making sure the replacement was processed as expected.
Question: What is the User Agent option and how should I use it?
Answer: The User Agent is a string that identifies your board to remote sites, so that they know "who" is making the request. By default, the User Agent contains your board URL. In some cases, remote sites use this identifier to block your website from making further requests to them. You can change the user agent in this option, but you do so at your own risk, as changing the user agent to get around site access limitations may be illegal in some cases. The user agent format is usually ProductName/number.number (any comment here).
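As an illustration of the format mentioned above, the following sketch builds a user agent string and attaches it to a request using Python's standard library. The product name, version, and board URL are hypothetical placeholders, not the add-on's actual defaults, and no request is sent.

```python
from urllib.request import Request


def build_user_agent(product: str, version: str, comment: str = "") -> str:
    """Build a string of the form ProductName/number.number (comment)."""
    user_agent = f"{product}/{version}"
    if comment:
        user_agent += f" ({comment})"
    return user_agent


# Hypothetical values; the comment carries the board URL, as the default does.
ua = build_user_agent("ExampleLinkChecker", "1.0", "+https://board.example.com/")

# The header is attached to the request object; nothing is sent here.
request = Request("https://example.com/", headers={"User-Agent": ua}, method="HEAD")

print(ua)  # ExampleLinkChecker/1.0 (+https://board.example.com/)
```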
Question: Why do I sometimes see a Timeout status on some links, while the same links work at other times?
Answer: A Timeout status means that the request to check the link's validity timed out, that is, the remote server did not reply in time. This can happen, for example, when the remote server is down for a while. It can also happen because links are checked quickly, with a 0.5-second timeout, when the user posts them, and are then scheduled for a more thorough check, with a 5-second timeout, via a deferred task. So you may see a link marked as invalid right after posting, but if the link is valid, its status will change to valid after some time.
Some remote servers prefer not to send a 403 Forbidden error when access to a resource is denied (e.g. your IP is blocked on their server), but instead block the request by keeping it open for a long time until it eventually times out. In this case you will get a permanent Timeout status even if you re-check the link.
The Rebuild Data admin page also lets you manually set the number of seconds before a request times out. If you see too many timeout errors, consider increasing this limit. We have also implemented an option to check only timed-out links, so we recommend checking all your links with the normal timeout limit (1 second) first, and only then re-checking the links that timed out with a higher limit (e.g. 5 seconds).
Please note that a timeout error is a connectivity issue and not a bug in the product. If the same link times out one time and not another, with all other conditions being equal, that does not mean the product works differently each time, only that the link was available at one moment and not at the other.
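The recommended two-pass approach (a quick check of all links, then a slower re-check of only those that timed out) can be sketched as follows. This is an illustration under stated assumptions, not the add-on's implementation: the function names, the status labels, and the use of HEAD requests are all hypothetical.

```python
import socket
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable


def check_link(url: str, timeout: float) -> str:
    """Return 'valid', 'invalid', or 'timeout' for a single URL (assumed labels)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return "valid"
    except urllib.error.HTTPError:
        # The server replied, but with an error status code.
        return "invalid"
    except urllib.error.URLError as exc:
        # Connection-level failure; a timeout is reported via exc.reason.
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return "timeout"
        return "invalid"
    except (socket.timeout, TimeoutError):
        # Timeout while waiting for the response itself.
        return "timeout"


def two_pass_check(
    urls: Iterable[str],
    checker: Callable[[str, float], str] = check_link,
    quick: float = 1.0,
    slow: float = 5.0,
) -> Dict[str, str]:
    """Check every URL with the quick limit, then retry only the timeouts."""
    statuses = {url: checker(url, quick) for url in urls}
    timed_out = [url for url, status in statuses.items() if status == "timeout"]
    for url in timed_out:
        statuses[url] = checker(url, slow)
    return statuses
```

A link that still reports a timeout after the second pass is most likely down or deliberately holding the connection open, as described above.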
Question: Why do I see some URLs as valid, but when I visit them they are dead/not found?
Answer: When checking the status of a link, the product relies on the information the remote server provides in the response headers. In the case described, the server shows an error page to users but actually returns a success code in the headers.
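The difference can be illustrated with a small sketch. It assumes, purely for illustration, that a link counts as valid when the status code in the headers is in the 2xx range; the `Response` class and `link_status` function are hypothetical and not part of the product.

```python
from dataclasses import dataclass


@dataclass
class Response:
    status: int  # status code sent in the response headers
    body: str    # page content shown to the visitor


def link_status(response: Response) -> str:
    """Classify a link from the header status code only (assumed rule: 2xx is valid)."""
    return "valid" if 200 <= response.status < 300 else "invalid"


# The server shows an error page but reports success in the headers.
misreported = Response(status=200, body="<h1>Sorry, page not found</h1>")
# The server reports the error properly.
proper_404 = Response(status=404, body="<h1>Sorry, page not found</h1>")

print(link_status(misreported))  # valid
print(link_status(proper_404))   # invalid
```

Both pages look identical to a visitor, but only the second one tells a link checker that the resource is missing.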