frm
Well-known member
A lot of blackhat scraping is done, primarily for email addresses (sometimes for content).
I propose a BB code for anti-scraping purposes that can be used either 1) for email addresses only or 2) for entire blocks of text. It's not foolproof, as some software could learn to read and scrape it anyway. But it could be designed to use different fonts and a "CAPTCHA style" display, rendered with either PHP's built-in GD image library or the ImageMagick PECL extension (as seen in the Default image processor option). That would make scraping difficult enough that human CAPTCHA solvers would be needed, making it an expensive endeavor, so scrapers may not go over your site at all.
I would propose something along the lines of an email address such as john@doe.com being validated and converted to an inline image containing the text of the address, but only when it is wrapped in BB code such as [EMAIL]john@doe.com[/EMAIL].
The second suggestion is to render an entire block of text so that it is readable only as an image: something like the QUOTE tag, but with the text converted to a human-readable image.
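As a rough illustration of the email-only idea (#1), the parse-and-validate step might look like the sketch below. It's Python purely for brevity; an actual add-on would be PHP, with the image itself drawn by GD or ImageMagick. The endpoint path, the function names and the token scheme are all made up for the example, not anything the forum software provides.

```python
import hashlib
import hmac
import re

# Hypothetical endpoint that would draw the address as an image
# (GD or ImageMagick on the server side); not a real forum route.
IMAGE_ENDPOINT = "/email-image.php"

# Matches [EMAIL]...[/EMAIL], case-insensitively.
EMAIL_TAG = re.compile(r"\[EMAIL\](.*?)\[/EMAIL\]", re.IGNORECASE | re.DOTALL)

# Deliberately simple address check (assumption: a real add-on would
# reuse whatever validator the forum already has).
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def email_token(address: str, secret: str) -> str:
    """Return an opaque token so the address never appears in the page source."""
    digest = hmac.new(secret.encode("utf-8"), address.lower().encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def render_email_tags(post: str, secret: str, store: dict) -> str:
    """Replace each valid [EMAIL] tag with an <img> pointing at the image endpoint.

    `store` maps token -> address so the endpoint can look the address up later.
    Tags whose content is not a valid address are left exactly as written.
    """
    def replace(match: re.Match) -> str:
        address = match.group(1).strip()
        if not EMAIL_PATTERN.fullmatch(address):
            return match.group(0)
        token = email_token(address, secret)
        store[token] = address
        return f'<img src="{IMAGE_ENDPOINT}?t={token}" alt="email address">'

    return EMAIL_TAG.sub(replace, post)
```

The point of the token is that the HTML sent to the browser contains no trace of the address, so a scraper would have to fetch and read the image to get anything out of it.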
I'd prefer #1 over #2, if anything, as #2 could hurt your SERP rankings (search engines generally can't read text inside images), but there are some things you can hide in images as opposed to text (grey hat).
If either is used, it would be permission-based (i.e., if you can't view attachments, you can't view the generated images).
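The permission side could be as simple as the following sketch. The `Viewer` class and its `can_view_attachments` flag are hypothetical stand-ins for whatever the forum's permission system actually exposes, and the placeholder text is just an example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Viewer:
    """Hypothetical stand-in for a forum visitor and one of their permissions."""
    name: str
    can_view_attachments: bool


def display_email(viewer: Viewer, image_html: str, placeholder: str = "[email hidden]") -> str:
    """Show the generated image only to viewers who are allowed to see attachments."""
    return image_html if viewer.can_view_attachments else placeholder
```

Reusing the attachment permission means guests and bots that aren't logged in would see only the placeholder, without admins having to configure a separate permission.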