SELECT p1.post_id AS 'original post id'
, p2.post_id AS 'possible duplicate post id'
FROM xf_post AS p1
INNER JOIN xf_post AS p2 ON (
p2.message = p1.message
AND p2.post_date BETWEEN p1.post_date AND p1.post_date+100
AND p2.user_id = p1.user_id
AND p2.post_id != p1.post_id
AND p2.post_id BETWEEN 1 AND 5000
AND p1.post_id BETWEEN 1 AND 5000
ORDER BY p1.post_id
I added the red pieces to operate on a small subset of my post table because it was taking too long to process all 130k posts.
This query will return post_ids for posts with the same message by the same user that were made within 100 seconds of each other.