[OzzModz] Conversation/DM Keyword Monitor

[OzzModz] Conversation/DM Keyword Monitor [Paid] 2.0.1 Patch Level 1

I wouldn't create a new thread when it's triggered, just rely on the log (which should have a link to the DM)
That would only work if you had my search conversation addon installed, and I'm not sure if I can check if the addon is installed in a template conditional. It may be possible but I don't remember coming across that.

To not have duplicate log entries or double posts, you have all the information pre-built before posting anyway. So you could hypothetically search if the thread message already exists, and if it does, don't post again (or log it again). The only thing that would cause a double post, from your screenshot, is if a new recipient was added to the conversation; otherwise, the message would be a duplicate and not repost no matter how many times it's run.
Actually what I would do is delete the log and repopulate it, as you could add or remove keywords, so the log would obviously need to be changed to reflect that.
 
The log would be easy to handle, as I'm assuming it's a table on its own. But the thread creation would be a problem if it ran again, and you didn't search for the pre-built message before posting, thus a 2nd thread.
 

I honestly wouldn't do the thread creation on scanning of existing conversations, that could create a big mess of threads on the site.
 
I built an add-on that runs on a cron job, but I didn't want it to go wild and create duplicate posts (for example, if I accidentally ran it manually, or added something that needed posting after the cron had already run). This may or may not be correct, as I'm still learning, but it paints a better picture of what I mean:
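PHP:
            // Inside the loop over the posts the cron job wants to make.
            // $db is assumed to be XenForo's database adapter (normally fetched once, before the loop).
            $db = \XF::db();

            // Check for duplicates: has this exact message already been posted in this thread?
            $existingPost = $db->fetchOne(
                "SELECT post_id FROM xf_post WHERE thread_id = ? AND message = ?",
                [$threadId, $postContent]
            );

            if ($existingPost) {
                // Already posted: log it and move on to the next item.
                \XF::logError("Skipping duplicate thread ID: " . $threadId);
                continue;
            }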

In the case of this add on, you know what $postContent is and it wouldn't change unless a recipient is added. So, that would prevent a double-post thread.
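The same idea can be shown without a database. This is only a minimal sketch: `DuplicateGuard` and `isNew()` are hypothetical names, not part of XenForo or the add-on, and the in-memory array stands in for the `xf_post` lookup above. It shows that identical pre-built content is skipped, while changed content (such as an added recipient) is treated as new.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of XenForo or the add-on): remembers which
 * thread/message pairs have already been posted during a run.
 */
final class DuplicateGuard
{
    /** @var array<string, true> fingerprints of posts already seen */
    private array $seen = [];

    /** Returns true the first time a thread/message pair is seen, false afterwards. */
    public function isNew(int $threadId, string $postContent): bool
    {
        // Fingerprint the pre-built message so identical content maps to the same key.
        $key = $threadId . ':' . sha1($postContent);
        if (isset($this->seen[$key])) {
            return false;
        }
        $this->seen[$key] = true;
        return true;
    }
}

$guard   = new DuplicateGuard();
$first   = $guard->isNew(42, 'Keyword "pos" found in conversation 7');
$repeat  = $guard->isNew(42, 'Keyword "pos" found in conversation 7');
$changed = $guard->isNew(42, 'Keyword "pos" found in conversation 7 (recipient added)');
```

Here `$first` and `$changed` are true and `$repeat` is false. A real cron job would need the check to survive between runs, which is what the database query provides.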

I think XF:Finder is the way to do it, but this works for now...

This is all from a 4AM perspective and not having downloaded the add on yet...
 
Here is my estimate if I update the log in batches of 2,500 with a 2-second delay between each batch. It depends on the following.

Server Performance:
CPU speed, memory, and disk I/O performance.
Database performance (e.g., MySQL/MariaDB optimization).

Batch Size and Delay:
The code processes messages in batches of 2,500 with a 2-second delay between batches.

Message Complexity (the time taken to process each message depends on):
The number of keywords to check.
The length of the message content.
The complexity of the BBCode stripping and keyword matching logic.

Scanning a million messages would work out as follows:

Number Of Batches:
Total Batches = 1,000,000 / 2,500 = 400 batches

Time Per Batch:
Total Time Per Batch = 0.5 seconds (processing) + 2 seconds (delay) = 2.5 seconds

Estimated time:
Total Time = 400 × 2.5 seconds = 1,000 seconds
1,000 seconds ÷ 60 = ~16.67 minutes
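
The arithmetic above can be written as a small function so the batch size and delay are easy to vary. This is just a sketch: `estimateScanSeconds()` is a hypothetical name, and the 0.5-second figure is taken to be the processing time per batch.

```php
<?php
declare(strict_types=1);

/**
 * Rough estimate: number of batches times (processing time + delay) per batch.
 * All inputs are assumptions supplied by the caller, not measurements.
 */
function estimateScanSeconds(int $totalMessages, int $batchSize, float $processSeconds, float $delaySeconds): float
{
    // A partial final batch still counts as a batch, hence ceil().
    $batches = (int) ceil($totalMessages / $batchSize);
    return $batches * ($processSeconds + $delaySeconds);
}

$seconds = estimateScanSeconds(1_000_000, 2500, 0.5, 2.0);
$minutes = round($seconds / 60, 2);
printf("%d seconds (~%.2f minutes)\n", $seconds, $minutes); // 1000 seconds (~16.67 minutes)
```

Most of that time is the 2-second delay (800 of the 1,000 seconds), so the delay setting dominates the estimate far more than the per-batch processing time.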

Factors that could affect the estimate: server performance, database optimization, message complexity, network latency, and PHP configuration.

On my test site, I have 17,028 messages, and scanning them for 25 keywords and populating the log takes about 10 seconds. It populated the log with 830 entries (I scanned for a slightly popular keyword to get the count up for testing). So it appears that if I had 1,000,000 messages (assuming they are not too complex), it would take around 10 minutes to scan and log them.
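
That extrapolation can be checked with simple proportional scaling. This sketch assumes scan time grows linearly with the number of messages, which may not hold on a real server; `extrapolateSeconds()` is a hypothetical name.

```php
<?php
declare(strict_types=1);

/**
 * Scales a measured scan time to a larger message count,
 * assuming the time per message stays constant.
 */
function extrapolateSeconds(int $measuredMessages, float $measuredSeconds, int $targetMessages): float
{
    $secondsPerMessage = $measuredSeconds / $measuredMessages;
    return $secondsPerMessage * $targetMessages;
}

// 17,028 messages took about 10 seconds on the test site.
$seconds = extrapolateSeconds(17028, 10.0, 1_000_000);
$minutes = $seconds / 60;
```

This gives roughly 587 seconds, just under 10 minutes, which matches the figure above.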

I am also caching the keywords and conversation titles which can significantly reduce processing time by avoiding redundant computations or database queries. I could increase the batch size, but that could require more memory and could increase processing time per batch.
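
As a rough illustration of batching plus a prepared keyword list (not the add-on's actual code): `scanInBatches()` is a hypothetical function, the BBCode stripping is a deliberately crude regex, and keywords are matched as plain substrings.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical scanner: strips BBCode-like tags, then checks each message
 * against a keyword list that is normalised once and reused for every batch.
 *
 * @param array<int, string> $messages message ID => raw message text
 * @param string[]           $keywords plain keywords (substring match)
 * @return array<int, string[]>        message ID => keywords found
 */
function scanInBatches(array $messages, array $keywords, int $batchSize, int $delayMicroseconds = 0): array
{
    // "Cache" the keywords: lowercase and de-duplicate once, not once per message.
    $needles = array_values(array_unique(array_map('strtolower', $keywords)));

    $log = [];
    foreach (array_chunk($messages, $batchSize, true) as $batch) {
        foreach ($batch as $messageId => $text) {
            // Crude BBCode stripping: drop anything that looks like [tag] or [/tag].
            $plain = strtolower(preg_replace('/\[[^\]]*\]/', '', $text) ?? $text);
            $hits = array_values(array_filter(
                $needles,
                fn (string $needle): bool => str_contains($plain, $needle)
            ));
            if ($hits !== []) {
                $log[$messageId] = $hits;
            }
        }
        if ($delayMicroseconds > 0) {
            usleep($delayMicroseconds); // pause between batches to ease server load
        }
    }
    return $log;
}

$messages = [1 => '[b]Hey[/b] there', 2 => 'nothing to see', 3 => 'my POS system'];
$log = scanInBatches($messages, ['hey', 'pos', 'HEY'], 2);
```

With this sample data, `$log` contains entries for messages 1 (`hey`) and 3 (`pos`) only. A larger batch size means fewer pauses but more messages held in memory at once, which is the trade-off mentioned above.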

I am not creating threads for older conversations, just populating the log.
 
I just tried again: I added another keyword, "hey", ran the population, and it logged 2,155 entries in ~11.73 seconds.
 
So I ran another test: I added the keyword "a" (yes, I know) and ran the scan. I did get a timeout (Cloudflare page), but the scan continued and eventually finished. I'm not sure how long it took, but it was not too long, maybe 3 to 5 minutes. It returned 15,720 log entries out of 17,028 messages, so not too bad a test. I think this would kill a shared hosting environment, though. My specs are:

PHP version 8.4.4
MySQL version 8.0.41
PHP memory_limit 200M
PHP post_max_size 128M
PHP upload_max_filesize 128M
PHP max_input_vars 1000
PHP max_execution_time 120
 
I just noticed that when sending to a thread, we do not get alerts (if watching the forum for new threads). Would that be possible?

Also, I note it does not flag edits. Is that also possible?
 
OK, I have made a list based on my own (limited) recollection of rude words and a list of terms used by sex predators that I found on the Australian Federal Police site, including the emojis.

Not sure how to disallow boobies while allowing Blue Footed Boobies. Also breasts could catch quite a few chicken recipes. But as we know this is in most cases just going to be a box ticking exercise.
 


Not sure how to disallow boobies while allowing Blue Footed Boobies.
With over-complicated regex. I would rather err on the side of a false positive than toy around and get nothing, when you could get something that should alert you.
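
For what it's worth, here is a sketch of what such a regex could look like, using negative lookbehinds. The pattern and `isFlagged()` are hypothetical and not something the add-on is known to support. It only excuses the exact phrases "blue footed boobies" and "blue-footed boobies".

```php
<?php
declare(strict_types=1);

// Flag "boobies" as a whole word unless it directly follows "blue footed " or "blue-footed ".
// Each lookbehind is a fixed-length string, which PCRE requires.
const BOOBIES_PATTERN = '/(?<!blue footed )(?<!blue-footed )\bboobies\b/i';

function isFlagged(string $message): bool
{
    return preg_match(BOOBIES_PATTERN, $message) === 1;
}

$plain   = isFlagged('nice boobies');
$birds   = isFlagged('We saw Blue Footed Boobies on the island');
$hyphens = isFlagged('blue-footed boobies nest here');
```

Only `$plain` is true here. Any other spelling or spacing of the bird's name would still be flagged, so the pattern leans towards false positives rather than missed hits.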

Still haven't had the time to play with this add on, but it seems promising to say the least.
 

This is where scanning the entire conversation table for possible keywords can get tricky: it could slow down the process.
Absolutely agreed, keep it simple.

But we also need a list of bomb making ingredients and the last thing I'm going to do is search Google for that, then sit around waiting for the knock on the door. Or most likely swat team and battering ram.
 
So after a bit more tweaking, I added the keywords a, e, i, o and u and ran a scan. With 17,028 messages in the DB, it returned 16,636 log entries in ~27.75 seconds, so not too bad TBH.
 
Looks like partial words are triggering when they shouldn’t (I think)

I’m testing with a list that includes the word POS.

It triggered when the word post was used.

As I had no wildcard * (as in pos*), I don't think post should have triggered. Am I right?
 
That is because it treats the keywords as wildcards. It could be changed so that POS is not triggered by post, compost, etc., if that's what we think is best.
 
So I've refined the logic.

Code:
If the keyword is pos, it will match "pos" but not "post", "compost", "position"
If the keyword is pos*, it will match "pos" and "post",  but not "compost", "position"
If the keyword is *pos, it will match "pos" and "compost",  but not "post", "position"
If the keyword is *pos*, it will match "pos", "compost",  "post", "position"
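
One way to express rules like these is to turn each keyword into a whole-word regular expression. This is a sketch under the assumption that `*` means "any run of word characters" in a strict prefix/suffix sense. `keywordToRegex()` and `keywordMatches()` are hypothetical names, and the add-on's real rules may differ.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: turns a keyword with an optional leading/trailing "*"
 * into a case-insensitive, whole-word regular expression.
 */
function keywordToRegex(string $keyword): string
{
    $leading  = str_starts_with($keyword, '*');
    $trailing = str_ends_with($keyword, '*');
    // Escape the keyword itself so characters like "." or "+" are taken literally.
    $core = preg_quote(trim($keyword, '*'), '/');

    return '/\b' . ($leading ? '\w*' : '') . $core . ($trailing ? '\w*' : '') . '\b/iu';
}

function keywordMatches(string $keyword, string $message): bool
{
    return preg_match(keywordToRegex($keyword), $message) === 1;
}

$words = ['pos', 'post', 'compost', 'position'];
foreach (['pos', 'pos*', '*pos', '*pos*'] as $keyword) {
    $hits = array_filter($words, fn (string $word): bool => keywordMatches($keyword, $word));
    echo $keyword, ' => ', implode(', ', $hits), PHP_EOL;
}
```

Under this strict reading, `pos` matches only "pos", `pos*` matches "pos", "post" and "position", `*pos` matches only "pos" (since "compost" ends in "post"), and `*pos*` matches all four words.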
 