[OzzModz] Conversation/DM Keyword Monitor [Deleted]

Ozzy47 · Mar 16, 2025

frm said:
I wouldn't create a new thread when it's triggered, just rely on the log (which should have a link to the DM)

That would only work if you had my search conversation add-on installed, and I'm not sure whether a template conditional can check if an add-on is installed. It may be possible, but I don't remember coming across that.

frm said:
To not have duplicate log entries or double posts, you have all the information pre-built before posting anyway. So you could hypothetically search if the thread message already exists, and if it does, don't post again (or log it again). The only thing that would cause a double post, from your screenshot, is if a new recipient was added to the conversation; otherwise, the message would be a duplicate and not repost no matter how many times it's run.

Actually what I would do is delete the log and repopulate it, as you could add or remove keywords, so the log would obviously need to be changed to reflect that.
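A rough sketch of the "delete and repopulate" approach, assuming the messages and keywords are already loaded into plain PHP arrays. The function name and the row shape are hypothetical; in the add-on this would presumably mean emptying the log table and re-inserting rows. Because the log is rebuilt from the current keyword list on every run, removing a keyword automatically removes its old entries:

```php
<?php
declare(strict_types=1);

// Hypothetical helper: rebuild the keyword log from scratch on every run.
// $messages is [message_id => text]; each log row is ['message_id' => ..., 'keyword' => ...].
function rebuildKeywordLog(array $messages, array $keywords): array
{
    $log = []; // "delete": always start from an empty log

    foreach ($messages as $messageId => $text) {
        foreach ($keywords as $keyword) {
            // Plain case-insensitive substring check, for illustration only
            if (stripos($text, $keyword) !== false) {
                $log[] = ['message_id' => $messageId, 'keyword' => $keyword];
            }
        }
    }

    return $log; // "repopulate": only the current keywords are represented
}

$messages = [1 => 'send me your password', 2 => 'nice weather today'];

$before = rebuildKeywordLog($messages, ['password', 'weather']);
$after  = rebuildKeywordLog($messages, ['password']); // "weather" removed from the list

echo count($before), ' entries before, ', count($after), ' after', PHP_EOL;
// 2 entries before, 1 after
```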

frm · Mar 16, 2025

Ozzy47 said:
Actually what I would do is delete the log and repopulate it, as you could add or remove keywords, so the log would obviously need to be changed to reflect that.

The log would be easy to handle, as I'm assuming it's a table of its own. Thread creation is the problem: if the scan ran again and you didn't search for the pre-built message before posting, you'd get a second thread.

Ozzy47 · Mar 16, 2025

frm said:
The log would be easy to handle, as I'm assuming it's a table of its own. Thread creation is the problem: if the scan ran again and you didn't search for the pre-built message before posting, you'd get a second thread.

I honestly wouldn't create threads when scanning existing conversations; that could create a big mess of threads on the site.

frm · Mar 16, 2025

I built an add-on that runs as a cron job, but I didn't want it to go wild and create duplicate posts (for example, if I accidentally ran it manually, or added something that needed to be posted after the cron had already run). This may or may not be correct, as I'm still learning, but it paints a better picture of what I mean:

PHP:

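    $db = \XF::db();

    // $pendingPosts: hypothetical [thread_id => message] map, built before posting
    foreach ($pendingPosts as $threadId => $postContent) {
        // Check for duplicates: has this exact message already been posted in the thread?
        $existingPost = $db->fetchOne(
            "SELECT post_id FROM xf_post WHERE thread_id = ? AND message = ? LIMIT 1",
            [$threadId, $postContent]
        );

        if ($existingPost) {
            \XF::logError("Skipping duplicate post in thread ID: " . $threadId);
            continue;
        }

        // ... create the post here ...
    }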

In the case of this add-on, you know what $postContent is, and it wouldn't change unless a recipient is added. So that would prevent a duplicate post in the thread.

I think the XF Finder is the proper way to do it, but this works for now...

This is all from a 4 AM perspective, and I haven't downloaded the add-on yet...
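A minimal sketch of the same duplicate check without the database, assuming the report text is built deterministically from the conversation ID, its recipients, and the matched keyword (all names here are hypothetical). Hashing the thread ID plus the message gives a cheap fingerprint; sorting the recipients means a re-run produces identical text, while adding a recipient produces new text, so only that case gets posted again:

```php
<?php
declare(strict_types=1);

// Hypothetical: build the report text deterministically from the conversation data.
function buildReport(int $conversationId, array $recipients, string $keyword): string
{
    sort($recipients); // same recipients in any order -> same text
    return sprintf(
        'Keyword "%s" found in conversation %d (recipients: %s)',
        $keyword,
        $conversationId,
        implode(', ', $recipients)
    );
}

// Hypothetical: returns true if the report is new for this thread, false if it is a duplicate.
function postIfNew(array &$postedHashes, int $threadId, string $report): bool
{
    $hash = sha1($threadId . "\0" . $report); // fingerprint of thread + message
    if (isset($postedHashes[$hash])) {
        return false; // already posted: skip posting and logging
    }
    $postedHashes[$hash] = true; // a real add-on would create the post here
    return true;
}

$posted = [];
$first  = postIfNew($posted, 10, buildReport(5, ['bob', 'alice'], 'password'));
$second = postIfNew($posted, 10, buildReport(5, ['alice', 'bob'], 'password'));          // re-run
$third  = postIfNew($posted, 10, buildReport(5, ['alice', 'bob', 'carol'], 'password')); // recipient added

var_dump($first, $second, $third); // bool(true) bool(false) bool(true)
```

Storing a fixed-length hash (for example in its own indexed column) would also avoid comparing full message bodies in SQL, though whether that is worth an extra column is a judgment call.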

Ozzy47 · Mar 17, 2025

I think that if I update the log in batches of 2,500 with a 2-second delay between each batch, the total time will depend on the following:

Server Performance:
CPU speed, memory, and disk I/O performance.
Database performance (e.g., MySQL/MariaDB optimization).

Batch Size and Delay:
The code processes messages in batches of 2,500 with a 2-second delay between batches.

Message Complexity (the time taken to process each message depends on):
The number of keywords to check.
The length of the message content.
The complexity of the BBCode stripping and keyword matching logic.

Scanning a million messages would then work out as follows:

Number Of Batches:
Total Batches = 1,000,000 / 2,500 = 400 batches

Time Per Batch:
Total Time Per Batch = 0.5 seconds (assumed processing time) + 2 seconds (delay) = 2.5 seconds

Estimated time:
Total Time = 400 × 2.5 seconds = 1,000 seconds
1,000 seconds ÷ 60 ≈ 16.67 minutes
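The same arithmetic as a small function, handy for trying other batch sizes or delays. The function name is hypothetical, and the 0.5 seconds of processing per batch is an assumed figure, not a measurement:

```php
<?php
declare(strict_types=1);

// Hypothetical estimator: batches × (processing time per batch + delay per batch).
function estimateScanSeconds(int $totalMessages, int $batchSize, float $processSeconds, float $delaySeconds): float
{
    $batches = (int) ceil($totalMessages / $batchSize); // a partial last batch still counts
    return $batches * ($processSeconds + $delaySeconds);
}

$seconds = estimateScanSeconds(1_000_000, 2_500, 0.5, 2.0);
printf("%.0f seconds (~%.2f minutes)\n", $seconds, $seconds / 60);
// 1000 seconds (~16.67 minutes)
```

Like the figures above, this counts a delay after every batch, including the last one, so it slightly overstates the total.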

Factors that could affect the estimate: server performance, database optimization, message complexity, network latency, and PHP configuration.

On my test site, I have 17,028 messages, and scanning them for 25 keywords and populating the log takes about 10 seconds. It populated the log with 830 entries (I scanned for a slightly popular keyword to get the count up for testing). Extrapolating linearly (1,000,000 ÷ 17,028 ≈ 59, and 59 × 10 seconds ≈ 590 seconds), 1,000,000 messages (assuming they are not too complex) would take around 10 minutes to scan and log.

I am also caching the keywords and conversation titles, which can significantly reduce processing time by avoiding redundant computations and database queries. I could increase the batch size, but that would require more memory and could increase the processing time per batch.

I am not creating threads for older conversations, just populating the log.
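A sketch of what that kind of caching might look like, assuming the keywords are combined into one case-insensitive regex built once, and titles are loaded at most once per conversation. The class name and the `$titleLoader` callback (a stand-in for a database query) are hypothetical. Note that this pattern does plain substring matching, the same behaviour where "pos" also matches "post":

```php
<?php
declare(strict_types=1);

// Hypothetical cache: compile the keyword pattern once, memoize conversation titles.
final class ScanCache
{
    private ?string $pattern = null;
    private array $titles = [];
    public int $titleLookups = 0;

    public function __construct(private array $keywords, private \Closure $titleLoader) {}

    public function pattern(): string
    {
        // Build a single alternation regex the first time only
        if ($this->pattern === null) {
            $quoted = array_map(fn(string $k) => preg_quote($k, '/'), $this->keywords);
            $this->pattern = '/' . implode('|', $quoted) . '/i';
        }
        return $this->pattern;
    }

    public function title(int $conversationId): string
    {
        // Only call the (expensive) loader once per conversation
        if (!array_key_exists($conversationId, $this->titles)) {
            $this->titleLookups++;
            $this->titles[$conversationId] = ($this->titleLoader)($conversationId);
        }
        return $this->titles[$conversationId];
    }
}

$cache = new ScanCache(['password', 'hey'], fn(int $id) => "Conversation #$id");
$messages = [[1, 'Hey there'], [1, 'what is your password?'], [2, 'nothing here']];

$hits = 0;
foreach ($messages as [$conversationId, $text]) {
    $cache->title($conversationId); // second message in conversation 1 hits the cache
    $hits += preg_match($cache->pattern(), $text);
}
echo $hits, ' hits, ', $cache->titleLookups, ' title lookups', PHP_EOL; // 2 hits, 2 title lookups
```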

Ozzy47 · Mar 17, 2025

I just tried again: I added another keyword, "hey", ran the population, and it logged 2,155 entries in ~11.73 seconds.

Ozzy47 · Mar 17, 2025

So I ran another test: I added the keyword "a" (yes, I know) and ran the scan. I got a timeout (a Cloudflare page), but the scan continued and eventually finished. I'm not sure exactly how long it took, but it wasn't too long, maybe 3 to 5 minutes. It returned 15,720 log entries out of 17,028 messages, so not too bad a test. I think this would kill a shared hosting environment, though. My specs are:

PHP version 8.4.4
MySQL version 8.0.41
PHP memory_limit 200M
PHP post_max_size 128M
PHP upload_max_filesize 128M
PHP max_input_vars 1000
PHP max_execution_time 120
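One way to avoid that timeout page, sketched under assumptions: instead of scanning everything in one request, process batches until a time budget runs out, then return the offset to resume from on the next run (for example from a follow-up job or cron tick). The function and callback names are hypothetical; in practice the budget would sit well below max_execution_time and any proxy timeout:

```php
<?php
declare(strict_types=1);

// Hypothetical time-boxed runner: returns the offset the next run should start from.
function runTimeBoxed(int $total, int $offset, int $batchSize, float $budgetSeconds, callable $processBatch): int
{
    $start = microtime(true);

    while ($offset < $total) {
        $processBatch($offset, min($batchSize, $total - $offset));
        $offset += $batchSize;

        if (microtime(true) - $start >= $budgetSeconds) {
            break; // out of time: stop cleanly and report where to pick up
        }
    }

    return min($offset, $total);
}

$processed = 0;
$worker = function (int $from, int $size) use (&$processed): void {
    $processed += $size; // a real worker would scan and log these messages
};

$offset = 0;
$runs = 0;
while ($offset < 17_028) {
    // Zero budget forces one batch per run, to show the resume behaviour
    $offset = runTimeBoxed(17_028, $offset, 2_500, 0.0, $worker);
    $runs++;
}
echo "$runs runs, $processed messages\n"; // 7 runs, 17028 messages
```

This is similar in spirit to how XenForo's deferred jobs split long work into time-limited chunks, though the sketch does not use that API.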

Mr Lucky · Mar 17, 2025

I just noticed that when it sends to a thread, we don't get alerts (if watching the forum for new threads). Would that be possible?

Also, I note it does not flag edits. Is that also possible?

Ozzy47 · Mar 17, 2025

Mr Lucky said:
I just noticed that when it sends to a thread, we don't get alerts (if watching the forum for new threads). Would that be possible?

I wouldn't expect it to, as it's "automated", but I'll look into it for a future update.

Mr Lucky said:
Also, I note it does not flag edits. Is that also possible?

Maybe, but I’d just set a short edit time.

Mr Lucky · Mar 17, 2025

Ozzy47 said:
I'm fine with people sharing their lists, but like you said, it would have to be in a zip file to get around XF's censorship.

OK to share in this thread?

Ozzy47 · Mar 17, 2025

Sure

Mr Lucky · Mar 17, 2025

Ozzy47 said:
Sure

OK, I have made a list based on my own (limited) recollection of rude words, plus a list of terms used by sex predators (including the emojis) that I found on the Australian Federal Police site.

Not sure how to disallow boobies while allowing Blue Footed Boobies. Also, "breasts" could catch quite a few chicken recipes. But, as we know, in most cases this is just going to be a box-ticking exercise.

Ozzy47 · Mar 17, 2025

Yeah, it may be like whack-a-mole with some of the words, but better safe than sorry.

frm · Mar 17, 2025

Mr Lucky said:
Not sure how to disallow boobies while allowing Blue Footed Boobies.

With over-complicated regex. I would rather err on the side of a false positive than toy around with patterns and miss something that should have alerted you.

I still haven't had the time to play with this add-on, but it seems promising, to say the least.
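For illustration, here is roughly what the "over-complicated regex" for the Blue Footed Boobies case might look like: a hypothetical pattern (not the add-on's) that uses a fixed-length negative lookbehind to skip "booby"/"boobies" when it directly follows "footed " or "footed-":

```php
<?php
declare(strict_types=1);

// Hypothetical pattern: flag "booby"/"boobies" unless directly preceded by
// "footed " or "footed-" (fixed-length lookbehind, case-insensitive).
$pattern = '/(?<!footed[\s-])\bboob(?:y|ies)\b/i';

$samples = [
    'Look at those boobies',
    'We saw blue footed boobies in the Galapagos',
    'A blue-footed booby landed nearby',
    'Watch out for the BOOBY trap',
];

foreach ($samples as $text) {
    $flagged = preg_match($pattern, $text) === 1;
    printf("%-45s %s\n", $text, $flagged ? 'FLAGGED' : 'ok');
}
```

"BOOBY trap" is still flagged, and "blue footed  boobies" written with two spaces would also be flagged, since the lookbehind only excludes that exact spacing. Every exception like this adds complexity and scan time, which is the argument for accepting some false positives.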

Ozzy47 · Mar 17, 2025

frm said:
With over-complicated regex

This is where scanning the entire conversation table for possible keywords gets tricky: complex patterns could slow down the process.

Mr Lucky · Mar 17, 2025

frm said:
With over-complicated regex. I would rather err on the side of a false positive than toy around with patterns and miss something that should have alerted you.

Ozzy47 said:
This is where scanning the entire conversation table for possible keywords gets tricky: complex patterns could slow down the process.

Absolutely agreed, keep it simple.

But we also need a list of bomb-making ingredients, and the last thing I'm going to do is search Google for that and then sit around waiting for the knock on the door. Or, more likely, the SWAT team and a battering ram.

Ozzy47 · Mar 18, 2025

So after a bit more tweaking, I added the keywords a, e, i, o, and u and ran a scan. With 17,028 messages in the DB, it returned 16,636 log entries in ~27.75 seconds, so not too bad, TBH.

Mr Lucky · Mar 19, 2025

Looks like partial words are triggering when they shouldn't (I think).

I'm testing with a list that includes the word POS.

It triggered when the word post was used.

As I had no wildcard * (as in pos*), I don't think post should have triggered. Am I right?

Ozzy47 · Mar 19, 2025

That is because it treats every keyword as a wildcard match (a substring match). It could be changed so POS is not triggered by post, compost, etc., if that's what we think is best.

Ozzy47 · Mar 19, 2025

So I've refined the logic.

Code:

If the keyword is pos, it will match "pos" but not "post", "compost", "position", "typos"
If the keyword is pos*, it will match "pos", "post", "position", but not "compost", "typos"
If the keyword is *pos, it will match "pos", "typos", but not "post", "compost", "position"
If the keyword is *pos*, it will match "pos", "post", "compost", "position", "typos"
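A sketch of how keywords with an optional leading and/or trailing * could be turned into regexes, assuming ordinary whole-word wildcard semantics: no * means whole word only, pos* means a word starting with "pos" (so "position" matches), *pos means a word ending in "pos" (so "typos" matches, but "compost" does not), and *pos* means "pos" anywhere in a word. The helper name is hypothetical and this is not the add-on's actual code:

```php
<?php
declare(strict_types=1);

// Hypothetical helper: keyword with optional leading/trailing * -> case-insensitive regex.
function keywordToRegex(string $keyword): string
{
    $leading  = str_starts_with($keyword, '*');
    $trailing = str_ends_with($keyword, '*');
    $core     = preg_quote(trim($keyword, '*'), '/');

    // \w* lets a wildcard side absorb extra word characters; \b pins the word edges
    $prefix = $leading ? '\w*' : '';
    $suffix = $trailing ? '\w*' : '';

    return '/\b' . $prefix . $core . $suffix . '\b/i';
}

$words = ['pos', 'post', 'compost', 'position', 'typos'];

foreach (['pos', 'pos*', '*pos', '*pos*'] as $keyword) {
    $regex = keywordToRegex($keyword);
    $hits = array_values(array_filter($words, fn(string $w) => preg_match($regex, $w) === 1));
    printf("%-6s matches: %s\n", $keyword, implode(', ', $hits));
}
// pos    matches: pos
// pos*   matches: pos, post, position
// *pos   matches: pos, typos
// *pos*  matches: pos, post, compost, position, typos
```

Because of the /i flag, POS in a list matches "pos" in a message regardless of case, while "post" is no longer caught by a plain POS entry.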
