Media Attachment mirroring - search query

briansol

Well-known member
I want to set up attachment mirroring on my site.

At first, I made a gallery category called 'attachments', and it got large and stupid fast. :D

So I thought I could break it down further into topics or tags and put related media there: 'widget A' and 'widget B', for example.
But if my only option is to source from a forum, 'widgets', I can't separate out the desired images. Some may say I should split these two into their own forums, but topically, it doesn't make sense to do so.

Now, I realize there is no good way to know whether an attachment represents widget A or B, but perhaps I could search for it and narrow the search to a couple of keywords. Maybe "tall" only describes "A" widgets, so a mirror from widgets where title LIKE %tall% could populate that one, and a NOT LIKE %tall% the rest?
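
To make the idea concrete, here is a minimal Python sketch of that keyword split. It is not anything XF offers today; the `split_by_keyword` helper and the `thread_title` field are made-up names, and the sample rows are invented for illustration only.

```python
def split_by_keyword(attachments, keyword):
    """Split attachments into (matching, rest) using a case-insensitive
    substring test on the thread title, roughly LIKE %keyword% / NOT LIKE."""
    needle = keyword.lower()
    matching, rest = [], []
    for att in attachments:
        if needle in att["thread_title"].lower():
            matching.append(att)
        else:
            rest.append(att)
    return matching, rest


# Invented sample data: every attachment lives in the same 'widgets' forum.
attachments = [
    {"id": 1, "filename": "DSC0001.jpg", "thread_title": "My tall widget build"},
    {"id": 2, "filename": "DSC0002.jpg", "thread_title": "Short widget question"},
    {"id": 3, "filename": "DSC0003.jpg", "thread_title": "TALL widget mounting"},
]

widget_a, widget_b = split_by_keyword(attachments, "tall")
```

Here `widget_a` would feed the 'widget A' category and `widget_b` everything else. A single keyword will obviously misfile some images, so this is only a first rough cut.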

I'm sure there are better ways, but it would be nice to have more tools for this than a whole forum. Maybe leverage the 'search result as a forum' tool to build a list of desired media to do the mirroring from?
 

arn

Well-known member
The attachment mirroring does seem like a very 1.0 feature with a lot of potential.

Search-based media categories would be nice. But the challenge is how search would work for what are primarily images. People are bad at naming things.

Right now I believe it just searches file names and tags. Post content and thread title seem like they could be weaker signals as well.
 

briansol

Well-known member
Yeah, it's a hard ask because the attachment data isn't labeled. DSC12312312_2020.jpg doesn't tell us much.

Which raises the question: is there a tool to help label our data better with some manual 'training' set? The obvious answer built into XF is no, but maybe it doesn't have to be a native XF tool. It's probably better off built in Python anyway.

We're quickly venturing into machine learning, but it's something I've been thinking about: running a POC on SageMaker (since my attachments are all in an S3 bucket anyway). But SageMaker isn't good about feature weighting, which may be a critical success factor for a model like this.


Build a quick training set:
20 images that represent A
20 images that represent B

Run the rest through the model.
If there's no match, leave them unlabeled, and a more refined model can be built on 40 more samples.
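
The steps above can be sketched in plain Python as a nearest-centroid classifier with a "no match" cutoff. This is only one possible approach, not a SageMaker recipe: it assumes each image has already been turned into a numeric feature vector by some other tool, and the tiny 2-D vectors and the `max_distance` value are made up for the example.

```python
import math


def centroid(vectors):
    """Average a list of equal-length feature vectors, column by column."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def classify(vec, centroids, max_distance):
    """Return the label of the nearest centroid, or None when even the
    nearest one is farther away than max_distance (left unlabeled)."""
    label, dist = min(
        ((name, math.dist(vec, c)) for name, c in centroids.items()),
        key=lambda pair: pair[1],
    )
    return label if dist <= max_distance else None


# Stand-in training set; a real one would hold ~20 vectors per bucket.
training = {
    "A": [[0.9, 0.1], [1.0, 0.0], [0.8, 0.2]],
    "B": [[0.1, 0.9], [0.0, 1.0], [0.2, 0.8]],
}
centroids = {name: centroid(vectors) for name, vectors in training.items()}

# Run the rest through the model; None means "no match, leave unlabeled".
labels = [
    classify(vec, centroids, max_distance=0.3)
    for vec in ([0.85, 0.15], [0.1, 0.95], [0.5, 0.5])
]
```

The images that come back as `None` are the natural candidates to hand-label for the next, more refined round.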

Build some way to store the metadata in a new table? (ID, img.jpg, bucketA)?
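
One way that table could look, sketched with Python's built-in `sqlite3` module. The table name `attachment_labels` and its columns are hypothetical; nothing like it exists in XF, and a real add-on would more likely use the forum's own database.

```python
import sqlite3


def open_label_store(path=":memory:"):
    """Open (or create) the hypothetical label table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS attachment_labels (
            attachment_id INTEGER PRIMARY KEY,
            filename      TEXT NOT NULL,
            bucket        TEXT            -- NULL means not labeled yet
        )
        """
    )
    return conn


def set_label(conn, attachment_id, filename, bucket):
    """Insert a row, or overwrite the existing one for this attachment."""
    conn.execute(
        "INSERT OR REPLACE INTO attachment_labels "
        "(attachment_id, filename, bucket) VALUES (?, ?, ?)",
        (attachment_id, filename, bucket),
    )
    conn.commit()


def unlabeled(conn):
    """List the IDs of attachments the model could not place."""
    rows = conn.execute(
        "SELECT attachment_id FROM attachment_labels "
        "WHERE bucket IS NULL ORDER BY attachment_id"
    )
    return [row[0] for row in rows]


conn = open_label_store()
set_label(conn, 1, "img.jpg", "bucketA")
set_label(conn, 2, "DSC12312312_2020.jpg", None)  # no match yet
```

Calling `set_label` again with a different bucket overwrites the row, which would also cover a moderator correcting a bad guess by hand.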

Sure, it may mess up some outliers, but I'm not expecting perfection. Moderators can move them manually later (is that a feature?) as they show up.
Otherwise, 80 or 90% of the attachments are sorted into the correct bucket.
 

briansol

Well-known member
WordPress plugins are light years ahead of XF. It's discouraging, but understandable... WP market share is likely 10,000s of times larger than XF's. I haven't messed with Azure services, but that looks really simple.
 