Add-on Top Content

Daniel Hood

Something that has always bugged me about forum software is that new content is featured far more heavily than existing content. Years into a community's life cycle, a forum may have some really high-quality posts and threads that no longer get any attention. I get that things may hold less value as they get older, but they can still be important.

This example isn't perfect, because XenForo offers a way to combat this particular case, but take a look at the suggestion board: there are 4,268 suggestions (assuming that each 'discussion' is in fact a suggestion). There could be some quality suggestions in there, and there are probably several duplicates, but it has 54 pages. Nobody is going through all 54 pages. The feature I was referring to is the ability to sort a node's threads by likes instead of last post, which they actually do here, but that still requires people to like the first post.

What I'm proposing is an add-on (that I will be developing soon; ad manager is still my highest priority) that will display a new sub menu under Forums called "Top Content". This will have searchable content, where you select threads or posts along with which node you want to view. Those are the default options; I'm going to make it semi-easy (no coding required) to add more content types such as blogs, gallery items, etc. Then I'll return a list (or the actual post layout, depending on the options set) of the highest quality results.

The quality score will be determined with a formula that I'm not entirely sure of at the moment. I plan to take into account the post's age (as it's still relevant), the number of likes received, and the number of views and replies (only applicable to threads).

I imagine it'll look something like this:
(5*likes) + (50 * (replies / views)) - (daysOld/4)

So a thread with 5 likes, 100 views, and 1 reply that was made yesterday would receive a score of:

(5 * 5) + (50 * (1 / 100)) - (1 / 4) = 25 + 0.5 - 0.25 = 25.25

I need help deciding how important each aspect is. The actual score won't ever be seen so it doesn't really matter how big of a number we end up working with.
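
For anyone who wants to play with the numbers, the draft formula above can be sketched in a few lines of Python. The function name and the zero-view guard are assumptions added for illustration; they are not part of the add-on.

```python
def quality_score(likes, replies, views, days_old):
    """Draft score from the post: (5*likes) + (50 * (replies / views)) - (daysOld/4)."""
    # Assumption: a thread with no views gets no engagement ratio,
    # which also avoids dividing by zero.
    engagement = replies / views if views else 0.0
    return 5 * likes + 50 * engagement - days_old / 4


# The worked example: 5 likes, 100 views, 1 reply, posted yesterday.
print(quality_score(likes=5, replies=1, views=100, days_old=1))  # 25.25
```

Changing one argument at a time is a quick way to see how much each aspect moves the total.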
 
I tried this, it sort of helps. I was playing with the weights a lot until I got what I wanted.

Basically, logarithmic works better. As a guide, look at the algorithm used by reddit, which uses upvotes and the date to move things up. Only here, instead of upvotes, you would have likes and replies.

http://scienceblogs.com/builtonfact...ddit-rankings-or-how-upvotes-are-time-travel/
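
A minimal sketch of the logarithmic idea, loosely in the spirit of the ranking linked above rather than a copy of it. The weights, the `days_per_point` decay, and the function name are all hypothetical.

```python
import math


def log_damped_score(likes, replies, days_old,
                     like_weight=1.0, reply_weight=1.0, days_per_point=30.0):
    """Log-scale the engagement so each extra point needs ten times the activity."""
    # Hypothetical weights: fold likes and replies into one engagement number.
    engagement = like_weight * likes + reply_weight * replies
    # max(..., 1) keeps log10 defined for threads with no engagement at all.
    magnitude = math.log10(max(engagement, 1))
    # Assumed decay: one point is lost for every `days_per_point` days of age.
    return magnitude - days_old / days_per_point


for likes in (1, 10, 100, 1000):
    print(likes, log_damped_score(likes=likes, replies=0, days_old=0))
```

On a log scale, a thread with 1,000 likes scores only three points above a thread with a single like, so it is harder for one variable to swamp the ordering.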

The main problem to overcome with the formula is that if you assign too much weight to a variable (say, replies), your top content becomes effectively ordered by replies in descending order, so your formula of 5*likes and 50*replies does not sound right out of the box.

Why are you using replies/views? I would expect replies to have a weight, views to have a weight, and those to be added together instead.

The other problem I came across is that I identified likes as a good indicator of content worth promoting. However, the fact that the total likes in a thread are not stored makes that kind of useless. I ended up having to extend the system to store the number of likes in the thread (likes on the first post alone are not a good enough signal).

Other than that, I would say, implement it, and play with the numbers until you get comfortable. Being able to see "why" a thread is ranked so high (by inspecting how the weight is calculated) helps a lot. Then you can fine-tune the formula. Basically what happened with me is that I was showing my "top content", then I looked at something and thought "this is not interesting, why is this thing here?" and I ended up decreasing the weight of some variables.
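
One way to make the "why" visible is to return each term's contribution next to the total. This is only a sketch: it reuses the draft formula from the first post, not the formula this reply was tuned with, and the weight names are hypothetical.

```python
# Weights taken from the draft formula: 5*likes + 50*(replies/views) - daysOld/4
DEFAULT_WEIGHTS = {"likes": 5.0, "reply_ratio": 50.0, "age": -0.25}


def explain_score(likes, replies, views, days_old, weights=DEFAULT_WEIGHTS):
    """Return each term's contribution plus the total, so a ranking can be inspected."""
    ratio = replies / views if views else 0.0  # assumption: no views means no ratio
    terms = {
        "likes": weights["likes"] * likes,
        "reply_ratio": weights["reply_ratio"] * ratio,
        "age": weights["age"] * days_old,
    }
    terms["total"] = sum(terms.values())
    return terms


for name, value in explain_score(likes=5, replies=1, views=100, days_old=1).items():
    print(f"{name:>12}: {value:6.2f}")
```

Passing a different `weights` dictionary shows straight away which term was pushing an uninteresting thread to the top.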
 
Something that has always bugged me about forum software is that new content is featured far more heavily than existing content.
Definitely !
I've *NEVER* seen a forum software package with Searching that provides meaningful results.
And of course, we are all spoiled by Google.

What I'm proposing is an add-on (that I will be developing soon; ad manager is still my highest priority) that will display a new sub menu under Forums called "Top Content".
I don't like the idea of hiding the Top Content. I think you should use the default search box (or at least the option to do so).

The quality score will be determined with a formula that I'm not entirely sure of at the moment. I plan to take into account the post's age (as it's still relevant), the number of likes received, and the number of views and replies (only applicable to threads).
That might be similar to Xenforo's Search addon that has "Relevance".
upload_2013-9-17_20-36-22.webp

One thing for you to study is .... comparing forum searches with a google search of the site. Google almost always provides more useful searches.

The most neglected user in searching is the new user. They really are usually looking for the "Top Content" but never find it.

Use Real Data: One good idea to ground yourself in reality might be to ask admins to submit what their users are *actually* searching for. I am not sure if Xenforo records what people are searching for. (Anyone?). Ideally admins could send you what their users are actually searching for.

Another factor in searching might be to consider your sponsors. If McDonalds sponsored your site ... if someone searched for "Hamburger" .. maybe McDonalds might get some preference in the results. Another idea I just had ... what if sponsors could pay for certain search terms ... what does that sound like ? (A: Adsense).

Another interesting thing for you to think about is .... I'll bet for a lot of search terms ... the admin probably knows what the top results should be.

... more later :)
 
Why are you using replies/views? I would expect replies to have a weight, views to have a weight, and those to be added together instead.

I can't really respond to everything perfectly right now, but the fact that a thread is viewed a lot doesn't mean it is good. In fact, it can mean the opposite. If a thread has a million views but only 3 replies, that's probably a crap thread with a very interesting title. That was kind of the logic.

I'll respond to all the other thoughts in a little bit. Sorry.
 
I can't really respond to everything perfectly right now, but the fact that a thread is viewed a lot doesn't mean it is good. In fact, it can mean the opposite. If a thread has a million views but only 3 replies, that's probably a crap thread with a very interesting title. That was kind of the logic.

Number of views can often reflect the age of a thread, not the quality.

The number of views a thread receives is still probably the most robust indicator of its popularity.

Another idea: Give more weight to the threads that Google likes on your forum.
 
Now that I have more time I'll respond in more detail. Wasn't expecting to be available that quickly.

I tried this, it sort of helps. I was playing with the weights a lot until I got what I wanted.

Basically, logarithmic works better. As a guide, look at the algorithm used by reddit, which uses upvotes and the date to move things up. Only here, instead of upvotes, you would have likes and replies.

http://scienceblogs.com/builtonfact...ddit-rankings-or-how-upvotes-are-time-travel/

Thanks for the link, I'll take that into consideration for sure.

The main problem to overcome with the formula is that if you assign too much weight to a variable (say, replies), your top content becomes effectively ordered by replies in descending order, so your formula of 5*likes and 50*replies does not sound right out of the box.

Well first off, it was 50*(replies/views) which is very very very different than 50*replies. That entire formula was a spur of the moment example, and because finding a good weight for each variable is so important I was attempting to explain what I was going for and asking fellow developers + admins + forum users for help in establishing what is important.

The other problem I came across is that I identified likes as a good indicator of content worth promoting. However, the fact that the total likes in a thread are not stored makes that kind of useless. I ended up having to extend the system to store the number of likes in the thread (likes on the first post alone are not a good enough signal).

This is still entirely doable by joining the post table on the thread ID and adding together all the like counts. We could even do an average per post... not sure how to weigh the average like count per post though. I have no problem making this as complex as it needs to be to get each community's top content out there.
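
As a rough illustration of that join, here is a self-contained sketch using Python's built-in `sqlite3` as a stand-in for MySQL. The two tables are cut down to the columns the query needs, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE xf_thread (thread_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE xf_post (post_id INTEGER PRIMARY KEY, thread_id INTEGER, likes INTEGER);
    -- Made-up sample rows, for illustration only.
    INSERT INTO xf_thread VALUES (1, 'Thread A'), (2, 'Thread B');
    INSERT INTO xf_post VALUES (1, 1, 4), (2, 1, 2), (3, 1, 0), (4, 2, 1);
""")

# Total and average likes per thread, joined on the thread ID.
rows = conn.execute("""
    SELECT t.thread_id, t.title,
           SUM(p.likes) AS total_likes,
           AVG(p.likes) AS avg_likes
    FROM xf_thread t
    INNER JOIN xf_post p ON p.thread_id = t.thread_id
    GROUP BY t.thread_id, t.title
    ORDER BY total_likes DESC
""").fetchall()

for row in rows:
    print(row)
```

Having both the total and the per-post average in one result makes it easier to try different weights for each.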

Other than that, I would say, implement it, and play with the numbers until you get comfortable. Being able to see "why" a thread is ranked so high (by inspecting how the weight is calculated) helps a lot. Then you can fine-tune the formula. Basically what happened with me is that I was showing my "top content", then I looked at something and thought "this is not interesting, why is this thing here?" and I ended up decreasing the weight of some variables.
That's a good point, the formula may be different for each community. Maybe I could make a system where admins can adjust the weights to their preference. Of course I'd provide a default.

Definitely !
I've *NEVER* seen a forum software package with Searching that provides meaningful results.
And of course, we are all spoiled by Google.


I don't like the idea of hiding the Top Content. I think you should use the default search box (or at least the option to do so).

I could modify the search system I suppose but I wasn't really planning on hiding the Top Content, it was going to be right next to the New Content (New Posts).

That might be similar to Xenforo's Search addon that has "Relevance".

One thing for you to study is .... comparing forum searches with a google search of the site. Google almost always provides more useful searches.

XenForo's search does seem pretty good by default; every time I use it I find what I'm looking for. It just seems that old, high-quality content does get lost. When I'm trying to find the next add-on I want to make, I check a few boards (custom requests, requests, and suggestions being the main ones). If I don't find something on the first page, I get discouraged and make something random. Promoted Content, Posts Per Day, and Notable Members are my random ones (for Notable Members, only the most followers tab; the caching and recent activity parts came from suggestions), while Hashtag, XMWidgets, and my others are from requests and suggestions. If there was a link where I could view "Top Content", showing me the threads with the most quality activity (based on user engagement, meaning likes and replies; lots of views with no likes or replies is a sign of a bad thread), it'd save me effort and result in more desired stuff.

The most neglected user in searching is the new user. They really are usually looking for the "Top Content" but never find it.
Bingo.

Use Real Data: One good idea to ground yourself in reality might be to ask admins to submit what their users are *actually* searching for. I am not sure if Xenforo records what people are searching for. (Anyone?). Ideally admins could send you what their users are actually searching for.

Pretty sure it is recorded. This is never going to be the same for any two communities, though. For example, on a coding forum a search for AJAX should return the best AJAX tutorial, while motherlyhomecare.com (100% made up off the top of my head right now) would want to display the cleaning product.

Another factor in searching might be to consider your sponsors. If McDonalds sponsored your site ... if someone searched for "Hamburger" .. maybe McDonalds might get some preference in the results. Another idea I just had ... what if sponsors could pay for certain search terms ... what does that sound like ? (A: Adsense).

Well now we're just getting way off base and into some custom work lol.

Another interesting thing for you to think about is .... I'll bet for a lot of search terms ... the admin probably knows what the top results should be.

... more later :)

That's a good point and I should find a way to use that to my advantage (for them to use to their advantage).
 
Here's a more serious effort at a qs:
Code:
(
    (t.first_post_likes) +
    ((sum(p.likes) / t.reply_count) * 5) +
    5 * (count(DISTINCT(p.user_id)))
)
Basically it considers the first post's likes, the number of unique posters (so one person can't just bump it all the way up), and the average likes per post.
It doesn't take the post's age into account yet. This produced these threads on xenmods as my top threads:

http://xenmods.com/threads/grand-opening.1/ (23 likes on posts in this thread and 14 unique posters)
http://xenmods.com/threads/most-liked-posts.9/
http://xenmods.com/threads/xenmods-widgets-bd-widget-framework-extension.8/
http://xenmods.com/threads/xenmods-widgets-bd-widget-framework-extension.6/
http://xenmods.com/threads/notable-members-most-liked-posts.10/
http://xenmods.com/threads/shop.29/
http://xenmods.com/threads/notable-members.11/
http://xenmods.com/threads/multiple-prefixes.15/
http://xenmods.com/threads/image-cloud-widget.27/

Some of these are in a private board and not viewable by everybody (permissions will be checked when output is started). It's obviously not perfect and can be improved upon. However, I think it's a decent start. I'm going to start working on the output shortly.
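
For anyone who wants to try the expression without a live board, here is a hedged sketch that runs it end to end with Python's built-in `sqlite3` instead of MySQL. The tables are reduced to the columns the expression touches, the rows are invented, and the zero-reply guard is an addition that the original expression does not have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE xf_thread (
        thread_id INTEGER PRIMARY KEY, title TEXT,
        first_post_likes INTEGER, reply_count INTEGER
    );
    CREATE TABLE xf_post (
        post_id INTEGER PRIMARY KEY, thread_id INTEGER,
        user_id INTEGER, likes INTEGER
    );
""")
# Made-up rows: (thread_id, title, first_post_likes, reply_count)
conn.executemany("INSERT INTO xf_thread VALUES (?, ?, ?, ?)", [
    (1, "Well-liked discussion", 3, 2),
    (2, "Liked, but no replies", 10, 0),
    (3, "One user bumping", 0, 4),
])
# Made-up rows: (thread_id, user_id, likes)
conn.executemany(
    "INSERT INTO xf_post (thread_id, user_id, likes) VALUES (?, ?, ?)", [
        (1, 1, 3), (1, 2, 1), (1, 3, 2),
        (2, 1, 10),
        (3, 4, 0), (3, 4, 0), (3, 4, 0), (3, 4, 0), (3, 4, 0),
    ])

rows = conn.execute("""
    SELECT t.thread_id, t.title,
           t.first_post_likes
           -- * 1.0 forces real division in SQLite; NULLIF/COALESCE make a
           -- thread with zero replies contribute 0 instead of NULL.
           + COALESCE(SUM(p.likes) * 1.0 / NULLIF(t.reply_count, 0), 0) * 5
           + 5 * COUNT(DISTINCT p.user_id) AS score
    FROM xf_thread t
    INNER JOIN xf_post p ON p.thread_id = t.thread_id
    GROUP BY t.thread_id, t.title, t.first_post_likes, t.reply_count
    ORDER BY score DESC
""").fetchall()

for thread_id, title, score in rows:
    print(thread_id, title, score)
```

In this toy data the thread where a single user posts five times ends up last, which is what the unique-poster term is meant to achieve.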
 
Any "group by", especially on the post table would kill a busy board, it is just too slow.
For conceptual testing it is fine, but for the real thing I'd rather have a weight precalculated on the thread table
 
Any "group by", especially on the post table would kill a busy board, it is just too slow.
For conceptual testing it is fine, but for the real thing I'd rather have a weight precalculated on the thread table

Seeing as the content won't change often, I was planning to cache the results and update via cron/deferred method every 12 hours or so. On an established board, it won't change frequently, or it shouldn't at least. If it's pre-calculated and stored on the thread table it'd have to be updated every post or like on the thread or any post in the thread.
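
The caching idea can be sketched roughly like this. It is plain Python to show the logic only, not XenForo code, and it refreshes lazily on read rather than from a real cron job. The class name and the `compute` callable are hypothetical.

```python
import time

CACHE_TTL_SECONDS = 12 * 60 * 60  # "every 12 hours or so"


class TopContentCache:
    """Keep the last computed ranking and rebuild it only once it is stale."""

    def __init__(self, compute, ttl=CACHE_TTL_SECONDS, clock=time.time):
        self._compute = compute   # hypothetical callable that runs the slow query
        self._ttl = ttl
        self._clock = clock       # injectable so expiry can be tested without waiting
        self._value = None
        self._expires_at = None

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            # Stale or never filled: pay for the slow query once, then reuse it.
            self._value = self._compute()
            self._expires_at = now + self._ttl
        return self._value


cache = TopContentCache(compute=lambda: ["thread 1", "thread 9"])
print(cache.get())  # first call computes; later calls within the TTL reuse the result
```

The trade-off is that a new like or reply is not reflected until the next refresh.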

Hope I don't come across as dismissive. I appreciate all feedback on how to approach this.
 
Seeing as the content won't change often, I was planning to cache the results and update via cron/deferred method every 12 hours or so. On an established board, it won't change frequently, or it shouldn't at least. If it's pre-calculated and stored on the thread table it'd have to be updated every post or like on the thread or any post in the thread.

Yeah, it's a lot more work. It would mean updating the thread table on every like (or a secondary table, to prevent locking on the thread one). It starts being useful if the calculation is more complex, or needs to be refreshed more often.
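
A sketch of what the precalculated approach involves: keep running counters per thread and adjust them on every like or reply, so the score never needs a scan of the post table. Everything here is hypothetical, and the set of posters is a simplification; in a real table it would have to be a counter or a separate lookup.

```python
from dataclasses import dataclass, field


@dataclass
class ThreadStats:
    """Hypothetical per-thread row kept in a secondary table."""
    first_post_likes: int = 0
    total_likes: int = 0
    reply_count: int = 0
    posters: set = field(default_factory=set)  # simplification: distinct user IDs

    @property
    def score(self):
        # Same shape as the draft query; no replies means no average term.
        avg = self.total_likes / self.reply_count if self.reply_count else 0.0
        return self.first_post_likes + avg * 5 + 5 * len(self.posters)


def on_like(stats, is_first_post):
    """Called for every like on any post in the thread."""
    stats.total_likes += 1
    if is_first_post:
        stats.first_post_likes += 1


def on_reply(stats, user_id):
    """Called for every new reply in the thread."""
    stats.reply_count += 1
    stats.posters.add(user_id)


stats = ThreadStats(posters={1})      # user 1 started the thread
on_reply(stats, user_id=2)
on_like(stats, is_first_post=True)
on_like(stats, is_first_post=False)
print(stats.score)  # 21.0
```

Each event costs one small update instead of a grouped query, which is where the extra work on every like comes from.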

I did a sample test on my mysql slave, just to see how slow it could be. This installation of mine has 1,200,00 posts

To test, I did a
Code:
select thread_id, sum(likes) from xf_post group by thread_id;
It took 25.17 seconds.

So it might not be that bad. Worst case scenario the table is locked for less than 30 seconds. It might work. The slave is underworked, though, it might be slower on the main production machine.

Now, to spice up your test, I did a
Code:
SELECT t.thread_id, t.title FROM xf_thread t
INNER JOIN xf_post p ON t.thread_id = p.thread_id
GROUP BY p.thread_id
ORDER BY
(
    (t.first_post_likes) +
    ((sum(p.likes) / t.reply_count) * 5) +
    5 * (count(DISTINCT(p.user_id)))
) DESC LIMIT 20;

Which is your actual query.
That one took 32.89 seconds.

... and by the way all the topics generated are really crappy :) But that is my fault, I have an offtopic forum and threads like "What are you eating now", "Which videogame are you playing now" and "Say the first word that come to mind" are eerily active, but not really that interesting. In my particular case I would probably need to add a forum exclusion list in there :)
 
I see your point with the constant re-calculation. Plus then the top content wouldn't have to be cached, it could be served live every load (even though it still won't change often at all).

There would definitely have to be options to remove nodes from the global page. I imagine each forum would also have its own page, though, and you could see those threads there (for example, http://xenmods.com/forums/announcements.2/Top-Threads would display the same info, just with "where node_id = 2").
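
Both ideas, the exclusion list for the global page and the per-node page, come down to an extra WHERE clause. A minimal sketch with Python's built-in `sqlite3`, assuming a hypothetical precalculated `score` column on the thread table and made-up rows:

```python
import sqlite3


def top_threads(conn, excluded_nodes=(), node_id=None, limit=20):
    """Rank threads by a stored score, skipping excluded nodes or narrowing to one node."""
    sql = "SELECT thread_id, title FROM xf_thread"
    clauses, params = [], []
    if node_id is not None:
        clauses.append("node_id = ?")          # per-node page
        params.append(node_id)
    if excluded_nodes:
        placeholders = ", ".join("?" for _ in excluded_nodes)
        clauses.append(f"node_id NOT IN ({placeholders})")  # global exclusion list
        params.extend(excluded_nodes)
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY score DESC LIMIT ?"
    params.append(limit)
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE xf_thread (thread_id INTEGER PRIMARY KEY, node_id INTEGER, "
    "title TEXT, score REAL)"  # `score` is a hypothetical precalculated column
)
conn.executemany("INSERT INTO xf_thread VALUES (?, ?, ?, ?)", [
    (1, 2, "Release notes", 40.0),
    (2, 7, "What are you eating now", 90.0),   # node 7 plays the off-topic forum
    (3, 2, "Roadmap", 25.0),
])

print(top_threads(conn, excluded_nodes=[7]))  # global page without the off-topic node
print(top_threads(conn, node_id=2))           # page for a single node
```

Only values go through `?` placeholders; the clause text itself is built from fixed strings, so nothing user-supplied is spliced into the SQL.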

I think you are right that recalculating on each post/like is a good idea.
 