
There's a Reddit thread between various users and ketralnis (one of the more senior coders Reddit ever had) explaining the code here:

http://www.reddit.com/r/programming/comments/td4tz/reddits_a...

And a quotation for those not wanting to click:

  *ProfDrMorph* 2 points 1 year ago
  So that means all posts in all subreddits (when browsing
  'hot') are sorted this way:
  1. all posts with more upvotes than downvotes with the
  order determined by age (newer posts are preferred) and
  popularity
  2. all posts with the same number of up- and downvotes in
  whatever order the database returns them
  3. all posts with less upvotes than downvotes with the
  order determined by age (older posts are preferred) and
  popularity (posts with a lot more downvotes are preferred)
  Because that's what the _hot() function implies if the
  sorting algorithm uses it as a 'key'.

  *ketralnis* 2 points 1 year ago
  Yes that's accurate
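
To make the three cases in that quotation concrete, here is a minimal sketch of a sort key in which the sign of the score multiplies the age term, as the thread describes. The thread does not reproduce the actual code, so the function name `hot`, the `EPOCH` and `DECAY` constants, and the sample posts are all assumptions for illustration.

```python
from math import log10

EPOCH = 1134028003  # assumed reference timestamp, in seconds
DECAY = 45000       # assumed number of seconds worth one order of magnitude of score
DAY = 86400

def hot(ups, downs, created):
    """Sort key as described in the thread: sign(score) multiplies the age term."""
    score = ups - downs
    order = log10(max(abs(score), 1))
    sign = (score > 0) - (score < 0)  # 1, 0, or -1
    seconds = created - EPOCH
    return round(order + sign * seconds / DECAY, 7)

now = EPOCH + 200 * DAY  # arbitrary "current" time for the example
posts = {
    "new +5": hot(6, 1, now - 1 * DAY),
    "old +5": hot(6, 1, now - 7 * DAY),
    "tie":    hot(3, 3, now - 1 * DAY),
    "new -1": hot(0, 1, now - 1 * DAY),
    "old -1": hot(0, 1, now - 7 * DAY),
}

# Highest key first, as a 'hot' listing would show it.
ranking = sorted(posts, key=posts.get, reverse=True)
print(ranking)
```

Under these assumptions the positive posts come first (newer preferred), the tied post sits at exactly 0, and the negative posts come last with the older one ahead of the newer one.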


I'm glad you found that thread and got to the top with it. I hate trying to dive into ongoing conversations and change people's minds when the hivemind has already made its decision. This comes up every 6 months or so, always with some sensational title like this.

It feels a little weird quoting myself, but I also said:

> The thing is, the two most important pages are the front page (or a subreddit's own hot page) and the new page. The new page is sorted by date ignoring hotness, and if something has a negative score it's not going to show up on the front/hot page anyway. The two other main opportunities to get popular (rising and the organic box) don't really use hotness either.

> So when it comes down to it, what happens below 0 is pretty moot. Smoothness around the real life dates and scores on the site is more important than smoothness around 0, where we don't really have listings that will display it anyway.

In summary, there don't exist listings in which the discontinuities at 0 really matter.


That's begging the question. Posts with negative scores are completely banned from the front/hot page because of this bug/feature/discontinuity. You can't justify it with itself.

What you can say is that you want posts to disappear from the hot page as soon as they go to -1, in which case I'll say that it's more than a little weird for the first voter to hold so much power.


Maybe some civil disobedience will get their attention. What about a script that automatically downvotes every new post?


You would just blend in with all the spammers that try to do this already.


That's a good way to get banned.


A lot of accounts + a lot of different IPs would get around that.


No, it wouldn't. Do you really think you're the first person to try and game the system like that? You know who else tries that? Spammers who think that down voting every submission but their own actually works.


I am not, and I have no plans to do that.

But for the particular problem, it would be a solution, provided Reddit did not have any mechanics in place to prevent that exact thing. And it would be stupid to assume you do not.

I have no interest in doing anything like it, I browse Reddit a lot.

PS: Totally unrelated to this, but please look into the API returning a ton of HTTP 503/504 gateway timeouts. It's been happening to me across several servers in different regions of the world.


> Totally unrelated to this, but please look into the API returning a ton of HTTP 503/504 gateway timeouts. It's been happening to me across several servers in different regions of the world.

Ironically, that's probably the rate limiting blocking you. Are you hitting the API more often than once every 30 seconds?


I have a cron job that looks at comments in a thread about every 2-5 minutes. At most it should be making a call to find a subreddit and then a call to find the comments.
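
If the 30-second limit mentioned above applies per request, two back-to-back calls (subreddit lookup, then comments) would trip it even when the cron job itself runs only every few minutes. A minimal client-side throttle might look like the sketch below. The `Throttle` class and its parameters are hypothetical, and the fake clock is only there so the example runs instantly; none of this is part of any Reddit client library.

```python
import time

class Throttle:
    """Enforce a minimum interval between successive calls (hypothetical helper)."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time at which the previous call was allowed to proceed

    def wait(self):
        """Block until min_interval has elapsed since the last call; return the delay."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
            if delay:
                self._sleep(delay)
        self._last = now + delay
        return delay

# Demonstration with a fake clock so nothing actually sleeps.
fake_now = [0.0]
slept = []

def fake_sleep(seconds):
    slept.append(seconds)
    fake_now[0] += seconds

throttle = Throttle(30.0, clock=lambda: fake_now[0], sleep=fake_sleep)
first = throttle.wait()   # before the subreddit lookup: no delay
fake_now[0] += 2.0        # the lookup takes two seconds
second = throttle.wait()  # before the comments fetch: waits out the rest of the interval
print(first, second)
```

In a real script the defaults (`time.monotonic` and `time.sleep`) would be used, with `throttle.wait()` called before each request.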


I think the latter is exactly what they're saying. It's not weird. What question is being begged?



> [The bug is okay because] if something has a negative score it's not going to show up on the front/hot page anyway.

I...what? This is just wrong. If it were not for the bug, then many posts with a negative score would show up on hot pages. Due to the bug, many posts which would otherwise show up do not. The bug is changing how things are working, and it is doing so in a way which has clear impacts on hot pages.

> In summary, there don't exist listings in which the discontinuities at 0 really matter

To the extent this is true, it is true only because there is a bug in the code that hides posts with negative points from the hot pages. What you are saying is that it doesn't matter if posts with negative points are shown on hot, because posts with negative points are not shown on hot.

...I hesitate to even ask this, but, well: Do you actually understand the bug, and the impact it has on how posts are sorted? Because your attempts at explaining it make no sense. The reason negative posts don't show up on the front page is because of this bug. That's why the bug has some (slight) importance; it is not the reason why the bug has no impact. It does have an impact.


> If it were not for the bug, then many posts with a negative score would show up on hot pages.

I think you misunderstood. What I got from this, is that articles under 0 MUST NOT show up on the Hot page.

Now that could probably have been implemented as a filter, but I don't think it's a bug doing it in this weirder way.


Well, here's the thing. ketralnis said:

> The new page is sorted by date ignoring hotness, and if something has a negative score it's not going to show up on the front/hot page anyway.

The key word there is "anyway". We're discussing the code that makes them not show up on the hot pages, so the word "anyway" makes no sense. If your theory were true, I would expect him to say: "This code is important because if something has a negative score then we don't want it to show up on the hot page", but he doesn't; rather he treats negative articles not showing up as a law of nature. Paraphrasing, he says "This code is unimportant because it doesn't do anything, because the articles wouldn't have shown up anyway." But it does do something, and they would have shown up.

My theory is that the confusion comes from him saying "front/hot" page. Everything he said is true about the front page, and the front and hot pages use the same algorithm. But the same algorithm applied to different data yields very different results, and everything he said is blatantly false when discussing the hot page of small, low-traffic subreddits.

In short, I think he's trying to say "hey, negative articles won't show up on the default front page no matter what, so what are you talking about?". And the response is "yes, but it has a huge impact everywhere". (Notably: The hot page of small subreddits, as well as some customized front pages and multireddits.) You can't conflate front/hot the way he does, because the bug is ONLY shown when _hot() is called on a low-traffic source.


I think he either expressed himself badly or there is an additional filter that prevents <0 posts from showing up on any hot-sorted page.


> What I got from this, is that articles under 0 MUST NOT show up on the Hot page.

And the counterexample is given above: a not-very-active subreddit, one new post in the last day, with a score of -1 after one vote. Are you absolutely sure that this post MUST NOT show up on the Hot page?


I'm not saying it is the best way, I'm just arguing that it indeed is not a software bug.


You might argue that it is not a software bug but a mistake or odd choice in the design thought process. The way that it seems to fall accidentally and obscurely from the implementation details argues strongly against this.


> In summary, there don't exist listings in which the discontinuities at 0 really matter

This is not true.

I am active in a small (local area) subreddit, and there sometimes a post totally disappears from the "hot" listing. Not only from page 1, but it cannot be found on pages 2 or 3 either.

When there is a recent post with 4 upvotes at the top of the hot list, a recent post with -1 votes still would deserve to rank higher than a week old post with a small positive vote score, don't you think?
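
A rough sketch of why such a post can vanish from pages 1 through 3, assuming the sort key described in this thread (sign of the score multiplied into the age term). The constants, the 25-posts-per-page figure, and the made-up subreddit history are all assumptions, not data from the site.

```python
from math import log10

EPOCH = 1134028003  # assumed reference timestamp, in seconds
DECAY = 45000       # assumed time scale, in seconds
DAY = 86400
PAGE_SIZE = 25      # assumed number of posts per listing page

def hot(score, created):
    """Sort key as described in the thread: negative scores flip the age term."""
    order = log10(max(abs(score), 1))
    sign = (score > 0) - (score < 0)
    return round(order + sign * (created - EPOCH) / DECAY, 7)

now = EPOCH + 1000 * DAY

# Hypothetical small subreddit: one post per day for the last 100 days.
# Every post sits at +2, except yesterday's post, which is at -1.
posts = [(f"day-{age}", 2, now - age * DAY) for age in range(2, 101)]
posts.append(("yesterday", -1, now - 1 * DAY))

listing = sorted(posts, key=lambda p: hot(p[1], p[2]), reverse=True)
names = [name for name, _, _ in listing]

position = names.index("yesterday") + 1    # 1-based rank in the hot listing
page = (position - 1) // PAGE_SIZE + 1     # page on which the post appears
print(position, page)
```

With this key, any post with a negative score sorts below every post with a positive score regardless of age, so yesterday's post lands behind the entire history of the subreddit.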

I was really wondering whether the mods remove posts that quickly get 2 downvotes. But this bug explains my observations.


But there is! It's called "Not-that-active subreddits".

A post at -1 from yesterday, in a subreddit with only 1 post a day, is completely deserving of being on the front page of that sub (top 30 posts). If all negative ones are banished from the top... that's very bad.


From what I've read, I'm in favor of merging the PR.

That said, maybe they aren't particularly concerned about subreddits with minimal activity, because by definition not many people use them.


By definition, not many people use any particular minimal-activity subreddit. There are a lot of subreddits. It's certainly possible many people use at least one minimal-activity subreddit.


The problem occurs on low-traffic subs. The front-page of a low-traffic sub will often have the 1s listed right on the front - new stuff ready for upvoting.

All it takes is one downvote to kick those things off the front. These low-traffic subs don't have enough users to have their own Knights of New constantly patrolling the /new view, so the Hot page is pretty much it.

Basically, it becomes a job of a moderator to constantly check /new to rescue anything that suffered a downvote infanticide.


> This comes up every 6 months or so

And it has never occurred to the reddit developers to document the behaviour in the code base? Regardless of whether there’s a bug in the implementation, that is one poorly-written piece of code.


The best justification anyone can come up with for not fixing the bug is an excuse that it probably isn't that important? Huh? Even if it only affects 1% of users, why not fix it? It's only 2 characters, and there have apparently been pull requests for ready-made fixes already. Why not just fix it, make Reddit a tiny bit better, and not have people complaining about it every 6 months or whatever.


This makes me wonder, did you A/B test your ranking algorithm?


How does it make any sense to multiply the date by the sign of the score? It's true that it only matters when the score is negative, and so it doesn't really affect much in practice, but the code is clearly a typo.


Looking at that code I find it really hard to believe that is the intended formula. It's clearly an attempt at logarithmic magnitude in either direction with a time bonus added in. The fact that the typo usually doesn't affect things much is almost guaranteed - if it made results obviously bananas it would have been found sooner.
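
One reading of "logarithmic magnitude in either direction with a time bonus added in" is sketched below: the sign multiplies the log term and the time bonus is always added. This is an interpretation of the comment above, not a confirmed fix; the function name `hot_signed_magnitude` and both constants are assumptions.

```python
from math import log10

EPOCH = 1134028003  # assumed reference timestamp, in seconds
DECAY = 45000       # assumed number of seconds worth one order of magnitude of score

def hot_signed_magnitude(score, created):
    """Signed log of the score plus a time bonus that does not depend on the sign."""
    order = log10(max(abs(score), 1))
    sign = (score > 0) - (score < 0)
    return round(sign * order + (created - EPOCH) / DECAY, 7)

created = EPOCH + 10 * DECAY  # chosen so the time bonus is exactly 10.0
values = {s: hot_signed_magnitude(s, created) for s in (100, 10, 1, 0, -1, -10, -100)}
for score, value in values.items():
    print(score, value)
```

In this variant the key changes smoothly through a score of 0: a post at -10 sits one unit (one `DECAY` interval of age) below an otherwise identical post at 0, instead of dropping below every positive post.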


I doubt ketralnis is missing the issue. Instead, let's assume reddit's devs know what's what and think of a why.

My guess is they want to fast bury spam. Spam in old posts will have been deleted and thus old posts are more trustworthy. Voting brigades are a smaller risk than the constant flood of spam.


I believe they have other, closed source, anti-spam measures. This might be the reason, but if it is 'tis a silly reason. If I understand correctly, it means that if 75% of people like a post, and 25% dislike it, that means it has a 25% chance of not being successful because the first vote is negative. Overall, this measure is effectively lowering the quality of all content on the site.


No, spam doesn't have anything to do with it. And actually, older (high-scoring, anyway) links are more likely to have spam comments because the moderators aren't hanging around there anymore.


Maybe you misread; the quote shows it is the intended behavior. They don't care about the number of votes, only whether there are more up, equal, or more down. After that they care about age more. The article wants them to base it more on number of votes. You can tell from the parts about returning in whatever order the DB wants that they are going for speed and scalability as well. So more down voted posts being a certain way may not even matter to them as long as they are off the page.


> Voting brigades are a smaller risk than the constant flood of spam.

Are they? This behaviour allows one vote to bury a post. Is the "downvote brigade" risk really so small that you can hand it an opportunity like this?


If he has some justification for the formula to be as it is, why hasn't he shared it with us?


According to reddit, the least hot posts are the ones people fight each other over, while the hottest posts are the ones everyone agrees on. What's wrong with that?


It adds to the hive mind effect you already have with online communities. This algorithm explains frustrations I've had in the past. In 4 years on the site, I've probably submitted 20 times, but the realization that I'd have to try submitting multiple times to get a fair shot at discussion makes it unpalatable for a user that only submits on occasion. Sure, any system can be gamed, but Reddit favors unity over diversity and karma whores over Regular Joe.



