[FIXED] Why no entry with more than 50 votes has more than 4 stars
Posted: Wed Oct 24, 2007 6:27 pm
by skOre
In relation to my earlier post on this matter, it seems that not only does it make sense to open a new entry from time to time, it's also technically impossible to get a high rating with lots of votes.
Let me explain that:
When a new vote comes in, Mosets Tree uses this formula to determine the new rating:
Code: Select all
$new_rating = ((($link->link_rating * $link->link_votes) + $rating) / ++$link->link_votes);
The old value is taken, multiplied by the number of votes and added to the new rating. This is then divided by the new total number of votes.
Say in the case of the AEC, we currently have 79 votes and about 4 stars. When somebody votes another 5 stars, this happens:
Code: Select all
$new_rating = (((4*79) + 5) / 80); // = 4.0125
Now this is of course all well and good, except for one problem - in the database table, the link_rating field is declared like so:
Code: Select all
`link_rating` decimal(3,2) NOT NULL default '0.00'
So instead of the 4.0125 points, the component receives 4.01. To make matters even worse, if you do crawl your way up the ladder, there is a point where you simply can't get any further. Say the component had a rating of 4.3 before, then the math is:
Code: Select all
$new_rating = (((4.3*79) + 5) / 80); // = 4.30875
Cut off after two decimal places, that is stored as 4.30 again, so the 5-star vote changes nothing. The final funny bit is that the more votes you have, the larger this rounding error gets.
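A minimal sketch of the single update step described above, assuming the column simply cuts off after two decimal places (the rounding variant is shown for comparison). The function name `storeAfterVote` is hypothetical and not part of Mosets Tree; ratings are held as integer hundredths so the sketch adds no floating-point error of its own.

```php
<?php
// Hypothetical helper: one rating update written back with two decimal places.
// Ratings are integer hundredths (430 means 4.30); votes are whole stars.
function storeAfterVote(int $ratingHundredths, int $votes, int $newVote, bool $round): int
{
    // Same formula as in the post: (old rating * old votes + new vote) / new vote count.
    $numerator = $ratingHundredths * $votes + $newVote * 100;
    $denominator = $votes + 1;

    if ($round) {
        // Round half up to the nearest hundredth.
        return intdiv(2 * $numerator + $denominator, 2 * $denominator);
    }

    // Truncate: drop everything after the second decimal place.
    return intdiv($numerator, $denominator);
}

// A 5-star vote on a listing with 79 votes and a stored rating of 4.30.
$truncated = storeAfterVote(430, 79, 5, false);
$rounded = storeAfterVote(430, 79, 5, true);

printf("truncated: %.2f\n", $truncated / 100); // prints "truncated: 4.30"
printf("rounded:   %.2f\n", $rounded / 100);   // prints "rounded:   4.31"
```

With truncation, a 5-star vote only moves the stored value when (5.00 - rating) / (votes + 1) is at least 0.01, which is why the gap grows with the number of votes.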
And this is why there is no extension with more than 50 votes and 4 stars.
I call for a bugfix and recount.
Re: Why no extension with more than 50 votes has more than 4 stars
Posted: Wed Oct 24, 2007 6:40 pm
by ntropic
I second the motion. I am a quite satisfied AEC Component user and have added my 5 star vote for AEC Component. Now, I feel like it wasn't really counted. skOre is right, this should be fixed and recounted.
Re: Why no extension with more than 50 votes has more than 4 stars
Posted: Wed Oct 24, 2007 7:39 pm
by skOre
In a discussion with another developer, he tried to make the point that maybe MySQL already does round the numbers (I'm not sure that it does), so that the effect would be smaller. So after I pointed him at the obvious math, let me state this here again:
Even if there were rounding, the value would still converge to 4.3 after about 120 votes.
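The plateau can be checked with a short sketch. It assumes the AEC example from the first post as the starting point (4.00 stars at 79 votes), nothing but 5-star votes afterwards, and round-half-up to two decimals on every update; `roundedUpdate` is a hypothetical helper name. The exact plateau depends on the starting value and the rounding mode, so the figures below belong to these assumptions only.

```php
<?php
// Ratings are integer hundredths (441 means 4.41); votes are whole stars.
function roundedUpdate(int $stored, int $votes, int $newVote): int
{
    $numerator = $stored * $votes + $newVote * 100;
    $denominator = $votes + 1;
    // Round half up to the nearest hundredth.
    return intdiv(2 * $numerator + $denominator, 2 * $denominator);
}

$stored = 400;  // assumed start: 4.00 stars
$votes = 79;    // assumed start: 79 votes
$lastRise = 0;  // number of the last vote that still raised the stored value

while ($votes < 1000) {
    $next = roundedUpdate($stored, $votes, 5);
    $votes++;
    if ($next > $stored) {
        $lastRise = $votes;
    }
    $stored = $next;
}

printf("plateau %.2f, last rise at vote %d\n", $stored / 100, $lastRise);
// prints "plateau 4.41, last rise at vote 120"
```

Under these assumptions the stored rating stops moving after the 120th vote, however many further 5-star votes arrive.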
I don't think we should turn this into some sort of math heckling. The Extensions Directory is far too important for Users and Businesses for us to allow such bugs to influence ratings.
Re: Why no extension with more than 50 votes has more than 4 stars
Posted: Fri Oct 26, 2007 8:23 pm
by ircmaxell
I think they are doing it backwards... The way they are doing it, rounding errors will be exacerbated... Try something like this.
You have 2 fields in the DB, rating and count.
When you get a new vote (call it vote), increment count and then:
new_rating = (rating + (vote - rating)/count)
Rounding errors will be reduced significantly...
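A small sketch of the incremental form next to the original one, assuming `count` in the formula means the number of votes including the new one. Both function names are made up for illustration. The two forms are algebraically the same mean, so any difference between them comes only from floating-point arithmetic, not from what gets written to a two-decimal column afterwards.

```php
<?php
// Incremental form: nudge the old mean towards the new vote.
function incrementalMean(float $rating, int $countBefore, int $vote): float
{
    $count = $countBefore + 1; // count including the new vote
    return $rating + ($vote - $rating) / $count;
}

// Form used in the first post: rebuild the sum, then divide by the new count.
function weightedMean(float $rating, int $countBefore, int $vote): float
{
    return ($rating * $countBefore + $vote) / ($countBefore + 1);
}

$incremental = incrementalMean(4.0, 79, 5);
$weighted = weightedMean(4.0, 79, 5);

printf("incremental: %.4f\n", $incremental); // prints "incremental: 4.0125"
printf("weighted:    %.4f\n", $weighted);    // prints "weighted:    4.0125"
```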
Re: Why no extension with more than 50 votes has more than 4 stars
Posted: Fri Oct 26, 2007 8:31 pm
by skOre
Well, apart from the fact that rounding errors should not exist at all since they always accumulate to a larger problem...
What you describe is just what I wrote about before. The problem is not the part you write about though, but the fact that this new rating is truncated by the database table, which cuts the number after the first two decimal places. So no matter how good the math is, if you have the wrong number to begin with and then save it truncated, you have a problem.
Re: Why no extension with more than 50 votes has more than 4 stars
Posted: Fri Oct 26, 2007 8:38 pm
by ircmaxell
skOre wrote:Well, apart from the fact that rounding errors should not exist at all since they always accumulate to a larger problem...
What you describe is just what I wrote about before. The problem is not the part you write about though, but the fact that this new rating is truncated by the database table, which cuts the number after the first two decimal places. So no matter how good the math is, if you have the wrong number to begin with and then save it truncated, you have a problem.
Well, what would you propose? The ideal solution would be to store two fields, votes (the sum of all votes) and count. To display, just do votes/count. To add a new vote, set votes = votes + vote and count = count + 1.
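A minimal sketch of that idea, using a plain array as a stand-in for a row with two integer columns. The keys `sum` and `count` and both function names are assumptions for illustration, not the Mosets Tree schema. Because only integers are stored, nothing is rounded until the rating is displayed.

```php
<?php
// Add one vote to the running totals; both fields stay exact integers.
function addVote(array $tally, int $vote): array
{
    return [
        'sum' => $tally['sum'] + $vote,
        'count' => $tally['count'] + 1,
    ];
}

// Compute the rating only when it is needed for display.
function displayRating(array $tally): float
{
    return $tally['count'] === 0 ? 0.0 : $tally['sum'] / $tally['count'];
}

// Twelve mixed votes followed by 88 five-star votes.
$votes = array_merge([4, 2, 5, 4, 4, 3, 5, 1, 5, 3, 5, 2], array_fill(0, 88, 5));

$tally = ['sum' => 0, 'count' => 0];
foreach ($votes as $vote) {
    $tally = addVote($tally, $vote);
}

printf("%d votes, rating %.2f\n", $tally['count'], displayRating($tally));
// prints "100 votes, rating 4.83"
```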
Re: Why no extension with more than 50 votes has more than 4 stars
Posted: Fri Oct 26, 2007 8:51 pm
by skOre
The ideal solution in computation is already there, the problem is just the truncation. If they would switch the field to either having more decimal places or making it a string, that would already do the job just fine. Oh, and a recount of committed votes would make sense as well of course.
Re: Why no extension with more than 50 votes has more than 4 stars
Posted: Sat Oct 27, 2007 2:04 pm
by skOre
Maybe you need a graph for illustration purpose?
Say we have a component with a rather mixed start: 4, 2, 5, 4, 4, 3, 5, 1, 5, 3, 5, 2 votings in that order.
But after that, it gets only straight 5's all the time. This is how the development would look until vote number 100:
The real value is set in thick orange and concludes at 4.83 - still steadily rising.
The truncated value (green) without rounding concludes at 4.54 - without having risen since the 47th vote.
The truncated value (blue) with rounding concludes at 4.73 - without having risen since the 56th vote.
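The three figures can be reproduced with a short simulation. This is a sketch under stated assumptions: the rating is written back with two decimal places after every vote, "with rounding" means round half up, and ratings are kept as integer hundredths so the simulation adds no error of its own. The function name `simulate` is hypothetical.

```php
<?php
// Replay a list of votes, storing the rating with two decimals after each one.
// Returns the final stored rating (in hundredths) and the number of the last
// vote that still raised it.
function simulate(array $votes, bool $round): array
{
    $stored = 0;
    $lastRise = 0;
    foreach ($votes as $index => $vote) {
        // $index is the number of votes counted so far.
        $numerator = $stored * $index + $vote * 100;
        $denominator = $index + 1;
        $next = $round
            ? intdiv(2 * $numerator + $denominator, 2 * $denominator) // round half up
            : intdiv($numerator, $denominator);                       // truncate
        if ($next > $stored) {
            $lastRise = $denominator;
        }
        $stored = $next;
    }
    return ['stored' => $stored, 'lastRise' => $lastRise];
}

// The mixed start from the post, then straight 5's up to vote number 100.
$votes = array_merge([4, 2, 5, 4, 4, 3, 5, 1, 5, 3, 5, 2], array_fill(0, 88, 5));

$exact = array_sum($votes) / count($votes);
$truncated = simulate($votes, false);
$rounded = simulate($votes, true);

printf("exact:     %.2f\n", $exact); // prints "exact:     4.83"
printf("truncated: %.2f (last rise at vote %d)\n", $truncated['stored'] / 100, $truncated['lastRise']);
// prints "truncated: 4.54 (last rise at vote 47)"
printf("rounded:   %.2f (last rise at vote %d)\n", $rounded['stored'] / 100, $rounded['lastRise']);
// prints "rounded:   4.73 (last rise at vote 56)"
```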
One thing that caught my attention was that if you place any ONE vote other than a strict 5 anywhere in the sequence, the truncated AND rounded curves do not recover the way the real value curve does:
(the lower bar graph showing the actual ratings)
Adding further bad votes (6 in total, each between 1 and 4 stars) into the curve shows what permanent damage this can do to a rating:
(the lower bar graph again showing the actual ratings)
Consider particularly that I still think the green curve is how the current system behaves, and that the rounded curve is nowhere near good enough either. And consider that this is only up to a hundred votes - the more votes a component receives, the less well it recovers from bad votes. Still wonder why even components that have been wildly popular for a long time and have more than a hundred votes still do so badly?
Do I need to write you guys a song about that there is an error here?
Re: Extensions Dir Bug: Why no entry with more than 50 votes has more than 4 stars
Posted: Sat Oct 27, 2007 4:06 pm
by Beat
Interesting find !
The easy and obvious fix would be to extend the number of digits in the database, e.g. to 9 digits after the decimal point.
Here is the SQL command that an admin could type into the database (provided the display follows fine) to at least stop the wrong roundings:
Code: Select all
ALTER TABLE `jos_mt_links` CHANGE `link_rating` `link_rating` DECIMAL( 10, 9 ) NOT NULL DEFAULT '0.000000000';
Now regarding a recount, maybe the jos_mt_log table could help, but I'm not sure if it holds all the information needed.
Re: Extensions Dir Bug: Why no entry with more than 50 votes has more than 4 sta
Posted: Sat Oct 27, 2007 4:21 pm
by skOre
Well, 9 decimal places could help indeed, although you might want to get to using float in the first place.
As for the recount: Since Mosets can make the relation between reviews and votings, I assumed that there is another table storing these values. Maybe I have to dig a little...
Re: Extensions Dir Bug: Why no entry with more than 50 votes has more than 4 stars
Posted: Sat Oct 27, 2007 4:35 pm
by LorenzoG
Hi guys,
I just want to confirm that we have noticed this thread. We are at the present investigating this.
Thanks!
Edit: elucidation
Re: Extensions Dir Bug: Why no entry with more than 50 votes has more than 4 sta
Posted: Sat Oct 27, 2007 4:52 pm
by skOre
Ah, thanks! The pictures did the trick, hm?
Re: Extensions Dir Bug: Why no entry with more than 50 votes has more than 4 stars
Posted: Tue Oct 30, 2007 1:58 pm
by dknight
You raised a valid issue here, skore. The DEC(3,2) column used for storing average ratings is prone to rounding off error, especially when dealing with large number of ratings (>50 votes).
A few observations:
1. MySQL does indeed round a value when you update a DEC(3,2) column with a number with more than 2 decimal places. So the green line in your graph is not relevant to the issue discussed here.
2. The ratings are represented by multiples of half-stars. Mathematically, this means that we are rounding the rating value to one decimal place in order to decide how many stars a listing gets. Therefore it is ok to allow ourselves some margin for error as long as the final rounding off is correctly represented by the stars and the value stored in the database is precise enough to use for comparison with other listings. There is no need to use a float column in such a case. Based on a few simulations (~2500 votes) I did, a DEC(7,6) column is more than enough to minimize the rounding errors. JED has been updated with the new column definition.
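A sketch of that kind of simulation, not the one actually run for JED: the vote pattern, the round-half-up rule and the function name `storedMean` are all assumptions. It replays 2500 votes and writes the rating back after every vote with either 2 or 6 decimal places, then compares the result with the exact mean. It relies on 64-bit integers.

```php
<?php
// Replay votes, rounding the stored rating to $scale decimal places after each one.
function storedMean(array $votes, int $scale): float
{
    $unit = 10 ** $scale; // 100 for two decimals, 1000000 for six
    $stored = 0;          // rating as an integer multiple of 1/$unit
    foreach ($votes as $index => $vote) {
        // $index is the number of votes counted so far.
        $numerator = $stored * $index + $vote * $unit;
        $denominator = $index + 1;
        // Round half up to the nearest 1/$unit.
        $stored = intdiv(2 * $numerator + $denominator, 2 * $denominator);
    }
    return $stored / $unit;
}

// Assumed vote pattern: ten votes averaging 4.3 stars, repeated 250 times.
$pattern = [5, 4, 5, 3, 5, 5, 2, 5, 4, 5];
$votes = [];
for ($i = 0; $i < 250; $i++) {
    $votes = array_merge($votes, $pattern);
}

$exact = array_sum($votes) / count($votes);

printf("exact mean:   %.6f\n", $exact);
printf("2 decimals:   %.6f\n", storedMean($votes, 2));
printf("6 decimals:   %.6f\n", storedMean($votes, 6));
```

Each write at six decimals is off by at most 0.0000005, and after N votes the accumulated drift is bounded by roughly N/2 times that, so about 0.0006 at 2500 votes.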
A recount is out of the question because not all individual ratings have been recorded from the start. In other words, we do not have the complete voting history to do a proper recount. It wouldn't correct the current situation if only the recent votes are taken into account.
skOre wrote:And this is why there is no extension with more than 50 votes and 4 stars.
Incorrect. As of this writing, there are 27 extensions that have more than 50 votes with 4 stars and above:
http://extensions.joomla.org/component/ ... Itemid,35/
Thanks for bringing this to my attention.
Re: Extensions Dir Bug: Why no entry with more than 50 votes has more than 4 sta
Posted: Tue Oct 30, 2007 2:20 pm
by skOre
Yes, that's what I meant, none of them has more than 4 stars, which is perfectly represented in your search as in my previous research.
However, I find it quite a bit outrageous that you would not consider a recount with the values we do have. Of course I don't know how much of the history is lost, but please do tell me the exact number so that I can follow whether it is a just decision.
I think I have already discussed with another mod (follow the first link in my initial post) that old votes and reviews lose meaning anyhow, and thus I cannot see why losing old votes in favor of correcting a mistake is such a problem.
Re: Extensions Dir Bug: Why no entry with more than 50 votes has more than 4 stars
Posted: Tue Oct 30, 2007 2:36 pm
by Beat
I'm also fine with a recount of the average, missing old votes. Doesn't mean that it should change the number of votes counter, just to correct the erroneous averages, a more or less simple update sql query should do the job...
By the way, wouldn't it make sense to order the reviews the other way around? (having the most recent reviews first)
Indeed, old reviews are for old releases (so are votes too, so old votes should become irrelevant).
Re: Extensions Dir Bug: Why no entry with more than 50 votes has more than 4 stars
Posted: Wed Oct 31, 2007 3:30 am
by dknight
skOre wrote:Yes, thats what I meant, none of them has more than 4 stars, which is perfectly represented in your search as in my previous research.
Indeed. What I should have written is that there are extensions with a rating of more than 4. It's just that it is represented by 4 stars.
My basis for the non-recount was the way the calculations are to be done. When old votes of which we have no records are recalculated, we tend to amplify the rounding errors. Depending on the number of such votes, it will have an effect on the rating for most extensions - for better or worse.
I will bring this up for discussion within JED team.
Re: Extensions Dir Bug: Why no entry with more than 50 votes has more than 4 sta
Posted: Wed Oct 31, 2007 9:21 am
by skOre
dknight wrote:Indeed. What I should write is that there are extensions with rating more than 4. It's just that it is represented by 4 stars.
Which is what I presented in my estimations as well - so yes, and in danger of repeating myself: The matter is not the amount they do have, but that it won't change.
dknight wrote:My basis for the non-recount was based on the way the calculations are to be done. When old votes in which we do not have records of are recalculated, we tend amplify the rounding errors. Depending on the number of such votes, it will have effect on the rating for most extensions - for better or worse.
So I guess this means that there is no number that you can give out? Or have you just not looked at how many votes would actually be lost? I think it's crucial to present the real numbers we are talking about. If we are only losing a couple of months or a dozen votes, surely that is negligible.
dknight wrote:I will bring this up for discussion within JED team.
Thank you very much
Re: Extensions Dir Bug: Why no entry with more than 50 votes has more than 4 stars
Posted: Wed Oct 31, 2007 5:53 pm
by Beat
dknight wrote:
skOre wrote:Yes, thats what I meant, none of them has more than 4 stars, which is perfectly represented in your search as in my previous research.
Indeed. What I should write is that there are extensions with rating more than 4. It's just that it is represented by 4 stars.
My basis for the non-recount was based on the way the calculations are to be done. When old votes in which we do not have records of are recalculated, we tend amplify the rounding errors. Depending on the number of such votes, it will have effect on the rating for most extensions - for better or worse.
I will bring this up for discussion within JED team.
As far as I know, the records date from the time when the anti-cheating tools were developed, and that's probably at least a year back. I think that if the timeframe is large enough, then given 1) the rounding errors, which are significant, 2) the possibility that some ratings wouldn't have passed the anti-cheating measures, and 3) the fact that extensions have evolved since then, a recount that puts away old unverified votes is in order, either for all components or at least for those with more than the number of votes where rounding errors become significant (see graph). If accuracy matters, that's what should really be done. Errors and bugs happen; correcting results where possible is part of fixing the bug, imho.
And also reverting the sorting of reviews would make sense too
Re: Extensions Dir Bug: Why no entry with more than 50 votes has more than 4 sta
Posted: Fri Nov 02, 2007 5:26 pm
by Tonie
Re: Extensions Dir Bug: Why no entry with more than 50 votes has more than 4 stars
Posted: Fri Nov 02, 2007 5:29 pm
by Beat
Thanks for all your hard work, Tonie and JED team, making JED so unique
Re: Extensions Dir Bug: Why no entry with more than 50 votes has more than 4 sta
Posted: Fri Nov 02, 2007 6:49 pm
by skOre
Indeed! Thanks a lot for listening!
Re: Extensions Dir Bug: Why no entry with more than 50 votes has more than 4 sta
Posted: Mon Nov 19, 2007 3:43 pm
by skOre
Wow, that was quite an effect.
Major extensions like CB, JoomlaXplorer or JCE did a jump between 0.5 and 1.5 stars! What a difference such little things can make.
And thanks again for the insight and willingness to recount! I feel that this was an important step in keeping the extensions directory in touch with the joomlasphere.