Ranking System v1 #960

Closed. CrafterKolyan wants to merge 11 commits.

@CrafterKolyan commented Mar 22, 2021

The use of a normal distribution is not justified at all, and since its support is (-inf; inf) you get some problems (e.g. #883 #455). The exponential distribution's support is [0; inf), which means that if we calculate the survival function (the same as 1 - cdf (https://en.wikipedia.org/wiki/Survival_function#Definition)), a person whose stats are all zeros gets a score equal to 100, which makes a lot of sense. Also, in practice the exponential distribution models a person's activity quite accurately. See the examples below:

This is a real activity distribution histogram (taken from https://movespring.com/blog/how-to-set-a-goal-for-your-next-activity-or-step-challenge-5f74c65ac49982000764facf):
[image: real activity distribution histogram]
This is the exponential distribution histogram for different parameter values:
[image: exponential distribution histograms with different parameters]

Here is one more example, with a real distribution (blue) and a fitted exponential distribution (red):
[image: real distribution (blue) vs. fitted exponential distribution (red)]
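The survival-function idea mentioned at the start of this comment can be sketched as follows. This is a minimal illustration, assuming one rate parameter `lambda` per metric; `exponentialSurvival`, `survivalScore` and the `100 *` scaling are hypothetical names and choices, not the code in this PR.

```javascript
// Survival function of an exponential distribution with rate `lambda`:
// S(x) = P(X > x) = 1 - CDF(x) = exp(-lambda * x), for x >= 0.
function exponentialSurvival(x, lambda) {
  if (x < 0) throw new RangeError("x must be non-negative");
  return Math.exp(-lambda * x);
}

// Hypothetical per-metric score on a 0..100 scale (lower is better).
function survivalScore(x, lambda) {
  return 100 * exponentialSurvival(x, lambda);
}

console.log(survivalScore(0, 0.01)); // 100: a zero stat always maps to exactly 100
console.log(survivalScore(100, 0.01)); // ~36.79, i.e. 100 * exp(-1)
```

Because the support starts at 0, a zero stat always lands exactly at the 100 end of the scale, with no probability mass on negative values as there is with a normal distribution.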

The next step is to estimate the parameters of the exponential distribution from the real distribution. In my opinion, the Method of Moments (https://en.wikipedia.org/wiki/Method_of_moments_(statistics)) is the easiest to understand: it follows from a single property we would like to have, namely that the expectation of our parameterized distribution equals the expectation of the real distribution (see the *_VALUE variables in the code). Since the expectation of an exponential distribution with parameter λ is 1 / λ, if we know the expectation of the real distribution (which equals the average over users), the estimated λ = 1 / expectation.
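A minimal sketch of that Method-of-Moments fit, assuming the per-user values for one metric are available as an array. `fitExponentialRate` is a hypothetical helper and the star counts are made up for illustration.

```javascript
// Method-of-moments estimate for an exponential distribution:
// E[X] = 1 / lambda, so matching it to the sample mean gives lambda = 1 / mean.
function fitExponentialRate(samples) {
  if (samples.length === 0) throw new RangeError("need at least one sample");
  const mean = samples.reduce((sum, x) => sum + x, 0) / samples.length;
  if (mean <= 0) throw new RangeError("mean must be positive");
  return 1 / mean;
}

// Hypothetical star counts for five users (illustrative data only).
const stars = [0, 2, 5, 13, 30];
console.log(fitExponentialRate(stars)); // 0.1, since the mean is 10
```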

Now we have 7 distributions over different aspects of a GitHub profile: Commits, Contributions, Issues, PRs, Stars, Followers, Repositories. We can measure how "good" a GitHub profile is in each aspect by evaluating the survival function of each of these 7 distributions at the profile's corresponding stat (the lower the value, the better).

To get a single number from these 7 numbers we could, for example, take their average, but that wouldn't be good: a person who is great in one aspect and bad in others (e.g. Linus Torvalds, with only 2 repositories, low stats in PRs and Issues, and a ton of stars and followers) would never get an S+ rank. So we need an aggregate function that rewards a low value in at least one aspect. One such function is min(...) over the 7 aspects, but it doesn't encourage you to develop any aspect except the one you are best in. So we will use the harmonic mean (https://en.wikipedia.org/wiki/Harmonic_mean), which fits our needs: it is dominated by the smallest value (when one value is much smaller than the others, the harmonic mean of 7 values is roughly 7 × min(...)), and it also has a non-zero gradient with respect to each variable.
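To illustrate why the harmonic mean behaves like a "soft" min(...), here is a small comparison on made-up survival values (hypothetical, not real profile data); `harmonicMean` is an illustrative helper, not code from the PR.

```javascript
// Harmonic mean: n / sum(1 / v_i), for per-metric survival values in (0, 1].
function harmonicMean(values) {
  const sumOfReciprocals = values.reduce((sum, v) => sum + 1 / v, 0);
  return values.length / sumOfReciprocals;
}

// Hypothetical survival values for the 7 metrics; one metric is far better.
const ranks = [0.9, 0.8, 0.7, 0.9, 0.001, 0.6, 0.8];
const arithmeticMean = ranks.reduce((sum, v) => sum + v, 0) / ranks.length;

console.log(Math.min(...ranks)); // 0.001
console.log(harmonicMean(ranks)); // ~0.00695, close to 7 * min
console.log(arithmeticMean); // ~0.672, barely affected by the best metric
```

Unlike min(...), every input still contributes to the sum of reciprocals, so improving any metric lowers the result a little, which is the non-zero-gradient property mentioned above.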

As you can see from the tests:

  1. A decent user (400 stars and 100 followers) gets an A++ rank.
  2. A newly created account gets the lowest possible score = 100.
  3. Linus Torvalds gets the highest possible score = 0 (this wasn't intended; I only noticed it when I ran a script).
    [image: test output]
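The two edge cases in the list can be reproduced with a small end-to-end sketch. It assumes the score is 100 × the harmonic mean of the 7 survival values and uses placeholder `LAMBDAS` (hypothetical, not the PR's constants). It also shows one possible, unconfirmed explanation for the unexpected score of 0: in double precision, `Math.exp` underflows to exactly 0 for large enough arguments, and a single zero term drives the harmonic mean to 0.

```javascript
// Placeholder rates for the 7 metrics; NOT the values used in this PR.
const LAMBDAS = [0.01, 0.01, 0.1, 0.1, 0.1, 0.1, 0.1];

// Hypothetical score: 100 * harmonic mean of exp(-lambda_i * x_i).
function score(stats) {
  const survivals = stats.map((x, i) => Math.exp(-LAMBDAS[i] * x));
  const sumOfReciprocals = survivals.reduce((sum, s) => sum + 1 / s, 0);
  return (100 * stats.length) / sumOfReciprocals;
}

// Brand-new account: every survival value is exp(0) = 1, so the score is 100.
console.log(score([0, 0, 0, 0, 0, 0, 0])); // 100

// One huge stat: exp(-0.1 * 200000) underflows to 0, 1 / 0 is Infinity,
// and the whole score collapses to exactly 0.
console.log(score([0, 0, 0, 0, 0, 200000, 0])); // 0
```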

IMPORTANT NOTICE
The current values are not set to 1 / <average stat over users>, because I couldn't find any official (or even unofficial) statistics on these. Instead, they are based on what I consider an average GitHub user.

vercel bot commented Mar 22, 2021

@CrafterKolyan is attempting to deploy a commit to the github readme stats Team on Vercel.

A member of the Team first needs to authorize it.

codecov bot commented Mar 22, 2021

Codecov Report

Merging #960 (6f370e3) into master (86b9ad6) will increase coverage by 0.28%.
The diff coverage is 100.00%.

❗ Current head 6f370e3 differs from pull request most recent head 034f47c. Consider uploading reports for the commit 034f47c to get more accurate results

@@            Coverage Diff             @@
##           master     #960      +/-   ##
==========================================
+ Coverage   93.98%   94.26%   +0.28%     
==========================================
  Files          22       22              
  Lines         682      663      -19     
  Branches      191      185       -6     
==========================================
- Hits          641      625      -16     
+ Misses         37       34       -3     
  Partials        4        4              
Impacted Files            Coverage Δ
src/calculateRank.js      96.42% <100.00%> (+4.93%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 86b9ad6...034f47c.

@CrafterKolyan changed the title "Change rank algorithm so it would be possible to get B+ rank (Fix #883)" to "Change rank algorithm so it would be possible to get B+ rank (Fix #883 Fix #455)" Mar 23, 2021
@CrafterKolyan mentioned this pull request Mar 30, 2021
stale bot added the stale label Apr 22, 2021
@tkrotoff commented Apr 22, 2021

I hate bots that close issues and now they also close PRs 😲

stale bot removed the stale label Apr 22, 2021
@atinba commented Apr 25, 2021

This seems much better than the current ranking system. Why hasn't this been merged yet?

@CrafterKolyan (Author)

I think the main problem is that @anuraghazra needs a lot of time to fully understand the solution and also he may want to do some extra testing on his side rather than rely on my research. (But maybe he simply missed this PR)

@anuraghazra (Owner) commented Jul 18, 2021

> I think the main problem is that @anuraghazra needs a lot of time to fully understand the solution and also he may want to do some extra testing on his side rather than rely on my research. (But maybe he simply missed this PR)

Oh hi! I just looked at it. Actually, I'm very cautious when it comes to changing these stats calculations, because people will go mad if their ranks suddenly change after a breaking change.
Also another reason is that I SUCK AT MATHS.

But this PR and your description looks very promising.
@francois-rozet in #1186 seems to be doing the same thing with a few differences. It would be great if you folks could help me review both and pick the better stats algorithm.

Repository owner deleted a comment from stale bot Jul 18, 2021
@anuraghazra added the enhancement and stats-card labels Jul 18, 2021
@francois-rozet (Collaborator) commented Jul 18, 2021

Hi @anuraghazra,

The principle is very similar in this PR and #1186. Each metric (repos, commits, stars, ...) is associated with its own rank. For instance, the "stars" rank is computed as

stars_rank = exp(-stars / STARS_MEAN)

which ranges from 0 (no one is better) to 1 (everyone is better).

The difference lies in how we aggregate the individual ranks.

In this PR, the author considers that if you are extremely good in one metric, your overall rank should be as well. This is done as

rank = 7 / ( 1 / stars_rank + 1 / commits_rank + 1 / followers_rank + ...)

So if followers_rank = 0.000001 (extremely good), the overall rank is dominated by followers_rank. I don't think this is a good idea, because if someone has literally 0 commits/repos/stars but a ton of followers, their rank would still be S+.
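A worked instance of the formula above, assuming the other six metrics are all zero (so each of their ranks is exp(0) = 1) and followers_rank = 0.000001:

```latex
\text{rank} = \frac{7}{\underbrace{1 + 1 + 1 + 1 + 1 + 1}_{\text{six zero metrics}} + \dfrac{1}{10^{-6}}}
            = \frac{7}{1\,000\,006} \approx 7.0 \times 10^{-6}
```

The result is about 7 × followers_rank, so the single strong metric alone puts this user near the top of the scale.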

For instance, this user (esin) has 2.5k followers. In this PR, he would get S+. In mine, he gets an A (almost A+).

In #1186, the overall rank is a weighted average of the individual ranks.

rank = (1 * stars_rank + 0.25 * commits_rank + 0.5 * followers_rank + ...) / (1 + 0.25 + 0.5 + ...)

This prevents the problem mentioned above, but it also means that unless you are perfect everywhere (all ranks = 0), your overall rank will not be perfect. The weights are there to mitigate this, for example by reducing the impact of commits_rank relative to stars_rank.
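For contrast, here is a minimal sketch of the weighted average applied to the same "followers only" user. The `weightedRank` helper and the equal weights are hypothetical placeholders, since the actual weights in #1186 are elided in the formula above.

```javascript
// Weighted arithmetic mean of per-metric ranks, in the spirit of #1186.
// The weights used below are placeholders, not the ones in #1186.
function weightedRank(ranks, weights) {
  if (ranks.length !== weights.length) throw new RangeError("length mismatch");
  let weightedSum = 0;
  let totalWeight = 0;
  for (let i = 0; i < ranks.length; i++) {
    weightedSum += weights[i] * ranks[i];
    totalWeight += weights[i];
  }
  return weightedSum / totalWeight;
}

// "Followers only" user: six metrics at rank 1, followers at 1e-6.
const userRanks = [1, 1, 1, 1, 1, 1, 1e-6];
const equalWeights = [1, 1, 1, 1, 1, 1, 1];
console.log(weightedRank(userRanks, equalWeights)); // ~0.857, far from a top rank
```

A single near-perfect metric can now lower the overall rank by at most its share of the total weight, which is why such a user does not reach S+.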

The reason Linus Torvalds is not S+ is that he doesn't have a lot of repos compared to the average user (only 4 instead of 10) and not a lot of PRs/issues. However, this is very easy to change: you can either reduce the "weight" of repos_rank or set REPOS_MEAN a bit lower (e.g. 5).
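Assuming repos_rank takes the same form as stars_rank above, i.e. exp(-repos / REPOS_MEAN), the effect of lowering REPOS_MEAN on Linus Torvalds' 4 repositories is:

```latex
\text{repos\_rank} = e^{-4/10} \approx 0.670 \quad (\text{REPOS\_MEAN} = 10)
\qquad\text{vs.}\qquad
\text{repos\_rank} = e^{-4/5} \approx 0.449 \quad (\text{REPOS\_MEAN} = 5)
```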

I've edited the weights so that Linus Torvalds is now S+, check #1186.

@CrafterKolyan (Author) commented Jul 20, 2021

Hi @anuraghazra.

I understand your fears about changing the algorithm. Of course, almost nobody likes to find out they are not that good compared to others, and almost nobody would show such a grade of their work on their profile. To be honest, I'm not sure whether having a flaw in the ranking algorithm is good or bad. It ranks people higher and gives them extra motivation and self-confidence, even though the algorithm may be lying to them.

From my point of view, now that your application has become quite popular, people don't care much about the exact grading algorithm; they want to feel significant to the community, and that is what they get here. I feel that your "encouraging" algorithm can do more for the open-source community than my "strict mathematical" approach. It is not about math and programming but about psychology.

Anyway, I will leave my pull request open in case you want to change the calculation algorithm to something better, and also as a reference for anyone curious about how to approach this kind of ranking task.

@dreamyguy

Just wanted to applaud @francois-rozet and @CrafterKolyan for their great take on this. 👏👏👏👏👏

I personally like the idea of starting from 0 and getting to the Moon, it gives me a much greater sense of accomplishment. 🚀 🌔

But I don't judge those who get a boost in confidence by starting with half a circle and an A+. Some days I feel like I need these...

Really happy with it as it is @anuraghazra, you've done amazing work!

@Dosx001 mentioned this pull request Aug 30, 2021
vercel bot commented Sep 7, 2021

This pull request is being automatically deployed with Vercel.

🔍 Inspect: https://vercel.com/github-readme-stats-team/github-readme-stats/92MYUQNT1iBguNAJw3JbXddgb4kz
✅ Preview: https://github-readme-stats-git-fork-cr-1839b0-github-readme-stats-team.vercel.app

@anuraghazra (Owner)

Okay, I was just testing this out. I'm planning to sort out this ranking thing this week.

I will consider both PRs and release the change under an experimental flag, ?enable_experiments=new_ranking_system, to get feedback from the community and roll it out gradually.
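One way such a flag could gate the new ranking, as a minimal sketch: `isExperimentEnabled` is a hypothetical helper, and it assumes the flag arrives in the card's query string and that several experiments might be listed comma-separated. Neither assumption comes from the project.

```javascript
// Hypothetical feature-flag check; not code from github-readme-stats.
function isExperimentEnabled(queryString, experiment) {
  const params = new URLSearchParams(queryString); // a leading "?" is ignored
  const raw = params.get("enable_experiments");
  if (raw === null) return false;
  // Assumption: multiple experiments could be comma-separated.
  return raw
    .split(",")
    .map((name) => name.trim())
    .includes(experiment);
}

console.log(isExperimentEnabled("?username=aju100&enable_experiments=new_ranking_system", "new_ranking_system")); // true
console.log(isExperimentEnabled("?username=aju100", "new_ranking_system")); // false
```

A caller could then choose between the current rank function and the new one based on this boolean, so existing cards stay unchanged unless the flag is present.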

@CrafterKolyan, but I found this: how is this username getting an S rank? (username=aju100)

[image: stats card for aju100]

With @francois-rozet's PR #1186, it is rank "A", which seems more correct.
From my experiments, @francois-rozet's changes seem more balanced overall.

@anuraghazra changed the title "Change rank algorithm so it would be possible to get B+ rank (Fix #883 Fix #455)" to "Ranking System v1" Sep 7, 2021
@francois-rozet (Collaborator)

> @CrafterKolyan but I found this, how is this username getting S rank?

@anuraghazra It's because the user (aju100) has an outstanding number of repositories and, as mentioned in #960 (comment), a single good rank among the metrics leads to a good overall rank in #960, but not in #1186.

Thank you for taking the time to sort this out!

@anuraghazra (Owner)

Ahh, I see. aju100 has only 100 repos, which is maybe not that many, but anyway it should not be an S rank.

@francois-rozet (Collaborator)

It is not that many, but it is much more than the average user. I should mention that the number of repos is not taken into account in #1186 (otherwise Linus Torvalds would not be S+).

@rickstaa (Collaborator) commented Nov 8, 2021

First of all, @francois-rozet and @CrafterKolyan, thanks a lot for addressing this topic. Here are my two cents.

Overall, I think @francois-rozet's algorithm is better balanced. I agree with @francois-rozet that the @CrafterKolyan algorithm produces an incorrect score when somebody has a lot of followers but 0 commits/repos/stars (see #960 (comment)). If @CrafterKolyan fixed this, I would have no preference between the two implementations.

I do, however, see one shortcoming in @francois-rozet's implementation: the current version does not take the number of contributions into account. I understand why totalRepos is unused: people can fork a lot of repositories while doing nothing with them, and their score would still increase. However, the number of contributions should be considered, since a person who contributed to several critical open-source repositories has more impact than somebody who contributed to one repository. I noticed that this is one of the problems people have with the current implementation (see #1425).

@francois-rozet (Collaborator)

Hello @rickstaa, the reason I don't consider contributions is that they are redundant with PRs, issues and commits. Since I take the latter into account, I don't need the former.

@rickstaa (Collaborator) commented Nov 8, 2021

@francois-rozet Good point, you are right; I overlooked that fact while quickly scanning your code to answer #1425. In that case, I think we should go with @francois-rozet's algorithm.

@CrafterKolyan (Author)

> I agree with @francois-rozet that the @CrafterKolyan algorithm creates an incorrect score when somebody has a lot of followers but 0 commits/repos/stars (see #960 (comment)).

To be honest, I don't see the problem with many followers and 0 stars/commits/etc. The follower count is the hardest statistic to manipulate, as GitHub has some system to prevent multi-accounting.

@francois-rozet (Collaborator) commented Nov 9, 2021

IMHO, the rank should measure your stats as a developer, not as an "influencer". Having tons of followers does not make you a good developer; GitHub is not Twitter or Instagram...

Also, the problem does not arise only with followers. Someone with a very large number of empty repos but nothing else still gets S+. Same for commits, issues, ... you get it.

Anyway, I would rather have your version of the rank than the one currently implemented, but @anuraghazra seemingly abandoned the idea...

@markus-wa

👋 is this still coming @anuraghazra ? 🙂

@rickstaa added the ranks label Oct 8, 2022
@smart2pet

Probably a lot of people are stuck at A+ like me.

@smart2pet left a review comment

In my opinion, this solution for the ranking system is good enough. @anuraghazra Please look here.


@rickstaa (Collaborator) commented Dec 27, 2022

> In my mind, this solve of ranking system is good enough. @anuraghazra Please look here.

My preference is with #1186.

@rickstaa force-pushed the master branch 2 times, most recently from 86aafe8 to 8bc69e7 January 21, 2023 16:47
@chrisK824

Hey @rickstaa @anuraghazra will this ranking system be adopted after all?

@rickstaa (Collaborator)

> Hey @rickstaa @anuraghazra will this ranking system be adopted after all?

I am in favour of merging #1186 since it is more balanced (see #1186 (comment)). I, however, would like to have @anuraghazra's opinion before making such a breaking change.

@rickstaa (Collaborator) commented Apr 8, 2023

Closing, in favour of #1186.

@rickstaa closed this Apr 8, 2023

Labels: enhancement, ranks, stats-card
Projects: None yet