commits (array): An array of commit objects describing the pushed commits. (The array includes a maximum of 20 commits. If necessary, you can use the Commits API to fetch additional commits. This limit is applied to timeline events only and isn't applied to webhook deliveries.)
commits[][sha] (string): The SHA of the commit.
commits[][message] (string): The commit message.
commits[][author] (object): The git author of the commit.
commits[][author][name] (string): The git author's name.
commits[][author][email] (string): The git author's email address.
commits[][url] (url): URL that points to the commit API resource.
commits[][distinct] (boolean): Whether this commit is distinct from any that have been pushed before.
There are several limitations:
A PushEvent contains a maximum of 20 commits, so any commit above this limit is simply missing from the dataset. Most PushEvents don't hit that limit and contain all their commits (something like 99%). The problem is initial pushes, which can have several thousand commits. (E.g. a private repo moving to GitHub would have a first PushEvent with a high number of commits.) Missing out on these commits is not okay. We could use the GitHub API for such initial PushEvents that have 20 commits (and potentially thousands of truncated commits). Missing out on commits of subsequent PushEvents is probably ok.
Commit dates are missing, but we do have the push date. So we could take the push date as a coarse approximation of the commit date (assuming that most of the time the date of a git push is within the same approximate time frame as the dates of its commits). But we shouldn't use this approximation for an initial PushEvent that has 20 commits (and potentially thousands of truncated commits).
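The two limitations above could be handled roughly as follows. This is only a sketch: it assumes each event is a dict shaped like GitHub's public event JSON (`created_at` as an ISO 8601 UTC string, the commits under `payload.commits`), and the function names and the `is_first_push` flag are made up for illustration.

```python
from datetime import datetime, timezone

MAX_COMMITS = 20  # cap on the commits array of a PushEvent


def is_possibly_truncated(event):
    """True if the commits array is full, so further commits may be missing."""
    return len(event["payload"]["commits"]) >= MAX_COMMITS


def approximate_commit_dates(event, is_first_push):
    """Map each commit SHA to the push date as a coarse commit date.

    Returns None for a first push that hits the cap: such a push may hide
    thousands of commits, so the GitHub API should be used instead.
    """
    if is_first_push and is_possibly_truncated(event):
        return None
    # Assumed timestamp format, e.g. "2015-01-01T15:00:00Z".
    pushed_at = datetime.strptime(
        event["created_at"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return {commit["sha"]: pushed_at for commit in event["payload"]["commits"]}
```

A `None` result marks the pushes that would need a follow-up request to the GitHub API; every other push keeps the approximation.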
We could still use the dataset to get a list of repos per user. I expect this list of repos to be mostly exhaustive as:
I expect most repos to start public. (We can easily get stats for the ratio of how many start private to how many start public, by checking whether the first PushEvent hits the 20-commit limit.)
If you contributed to a private repo, there is a fair chance that you keep contributing to it after it goes open source.
Small contributions (only a couple of commits) are very unlikely to be missing. (Small contribs most likely only happen in public repos, and it is very unlikely to miss a small contribution because of a truncated commit array in a subsequent PushEvent.)
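Both the per-user repo list and the private/public ratio could be computed along these lines. Again a sketch under assumptions: events are dicts shaped like GitHub's public event JSON (`type`, `actor.login`, `repo.name`, `created_at`, `payload.commits`), the user is identified by the pusher's login rather than by commit author, and a first push with a full commits array is treated as a sign that the repo started private.

```python
from collections import defaultdict

MAX_COMMITS = 20  # cap on the commits array of a PushEvent


def repos_per_user(events):
    """Map each pusher's login to the set of repos they pushed to."""
    repos = defaultdict(set)
    for ev in events:
        if ev["type"] == "PushEvent":
            repos[ev["actor"]["login"]].add(ev["repo"]["name"])
    return dict(repos)


def share_started_private(events):
    """Fraction of repos whose earliest PushEvent has a full commits array."""
    first_push = {}
    # ISO 8601 UTC timestamps in one format sort chronologically as strings.
    for ev in sorted(events, key=lambda e: e["created_at"]):
        if ev["type"] == "PushEvent":
            first_push.setdefault(ev["repo"]["name"], ev)
    if not first_push:
        return 0.0
    full = sum(
        len(ev["payload"]["commits"]) >= MAX_COMMITS
        for ev in first_push.values()
    )
    return full / len(first_push)
```

Keying on the pusher is the simplest choice here; attributing repos by commit author would instead require matching `commits[][author][email]` to users.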
We can also use the dataset for repos that have a first PushEvent with fewer than 20 commits. If the first PushEvent has fewer than 20 commits, then we can be confident that the repo started public. Then missing out on a couple of commits is probably ok: the approximate commit stats would likely be good enough to categorize users as "maintainer"/"gold contrib"/"silver contrib"/"bronze contrib" and show a contribution timeline.
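To make the categorization idea concrete, here is a rough sketch. The tier names come from the text above, but the thresholds are purely hypothetical placeholders, as are the function names and the assumed event shape (dicts with `type`, `repo.name` and `payload.commits`, each commit carrying `sha` and `author.email`).

```python
from collections import Counter

# Hypothetical cutoffs (minimum commit count, label), highest first.
# Nothing in the discussion above fixes these numbers.
TIERS = [
    (100, "maintainer"),
    (20, "gold contrib"),
    (5, "silver contrib"),
    (1, "bronze contrib"),
]


def commit_counts(events, repo_name):
    """Count commits per author email in one repo, counting each SHA once."""
    counts = Counter()
    seen = set()
    for ev in events:
        if ev["type"] != "PushEvent" or ev["repo"]["name"] != repo_name:
            continue
        for commit in ev["payload"]["commits"]:
            if commit["sha"] not in seen:
                seen.add(commit["sha"])
                counts[commit["author"]["email"]] += 1
    return counts


def categorize(count):
    """Return the first tier whose minimum the commit count reaches."""
    for minimum, label in TIERS:
        if count >= minimum:
            return label
    return None  # no commits, no tier
```

Since the counts are approximate anyway, coarse tiers like these tolerate a few missing commits much better than exact rankings would.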
In the dataset, commits are included in the events called PushEvent (https://developer.github.com/v3/activity/events/types/#pushevent). Among other fields, a PushEvent contains the commits array described above.