Monday, May 9, 2016

GitHub API For Reports

We use GitHub Enterprise at work and, like github.com, it sports a fairly comprehensive REST API.

There are lots of ways to use this API, but my most common use has been generating reports about what my team is doing on GitHub.

What PRs Are Waiting For Me?

At my company, engineering leads are in a GitHub group that gets notified about repositories throughout the company. But I'm unlikely to have meaningful input on a pull request opened against, say, the analytics team's code, so I wanted a quick way to find the PRs I actually care about.

This turns out to be easy to get via the GitHub search API:
curl -LGs 'https://|your_github_host|/api/v3/search/issues' --data-urlencode "q=type:pr state:open repo:|repo|" --data-urlencode "per_page=100"

If you want all the repos for a given organization, use "user:|org_name|" instead. You can string together any number of repos and organizations, and they'll be ORed together. In fact, the script I use constructs the query from its arguments, figuring out which syntax each one needs. I run the results through jq and an awk-based formatting script, and I get a nice report of outstanding PRs in the repos I actually care about.
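The wrapper script itself isn't shown in this post, but a minimal sketch of the query-building step might look like this. It assumes, hypothetically, that any argument containing a slash is a repository and anything else is an organization or user; the function name build_pr_query and the example names are made up for illustration.

```shell
# Hypothetical helper (not the author's script): turn "owner/repo" and bare
# org names into a GitHub search query string.
# An argument containing a slash becomes repo:owner/repo;
# anything else becomes user:name (an organization or user).
build_pr_query() {
  query="type:pr state:open"
  for target in "$@"; do
    case "$target" in
      */*) query="$query repo:$target" ;;
      *)   query="$query user:$target" ;;
    esac
  done
  printf '%s\n' "$query"
}
```

For example, build_pr_query acme/widgets acme-analytics prints "type:pr state:open repo:acme/widgets user:acme-analytics", which can be passed to curl as --data-urlencode "q=...".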

As my team grew, I also wanted to look at PRs by members of my team, even when those PRs are outside the repos we own. This is particularly useful for junior team members and new colleagues joining the team: I want a sense of their coding style and the areas where they can grow. Again, this is pretty easy:

curl -LGs 'https://|your_github_host|/api/v3/search/issues' --data-urlencode "q=type:pr state:open author:|username|" --data-urlencode "per_page=100"

As with repos, you can include any number of "author:" items in your search string. I run the results through the same formatting steps described above.
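The jq and awk formatting steps aren't spelled out here. As a rough sketch (not the author's actual script), jq can flatten each search result into a tab-separated line, and a small awk program can group those lines under a heading per repository. The function name format_prs and the choice of columns are assumptions; repository_url, number, user.login and title are fields the search API returns for each item.

```shell
# Hypothetical formatter: reads tab-separated
# "repo<TAB>number<TAB>author<TAB>title" lines, already sorted by repo,
# and prints each repo as a heading followed by its open PRs.
format_prs() {
  awk -F '\t' '
    $1 != last {            # new repo: blank separator (except first), then heading
      if (NR > 1) print ""
      print $1
      last = $1
    }
    { printf "  #%-6s %-15s %s\n", $2, $3, $4 }
  '
}

# Example wiring with the post's placeholders (assumes a jq that has sub()):
# curl -LGs 'https://|your_github_host|/api/v3/search/issues' \
#   --data-urlencode "q=type:pr state:open author:|username|" \
#   --data-urlencode "per_page=100" \
#   | jq -r '.items[] | [(.repository_url | sub(".*/repos/"; "")), .number, .user.login, .title] | @tsv' \
#   | sort | format_prs
```

Sorting before formatting matters: the awk program only starts a new heading when the repo column changes, so unsorted input would repeat headings.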

Finally, people sometimes want me to weigh in on a pull request outside the repositories my team owns, so I added another stanza to my wrapper script (the wrapper factors out the common parts of the URL, so each section just passes in its own part of the query):

curl -LGs 'https://|your_github_host|/api/v3/search/issues' --data-urlencode "q=type:pr state:open mentions:|my_username|" --data-urlencode "per_page=100"

Throughout the day, I run my script, and a few seconds later I have a complete view of all the PRs I should at least be aware of.
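A minimal sketch of that kind of wrapper follows, assuming a hypothetical GITHUB_HOST variable that holds your GitHub Enterprise hostname. The shared parts of the request live in one function, and each stanza passes only the qualifiers that differ. The function name and the example qualifiers are illustrative, not taken from the author's script.

```shell
# Hypothetical wrapper: GITHUB_HOST is an assumed variable holding your
# GitHub Enterprise hostname. The common parts of the search request live
# here; each caller passes only the extra query qualifiers as $1.
search_prs() {
  curl -LGs "https://${GITHUB_HOST:?set GITHUB_HOST first}/api/v3/search/issues" \
    --data-urlencode "q=type:pr state:open $1" \
    --data-urlencode "per_page=100"
}

# One call per stanza (example qualifiers are made up):
# search_prs "repo:acme/widgets user:acme-analytics"   # repos and orgs
# search_prs "author:alice author:bob"                 # team members
# search_prs "mentions:|my_username|"                  # PRs that mention me
```

Because only $1 changes between stanzas, adding a new kind of report is a one-line change.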

What Self-Merges Have Happened?

We have a strong policy against self-merges, thanks to a culture in which pull requests and continuous code review are the norm, and we wanted an easy way to find any self-merges that had happened anyway. This quick one-liner I assembled lists the merged pull requests where the author was also the merger (it searches a whole organization; swap "user:|org_name|" for "repo:|repo|" to check a single repo):

curl -LGs 'https://|your_github_host|/api/v3/search/issues' --data-urlencode "q=type:pr is:merged user:|org_name|" --data-urlencode "per_page=100" | jq -r '.items | .[] | .pull_request.url' | xargs -I {} curl -LGs {} | jq -r '[.user.login, .merged_by.login, .html_url] | @tsv' | awk '$1 == $2'
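If you want a summary rather than a list, the tab-separated output of the second jq step can be tallied per author. This is a hypothetical extension, not part of the original one-liner: the made-up function tally_self_merges reads the author, merger and URL columns and prints how many self-merges each author has, most frequent first. Splitting on tabs means an empty merger field can't shift the columns and produce a false match.

```shell
# Hypothetical extension: reads "author<TAB>merger<TAB>url" lines and
# counts, per author, the PRs they merged themselves (most frequent first).
tally_self_merges() {
  awk -F '\t' '$1 == $2 { n[$1]++ } END { for (u in n) print n[u], u }' | sort -rn
}
```

To use it, replace the final awk '$1 == $2' in the one-liner with tally_self_merges. Like the one-liner, it only sees the first 100 search results.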


Who Knows About A Repo?

We recently wanted to clean up our github.com account, which has a host of repos that may or may not still be active, and whose collaborators may include people who have left, contractors, and the like, whose access should be cut off. I came up with this quick script that, for every repo in the given organization, prints the contributors to its last 100 commits and how many of those commits each one authored, sorted so that the people who contributed the most (and are thus most likely to know the state of the repo) come last. It requires you to create an API token so that it can access private repositories.

curl -u |username|:|github_api_token| -LGs 'https://api.github.com/orgs/|your_org|/repos?per_page=100' | jq -r '.[] | .name' | xargs -I {} sh -c "echo {} && curl -u |username|:|github_api_token| -LGs 'https://api.github.com/repos/|your_org|/{}/commits?per_page=100' | jq '.[] | .commit.author.name' | sort | uniq -c | sort -n"
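For the cleanup itself, a one-line-per-repo answer to "who knows about this repo?" can be easier to scan than the full counts. The following is a hypothetical variation, not the script above: top_author_per_repo reads repo and commit-author pairs separated by a tab (so names containing spaces survive) and prints each repo's most frequent author in that sample, along with the count. Ties are broken arbitrarily.

```shell
# Hypothetical summarizer: reads "repo<TAB>commit author" lines (one per
# commit) and prints "repo<TAB>top author<TAB>commit count", sorted by repo.
top_author_per_repo() {
  awk -F '\t' '
    { count[$1, $2]++ }                  # commits per (repo, author) pair
    END {
      for (key in count) {
        split(key, part, SUBSEP)         # part[1] = repo, part[2] = author
        if (count[key] > best[part[1]]) {
          best[part[1]] = count[key]
          who[part[1]] = part[2]
        }
      }
      for (repo in best) printf "%s\t%s\t%d\n", repo, who[repo], best[repo]
    }
  ' | sort
}

# Example wiring with the post's placeholders:
# curl -u |username|:|github_api_token| -LGs 'https://api.github.com/orgs/|your_org|/repos?per_page=100' \
#   | jq -r '.[].name' \
#   | while read -r repo; do
#       curl -u |username|:|github_api_token| -LGs "https://api.github.com/repos/|your_org|/$repo/commits?per_page=100" \
#         | jq -r --arg repo "$repo" '.[] | [$repo, .commit.author.name] | @tsv'
#     done \
#   | top_author_per_repo
```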

These are just a sample of how I use the API. I wrote a bunch of scripts to generate data for my self-review. I have a longer script that identifies which repos my team is working in each month, part of my effort to push my somewhat siloed team members into areas where they're less comfortable. I wrote a Chrome extension that runs lots of queries to flag risk factors in an incoming PR. And just today, I was concocting a way to move a large number of users between different groups of contributors to change permissions on a repo. Once you're aware of it, the GitHub API offers a wealth of possibilities.

