What's New In Stackalytics (12/12/13) - Correction(s)!
For a couple of articles now I've been promising to let you know how to make corrections in Stackalytics, but other news keeps coming up. As I was getting ready to post this, we announced the release of Stackalytics 0.4 with some new review- and activity-related metrics, but let's talk about those next time, so I can keep my promise right now.
Making corrections to a commit is about making sure that the work attributed to a person or company actually represents the contribution put forth by that person or company. For example, remember when OpenStack made the change from "Quantum" to "Neutron"? It resulted in a "search and replace" commit that showed over 100K lines of code changed, which is disproportionate to the actual work done. (Not to say there wasn't effort involved, just not 100K+ lines worth of effort.)
On the other hand, you don't want anyone to be able to arbitrarily go in and change statistics without anybody knowing, so the way Stackalytics is set up, anybody can go in and make a change -- as long as that change passes the normal community review process.
Here's how it works. All corrections are in the corrections.json file. The file contains a JSON object that includes information on each correction. For example:
"correction_comment": "Reset LOC to 0",
"subject": "Rename Quantum to Neutron"
"correction_comment": "Reset LOC to 0",
"subject": "Add openstack-common"
As you can see, the parameters are fairly self-explanatory:
correction_comment: In most cases, this will be "Reset LOC to 0", but it should always describe what was actually done. This information shows in the Stackalytics dashboard.
loc: The number of lines of code that should be attributed to the change, as opposed to what comes up in Gerrit to start with.
change_id: The actual git change_id for the commit.
primary_key: The id (hash) of the commit in the git repository. You can find this using git log.
module: The OpenStack module under which the change appears.
subject: The subject of the change itself. For example, "Rename Quantum to Neutron", or "Removed all projects except Glazier Api". The subject helps others understand why the number of lines in a commit may not have accurately represented the amount of effort involved.
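To make the parameter list concrete, here is a minimal Python sketch of what a complete entry might look like, plus a quick check that nothing was left out. The six field names come from the list above; the change_id, primary_key, and module values are made-up placeholders, and missing_fields is a hypothetical helper, not part of Stackalytics.

```python
import json

# Field names are taken from the article; they may differ in other
# versions of corrections.json.
REQUIRED_FIELDS = ("correction_comment", "loc", "change_id",
                   "primary_key", "module", "subject")

# Hypothetical entry: the hash-like values and module name are placeholders.
example_entry = {
    "correction_comment": "Reset LOC to 0",
    "loc": 0,
    "change_id": "I0000000000000000000000000000000000000000",
    "primary_key": "0000000000000000000000000000000000000000",
    "module": "example-module",
    "subject": "Rename Quantum to Neutron",
}


def missing_fields(entry):
    """Return the required fields that are absent or empty in an entry."""
    return [name for name in REQUIRED_FIELDS
            if name not in entry or entry[name] in ("", None)]


print(json.dumps(example_entry, indent=4))
print(missing_fields(example_entry))  # [] because every field is present
```

Note that a loc of 0 counts as a valid value here, since zeroing a commit is the most common correction.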
If you see a commit that needs fixing, you can submit a patch to corrections.json, just as you would on any other OpenStack project. That patch then goes through the same community review as any other, so there's complete transparency. Once it's approved, it's merged, and the relevant commit is marked with the correction in red on the Stackalytics dashboard so that there's no question as to what's been done.
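Conceptually, a merged correction overrides the line count for one specific commit while keeping the original number around for display. The following Python sketch illustrates that idea only; it is not the actual Stackalytics implementation. The commit record layout (commit_id, loc) and the apply_corrections function are assumptions made for illustration, and the hash is a placeholder.

```python
def apply_corrections(commits, corrections):
    """Return copies of commit records, with LOC overridden where a
    correction's primary_key matches the commit's id."""
    by_key = {c["primary_key"]: c for c in corrections}
    result = []
    for commit in commits:
        updated = dict(commit)  # leave the input records untouched
        correction = by_key.get(commit["commit_id"])
        if correction is not None:
            # Keep the original value so the change stays visible.
            updated["original_loc"] = commit["loc"]
            updated["loc"] = correction["loc"]
            updated["correction_comment"] = correction["correction_comment"]
        result.append(updated)
    return result


# Hypothetical data: the hash is a placeholder, the numbers echo the article.
commits = [{"commit_id": "0" * 40, "loc": 76053}]
corrections = [{"primary_key": "0" * 40, "loc": 10000,
                "correction_comment": "Adjusted LOC to 10000"}]

for record in apply_corrections(commits, corrections):
    print(record["original_loc"], "->", record["loc"])  # 76053 -> 10000
```

Keeping original_loc alongside the corrected value mirrors the transparency goal: anyone looking at the record can see both what Gerrit reported and what the correction changed it to.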
For example, Figure 1 shows a commit that was originally 76,053 lines of code, but was corrected down to 10,000.
So what are the criteria for making these corrections?
The common sense approach to changing commits
The idea behind Stackalytics is to provide a transparent, common sense approach to making corrections. That means nobody should come in and make changes in an arbitrary way. And changes should never be made simply as a means for gaming the system.
As of this writing, most or all of the changes in the corrections.json file were added after manual review by the Stackalytics team. These are changes that were originally more than 3000 LOC and that the team judged not to represent the actual level of work.
That's not to say that Ilya Shakhat just goes in and makes an arbitrary judgment of other people's work, of course! All corrections should be made in accordance with the guidelines on the Stackalytics wiki, which are as follows:
Commits that contain auto-generated files should be adjusted in order to represent the amount of effort actually produced by the contributor, not including generated output.
Commits that contain the result of automatic code refactoring should be adjusted accordingly.
Commits that are the result of improperly renamed files (shell rename instead of git rename) should be zeroed.
Commits with binary and 3rd party files should be adjusted accordingly.
In many, or even most cases, this will result in commits being zeroed, but that's not always the case.
For example, consider the documentation commit in Figure 1. Originally, it consisted of 76,053 lines of code changed, but it was adjusted downward to 10,000. This is because a major portion of the commit consisted of SVG files, which were artificially inflating the totals. On the other hand, the team could tell that a significant amount of manual refactoring and rewriting had been done, so rather than simply zeroing the total, they adjusted it, intentionally using a round number to emphasize that it's an estimate.
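If you want a starting point for an estimate like this one, you could total the changed lines per file and leave out the file types that inflate the count. The Python sketch below assumes you already have (path, lines changed) pairs, for instance collected by hand from a diff summary; the excluded extensions, the file names, and the numbers are all illustrative assumptions. It is not how the team arrived at 10,000, which was a deliberate round-number judgment call.

```python
import os

# Assumption: extensions treated as non-effort content for this estimate.
EXCLUDED_EXTENSIONS = {".svg", ".png", ".jpg"}


def loc_excluding(file_stats, excluded=EXCLUDED_EXTENSIONS):
    """Sum changed lines over files whose extension is not excluded."""
    total = 0
    for path, lines_changed in file_stats:
        extension = os.path.splitext(path)[1].lower()
        if extension not in excluded:
            total += lines_changed
    return total


# Hypothetical per-file statistics for a single commit.
file_stats = [
    ("doc/figures/architecture.svg", 5000),
    ("doc/guide.xml", 1200),
    ("README", 30),
]

print(loc_excluding(file_stats))  # 1230
```

A number produced this way is only an input to the review discussion; the reviewer still has to judge how much of the remaining change reflects real work.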
But that's wrong!
Of course, the Stackalytics team isn't intimately familiar with every large change that comes their way. It's entirely possible that you might disagree with a change that's been made, and that's fine. Remember, Stackalytics is just like everything else in OpenStack; if you think something needs fixing, go ahead and fix it. You're free to submit changes to the corrections.json file the same way you'd make changes to any other OpenStack project. That applies both to existing corrections you disagree with and to commits that haven't been corrected yet, if you think they aren't representative of the real work done.
You can file a Stackalytics bug through Launchpad, of course, and you can also attach a patch.
If you make the change directly and submit it for review (as opposed to just filing a bug and waiting for the team to fix it), make sure that the correction_comment and subject are descriptive, and that the full explanation is in the commit message, so that reviewers know why you want to make the change.
The team is always adding new features to Stackalytics. For example, we still haven't talked about the "Mentors" metric. Is there something you'd like to learn about how Stackalytics works, or something that you'd like to see it do?