This analysis continues the git analysis project from my last post. For completeness, I have incorporated the charts from that post into this one.
This time I'll just introduce the overall idea and show some charts. If you're interested in the source code, please visit https://github.com/AlunYou/AlunYou.github.io.

Input Data

I manually collected commit log data for 8 popular repos (activemq, hadoop, hbase, hive, kafka, spark, storm, zookeeper) using the git log > xx.log command. Please note that each chart subtitle is my comment on the default chart, which analyzes all 8 repos together. You can also select a single repo to view; it's interesting to see how the repos differ.
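
As a rough sketch of how such a log file could be turned into records for analysis, the Python below reads the default git log text layout and yields one (repo, author, timestamp) record per commit. The names Commit and parse_git_log are hypothetical and are not taken from the linked repository; the sketch also assumes the default date format and an English locale for day and month names.

```python
import re
from datetime import datetime
from typing import Iterator, NamedTuple


class Commit(NamedTuple):
    """One commit record (hypothetical structure for this sketch)."""
    repo: str
    author: str
    when: datetime


# "Author: Name <email>" header line of the default `git log` output.
AUTHOR_RE = re.compile(r"^Author:\s*(.*?)\s*<[^>]*>\s*$")
# Default `git log` date layout, e.g. "Sat Mar 7 09:15:00 2015 -0800".
DATE_FORMAT = "%a %b %d %H:%M:%S %Y %z"


def parse_git_log(repo: str, text: str) -> Iterator[Commit]:
    """Yield one Commit per Author/Date header pair found in the log text."""
    author = None
    for line in text.splitlines():
        # Commit messages are indented, so header lines start at column 0.
        if line.startswith("Author:"):
            match = AUTHOR_RE.match(line)
            author = match.group(1) if match else line[len("Author:"):].strip()
        elif line.startswith("Date:") and author is not None:
            when = datetime.strptime(line[len("Date:"):].strip(), DATE_FORMAT)
            yield Commit(repo, author, when)
            author = None
```

The parsed timestamp keeps the offset recorded in the commit, so the hour and weekday reflect the author's local time rather than UTC.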

Definitions:

  • Overtime: any commit made on Saturday or Sunday, or before 8 AM or after 7 PM from Monday to Friday.
  • Core Contributors: the group of people who contributed about 60% of all commits.
  • Full Time Contributors: authors who make more than 60% of their commits during work time.
  • Cross Contributors: authors who commit to at least two repos.
  • Repo Relation: how closely two repos are related, measured by how many authors contribute to both of them.
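
The first three definitions can be sketched in Python as follows. This is a minimal single-machine illustration, not the MapReduce code from the repository. It assumes commits arrive as (author, timestamp) pairs in the author's local time, treats 7:00 PM and later as overtime, and reads "core contributors" as the smallest group of top committers whose commits reach the 60% share. Those readings, and all function names, are assumptions.

```python
from collections import Counter
from datetime import datetime


def is_overtime(when: datetime) -> bool:
    """Weekend, or before 8 AM / from 7 PM onward on a weekday (assumed cutoffs)."""
    if when.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return True
    return when.hour < 8 or when.hour >= 19


def core_contributors(commits, share=0.6):
    """Top committers, added in descending order until `share` of commits is covered."""
    counts = Counter(author for author, _ in commits)
    total = sum(counts.values())
    core, covered = [], 0
    for author, n in counts.most_common():
        if covered >= share * total:
            break
        core.append(author)
        covered += n
    return core


def full_time_contributors(commits, share=0.6):
    """Authors with more than `share` of their commits made during work time."""
    total, at_work = Counter(), Counter()
    for author, when in commits:
        total[author] += 1
        if not is_overtime(when):
            at_work[author] += 1
    return {a for a in total if at_work[a] / total[a] > share}
```

Both aggregate functions iterate over the commits once, so pass a list rather than a one-shot generator if the same data feeds both.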

Charts

Discussion:

  • In the repo relation analysis, I had to use 4 MapReduce jobs to do the work. I wonder how many jobs a typical real-life project normally needs.
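
For comparison, on data small enough to fit in memory, the same quantity can be computed in a single pass. The Python sketch below is not the 4-job MapReduce pipeline mentioned above; it only illustrates what is being counted. It assumes the input is a sequence of (repo, author) pairs, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def repos_by_author(pairs):
    """Map each author to the set of repos they committed to."""
    result = defaultdict(set)
    for repo, author in pairs:
        result[author].add(repo)
    return result


def cross_contributors(pairs):
    """Authors who commit to at least two repos."""
    return {a for a, repos in repos_by_author(pairs).items() if len(repos) >= 2}


def repo_relation(pairs):
    """Count shared authors for every unordered pair of repos."""
    shared = Counter()
    for repos in repos_by_author(pairs).values():
        # Sorting gives each repo pair one canonical key, e.g. ("hadoop", "hbase").
        for left, right in combinations(sorted(repos), 2):
            shared[(left, right)] += 1
    return shared
```

The result is a Counter keyed by alphabetically ordered repo pairs, so a pair with no shared authors simply reads as 0.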