Messrs Nagappan, Murphy and Basili’s paper on the influence of organization structure on code quality caught my attention when it was recently mentioned on infoq.com. (Though the paper is not new; it was published in January 2008.)
The paper describes how organization structure affects software quality. Organization-structure attributes can be used to build a prediction model for assessing software quality, or to be specific, failure-proneness. In the authors’ words: “In this paper we present a metric scheme to quantify organizational complexity, in relation to the product development process to identify if the metrics impact failure-proneness.” Eight metrics were derived from the organization structure. (Though it is hard to see how ‘edit frequency’ must always mean instability or low-quality code. A developer with a healthy habit of continuous refactoring could skew that data.) The guinea pig was Windows Vista, with 3,404 binaries and about 50 MLOC, seemingly the largest such study on commercial software. All the data goes through a menacing logistic regression equation to spit out the failure-proneness of each binary, which in turn is used to calculate precision and recall for a pre-defined random split. And out came the numbers: the average precision was 87% and the average recall was 84%.
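To make the pipeline concrete, here is a minimal sketch of how a fitted logistic regression turns a binary’s organizational metrics into a failure-proneness probability. The metric names loosely echo the paper’s (NOE, edit frequency, depth of master ownership), but the coefficients and inputs below are invented for illustration and are not the paper’s fitted values.

```python
import math

# Hypothetical fitted model: coefficients and intercept are made up,
# not taken from the paper.
COEFFICIENTS = {
    "number_of_engineers": 0.8,       # more engineers -> higher risk
    "edit_frequency": 0.5,            # more edits -> higher risk
    "depth_of_master_ownership": -0.6 # clearer ownership -> lower risk
}
INTERCEPT = -1.2

def failure_proneness(metrics):
    """Map a binary's (normalized) org metrics to a probability in (0, 1)."""
    z = INTERCEPT + sum(COEFFICIENTS[name] * value
                        for name, value in metrics.items())
    return 1.0 / (1.0 + math.exp(-z))  # the logistic function

# Toy input for one binary (values assumed to be normalized):
p = failure_proneness({"number_of_engineers": 1.5,
                       "edit_frequency": 2.0,
                       "depth_of_master_ownership": 0.5})
print(round(p, 3))  # 0.668
```

Binaries whose probability exceeds some threshold would be flagged as failure-prone; it is those flags that the precision and recall figures evaluate.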
Which means 87% of all binaries that were predicted to fail actually did fail, and 84% of all the binaries that failed were predicted beforehand. (It took me some time to grasp the difference.) Compared to other prediction models such as code churn and code complexity (they cite five more), the organizational metrics are shown to be more predictive.
The authors, quite humbly, didn’t rush to propose any prescriptive crystal-ball model. But one cannot help wondering how this fits together with other factors already known to influence code quality, such as the quality of the developers. If we keep the org structure the same (say, as a control group) and change the quality of the developers, by this model the failure-proneness stays the same; but we know that’s not the case.
The findings are quite interesting, perhaps important too, though the paper, even for a technical one, wasn’t a good read. At times the sentence construction jarred and seemed prone to misinterpretation, or to a chuckle. (“NOE is the absolute number of unique engineers who have touched a binary and are still employed by the company”, or “OOW is the ratio of percentage of people…”. But that’s my nitpicking, apologies.) The paper generously mentions other papers that have carried out work in a similar vein; the references run to 37 papers.
It is indeed promising that the authors plan to pursue this line of research for open-source projects, where the organization structure is very different: distributed and non-hierarchical. Similar research on open source might reveal a lot that other organizations could use (open source itself is not likely to benefit from the results, though).
Outsourcing companies specializing in building software and solutions through delivery centers in low-cost geographies will benefit the most from such studies. The findings will help clients realize the cost of, and risk to, software quality that distributed development entails. They will also matter to agile teams, which once favored collocated teams and are now getting more distributed.
With enough studies, it might even be possible to suggest an organization structure that minimizes failure-proneness. But that seems a tall order.
[This paper is also being written about here and here.]