Calibrating the COCOMO II Post-Architecture Model
Bradford Clark, Sunita Devnani-Chulani, Barry Boehm
Center for Software Engineering, Computer Science Department
University of Southern California, Los Angeles, CA 90089, USA
+1 760 939 8279, +1 213 740 6470, +1 213 740 5703
firstname.lastname@example.org, email@example.com, firstname.lastname@example.org
ABSTRACT
The COCOMO II model was created to meet the need for a cost model that accounted for future software development practices. This paper describes some of the lessons learned in calibrating the COCOMO II Post-Architecture model from eighty-three observations. The results of the multiple regression analysis, their implications, and a future calibration strategy are discussed.
Keywords
COCOMO, cost estimation, metrics, multiple regression.
1 INTRODUCTION
The COCOMO II project started in July of 1994 with the intent to meet the projected need for a cost model that would be useful for the next generation of software development. The new model incorporated proven features of the COCOMO 81 and Ada COCOMO models. COCOMO II has three submodels. The Application Composition model is used to estimate effort and schedule on projects that use Integrated Computer Aided Software Engineering tools for rapid application development. The Early Design and Post-Architecture models are used in estimating effort and schedule on software infrastructure, major applications, and embedded software projects.
The Early Design model is used when a rough estimate is needed based on incomplete project and product analysis. The Post-Architecture model is used when top-level design is complete and detailed information is known about the project. Compared to COCOMO 81, COCOMO II added new cost drivers for application precedentedness, development flexibility, architecture and risk resolution, team cohesion, process maturity, required software reuse, documentation match to lifecycle needs, personnel continuity, and multi-site development. COCOMO II also eliminated COCOMO 81's concept of development modes and two COCOMO 81 cost drivers: turnaround time and modern programming practices.
This paper describes our experiences and results from the first calibration of the Post-Architecture model. The model determination process began with an expert Delphi process to determine apriori values for the Post-Architecture model parameters. A dataset of 83 projects was used for model calibration. Model parameters that exhibited high correlation were consolidated. Multiple regression analysis was used to produce coefficients. These coefficients were used to adjust the previously assigned expert-determined model values. Stratification was used to improve model accuracy.
The resulting model produces effort estimates within 30% of the actuals 52% of the time. If the model's multiplicative coefficient is calibrated to each of the major sources of project data, the resulting model produces effort estimates within 30% of the actuals 64% of the time. It is therefore recommended that organizations using the model calibrate it using their own data. This increases model accuracy and produces a local optimum estimate for similar types of projects.
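The accuracy criterion quoted above is the standard PRED(.30) measure: the fraction of projects whose estimate falls within 30% of the actual effort. A minimal sketch, using made-up project data:

```python
# PRED(L): fraction of projects whose estimate is within L% of the actual.
# The effort figures below are hypothetical, purely for illustration.

def pred(actuals, estimates, level=0.30):
    """Fraction of estimates whose relative error is within `level`."""
    within = sum(
        1 for act, est in zip(actuals, estimates)
        if abs(est - act) / act <= level
    )
    return within / len(actuals)

actual_pm = [100.0, 250.0, 80.0, 1200.0]     # hypothetical actual person months
estimated_pm = [115.0, 240.0, 150.0, 900.0]  # hypothetical model estimates

print(pred(actual_pm, estimated_pm))  # 3 of 4 within 30% -> 0.75
```

A PRED(.30) of 0.52 thus means 52% of the 83 calibration projects were estimated within 30% of their actual effort.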
Section 2 of this paper reviews the structure of the COCOMO II Post-Architecture model. Section 3 describes the data used for calibration. Section 4 describes the calibration procedures, results, and future calibration strategy.
2 POST-ARCHITECTURE MODEL
The COCOMO II Post-Architecture model is fully described in [1, 3]. The Post-Architecture model covers the actual development and maintenance of a software product. This stage of the lifecycle proceeds most cost-effectively if a software life-cycle architecture has been developed; validated with respect to the system's mission, concept of operation, and risk; and established as the framework for the product.
The Post-Architecture model predicts software development effort in Person Months (PM), as shown in Equation 1, and schedule in months. It uses source instructions and/or function points for sizing, with modifiers for reuse and software breakage; a set of 17 multiplicative cost drivers (EM); and a set of 5 scaling cost drivers that determine the project's scaling exponent (SF), Table 1. These scaling cost drivers replace the development modes (Organic, Semidetached, or Embedded) in the original COCOMO 81 model and refine the four exponent-scaling factors in Ada COCOMO. The model has the form:
PM = A × Size^(1.01 + Σ(j=1..5) SFj) × Π(i=1..17) EMi        (1)
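Given numeric values for the drivers, Equation 1 can be evaluated directly. The sketch below uses hypothetical driver values, not the published COCOMO II.1997 constants:

```python
import math

# Minimal sketch of Equation 1: effort grows with size raised to a
# scale-factor-adjusted exponent, times the product of the effort
# multipliers. All numeric values below are hypothetical.

def post_arch_effort(A, size_ksloc, scale_factors, effort_multipliers):
    """PM = A * Size^(1.01 + sum(SF_j)) * product(EM_i)."""
    exponent = 1.01 + sum(scale_factors)
    em_product = math.prod(effort_multipliers)
    return A * size_ksloc ** exponent * em_product

# Hypothetical 100 KSLOC project with five scale factors and a
# subset of the 17 effort multipliers.
pm = post_arch_effort(
    A=2.45,
    size_ksloc=100.0,
    scale_factors=[0.01, 0.02, 0.01, 0.02, 0.02],  # SF1..SF5
    effort_multipliers=[1.10, 0.94, 1.00],         # e.g. RELY, DATA, CPLX
)
print(round(pm, 1))
```

Note how the scale factors enter the exponent (affecting large projects disproportionately) while the effort multipliers scale effort linearly.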
The selection of scale factors (SF) in Equation 1 is based on the rationale that they are a significant source of exponential variation in a project's effort or productivity.
Recent research has confirmed the diseconomies-of-scale influence on effort with the Process Maturity (PMAT) scaling cost driver [2]. For projects in the 30-120 KSLOC range, the analysis indicated that a one-level improvement in Process Maturity corresponded to a 15-21% reduction in effort, after normalization for the effects of other cost drivers.
Table 1. COCOMO II Cost Drivers

Sym.   Abr.   Name
SF1    PREC   Precedentedness
SF2    FLEX   Development Flexibility
SF3    RESL   Architecture and Risk Resolution
SF4    TEAM   Team Cohesion
SF5    PMAT   Process Maturity
EM1    RELY   Required Software Reliability
EM2    DATA   Data Base Size
EM3    CPLX   Product Complexity
EM4    RUSE   Required Reusability
EM5    DOCU   Documentation Match to Life-cycle Needs
EM6    TIME   Time Constraint
EM7    STOR   Storage Constraint
EM8    PVOL   Platform Volatility
EM9    ACAP   Analyst Capability
EM10   PCAP   Programmer Capability
EM11   AEXP   Applications Experience
EM12   PEXP   Platform Experience
EM13   LTEX   Language and Tool Experience
EM14   PCON   Personnel Continuity
EM15   TOOL   Use of Software Tools
EM16   SITE   Multi-Site Development
EM17   SCED   Required Development Schedule
All of the cost drivers are listed in Table 1. Each driver can accept one of six possible qualitative ratings: Very Low (VL), Low (L), Nominal (N), High (H), Very High (VH), and Extra High (XH). Not all ratings are valid for all cost drivers. An example rating scale is given for the Required Reusability (RUSE) cost driver in Table 2.
The Size input to the Post-Architecture model, Equation 1,includes adjustments for breakage effects, adaptation, andreuse. Size can be expressed as Unadjusted Function Points(UFP) or thousands of source lines of code (KSLOC).
The COCOMO II size reuse model is nonlinear and is based on research done by Selby [5]. Selby's analysis of reuse costs across nearly 3,000 reused modules in the NASA Software Engineering Laboratory indicates that the reuse cost function is nonlinear in two significant ways (see Figure 1):
- It does not go through the origin. There is generally a cost of about 5% for assessing, selecting, and assimilating the reusable component.
- Small modifications generate disproportionately large costs. This is primarily due to two factors: the cost of understanding the software to be modified, and the relative cost of interface checking.
Figure 1. Non-linear Effects of Reuse (data on 2,954 NASA modules)
The COCOMO II sizing model captures this non-linear effect with six parameters: percentage of design modified (DM); percentage of code modified (CM); percentage of the original integration effort required for integrating the reused software (IM); software understanding (SU) for structure, clarity, and self-descriptiveness; unfamiliarity with the software (UNFM) for programmer knowledge of the reused code; and assessment and assimilation (AA) for fit of the reused module to the application [3].
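As a rough illustration of how these six parameters combine, the sketch below follows the general form of the reuse model in the Model Definition Manual: an adaptation adjustment factor built from DM, CM, and IM, incremented by AA and an SU × UNFM understanding cost. The exact constants and breakpoints should be checked against [3]; the project values here are invented.

```python
def equivalent_ksloc(adapted_ksloc, dm, cm, im, su, unfm, aa):
    """Equivalent new size for reused code (COCOMO II reuse model sketch).

    dm, cm, im: percent of design / code / integration effort modified (0-100)
    su:         software understanding increment (percent)
    unfm:       programmer unfamiliarity (0.0 - 1.0)
    aa:         assessment and assimilation increment (percent)
    """
    aaf = 0.4 * dm + 0.3 * cm + 0.3 * im  # adaptation adjustment factor
    if aaf <= 50:
        aam = (aa + aaf * (1 + 0.02 * su * unfm)) / 100
    else:
        aam = (aa + aaf + su * unfm) / 100
    return adapted_ksloc * aam

# Reusing a 20 KSLOC component unmodified: note the nonzero cost even at
# DM = CM = IM = 0 -- the ~5% assessment cost discussed above, so the
# curve does not go through the origin.
print(equivalent_ksloc(20.0, dm=0, cm=0, im=0, su=30, unfm=0.4, aa=5))
```

With small nonzero modification percentages, the SU × UNFM understanding term makes the equivalent size grow disproportionately fast, matching Selby's second observation.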
3 DATA COLLECTION
Data collection began in September 1994. The data came from organizations that were Affiliates of the Center for Software Engineering at the University of Southern California, and from some other sources. These organizations represent the Commercial, Aerospace, and Federally Funded Research and Development Center (FFRDC) sectors of software development, with Aerospace being most represented in the data.
Data was recorded on a data collection form that asked between 33 and 59 questions, depending on the degree of source code reuse. The data collected was historical, i.e., the observations were completed projects. The data was collected by site visits, phone interviews, or contributors sending in completed forms. As a baseline for the calibration database, some of the COCOMO 1981 projects and Ada COCOMO projects were converted to COCOMO II data inputs. The total number of observations used in the calibration was 83, coming from 18 different organizations.
This dataset formed the basis for an initial calibration.
A frequent question is what defines a line of source code. Appendix B in the Model Definition Manual [3] defines a logical line of code. However, the data collected to date has exhibited local variations in interpretation of counting rules, which is one of the reasons that local calibration produces more accurate model results.
The data collected included the actual effort and schedule spent on a project. Effort is in units of Person Months; a person month is 152 hours and includes development and management hours. Schedule is in calendar months. Adjusted KSLOC is the thousands of source lines of code, adjusted for breakage and reuse. The following histograms show the frequency of responses for this data.
Overall, the 83 data points ranged in size from 2 to 1,300KSLOC, in effort from 6 to 11,400 person months, and inschedule from 4 to 180 months.
We have found that the different definitions of product and development in the 1990s make data sources less comparable than in the 1970s. As a result, even after data normalization, COCOMO II.1997 was less accurate on its 83 projects than COCOMO 1981 was on its 63 projects.
4 MODEL CALIBRATION
The statistical method we used to calibrate the model is called multiple regression analysis. This analysis finds the least squares error solution between the model parameters and the actual effort, PM, expended on the project. The COCOMO model as shown in Equation 1 is a non-linear model. To solve this problem we transform the non-linear model in Equation 1 into a linear model using logarithms to the base e, Equation 2:

ln(PM) = B0 + 1.01 × ln(Size) + Σ(j=1..5) Bj × SFj × ln(Size) + Σ(i=1..17) B(5+i) × ln(EMi)        (2)
The next step was to heuristically set the values of the exponential and multiplicative qualitative cost drivers. This was done using a Delphi process with the COCOMO II Affiliate users. These are called apriori values.
Multiple regression analysis was performed on the linear model in log space. The derived coefficients, Bi, from the regression analysis were used to adjust the apriori values.
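The calibration step can be pictured as ordinary least squares in log space. The toy sketch below fits only a constant (giving A = e^B0) and a single size exponent from a handful of invented projects, rather than the full multi-coefficient model:

```python
import math

# Log-space calibration sketch: Equation 1 is linear in the logs, so
# least squares can fit the coefficients. Here only ln(PM) vs. ln(Size)
# is fit, with closed-form simple-regression formulas. Data is invented.

sizes = [10.0, 32.0, 100.0, 320.0, 1000.0]      # KSLOC (hypothetical)
efforts = [40.0, 150.0, 520.0, 1900.0, 7000.0]  # person months (hypothetical)

xs = [math.log(s) for s in sizes]
ys = [math.log(pm) for pm in efforts]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least squares slope and intercept in log space.
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - slope * x_bar

A = math.exp(b0)  # the multiplicative constant is e raised to B0
print(f"exponent ~ {slope:.2f}, A ~ {A:.2f}")
```

The full calibration follows the same idea with 23 coefficients (B0 through B22), one per term of Equation 2, fit simultaneously across the 83 observations.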
4.1 Results of Effort Calibration
There were 83 observations used in the multiple regression analysis. Of those observations, 59 were used to create a baseline set of coefficients. The response variable was Person Months (PM). The predictor variables were size (adjusted for reuse and breakage) and all of the cost drivers listed in Table 1. The constant, A, is derived by raising e to the coefficient B0 in Equation 2.
To our surprise, some of the coefficients, Bi, were negative. The negative coefficient estimates do not support the ratings for which the data was gathered. To see the effect of a negative coefficient, Table 2 gives the ratings, apriori values, and fully calibrated values for RUSE. The rating for the Required Reusability cost driver, RUSE, captures the additional effort needed to construct components intended for reuse on current or future projects. The apriori model values indicate that as the rating increases from Low (L) to Extra High (XH), the amount of required effort also increases. This rationale is consistent with the results of 12 studies of the relative cost of writing for reuse compiled in [4]. The adjusted values determined from the data sample indicate that as more software is built for wider-ranging reuse, less effort is required. As shown in Figure 4 (diamonds versus triangles), this is inconsistent with the expert-determined multiplier values obtained via the COCOMO II Affiliate representatives.
A possible explanation for the phenomenon is the frequency distribution of the data used to calibrate RUSE. There were many responses of "I don't know" or "It does not apply." These are essentially entered as a Nominal rating in the model. This weakens the data analysis in two ways: via weak dispersion of rating values, and via possibly inaccurate data values. The regression analysis indicated that the variables with negative coefficients were not statistically significant for this dataset. This appears to be due to lack of dispersion for some variables; imprecision of software effort, schedule, and cost driver data; and effects of partially correlated variables.

Figure 2. Data Distribution for Person Months

Figure 3. Data Distribution for Size
Table 2. RUSE Cost Driver Values

Rating  Definition                     Apriori  Adjusted
L       None                           0.89     1.05
N       Across project                 1.00     1.00
H       Across program                 1.16     0.94
VH      Across product line            1.34     0.88
XH      Across multiple product lines  1.56     0.82
Figure 4. RUSE Calibrated Values
4.2 Strategy and Future Calibration
We and the COCOMO II Affiliate users were reluctant to use a pure regression-based set of cost driver values which conflicted with the expert-determined apriori values, such as the triangles in Figure 4 for RUSE. We therefore used a 10% weighted-average approach to determine an aposteriori set of cost driver values as a weighted average of the apriori values and the regression-determined values, using 10% of the data-driven and 90% of the apriori values.
The 10% weighting factor was selected after comparison runs using 0% and 25% weighting factors were found to produce less accurate results than the 10% factor. This moves the model parameters in the direction suggested by the regression coefficients but retains the rationale contained within the apriori values. As more data is used to calibrate the model, a greater percentage of the weight will be given to the regression-determined values. Thus the strategy is to release annual updates to the calibrated parameters, with each succeeding update producing more data-driven parameter values. Hence the COCOMO II model name will have a year date after it identifying the set of values on which the model is based, e.g. COCOMO II.1997.
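The 10% weighted average described above can be illustrated directly with the RUSE values from Table 2. Whether to blend in linear or log space is a modeling choice; a simple linear blend is shown here as a sketch:

```python
# Blend expert (apriori) and regression-determined (data-driven) cost
# driver values. With data_weight=0.10, the aposteriori value sits 10%
# of the way from the expert value toward the data-driven value.

def aposteriori(apriori, data_driven, data_weight=0.10):
    return data_weight * data_driven + (1 - data_weight) * apriori

# RUSE values from Table 2.
ruse_apriori = {"L": 0.89, "N": 1.00, "H": 1.16, "VH": 1.34, "XH": 1.56}
ruse_adjusted = {"L": 1.05, "N": 1.00, "H": 0.94, "VH": 0.88, "XH": 0.82}

for rating in ruse_apriori:
    blended = aposteriori(ruse_apriori[rating], ruse_adjusted[rating])
    print(rating, round(blended, 3))
```

Note that the blended XH value (about 1.49) stays well above 1.0: the expert rationale that wide-ranging reuse costs extra effort is preserved, only moderated by the data.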
The research on the Process Maturity (PMAT) cost driver has shown the data-driven approach to be credible. A larger dataset was used to determine PMAT's influence on effort. PMAT was statistically significant with the larger 112-project dataset, compared to the 83-project dataset discussed here. This was caused by the additional data having a wider dispersion of responses for PMAT across its rating criteria.
Stratification produced very good results. As with the previous COCOMO models, calibration of the constant, A, and the fixed exponent, 1.01, to local conditions is highly recommended. This feature is available in a commercial implementation of COCOMO II, COSTAR's Calico tool, and is also available in the free software tool, USC COCOMO II.1997.1.
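With the exponent held fixed, local calibration of the constant A reduces to averaging the log residuals of an organization's own projects. A sketch with invented project data (the scale-factor sum is folded into the fixed exponent for brevity):

```python
import math

# Re-fit the multiplicative constant A from local project history,
# holding the exponent fixed. Each residual is the part of ln(PM) not
# explained by size and the effort-multiplier product; the mean residual
# in log space gives ln(A). Project data below is hypothetical.

def calibrate_A(projects, exponent=1.01):
    """projects: list of (size_ksloc, em_product, actual_pm) tuples."""
    residuals = [
        math.log(pm) - exponent * math.log(size) - math.log(em)
        for size, em, pm in projects
    ]
    return math.exp(sum(residuals) / len(residuals))

projects = [
    (50.0, 1.10, 160.0),   # (KSLOC, product of EMs, actual PM)
    (120.0, 0.95, 310.0),
    (20.0, 1.00, 55.0),
]
print(round(calibrate_A(projects), 2))
```

This is the single-constant version of the local calibration recommended above; calibrating the exponent as well requires a two-parameter fit over the same residuals.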
4.3 Future Work
With more data we are going to make a thorough examination of the cost drivers that have a negative coefficient. We plan to extend the Bayesian approach to address cost drivers individually. There is also a need to calibrate cost drivers for their influence during maintenance. The distribution of effort across lifecycle phases needs to be updated. This is challenging because of the different lifecycle models in use today, e.g. spiral, iterative, evolutionary. COCOMO II's sizing model is very comprehensive, but there was not enough data to fully check its validity. Additionally, the relationship between Unadjusted Function Points and logical/physical source lines of code needs further study.
REFERENCES
1. Boehm, B., B. Clark, E. Horowitz, C. Westland, R. Madachy, R. Selby, "Cost Models for Future Software Life Cycle Processes: COCOMO 2.0," Annals of Software Engineering Special Volume on Software Process and Product Measurement, J.D. Arthur and S.M. Henry (Eds.), J.C. Baltzer AG, Science Publishers, Amsterdam, The Netherlands, Vol. 1, 1995, pp. 45-60.
2. Clark, B., "The Effects of Process Maturity on Software Development Effort," Ph.D. Dissertation, Computer Science Department, University of Southern California, Aug. 1997.
3. Center for Software Engineering, COCOMO II Model Definition Manual, Computer Science Department, University of Southern California, Los Angeles, CA 90089, http://sunset.usc.edu/Cocomo.html, 1997.
4. Poulin, J., Measuring Software Reuse, Addison-Wesley, Reading, MA, 1997.
5. Selby, R., "Empirically Analyzing Software Reuse in a Production Environment," in Software Reuse: Emerging Technology, W. Tracz (Ed.), IEEE Computer Society Press, 1988, pp. 176-189.
6. Weisberg, S., Applied Linear Regression, 2nd Ed., John Wiley & Sons, 1985.