Abstract: The technology industry is a global industry. Cisco routers are sold all over the world, serviced all over the world, and developed by engineering teams spread across the globe.
Several organizations in the United States are developing products with development teams spread across time zones. Development teams in India, China, Russia, and other low-cost locations allow the project manager to put more resources on the task. In several cases, the presence of an overseas component can turn a project with a negative net present value (NPV) into a project with a positive NPV.
In this paper I enumerate lessons learned from several telecommunications product development experiences. In most cases there were two main teams working on the project, one based in a low-cost location and the other based in North America. Additionally, some members of the team worked from home offices and would commute to the main office only when asked to do so by the project manager.
The overseas team was always part of a dynamic overseas company that specialized in providing development resources for advanced telecommunication ventures. This distinction is important, since we are dealing with established companies with mature processes and a verifiable track record. This paper lists key lessons that can help in managing distributed projects better.
I. INTRODUCTION
When a new project is started, a project manager's primary concern is to get the effort funded and staffed.
Project staffing options range from single site in-house development to multi site development spread across numerous organizations. Multi site projects are also called distributed projects. A product developed across multiple sites is called a distributed product.
Several factors go into making a selection. These include the skill set of the management team and engineering staff, prior experience with multi site development, and the comfort level with foreign cultures and work ethics. Once a decision is made to run the project across multiple sites, the project manager can employ one of three outsourcing models.
The first model is termed Blackbox outsourcing; it implies a high level of independence for all sites and a low level of interaction between sites. At the other end of the spectrum is Integrated outsourcing. This model results in a high level of interaction between sites and a low level of independence for each site. The third option is a combination of the two, or a middle ground.
Every project manager aspires to deliver a high quality product on time and under budget. If the development effort is structured as a multi site effort, then the team faces additional challenges that can derail the project. These challenges can be accentuated if one or more sites are in a different time zone. I have often referred to these extra challenges as distribution risk.
While there are no textbook techniques to mitigate distribution risk in every scenario, there are some guidelines that can help. The application of these guidelines is more art than science, and it demands a high level of flexibility and emotional maturity in the management team.
In this paper, I have summarized some decision points which can help decide what kind of outsourcing model is suited for a project. I have also elaborated on techniques that can improve the quality of a distributed product.
II. PROJECT STRUCTURE: OUTSOURCING DECISIONS
A. First Decision: Integrated Outsourcing or Blackbox Outsourcing
Technology projects typically have a hierarchy of engineers reporting to one or more project managers. The work breakdown is often influenced by the various high level architecture components that form the product.
Work breakdown and staffing is one of the first decisions that a project manager needs to make. If the project team is spread across multiple locations, a wrong step at this stage can guarantee failure at the end.
B. Introduction to Blackbox Outsourcing
Let us consider a project with M locations. The project manager is working with the core team at headquarters, and the rest of the team is remote, spread out over M-1 locations.
In this project, high level product architecture breaks out the product into N components. A simple solution to the staffing quandary is to identify high level components that are complex, but are largely self-contained and have sparse yet well defined interfaces. Such components could be given to a remote team. The remote team would need to be set up as a project team with a strong team leader or project manager. Such a model is referred to as a Blackbox outsourcing model.
In this situation there is very little interaction between engineers on the remote team and those at headquarters, or between remote teams.
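The selection rule described above (complex, largely self-contained, with sparse and well defined interfaces) can be sketched as a simple filter. This is only an illustration, not a procedure taken from the projects discussed in this paper: the component names, the complexity scores, the interface counts, and the thresholds `max_interfaces` and `min_complexity` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    complexity: int           # hypothetical 1-10 score assigned by the architect
    interfaces: int           # number of interfaces to other components
    interfaces_defined: bool  # True if every interface is already specified


def blackbox_candidates(components, max_interfaces=2, min_complexity=5):
    """Return names of components that are complex, self-contained, and well specified."""
    return [
        c.name
        for c in components
        if c.complexity >= min_complexity
        and c.interfaces <= max_interfaces
        and c.interfaces_defined
    ]


# Hypothetical breakdown of a product into three components.
components = [
    Component("gui", complexity=6, interfaces=1, interfaces_defined=True),
    Component("call_control", complexity=9, interfaces=6, interfaces_defined=True),
    Component("billing", complexity=7, interfaces=2, interfaces_defined=False),
]
print(blackbox_candidates(components))  # ['gui']
```

In practice the judgment about what counts as "sparse" or "well defined" rests with the architect; the thresholds here only make that judgment explicit.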
A good example of this approach is where an ancillary piece is spun off to a remote team. For example, on an ATM switch, if the MIBs are in place, the graphical user interface can be easily outsourced using Blackbox outsourcing. In PacketCable Multimedia, the SOAP interface on an existing Application Manager could be developed using this methodology.
Strategic Management of Distributed Technology Projects
Shantnu Sharma, Manager Research and Development, Ellacoya Networks
0-7803-9139-X/05/$20.00 2005 IEEE. 461
This model has several advantages and disadvantages. These are discussed in detail here.
1) Simplicity: If the module developed at a remote site can be completed with little interaction with the core site, this is the recommended approach. There is an implicit assumption that the developed module is either a standalone product or is very straightforward to integrate.
A telephony soft-switch that I was involved with used a database. We decided to switch from an open source database to a commercial database. This was an excellent project for a dedicated, remote team. The interfaces were already in place and did not need to change. The remote team needed very little interaction with the core team. Integration was smooth, as the interfaces were the same.
2) Side stepping communication issues: In Blackbox outsourcing, most communication flows between a point person on the remote side and a point person on the core team. This is simpler to achieve, particularly when compared with everyone on the core team talking with everyone on the remote team(s).
3) Ability to scale up/down: If a new module has to be added to a project, in the Blackbox model that might translate into adding a new development site. This can be accomplished with minimal disruption to other sites. On the other hand, if a module is completed, the site responsible for it can be diverted to work on another component, with minimal disruption to the core team.
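The difference between the two communication patterns can be made concrete by counting cross-site channels. The sketch below is illustrative only: the function name and the team sizes are hypothetical, and it counts only channels between the core team and each remote team, ignoring any traffic between remote teams.

```python
def cross_site_channels(core_size, remote_sizes, point_person=False):
    """Count communication channels between the core team and the remote teams.

    With point persons, each remote team has a single channel to the core team.
    Otherwise every core engineer may talk to every remote engineer.
    """
    if point_person:
        return len(remote_sizes)
    return sum(core_size * remote for remote in remote_sizes)


# Hypothetical staffing: 10 core engineers, remote teams of 8 and 6 engineers.
print(cross_site_channels(10, [8, 6]))                     # 140
print(cross_site_channels(10, [8, 6], point_person=True))  # 2
```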
This may seem like a strong advantage on paper. In reality it is never that simple, because most projects are a combination of Blackbox and Integrated Outsourcing.
4) Difficulty in transitioning sustaining work to low cost locations: In Blackbox outsourcing, teams are made on the basis of modules/protocols/products etc. Every team takes ownership of their particular segment.
It is quite likely that a team at a high cost location has sole ownership and domain expertise in a particular area of the product. After project completion, if the project manager wishes to transition sustaining/enhancement work from the high cost location to a lower cost location, she will run into trouble. The project team will feel threatened that their jobs are being transferred overseas. In many cases they will be justified in feeling vulnerable. Even when the original team is guaranteed work on a new project, they may feel insecure if the old project is transferred. Even if they spend time training the remote team, the training may be half-hearted and may even come at the expense of the new project.
5) Skill set constraints: Sometimes when deadlines are looming, project managers look for an exact skill set match. This is unfortunate, since I am a big believer in hiring smart, high caliber engineers and training them. In Blackbox outsourcing all the vacancies for a module are in one location. It is possible that talent may be available in other locations where the project has a footprint, but such individuals cannot be called upon to help. In my opinion this is a big drawback of the Blackbox outsourcing model, particularly in locations where the demand for engineers exceeds supply.
C. Integrated Outsourcing
Unfortunately, in many real world examples, several practical limitations prevent Blackbox outsourcing from taking place. If a project wants to leverage the advantages of global sourcing, then Integrated outsourcing needs to be considered. In fact, several projects are a combination of both models.
In this approach, local and remote teams are integrated as one team. Team members happen to be working at different locations. In the ATM switch example, the GUI team could be spread across 2 or more geographically disparate locations.
The biggest advantage and disadvantage of this approach is that the project manager and team have to confront and solve the communications issue. If the combined team cannot work as one cohesive team, then the project will not survive. On the other hand, if the team can work as one cohesive unit, they will enjoy tremendous cost and strategic advantages.
Fig. 1 summarizes some key differences between Integrated outsourcing and Blackbox outsourcing.
Fig. 1. Illustrating major differences between the two outsourcing models
The main challenge of the project leader is to transform a distributed team into one integrated virtual team. I have used the following techniques and have found them to be useful in several cases.
1) Leverage advances in telecommunications: Email and telephone remain the primary workhorses of communication.
Distributed projects need to go beyond these conventional means. Instant messaging, SIP video phones, web cams, BlackBerry devices, and similar tools have redefined the meaning of presence and connectivity. It is in the project's interest that key members of the team be available for a quick conversation or email exchange. The project manager has to be careful to strike the right balance between a key engineer's availability for the project and respect for that engineer's private time.
2) Creating cross dependencies and linkages: A classic development cycle for a project staffed by a single person is requirements analysis -> architecture -> design -> coding -> test. This approach is efficient and, with the right amount of reviews and cross checking, can produce a high quality product.
In a distributed environment this strategy can be problematic. Engineers working at remote locations on modules that depend on each other may not communicate until integration time. If a mistake has been made at design/coding time, it may be costly to fix it at integration time.
A simple solution is to keep engineers working on related modules engaged with each other's design and code reviews. Such a solution works well if engineers are loaded at less than 75% in their schedules. When deadlines are aggressive, managers will load engineers at close to 90% levels. In such situations, the engineering staff starts focusing more on individual deliverables and less on the work of others.
Setting up cross dependencies and linkages implies taking a portion of someone's development cycle and giving it to someone else, and vice versa. Assume that engineer A is responsible for module Alpha, being developed at headquarters site H. Engineer B is responsible for module Beta and is working at a remote site R. Alpha and Beta have dependencies that will be resolved at integration time. H is in a high cost location while R is in a low cost location.
A one way cross dependency and linkage is created when B is responsible for the high level design of module Alpha, while A keeps the rest of the module. The corresponding linkage in the reverse direction is created when A is given the task of writing the design document for module Beta, while B keeps the rest of the module. Such a bi-directional linkage and cross dependency keeps personnel at both sites in close contact during all stages of the project.
Fig. 2. Illustrating cross dependencies and linkages. Note that the engineer responsible for module Alpha skipped the design for her own component. Instead she designed module Beta.
A side effect of creating cross dependencies and linkages is that the development time on the linked modules increases. In my opinion, this is compensated by a shorter integration and bug fixing cycle.
Cross linkages also have the effect of spreading expertise across locations. After integration is complete, low cost remote site R can be given the task of maintaining the Alpha code base for bug fixes, enhancements, and new features.
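The bi-directional linkage between modules Alpha and Beta can be written down as a small assignment table. This is only a sketch: the function name `cross_link`, the phase list, and the choice to swap exactly one phase are illustrative assumptions based on the example above.

```python
PHASES = ["requirements", "architecture", "design", "coding", "test"]


def cross_link(owner_a, module_a, owner_b, module_b, swapped_phase="design"):
    """Assign every phase of each module to its owner, except the swapped
    phase, which goes to the owner of the other module."""
    plan = {}
    for module, owner, other in ((module_a, owner_a, owner_b),
                                 (module_b, owner_b, owner_a)):
        plan[module] = {
            phase: (other if phase == swapped_phase else owner)
            for phase in PHASES
        }
    return plan


plan = cross_link("A", "Alpha", "B", "Beta")
print(plan["Alpha"]["design"])  # B designs Alpha
print(plan["Beta"]["design"])   # A designs Beta
print(plan["Alpha"]["coding"])  # A keeps the rest of Alpha
```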
3) Form smaller virtual tiger teams: In large technology projects, new features and enhancements often involve work across several modules. A good example would be increasing the throughput of a telecommunication switch. Such work may require changes in many modules. Project managers often solve such a problem by forming a cross functional tiger team which has the authority and expertise to modify several aspects of the product. If the members of a tiger team are distributed across sites, then they will communicate regularly.
4) Gray box testing across locations: I describe this concept in detail under QA Rule B in Section III. This concept is central to the success of Integrated outsourcing.
5) Development Process is your friend: One of the first things that a mature project manager does is to establish a detailed project development process (PDP). She then rolls it out to all teams. This puts everyone on a level playing field.
Let me illustrate with an example. A remote team was developing an MPLS service API for a core team developing a switch. The project was in the design phase, and the remote team was tasked with developing a design for this API. The core team was expecting an API that compiled out of the box. They expected that the headers would be included in the appendix, and so on. The remote team was more interested in providing high level guidance on how the API would be designed. They did not anticipate providing low level granularity in the design process. The design process got bogged down in endless reviews and acrimony, and the schedule spun out of control. This was a project management nightmare. The two teams were at loggerheads, the schedule was slipping, and customer commitments were being ignored.
If such expectations, for example the level of detail required in a design document, are well defined in the PDP, it becomes much simpler to avoid and resolve these issues.
6) Geographic proximity during Architecture and Integration: In the initial stage of the project, when the architecture is being hashed out, having all the major players in a room can be very cost effective. This might involve needing to get key players from various teams together in one location. It is very difficult to collaborate on a design across time zones. Getting the team together can speed project development.
Similarly integration phase efforts also benefit from physical proximity of engineers. Sometimes, engineers have to change algorithms, interfaces and signatures on the fly to make the system work. This is difficult to accomplish when engineers are not co-located.
The project manager needs to have a fair understanding of the complexity of the design/integration effort. The more complicated and involved the effort, the more likely it is that having key players in one location will pay off. The project manager will have to estimate the cost of bringing everyone into one room, and also the cost saving from accomplishing the effort in a shorter amount of time. Strong consideration should be given to critical path items and to customer impacting deliverables.
7) Project Manager and Architects to visit all sites: I cannot stress enough the need for face time with all the remote locations. In one project, during a visit to a remote site, I learnt that the remote team could not get to the bug database from their office workstations and needed to create bugs from lab machines. This was a small inconvenience that would otherwise never have been fixed. In another case, it was clear that traffic patterns and similar local conditions made it difficult for the remote team to come in early in the morning, as was expected by the core site. Such little things get addressed when senior members visit remote locations.
8) Programming environment: If all engineers have a common programming environment, then it becomes significantly easier for them to find solutions to issues. For example, one project that I was associated with had standardized on Eclipse for all Java debugging. When a remote engineer needed to reproduce a problem seen by the local engineer, the local engineer would simply tar up her Eclipse workspace and FTP it to the remote engineer. Similarly, standardizing on tools to measure complexity, code coverage, memory leaks, and so on helped compare like quantities.
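The estimate described above amounts to comparing the travel cost against the value of the schedule time saved. The following is a minimal sketch under stated assumptions: the function name, the cost categories, and all dollar figures are hypothetical and are not taken from any project described in this paper.

```python
def colocation_pays_off(travel_cost, weeks_saved, weekly_team_cost,
                        weekly_delay_cost=0):
    """Return (net_benefit, decision) for bringing key players to one site.

    net_benefit is the value of the schedule time saved minus the travel cost.
    weekly_delay_cost models critical path or customer impact, if any.
    """
    savings = weeks_saved * (weekly_team_cost + weekly_delay_cost)
    net_benefit = savings - travel_cost
    return net_benefit, net_benefit > 0


# Hypothetical figures, in dollars.
net, go = colocation_pays_off(travel_cost=40_000, weeks_saved=3,
                              weekly_team_cost=25_000,
                              weekly_delay_cost=5_000)
print(net, go)  # 50000 True
```

Items on the critical path or with customer impact would carry a larger `weekly_delay_cost`, which is why they deserve the strongest consideration.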
III. DELIVERING QUALITY PRODUCT: ON TIME AND UNDER BUDGET
A. Quality Assurance (QA) strategy can make or break your project
Technology projects are inherently complex, and a distributed work environment increases complexity severalfold. This additional level of complexity can increase the probability of bugs and deviations from specification, thus increasing the importance of the quality assurance function. A distributed project will fail if the project manager fails to compensate for the problems introduced by location and time zone challenges.
Several distributed development teams will distribute programmers across locations but will consolidate test engineers in one location. This is a recipe for failure. When the product is going through a QA cycle, the test engineers want to interact with developers and vice-versa. This is not always possible because of time zone and geographic constraints. In the end, everybody is frustrated and project managers decide to abandon a distributed work environment. If the Net Present Value (NPV) of a project is negative in North America, the same project could have a positive NPV when distributed across high and low cost locations. Valuable economic opportunities are thus lost.
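The NPV argument can be illustrated with a short calculation. All figures below are hypothetical: a 10% discount rate, three years of identical revenue, and an up-front development cost that is lower for the distributed project even after allowing for the extra cost of managing distribution risk.

```python
def npv(rate, cash_flows):
    """Net present value; cash_flows[0] occurs today, cash_flows[t] at year t."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


# Hypothetical project: development cost up front, then three years of revenue.
rate = 0.10
revenue = [500_000, 500_000, 500_000]
single_site = npv(rate, [-1_300_000] + revenue)   # all work in North America
distributed = npv(rate, [-1_100_000] + revenue)   # high and low cost locations
print(round(single_site), round(distributed))
```

With these assumed numbers the single-site NPV is negative and the distributed NPV is positive, so abandoning the distributed model would mean abandoning the project.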
A distributed product development environment calls for unconventional approaches to QA. In this paper I enumerate several QA rules that have helped with producing high quality software products.
1) QA Rule A: Developers are not testers, but they need to put on a testing hat periodically. At design time, the developer responsible for a given module needs to design a unit and integration test framework for that module. She should also map requirements to design modules, thus designing/developing what will be tested. All programmers should report unit/integration test coverage results in their status reports. I have asked for code coverage numbers greater than 80%. These numbers should be measured by automated tools, and coverage report statistics should be part of the project status report. Several open source packages are available to measure code coverage.
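A coverage target of this kind is easy to check automatically once each module reports a number. The sketch below assumes the coverage percentages have already been produced by whatever tool the project uses; the module names and figures are hypothetical.

```python
def below_threshold(coverage_by_module, threshold=80.0):
    """Return the modules whose reported coverage is not greater than the threshold."""
    return sorted(
        module
        for module, percent in coverage_by_module.items()
        if percent <= threshold
    )


# Hypothetical per-module coverage, as reported by an automated tool.
reported = {"Alpha": 86.5, "Beta": 72.0, "Gamma": 80.0}
print(below_threshold(reported))  # ['Beta', 'Gamma']
```

Because the rule asks for coverage greater than 80%, a module at exactly 80% is flagged as well.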
2) QA Rule B: A distributed work environment needs to have two QA levels, Gray Box and Black Box. Let me first illustrate the difference between these two tiers of test efforts.
Black box testers are a product's first customer. Black box testing is driven by product requirements, public APIs, and documents released to customers. A black box tester needs to be an expert in testing the technology but does not need to know anything about the specific architecture of the product providing the technology. Black box testers are treated like customers, and the project manager is the focal point of communication. Black box testers expect the product to be largely working when they get it.
Black box testers typically require large lab setups and infrastructure. These test beds can be used for testing multiple products. Black box testers don't necessarily need to be part of the product development team.
Gray box testing also tests the product as a whole, just like black box testing. Gray box testing is informal and is done at all product development locations. Gray box testing is performed both by developers and by test engineers, all of whom are part of the project team. A typical gray box tester has a fair understanding of the product architecture and enjoys a high level of communication with the team members. It is important to note that gray box testing is not ad hoc. It needs to be planned, test specifications have to be written, and a record of test outcomes has to be maintained. When the various software pieces are integrated and basic functionality is working, gray box testing can begin.
Why do we have two levels, when there is some overlap between the two? The answer for black box testing is obvious. This is your first customer deliverable/beta site. Without it, the project manager will have doubts about the code base. It is better for a product to crash and burn in-house than at customer sites.
Gray box testing is the additional expense of working in a distributed environment. This is where all the issues introduced by programmers being in remote teams are solved.
3) QA Rule C: QA reports directly to the project manager or designee. The need for a formal contact person is obvious for black box testing. Black box testers are your first customers, and when they find an issue, it needs to be reported to someone with significant authority on the project.
The need for such formality is not obvious in gray box testing, but that is where it is most critical. A large percentage of bugs uncovered during gray box testing are caused by communication breakdowns. Consider a case where all modules behave correctly in unit testing but the product as a whole has an issue. Was the API incorrectly implemented or incorrectly used? Why was this issue not found in unit/integration testing? The project manager needs to quickly resolve these issues and make decisions. Individual developers could also make such decisions, but they are kept apart by time and distance barriers.
4) QA Rule D: QA induced requirement churn needs to be carefully managed. When a project is started, requirements are gathered and captured. As the product develops, there needs to be a feedback loop into the requirements and opportunities to modify them. As the team gains more domain expertise, some requirements make more sense while others make less. As the product moves from integration to gray box to black box testing, the flexibility in changing requirements should decrease. Consider a case where a gray box tester finds an issue at remote site A and reports it directly to a developer at remote site B. The issue is fixed. Unfortunately, the project manager at headquarters is not in the loop. Perhaps this issue did not need to be fixed because it is not a requirement. Or perhaps it was a coding error that should have been caught by unit tests. The former case will cause requirement churn, a very expensive and time consuming process.
5) QA Rule E: Test plans should be detailed and automated. It is the responsibility of the project manager or a designated person to verify that the test plan is detailed. As a rule of thumb, I make sure that the test plan can be executed by a new hire with minimal or no access to the team. This might involve specifying the actual commands, scripts, and so on that need to be executed. A typical test plan might say, "Add an IP Address to the interface." A detailed plan would go a step further and specify the CLI command. It might say, "Add an IP Address to the interface by running the following command.."
A detailed test plan can help with distributing the workload across locations. Perhaps a senior person could write the plan, while an intern working in a different time zone runs it with minimal help from the author. In many cases it is not even practical for the two parties involved to communicate continuously, because of geographic and time differences. Only a detailed test plan can help in such a scenario.
Additionally, a detailed test plan is much easier to automate, and it can be automated by someone who may not be part of the core team.
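One way to see why a detailed plan is easier to automate is to write each step as data: a description, the exact command, and the expected result. The sketch below is an assumption about how such a plan could be represented, not a tool used on the projects described here. The command shown is a placeholder that only prints a message; a real plan would name the product's actual CLI command.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class PlanStep:
    description: str  # what the tester is asked to do
    command: list     # the exact command to run, as an argument list
    expected: str     # text that must appear in the command output


def run_plan(steps):
    """Run each step in order and return a list of (description, passed)."""
    results = []
    for step in steps:
        done = subprocess.run(step.command, capture_output=True, text=True)
        passed = done.returncode == 0 and step.expected in done.stdout
        results.append((step.description, passed))
    return results


# Placeholder command standing in for the product's real CLI.
plan = [
    PlanStep(
        description="Add an IP address to the interface",
        command=[sys.executable, "-c", "print('address added')"],
        expected="address added",
    )
]
print(run_plan(plan))
```

A plan written at this level of detail can be run by someone in another time zone without help from its author, which is the point of the rule.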
6) QA Rule F: The trend is your friend. The project manager should use a formal bug tracking system for all phases of product development, from requirement analysis all the way to customer support. Bug occurrences should be graphed in various domains (such as time, complexity, and phases) for sub modules, the system, and phases.
Consider a software product with N modules, labeled A to N. Each module is staffed by one or more engineers. Every engineer can work on one or more modules. Every module has a different size and level of complexity.
The project manager has to decide what kinds of graphs to produce and report. At a minimum, a simple graph of the number of bugs versus time should be reported for every module and for the system. More advanced graphs based on complexity, lines of code, development phases, and other parameters can be created.
Let us assume that the project manager is tracking the number of changes in the requirements document against the project phase. When the project is in requirements analysis, that number is high. When the project is in the black box testing phase, that number should be tending to zero. If a set of requirements then needs to be changed in the middle of the project, the change will show up as a spike in the graph. The project manager can determine whether there will be massive redesign and re-coding and take corrective steps.
Such proactive compensation is possible only if the requirement change is treated as a bug in the requirement analysis phase and logged. Project manager can periodically graph and report bug occurrences against various scales and observe trends.
Let us consider the requirements bug again. At the beginning of the project we should expect to see a high occurrence of requirement flaws. However, as we move down the time line, the frequency of such bugs should decrease, and we should see a rapid fall in bug occurrences. For large, complex, distributed projects such graphed data can provide pointers into problem areas. A smart project manager can then take proactive steps to address such concerns.
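The trend described above can be computed directly from the bug tracking data before it is graphed. The following is a minimal sketch: the phase names, the log format, and the rule that any rise over the previous phase counts as a spike are illustrative assumptions, and the bug counts are invented.

```python
from collections import Counter

PHASES = ["requirements", "design", "coding", "integration",
          "gray box", "black box"]


def bugs_per_phase(bug_log, module=None):
    """Count logged bugs per phase, optionally restricted to one module."""
    counts = Counter(
        phase for mod, phase in bug_log if module is None or mod == module
    )
    return [counts[phase] for phase in PHASES]


def spikes(series):
    """Return the indices where the count rises instead of continuing to fall."""
    return [i for i in range(1, len(series)) if series[i] > series[i - 1]]


# Hypothetical log of (module, phase in which a requirement bug was logged).
bug_log = (
    [("Alpha", "requirements")] * 9
    + [("Alpha", "design")] * 4
    + [("Alpha", "coding")] * 2
    + [("Alpha", "integration")] * 6   # late requirement change
    + [("Alpha", "gray box")] * 1
)
trend = bugs_per_phase(bug_log, module="Alpha")
print(trend)          # [9, 4, 2, 6, 1, 0]
print(spikes(trend))  # [3]
```

Here index 3 corresponds to the integration phase, which is where the project manager would look for a requirement change that may force redesign and re-coding.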
IV. CONCLUSION
Multi site development with a low cost center component is the reality of product development today. Such efforts, while having a low upfront cost, introduce increased risk into the project. These risks, if not managed properly, can increase the cost of development, stretch the schedule, and result in a low quality product. Project managers should take proactive steps to counter potential pitfalls and issues to preserve the viability of the project.
ACKNOWLEDGMENT
The author would like to thank V Sharma for editorial