Data mining applications in BT

BT Technology Journal • Vol 25 Nos 3 & 4 • July/October 2007

R Shortland and R Scarfe

With the increased use of computers there is an ever increasing volume of data being generated and stored. This can lead to companies becoming 'data rich and information poor'. This paper describes how BT has used data mining techniques to convert volume data into high-value information which can be used to aid decision making in a number of key business processes. The benefit of actively using data, as opposed to passively storing it, is demonstrated via a number of case studies which cover areas as diverse as fault diagnosis, fraud detection, market segmentation, credit vetting and litigation assessment.

1. Introduction

'It has been estimated that the amount of information in the world doubles every 20 months. The size and number of databases probably increases even faster.' Frawley et al [1]

With the increased use of computers, there is an ever increasing volume of data being generated and stored. The sheer volume held in corporate databases is already too large for manual analysis and, as they grow, the problem is compounded. Furthermore, in many companies data is held only as a record or archive. BT has huge volumes of data from 20 million customer accounts, call records, equipment records and fault logs. Potentially valuable information is hidden within these databases and is underexploited. As Sir John Harvey-Jones says:

'IT has failed to move from data processing to becoming a key strategic weapon to change businesses in ways to beat the competition. The real value of IT is only realised if you change the way business is done.' [2]

This paper presents a number of case studies demonstrating how data mining is being used to exploit valuable data. Data mining encompasses a range of techniques which aim to create value from volume and form the foundation of decision making (Fig 1) [3].
Data mining does not have a formal definition, and views on its meaning differ. For the purpose of this paper, data mining is defined as the process of extracting implicit information from databases, often by using various computerised analysis techniques in combination. These techniques are drawn from the disciplines of data analysis, machine learning [4], and data visualisation [5]. They include cluster analysis, dimensional compression, neural networks, and tree and rule induction.

Fig 1 From volume to value [3]. (Figure labels: decisions, knowledge, information, data; value, volume.)

The case studies presented in this paper highlight the use of these techniques and the wealth of information contained in databases. The studies presented are in the following areas:

• identifying faults on printed circuit boards,
• discovering the organisational structure of groups of criminals,
• market segmentation and customer characterisation,
• predicting outcomes of credit assessment and litigation.

2. Applications

2.1 Background

By way of introduction it is worth considering the wide range of applications where data mining has been used. Table 1, derived from Frawley et al [1], is not intended as an exhaustive list but demonstrates the range and diversity of applications.
Breadth as well as depth • THIS PAPER ORIGINALLY APPEARED IN Vol 12, No 4, 1994

Table 1 Data mining applications.

2.2 Case study 1 – fault diagnosis

One of the earliest studies of data mining in BT employed neural networks for automatic diagnosis of faults in line cards used in digital switches [6]. Previous attempts using a 'deep model' expert system approach [7] were hampered by the time and expertise required to build and maintain an adequate model based on a knowledge of the circuit functions. Neural networks offered the possibility of automatically obtaining a 'shallow' model (implicit in the trained network) sufficiently detailed to diagnose the fault classes to the required level of accuracy. This model was based on readily available past experience in the form of previous test results and diagnostic outcomes held in repair databases.

Machine learning¹ classification techniques attempt to build rules that distinguish between classes. They require data to be presented as a set of attributes followed by the class to which the example belongs, as shown in Fig 2. In the case of the fault diagnosis system, the test data was applied to the input of a neural network, which was then trained to recognise the appropriate fault class. By way of illustration, Fig 3 shows six examples of test outcomes for two of the possible fault types. In practice, the training data consisted of about 250 example cases which had been classified using manual techniques.

Fig 2 Attributes for classification.

One issue which arose in this case study was that of 'missing data'. The full test procedure involved 77 separate tests. In practice, to improve test machine throughput, the process was terminated once an abnormal test result was identified. Incomplete tests therefore resulted in values missing from the test set.
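The attribute-plus-class layout described above can be sketched in Python. This is only an illustration under stated assumptions: the function name `parse_case` is hypothetical, the two rows are copied from Fig 3 (six of the 77 tests), and '?' is assumed to mark a test that was not run.

```python
from typing import List, Optional, Tuple

# One training example: a list of test results followed by its fault class.
Case = Tuple[List[Optional[float]], str]


def parse_case(fields: List[str]) -> Case:
    """Split one record into (attribute values, class label).

    The last field is taken as the class; every earlier field is a test
    result. A '?' is assumed to mean the test was never run, so it is
    stored as None rather than as a number.
    """
    *raw_attributes, label = fields
    attributes = [None if value == "?" else float(value) for value in raw_attributes]
    return attributes, label


# Two rows in the style of Fig 3 (hypothetical subset of the full test set).
cases = [
    parse_case(["3.700", "-3.700", "-0.002", "-1.914", "4.000", "?", "Component 1"]),
    parse_case(["39.400", "-39.00", "-0.003", "-2.054", "0.160", "7", "Component 2"]),
]
```

Keeping missing values as `None` defers the choice of a replacement value to a later preprocessing step.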
A neural network expects an input value for each attribute and consequently incomplete test sequences pose a problem. Three ways of tackling this problem were considered:

• eliminate examples with missing values,
• generate a random number within the valid range for the attribute,
• use a fixed number at the mid-point of the valid range.

The first option was not viable as most examples had missing entries to some degree. Of the remaining two, using the mid-point of the valid range proved the more successful [6].

A further lesson learned was that the best results could be obtained by limiting the number of fault classes. Initial experiments examining the ten fault classes were unable to achieve acceptable accuracy. However, analysis of the data revealed that the complexity of the initial task could be reduced significantly. By exploiting the fact that 85% of faults were due to only four component types (see Fig 4), a classifier was built which achieved an overall classification accuracy of 92%.
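The two preprocessing steps above, mid-point substitution and restricting the task to the most frequent fault classes, might look as follows. This is a minimal sketch, not the authors' implementation: the function names, the valid ranges and the example labels are all hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def impute_midpoint(
    attributes: Sequence[Optional[float]],
    valid_ranges: Sequence[Tuple[float, float]],
) -> List[float]:
    """Replace each missing value (None) with the mid-point of its valid range."""
    if len(attributes) != len(valid_ranges):
        raise ValueError("one (low, high) range is needed per attribute")
    return [
        (low + high) / 2.0 if value is None else value
        for value, (low, high) in zip(attributes, valid_ranges)
    ]


def most_frequent_classes(labels: Sequence[str], coverage: float) -> List[str]:
    """Return the smallest set of most frequent classes that together
    account for at least `coverage` (a fraction, 0 to 1) of the examples."""
    total = len(labels)
    chosen: List[str] = []
    covered = 0
    for label, count in Counter(labels).most_common():
        if covered / total >= coverage:
            break
        chosen.append(label)
        covered += count
    return chosen


# Hypothetical usage: two attributes with assumed valid ranges.
filled = impute_midpoint([1.0, None], [(0.0, 4.0), (0.0, 10.0)])
kept = most_frequent_classes(["a"] * 5 + ["b"] * 3 + ["c"] * 2, coverage=0.8)
```

Examples whose class falls outside the returned set would be left to a second diagnostic pass rather than forced into the classifier.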
The neural network classifier matched the accuracy of the 'deep model' approach but provided much faster classification and offered an efficient 'first pass' diagnosis. The remaining 15% of faults could be diagnosed during a 'second pass' using the 'deep model'.

Table 1 contents (derived from Frawley et al [1]):

Industry                 Application areas
Medicine                 biomedicine, drug side effects, hospital cost containment, genetic sequence analysis and prediction
Finance                  credit approval, bankruptcy prediction, stock market prediction, securities, fraud detection, detection of unauthorised access to credit data, mutual fund selection
Agriculture              soya bean and tomato disease classification
Social                   demographic data, voting trends, election results
Marketing and sales      identification of socio-economic subgroups showing unusual behaviour, retail shopping patterns, product analysis, frequent flying patterns, sales prediction
Insurance                detection of fraudulent and excessive claims, claims 'unbundling'
Engineering              automotive diagnostic expert systems, Hubble space telescope, computer aided design (CAD) databases, job estimates
Physics and chemistry    electrochemistry, superconductivity research
Military                 intelligence analysis, data fusion and other classified applications
Law                      tax and welfare fraud, fingerprint matching, recovery of stolen cars

Fig 2 column layout: Attr 1, Attr 2, Attr 3, Attr 4, Attr 5, Attr n, Class.

Fig 3 Test case data format.

Test 1    Test 2    Test 3    Test 4    Test 5    Test 77   Fault type
3.700     -3.700    -0.002    -1.914    4.000     ?         Component 1
39.400    -39.00    -0.003    -2.054    0.160     7         Component 2
39.900    -39.10    -0.002    -0.518    0.160     7         Component 2
-7.400    -55.80    -0.002    -0.032    4.00      ?         Component 1
38.300    -39.10    -0.001    -0.518    0.230     8         Component 2
2.100     -2.100    -0.002    -0.518    4.00      ?         Component 1

Fig 4 Identifying problems.

¹ Unless explicitly specified, in this paper machine learning refers to that area of machine learning concerned with building classifiers.

2.3 Case study 2 – fraud

This case study shows how visualisation can be used to identify relationships between entities in a database in order to support further exploration. Investigating a fraud perpetrated against the telephone network rapidly revealed features that suggest data mining is of primary interest:

• fraudsters are often part of a highly organised gang – implying that the data might have structure,
• many of those committing crime can be identified individually,
• very often extensive call records are associated with crime,
• sufficient data has to be available to identify the fraud and the fraudsters.

To assist in identifying a criminal hierarchy, a data visualisation tool was used to identify calling patterns from a large number of call records. Figure 5 and Figure 6 show telephones represented as nodes on the circumference of a circle and telephone calls as links between them. Figure 5 gives an impression of the complexity of the initial problem. The approach taken was to explore the theories of security investigators. The most significant of these was that 'premium rate services targeted by the largest number of fraudsters are most likely to be part of an organised crime.' This was explored by simplifying the display to show calls only to premium rate services of interest (as shown in Fig 6).

Fig 5 Initial view of the fraud problem.

Fig 6 Major fraud problem.

By exploring similar theories the investigators successively refined their understanding of how each of the fraudsters fitted into the criminals' organisational structure.
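The investigators' theory, that premium rate services called by the largest number of suspects deserve attention first, can be expressed as a simple filter over call records. The sketch below is an assumption about how such a filter could be written, not a description of the visualisation tool used; the record format, function names and telephone identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

# A call record reduced to (calling number, called number).
Call = Tuple[str, str]


def rank_services(
    calls: Sequence[Call], suspects: Set[str], premium_services: Set[str]
) -> List[Tuple[str, int]]:
    """Rank premium rate services by how many distinct suspects called them."""
    callers: Dict[str, Set[str]] = defaultdict(set)
    for caller, called in calls:
        if caller in suspects and called in premium_services:
            callers[called].add(caller)
    # Most-targeted service first; ties broken by service identifier.
    return sorted(
        ((service, len(who)) for service, who in callers.items()),
        key=lambda item: (-item[1], item[0]),
    )


def links_to(calls: Sequence[Call], services_of_interest: Set[str]) -> List[Call]:
    """Keep only the links that end at a service of interest (the simplified view)."""
    return [call for call in calls if call[1] in services_of_interest]


# Hypothetical call records.
calls = [("A", "P1"), ("B", "P1"), ("A", "P2"), ("C", "X"), ("A", "P1")]
ranking = rank_services(calls, suspects={"A", "B", "C"}, premium_services={"P1", "P2"})
simplified = links_to(calls, {ranking[0][0]})
```

Counting distinct callers rather than calls stops one heavy user from dominating the ranking.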
Such information has allowed investigators to concentrate their effort on people who were most likely to be the ringleaders. This has resulted in a number of successful arrests.

In the United States, where this type of fraud originated, it is estimated to be a multi-billion dollar problem. It is relatively new in the United Kingdom and occurs on a much smaller scale. By being proactive it is possible to limit it and, in many cases, to prevent it from becoming such a problem here.

(Fig 4 chart: component failure distribution for line cards used in digital switches; number of instances, on a scale of 0 to 100, for component 1 to component 7 and others.)

(Figs 5 and 6 node labels: suspects, mobile telephones, premium rate services.)

2.4 Case study 3 – marketing

Marketing campaigns [8] start with a set of criteria – for example, the product(s) to be marketed, the intended sector, and the geographical area. Machine learning [4] can be applied at the initial stage to characterise customers who already have the product; this characterisation is then incorporated into the campaign criteria to optimise the targeting and response rate (Fig 7).

Fig 7 Targeting customers.

As the campaign progresses, machine learning can be used to characterise those customers who respond positively and to refine the targeting for subsequent cycles of the campaign. This leads to a greater level of efficiency, achieved by targeting customers with a high probability of accepting the product.

The following case study shows how machine learning has been used to characterise customers. The training data set comprised 2000 customers who had opted either for or against a raft of network services. Figure 8 shows part of a decision tree [4] produced by machine learning. This work highlights the 'maximum call charge' as being the most significant indicator for determining whether or not a customer is likely to use network services. The next most significant factor is the 'itemised bill' indicator. It can be seen that customers with itemised bills and call charges greater than £56 are very likely (83% probability) to use network services. Those who do not have itemised bills and have call charges less than £56 are unlikely (87% probability) to use network services.
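One common way for tree induction to decide that an attribute such as 'maximum call charge' is the most significant indicator is to pick the split with the highest information gain. The paper does not state which criterion was used, so the sketch below is an assumption, and the eight customers are invented toy data chosen so that the best split falls at £56.

```python
from collections import Counter
from math import log2
from typing import Hashable, Optional, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(
    values: Sequence[float], labels: Sequence[Hashable], threshold: float
) -> float:
    """Reduction in entropy from splitting at `value <= threshold`."""
    left = [label for value, label in zip(values, labels) if value <= threshold]
    right = [label for value, label in zip(values, labels) if value > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing gains nothing
    total = len(labels)
    remainder = (len(left) / total) * entropy(left) + (len(right) / total) * entropy(right)
    return entropy(labels) - remainder


def best_threshold(values: Sequence[float], labels: Sequence[Hashable]) -> Optional[float]:
    """Observed value giving the highest-gain split, or None if values do not vary."""
    candidates = sorted(set(values))[:-1]
    return max(candidates, key=lambda t: information_gain(values, labels, t), default=None)


# Invented toy data: maximum call charge (in pounds) and whether the
# customer uses network services.
charges = [20, 30, 40, 56, 60, 80, 120, 150]
uses_services = [False, False, False, False, True, True, True, True]
split = best_threshold(charges, uses_services)
```

Applying the same search to every attribute and choosing the highest gain gives the root of the tree; repeating it within each branch gives the lower levels.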
The insights gained by the automatic generation of decision trees can be used in a number of ways – for targeting offers to people with similar characteristics, and to help understand why people choose or disregard a service. Machine learning also offers the following benefits to marketing campaign support:

• reduced mailing costs, through better targeting and higher response rates,
• better targeted calling lists,
• explicit definition of the implicit market segments – giving a better understanding of the customer base and helping to design campaigns to fit the market segment,
• avoidance of annoyance on the part of those who would be uninterested.

(Fig 7 block labels: campaign criteria, campaign targeting, machine learning, mailing list, customer database, general customer attributes, target customer list, selection criteria, customer selection criteria, campaign response database, target customer attributes, campaign.)

Fig 8 Part of the decision tree for network services. (Nodes: maximum call charge (≤ £56, > £56); itemised bill (yes, no), appearing twice; average call charge (> £100, ≤ £100); directory status (ex-directory, normal). Leaves: non-network services customers (p = 0.87), network services customers (p = 0.83), non-network services customers (p = 0.57), network services customers (p = 0.65), network services customers, non-network services customers.)

2.5 Case study 4 – credit assessment

Traditional methods of credit assessment use 'score cards' [9] that are often designed and maintained by independent agencies and use data from various sources to determine credit-worthiness. Although score cards are well established they have a number of disadvantages. They are costly to develop, maintain and use, since they are owned and operated by the credit reference agency and access external bureau data.

By contrast the credit assessment system in this case study, which uses machine learning classification, relies only on the internal customer databases (see Fig 9). Consequently there are few ongoing costs associated with the use of the data. Prediction models have been built using the historical data from a large sample of representative customers, using sales details, previous behaviour and actual outcomes. At the time of writing, trials involving more than 16 000 customer accounts are being monitored, with initial results showing that a high level of correct predictions has been attained.

2.6 Case study 5 – litigation assessment

When a customer consistently refuses to pay an invoice, few options are left, namely:

• direct customer contact – to negotiate a settlement,
• employ a debt collection agency,
• sue the customer through the courts,
• do nothing and write off the entire debt.

The customer response will be one of the following:

• to pay the full debt,
• to pay a part of the debt,
• to pay by instalments,
• to pay nothing.

This case study explored a strategy to reduce litigation costs whilst maintaining income, e.g. negotiating settlements to avoid defended litigation cases that would otherwise incur high costs for relatively small income. Machine-learning techniques were used to automatically generate prediction models using historical data relating to customers' payment history and litigation behaviour.

The initial experiments based on eight litigation outcomes produced high error rates. In order to achieve better classification results, two steps were taken. Firstly, as in the fault diagnosis case, the complexity of the problem was reduced. After consultation with the litigation experts, the task was divided into two sub-problems, each with different types of outcome. Secondly, composite attributes were created; the litigation experts speculated that a linear combination of two attributes could be predictive.
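The value of a composite attribute can be shown with a small sketch. A decision tree tests one attribute at a time, so classes separated only by a combination of two attributes cannot be split cleanly until that combination is added as a column. The attribute names, weights and rows below are invented for illustration and are not taken from the litigation data.

```python
from typing import Dict, List, Sequence


def add_composite(
    rows: Sequence[Dict[str, float]],
    first: str,
    second: str,
    weight_first: float,
    weight_second: float,
    name: str,
) -> List[Dict[str, float]]:
    """Return copies of the rows with a new column holding
    weight_first * row[first] + weight_second * row[second]."""
    return [
        {**row, name: weight_first * row[first] + weight_second * row[second]}
        for row in rows
    ]


def separable_by_threshold(values: Sequence[float], labels: Sequence[str]) -> bool:
    """True if a single threshold on `values` separates two classes perfectly."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        return False
    first = [v for v, label in zip(values, labels) if label == classes[0]]
    second = [v for v, label in zip(values, labels) if label == classes[1]]
    return max(first) < min(second) or max(second) < min(first)


# Invented rows: neither x nor y alone separates the classes, but x + y does.
rows = [
    {"x": 1.0, "y": 8.0}, {"x": 8.0, "y": 1.0}, {"x": 4.0, "y": 4.0},
    {"x": 3.0, "y": 9.0}, {"x": 9.0, "y": 3.0}, {"x": 6.0, "y": 6.0},
]
labels = ["A", "A", "A", "B", "B", "B"]
extended = add_composite(rows, "x", "y", 1.0, 1.0, name="x_plus_y")
```

In practice the weights would come from domain experts, as in the case study, or from a preliminary regression.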
Although some methods, for example neural networks, combine attributes implicitly, others such as decision trees do not; hence the need to create composite attributes in some cases. The resultant decision trees affirmed the expectations of the litigation experts and produced strong correlation with previously unseen historical data.

Models have been built from historical data for about 8000 customers, using previous behaviour and litigation outcomes. Predictions were made for 1300 current litigation cases. At the time of writing, the trial is still live but initial results show that a high level of correct predictions has been obtained.

3. Conclusions

The authors have demonstrated how volume data may be converted to high-value information through a series of case studies covering a broad range of applications. Data mining techniques have been shown to provide significant benefits either in terms of cost savings or in revenue generation. What has also been shown, particularly in the case of fraud, is that the combination of data mining with human expertise is highly effective.

A number of lessons have been learnt from these studies. Firstly, simply throwing a machine-learning system at a database is unlikely to yield good results. A significant amount of effort is required to preprocess data and understand its meaning in the problem domain. Specialist domain knowledge will almost certainly be required.

Secondly, a good deal of problem simplification is likely to be needed if high accuracy results are to be obtained. This inevitably requires an element of compromise between overall business goals and what is practically achievable.

Lastly, and perhaps most importantly, data mining alone will not yield business benefits. To be successful, business processes must be changed to deliver them. The mind-set which views data as something to be archived has to be changed to one which views it as a valuable resource to be exploited.
Fig 9 Prediction system based on internal customer database. (Block labels: customer, credit assessment system, agreed sale and payments, terms of sale, sales request, customer database, periodic training, customer attributes.)

References

1 Frawley W J et al: 'Knowledge discovery in databases: an overview', AI Magazine (Fall 1992).
2 Massey J and Newing R: 'Trouble in mind', Computing, pp 44–45 (12 May 1994).
3 Integral Solutions Limited, Data Mining publicity material (October 1993).
4 Limb P R and Meggs G J: 'Data mining – tools and techniques', BT Technol J, 12, No 4, pp 32–41 (October 1994).
5 Walker G R et al: 'Visualisation of telecommunications network data', BT Technol J, 11, No 4, pp 54–63 (1993).
6 Totton K and Limb P R: 'Experience in using neural networks for electronic diagnosis', IEE, Proceedings of Second International Conference on Artificial Neural Networks, Bournemouth, UK (18–20 November 1991).
7 Kennett D and Totton K: 'Experience with an expert diagnostic system shell', IFIP Workshop on KBS for Test and Diagnosis, Grenoble (27–29 September 1988).
8 Openshaw S: 'A review of the opportunities and problems in applying neurocomputing methods to marketing applications', Journal of Targeting, Measurement and Analysis for Marketing, 1, No 2 (Autumn 1992).
9 Lewis E M: 'An introduction to credit scoring', The Athena Press, California (1992).