• 1. Data Mining
    Sue Walsh, Higher Education Consulting, SAS
    Copyright © 2006, SAS Institute Inc. All rights reserved.
  • 2. Overview
      • Brief Historical Perspective
      • Defining Data Mining
      • Issues
          − Data Collection and Data Organization
          − Modeling Issues and Data Difficulties
          − Skepticism and Communication
      • Applications
      • SAS Enterprise Miner Demonstration
      • SAS Enterprise Miner versus SAS/STAT
      • Another Kind of Data Mining - Text Mining
  • 3. History
  • 4. Data Mining, circa 1963
      IBM 7090, 600 cases
      “Machine storage limitations restricted the total number of variables which could be considered at one time to 25.”
  • 5. Since 1963
      • Moore’s Law: The information density on silicon-integrated circuits doubles every 18 to 24 months.
      • Parkinson’s Law: Work expands to fill the time available for its completion.
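      As a rough, illustrative calculation, assuming the slide's 1963 starting point and taking 2006 (the year of the copyright notice) as the end point, the doubling rate stated in Moore’s Law compounds as follows:

```latex
\begin{aligned}
\text{growth factor} &= 2^{t/T}, \qquad t = 2006 - 1963 = 43 \text{ years} \\
T = 2 \text{ years}: \quad 2^{43/2} &= 2^{21.5} \approx 3.0 \times 10^{6} \\
T = 1.5 \text{ years}: \quad 2^{43/1.5} &\approx 2^{28.7} \approx 4.3 \times 10^{8}
\end{aligned}
```

      Depending on which end of the 18-to-24-month range is used, that is roughly a three-million-fold to several-hundred-million-fold increase in information density.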
  • 6. Data Deluge
      [Word-cloud graphic; its labels are garbled in the transcript and not recoverable.]
  • 7. The Data
                  Experimental          Opportunistic
      Purpose     Research              Operational
      Value       Scientific            Commercial
      Generation  Actively controlled   Passively observed
      Size        Small                 Massive
      Hygiene     Clean                 Dirty
      State       Static                Dynamic
  • 8. The Origins of Data Mining
      Data mining draws on statistics, pattern recognition, neurocomputing, machine learning, AI, databases, and KDD (knowledge discovery in databases).
  • 9. Solving the Data Puzzle - a Step-by-Step Approach
      • Data collection
          − Transactional systems
          − Customer information systems
      • Data organization
      • Data analysis
      • Reporting
      [Side labels: “Business Decisions” and “The Result”]
  • 10. Definition
  • 11. What Is Data Mining?
      • IT − Complicated database queries
      • ML − Inductive learning from examples
      • Stat − What we were taught not to do
  • 12. Data Mining – The SAS Definition
      Advanced methods for exploring and modeling relationships in large amounts of data.
  • 13. Solving the Data Puzzle - a Step-by-Step Approach
      • Data collection
          − Transactional systems
          − Customer information systems
      • Data organization - data warehousing
      • Data analysis - data mining
      • Reporting
      • Action
  • 14. The SAS Approach to Data Mining: SEMMA
      Sample, Explore, Modify, Model, Assess
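      The five SEMMA steps can be illustrated with a toy Python sketch. This is not SAS Enterprise Miner code: the function names, the synthetic data, the mean imputation used for "Modify", and the single-cutoff classifier used for "Model" are all illustrative assumptions chosen to keep each step to a few lines.

```python
import random
import statistics

# Each record is (x, y): one numeric input (None if missing) and a 0/1 target.

def sample(rows, fraction, seed=0):
    """Sample: draw a reproducible random subset of the rows."""
    k = max(1, int(len(rows) * fraction))
    return random.Random(seed).sample(rows, k)

def explore(rows):
    """Explore: count rows and missing inputs, and compute the mean input."""
    xs = [x for x, _ in rows if x is not None]
    return {"n": len(rows), "missing": len(rows) - len(xs), "mean_x": statistics.mean(xs)}

def modify(rows, fill_value):
    """Modify: replace missing inputs with fill_value (mean imputation here)."""
    return [(fill_value if x is None else x, y) for x, y in rows]

def accuracy(cutoff, rows):
    """Share of rows where the rule 'predict 1 when x >= cutoff' is correct."""
    return sum((x >= cutoff) == (y == 1) for x, y in rows) / len(rows)

def model(rows):
    """Model: choose the cutoff, among observed x values, with the best training accuracy."""
    return max(sorted({x for x, _ in rows}), key=lambda c: accuracy(c, rows))

def assess(cutoff, rows):
    """Assess: accuracy of the fitted rule on rows not used for training."""
    return accuracy(cutoff, rows)

if __name__ == "__main__":
    rng = random.Random(42)
    data = []
    for _ in range(2000):
        x = rng.gauss(0.0, 1.0)
        y = 1 if x + rng.gauss(0.0, 0.5) > 0 else 0
        data.append((None if rng.random() < 0.05 else x, y))  # about 5% missing inputs
    s = sample(data, 0.5)
    summary = explore(s)
    clean = modify(s, summary["mean_x"])
    train, holdout = clean[: len(clean) // 2], clean[len(clean) // 2:]
    cutoff = model(train)
    print(summary, round(cutoff, 3), round(assess(cutoff, holdout), 3))
```

      Keeping each step as its own function mirrors the way SEMMA separates concerns: the holdout rows used in "Assess" never influence the cutoff chosen in "Model".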
  • 15. Issues
  • 16. Data Collection and Data Organization
      • What data has been collected, and where is it?
      • How do I combine legacy systems with current data systems?
          − Customer Story
      • What is the meaning of some of these data values?
  • 17. Modeling Issues and Data Difficulties
      • Data Preparation
      • Rare or Unknown Targets
          − Oversampling
      • Undercoverage
      • Dirty Data
          − Errors
          − Missing Values
      • Dimension Reduction (Variable Selection)
      • Underfitting and Overfitting
      • Temporal Infidelity
      • Model Evaluation
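      For rare targets, a common tactic is to build a training sample that keeps every event but only a random subset of non-events, then correct the model's predicted probabilities back to the population event rate. The sketch below is a minimal Python illustration under those assumptions; the function names are hypothetical, and the correction shown is the standard Bayes prior adjustment, not necessarily the exact one any particular SAS node applies.

```python
import random

def separate_sample(rows, is_event, event_share, seed=0):
    """Keep every event row and draw just enough non-event rows so that events
    make up roughly event_share of the training sample."""
    if not 0 < event_share < 1:
        raise ValueError("event_share must be between 0 and 1")
    events = [r for r in rows if is_event(r)]
    non_events = [r for r in rows if not is_event(r)]
    n_non = min(len(non_events), round(len(events) * (1 - event_share) / event_share))
    return events + random.Random(seed).sample(non_events, n_non)

def adjust_posterior(p_sample, pop_prior, sample_prior):
    """Rescale a probability estimated on the oversampled data to the population
    event rate (Bayes prior correction)."""
    w_event = pop_prior / sample_prior
    w_non = (1 - pop_prior) / (1 - sample_prior)
    return p_sample * w_event / (p_sample * w_event + (1 - p_sample) * w_non)

# Example: 2% of customers respond; the training sample is built 50/50.
population = [1] * 20 + [0] * 980
train = separate_sample(population, lambda r: r == 1, event_share=0.5)
print(len(train), sum(train))                       # → 40 20
print(round(adjust_posterior(0.8, 0.02, 0.5), 4))   # → 0.0755
```

      The second line shows why the adjustment matters: a score of 0.8 on the balanced sample corresponds to only about a 7.5 percent event probability in a population where events occur 2 percent of the time.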
  • 18. Skepticism and Communication
      • Skepticism
          − Breaking the Rules (statisticians)
          − Magic (non-analytical individuals)
      • Communication
  • 19. Applications
  • 20. Health Care
      • Drug development – to help uncover less expensive but equally effective drug treatments.
      • Medical diagnostics – imaging and real-time monitoring (e.g., predicting which women are at high risk for an emergency C-section).
      • Insurance claims analysis – to identify customers likely to buy new policies and to define behavior patterns of risky customers.
  • 21. Business and Finance
      • Banks – to detect which customers are using which products so they can offer the right mix of products and services to better meet customer needs (cross-selling and up-selling).
      • Credit card companies – to target promotional mailings at the people most likely to respond.
      • Lenders – to determine which applicants are most likely to default on a loan.
  • 22. The Absa Group (a South African Bank)
      • Challenge: Reduce operating expenses and cut losses by leveraging data to improve security and enhance customer relationships.
      • Solution: SAS helped Absa reduce armed robberies by 41 percent over two years, netting a 38 percent reduction in cash losses and an 11 percent increase in customer satisfaction ratings.
  • 23. Sports and Gambling
      • Sports teams – to analyze data to determine favorable player matchups and call the best plays.
      • Gaming industry – to analyze customer gambling trends at casinos.
      • Sports fanatics – to predict which teams will be chosen for tournament berths and to predict game winners.
  • 24. Education
      • Enrollment management – which students are likely to attend
      • Retention/graduation analysis – which students will remain enrolled after the first year and/or through graduation
      • Donation prediction – who is likely to donate, and how much they might donate
      • Faculty churn – which faculty members are most likely to leave the institution
  • 25. Other Application Areas
      • Insurance – pricing, fraud detection, risk analysis
      • Stock market – market timing, stock selection, risk analysis
      • Transportation – performance and network optimization to predict life-cycle costs of road pavement
      • Telecommunications – churn reduction
      • Retail – market basket analysis to help determine marketing strategies
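      Market basket analysis is usually framed as association rules ("customers who buy A also buy B") scored by support, confidence, and lift. A minimal Python sketch for two-item rules, assuming each basket is a list of item names; the function name and the example items are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_rules(baskets, min_support):
    """Return (lhs, rhs, support, confidence, lift) for every two-item rule
    whose item pair appears in at least min_support of the baskets."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in set(basket))
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(set(basket)), 2)
    )
    rules = []
    for (first, second), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((first, second), (second, first)):
            confidence = count / item_counts[lhs]       # estimate of P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)  # confidence relative to P(rhs)
            rules.append((lhs, rhs, support, confidence, lift))
    # Highest-confidence rules first; ties broken alphabetically.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))

baskets = [["bread", "milk"], ["bread", "butter"], ["bread", "milk", "butter"], ["milk"]]
for lhs, rhs, s, c, l in pair_rules(baskets, min_support=0.5):
    print(f"{lhs} -> {rhs}: support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

      In this toy data, "butter -> bread" ranks first (every basket with butter also has bread), while "bread -> milk" has a lift below 1, meaning bread buyers are slightly less likely than average to buy milk.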
  • 26. Demonstration
  • 27. Data Mining with SAS Enterprise Miner versus with SAS/STAT
      • Features in SAS Enterprise Miner not in SAS/STAT
          − Decision trees
          − Neural networks
          − Automatic data splitting
          − Automatic score code
          − Model comparison tool
      • Features in SAS/STAT not in SAS Enterprise Miner
          − Diagnostic statistics
      The products offer different model evaluation statistics because they serve different purposes.
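      The idea behind automatic data splitting and model comparison can be approximated by hand: partition the data once into training, validation, and test sets, fit candidates on the training data, and keep the one with the lowest validation error. A minimal Python sketch; the 40/30/30 proportions, the seed, and the misclassification criterion are illustrative assumptions rather than a description of Enterprise Miner's settings, and the two candidate rules are hard-coded stand-ins for fitted models.

```python
import random

def partition(rows, train_frac=0.4, valid_frac=0.3, seed=12345):
    """Shuffle once (reproducibly) and cut into training, validation, and test sets."""
    if train_frac <= 0 or valid_frac < 0 or train_frac + valid_frac > 1:
        raise ValueError("need train_frac > 0, valid_frac >= 0, and a sum of at most 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_valid = int(len(shuffled) * valid_frac)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_valid],
        shuffled[n_train + n_valid:],  # the remainder is the test set
    )

def misclassification(model, rows):
    """Share of (x, y) rows that the model labels incorrectly."""
    return sum(model(x) != y for x, y in rows) / len(rows)

def pick_best(models, valid_rows):
    """Model comparison: return the name of the candidate with the lowest validation error."""
    return min(models, key=lambda name: misclassification(models[name], valid_rows))

rows = [(i, int(i >= 50)) for i in range(100)]
train, valid, test = partition(rows)
candidates = {"cutoff_50": lambda x: int(x >= 50), "always_0": lambda x: 0}
best = pick_best(candidates, valid)
print(best, misclassification(candidates[best], test))  # → cutoff_50 0.0
```

      Choosing on the validation set and reporting on the untouched test set keeps the final error estimate honest, which is the point of splitting the data before any modeling.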
  • 28. Another Kind of Data Mining
  • 29. Text Mining – What Is It?
      Text mining is a process that employs a set of algorithms for converting unstructured text into structured data objects, together with the quantitative methods used to analyze these data objects.
      “SAS defines text mining as the process of investigating a large collection of free-form documents in order to discover and use the knowledge that exists in the collection as a whole.” (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage)
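      The first half of that definition, converting unstructured text into structured data objects, can be sketched as building a term-document matrix. The Python below is a minimal illustration assuming simple lowercase word tokenization, a hand-supplied stop list, and TF-IDF weighting; it does not reproduce SAS Text Miner's parsing or weighting, and the example notes are made up.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep runs of letters as terms."""
    return re.findall(r"[a-z]+", text.lower())

def term_document_matrix(docs, stop_words=frozenset()):
    """Rows are terms (sorted), columns are documents, cells are raw term counts."""
    counts = [Counter(t for t in tokenize(d) if t not in stop_words) for d in docs]
    terms = sorted(set().union(*counts))
    return terms, [[c[t] for c in counts] for t in terms]

def tfidf(matrix):
    """Weight each count by log(N / document frequency) so rarer terms count more."""
    n_docs = len(matrix[0]) if matrix else 0
    weighted = []
    for row in matrix:
        df = sum(1 for f in row if f > 0)  # number of documents containing the term
        weighted.append([f * math.log(n_docs / df) for f in row])
    return weighted

notes = ["The brake failed.", "Brake noise and brake wear.", "Engine noise"]
terms, counts = term_document_matrix(notes, stop_words={"the", "and"})
for term, row in zip(terms, tfidf(counts)):
    print(f"{term:8s}", [round(w, 2) for w in row])
```

      Once the text is a numeric matrix, the same modeling tools used elsewhere in the deck (clustering, predictive models) can be applied to it; that conversion step is the "miracle" on the next slide.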
  • 30. Another View of Text Mining
      Text → A Miracle Occurs → Numbers
  • 31. Text Mining Applications
      • Automotive Early Warning System
          − Wallace and Cermack (2004) describe the use of text mining for warranty analysis related to the TREAD Act.
      • Medical Information Management
          − TextWise Labs uses sophisticated text mining methodology to extract medical information from disparate data sources on the Internet.
          − Computer Science Innovations Inc. is developing an application for the National Cancer Institute that automatically converts medical records into XML data.
  • 32. Text Mining Applications
      • Insurance Claim Fraud
          − Insurance companies employ Special Investigative Units (SIUs) to investigate claims for fraud. Data mining methods can automate the referral process, and text mining methods applied to claims examiner notes, physician reports, and other textual data can enhance predictive accuracy.
      • Technical Support
          − Sanders and DeVault (2004) describe a process that employs text mining to improve efficiency in a technical support environment.