Data Mining for Security Applications∗

Kulesh Shanmugasundaram
kulesh@cis.poly.edu
November 10, 2001

∗ Feel free to edit this document but please let me know what you did so that I can keep my copy fresh. Thanks!

1 Introduction

This is a summary of discussions at the Workshop on Data Mining for Security Applications, CCS'01, PA. In this document data mining takes a broad meaning which may sometimes include machine learning (ML) and artificial intelligence (AI). Furthermore, forensics and intrusion detection are interchangeable in some contexts. Please note that it is beyond the scope of our discussions to provide better definitions of these terms.

It was noted in early discussions that most data mining solutions overlook simple, effective substitutes for sophisticated, computationally intensive data mining techniques. One suggestion was that future performance comparisons should include some of these simple, effective methods where appropriate. It was also noted that most data mining techniques do not put enough emphasis on pre-processing and post-processing of datasets. It was emphasized that we should use machine learning techniques to fine-tune input datasets before data mining and should automate decision making after data mining. Authors demonstrated the use of data mining for intrusion detection, for identifying denial of service attacks, and for forensics.

2 Questions...

Traditionally, data mining has solved problems in database systems, in bio-informatics (where data mining techniques are still being used successfully to map the genome), and in financial engineering. Recently the data mining community started applying similar techniques to existing security problems. "This event provides an opportunity for attendees of the ACM CCS to meet with researchers who are interested in applying data mining techniques to security applications and discuss critical issues of mutual interest during a concentrated period."

Two fundamental questions were asked and mostly went unanswered in our discussions.

1. Are we trying to solve the right security problem(s)?

Are denial of service and intrusion detection the right problems for data mining, or are there other security problems, such as cryptanalysis, where data mining could be more effective, useful, and perhaps preventive? It was suggested that forensics is one of the fields that can use data mining for effective data reduction and for learning new insights or patterns. There were no other suggestions.

2. Do security problems need development of new data mining techniques?

It is hard to answer this question without answering the previous one. We have not yet identified security problems that mandate a whole new data mining approach. However, the panel felt strongly that new data mining techniques will have to be investigated in the near future. One suggestion was to investigate techniques used in bio-informatics to solve security problems, especially intrusion detection.

3 Ideas & Opinions

The following is a collection of ideas and opinions that came out of this workshop. Most of them revolve around security problems for which data mining can provide solutions.
3.1 Fusion of Information

As networks and network sensors become ubiquitous, fusion of sensor information is critical to developing accurate insights into incidents. Therefore, fusion of sensor information and the development of infrastructure to support fusion are important areas for research and development. For instance, stream mining techniques can be used to develop tools that give a better overview of network traffic in [near] real time [2]. Network forensics is another area well positioned to benefit from fusion of information. One of many problems with information fusion is the lack of industry support for adopting a common standard for intrusion message exchange. However, IETF and TRENA are working together on a couple of standards for intrusion message exchange.

3.2 Rule Generation & Data Reduction

Automated rule generation for intrusion detection systems [to identify new threats], rule generation for data mining systems to filter datasets efficiently, and data reduction in data mining systems without loss of critical information are still in primitive stages of development. Research and development effort must be put into developing better automated rule generation methods. False alarm filtering is considered an open problem. It was mentioned that most commercial IDS products produce as much as 80% false positives. New methods are required to filter false alarms without opening the way to stealth attacks. That is, an attacker may trigger a high volume of false alarms, and if the IDS reacts by filtering out that false alarm, the attacker can then bypass the IDS without triggering the alarm. New methods should reduce false alarms but should also resist such attacks.

3.3 Automated Ruleset Propagation

One of the problems still not addressed by IDS vendors is how to propagate rulesets or attack signatures securely over networks. An automated update strategy, through an overlay network approach, should allow intrusion detection systems to be more adaptive. Currently RealSecure (http://www.realsecure.com/) is the only system that supports an anti-virus-like method of updating rulesets from a central server. However, updating rulesets in a heterogeneous network means more than connecting to a central server and downloading new rulesets.

3.4 Gene Coding Applications

The idea of gene coding applications is similar to gene coding in biological organisms. Tools and methods should be developed to extract application behaviors at different levels of the software engineering process and embed these behaviors along with the application code. Upon execution of a gene coded application, application level firewalls and intrusion detection systems use the embedded gene code of the application to detect anomalous behaviors.

It seems quite obvious that the network is not the best place to perform effective filtering. There is too much noise on the network; effective filtering means first filtering the noise out and then focusing on the interesting signals. Host based detection methods are, in comparison, much more effective in that there is less noise. We can, however, raise the bar further by deploying detection methods in the applications themselves. New methods should be developed which allow application developers to characterize the "normal" behaviors of applications and package that information as part of the application code. Intrusion detection systems and firewalls can then rely on this information to model anomalous behaviors.

3.5 Feature Selection of Attacks

Feature selection of attacks is an important element of intrusion detection systems. Currently there are not many useful feature selection or categorization methods available [14]. Such selection criteria would allow real time attack profiling and adaptive attack containment by intrusion detection systems.

3.6 Lack of Data Visualization

There is a lack of data visualization tools for network applications and forensics. Development of data visualization tools that offer multiple perspectives on a single dataset is an immediate necessity. Some form of certification should be developed to certify that the abstractions in forensic tools do not alter evidence and are actually "telling the truth." An open source forensic data visualization library seems to be a good starting point for such a certification process.

3.7 Lack of Datasets

It is a great concern of the community that the lack of realistic test datasets is making the research uncertain. What works on a test dataset may not work properly on real datasets; on the other hand, methods that are not efficient on test datasets may turn out to be effective on real data. Therefore, there is an immediate need for a tool or a network infrastructure to collect real datasets for the community while maintaining privacy standards.

References

[1] Critical Thoughts on Contemporary Data Mining Research for Security Applications, Klaus Julisch
[2] Fusing Heterogeneous Alert Streams into Scenarios, Oliver M. Dain, Robert K. Cunningham
[3] Using MIB II Variables for Network Anomaly Detection: A Feasibility Study, Xinzhou Qin et al.
[4] Intrusion Detection with Unlabeled Data Using Clustering, Leonid Portnoy et al.
[5] Multi-Topic Email Authorship Attribution Forensics
[6] Panel discussions in and out of Sonata -03
[7] An Intrusion Detection System Based on the Teiresias Pattern Discovery Algorithm, Andreas Wespi et al.
[8] The GeneMine system for genome/proteome annotation and collaborative data mining
[9] Mining High Speed Data Streams, Pedro Domingos, Geoff Hulten
[10] Dr. Sushil Jajodia, http://www.ise.gmu.edu/~csis/faculty/jajodia.html
[11] Johannes Gehrke, http://www.cs.cornell.edu/johannes/
[12] Wenke Lee, http://www.cc.gatech.edu/~wenke/
[13] Philip Chan, http://www.cs.fit.edu/~pkc/
[14] Columbia IDS Group, http://www.cs.columbia.edu/ids/
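
The stealth-attack concern raised in Section 3.2 (an attacker floods an IDS with false alarms so that the filter learns to ignore them) can be illustrated with a small sketch. Nothing below comes from the workshop: the `Alert` record, the signature name, the addresses, and both filter functions are hypothetical, and the per-source cap is only one conceivable mitigation, not a method any participant proposed.

```python
from collections import Counter
from typing import NamedTuple


class Alert(NamedTuple):
    signature: str  # hypothetical name of the rule that fired
    source: str     # hypothetical source address


def naive_filter(alerts, threshold):
    """Drop every alert whose signature fired more than `threshold` times."""
    counts = Counter(a.signature for a in alerts)
    return [a for a in alerts if counts[a.signature] <= threshold]


def per_source_filter(alerts, threshold):
    """Cap alerts per (signature, source) pair and report what was suppressed."""
    seen = Counter()
    kept = []
    for alert in alerts:
        seen[alert] += 1
        if seen[alert] <= threshold:
            kept.append(alert)
    suppressed = {a: n - threshold for a, n in seen.items() if n > threshold}
    return kept, suppressed


# The attacker floods one signature from a decoy host, then attacks from another.
flood = [Alert("port-scan", "10.0.0.9")] * 50
real = Alert("port-scan", "10.0.0.66")
stream = flood + [real]

# The signature-wide filter silences the real alert together with the flood.
print(real in naive_filter(stream, threshold=5))   # False

# The per-source cap still cuts the volume but keeps the second source visible.
kept, suppressed = per_source_filter(stream, threshold=5)
print(real in kept)                                # True
print(suppressed)
```

The second filter keeps a count of what it suppressed instead of discarding it silently, so an operator can still see that one source produced an unusual volume of alarms. An attacker who floods from the same address used for the real attack would still blend in, which is part of why the text treats false alarm filtering as an open problem.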