This paper was presented at the National Security Space Architect MIM Technology Forum, Aerospace Corp, Chantilly, VA, June 29, 1999

Security Using Intelligent Agents and Data Mining

Bruce C. Gabrielson, PhD
bruce.c.gabrielson@cpmx.saic.com

Center For Information Security Technology
Science Applications International Corporation
Columbia, MD

Introduction

Science Applications International Corporation (SAIC) is a diversified high-technology research and engineering company based in San Diego, California. SAIC offers a broad range of expertise in technology development and analysis, computer system development and integration, technical support services, and computer hardware and software products.

Significant research efforts have been put forth by SAIC and others to identify viable methods for efficient and efficacious security related to information systems. New and innovative measures are required to enable classified clients to use the Internet and still protect critical data, regulate dissemination, guard against malicious behavior, and ensure the integrity of the system without impeding normal activity.

This paper addresses two suggested security improvement activities: one related to enterprise-wide computing and one related to data mining across a web-based environment.

Intrusion Detection Framework (IDF)

Intelligent Agent (IA) based Host Security Engineering Tools are aids to the system integrator, helping to ensure that integrated systems are security hardened more consistently. SAIC's approach to supervising and controlling IA tools when they are employed to protect the enterprise-wide environment is called the Intrusion Detection Framework (IDF).

The IDF provides the foundation for common services such as data collection, information storage, event management, security, user interface, and task automation to organize and integrate individual probes, monitors, and sensors in the enterprise computing environment. It also provides the application programming interfaces and services necessary to partition and distribute applications. Different frameworks provide different ways for clients and servers to communicate.
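
As a simple illustration of these common services (not the IDF's actual interfaces; the class and method names below are invented for the example), the following Python sketch shows how such a framework might register sensors, collect their data, and route events to subscribers:

    # Hypothetical sketch of an IDF-style framework interface.
    from dataclasses import dataclass
    from typing import Callable, Dict, List

    @dataclass
    class Event:
        source: str        # which probe, monitor, or sensor raised the event
        category: str      # e.g. "audit", "intrusion", "malicious-code"
        detail: str

    class Framework:
        """Common services: data collection, event management, task automation."""

        def __init__(self) -> None:
            self.sensors: Dict[str, Callable[[], List[Event]]] = {}
            self.handlers: Dict[str, List[Callable[[Event], None]]] = {}
            self.store: List[Event] = []   # stand-in for the information store

        def register_sensor(self, name: str, poll: Callable[[], List[Event]]) -> None:
            self.sensors[name] = poll

        def subscribe(self, category: str, handler: Callable[[Event], None]) -> None:
            self.handlers.setdefault(category, []).append(handler)

        def collect(self) -> None:
            """Poll every registered sensor and route its events to subscribers."""
            for poll in self.sensors.values():
                for event in poll():
                    self.store.append(event)
                    for handler in self.handlers.get(event.category, []):
                        handler(event)

    if __name__ == "__main__":
        idf = Framework()
        idf.register_sensor("login-monitor",
                            lambda: [Event("login-monitor", "audit", "3 failed logins")])
        idf.subscribe("audit", lambda e: print(f"[{e.source}] {e.detail}"))
        idf.collect()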

The IDF integrates Commercial off the Shelf (COTS) and Government off the Shelf (GOTS) tools for vulnerability assessment, audit monitoring, intrusion detection, and malicious code detection and eradication, and provides the following security services:

What Are Intelligent Agents?

An intelligent agent is a system that can perform a task based on learned intelligence or provided rules and can independently evaluate choices. Agency describes the degree of independence that an agent exhibits. When several agents are put together, they form an agency capable of a combined range of actions. The agency picks the agent needed to perform a specific task, using the network itself to do the processing. In other words, the agent represents the user, selecting and completing a required task through rule-based criteria while using and interacting with other programs and data.
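
A minimal Python sketch of this idea follows; the agents, rules, and task strings are invented solely to illustrate how an agency might select the right agent for a task:

    # Illustrative sketch of rule-based task selection within an agency.
    class Agent:
        def __init__(self, name, can_handle, perform):
            self.name = name
            self.can_handle = can_handle   # rule: predicate over a task description
            self.perform = perform         # action taken on behalf of the user

    class Agency:
        """Several agents combined; the agency picks the right agent for a task."""
        def __init__(self, agents):
            self.agents = agents

        def dispatch(self, task):
            for agent in self.agents:
                if agent.can_handle(task):
                    return agent.perform(task)
            return f"no agent in the agency can handle: {task}"

    scanner = Agent("scanner",
                    can_handle=lambda t: "scan" in t,
                    perform=lambda t: f"scanner running vulnerability scan for '{t}'")
    auditor = Agent("auditor",
                    can_handle=lambda t: "audit" in t,
                    perform=lambda t: f"auditor reviewing logs for '{t}'")

    agency = Agency([scanner, auditor])
    print(agency.dispatch("scan web server"))
    print(agency.dispatch("audit login records"))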

Agents are a natural complement to other evolving technologies. As those technologies evolve, so too will the capability of the associated agents to deliver them. Some supporting and complementing technologies include:

Agents share the common architecture shown in Figure 1 below. The knowledge base contains the knowledge that has been generated as well as the rules being followed. Libraries contain information the agent has identified. Application objects are the resources available to the agent (test tools in our case). The adapter serves as the standardized interface to those tools. The view is basically who or what the agent is.


Figure 1 - Common IA Architecture
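
For illustration only, the Figure 1 components might be rendered in Python roughly as follows; the class and method names are ours, not taken from any particular agent implementation:

    # Hypothetical rendering of the Figure 1 components as Python classes.
    class KnowledgeBase:
        """Holds generated knowledge and the rules the agent follows."""
        def __init__(self):
            self.facts = []
            self.rules = []

    class Library:
        """Information the agent has identified and retained for reuse."""
        def __init__(self):
            self.entries = {}

    class ApplicationObject:
        """A resource available to the agent, e.g. a test tool."""
        def __init__(self, name, run):
            self.name = name
            self.run = run

    class Adapter:
        """Standardized interface between the agent and its tools."""
        def invoke(self, tool, target):
            return tool.run(target)

    class Agent:
        """The 'view' identifies who or what the agent is."""
        def __init__(self, view):
            self.view = view
            self.kb = KnowledgeBase()
            self.library = Library()
            self.adapter = Adapter()
            self.tools = []

        def use_tool(self, name, target):
            for tool in self.tools:
                if tool.name == name:
                    result = self.adapter.invoke(tool, target)
                    self.kb.facts.append(result)    # record what was learned
                    return result
            return f"{self.view}: no such tool '{name}'"

    agent = Agent(view="host security test agent")
    agent.tools.append(ApplicationObject("port-check",
                                         run=lambda target: f"open ports scanned on {target}"))
    print(agent.use_tool("port-check", "host-a"))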

Information Assurance Problems

In the real world there are practical problems with information assurance. Technology advances result in equipment and configuration changes that are increasingly difficult to follow in real time. Available test tools that support information assurance are usually intended either to monitor for attacks or to proactively attack network vulnerabilities. The big problem with information assurance is getting the knowledge needed to properly carry out the required range of protection activities when attacks are detected.

The Security Agent Solution

Any security agent that can solve the information assurance problem set will have needs based on the job it is doing. The primary needs of a security agent relate to information, reaction, and control:

Reaction and interpretation represent the most difficult challenge. For a security agent to work, it needs rules to follow that depend on where in the infrastructure the agent will operate and what we want it to do. We want the highest-level agents to interpret what is going on at the levels below them so they can follow reaction rules appropriate to their level. Controls must be in place that can support the reaction rules that have been decided on.

A related problem is that in a distributed, hierarchical control and reporting strategy, an element's authority to directly command and control the examination, monitoring, and analysis of lower-level network assets may diminish with its remoteness in the organizational hierarchy. Achieving this objective therefore seems to imply a two-tier approach: one level that manages assets directly, and one that evaluates the situational responses from lower-level assets.

IAs are a natural fit for this problem because of their remote programming capability. A common agency design might be developed that can change its structure depending on where it is located and which elements it needs to communicate with in the hierarchy. Using an agent's remote programming (RP) capability for computer-to-computer communications, the agent approach is particularly suited to filtering and taking automatic actions at higher levels of abstraction, in addition to detecting and reacting to local patterns in system behavior. The IDF will include a distributed knowledge base that acts as a repository for reference knowledge on vulnerabilities, attack techniques and signatures, and countermeasures, as well as for information collected concerning detected intrusions, vulnerabilities, and corrupted software and information.
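
A hypothetical Python sketch of the two-tier idea follows; the thresholds, reaction rules, and host names are invented for the example:

    # Sketch of the two-tier approach: lower-tier agents manage assets
    # directly, a higher-tier agent evaluates their situational reports.
    class LocalAgent:
        """Lower tier: watches one asset and reacts to local patterns."""
        def __init__(self, asset):
            self.asset = asset

        def observe(self, events):
            failures = sum(1 for e in events if e == "login-failure")
            report = {"asset": self.asset, "login_failures": failures}
            if failures >= 3:
                report["local_action"] = "lock account"   # local reaction rule
            return report

    class SupervisorAgent:
        """Higher tier: filters reports and reacts at a higher level of abstraction."""
        def evaluate(self, reports):
            affected = [r["asset"] for r in reports if r["login_failures"] >= 3]
            if len(affected) >= 2:
                # a pattern across assets suggests a coordinated attack
                return f"raise enterprise alert: coordinated attack on {affected}"
            return "no enterprise-level action required"

    hosts = [LocalAgent("host-a"), LocalAgent("host-b")]
    reports = [h.observe(["login-failure"] * 4) for h in hosts]
    print(SupervisorAgent().evaluate(reports))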

IDF Design Objectives

The following design objectives should provide guidance in developing the IDF.

Data Mining Research Goals

Data mining is another emerging technology that can be closely tied to security, and here again intelligent agents play a key role. Data mining allows the automated acquisition, generation, and exploitation of knowledge from large volumes of heterogeneous information, including multiple text, video, and audio sources. From available internal data sources and from open sources such as the Internet and published works, metadata and meta-knowledge can be captured, smart indexing can be performed using heuristics, and knowledge bases can be created that enable concept-based searches through natural language processing. The goal is to be able to turn agents loose to find the information or patterns needed to gain knowledge about the environment or about some specific security-related condition. Note that such an information system would eventually become one more tool in the IDF.
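
As a rough illustration of heuristic, concept-oriented indexing (a crude stand-in for real natural language processing; the concept map and documents below are invented), consider the following Python sketch:

    # Minimal sketch of heuristic indexing for concept-based search.
    CONCEPTS = {
        "intrusion": {"intrusion", "attack", "exploit", "breach"},
        "malware":   {"virus", "worm", "trojan", "malicious"},
    }

    def index(documents):
        """Attach concept metadata to each document using keyword heuristics."""
        catalog = []
        for doc_id, text in documents.items():
            words = set(text.lower().split())
            tags = {c for c, terms in CONCEPTS.items() if words & terms}
            catalog.append({"id": doc_id, "concepts": tags})
        return catalog

    def search(catalog, concept):
        return [entry["id"] for entry in catalog if concept in entry["concepts"]]

    docs = {
        "advisory-1": "New worm spreading over SMTP",
        "audit-17":   "Repeated attack attempts against the ftp daemon",
    }
    catalog = index(docs)
    print(search(catalog, "malware"))    # ['advisory-1']
    print(search(catalog, "intrusion"))  # ['audit-17']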

Figure 2 represents the agent structure for the acquisition of knowledge from wherever it may be located. Data can exist in many forms, each of which requires the development of a special agent designed to find, acquire, interpret, and extract the data into a usable and common format.


Figure 2 - Knowledge Acquisition Structure
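
For illustration, format-specific acquisition agents that normalize disparate data into a common record might be sketched in Python as follows; the formats and field names are assumptions made for the example:

    # Illustrative sketch of format-specific acquisition agents.
    from abc import ABC, abstractmethod

    class AcquisitionAgent(ABC):
        """Each data form gets its own agent to find, interpret, and extract."""
        @abstractmethod
        def extract(self, raw: str) -> dict:
            """Return the data in the common format used by the rest of the system."""

    class SyslogAgent(AcquisitionAgent):
        def extract(self, raw: str) -> dict:
            host, message = raw.split(" ", 1)
            return {"source": "syslog", "host": host, "content": message}

    class WebPageAgent(AcquisitionAgent):
        def extract(self, raw: str) -> dict:
            # a stand-in for stripping markup from a fetched page
            text = raw.replace("<p>", "").replace("</p>", "")
            return {"source": "web", "host": None, "content": text}

    def acquire(agents_and_inputs):
        """Run each agent over its input and pool the common-format records."""
        return [agent.extract(raw) for agent, raw in agents_and_inputs]

    records = acquire([
        (SyslogAgent(), "host-a sshd: failed password for root"),
        (WebPageAgent(), "<p>New buffer overflow advisory posted</p>"),
    ])
    for r in records:
        print(r)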

A general, usable system that provides associative access to all this data must meet certain criteria:

- It must offer a single point of contact that provides uniform access to all the information available on a particular network (such as audit logs) or on the Web.
- It must have reasonable performance in processing user queries.
- It must address issues of scalability in terms of storage requirements at any of the system's distributed components.
- It must address scalability in terms of network communications by efficiently and selectively accessing the large and rapidly growing number of information servers.
- Finally, it must help the user locate relevant information; to do this, the system should provide recommendations for refining user queries and help the user manage and understand the complexity of the information space.

The InfoSleuth Solution

InfoSleuth is a powerful multi-use tool that allows complex queries to run against heterogeneous sources that may reside on disparate operating systems. It consists of an agent-based infrastructure that can be used to deploy agents for information gathering and analysis over diverse and dynamic networks of multimedia information sources. It may be viewed as a set of loosely interoperating, though cooperating, active processes distributed across a network. Agent-based data mining approaches such as InfoSleuth's have the potential to evolve into the ultimate means of capturing and analyzing information.

Within the context of this paper, data mining is suggested as a potential source of vulnerability information. First, existing InfoSleuth agents could be modified, or new agents developed, to analyze large audit files for broad-based attack information. InfoSleuth agents could also be used to mine open-source, Internet-based computer vulnerability information.
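
The following Python sketch is not InfoSleuth code; it merely illustrates, with invented signatures and audit lines, the kind of broad-based attack analysis such an audit-mining agent might perform:

    # Hypothetical sketch of an audit-mining agent.
    import re
    from collections import Counter

    # crude attack signatures standing in for a real knowledge base
    SIGNATURES = {
        "brute-force": re.compile(r"failed password", re.IGNORECASE),
        "port-scan":   re.compile(r"connection attempt .* refused", re.IGNORECASE),
    }

    def mine_audit(lines):
        """Count signature hits across a large audit file."""
        hits = Counter()
        for line in lines:
            for name, pattern in SIGNATURES.items():
                if pattern.search(line):
                    hits[name] += 1
        return hits

    audit = [
        "sshd: Failed password for root from 10.0.0.5",
        "inetd: connection attempt to telnet refused",
        "sshd: Failed password for admin from 10.0.0.5",
    ]
    print(mine_audit(audit))   # Counter({'brute-force': 2, 'port-scan': 1})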

Conclusion

The common problem with all agents, including those proposed here, is the lack of a common protocol that each agent or agency can use to communicate with all other agents. Once this final problem is solved, the agency approach can provide an initial solution to the existing real-world problems associated with information security, as well as a foundation for further evolution based on future technology advances.