Ultimately, the goal of Enterprise Log Management (ELM) is to get your most critical events escalated to your operations staff so they can respond with the appropriate actions. In today’s enterprise, you would be culling through millions of events if you were not relying on ELM to correlate that information and point to what is most critical. You may be asking, “Isn’t this Security Information and Event Management (SIEM)?” It’s not. Well, not entirely. ELM and SIEM are interrelated. SIEM is concerned with the larger view of your overall security landscape, whereas ELM is focused on a specific element of security: “What is happening where?” SIEM correlates data across varying data sources and environments, providing a more holistic view. Therefore, ELM is a subset and critical component of a SIEM program. Not all companies require a SIEM program; however, most companies would benefit from an ELM solution. For the purposes of this article, we’ll stick to ELM. For more information on SIEM, I encourage you to download ISACA’s free SIEM white paper (www.isaca.org/siem).
Corporate policies are put forth, as are the related controls, in an effort to deter or prevent undesirable activities. Translating those corporate policies into the solution, and configuring the relationship between the policy, the controls, and the data feeds from the systems and applications that need to be monitored, are foundational steps in building an ELM. A measure of the quality of an ELM technology is how easily it interfaces with your critical systems: “How many different components does it understand?” so to speak. “How much technical expertise is required to make it deliver value?”
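The policy-to-controls-to-data-feeds relationship described above can be sketched as a simple lookup. A minimal sketch follows; the policy names, control IDs and feed names are illustrative assumptions, not drawn from any particular product:

```python
# Illustrative mapping from corporate policy to controls and monitored feeds.
# All names below are hypothetical examples, not real product identifiers.
POLICY_MAP = {
    "privileged-access": {
        "controls": ["no-admin-logon-outside-change-window"],
        "feeds": ["active-directory-security-log", "unix-auth-log"],
    },
    "data-retention": {
        "controls": ["retain-security-events-1y"],
        "feeds": ["elm-archive"],
    },
}

def feeds_for_policy(policy: str) -> list:
    """Return the data feeds an ELM must ingest to monitor a given policy."""
    entry = POLICY_MAP.get(policy)
    return entry["feeds"] if entry else []
```

Keeping this mapping explicit is what lets the operations staff trace any alert back to the policy it enforces.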
Use Cases and Setup
Privileged access monitoring is a classic example in which an ELM gathers logs from various systems and creates a direct workflow to the operations staff, enabling them to take action against items considered inappropriate. For example, a domain admin logged in after an allowed change window and failed to authenticate several times in a row, a pattern suggesting a potential brute-force attack. The system must correlate those events and initiate the appropriate workflow, whatever that may be. The processes established around the solution are just as important; the log management solution is only as good as the processes and teams that support it. Typically, this requires an engineering staff and an operations staff. The engineers build and configure the ELM so the right alerts are coming through. The operations staff is then able to take the alerts and, ideally, do the “right thing.” Of course, the less mature your existing processes and workflows, the more iterations will be required.

The events you consider “taggable,” the events you are interested in, must tie back to corporate policy. The basic premise that “thou shalt not access that which you are not allowed to access” will guide the rules you develop. Activity will fall into one of three categories: transactions you don’t care about, transactions you want to know about and transactions you want to take immediate action on. For example, you might have miskeyed your password while attempting to log in. That type of transaction is not necessarily one to be concerned about. However, if there are a thousand more attempts in the next 60 seconds, you should know something is fishy. That pattern likely indicates a hacker trying to brute-force access to your valuable data. Flag it and determine what part of the organisation should receive the system workflow.
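The three-category triage and the brute-force scenario above can be sketched as a small correlation rule. The window length, thresholds and category names below are illustrative assumptions:

```python
from collections import deque

# Illustrative thresholds; a real ELM rule would tune these per use case.
WINDOW_SECONDS = 60
ACT_THRESHOLD = 10      # this many failures in the window: take immediate action
INFORM_THRESHOLD = 3    # below that but repeated: worth knowing about

class FailedLogonRule:
    """Classify each failed-logon event into ignore / inform / act."""

    def __init__(self):
        self._failures = {}  # account -> deque of failure timestamps

    def classify(self, account: str, ts: float) -> str:
        q = self._failures.setdefault(account, deque())
        q.append(ts)
        # Drop failures that fell out of the sliding window.
        while q and ts - q[0] > WINDOW_SECONDS:
            q.popleft()
        if len(q) >= ACT_THRESHOLD:
            return "act"      # initiate the response workflow
        if len(q) >= INFORM_THRESHOLD:
            return "inform"   # record for review
        return "ignore"       # e.g. a single miskeyed password
```

A single miskeyed password classifies as "ignore"; a rapid burst against the same account crosses into "inform" and then "act".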
ELM can provide value through non-security use cases as well. There could be transactional activity that indicates a problem, such as multiple acknowledgement requests being generated as a result of a system glitch. The sheer volume could saturate the network, acting like a denial-of-service attack. The ELM could flag this type of activity when it occurs so that remediation can begin preventively, potentially averting an outage of a critical service.
A virus on the network provides an opportunity for a good ELM to demonstrate intelligence. As the tool logs virus-induced events and correlates them as a single outbreak, operations will be able to target the affected population proactively. This approach can save hundreds or thousands of hours by solving the underlying problem instead of addressing each incident reactively. This becomes a compelling value statement, as ITIL has put forth for decades: the presence of multiple incidents occurring for similar reasons typically represents a problem needing a solution (i.e. ‘problem management’).
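The incident-to-problem correlation described above amounts to a grouping step: many detection events collapse into one outbreak record listing the affected population. A minimal sketch, with illustrative event field names:

```python
from collections import defaultdict

def outbreaks(events):
    """Group antivirus events by signature; each group is one outbreak.

    Each event is assumed (for illustration) to be a dict with
    "signature" and "host" keys.
    """
    grouped = defaultdict(set)
    for e in events:
        grouped[e["signature"]].add(e["host"])
    # One problem record per signature, listing the affected hosts once each.
    return {sig: sorted(hosts) for sig, hosts in grouped.items()}
```

Operations can then remediate per outbreak rather than per incident, which is where the time savings come from.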
Requisite Skills
The primary skill associated with successfully deploying an ELM is being able to translate business use cases into the ELM tool’s language. If your environment deals with personally identifiable information, for example, privacy concerns are going to be among the highest priorities. An understanding must exist of the systems generating the data and how those data relate to the company’s use cases. For example, we don’t want people logging on as a local administrator in an Active Directory domain environment; therefore, the ELM would need to alert on the appropriate event ID. As IT professionals, we know there will always be a technology that is not commonly known and will require additional work to develop the proper interface. The resources you assign as your solution delivery leads or engineers for an ELM deployment must understand how to translate your business logic into the technical speak of your IT landscape.
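As a sketch of translating that business rule into a match condition: event ID 4624 is the Windows Security log’s successful-logon event, but the field names and the local-versus-domain flag below are illustrative assumptions about how the ELM normalises events:

```python
# Business rule: "no local Administrator logons on domain-joined machines."
# Event ID 4624 = Windows Security log successful logon; the event dict
# layout here is a hypothetical normalised form, not a real product schema.
LOGON_SUCCESS = 4624

def should_alert(event: dict) -> bool:
    return (
        event.get("event_id") == LOGON_SUCCESS
        and event.get("account", "").lower() == "administrator"
        and not event.get("domain_account", True)  # local, not domain, account
    )
```

The hard part in practice is not the condition itself but knowing which event IDs and fields on each platform carry the business meaning.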
Challenges
Scalability is the first challenge and biggest concern in architecting the solution. Most likely, significant amounts of data will be logged. Data retention policies and growth must also be considered. Depending on your use cases, large portions of data may need to be held for very long periods of time. Therefore, consideration should be given to balancing your company’s tolerance for risk with its appetite for capital investment.
ELM systems typically work in one of two ways: data intensive, which gathers all data to be analysed later and thus needs to scale accordingly; and limited collection, in which agents gather only the information considered “interesting.” In the case of the former, storage will be the greater concern; for the latter, processing capabilities will need to be stronger to reduce the chances of introducing latency into transaction processing time.
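The limited-collection approach above can be sketched as an agent-side filter that forwards only events worth shipping centrally. Which event IDs count as “interesting” is an illustrative assumption:

```python
# Hypothetical agent-side filter for the "limited collection" model:
# the agent spends cycles deciding what to forward so the central ELM
# stores less. The ID set below is an illustrative example.
INTERESTING_IDS = {4624, 4625, 4672}  # e.g. logons, failed logons, privileges

def filter_events(raw_events):
    """Yield only the events worth forwarding to the central ELM."""
    for e in raw_events:
        if e.get("event_id") in INTERESTING_IDS:
            yield e
```

This is exactly where the processing-versus-storage trade-off lives: the filter runs on the monitored system, so it must be cheap enough not to add transaction latency.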
Many ELM solutions do not use a communications protocol that provides a delivery guarantee, instead using protocols such as UDP, which can result in some of the data being lost. Technology and process verifications may therefore be additional requirements to factor into the design.
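One such process verification, sketched under the assumption that each forwarded event carries a sequence number, is a receiver-side gap check that reveals when UDP transport has dropped events:

```python
def missing_sequences(received):
    """Return sequence numbers absent from a batch of received events.

    Assumes (for illustration) that the sender stamps each event with a
    monotonically increasing "seq" field.
    """
    seen = {e["seq"] for e in received}
    if not seen:
        return []
    return [n for n in range(min(seen), max(seen) + 1) if n not in seen]
```

Gaps do not recover the lost data, but they make the loss visible so its impact on the use cases can be assessed.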
Of course, having well-defined expectations will determine the perceived success of the implementation. Implementing such a solution in a company that has limited policies and procedures will have little success, as there will be few rules to correlate the activity against. Define your solution delivery success criteria early and make sure what you choose is measurable. Consider using a governance and management framework such as COBIT 5 (www.isaca.org/cobit) to guide the initiative.
Conclusion
Some ELMs come with standard rule sets that can accelerate implementation. Ongoing effort to refine those rule sets to reflect your organisation’s corporate policies will drive the migration from focused manual intervention to true problem management. In this manner, not only will ELM implementers see a reduction in time spent resolving incidents, but their responsiveness will be seen as more proactive than reactive. As a result, these shops should see a reduction in incident management costs. And, of course, when implemented correctly, security issues will decline overall and compliance capabilities will improve.
About the Authors
Rob Zanella recently rejoined CA Technologies’ Internal Audit department. Prior to this role, Rob led IT Service Management and Compliance for their IT organization reporting to the CIO. In this role, he was responsible for Service Desk, Continual Service Improvement and Compliance activities within Information Technology. Rob joined CA in 2005 as Director of Internal Audit to develop the company’s first IT Audit practice. Upon establishing the practice, Rob next assumed responsibility for the IT Compliance function to advise on controls optimization opportunities and to manage CA’s IT risk and controls profile. Since then Rob has also assumed responsibility for the Global Enterprise Security group as well as the Service Desk functions. In these roles, Rob has led CA’s efforts to achieve certification in ISO20000 (ITIL) and ISO27001 (Security).
Rob has 28 years of IT experience in operations, software development, project management, security, compliance and audit. Before joining CA, he was Director of IT Audit for 5 years at SIAC, the technology arm of the NYSE. Prior to the NYSE, Rob was a Senior Manager at Deloitte & Touche for 7 years implementing ERP solutions as part of their Enterprise Risk Services group for several large clients. In addition, he held various software development and project management positions within Savings Bank Trust Company, Dime Savings Bank and Union Savings Bank while developing and implementing lending software.
Rob is a frequent speaker on technology governance issues, has written several articles and co-authored two books including the recently published, “Cloud Security and Governance: Who’s on Your Cloud?”
- Bill Welch, CISM, is a member of the Metro NY ISACA Chapter. He is Senior Director of IT Security at CA Technologies.
- Mike Mendelsohn is Director of Application Security at CA Technologies.
- Brian Korte is Senior Specialist of IT Security at CA Technologies.