Thoughts on A.I. in Cybersecurity

Generally, when people hear the term AI they instantly think of data science derived AI such as machine learning and deep learning. This type of AI is very much needed as the amount of security data keeps increasing. When I started out in security in the early 1990s, security analysts had to go through the logs and other data sources manually to look for patterns of interest. As the amount of data grew, we started seeing data science applied more and more across different data sets. This enabled data science approaches like machine learning to find the probable patterns and produce information for the analyst saying “these are probably the patterns you are looking for”. I say probably because approaches like machine learning use probabilistic reasoning, where the results are just conjecture until validated by a human with the necessary knowledge to understand the data they are looking at. It’s often said that the amount of data more than doubles every two years, which adds weight to why we need data science approaches like machine learning to do the preliminary analysis of the data for defenders. This also means the information, knowledge, and wisdom layers on top of the data are increasing as well.
As cybersecurity organizations deploy more and more sensors, they are also deploying more and more data science derived AI solutions to do that preliminary analysis. For the past several years this has been causing security analysts to drown in the information being produced, in the same way they used to drown in the security data before the widespread use of data science derived AI solutions. The human analysts need to process the information produced by all those solutions and verify the individual preliminary analysis results produced by the algorithms to sort the false positives from the true detections. The problem is that there is now far too much information being produced from the underlying data, especially when combined with information shared by other organizations about threats and vulnerabilities, for most human security teams to process and act on. A Ponemon Institute survey from a few years ago found that the average company has 75 security solutions, that 96% of the information being produced wasn’t being addressed, that 19% was deemed reliable, and that 4% was actually investigated. The cybersecurity problems can’t be addressed by data science derived AI alone. We also need knowledge engineering derived AI that focuses on organizing information into knowledge and can mimic how human security analysts and investigators apply the knowledge and wisdom contained in the knowledge-base.
The strength of knowledge engineering derived AI is its ability to mimic how human security analysts apply their knowledge and wisdom to make sense of the preliminary analysis results coming from data science derived AI solutions, and to validate the performance of the point products producing those results.

Another strength of knowledge engineering derived AI is semantic interoperability: the ability to integrate information that sits in different silos, in different formats and serializations, into a common format, and to organize that siloed information into integrated knowledge using W3C standardized ontologies (knowledge models created from knowledge representation language standards). This means the knowledge engineering derived AI can organize the information coming from the different data silos with knowledge from different frameworks such as MITRE ATT&CK, the NIST Cybersecurity Framework, the NIST Cyber Resiliency Engineering Framework, the ODNI Cyber Threat Framework, etc., so the information is organized and can be looked at through the different lenses of knowledge frameworks and human mental models. Both data science derived AI and knowledge engineering derived AI are required pieces in the DHS and NSA sponsored Integrated Adaptive Cyber Defense (IACD) community.
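To make the idea of a common format a little more concrete, here is a minimal sketch in Python. It assumes two made-up silos (an EDR feed in JSON and a key=value syslog line) and maps both onto the same subject-predicate-object triples. The record contents, the field names, and the predicate names (onDevice, byUser, ranProcess) are all hypothetical; a real deployment would use terms from a published ontology rather than ad hoc strings.

```python
import json

# Hypothetical records from two silos describing the same kind of event.
EDR_JSON = '{"host": "ws-042", "user": "alice", "process": "powershell.exe"}'
SYSLOG_KV = "device=ws-042 account=alice image=powershell.exe"

# Hypothetical mapping from each silo's field names to shared predicates.
FIELD_MAP = {
    "edr": {"host": "onDevice", "user": "byUser", "process": "ranProcess"},
    "syslog": {"device": "onDevice", "account": "byUser", "image": "ranProcess"},
}


def parse_json(raw):
    """Parse a JSON-serialized record into a dict."""
    return json.loads(raw)


def parse_kv(raw):
    """Parse a space-separated key=value record into a dict."""
    return dict(pair.split("=", 1) for pair in raw.split())


def to_triples(event_id, source, record):
    """Rewrite one silo-specific record as (subject, predicate, object) triples."""
    mapping = FIELD_MAP[source]
    return {(event_id, mapping[k], v) for k, v in record.items() if k in mapping}


edr = to_triples("event:1", "edr", parse_json(EDR_JSON))
syslog = to_triples("event:2", "syslog", parse_kv(SYSLOG_KV))

# Once normalized, the two records can be compared predicate by predicate.
same_shape = {(p, o) for _, p, o in edr} == {(p, o) for _, p, o in syslog}
print(sorted(edr))
print(same_shape)
```

The point of the sketch is only that, after normalization, the serialization each silo used no longer matters: both records answer the same questions through the same predicates.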

Knowledge engineering derived AI also supports reasoning and inference over the integrated knowledge and wisdom in the same way human security analysts do. For example, a very popular human mental model for intrusion analysis is the Diamond Model. The Diamond Model of Intrusion Analysis contains 7 axioms to help human analysts understand the nature of cyber threats and to help them reason over cyber attack information.
Knowledge engineering derived AI uses these same types of axioms to support reasoning and inference. Axioms are normally captured as first-order predicate logic in knowledge engineering derived AI.
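As an illustration of what an axiom looks like in first-order predicate logic, here is one possible rendering of the Diamond Model's first axiom, which says, paraphrased, that for every intrusion event there exists an adversary taking a step towards an intended goal by using a capability over infrastructure against a victim. The predicate names are assumptions chosen for readability and are not taken from the Diamond Model paper or from any particular ontology.

```latex
\forall e \,\Bigl[ \mathrm{IntrusionEvent}(e) \rightarrow
  \exists a\, \exists c\, \exists i\, \exists v \,\bigl(
    \mathrm{Adversary}(a) \land \mathrm{Capability}(c) \land
    \mathrm{Infrastructure}(i) \land \mathrm{Victim}(v) \land {}
    \mathrm{performedBy}(e, a) \land \mathrm{usesCapability}(e, c) \land
    \mathrm{overInfrastructure}(e, i) \land \mathrm{against}(e, v)
  \bigr) \Bigr]
```

Read this way, the axiom licenses an inference: as soon as something is classified as an intrusion event, a reasoner may assume the four related entities exist even if the data has not yet identified them.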

Knowledge engineering derived AI can reason over the facts in the information it is looking at and, based on those facts, infer new facts into the investigation from the knowledge models (ontologies) and knowledge contained in the knowledge-base.

The graphic below shows the investigation of a Windows event for PowerShell. The knowledge engineering derived AI applied knowledge to detect and verify that this was not an authorized use of PowerShell, gathered the contextual knowledge about the device and user, and asserted this contextual information. Then, based on the facts from the Windows event containing the unauthorized PowerShell and the contextual facts about the device, the user, and what parts of the business they support, the knowledge engineering derived AI was able to infer all the information in the green area at the top of the knowledge graph. The inferences include the specific MITRE ATT&CK technique, the MITRE ATT&CK tactic, the stage of the cyber attack lifecycle using the ODNI Cyber Threat Framework, the objective of the adversary in performing this activity, the stage of the cyber attack lifecycle using NSA’s more complex Technical Cyber Threat Framework, the impact assessment from this activity, and what courses of action to take or recommend. These course of action recommendations could be passed to human teams or to a security orchestrator for automated response actions.
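The kind of inference chain described above can be sketched as a tiny forward-chaining rule engine in Python. This is an illustrative toy under stated assumptions, not the system shown in the graphic: the event, device, user, and business function names are hypothetical, the predicates are invented for the example, and the three rules are hand-written stand-ins for what an ontology and knowledge-base would provide. The one real-world mapping used is that MITRE ATT&CK lists PowerShell as technique T1059.001 under the Execution tactic.

```python
def match(pattern, fact, bindings):
    """Unify one pattern triple with one fact; return extended bindings or None."""
    new = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):          # "?x" is a variable
            if new.setdefault(p, f) != f:
                return None
        elif p != f:                   # constants must match exactly
            return None
    return new


def solve(patterns, facts, bindings=None):
    """Yield every set of variable bindings that satisfies all patterns."""
    bindings = bindings or {}
    if not patterns:
        yield bindings
        return
    for fact in facts:
        extended = match(patterns[0], fact, bindings)
        if extended is not None:
            yield from solve(patterns[1:], facts, extended)


def forward_chain(facts, rules):
    """Apply rules repeatedly until no new facts can be inferred."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # Exhaust the generator before mutating the fact set.
            for b in list(solve(premises, facts)):
                new_fact = tuple(b.get(t, t) for t in conclusion)
                if new_fact not in facts:
                    facts.add(new_fact)
                    changed = True
    return facts


# Asserted facts: the event plus context gathered about device and user (hypothetical).
asserted = {
    ("event:1", "ranProcess", "powershell.exe"),
    ("event:1", "authorization", "unauthorized"),
    ("event:1", "onDevice", "ws-042"),
    ("event:1", "byUser", "alice"),
    ("ws-042", "supports", "payroll"),
}

# Hand-written illustrative rules: (list of premises, conclusion).
rules = [
    ([("?e", "ranProcess", "powershell.exe"), ("?e", "authorization", "unauthorized")],
     ("?e", "technique", "T1059.001")),
    ([("?e", "technique", "T1059.001")],
     ("?e", "tactic", "Execution")),
    ([("?e", "tactic", "Execution"), ("?e", "onDevice", "?d"), ("?d", "supports", "?fn")],
     ("?fn", "impactedBy", "?e")),
]

inferred = forward_chain(asserted, rules) - asserted
for fact in sorted(inferred):
    print(fact)
```

Each rule only fires once its premises are present, so the tactic follows from the technique and the impact assessment follows from the tactic plus the device context. That layering is the part that mirrors the description above; a production reasoner would typically work over ontologies rather than Python tuples.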

Part of the problem of applying AI in cybersecurity is knowing which type of AI you need to solve the different problems faced by the security organization. If it’s a data problem, then you need data science derived AI. If you’ve already invested in data science derived AI, then you’re probably drowning in the information produced by the various data science derived AI solutions and don’t have the humans you need to process, verify, and validate all the preliminary analysis results. If you’ve reached this level, you need to start thinking about investing in knowledge engineering derived AI solutions. These are very different types of AI that don’t have a lot of overlap but are extremely complementary when both are used in the security enterprise. Data science derived AI, security orchestration, and knowledge engineering derived AI are three foundational technologies needed to support holistic security automation and to keep the human security team from drowning in the data and information.
This is a helpful overview chart showing a side-by-side comparison of data science derived AI (aka non-symbolic AI) and knowledge engineering derived AI (aka symbolic AI). It’s important to remember that these two types of AI don’t compete with each other but have a synergistic relationship, with each solving its own set of problems as part of a holistic approach to applying AI in cybersecurity.

Written by Shawn Riley
Shawn Riley serves as the Chief Visionary Officer and Technical Advisor to the CEO for Shawn also volunteers as the Executive Vice President, Strategic Cyberspace Science and Board of Directors member at the non-profit Centre for Strategic Cyberspace + Security Science in London, England, UK. Shawn is an industry thought leader in the NSA's Science of Security virtual organization with a focus on applied cybersecurity science and AI-driven science in security operations.