A cybersecurity analyst recently said to me, "Bad guys don't do normal things." He was explaining his thought process behind a particular automated alert. Specifically, he was describing a Splunk alert that notified his team whenever a ‘suspect ping’ process was detected on one of the more than 20,000 devices within their enterprise. This particular alert detected whenever any Windows host executed the ‘ping.exe’ process with a ‘-n’ argument.
For example: ping 18.104.22.168 -n 1 -w 1000
The peculiar invocation above is unlikely to be used by an IT person doing normal network troubleshooting. This non-default command, which sends a single ping, might indicate malware or an intruder probing the network. This abnormal execution is what makes the ping interesting to a cyber network defender. At this organization, simply running the above command would be enough to launch a brief investigation by the Incident Response (IR) team.
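To make the rule concrete, here is a rough Python sketch of the detection logic. The regular expression is illustrative only, an assumption about what the team's actual Splunk search looks for, not a copy of it:

```python
import re

# Hypothetical detection rule: flag any command line that invokes ping
# (or ping.exe) with an explicit "-n <count>" argument, which routine
# troubleshooting rarely uses.
SUSPECT_PING = re.compile(r"\bping(\.exe)?\b.*\s-n\s+\d+", re.IGNORECASE)

def is_suspect_ping(command_line: str) -> bool:
    """Return True if the command line matches the suspect-ping pattern."""
    return bool(SUSPECT_PING.search(command_line))
```

A real deployment would run a search like this against indexed process-creation events rather than raw strings, but the matching idea is the same.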
False Positives are a Huge Waste of Time
Most of these investigations are wasted on false positives. The automated alert described above is a good start, but it casts too wide a net when hunting for threats on such a large and diverse network. It turned out that the vast majority of the alerted ping executions were caused by an HP update process that, as part of its execution, checks for access to the network. Faced with these false positives, the IR team could simply turn the alert off to avoid dead-end investigations, but then they would miss the genuinely malicious cases that could lead them to true threats.
A Solution Strategy
The typical first step for a defender investigating each ‘suspect ping’ is to check its parent process. If the ping was invoked by a user from the command line or by some unknown application, the investigation continues. If the parent process can be quickly identified as a trusted application, the case is simply closed as a false positive. Automating this first step allows the IR team to ignore the distractions and reclaim valuable minutes of their day. Those precious minutes add up to hours each week, and hours add up to days. You get the picture.
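The triage step described above can be sketched in a few lines of Python. The allowlist entry below is an illustrative assumption (HPSSFUpdater.exe is the parent named later in this article, but its install path here is a guess), not the team's actual configuration:

```python
# Minimal triage sketch: close an alert as a false positive when the
# ping's parent process is on a known-good allowlist. Entries are
# illustrative; a real list would be curated by the IR team.
TRUSTED_PARENTS = {
    r"c:\program files (x86)\hp\hpssfupdater.exe",  # assumed path
}

def triage(creator_process_name: str) -> str:
    """Return 'false_positive' for trusted parents, else 'investigate'."""
    if creator_process_name.lower() in TRUSTED_PARENTS:
        return "false_positive"
    return "investigate"
```

Anything not on the allowlist still goes to a human, so the automation only removes the known-benign cases.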
One solution to this problem is for the analyst to use tools that systematically inspect the parent process of each command for signs of benign or malicious intent. With DarkLight, the analyst can encode their expert know-how into a Description Logics inference engine, in essence creating a “virtual analyst”. Offloading this first step frees the expert's valuable time for investigating true threats.
Let’s take a look at how to encode the analyst’s knowledge into DarkLight by creating a Programmable Reasoning Object (PRO) that examines a single event to decide if it is truly a “Suspect Ping” event. PRO reasoners do the heavy lifting, leaving the analyst with a condensed set of meaningful data. This particular PRO retrieves and uses knowledge about related data (the parent process) to reason about whether a ping process can be ignored or should be classified as “suspect”. If the ping is suspect, DarkLight can also provide the analyst with additional information about the related devices, departments, and personnel to aid in the incident response process.
The Data Source
This team collects Windows host data with a tool called Windows Logging Service (WLS). That data is indexed by Splunk and eventually retrieved as a CSV file. Below is an example alert file. It shows a "suspect ping" alongside other processes started within a few seconds on the same host. The highlighted process ID (PID) and parent-process ID (PPID) show that the ping was invoked by HPSSFUpdater.exe, which the IR team would recognize as normal activity.
A CSV formatted output of a Splunk alert that shows all host processes started within a few seconds of a "ping" process.
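The manual lookup an analyst performs on that CSV, find the ping's row, then find the row whose PID matches the ping's PPID, can be sketched as follows. The rows and exact column names (PID, PPID, NewProcessName, CommandLine) are assumptions for illustration, not the real WLS field names or alert contents:

```python
import csv
import io

# Hypothetical alert rows shaped like the CSV described above.
ALERT_CSV = """PID,PPID,NewProcessName,CommandLine
0x1a4,0x0fc,C:\\Windows\\System32\\PING.EXE,ping 10.0.0.1 -n 1 -w 1000
0x0fc,0x004,C:\\HP\\HPSSFUpdater.exe,HPSSFUpdater.exe /silent
"""

def parent_of(rows, pid):
    """Find the row whose PID equals the given process's PPID --
    the same first step an analyst performs by hand."""
    by_pid = {row["PID"]: row for row in rows}
    child = by_pid[pid]
    return by_pid.get(child["PPID"])

rows = list(csv.DictReader(io.StringIO(ALERT_CSV)))
parent = parent_of(rows, "0x1a4")  # look up the ping's parent
```

Here the lookup resolves the ping's parent to the HP updater row, exactly the clue that lets the team close the alert.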
Data Modeling with Ontologies
Before DarkLight can reason about this data, it needs to be modeled in an ontology. This sounds pretty complicated, but it’s actually quite easy and very powerful. Ontologies contain class definitions, property definitions, and facts adhering to these definitions. By describing classes, properties, and rules in the cybersecurity expert's domain, DarkLight can automate the tedious, complex, and overwhelming tasks of the analyst. Even better, industry-standard ontologies like STIX can be imported into DarkLight and used to create standardized descriptions of activities.
For this example a new "proc" ontology was created to describe the process execution data sent in the Splunk alerts.
Next, our strategy requires us to define two new classes of things:
- Process - This type encompasses every event in our alert data.
- SuspectPing - This is the specific type of event we are trying to discover with DarkLight. Only some of the processes in our data are ping processes, and even fewer should be classified as a SuspectPing.
We’ll also want to define some new data properties before we process the alert data.
- hasProcessName - Shown in the CSV data as NewProcessName. This is the full path of the executable process that was launched.
- hasBaseFileName - The shortened name of the executed process. Not really useful for this analysis, but since the log includes it, we'll import it anyway.
- hasCommandLine - The command that was executed, including arguments.
- hasPID - The Process ID.
- hasCreatorProcessName - This is the name of the parent process.
- hasPPID - The process ID of the parent process.
And finally, before DarkLight can perform analysis on the source data, it must first be transformed from the tabular records of the CSV file into DarkLight's preferred data structure: graphs. Here’s an example of a single record represented as a graph:
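As a rough stand-in for DarkLight's actual ingest, the mapping from one tabular record to a graph can be sketched as plain (subject, predicate, object) triples. The namespace prefixes, event ID, and field values below are all illustrative assumptions:

```python
# One hypothetical CSV record, keyed by assumed column names.
record = {
    "PID": "0x1a4",
    "PPID": "0x0fc",
    "NewProcessName": r"C:\Windows\System32\PING.EXE",
    "BaseFileName": "PING.EXE",
    "CommandLine": "ping 10.0.0.1 -n 1 -w 1000",
    "CreatorProcessName": "HPSSFUpdater.exe",
}

# Map tabular columns onto the ontology's data properties defined above.
COLUMN_TO_PROPERTY = {
    "NewProcessName": "proc:hasProcessName",
    "BaseFileName": "proc:hasBaseFileName",
    "CommandLine": "proc:hasCommandLine",
    "PID": "proc:hasPID",
    "CreatorProcessName": "proc:hasCreatorProcessName",
    "PPID": "proc:hasPPID",
}

def record_to_triples(subject, record):
    """Turn one tabular record into (subject, predicate, object) triples,
    typing the subject as a proc:Process individual."""
    triples = [(subject, "rdf:type", "proc:Process")]
    for column, prop in COLUMN_TO_PROPERTY.items():
        triples.append((subject, prop, record[column]))
    return triples

triples = record_to_triples("proc:event-001", record)
```

Each row of the CSV thus becomes one individual of the Process class, with one triple per data property, which is the graph form the reasoner works over.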
In Part 2, we will discuss how DarkLight ingests, queries, and reasons over log event data to cut false positives down to nearly zero and provide IR teams with actionable intelligence that is highly enriched with enterprise contextual knowledge. Stay tuned for the gory details!