Detectives, bounty hunters, investigative reporters, threat hunters. They all share something in common: When they’re hot on a scent, they’re going to follow it. In the world of cybersecurity, threat hunters can use artifacts left behind by a bad actor or even a general hunch to start an investigation. Threat hunting, as a practice, is a proactive approach to finding and isolating cyberthreats that evade conventional threat detection methods.
Today’s threat hunters are technologists. They are using an arsenal of tools and triaging alerts to pinpoint nefarious behaviors. However, technology can also be a barrier. Pivoting between tools, deciphering noisy datasets and duplicative fields, assessing true positives from alerts, and waiting to access cold data repositories can slow down hunts during critical events.
Threat hunters who I have worked with here at Comcast and at other organizations have shared that data, when enriched and connected, can be a crucial advantage. Data helps paint a picture of users, devices, and systems, and that wider lens helps threat hunters build a more accurate investigation and response plan. However, data is expensive to store long term, and large, disparate datasets can be overwhelming to sift through for threat signals.
Threat hunting in the AI age
The broad adoption of artificial intelligence (AI) and machine learning (ML) opens the door to data-centric threat hunting, where a new generation of hunters can run more comprehensive, investigative hunts based on the continuous, automated review of massive volumes of data. Threat hunters can collaborate with data engineers and analysts to build AI/ML models that quickly and intelligently inspect millions of files and datasets with an accuracy, scale, and pace that manual efforts cannot match.
When companies are generating terabytes and petabytes of data every day, using AI/ML can help security teams:
- Collect data from multiple security tools and aggregate it with non-security insights.
- Scrutinize network traffic data and logs for indicators of compromise.
- Detect unknown threats or stealthy attacks, including the exploitation of zero-day vulnerabilities and lateral movement activities.
- Alert on multiple failed log-in attempts or brute force activity and identify unauthorized access.
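The last item in the list above, alerting on repeated failed log-ins, can be sketched with a simple sliding-window rule. This is a minimal illustration rather than any product's detection logic: the event fields (`timestamp`, `source_ip`, `outcome`), the function name, and the threshold of five failures in five minutes are all assumptions chosen for the example. A production hunt would more likely feed features like these into an AI/ML model than rely on a fixed threshold.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def find_bruteforce_sources(events, threshold=5, window=timedelta(minutes=5)):
    """Return source IPs with at least `threshold` failed log-ins inside `window`.

    `events` is an iterable of dicts with hypothetical keys:
    "timestamp" (datetime), "source_ip" (str), "outcome" ("failure" or "success").
    """
    recent = defaultdict(deque)  # source IP -> timestamps of recent failures
    flagged = set()
    for event in sorted(events, key=lambda e: e["timestamp"]):
        if event["outcome"] != "failure":
            continue
        times = recent[event["source_ip"]]
        times.append(event["timestamp"])
        # Drop failures that have fallen out of the sliding window.
        while event["timestamp"] - times[0] > window:
            times.popleft()
        if len(times) >= threshold:
            flagged.add(event["source_ip"])
    return flagged


# Made-up sample data: one noisy source and one ordinary source.
start = datetime(2024, 1, 1, 12, 0, 0)
sample_events = [
    {"timestamp": start + timedelta(seconds=10 * i),
     "source_ip": "203.0.113.7", "outcome": "failure"}
    for i in range(6)
] + [
    {"timestamp": start, "source_ip": "198.51.100.2", "outcome": "failure"},
    {"timestamp": start + timedelta(minutes=1),
     "source_ip": "198.51.100.2", "outcome": "success"},
    {"timestamp": start + timedelta(hours=1),
     "source_ip": "198.51.100.2", "outcome": "failure"},
]

suspects = find_bruteforce_sources(sample_events)
print(suspects)
```

Only the source with six failures in under a minute is flagged; the second source has two failures an hour apart and stays below the threshold.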
At Comcast, having clean, integrated data allows AI/ML to improve operational efficiency and fidelity. For the cybersecurity team, operationalizing AI/ML to scrub large datasets led to a 200% reduction in false positives; for the IT team, AI/ML highlighted single-use and point solutions that could be reduced or eliminated, leading to a $10 million cost avoidance.
Creating more effective threat hunting programs with your data
Threat hunters want access to data and logs: the more the merrier. Clever malware developers delete or modify artifacts, for example by clearing Windows event logs or deleting files, to evade detection. Fortunately for us, threat hunters know that packets don’t lie.
Analyzing all that data can quickly become a challenging task. DataBee® takes on the security data problem early in the data pipeline to give data engineers and security analysts a single source of truth with cleaner, enriched time-series datasets that can accelerate AI operations. This enables them to utilize their data to build AI/ML models that can not only automate and augment the review of data but also achieve:
Speed and scale: Security data from different tools that have duplicative information and no common schemas can now be analyzed quickly and at scale. DataBee parses and deduplicates multiple datasets before analysis. This gives data engineers clean data to build effective AI/ML models directly sourced from the business, increasing visibility and early detection across the threat landscape.
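To make the parse-and-deduplicate idea concrete, here is a small sketch of mapping two tools' records onto one shared schema and then dropping duplicates. It is not DataBee's implementation; the field names, the two mapping tables, and the choice of `(timestamp, host, event_type)` as the duplicate key are assumptions made for illustration.

```python
def normalize(record, field_map):
    """Rename tool-specific fields to the shared schema; unmapped fields are dropped."""
    return {common: record[raw] for raw, common in field_map.items() if raw in record}


def deduplicate(records, key_fields=("timestamp", "host", "event_type")):
    """Keep the first record seen for each combination of `key_fields`."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(record.get(field) for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical field mappings for two tools that describe the same kind of event.
EDR_MAP = {"ts": "timestamp", "hostname": "host", "action": "event_type"}
FIREWALL_MAP = {"time": "timestamp", "device": "host", "event": "event_type"}

edr_records = [
    {"ts": "2024-01-01T12:00:00Z", "hostname": "laptop-42", "action": "login_failure"},
]
firewall_records = [
    {"time": "2024-01-01T12:00:00Z", "device": "laptop-42", "event": "login_failure"},
    {"time": "2024-01-01T12:05:00Z", "device": "laptop-42", "event": "port_scan"},
]

combined = (
    [normalize(r, EDR_MAP) for r in edr_records]
    + [normalize(r, FIREWALL_MAP) for r in firewall_records]
)
clean = deduplicate(combined)
print(clean)
```

The two tools report the same failed log-in under different field names; after normalization the records are identical, so only one copy survives alongside the port scan.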
Business context: Threat hunting needs more than just security data. Security events without business context require hours of event triaging and prioritization. DataBee weaves security data with business context, including org chart data and asset owner details, so data engineers and threat hunters can create more accurate models and queries. For Comcast, employing this model has led to more informed decision-making and fewer false positives.
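As a rough sketch of what weaving in business context can look like, the example below joins security events to an asset inventory so each event carries an owner and a business unit. The inventory structure, the field names, and the `"unknown"` fallback are hypothetical and are not taken from DataBee.

```python
def enrich(events, asset_inventory):
    """Attach owner and business-unit details to each event, keyed by host name."""
    enriched = []
    for event in events:
        context = asset_inventory.get(event["host"], {})
        enriched.append({
            **event,
            "owner": context.get("owner", "unknown"),
            "business_unit": context.get("business_unit", "unknown"),
        })
    return enriched


# Hypothetical asset inventory, for example exported from a CMDB or HR system.
asset_inventory = {
    "laptop-42": {"owner": "j.doe", "business_unit": "Finance"},
}

events = [
    {"host": "laptop-42", "event_type": "login_failure"},
    {"host": "kiosk-07", "event_type": "port_scan"},
]

enriched_events = enrich(events, asset_inventory)
print(enriched_events)
```

An event on a host that is missing from the inventory is kept and marked `"unknown"`, which is itself a useful signal: unmanaged assets are often worth a closer look.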
Data and cost optimization: The time between when a bad actor gains access to the environment and when the security event is detected may be days, months, or years. This makes data retention important, but expensive. Traditional analytical methods and SIEMs put tremendous pressure on CIO and CISO budgets. DataBee optimizes data while retaining its quality and integrity, so it can be stored long term and cost-effectively in a data lake. The data stays highly accessible, allowing threat hunters to run multiple compute-intensive queries on demand to better protect their organization.
Bad actors are evolving. They’re using advanced methods and AI/ML to improve their success rates. But cybersecurity teams are smart. Advanced threat hunters are moving beyond generic out-of-the-box detections and using AI/ML to improve their own success rates and operational efficiency. Using AI/ML effectively also saves money by enabling threat hunting teams to scale, running more hunts with the same resources in the same time frame.
Put your interest into practice and download the data-centric threat hunting guide, which was created from interviews with and insights shared by Comcast’s cybersecurity team.