Cybersecurity and compliance are team sports where data is the primary equipment. The challenge in building data-driven security programs is that vendor-specific security tools speak different languages, each with its own semantics and file formats. This makes gathering and using insights difficult because cross-functional teams need to connect dissimilar security tools that generate high volumes of often-duplicated data.
In response to these challenges, the Open Cybersecurity Schema Framework (OCSF) was introduced as a standardized, holistic schema that delivers security data consistency, faster analysis, and improved collaboration.
What is the Open Cybersecurity Schema Framework?
The Open Cybersecurity Schema Framework (OCSF) is a collaborative, open-source project delivering a vendor-agnostic, standardized schema that:
- Normalizes common security event data
- Defines versioning criteria to enable the schema's evolution
- Provides security log producers and consumers a self-governance process
- Provides security log consumers a vendor-agnostic standardized log structure
While the OCSF initially focuses on creating a standardized schema for cybersecurity events, it is not restricted to the cybersecurity domain. Essentially, the OCSF is written for three personas:
- Vendors: Authors and producers who use it for their specific domains, like a firewall or endpoint detection and response (EDR) technology provider
- Data engineers: Mappers who need to help security teams simplify data ingestion and normalization
- Data scientists and analysts: Consumers who need a common language for threat detection and investigation
Intended to be implementation agnostic, the OCSF can be used with any:
- Security tools that generate log data
- Storage format, including security data lakes
- Data collection processes
- Extract, Transform, and Load (ETL) processes
What is the format of the OCSF schema?
Although the OCSF schema is delivered in JavaScript Object Notation (JSON) format, the standard is agnostic to how an organization stores the data. It uses the following elements to categorize and classify the parts of the data structure:
- Data Types, Attributes, Objects, and Arrays: data types cover common primitives (strings and integers) and more specific scalar types (timestamps or IP addresses). An attribute is a uniquely named field with a simple data type, an object groups contextually related attributes into a complex data type, and an array holds multiple values of the same type
- Event Class: a particular set of attributes that defines a type of activity within the system
- Category: a grouping of event classes that clarifies an event's domain, making documentation and search more manageable and streamlining reporting
- Profile: additional related attributes overlaid on event classes and categories, such as malware detection attributes when an endpoint detection and response (EDR) tool is the data source
- Extensions: new attributes, objects, event classes, categories, and profiles that can be added to existing data elements without modifying the core schema
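To make the distinction between scalar attributes, objects, and arrays concrete, here is a minimal Python sketch. The event below is hand-written for illustration, its field names and values are assumptions rather than a validated OCSF record, and `kinds` is a hypothetical helper.

```python
# Illustrative only: a hand-written event showing how the schema elements nest.
# Field names and values are assumptions, not a validated OCSF record.
event = {
    "class_uid": 1007,          # event class identifier (scalar attribute)
    "time": 1700000000000,      # timestamp (scalar attribute)
    "process": {                # object: contextually related attributes
        "name": "example.exe",
        "pid": 4242,
    },
    "tags": ["endpoint", "lab"],  # array: several values of one type
}


def kinds(record):
    """Label each top-level attribute as 'object', 'array', or 'scalar'."""
    labels = {}
    for key, value in record.items():
        if isinstance(value, dict):
            labels[key] = "object"
        elif isinstance(value, list):
            labels[key] = "array"
        else:
            labels[key] = "scalar"
    return labels
```

Calling `kinds(event)` labels `process` as an object, `tags` as an array, and the remaining fields as scalars.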
OCSF categories
The OCSF categories are fields that align event classes to specific domains. The six primary categories are:
- System Activity
- Findings
- Identity & Access Management
- Network Activity
- Discovery
- Application Activity
OCSF event classes
The OCSF identifies and defines thirty-three event classes. Each event class has a:
- Caption: Event title
- Name: Field as formatted in the log
- ID: numerical identifier
- Description: definition as used by the schema
Consider the following example of an event class:
- Caption: Process Activity (1007)
- Name: process_activity
- ID: 1007
- Description: Process Activity events report when a process launches, injects, opens or terminates another process, successful or otherwise.
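The four descriptors above can be sketched as a small Python record. The `EventClass` type and `category_of` helper are hypothetical names. The category lookup assumes that category numbers follow the order listed earlier and that a core class ID is the category number times 1000 plus an offset, which is consistent with 1007 falling under System Activity but should be checked against the published schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventClass:
    """Hypothetical container for the four descriptors of an event class."""
    caption: str
    name: str
    uid: int
    description: str


# Assumption: categories are numbered in the order listed in this article.
CATEGORIES = {
    1: "System Activity",
    2: "Findings",
    3: "Identity & Access Management",
    4: "Network Activity",
    5: "Discovery",
    6: "Application Activity",
}


def category_of(event_class):
    """Derive the category from the class ID (assumed: category * 1000 + offset)."""
    return CATEGORIES.get(event_class.uid // 1000, "Unknown")


PROCESS_ACTIVITY = EventClass(
    caption="Process Activity",
    name="process_activity",
    uid=1007,
    description=(
        "Process Activity events report when a process launches, injects, "
        "opens or terminates another process, successful or otherwise."
    ),
)
```

Under these assumptions, `category_of(PROCESS_ACTIVITY)` resolves to System Activity.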
The OCSF also provides contextual information about each event class, enabling enhanced correlation across multiple event classes and within categories.
For example, with the File System Activity event class, the OCSF provides 35 additional data elements, including:
- Actor: person performing an activity
- Component: name or relative pathname of a sub-component of the data object
- Connection identifier: network connection identifier
- File differences: file content differences used for change detection
OCSF source identification
The OCSF has various fields to identify what technology generated the log or event. For example, the OCSF uses the following values for the VPC Flow Logs from Amazon Web Services (AWS):
- Source: VPC Flow Logs
- metadata.product.name: Amazon VPC
- metadata.product.vendor_name: AWS
- metadata.product.feature.name: Flowlogs
- class_name: Network Activity
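A consumer could use these fields to recognize where a record came from. The sketch below assumes the dotted names map to nested, lowercase JSON keys. `get_path` and `is_vpc_flow_log` are hypothetical helpers, and the sample record contains only the values listed above.

```python
def get_path(record, dotted_path, default=None):
    """Walk a nested dict using a dotted path such as 'metadata.product.name'."""
    current = record
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current


def is_vpc_flow_log(record):
    """Return True when the source fields match the VPC Flow Logs values."""
    return (
        get_path(record, "metadata.product.vendor_name") == "AWS"
        and get_path(record, "metadata.product.feature.name") == "Flowlogs"
    )


# Sample record built only from the values listed in this section.
vpc_event = {
    "class_name": "Network Activity",
    "metadata": {
        "product": {
            "name": "Amazon VPC",
            "vendor_name": "AWS",
            "feature": {"name": "Flowlogs"},
        }
    },
}
```

Returning a default for missing keys keeps the check safe on records from other sources that omit some of these fields.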
What are the values and benefits of OCSF?
A large enterprise can deploy one hundred or more security tools, each using its own proprietary data format. The OCSF can help organizations overcome this fragmentation, giving them a way to optimize their security data.
Easier Data Management
OCSF provides the flexibility for organizations to add, remove, or replace data sources and security tools based on their needs without breaking downstream security analytics. For example, the standardized schema allows an organization to switch to a new EDR vendor without disrupting the existing detection capabilities that rely on EDR data.
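The vendor swap can be sketched as follows. Both raw formats, the mapper functions, and the detection rule are hypothetical, and the normalized shape is a simplified stand-in for an OCSF Process Activity event. The point is only that the detection logic reads the normalized fields and never the vendor's.

```python
SUSPICIOUS_PROCESSES = {"mimikatz.exe"}  # illustrative watch list


def from_vendor_a(raw):
    """Map a hypothetical vendor A record to the simplified common shape."""
    return {
        "class_uid": 1007,
        "time": raw["ts"],
        "process": {"name": raw["proc"]},
        "device": {"hostname": raw["host"]},
    }


def from_vendor_b(raw):
    """Map a hypothetical vendor B record to the same common shape."""
    return {
        "class_uid": 1007,
        "time": raw["Timestamp"],
        "process": {"name": raw["ProcessName"]},
        "device": {"hostname": raw["Hostname"]},
    }


def detect(event):
    """Detection rule written once against the normalized fields."""
    return event["process"]["name"].lower() in SUSPICIOUS_PROCESSES


old_edr = from_vendor_a({"ts": 1700000000000, "proc": "mimikatz.exe", "host": "ws-01"})
new_edr = from_vendor_b(
    {"Timestamp": 1700000005000, "ProcessName": "Mimikatz.exe", "Hostname": "ws-02"}
)
```

Replacing vendor A with vendor B only requires a new mapper. `detect` stays unchanged and flags both `old_edr` and `new_edr`.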
Correlate Events
With all security telemetry conforming to a single format, organizations can aggregate more data for enhanced correlations. For example, by correlating data efficiently between EDR and Identity and Access Management (IAM) tools, they can create high-fidelity detection rules in their Security Information and Event Management (SIEM) tools that better identify potential malware infections, saving analysts time and reducing alert fatigue.
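A minimal sketch of such a correlation, assuming both sources have already been normalized so that they share a user field and a millisecond timestamp. The flat `user` and `time` keys, the five-minute window, and the sample events are simplifications chosen for illustration.

```python
def correlate(edr_events, iam_events, window_ms=300_000):
    """Pair each EDR event with IAM events for the same user within the window."""
    iam_by_user = {}
    for iam in iam_events:
        iam_by_user.setdefault(iam["user"], []).append(iam)

    pairs = []
    for edr in edr_events:
        for iam in iam_by_user.get(edr["user"], []):
            if abs(edr["time"] - iam["time"]) <= window_ms:
                pairs.append((edr, iam))
    return pairs


edr_events = [
    {"user": "alice", "time": 1_000_000, "activity": "process_launch"},
]
iam_events = [
    {"user": "alice", "time": 900_000, "activity": "failed_login"},
    {"user": "bob", "time": 950_000, "activity": "failed_login"},
    {"user": "alice", "time": 5_000_000, "activity": "password_reset"},
]

matches = correlate(edr_events, iam_events)
```

Here only the first IAM event pairs with the EDR event: it shares the user and falls inside the window, while the others differ by user or by time. Because both sources use the same field names, the join needs no per-vendor logic.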
Reduce Storage Costs
By normalizing security telemetry, the OCSF enables organizations that use data lakes to save money by reducing duplicated data elements like fields or content. Organizations use their SIEMs for security detection, investigation, and response. However, these technologies weren't built for the terabytes or petabytes of data that cloud and on-premises environments generate every day. Rather than choosing which data to keep, organizations using the OCSF schema can retain all security telemetry in a security data lake at a lower cost.
Optimize Data’s Value
With the OCSF schema, organizations can deduplicate their security data, giving them more accurate data. With clean data, they can leverage advanced analytics models for:
- Active detections, like applying Sigma rules for high-volume detections on streaming data
- Continuous controls monitoring (CCM)
- Compliance reporting with business intelligence tools
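The deduplication step mentioned in this section can be sketched as fingerprinting each normalized event and keeping the first copy. This is one possible approach, not a method prescribed by the OCSF. The `ingest_source` field and the choice to ignore it are assumptions made for the example.

```python
import hashlib
import json


def fingerprint(event, ignore=("ingest_source",)):
    """Hash the event's content, skipping fields that vary between copies."""
    stable = {key: value for key, value in event.items() if key not in ignore}
    payload = json.dumps(stable, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def deduplicate(events):
    """Keep the first occurrence of each distinct event, preserving order."""
    seen = set()
    unique = []
    for event in events:
        key = fingerprint(event)
        if key not in seen:
            seen.add(key)
            unique.append(event)
    return unique


events = [
    {"class_uid": 1007, "time": 1700000000000, "ingest_source": "forwarder-1"},
    {"time": 1700000000000, "class_uid": 1007, "ingest_source": "forwarder-2"},
    {"class_uid": 1007, "time": 1700000009000, "ingest_source": "forwarder-1"},
]

unique_events = deduplicate(events)
```

Sorting keys before hashing makes the fingerprint independent of field order, so the first two events collapse into one. That comparison is only reliable once every source uses the same field names.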