Normalization and the Query Data Model (QDM)

Search Results Normalization into QDM (Query Data Model)

Query normalizes results from different data sources into a standardized cybersecurity data schema called the Query Data Model (QDM), which is heavily inspired by the Open Cybersecurity Schema Framework (OCSF). OCSF is an open-source industry standard backed by key cybersecurity vendors. It was announced at Black Hat 2022 with a founding coalition that included Splunk, AWS, Broadcom, Cloudflare, CrowdStrike, IBM Security, Okta, Palo Alto Networks, Rapid7, Sumo Logic, Tanium, Trend Micro, and Zscaler. For more information on OCSF, please refer to the official OCSF documentation.

Query's Data Schema is OCSF

Query plays the role of the "Data Broker" by converting data from data source vendors' native formats into QDM. This normalization yields OCSF "Objects", which serve as Query's cybersecurity entities of interest, and OCSF "Event Classes", which describe the activity related to those entities.
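
As a rough illustration of what this conversion involves, the sketch below maps a made-up vendor alert into a QDM-like Security Finding. The vendor field names (alertName, detectedAt, host, hostIp, account), the normalize_alert helper, and the exact shape of the output are all assumptions made for this example; they are not Query's actual connector code or the full QDM schema.

```python
from datetime import datetime

# Hypothetical vendor record; these field names are invented for illustration.
vendor_alert = {
    "alertName": "Suspicious PowerShell",
    "detectedAt": "2024-02-15T10:30:00+00:00",
    "host": "laptop-042",
    "hostIp": "205.100.1.1",
    "account": "jdoe",
}


def normalize_alert(raw: dict) -> dict:
    """Map the hypothetical vendor alert into a QDM-like Security Finding."""
    detected = datetime.fromisoformat(raw["detectedAt"])
    return {
        "class_name": "Security Finding",
        # Assumption: event time is stored as epoch milliseconds.
        "time": int(detected.timestamp() * 1000),
        "finding": {"title": raw["alertName"]},
        # Vendor-specific fields land in shared Objects (Device, User).
        "device": {"hostname": raw["host"], "ip": raw["hostIp"]},
        "user": {"name": raw["account"]},
    }


event = normalize_alert(vendor_alert)
print(event["class_name"], event["device"], event["user"])
```

Once every source is mapped this way, a search for an IP address or username can match the same attribute regardless of which vendor produced the record.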

Query Data Model

View and browse the QDM at https://schema.query.ai/

Note: Because OCSF is evolving rapidly, Query uses a smaller, stable subset of it, with some additional modifications, suited to our cybersecurity Federated Search use cases. Please refer to the official OCSF documentation to understand the broader schema.

OCSF Objects as QDM's Cybersecurity Entities

OCSF Objects have been adopted into Query's data model to represent cybersecurity Entities. Each object has its own set of attributes, which Query extracts and populates from the federated search results returned by multiple disparate data sources. Below are the OCSF Objects in Query's Data Model:

  • User: Represents a user. Its attributes include fields like name, email_addr, uid, ...
  • Device: Represents an endpoint. Its attributes include fields like hostname, ip, instance_uid, ...
  • Network Endpoint: Represents any public/private source/destination of a network connection. Its attributes include fields like hostname, ip, port, svc_name, ...
  • Process: Represents a running instance of a launched program. Its attributes include fields like name, pid, parent_process, user, ...
  • Email: Represents email metadata such as sender, recipients, and direction. Its attributes include fields like from, to, subject, size, ...
  • File: Represents files, folders, links and mounts, including the reputation information, if applicable. Its attributes include fields like name, path, type_id, fingerprints, ...
  • URL: Represents the path and reputation of a URL. Its attributes include fields like hostname, path, scheme, port, query_string, ...
  • Domain Info: Represents registration information pertaining to a domain. Its attributes include fields like domain, registrar, created_time, modified_time, ...
  • Location: Represents geographic location information. Its attributes include fields like coordinates, city, country, ...
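
To make the idea of an Object with a fixed set of attributes more concrete, here is a minimal sketch of two of the Objects above as Python dataclasses. Only the attribute names come from the list above; the types, the optional defaults, and the populated helper are assumptions for illustration, and the real QDM definitions at https://schema.query.ai/ contain many more attributes.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class User:
    # Attribute names taken from the list above; types are assumed.
    name: Optional[str] = None
    email_addr: Optional[str] = None
    uid: Optional[str] = None


@dataclass
class Device:
    hostname: Optional[str] = None
    ip: Optional[str] = None
    instance_uid: Optional[str] = None


def populated(obj) -> dict:
    """Return only the attributes that a data source actually supplied."""
    return {key: value for key, value in asdict(obj).items() if value is not None}


# A source that knows the hostname and IP but not the instance ID.
device = Device(hostname="laptop-042", ip="205.100.1.1")
print(populated(device))
```

Leaving every attribute optional reflects that different data sources can fill in different parts of the same Object.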

QDM Entities extracted from OCSF Objects

As stated above, the Query Federated Search Platform selectively surfaces certain fields from within OCSF objects. The list below contains the normalized Entities you can dispatch searches with. The Query Federated Search Platform translates the search into whichever combination of API calls, SQL statements, KQL statements, or other query mechanisms each data source requires, and brings back deduplicated, normalized, and correlated (by parent->child relationships) records that contain that entity.

For instance, if you search for the IP Address 205.100.1.1, Query will dispatch the search to all of the Connectors you have enabled. In this example, you may have SentinelOne (an Endpoint Detection & Response tool), Microsoft Intune and JAMF Pro (both Mobile Device Management tools), and Tégo Threat Feed API (a Threat Intelligence concentrator). Each will bring back different records: related findings, alerts, and potential devices from EDR; onboarded endpoints or servers from MDM; and relevant geolocation, reputation scoring, registrar, and reverse DNS information from CTI tools. This helps Analysts, Detection Engineers, and other users of the Query Federated Search Platform to quickly collate, orient, decide, and act on the information they are given, whether for closing incidents, developing detection content, or other work.
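
The fan-out described in this example can be sketched as follows. The three connector functions are stubs that return made-up records, and federated_search is a hypothetical helper; none of this reflects how Query's Connectors are actually implemented, and real dispatch would also involve query translation, deduplication, and correlation.

```python
from concurrent.futures import ThreadPoolExecutor


# Stub connectors standing in for real integrations; the records are made up.
def edr_connector(ip):
    return [{"source": "edr", "class_name": "Security Finding", "ip": ip}]


def mdm_connector(ip):
    return [{"source": "mdm", "object": "Device", "ip": ip}]


def cti_connector(ip):
    return [{"source": "cti", "object": "Location", "ip": ip}]


CONNECTORS = [edr_connector, mdm_connector, cti_connector]


def federated_search(ip):
    """Send the same entity value to every connector and merge the results."""
    with ThreadPoolExecutor() as pool:
        # map() preserves connector order even though calls run concurrently.
        batches = pool.map(lambda connector: connector(ip), CONNECTORS)
        return [record for batch in batches for record in batch]


results = federated_search("205.100.1.1")
print([record["source"] for record in results])
```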

As of 15 FEB 2024, the following entities are supported.

  • Email Address: An Email Address is a designation for an electronic mailbox that sends and receives messages on a computer network. This is irrespective of directionality and encompasses To, From, CC, and BCC addresses.
  • File Hash: A File Hash is a numerical value, calculated using a mathematical function, that identifies/corresponds to the content of a file. For example: hex digests from MD5, SHA-256 (part of the SHA-2 family), and similar hashing algorithms.
  • File Name: A File Name is the complete title of a file and file extension - optionally including the path - and is used to uniquely identify a computer file in a file system, directory, object storage, and similar.
  • Hostname: A Hostname is the name of a computer connected to a network, used to identify/correspond to the machine on the network. This may overlap with a Domain Name or Resource Name depending on the system of record (these names refer to future additions to QDM Entities).
  • IP Address: An Internet Protocol Address is a unique address that identifies a device on the Internet or a local network. This can be an IPv4 or IPv6 address.
  • URL: A Uniform Resource Locator (URL) is a unique identifier used to locate a resource on the Internet and is also referred to as a web address. This can also include service-specific Uniform Resource Identifiers (URIs) such as Amazon S3 URIs or GCP Storage Bucket URIs.
  • Username: A Username is a phrase, word, or combination of characters used to identify and gain access to a computer or computer network. This can be an email address, a full name, a Service Principal Name (SPN), and similar.
  • Resource ID: A Resource ID is any identifier (UUID, GUID, UID, ID, serial number, etc.) or canonical lookup value for any asset (e.g., cloud resource, endpoint, mobile device, etc.) on its own or implicated within a finding. This can be an Amazon Resource Name (ARN), an ETag, an ID from another cloud provider, the machineId within Defender 365, the aid or device_id within CrowdStrike Falcon, and similar.
  • MAC Address: The MAC address of a device or endpoint. Identified by hexadecimal characters with colon (:) or dash (-) delimiters.
  • Process Name: The name of a Process (or Thread) as identified by a source system, whether on its own, as a Parent, or as part of a Command Line.

Several other Entities are under active development, as is the ability to search on any field within any Object.
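
To illustrate how the formats described above differ from one another, the sketch below guesses an Entity type from a raw search value. The guess_entity function, its regular expressions, and the order of its checks are assumptions for this example only; in the platform you choose the Entity yourself, and several Entities (Hostname, Username, File Name, Resource ID, Process Name) cannot be told apart by format alone.

```python
import ipaddress
import re

# Six hex pairs separated by colons or dashes, as described for MAC Address.
MAC_RE = re.compile(r"^(?:[0-9A-Fa-f]{2}[:-]){5}[0-9A-Fa-f]{2}$")
# Deliberately loose email shape: something@something.something
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
HEX_RE = re.compile(r"^[0-9A-Fa-f]+$")
# Hex digest lengths for MD5 (32) and SHA-256 (64).
HASH_LENGTHS = (32, 64)


def guess_entity(value: str) -> str:
    """Best-effort guess of the Entity type for a search value."""
    try:
        ipaddress.ip_address(value)  # accepts both IPv4 and IPv6
        return "IP Address"
    except ValueError:
        pass
    if MAC_RE.match(value):
        return "MAC Address"
    if "://" in value:  # checked before email, since URLs may contain "@"
        return "URL"
    if EMAIL_RE.match(value):
        return "Email Address"
    if len(value) in HASH_LENGTHS and HEX_RE.match(value):
        return "File Hash"
    return "Unknown"


for sample in ["205.100.1.1", "00-1A-2B-3C-4D-5E", "jdoe@example.com"]:
    print(sample, "->", guess_entity(sample))
```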

Relevant QDM Event Classes

Query normalizes and correlates the above OCSF Objects' activity information coming from various data sources into the event classes below:

  • Security Finding: Security Finding events describe findings, detections, anomalies, alerts and/or actions performed by security products.
  • Email Activity: Email Activity events report findings and activities of emails.
  • File System Activity: File System Activity events report when a process performs an action on a file or folder.
  • Account Change: Account Change events report when specific user account management tasks are performed, such as a user/role being created, changed, deleted, renamed, disabled, enabled, locked out or unlocked.
  • Authentication: Authentication events report authentication session activities, such as a user attempting to log on or log off, successfully or otherwise.
  • Authorization: Authorization events report special privileges or groups assigned to a session.
  • Entity Management: Entity Management events report activity by a managed client, a micro service, or a user at a management console. The activity can be a create, read, update, and delete operation on a managed entity.
  • Network Activity: Network Activity events report network connection and traffic activity.
  • HTTP Activity: HTTP Activity events report HTTP connection and traffic information.
  • API Activity: API Activity events describe general CRUD (Create, Read, Update, Delete) API activities (e.g., AWS CloudTrail).

Relevant OCSF Event Categories for Query

OCSF groups similar Event Classes into what it calls "Categories". The above Event Classes fall into the OCSF Categories below. Note that Categories are not displayed in the UI; they are listed here only as background on Query's schema:

  • Findings: Category for any finding events. This includes security_finding events.
  • System Activity: Category for any system activity events. This includes file_system_activity events.
  • Audit Activity: Category for any audit activity events. This includes account_change, authentication, authorization, and entity_management events.
  • Network Activity: Category for any network activity events. This includes network_activity, http_activity, email_activity, and api_activity events.
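
The grouping above can be written down as a simple lookup table. The dictionary contents come directly from the list above; the variable names and the reverse index are illustrative choices, not part of Query's schema or API.

```python
# Category -> Event Classes, exactly as listed above.
CATEGORY_CLASSES = {
    "Findings": ["security_finding"],
    "System Activity": ["file_system_activity"],
    "Audit Activity": [
        "account_change",
        "authentication",
        "authorization",
        "entity_management",
    ],
    "Network Activity": [
        "network_activity",
        "http_activity",
        "email_activity",
        "api_activity",
    ],
}

# Reverse index: Event Class -> Category.
CLASS_TO_CATEGORY = {
    event_class: category
    for category, classes in CATEGORY_CLASSES.items()
    for event_class in classes
}

print(CLASS_TO_CATEGORY["authentication"])
```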

Searching by Time

All events in OCSF have three time attributes: time, start_time, and end_time. Query always provides a value for event time. Most systems of record have only one timestamp for events; in these cases, start_time and end_time will be empty. When you change the value of the time picker in the search bar, you're changing a filter on the time field.

Objects in OCSF do not have an association to time. Most systems of record that provide data on objects also lack this association; they respond to queries with information about the current state of the environment. Because of this, time filters are ignored when searching for objects.
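
The behavior described in this section can be sketched as a filter that applies only to records carrying a time attribute. The apply_time_filter function, the use of epoch milliseconds, and the rule that a record without a time key is an object are all assumptions for illustration; they are not a description of Query's internals.

```python
def apply_time_filter(records, start_ms, end_ms):
    """Keep events whose time falls in the window; pass objects through."""
    kept = []
    for record in records:
        if "time" not in record:
            # Objects have no time association, so the filter is ignored.
            kept.append(record)
        elif start_ms <= record["time"] <= end_ms:
            kept.append(record)
    return kept


records = [
    {"class_name": "Authentication", "time": 1_000},
    {"class_name": "Authentication", "time": 9_000},
    {"object": "Device", "hostname": "laptop-042"},
]

# Only the first event is inside the window; the Device is kept regardless.
print(apply_time_filter(records, start_ms=0, end_ms=5_000))
```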