Security Data Pipelines

Write data the right way, powered by Query Federated Search.

Coming Soon!

Overview

Sometimes you just need to move data. Security teams use telemetry from dozens of sources. When it’s time to store, search, or analyze that data, things break down. SIEM costs explode. Custom pipelines get brittle. And most so-called “data platforms” just add complexity.

Query Security Data Pipelines are powered by the same engine as Query Federated Search, with the main benefit of providing in-situ normalization and standardization of disparate, federated data sources into the Open Cybersecurity Schema Framework (OCSF) data model. OCSF translates common security key-value pairs into well-known, easy-to-find attributes that power machine learning feature extraction, fine-tuning of LLM foundation models, analytics, or simply searching and visualizing your security data.
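
As an illustration of what that normalization yields, here is a minimal sketch of a single sign-in event mapped into the OCSF Authentication class (class_uid 3002). The Okta source and all field values are hypothetical, and only a handful of OCSF attributes are shown.

```python
# Hypothetical example: a single sign-in event after normalization into
# the OCSF Authentication class (class_uid 3002). Values are illustrative;
# real Pipelines emit the full OCSF attribute set for each event.
ocsf_event = {
    "class_uid": 3002,       # Authentication
    "category_uid": 3,       # Identity & Access Management
    "activity_id": 1,        # Logon
    "severity_id": 1,        # Informational
    "status_id": 1,          # Success
    "time": 1718744096000,   # epoch milliseconds
    "metadata": {"version": "1.1.0", "product": {"vendor_name": "Okta"}},
    "actor": {"user": {"name": "jdoe", "email_addr": "jdoe@example.com"}},
    "src_endpoint": {"ip": "203.0.113.7"},
}
```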

Query Security Data Pipelines are architecturally simple. You choose a Source, which is any Static Schema Connector except those in the Cyber Threat Intelligence & OSINT category (MISP being the exception), along with one or more Destinations, which include Amazon S3, Azure Blob & ADLSv2, Google Cloud Storage, Splunk HEC, and the Cribl Stream HTTP Bulk API. Each Pipeline pairs a single Source with one or more Destinations and defines the schedule; in future releases you'll be able to choose specific OCSF events and specific Attributes, as well as perform pre-processing tasks by filtering on specific values.
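
To make that shape concrete, here is a hypothetical sketch of a Pipeline definition expressed as a Python dictionary. The field names and values are illustrative only and do not reflect Query's actual configuration surface.

```python
# Hypothetical sketch of a Pipeline definition: one Source, one or more
# Destinations, and a schedule. Field names are illustrative only and do
# not reflect Query's actual configuration format.
pipeline = {
    "name": "okta-signins-to-s3-and-splunk",
    "source": {"connector": "Okta System Log"},  # any supported Static Schema Connector
    "destinations": [
        {"type": "amazon_s3", "bucket": "example-security-lake"},
        {"type": "splunk_hec", "endpoint": "https://splunk.example.com:8088"},
    ],
    "schedule": "hourly",
    # Planned for future releases: selecting specific OCSF events and
    # Attributes, plus pre-processing filters on specific values.
}
```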

Use Cases

While Query Federated Search provides connectors to support ad-hoc analytics, detections, search, and visualization of your data, there are times when moving it to more durable storage makes more sense. Query never stores or otherwise duplicates your data, and many of the APIs it supports are not ideal for ad-hoc search due to rate limiting, smaller retention windows, and slower response times. The following use cases are supported by Query Security Data Pipelines:

  • Longer-term retention of downstream data, especially for platforms with limited retention such as Entra ID, Google Workspace, and JumpCloud SSO.
  • Moving data efficiently to object storage, SIEMs, and data lakes for faster and more complex search capabilities.
  • Taking advantage of OCSF-formatted data for MLOps and AIOps (see the sketch after this list).
  • Visualizing data in durable object storage with BI tools such as Qlik, QuickSight, PowerBI, and Looker Studio.
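
As referenced above, here is a minimal sketch of the MLOps use case, assuming a Pipeline has already landed OCSF Authentication events as Hive-partitioned Parquet in S3. The bucket and prefix names are hypothetical; the snippet assumes pandas with pyarrow and s3fs installed.

```python
# Minimal sketch, assuming a Pipeline has already landed OCSF Authentication
# events as Hive-partitioned Parquet in S3. The bucket and prefix are
# hypothetical; requires pandas with pyarrow and s3fs installed.
import pandas as pd

df = pd.read_parquet(
    "s3://example-security-lake/ocsf/class_uid=3002/",  # Authentication events
    columns=["time", "status_id", "src_endpoint"],
)

# Example MLOps feature: failed-logon counts per source IP.
failed = df[df["status_id"] == 2]  # 2 = Failure in OCSF
features = (
    failed["src_endpoint"]
    .apply(lambda ep: ep["ip"])    # struct columns arrive as dicts
    .value_counts()
    .rename("failed_logons")
)
print(features.head())
```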

Supported Sources

For the purpose of Query Security Data Pipelines, a Static Schema Connector is synonymous with a Source. The following categories of sources are supported today. In the future, we will begin to support Dynamic Schema Connectors such as Splunk, Cribl Search, Snowflake, Amazon Security Lake, ClickHouse Cloud, and others. Please reach out to [email protected] referencing any Pipeline Source you want us to support.

  • Cloud Infrastructure and Security Sources
  • Data Security
  • Developer Security
  • Endpoint
  • Identity and HR
  • IT Service Management
  • Mobile Device Management
  • SIEM and Log Management
  • Threat Intelligence and Enrichment

Supported Destinations

Destinations are purpose-built, write-only connectors supported by Query Security Data Pipelines. Data is written into each Destination the right way, including but not limited to:

  • Proper partitioning and/or clustering, in accordance with the Hive specification (see the sketch after this list).
  • Data is written as ZStandard (ZSTD)- or Snappy-compressed Apache Parquet; for Destinations that expect JSON, it is written as GZIP-compressed JSON (not JSON-LD or JSON-ND garbage).
  • Data is written in batches that match each Destination's maximum acceptable throughput.
  • Data is written in OCSF.
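
As referenced in the first bullet, here is a minimal sketch of that write pattern using pyarrow: OCSF records land as ZSTD-compressed Parquet under Hive-style partition directories. The bucket path and partition keys are illustrative, not the exact layout Pipelines produce.

```python
# Minimal sketch of the write pattern above using pyarrow: OCSF records
# land as ZSTD-compressed Parquet under Hive-style partition directories.
# The bucket path and partition keys are illustrative.
import pyarrow as pa
import pyarrow.dataset as ds

table = pa.Table.from_pylist([
    {"class_uid": 3002, "event_day": "2024-06-18", "time": 1718744096000},
    {"class_uid": 3002, "event_day": "2024-06-18", "time": 1718747696000},
])

ds.write_dataset(
    table,
    base_dir="s3://example-security-lake/ocsf",
    format="parquet",
    partitioning=ds.partitioning(
        pa.schema([("class_uid", pa.int64()), ("event_day", pa.string())]),
        flavor="hive",  # yields .../class_uid=3002/event_day=2024-06-18/
    ),
    file_options=ds.ParquetFileFormat().make_write_options(compression="zstd"),
)
```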

If you need another Destination or output parameter supported, please reach out to [email protected] referencing any Pipeline Destination and parameter variation you want us to support. On our immediate roadmap: Amazon Security Lake, Databricks, and Snowflake.

  • Amazon S3
  • Azure Blob Storage (and ADLSv2)
  • Cribl Stream - HTTP Sources
  • Google Cloud Storage
  • Splunk HTTP Event Collector (HEC)
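
For a sense of what a streaming Destination receives, here is a minimal sketch of delivering a single OCSF event to the Splunk HTTP Event Collector. The endpoint URL and token are placeholders; the snippet uses the requests library.

```python
# Minimal sketch of what a Splunk HEC Destination amounts to on the wire:
# an HTTPS POST to /services/collector/event with a HEC token. The URL and
# token are placeholders; uses the requests library.
import requests

HEC_URL = "https://splunk.example.com:8088/services/collector/event"
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"  # placeholder token

payload = {
    "time": 1718744096,                             # epoch seconds
    "sourcetype": "_json",
    "event": {"class_uid": 3002, "status_id": 1},   # an OCSF event body
}

resp = requests.post(
    HEC_URL,
    headers={"Authorization": f"Splunk {HEC_TOKEN}"},
    json=payload,
    timeout=30,
)
resp.raise_for_status()  # HEC returns {"text": "Success", "code": 0} on 200
```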