Security Data Pipelines
Write data the right way, powered by Query Federated Search.
Coming Soon!
Overview
Sometimes you just need to move data. Security teams use telemetry from dozens of sources. When it’s time to store, search, or analyze that data, things break down. SIEM costs explode. Custom pipelines get brittle. And most so-called “data platforms” just add complexity.
Query Security Data Pipelines are powered by the same engine as Query Federated Search, with the main benefit of providing in-situ normalization and standardization of disparate, federated data sources into the Open Cybersecurity Schema Framework (OCSF) data model. OCSF translates common security key-value pairs into consistent, easy-to-find attributes that can power machine learning feature extraction, fine-tuning of LLM foundation models (FMs), analytics, or simply searching and visualizing your security data.
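To make the normalization concrete, here is a minimal sketch in Python of a vendor sign-in record mapped into the OCSF Authentication (class_uid 3002) event class; the raw field names are illustrative, not any specific vendor's schema:

```python
# Minimal sketch: a hypothetical vendor sign-in record normalized into the
# OCSF Authentication (class_uid 3002) event class. Raw field names below
# are illustrative, not any specific vendor's schema.
raw_event = {
    "UserPrincipalName": "alice@example.com",
    "IPAddress": "203.0.113.10",
    "ResultType": "0",                 # vendor-specific "success" code
    "CreatedDateTime": 1714060800000,  # epoch milliseconds
}

ocsf_event = {
    "class_uid": 3002,                 # Authentication
    "category_uid": 3,                 # Identity & Access Management
    "activity_id": 1,                  # Logon
    "time": raw_event["CreatedDateTime"],
    "status_id": 1 if raw_event["ResultType"] == "0" else 2,  # 1=Success, 2=Failure
    "user": {"name": raw_event["UserPrincipalName"]},
    "src_endpoint": {"ip": raw_event["IPAddress"]},
}
```

However your sources name these fields, the OCSF output puts them in the same predictable places.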
Query Security Data Pipelines are architecturally simple. You choose a Source, which is any Static Schema Connector except Cyber Threat Intelligence & OSINT Connectors (MISP being the exception), and one or more Destinations, which include Amazon S3, Azure Blob & ADLSv2, Google Cloud Storage, Splunk HEC, and the Cribl Stream HTTP Bulk API. Each Pipeline pairs a single Source with one or more Destinations and defines the schedule; in future releases you will be able to choose specific OCSF events and specific Attributes, as well as perform pre-processing tasks by filtering on specific values.
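As an illustration of that architecture, a Pipeline boils down to one Source, one or more Destinations, and a schedule. The shape below is a hypothetical sketch, not Query's actual configuration format:

```python
# Hypothetical Pipeline definition, sketched as a plain dict. Query's actual
# configuration surface may differ; this only mirrors the concepts above.
pipeline = {
    "name": "crowdstrike-to-object-storage",
    "source": "CrowdStrike Falcon API",   # any supported Static Schema Connector
    "destinations": [                     # one or more per Pipeline
        {"type": "Amazon S3", "bucket": "example-security-data", "prefix": "ocsf/"},
        {"type": "Splunk HEC", "url": "https://splunk.example.com:8088"},
    ],
    "schedule": "0 * * * *",              # the Pipeline's schedule (e.g., hourly)
    # Planned for future releases per the paragraph above:
    # "ocsf_events": [3002],              # specific OCSF events
    # "attributes": ["user", "src_endpoint"],
    # "filters": [{"attribute": "status_id", "equals": 2}],
}
```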
Use Cases
While Query Federated Search provides connectors to support ad-hoc analytics, detections, search, and visualization of your data, there are times when moving data to more durable storage makes more sense. Query never stores or otherwise duplicates your data, and many of the APIs it federates are not ideal for ad-hoc search due to rate limiting, smaller retention windows, and slower response times. Query Security Data Pipelines support the following use cases:
- Longer-term retention of downstream data, especially for platforms with limited retention windows such as Entra ID, Google Workspace, and JumpCloud SSO.
- Moving data efficiently to object storage, SIEMs, and data lakes for more performant and complex search capabilities, for example (an Athena query sketch follows this list):
  - Search Amazon S3 Destinations with the AWS Glue + Amazon Athena Connector
  - Search Azure Blob Destinations with the Azure Data Explorer (ADX) Connector
  - Search Google Cloud Storage Destinations with the Google BigQuery Connector
  - Search Cribl Stream - HTTP Destinations by pipelining data to Cribl Lake and using the Cribl Search Connector
  - Search Splunk HEC Destinations with the Splunk Connector
- Taking advantage of OCSF-formatted data for MLOps and AIOps.
- Visualizing data in durable object storage with BI tools such as Qlik, QuickSight, Power BI, and Looker Studio.
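As an example of the "search your Destination" pattern above, the following Python sketch uses boto3 to run an Amazon Athena query over a Glue table built on an Amazon S3 Destination; the database, table, region, and results bucket names are all assumptions:

```python
import time

import boto3

# Assumptions: a Glue database "query_pipelines" and table "ocsf_authentication"
# catalog the Parquet written to an Amazon S3 Destination; both names, the
# region, and the results bucket are hypothetical.
athena = boto3.client("athena", region_name="us-east-1")

query = """
    SELECT "time", src_endpoint.ip
    FROM ocsf_authentication
    WHERE status_id = 2  -- failed logons
    LIMIT 100
"""

qid = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "query_pipelines"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)["QueryExecutionId"]

# Poll until Athena finishes, then print the result rows (row 0 is the header).
while True:
    state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]
    for row in rows[1:]:
        print([col.get("VarCharValue") for col in row["Data"]])
```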
Supported Sources
For the purpose of Query Security Data Pipelines, a Static Schema Connector is synonymous with a Source. The following categories of sources are supported today. In the future, we will begin to support Dynamic Schema Connectors such as Splunk, Cribl Search, Snowflake, Amazon Security Lake, ClickHouse Cloud, and others. Please reach out to [email protected] referencing any Pipeline Source you want us to support.
Cloud Infrastructure and Security Sources
Data Security
Developer Security
Endpoint
- Carbon Black Cloud
- CrowdStrike Falcon API
- Microsoft Defender for Endpoint
- SentinelOne Singularity Platform
Identity and HR
IT Service Management
Mobile Device Management
SIEM and Log Management
Threat Intelligence and Enrichment
Supported Destinations
Destinations are purpose-built, write-only connectors supported by Query Security Data Pipelines. Data is written to each Destination the right way, including but not limited to the following (a write-layout sketch in Python follows this list):
- Proper partitioning and/or clustering, in accordance with the Hive specification
- Data is written as Apache Parquet with ZStandard (ZSTD) or Snappy compression; Destinations that expect JSON instead receive GZIP-compressed JSON (not JSON-LD or JSON-ND).
- Data is written in batches sized to each Destination's maximum acceptable throughput.
- Data is written in OCSF.
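To illustrate what that on-disk layout looks like, here is a minimal PyArrow sketch that writes OCSF events as ZSTD-compressed Parquet under Hive-style partitions; the trimmed-down schema and the "event_day" partition key are assumptions, not a Query-defined layout:

```python
import pyarrow as pa
import pyarrow.dataset as ds

# Illustrative OCSF rows; the schema is trimmed down and "event_day" is an
# assumed partition key, not a Query-defined layout.
events = pa.table({
    "class_uid": [3002, 3002],
    "time":      [1714060800000, 1714064400000],
    "status_id": [1, 2],
    "event_day": ["2024-04-25", "2024-04-25"],
})

parquet = ds.ParquetFileFormat()
ds.write_dataset(
    events,
    base_dir="./ocsf-destination",   # an S3/GCS/Azure URI in practice
    format=parquet,
    # Hive-style partition directories, e.g. .../event_day=2024-04-25/part-0.parquet
    partitioning=ds.partitioning(pa.schema([("event_day", pa.string())]), flavor="hive"),
    file_options=parquet.make_write_options(compression="zstd"),
)
```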
If you need another Destination or output parameter supported, please reach out to [email protected] referencing any Pipeline Destination and parameter variation you want us to support. On our immediate roadmap: Amazon Security Lake, Databricks, and Snowflake.
- Amazon S3
- Azure Blob Storage (and ADLSv2)
- Cribl Stream - HTTP Sources
- Google Cloud Storage
- Splunk HTTP Event Collector (HEC)
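For a sense of what a Splunk HEC Destination receives, here is a minimal sketch that posts a GZIP-compressed OCSF event to the standard HEC endpoint; the host, token, and sourcetype are placeholders for your own environment:

```python
import gzip
import json

import requests

# Placeholders: substitute your own HEC host, token, and sourcetype.
HEC_URL = "https://splunk.example.com:8088/services/collector/event"
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"

payload = {
    "sourcetype": "ocsf:authentication",  # assumed sourcetype naming
    "event": {"class_uid": 3002, "activity_id": 1, "status_id": 1},
}

resp = requests.post(
    HEC_URL,
    headers={
        "Authorization": f"Splunk {HEC_TOKEN}",
        "Content-Encoding": "gzip",  # HEC accepts gzip-compressed request bodies
    },
    data=gzip.compress(json.dumps(payload).encode("utf-8")),
    timeout=30,
)
resp.raise_for_status()
```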