Snowflake
Integrate Query with the Snowflake data intelligence platform.
TL;DR
To integrate Snowflake with Query:
- Create a new User, or utilize an existing one, with a Password.
- Add a platform connection in Query using your Username, Password, Warehouse Name, Snowflake Account ID along with the Database, Schema, and Table/View.
- Use the Configure Schema workflow to introspect and map your data into the Query Data Model.
- Use Query Federated Search to surface any number of indicators, assets, findings, alerts, and anything else stored in your Snowflake tables by entities, events, and objects.
Overview
The Snowflake Data Cloud is a multi-cloud enterprise data warehouse and intelligence platform, billed as the AI Data Cloud. Snowflake supports big data, streaming, business intelligence (BI), machine learning (ML), and artificial intelligence (AI) workloads. In their own words: "Snowflake’s Data Cloud is powered by an advanced data platform provided as a self-managed service. Snowflake enables data storage, processing, and analytic solutions that are faster, easier to use, and far more flexible than traditional offerings.
The Snowflake data platform is not built on any existing database technology or “big data” software platforms such as Hadoop. Instead, Snowflake combines a completely new SQL query engine with an innovative architecture natively designed for the cloud. To the user, Snowflake provides all of the functionality of an enterprise analytic database, along with many additional special features and unique capabilities."
Security and IT teams use Snowflake as a direct Security Information & Event Management (SIEM) replacement or as an alternative data store to expand SIEM use cases such as enrichment, big data analytics, machine learning (ML), artificial intelligence (AI), and detections. Like other warehouse and lake houses, Snowflake has a centralized metadata catalog which organizes the various "objects" such as Databases, Schemas, Stages, Tables, Views, and more. Security and IT teams typically ingest source-and-log-specific data into tables and create various views from them which is where Query comes into play.
Query provides an easy-to-use interface to both search and model data stored within Snowflake. Query takes care of query planning, query translation, query execution, and windowing of results on top of the Python Snowflake SDK. You never have to write any SQL and Query ensures that all SQL that is submitted to the warehouse is as efficient as possible, meaning we will not blindly send an unlimited SELECT * FROM table
across to your tables or views.
Prerequisites
The Query integration with Snowflake relies on the usage of a "service account" user with a password. To be able to access every possible table and pre-built views (e.g., SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
or SNOWFLAKE.ACCOUNT_USAGE.USERS
) you will need to assign the ACCOUNTADMIN
role. Query is a read-only tool, you cannot use it to submit arbitrary SQL - willingly or maliciously - no CREATE
, ALTER
or other CUD events will ever happen.
Known Limitations
Query's Connector with Snowflake has the following known limitations.
- Query does not determine if a Warehouse is in a running state, it may lead to query errors or connection test failures.
- Only scalar data types and
VARIANT
semi-structured data types are supported.OBJECT
,ARRAY
,VECTOR
, and geospatial data is not currently supported.- RSA Private Key & Passphrase nor OAuth2 authentication options are currently available.
Create a "service account" user
You can skip this section if you already have a service account User setup in Snowflake. In the future, support for RSA Private Keys and Passphrase-based authentication will be added into Query.
-
As an
ORGADMIN
,ACCOUNTADMIN
or other role will permissions to create users navigate to the Admin section --> Users & Roles --> select + User from the top-right of the window, as shown below. (Fig 1.) -
At a minimum provide a User Name, Password, and an appropriate Default Role in the Advanced User Options Section and select Create User as shown below (Fig. 2). Snowflake security best practices recommends both providing an email to Admins as well as NOT assigning
ACCOUNTADMIN
,SECURITYADMIN
, orORGADMIN
Roles as the Default Role.
In the next sub-section you'll learn where to retrieve the next required parameters for configuring a Snowflake Connector for Query.
Retrieve additional required parameters
In addition to a Username and Password (and in the future, a Username, RSA Private Key, and Passphrase) you require additional parameters to configure the Snowflake Connector: Warehouse Name and Account ID are required to setup the Connector and Cursor, along with your desired Table/View target.
The Account ID is a concatenation of Organization Name and Account Name with a dash such as my_org-my_account
. There are three ways to retrieve this.
-
After logging into Snowflake, you can retrieve the values from the URL in your browser, Organization Name is the first value after
app.snowflake.com
, and Account Name is the second, e.g.,:https://app.snowflake.com/ibcdef/mr00001/
-
On the bottom left-hand corner of the page, select your User Profile and hover your cursor over Account. The top value is your Organization Name and your Account ID is highlighted text beneath it, as shown below (Fig. 3).
-
From a Snowflake Worksheet, submitting the query
SHOW ACCOUNTS
will provide a grid with both theorganization_name
andaccount_name
in a grid, as shown below (Fig. 4).
A virtual warehouse, often referred to simply as a “warehouse”, is a cluster of compute resources in Snowflake. Depending on the volumes and complexity of data you will be querying (e.g., if you have PB scaled materialized views versus simple tables of security data), you may want to pick a larger "t-shirt" size of Warehouse. For more information on warehouses, see the Considerations section of the Snowflake Documentation.
Warehouse selection consideration
- Query currently only pulls back 5000 records at a time per query per unique Connector.
- Query only uses a few operators within our query building such as casting, coercion, and selecting elements in JSON.
- Consider using an "always on" warehouse or one that can be place into a running state quickly to avoid transient errors due to failed tests or preview data retrieval on the Query-side.
To retrieve a Warehouse name you can do so in one of two ways.
-
Navigate to the Admin section --> Warehouses and copy the NAME of your desired target Warehouse from the list, as shown below (Fig 5).
-
From a Snowflake Worksheet, submitting the query
SHOW WAREHOUSES
will provide a grid of all Warehouses with thename
column containing the Warehouse Name parameter, as shown below (Fig. 6).
After gathering these details, the only other information you require are the list of Tables and Views and their associated upstream Schema and Database name.
Configure a Snowflake Connector
The Snowflake Connector is a dynamic schema platform configuration. Static schemas are platforms in which the Query team pre-configures the type of data normalization that happens and a dynamic schema platform gives the user control for mapping and normalizing data into the Query Data Model. For dynamic schemas, Query provides a no-code data mapping workflow to allow you to map your source data into the Query Data Model. For more information, see the Configure Schema and the Normalization and the Query Data Model sections, respectively.
To connect to your Snowflake tables, Query requires your "service account" username and password, Account ID (dash-separated concatenation of your Organization and Account Names), your Warehouse Name, as well as the specific Database, Schema and Table (or similar object, e.g., View or Materialized View).
Query stores your Snowflake Password in AWS Secrets Manager encrypted with an AWS Key Management Service (KMS) Customer Managed Key dedicated to your Query tenant. Every action within Query's backend requires with a policy-as-code agent - Open Policy Agent - and the only principal able to access the Secret and its KMS CMK is a dedicated AWS IAM Role that powers the infrastructure used for the Snowflake Connector dedicated to your tenant.
Configure Schema Limitations
Due to limitations in the introspection and mapping process, you may only map one Snowflake table per Connector. There are not any limits to the amount of Connectors you can create.
-
Navigate to the Connections page, select Add Connections, and select Snowflake from the Cloud Infrastructure and Data Lakes category, optionally type "Snowflake" in the search bar as shown below (Fig. 7).
-
In the Connection Info section of the platform connector, provide the following details, select Save, and then Test Connection as shown below (Fig. 8).
-
Connection Alias Name: A contextual name for the Connector, this is used to disambiguate multiple of the same connector.
-
Platform Login Method: Leave the default value
Default Login
. -
Username: The Username of your "service account" user.
-
Password: The Password for your "service account" user. YOU CANNOT EDIT OR RETRIEVE THIS VALUE AFTER CREATION -- IF YOU SET PASSWORD EXPIRATIONS YOU MUST RECREATE THE CONNECTOR.
-
Account ID: Your Account ID is a concatenation of Organization Name and Account Name with a dash such as
iabcde-mr00001
. -
Database Name: The name of the database that contains the specific schema and table you want to query.
-
Table Name: The table name that contains the data you want to query. Can also be a View, Federated Table, Federated View, or Materialized View.
-
Warehouse ID: The Warehouse name.
-
Schema Name: The schema name in the provided database that contains the table you want to query.
-
-
Execute the Configure Schema workflow to map your target table data into the QDM.
After completing the Configure Schema workflow, you are ready to perform Federated Searches against your Snowflake tables (or various flavors of views).
Querying the Snowflake Connector
Querying your data completely depends on how your configured your schema mapping, as not all data sources are created the same, you can theoretically query across any number and combination of the currently supported Entities within the Query Federated Search platform.
- Hostnames (and Domains)
- IP Addresses (IPv4 and IPv6)
- MAC Addresses
- User Names
- Email Addresses
- URL Strings (and URIs)
- File Names
- Hashes (e.g., MD5, SHA1, SHA256, SSDEEP, VHASH, etc.)
- Process Names
- Resource IDs
- Ports
- Subnets (e.g.,
192.168.1.0/24
or2001:0db8:85a3:0000::/64
) - Command Lines (e.g.,
python3 encrypted.py
orssh user@ubuntu-ip-10-0-0-1
) - Country Code (e.g.,
US
orCN
) - Process ID
- User Agent
- Common Weaknesses & Enumerations (CWE) IDs (e.g.,
CVE-2024-100251
) - Common Vulnerabilities & Enumerations (CVE) IDs (e.g.,
CWE-79
) - User Credential UID (e.g., AWS User Access Key ID -
AKIA0007EXAMPLE0002
) - User ID
- Group Name
- Group ID
- Account Name
- Account ID
- Script Content
Resources
Refer to the previous sections' hyperlinks for more information on specific resources, services, tools and concepts. For further help with creating tables and performance tuning, consider some of the resources below.
- Snowflake User management: https://docs.snowflake.com/en/user-guide/admin-user-management
- Introduction to organizations https://docs.snowflake.com/en/user-guide/organizations
- Managing accounts in your organization https://docs.snowflake.com/en/user-guide/organizations-manage-accounts
- Snowflake security https://docs.snowflake.com/en/guides-overview-secure
- Snowflake warehouses https://docs.snowflake.com/en/user-guide/warehouses-overview
- Databases, Tables and Views - Overview https://docs.snowflake.com/en/guides-overview-db
Troubleshooting
- If you recently changed your permissions / Role in Query, log out and in again and clear your cache if you cannot Save or Test Connection.
- Verify your Snowflake user has proper Role access to the Warehouse/Database/Schema/Table/View.
- Verify your Password is not expired or rolled over.
- Verify you are using the correct connection parameters.
- Verify the name and location of your Database, Schema, and/or Tables.
- Verify Network Rules are not blocking access from external IP ranges.
- Verify encryption or column security settings are not blocking your users.
- Ensure you Account ID is created from the correct Organization and Account name and is separated by a
-
and not any other whitespace.
If you have exhausted the above Troubleshooting list, please contact your designated Query Sales Engineer or Customer Success Manager. If you are using a free tenant please contact Query Customer Success via the Support email in the Help section or via Intercom within your tenant.
Updated 5 months ago