Amazon Athena (for Amazon S3)
Integrate Query with any data in Amazon S3 via Amazon Athena
TL;DR
To integrate Query with Amazon Athena to query data stored in Amazon S3:
- Configure Amazon Athena in your AWS Account by creating and specifying a Results bucket, or by defining one within an Athena Workgroup.
- Use AWS Glue Crawlers or Amazon Athena Data Definition Language (DDL) statements to create Tables (or a View!) of your target data.
- Deploy an AWS IAM Role with an External ID and permissions to use the Amazon S3, Amazon Athena, AWS Glue, and (optionally) AWS Key Management Service APIs.
- Add a connection source per Table you want to integrate into Query using your AWS IAM Role.
- Use the Configure Schema workflow to introspect and map your data into the Query Data Model.
- Use Query Search to surface nearly any relevant security data point in your data, including Resource GUIDs, Usernames, Email Addresses, Domains, Hostnames, Hashes (e.g., MD5, SHA1, SHA2), User Agents, Process Names, File Names, and more.
Overview
Amazon Athena is a serverless analytics service on the Amazon Web Services (AWS) Cloud, built on Trino and Presto, that lets you interactively analyze and query data stored in Amazon Simple Storage Service (S3) buckets. Athena works with tables registered in the AWS Glue Data Catalog, open table formats such as Apache Iceberg and Delta Lake, and file formats such as plaintext (.txt), Apache Parquet, Apache Avro, Apache ORC, JSON, XML, CSV, and more.
Query integrates with Amazon Athena to query data within Amazon S3 buckets. This data can range from general information technology (IT) data to specialized security data, such as high-cost and high-volume authorization logs, packet captures, DNS or DHCP traffic, or telemetry from Endpoint Detection & Response (EDR) or Host-based Intrusion Detection/Prevention Systems (HIDPS).
Within the Query Federated Search platform, this data source is considered a Dynamic Schema platform because your data can be in any format you choose. Refer to the Configure Schema section for help onboarding your data quickly.
The Query Federated Search Connector for Amazon Athena (for Amazon S3) can use any Table or View registered in the AWS Glue Data Catalog, whether you created it with a CREATE TABLE statement in Athena SQL DDL, with AWS Glue Crawlers, or otherwise. Query supports the table formats that Athena can read, such as standard Glue tables, Apache Iceberg, and Delta Lake, though unintended errors may occur if your data relies on nuanced data types or operators.
However, this Connector only works with Amazon S3. It does not work with Athena Federated Query or other external data sources, and it does not currently support external Hive metastores, ODBC, or JDBC drivers. In the future, these variations will most likely be supported by new, dedicated Connectors.
Using the Query Federated Search platform, you can quickly surface any supported Entity that matches in your data, such as IP Addresses, Process Names, Domains, Hostnames, MAC Addresses, and more. The platform crafts Athena SQL queries on your behalf to pull out exactly the data you require. All results are normalized, deduplicated, correlated, and enriched so that an entity-based search surfaces similar data points across all of your onboarded sources. For example, you could onboard logs from Next Generation Firewalls (NGFW), Cloud Access Security Brokers (CASB), Google VPC Flow Logs, and any VPN solution, and a search for an IP address will collate all matches across those log sources.
Prerequisites
Before using the Query Federated Search Connector for Amazon Athena (for Amazon S3), you must create an Amazon S3 bucket and configure Amazon Athena to write query results to it. The Connector can reference this bucket either by its Bucket Name or through the Amazon Athena Workgroup Name.
Lastly, if you do not have the AWS CLI installed and configured, refer to the AWS CLI documentation. Ensure you have permissions to create and modify IAM Roles and Policies, including:
- iam:CreateRole
- iam:CreatePolicy
- iam:AttachRolePolicy
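If you are unsure whether your identity has these permissions, one option is to ask the IAM policy simulator. The sketch below is a hypothetical helper (report_denied_actions is not part of the AWS CLI or Query) that reads the tab-separated text output of aws iam simulate-principal-policy and lists any action whose decision is not "allowed". It assumes you pass your own IAM user or role ARN as --policy-source-arn; treat a clean result as a good sign rather than a guarantee, since the simulator may not reflect every control in your environment.

```shell
# Hypothetical helper (POSIX sh). Feed it the output of, for example:
#   aws iam simulate-principal-policy \
#     --policy-source-arn "arn:aws:iam::<account-id>:role/<your-role>" \
#     --action-names iam:CreateRole iam:CreatePolicy iam:AttachRolePolicy \
#     --query 'EvaluationResults[].[EvalActionName,EvalDecision]' --output text
# Each input line is "<action><TAB><decision>".
# Prints denied actions to stderr and returns 1 if any decision is not "allowed".
report_denied_actions() {
  local tab action decision denied
  tab=$(printf '\t')
  denied=0
  # The "|| [ -n ... ]" clause also handles a final line without a newline.
  while IFS="$tab" read -r action decision || [ -n "$action" ]; do
    [ -z "$action" ] && continue
    if [ "$decision" != "allowed" ]; then
      echo "DENIED: $action ($decision)" >&2
      denied=1
    fi
  done
  return "$denied"
}
```

Piping the simulator output into report_denied_actions and checking its exit status gives a quick yes/no before you start the steps below.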
The Query Federated Search Platform can have multiple Connectors, one per Amazon Athena (for Amazon S3) table or view. Query stores the credentials needed to query Amazon Athena separately for each Connector, which lets you onboard data lakes in multiple AWS Accounts and Regions regardless of which AWS Organization they belong to.
Configure an Amazon Athena (for Amazon S3) Connector
Amazon Athena (for Amazon S3) Connectors within the Query Federated Search Platform are dynamic schema platform configurations. On a static schema platform, the Query team pre-configures how data is normalized; on a dynamic schema platform, you control how data is mapped and normalized into the Query Data Model. For dynamic schemas, Query provides a no-code data mapping workflow to map your source data into the Query Data Model. For more information, see the Configure Schema section and the Normalization and the Query Data Model section.
To enable connectivity into your AWS Account, the Query Federated Search platform uses read-only AWS IAM Roles, with a randomly generated UUID as the ExternalId for each Connector. This ExternalId is generated only after you enter and save all relevant details, so you need to decide on the name of your AWS IAM Role ahead of time. As a best practice, create a single IAM Policy and a single IAM Role for each Connector, each with its own distinct Lake Formation permissions.
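Because the Role Name must be chosen before the ExternalId exists, it may help to check the planned name against IAM's naming rules first. This is a minimal sketch; is_valid_iam_role_name is a hypothetical helper, not part of Query or the AWS CLI. It enforces IAM's documented constraints for role names: 1 to 64 characters drawn from letters, digits, and + = , . @ _ -.

```shell
# Hypothetical helper (POSIX sh): check a planned IAM Role Name.
is_valid_iam_role_name() {
  case "$1" in
    # Reject an empty name or any character outside IAM's allowed set.
    ''|*[!A-Za-z0-9+=,.@_-]*) return 1 ;;
  esac
  # IAM role names are at most 64 characters long.
  [ "${#1}" -le 64 ]
}
```

For example, is_valid_iam_role_name "QueryFederatedSearchForAthena" succeeds, while a name containing a space fails.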
In order, you will:
- Provide the required details to a selected Amazon Athena (for Amazon S3) Connector.
- Create an IAM Policy with the generated Identity Policy snippet.
- Create an IAM Role with the generated Trust Policy snippet and attach the IAM Policy.
- Test Connection, configure your mapping via the Configure Schema workflow, and enable the Connector.
Proceed to the next sections to learn how to activate an Amazon Athena (for Amazon S3) Connector and start searching.
Pre-configure an Amazon Athena Connector
Use the following steps to create a new Query Federated Search Connector for Amazon Athena. This example onboards AWS CloudTrail Management Events; however, the same steps apply to any static schema or "first-party" Athena Connector within the Query Federated Search Platform. Be sure to change the required details, such as the Table Name, and use the proper IAM Role.
- Navigate to the Connections page, select Add Connections, and select Amazon Athena from the Cloud Infrastructure and Data Lakes category. Optionally, type "Athena" in the search bar as shown below (FIG. 1).
- In the Connection Info section, add the details listed below the screenshot (FIG. 2).
  - Name: The human-readable name for this Connector. You can name it whatever you want, such as after the source data in the Table you query.
  - Platform Instance: Leave the default (e.g., Amazon Amazon Athena).
  - External ID: This value is generated after you Save the Connector.
  - AWS Account ID: The AWS Account ID where your Athena Workgroup and your Target and Results Amazon S3 buckets are located.
  - Role Name: The name of the AWS IAM Role that delegates access to the Query Federated Search AWS Account. You will create this Role AFTER pre-configuration, so ensure you use the same Role Name when you do.
  - Amazon Athena Target Bucket Name: The name of the Amazon S3 bucket that contains the data you want to query with Amazon Athena. Your Glue Database and Glue Table(s) should refer to this same bucket.
  - Catalog Name: The name of the AWS Glue Data Catalog in your AWS Account. In most cases this is AwsDataCatalog, but it can also be default.
  - Database Name: The name of the AWS Glue Database that contains the AWS Glue Table or Amazon Athena View you query.
  - Table/View Name: The name of the AWS Glue Table or Athena View that contains the data you want to query.
  - Results S3 Bucket Name: The name of the Amazon S3 bucket that Amazon Athena writes query results to.
  - Amazon Athena Workgroup Name (OPTIONAL): Instead of supplying values for Results S3 Bucket Name and KMS Key, you can supply a custom Amazon Athena Workgroup that defines these values.
  - Results Encryption Option: A dropdown for the type of encryption used for query results: Amazon S3-managed keys (SSE-S3), your own AWS KMS CMK (SSE-KMS), or client-side encryption with an AWS KMS key (CSE-KMS). You must select the SAME type of encryption that Amazon Athena uses for your query results. If you do not specify any encryption in Amazon Athena, leave this as Amazon S3-managed keys (SSE-S3).
  - KMS Key (OPTIONAL): The Amazon Resource Name (ARN) or the key ID (GUID) of the AWS KMS CMK used for SSE-KMS or CSE-KMS, if any. If you use SSE-S3 encryption, leave this option blank.
  - AWS Region: The AWS Region where your Athena Workgroup and your Target and Results Amazon S3 buckets are located.
- Select Save from the bottom-right of the connection pane as shown below (FIG. 3). This saves all supplied information and allows the Query backend to generate the IAM Policy JSON snippets and an External ID unique to this Connector.
- Scroll to the bottom of the Connector card and copy both AWS IAM policy snippets as shown below (FIG. 4): the Trust Policy (for your IAM Role) and the Identity Policy (for your IAM Policy). Then proceed to the next section.
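Mistyped Account IDs, Regions, and key references are easy to miss in a form. As an optional pre-flight check, the sketch below uses hypothetical helper names (POSIX sh) and tests only the shape of these values: a 12-digit AWS Account ID, a Region code such as us-east-1 rather than a display name, and a KMS key ID or key ARN. It cannot confirm that the account, Region, or key exists or is the one your Athena Workgroup actually uses.

```shell
# Hypothetical format checks (POSIX sh); they validate shape only.
nl='
'
# matches_ere VALUE ERE: succeed if the single-line VALUE matches the anchored ERE.
matches_ere() {
  case "$1" in *"$nl"*) return 1 ;; esac   # reject multi-line input
  printf '%s\n' "$1" | grep -Eq -- "$2"
}

# AWS Account IDs are exactly 12 digits.
is_aws_account_id() {
  matches_ere "$1" '^[0-9]{12}$'
}

# Region codes look like us-east-1, ap-southeast-2, or us-gov-west-1.
is_aws_region_name() {
  matches_ere "$1" '^[a-z]{2}(-[a-z]+)+-[0-9]+$'
}

# A KMS key ID (UUID, or mrk-... for multi-Region keys) or a full key ARN.
is_kms_key_ref() {
  local id
  id='([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}|mrk-[0-9a-f]{32})'
  matches_ere "$1" "^($id|arn:aws[a-z-]*:kms:[a-z0-9-]+:[0-9]{12}:key/$id)\$"
}
```

These checks deliberately stay loose (for example, the Region pattern accepts any well-formed code) so they do not reject new Regions; they only catch obvious typos.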
Create AWS IAM Policy for Amazon Athena (for Amazon S3) Connector
In this section you will use the supplied Identity Policy from the previous section to create an AWS IAM Policy. Although the policy scopes Glue permissions to a specific Database and Table combination, it grants permissions on your entire Athena Target and Amazon Athena Results S3 buckets. If you want to limit the Role to a specific prefix (folder/directory) path in either bucket, modify the JSON snippet as necessary.
If you encrypt your data in S3 using AWS KMS Customer Managed Keys (CMKs) and/or encrypt your Amazon Athena query results, you must also add the following permissions to this IAM Policy.
- kms:Decrypt
- kms:GenerateDataKey
- kms:Encrypt
The following steps use an Amazon Linux 2023 AWS CloudShell environment and the AWS CLI. Adapt these instructions to your specific operating system or deployment strategy (e.g., AWS CloudFormation, Terraform, AWS Console, etc.).
- Create an environment variable that contains your Identity Policy as shown below. Ensure you replace { "YOUR_POLICY_HERE" } with your actual policy! Quoting the heredoc delimiter ('EOF') stops the shell from expanding any $ characters in the pasted JSON.
POLICY_JSON=$(cat <<'EOF'
{ "YOUR_POLICY_HERE" }
EOF
)
- Create the AWS IAM Policy with the following command. Change the --policy-name value to whatever name you want to give the policy.
aws iam create-policy \
--policy-name "QueryFederatedSearchAthenaPolicy" \
--description "Grants the Query Federated Search Platform Glue, Athena, and S3 permissions." \
--policy-document "$POLICY_JSON"
- (OPTIONAL STEP) If you use AWS KMS CMKs to encrypt the Amazon Athena query results and/or the data within your Athena Target S3 Bucket, add the following statement to the AWS IAM Policy you created, for example using the AWS Console. Replace $AWS_REGION, $AWS_ACCOUNT_ID, and $YOUR_KEY_ID_HERE with your AWS Region, AWS Account ID, and KMS Key ID, respectively.
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "kmsPermissions",
"Effect": "Allow",
"Action": [
"kms:Decrypt",
"kms:Encrypt",
"kms:GenerateDataKey"
],
"Resource": "arn:aws:kms:$AWS_REGION:$AWS_ACCOUNT_ID:key/$YOUR_KEY_ID_HERE"
}
]
}
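Substituting the placeholders by hand is error-prone. As a minimal sketch, assuming a single KMS key covers both buckets, the hypothetical kms_identity_policy function below prints the policy above with your Region, Account ID, and Key ID filled in, using an unquoted heredoc so the shell expands the variables.

```shell
# Hypothetical helper (POSIX sh).
# Usage: kms_identity_policy <region> <account-id> <key-id>
kms_identity_policy() {
  local region account_id key_id
  region="$1"; account_id="$2"; key_id="$3"
  # Unquoted EOF: ${region}, ${account_id}, and ${key_id} are expanded.
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "kmsPermissions",
      "Effect": "Allow",
      "Action": ["kms:Decrypt", "kms:Encrypt", "kms:GenerateDataKey"],
      "Resource": "arn:aws:kms:${region}:${account_id}:key/${key_id}"
    }
  ]
}
EOF
}
```

You can paste the printed statement into your policy in the AWS Console. Alternatively, because the output is a complete policy document, you could create it as a second managed policy with aws iam create-policy and attach it to the same Role. Avoid passing it to aws iam create-policy-version for your existing policy: a new policy version replaces the whole document, which would drop the Glue, Athena, and S3 statements.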
In the next section you will create an AWS IAM Role, attach this Policy to it, and optionally modify your AWS KMS Key Policy if you use a KMS CMK. Ensure you copy the IAM Policy ARN, as you will need it in the next section!
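If you did not note the ARN, it can be reconstructed: a customer managed policy created without a --path has an ARN of the form arn:aws:iam::<account-id>:policy/<policy-name>. The hypothetical helper below builds that string; its optional partition argument (default aws) is an assumption for other partitions such as aws-us-gov. Alternatively, adding --query 'Policy.Arn' --output text to the aws iam create-policy command prints the ARN directly.

```shell
# Hypothetical helper (POSIX sh): ARN of a customer managed policy created without --path.
# Usage: iam_policy_arn <account-id> <policy-name> [partition]
iam_policy_arn() {
  printf 'arn:%s:iam::%s:policy/%s\n' "${3:-aws}" "$1" "$2"
}

# Example: capture the ARN for the attach-role-policy step.
# POLICY_ARN=$(iam_policy_arn "$AWS_ACCOUNT_ID" QueryFederatedSearchAthenaPolicy)
```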
Create AWS IAM Role & attach IAM Policy
In this section you will use the supplied Trust Policy from the previous section to create an AWS IAM Role that trusts the Query Production Account and delegates access to your Glue, Athena, S3 (and optionally KMS) resources. The generated ExternalId prevents a confused deputy problem in the role assumption logic. However, maintaining a separate IAM Role per Connector can be cumbersome, even though it is a best practice. If it aligns with your internal identity governance (and other) requirements, you can instead specify multiple ExternalId entries in one IAM Trust Policy, as in the following example.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": [
"arn:aws:iam::822484525064:root"
]
},
"Action": "sts:AssumeRole",
"Condition": {
"StringEquals": {
"sts:ExternalId": [
"foo",
"bar",
"69420",
"vampirevampire"
]
}
}
}
]
}
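Editing the ExternalId list by hand is easy to get wrong; a single missing comma makes the JSON invalid. The sketch below is a hypothetical POSIX sh helper that prints a Trust Policy of the same shape as the example above for any number of External IDs. It assumes the IDs are plain values such as the generated UUIDs, with no quotes or backslashes, and it takes the trusted account as an argument: always use the principal shown in the Trust Policy that Query generated for your Connector.

```shell
# Hypothetical helper (POSIX sh).
# Usage: trust_policy_for_external_ids <trusted-account-id> <external-id> [<external-id>...]
trust_policy_for_external_ids() {
  local account ids id
  account="$1"; shift
  if [ "$#" -eq 0 ]; then
    echo "at least one External ID is required" >&2
    return 1
  fi
  ids=""
  for id in "$@"; do
    # Append each ID as a JSON string, separated by ", " after the first one.
    ids="${ids:+$ids, }\"$id\""
  done
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": ["arn:aws:iam::${account}:root"] },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": { "sts:ExternalId": [${ids}] }
      }
    }
  ]
}
EOF
}
```

For example, TRUST_POLICY_JSON=$(trust_policy_for_external_ids 822484525064 "<external-id-1>" "<external-id-2>") produces a document you can pass to aws iam create-role or, for an existing Role, to aws iam update-assume-role-policy.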
If you encrypt your data in S3 using AWS KMS Customer Managed Keys (CMKs) and/or encrypt your Amazon Athena query results, you must also grant the following permissions in your AWS KMS Key Policy to the Principal (the IAM Role) you will create.
- kms:Decrypt
- kms:GenerateDataKey
- kms:Encrypt
The following steps use an Amazon Linux 2023 AWS CloudShell environment and the AWS CLI. Adapt these instructions to your specific operating system or deployment strategy (e.g., AWS CloudFormation, Terraform, AWS Console, etc.).
- Create an environment variable that contains your Trust Policy as shown below. Ensure you replace { "YOUR_TRUST_POLICY_HERE" } with your actual policy!
TRUST_POLICY_JSON=$(cat <<'EOF'
{ "YOUR_TRUST_POLICY_HERE" }
EOF
)
- Create the AWS IAM Role with the following command. Set --role-name to the Role Name you entered when pre-configuring the Connector. You can change the Role Name in the Connector later, but you cannot rename an existing AWS IAM Role!
aws iam create-role \
--role-name "QueryFederatedSearchForAthena" \
--description "Delegates access to Amazon Athena and related services to the Query Federated Search Platform" \
--assume-role-policy-document "$TRUST_POLICY_JSON"
- Attach the IAM Policy you created in the previous section to your IAM Role with the following command. Replace the --role-name and --policy-arn values with your actual values; in the Policy ARN, replace $AWS_ACCOUNT_ID with your AWS Account ID.
aws iam attach-role-policy \
--role-name "QueryFederatedSearchForAthena" \
--policy-arn "arn:aws:iam::$AWS_ACCOUNT_ID:policy/QueryFederatedSearchAthenaPolicy"
- (OPTIONAL STEP) If you use AWS KMS CMKs to encrypt the Amazon Athena query results and/or the data within your Athena Target S3 Bucket, you must add the following Statement to your AWS KMS Key Policy. Replace $AWS_ACCOUNT_ID and $ROLE_NAME with your AWS Account ID and the created IAM Role name, respectively.
{
  "Sid": "queryFederatedSearchPermissions",
  "Effect": "Allow",
  "Principal": {
    "AWS": "arn:aws:iam::$AWS_ACCOUNT_ID:role/$ROLE_NAME"
  },
  "Action": [
    "kms:Decrypt",
    "kms:Encrypt",
    "kms:GenerateDataKey"
  ],
  "Resource": "*"
}
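In a KMS key policy, the IAM Role ARN belongs in the Principal element, and "Resource": "*" refers to the key the policy is attached to. The hypothetical helper below (POSIX sh) prints this statement with your Account ID and Role Name filled in.

```shell
# Hypothetical helper (POSIX sh): key policy statement for the Query IAM Role.
# Usage: kms_key_policy_statement <account-id> <role-name>
kms_key_policy_statement() {
  local account_id role_name
  account_id="$1"; role_name="$2"
  cat <<EOF
{
  "Sid": "queryFederatedSearchPermissions",
  "Effect": "Allow",
  "Principal": { "AWS": "arn:aws:iam::${account_id}:role/${role_name}" },
  "Action": ["kms:Decrypt", "kms:Encrypt", "kms:GenerateDataKey"],
  "Resource": "*"
}
EOF
}
```

One way to apply it is to download the current policy with aws kms get-key-policy --key-id <key-id> --policy-name default --query Policy --output text, add this statement to its Statement array, and upload the result with aws kms put-key-policy. Because put-key-policy replaces the entire key policy, merge the statement into the existing document rather than submitting it on its own.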
In the final step you will add your schema mapping, test your connection and finalize the Connector.
Finalize configuring your Amazon Athena (for Amazon S3) Connector
In this section you will stage your schema mapping, verify all of your permissions, test that the connection works, and enable it so that you can dispatch searches to Amazon Athena (for Amazon S3) from the Query Federated Search Platform.
- Execute the Configure Schema workflow to map your target table or view data into the QDM.
- Ensure that the Connector is enabled by moving the Enabled toggle to the right-hand position. It should not have become disabled during the previous steps.
In the event that you have not achieved a successful Test Connection or cannot Fetch Schema, refer to the Troubleshooting sub-section of Resources, below.
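Before working through the Troubleshooting list, it can help to confirm from your own AWS identity that the table or view is queryable and returns data. The hypothetical helpers below (POSIX sh) build a small, bounded SELECT against the same Catalog, Database, and Table/View names you entered in the Connector, wrapping each identifier in double quotes as Athena SQL expects. This only checks the data side: it runs with your permissions, not the Connector's IAM Role, which only Query's account can assume.

```shell
# Hypothetical helpers (POSIX sh).
# Wrap an identifier in double quotes, doubling any embedded double quotes.
athena_quote_ident() {
  printf '"%s"' "$(printf '%s' "$1" | sed 's/"/""/g')"
}

# Usage: athena_smoke_test_sql <catalog> <database> <table-or-view> [limit]
athena_smoke_test_sql() {
  printf 'SELECT * FROM %s.%s.%s LIMIT %d\n' \
    "$(athena_quote_ident "$1")" \
    "$(athena_quote_ident "$2")" \
    "$(athena_quote_ident "$3")" \
    "${4:-10}"
}
```

You could then submit it with aws athena start-query-execution --query-string "$(athena_smoke_test_sql AwsDataCatalog <database> <table>)" --work-group <workgroup> (or pass --result-configuration OutputLocation=s3://<results-bucket>/ instead), and check the outcome with aws athena get-query-execution --query-execution-id <id>.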
Querying the Amazon Athena (for Amazon S3) Connector
How you query your data depends entirely on how you configured your schema mapping. Because not all data sources are created equal, you can theoretically query across any number and combination of the Entities currently supported within the Query Federated Search platform:
- IP Addresses (IPv4 and IPv6)
- Domains & Hostnames
- URLs & URIs
- Email Addresses
- Usernames & User IDs
- File Hashes (e.g., MD5, SHA1, SHA256, etc.)
- File Names or Directories
- Resource IDs (e.g., Agent or Device IDs, cloud resource IDs)
- Process Names
- MAC Addresses
Resources
Refer to the previous sections' hyperlinks for more information on specific resources, services, tools and concepts. For further help with creating tables and performance tuning, consider some of the resources below.
- Defining crawlers in AWS Glue
- CREATE TABLE
- SQL reference for Athena
- Working with views
- Performance tuning in Athena
- Top 10 Performance Tuning Tips for Amazon Athena
Troubleshooting Tips
- If you recently changed your permissions or Role in Query and cannot Save or Test Connection, log out and back in, and clear your browser cache.
- Verify that you were able to create the IAM Policy and Role and that they successfully attached.
- Ensure that Amazon Athena has no reported service issues and that your AWS Glue Tables return data when queried from Athena.
- Ensure that your Amazon Athena, Amazon S3, and AWS KMS resources are all deployed in the same Region. Despite Amazon S3 being a "global" service, Athena does not treat it as such.
- Ensure that you entered the proper AWS Region for your Connector.
- Ensure that the Role Name you created matches the Role Name in your Connector.
- Ensure that the AWS Account ID you entered in your Connector matches the AWS Account where you deployed Amazon Athena (for Amazon S3). The Query Federated Search Platform does not use Subscriber Access.
- Ensure that other details, such as the Glue Database, Glue Table, Athena Target S3 Bucket, and Amazon Athena Results S3 Bucket names, are correct and aligned to the data source you are attempting to onboard.
- If you use Lake Formation for your Athena deployments, verify that you granted the correct IAM Role Lake Formation SELECT and DESCRIBE permissions on the correct Database(s), Table(s), and/or Views.
- If you use an AWS KMS CMK for either your Athena Target S3 Bucket or your Amazon Athena Results S3 Bucket, verify that you updated your AWS IAM Policy and your KMS Key Policy as described in the previous sections.
- Verify that your Organization, Organizational Unit(s), and/or AWS Account does not have a Service Control Policy (SCP) blocking external principals from sts:AssumeRole or other IAM actions related to Athena, Glue, and/or S3.
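Two of the checks above (same Region, and Policy successfully attached) can be scripted. The sketch below uses hypothetical helper names (POSIX sh) that interpret AWS CLI text output. The first normalizes the LocationConstraint reported by aws s3api get-bucket-location, which is null for buckets in us-east-1 (printed as None in text output) and may be the legacy value EU for some eu-west-1 buckets. The second checks whether a Policy ARN appears in the output of aws iam list-attached-role-policies.

```shell
# Hypothetical helpers (POSIX sh).

# Normalize the output of:
#   aws s3api get-bucket-location --bucket <name> --query LocationConstraint --output text
bucket_region_from_location() {
  case "$1" in
    ''|None|null) echo "us-east-1" ;;   # us-east-1 buckets report a null constraint
    EU)           echo "eu-west-1" ;;   # legacy value for some eu-west-1 buckets
    *)            echo "$1" ;;
  esac
}

# Succeed if POLICY_ARN is in the whitespace-separated list printed by:
#   aws iam list-attached-role-policies --role-name <role> \
#     --query 'AttachedPolicies[].PolicyArn' --output text
# Usage: policy_is_attached "<attached-arns>" <policy-arn>
policy_is_attached() {
  local attached policy_arn arn
  attached="$1"; policy_arn="$2"
  for arn in $attached; do   # unquoted on purpose: split on spaces/tabs
    [ "$arn" = "$policy_arn" ] && return 0
  done
  return 1
}
```

Comparing bucket_region_from_location for both buckets against the Region entered in your Connector, and running policy_is_attached against your Role, covers two of the most common misconfigurations in the list above.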
If you have exhausted the above Troubleshooting list, please contact your designated Query Sales Engineer or Customer Success Manager. If you are using a free tenant please contact Query Customer Success via the Support email in the Help section or via Intercom within your tenant.