PagerDuty Destination Setup

Send detection alerts to PagerDuty services for on-call notification and incident management using Events API v2.

Prerequisites

  • PagerDuty account with admin access
  • Service created or selected for alerts

Setup Steps

1. Create or Select Service

  1. Log into PagerDuty
  2. Navigate to Services → Service Directory
  3. Either:
    • Select an existing service, or
    • Click New Service to create one

2. Configure Service (if creating new)

Name: e.g., "Security Detections"

Escalation Policy: Select appropriate policy for your team

Alert Grouping: Available options:

  • Intelligent: ML-based grouping (recommended)
  • Time-based: Groups alerts that arrive within a time window
  • Content-based: Groups alerts by custom fields

Incident Settings: Configure as needed for your workflow

3. Add Events API v2 Integration

  1. Open your service
  2. Go to Integrations tab
  3. Click Add an integration
  4. Select Events API V2
  5. Click Add
  6. Copy the Integration Key (also called routing key)

Format: 32-character hexadecimal string (e.g., 0123456789abcdef0123456789abcdef)

Important: Save this key securely. You'll need it for configuration.

4. Configure in Query.ai

Contact your Query.ai administrator to configure the PagerDuty destination with:

Required Configuration:

  • Integration Key (routing key) - stored securely

Optional Configuration:

  • Severity override
  • Component name
  • Group name
  • Class name
  • Custom details (JSON object)
  • Timeout in seconds (default: 30)

Alert Details

PagerDuty alerts include:

Summary

Format: [SEVERITY] Detection Name - Replay Link

Example: [HIGH] Suspicious Login Attempts - https://app.query.ai/replay/123
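The summary line is plain string formatting. The following minimal bash sketch reproduces the example above. build_summary is a hypothetical helper name, not part of Query.ai.

```shell
#!/usr/bin/env bash
# build_summary is a hypothetical helper that reproduces the documented
# summary format: [SEVERITY] Detection Name - Replay Link
build_summary() {
  local severity="$1" detection_name="$2" replay_link="$3"
  printf '[%s] %s - %s\n' "$severity" "$detection_name" "$replay_link"
}

# Prints: [HIGH] Suspicious Login Attempts - https://app.query.ai/replay/123
build_summary HIGH "Suspicious Login Attempts" "https://app.query.ai/replay/123"
```

In the example, the bracketed value is the detection severity (HIGH). It is not the mapped PagerDuty severity, which uses lowercase values.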

Severity

Detection severity automatically maps to PagerDuty severity:

Detection Severity    PagerDuty Severity
CRITICAL              critical
HIGH                  error
MEDIUM                warning
LOW                   info

To send every alert at one fixed severity, set the severity configuration option. It overrides this mapping.
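The mapping and the override amount to a small lookup. This bash sketch makes two assumptions that the text above does not state. First, an override value outside the four PagerDuty severities is rejected. Second, an unrecognized detection severity is treated as an error. map_severity is a hypothetical name.

```shell
#!/usr/bin/env bash
# map_severity is a hypothetical helper.
# Args: detection severity (CRITICAL|HIGH|MEDIUM|LOW), optional fixed override.
map_severity() {
  local detection_severity="$1" override="${2:-}"
  # A configured override wins, but only if it is a valid PagerDuty severity.
  if [ -n "$override" ]; then
    case "$override" in
      critical|error|warning|info) printf '%s\n' "$override"; return 0 ;;
      *) echo "invalid severity override: $override" >&2; return 1 ;;
    esac
  fi
  # Otherwise apply the automatic mapping.
  case "$detection_severity" in
    CRITICAL) echo critical ;;
    HIGH)     echo error ;;
    MEDIUM)   echo warning ;;
    LOW)      echo info ;;
    *) echo "unknown detection severity: $detection_severity" >&2; return 1 ;;
  esac
}

map_severity HIGH           # prints: error
map_severity LOW critical   # prints: critical (override applied)
```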

Source

The source field is set to the replay link URL, which gives responders one-click access to the investigation.

Custom Details

All detection fields are included as custom details:

Detection Metadata:

  • detection_id - Detection configuration ID
  • detection_name - Detection name
  • severity - Detection severity
  • description - Detection description (if present)

Execution Details:

  • outcome - MATCHED or ERROR
  • match_count - Number of matches found
  • run_type - SCHEDULED or MANUAL
  • run_id - Unique execution run ID

Threshold Configuration:

  • match_operator - Comparison operator (e.g., GREATER_THAN)
  • match_threshold - Threshold value
  • match_eagerness - EAGER or EXHAUSTIVE

Execution Metadata (if available):

  • match_exhaustiveness - COMPLETED or STOPPED_EARLY
  • search_id - FSQL API search identifier
  • trace_id - AWS X-Ray trace identifier

Timestamps:

  • ran_at - When detection executed
  • range_start - Query time range start
  • range_end - Query time range end

Additional:

  • replay_link - Link to Query.ai replay
  • errors - Array of error objects (if any, limited to 10)

Deduplication

Incidents are deduplicated using a SHA-256 hash of the detection name as the dedup_key:

  • Later triggers of the same detection update the existing incident while it is open
  • Different detections create separate incidents
  • This prevents duplicate pages for the same detection
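Here is a minimal sketch of the key derivation. It assumes the detection name is hashed as-is (raw bytes, no trailing newline) and the digest is written as lowercase hex. The exact encoding Query.ai uses is not documented here. dedup_key is a hypothetical helper, and sha256sum comes from GNU coreutils.

```shell
#!/usr/bin/env bash
# dedup_key is a hypothetical helper: SHA-256 hex digest of the detection name.
# printf '%s' avoids hashing a trailing newline (an assumption about the encoding).
dedup_key() {
  printf '%s' "$1" | sha256sum | cut -d' ' -f1
}

# The same name always yields the same 64-character key.
dedup_key "Suspicious Login Attempts"
```

The key depends only on the name. Renaming a detection therefore changes its dedup_key, so its next trigger opens a new incident. Conversely, two detections that share a name would share an incident.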

Testing

Test Integration Key

Test your integration key with curl, replacing YOUR_INTEGRATION_KEY with the key copied in step 3:

curl -X POST https://events.pagerduty.com/v2/enqueue \
  -H "Content-Type: application/json" \
  -H "Accept: application/vnd.pagerduty+json;version=2" \
  -d '{
    "routing_key": "YOUR_INTEGRATION_KEY",
    "event_action": "trigger",
    "payload": {
      "summary": "Test alert from Security Detections",
      "source": "test",
      "severity": "info",
      "custom_details": {
        "test": "This is a test alert"
      }
    }
  }'

Expected Response (HTTP 202 Accepted):

{
  "status": "success",
  "message": "Event processed",
  "dedup_key": "..."
}
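To script the curl test and fail loudly when PagerDuty rejects the event, you can check both the HTTP status and the status field of the body. PagerDuty's Events API v2 returns HTTP 202 Accepted for an accepted event. The following is a minimal sketch that assumes bash, curl, and grep are available. pd_response_ok and send_test_event are hypothetical helper names, and the check uses grep rather than a JSON parser.

```shell
#!/usr/bin/env bash
# pd_response_ok is a hypothetical helper. It succeeds only if the HTTP status
# is 202 and the JSON body contains "status": "success".
pd_response_ok() {
  local http_code="$1" body="$2"
  [ "$http_code" = "202" ] || return 1
  printf '%s' "$body" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"success"'
}

# send_test_event is a hypothetical helper that posts a JSON event file and
# reports the result. It requires network access and a real routing key.
send_test_event() {
  local event_file="$1" body_file http_code rc=0
  body_file=$(mktemp)
  # -o writes the body to a file; -w prints only the HTTP status code.
  http_code=$(curl -sS -o "$body_file" -w '%{http_code}' \
    -X POST https://events.pagerduty.com/v2/enqueue \
    -H "Content-Type: application/json" \
    --data-binary @"$event_file")
  if pd_response_ok "$http_code" "$(cat "$body_file")"; then
    echo "OK: event accepted"
  else
    echo "FAILED: HTTP $http_code: $(cat "$body_file")" >&2
    rc=1
  fi
  rm -f "$body_file"
  return "$rc"
}

# Usage: save the -d payload from the curl example as event.json, then:
#   send_test_event event.json
```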

Test with Detection

  1. Create a test detection with a low threshold so it is likely to match
  2. Add the PagerDuty destination
  3. Click Run Now
  4. Check the PagerDuty service for a new incident

Troubleshooting

Common Issues

400 Bad Request
  • Cause: Invalid payload
  • Solution: Verify the routing key format and check required fields

401 Unauthorized
  • Cause: Invalid routing key
  • Solution: Verify the integration key is correct

429 Too Many Requests
  • Cause: Rate limit exceeded
  • Solution: Alerts are queued and retried automatically

Incidents not appearing
  • Cause: Several possible causes
  • Solution: Verify the service is active, the integration is enabled, and the routing key is correct

Wrong escalation policy
  • Cause: Service misconfigured
  • Solution: Update the service's escalation policy

Verify Integration

  1. Log into PagerDuty
  2. Navigate to your service
  3. Go to Integrations tab
  4. Verify Events API V2 integration is present and enabled
  5. Check integration key matches configuration

Check Service Status

  1. Navigate to Services → Service Directory
  2. Find your service
  3. Verify status is Active (not disabled)
  4. Check escalation policy is configured

View Logs

Contact your Query.ai administrator to review CloudWatch logs:

aws logs tail /aws/lambda/detection-outcome-handler --follow

Look for PagerDuty-related errors in the logs.

Multiple Services

Create separate destinations for different services or teams:

Example Use Cases:

  • Critical alerts → On-call engineering service
  • Security alerts → Security operations service
  • Compliance alerts → Compliance team service

Each destination uses the integration key of its own PagerDuty service.

Configuration Options

Required

routing_key (secret)

  • Integration Key from PagerDuty service
  • 32-character hexadecimal string
  • Stored securely in AWS Secrets Manager

Optional

severity

  • Override automatic severity mapping
  • Values: critical, error, warning, info
  • Use to force all alerts to specific severity
  • Example: Force all to critical for high-priority service

component

  • Component of your system responsible for or affected by the event
  • Example: "authentication-service", "network-gateway"
  • Helps identify affected system area

group

  • Logical grouping of components
  • Example: "security", "infrastructure", "applications"
  • Helps organize incidents

class

  • Class or type of the event
  • Example: "detection-alert", "security-event"
  • Helps categorize incidents

custom_details

  • Additional custom details (JSON object)
  • Merged with default detection fields
  • Example: {"environment": "production", "team": "security"}

timeout

  • Request timeout in seconds
  • Default: 30
  • Increase if experiencing timeouts

Advanced Configuration

Custom Details Example

Add environment and runbook information:

{
  "routing_key": {"is_secret": true},
  "custom_details": {
    "value": {
      "environment": "production",
      "team": "security-operations",
      "runbook": "https://wiki.example.com/runbooks/suspicious-logins"
    }
  }
}

Component and Group

Organize alerts by system component:

{
  "routing_key": {"is_secret": true},
  "component": {"value": "authentication-service"},
  "group": {"value": "identity-platform"}
}

Fixed Severity

Force all alerts to critical severity:

{
  "routing_key": {"is_secret": true},
  "severity": {"value": "critical"}
}

Alert Grouping

Configure alert grouping in PagerDuty service settings:

Intelligent Grouping (Recommended)

Uses machine learning to group related alerts:

  • Automatically identifies patterns
  • Groups similar incidents
  • Reduces alert noise

Time-Based Grouping

Groups alerts that arrive within a time window:

  • Configure window duration (e.g., 5 minutes)
  • All alerts in window grouped together
  • Simple and predictable

Content-Based Grouping

Groups by custom fields:

  • Configure grouping fields
  • Alerts with same field values grouped
  • Useful for detection-specific grouping

Note: Because the dedup_key is derived from the detection name, repeat triggers of the same detection update the same incident rather than creating duplicates.

Escalation Policies

Configure escalation policies for different alert types:

Example: Security Operations

  • Level 1: On-call security analyst (immediate)
  • Level 2: Security team lead (after 15 minutes)
  • Level 3: Security manager (after 30 minutes)

Example: Critical Infrastructure

  • Level 1: On-call engineer (immediate)
  • Level 2: Engineering manager (after 10 minutes)
  • Level 3: VP Engineering (after 20 minutes)

Configure in PagerDuty: People → Escalation Policies

Response Plays

Create response plays for common detection types:

  1. Navigate to Incident Workflows → Response Plays

  2. Click New Response Play

  3. Configure:

    • Name: e.g., "Security Detection Response"
    • Steps: Investigation checklist
    • Stakeholders: Who to notify
    • Conference bridge: For coordination
  4. Link to service or configure auto-run rules

Integration with Query.ai

Incident Investigation

When a PagerDuty incident is created:

  1. Open the incident in PagerDuty
  2. Click the Source link (the replay link)
  3. Query.ai opens with the detection results
  4. Investigate the matching events
  5. Document findings in the PagerDuty incident

Incident Updates

When the same detection triggers multiple times:

  • Updates existing incident (via dedup_key)
  • Adds note with new match count
  • Does not create duplicate incidents

Incident Resolution

When the detection stops matching:

  • Incident remains open (manual resolution required)
  • Review incident details
  • Resolve when investigation complete

Security Best Practices

  1. Never Commit Keys: Never commit routing keys to source control; store them in AWS Secrets Manager
  2. Separate Services: Use different services for different severity levels
  3. Rotate Keys: Rotate integration keys every 90 days
  4. Monitor Usage: Review PagerDuty analytics for alert patterns
  5. Configure Escalations: Ensure appropriate escalation policies
  6. Test Regularly: Test integrations with manual detection runs
  7. Document Runbooks: Link runbooks in custom details

Key Rotation

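To rotate an integration key:

  1. In PagerDuty, navigate to your service
  2. Go to the Integrations tab
  3. Click the Events API V2 integration
  4. Click Regenerate Key
  5. Copy the new integration key
  6. Update the key in the Query.ai configuration
  7. Test with a manual detection run

The old key stops working immediately after step 4. Alerts sent before step 6 is complete will therefore fail, so perform steps 4 through 6 together to keep the gap short.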

Metrics and Analytics

Monitor PagerDuty integration effectiveness:

Key Metrics

Incident Volume:

  • Track incidents created per day/week
  • Identify trends and patterns

Response Time:

  • Time to acknowledge
  • Time to resolve
  • Compare across detection types

Escalation Rate:

  • Percentage of incidents escalated
  • Indicates whether Level 1 responders can handle alerts

Resolution Time:

  • Average time to resolve
  • Identify detections requiring tuning

PagerDuty Analytics

Access in PagerDuty:

  1. Navigate to Analytics → Incidents
  2. Filter by service
  3. Review metrics and trends
  4. Export data for reporting
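A short awk script can turn exported incident data into the per-detection resolution-time metric described under Key Metrics. This sketch assumes a hypothetical CSV layout:

  • Columns incident_id,title,created_epoch,resolved_epoch, with a header row
  • Times as Unix epoch seconds
  • No commas inside titles

PagerDuty's actual export columns may differ, so adjust the field numbers to match your file. avg_resolve_minutes is a hypothetical helper.

```shell
#!/usr/bin/env bash
# avg_resolve_minutes is a hypothetical helper. It reads CSV on stdin with the
# assumed columns incident_id,title,created_epoch,resolved_epoch and prints
# "title,average minutes to resolve", sorted by title.
# Rows with an empty resolved_epoch (still open) are skipped.
avg_resolve_minutes() {
  awk -F',' '
    NR > 1 && $4 != "" {
      total[$2] += ($4 - $3) / 60   # minutes from creation to resolution
      count[$2]++
    }
    END {
      for (title in total) printf "%s,%.1f\n", title, total[title] / count[title]
    }
  ' | sort
}

# Usage:
#   avg_resolve_minutes < incidents.csv
```

Detections with consistently long resolution times are candidates for tuning or for a linked runbook.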

Resources