PagerDuty Destination Setup
Send detection alerts to PagerDuty services for on-call notification and incident management using Events API v2.
Prerequisites
- PagerDuty account with admin access
- Service created or selected for alerts
Setup Steps
1. Create or Select Service
- Log into PagerDuty
- Navigate to Services → Service Directory
- Either:
  - Select an existing service, or
  - Click New Service to create one
2. Configure Service (if creating new)
Name: e.g., "Security Detections"
Escalation Policy: Select appropriate policy for your team
Alert Grouping: Choose one of the following modes:
- Intelligent: ML-based grouping (recommended)
- Time-based: Group alerts within time window
- Content-based: Group by custom fields
Incident Settings: Configure as needed for your workflow
3. Add Events API v2 Integration
- Open your service
- Go to Integrations tab
- Click Add an integration
- Select Events API V2
- Click Add
- Copy the Integration Key (also called routing key)
Format: 32-character alphanumeric string (e.g., R0123456789ABCDEF0123456789ABCDE)
Important: Save this key securely. You'll need it for configuration.
4. Configure in Query.ai
Contact your Query.ai administrator to configure the PagerDuty destination with:
Required Configuration:
- Integration Key (routing key) - stored securely
Optional Configuration:
- Severity override
- Component name
- Group name
- Class name
- Custom details (JSON object)
- Timeout in seconds (default: 30)
Alert Details
PagerDuty alerts include:
Summary
Format: [SEVERITY] Detection Name - Replay Link
Example: [HIGH] Suspicious Login Attempts - https://app.query.ai/replay/123
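The summary format above can be sketched as a small helper. This is only an illustration of the documented format; the function name `build_summary` and the upper-casing of the severity are assumptions, not Query.ai's actual implementation.

```python
def build_summary(severity: str, detection_name: str, replay_link: str) -> str:
    """Format a summary as "[SEVERITY] Detection Name - Replay Link"."""
    # Upper-casing is an assumption so that "high" and "HIGH" render the same.
    return f"[{severity.upper()}] {detection_name} - {replay_link}"


print(build_summary("HIGH", "Suspicious Login Attempts", "https://app.query.ai/replay/123"))
```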
Severity
Detection severity automatically maps to PagerDuty severity:
| Detection Severity | PagerDuty Severity |
|---|---|
| CRITICAL | critical |
| HIGH | error |
| MEDIUM | warning |
| LOW | info |
Set the severity configuration option to apply one fixed severity to all alerts instead of this mapping.
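As a rough sketch, the mapping and the optional override could be expressed as follows. The names `SEVERITY_MAP` and `resolve_severity` are hypothetical, and rejecting unknown values with `ValueError` is an assumption about how invalid input might be handled.

```python
from typing import Optional

# Mapping copied from the severity table in this section.
SEVERITY_MAP = {
    "CRITICAL": "critical",
    "HIGH": "error",
    "MEDIUM": "warning",
    "LOW": "info",
}
PAGERDUTY_SEVERITIES = {"critical", "error", "warning", "info"}


def resolve_severity(detection_severity: str, override: Optional[str] = None) -> str:
    """Return the PagerDuty severity, preferring a configured override."""
    if override is not None:
        if override not in PAGERDUTY_SEVERITIES:
            raise ValueError(f"unsupported override: {override!r}")
        return override
    key = detection_severity.upper()
    if key not in SEVERITY_MAP:
        raise ValueError(f"unknown detection severity: {detection_severity!r}")
    return SEVERITY_MAP[key]
```

With no override, `resolve_severity("HIGH")` follows the table and yields `error`; with `override="critical"`, every detection severity resolves to `critical`.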
Source
Set to the replay link URL for quick access to investigation.
Custom Details
All detection fields are included as custom details:
Detection Metadata:
- detection_id - Detection configuration ID
- detection_name - Detection name
- severity - Detection severity
- description - Detection description (if present)
Execution Details:
- outcome - MATCHED or ERROR
- match_count - Number of matches found
- run_type - SCHEDULED or MANUAL
- run_id - Unique execution run ID
Threshold Configuration:
- match_operator - Comparison operator (e.g., GREATER_THAN)
- match_threshold - Threshold value
- match_eagerness - EAGER or EXHAUSTIVE
Execution Metadata (if available):
- match_exhaustiveness - COMPLETED or STOPPED_EARLY
- search_id - FSQL API search identifier
- trace_id - AWS X-Ray trace identifier
Timestamps:
- ran_at - When detection executed
- range_start - Query time range start
- range_end - Query time range end
Additional:
- replay_link - Link to Query.ai replay
- errors - Array of error objects (if any, limited to 10)
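The field list above notes that optional fields appear only when present and that errors are capped at 10. A minimal sketch of that behavior, assuming the detection outcome arrives as a plain dictionary (the function name and input shape are hypothetical):

```python
from typing import Any, Dict

MAX_ERRORS = 10  # the field list says errors are limited to 10


def build_custom_details(outcome: Dict[str, Any]) -> Dict[str, Any]:
    """Copy present fields into custom details and cap the errors array."""
    # Drop fields that are absent (None); errors are handled separately below.
    details = {k: v for k, v in outcome.items() if k != "errors" and v is not None}
    errors = outcome.get("errors") or []
    if errors:
        details["errors"] = list(errors[:MAX_ERRORS])
    return details
```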
Deduplication
Incidents are deduplicated using a SHA-256 hash of the detection name:
- Same detection updates existing incident
- Different detections create separate incidents
- Prevents duplicate pages for same detection
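A deduplication key of this kind might be derived as shown below. The UTF-8 encoding and the hexadecimal digest form are assumptions; the text above only states that a SHA-256 hash of the detection name is used.

```python
import hashlib


def dedup_key(detection_name: str) -> str:
    """Derive a stable key from the detection name (assumed: UTF-8, hex digest)."""
    return hashlib.sha256(detection_name.encode("utf-8")).hexdigest()


# The same name always yields the same key, so repeat alerts update one incident.
print(dedup_key("Suspicious Login Attempts") == dedup_key("Suspicious Login Attempts"))
```

Because the key depends only on the name, renaming a detection would start a new incident under this scheme.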
Testing
Test Integration Key
Test your integration key with curl:

    curl -X POST https://events.pagerduty.com/v2/enqueue \
      -H "Content-Type: application/json" \
      -H "Accept: application/vnd.pagerduty+json;version=2" \
      -d '{
        "routing_key": "R0123456789ABCDEF0123456789ABCDE",
        "event_action": "trigger",
        "payload": {
          "summary": "Test alert from Security Detections",
          "source": "test",
          "severity": "info",
          "custom_details": {
            "test": "This is a test alert"
          }
        }
      }'

Expected Response:

    {
      "status": "success",
      "message": "Event processed",
      "dedup_key": "..."
    }

Test with Detection
- Create a test detection with low threshold
- Add PagerDuty destination
- Click Run Now
- Check PagerDuty service for incident
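The curl test under Test Integration Key can also be scripted. The sketch below uses only the Python standard library and sends the same test event; the environment variable name `PAGERDUTY_ROUTING_KEY` is a hypothetical choice, and nothing is sent unless it is set.

```python
import json
import os
import urllib.request

EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"


def build_test_event(routing_key: str) -> dict:
    """Build the same test event used in the curl example."""
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "payload": {
            "summary": "Test alert from Security Detections",
            "source": "test",
            "severity": "info",
            "custom_details": {"test": "This is a test alert"},
        },
    }


def send_event(event: dict, timeout: float = 30.0) -> dict:
    """POST the event and return the decoded JSON response body."""
    request = urllib.request.Request(
        EVENTS_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical variable name; the request is skipped when it is not set.
routing_key = os.environ.get("PAGERDUTY_ROUTING_KEY")
if routing_key:
    print(send_event(build_test_event(routing_key)))
```

Reading the key from the environment keeps it out of the script, in line with the advice to store routing keys securely.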
Troubleshooting
Common Issues
| Error | Cause | Solution |
|---|---|---|
| 400 Bad Request | Invalid payload | Verify routing key format, check required fields |
| 401 Unauthorized | Invalid routing key | Verify integration key is correct |
| 429 Too Many Requests | Rate limit exceeded | Alerts are queued, will retry automatically |
| Incidents not appearing | Multiple issues | Verify: service active, integration enabled, routing key correct |
| Wrong escalation policy | Service misconfigured | Update service escalation policy |
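The table says rate-limited alerts are queued and retried automatically. The retry side of that is commonly done with exponential backoff, sketched below. This is a generic illustration and not a description of Query.ai's actual retry logic; `send` is a hypothetical callable that returns an HTTP status code.

```python
import time
from typing import Callable


def send_with_retry(
    send: Callable[[], int],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns something other than 429 or attempts run out."""
    status = send()
    for attempt in range(1, max_attempts):
        if status != 429:
            break
        # Wait 1x, 2x, 4x, ... the base delay before each retry.
        sleep(base_delay * 2 ** (attempt - 1))
        status = send()
    return status
```

The `sleep` parameter is injectable so the backoff schedule can be checked in tests without real waiting.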
Verify Integration
- Log into PagerDuty
- Navigate to your service
- Go to Integrations tab
- Verify Events API V2 integration is present and enabled
- Check integration key matches configuration
Check Service Status
- Navigate to Services → Service Directory
- Find your service
- Verify status is Active (not disabled)
- Check escalation policy is configured
View Logs
Contact your Query.ai administrator to review CloudWatch logs:

    aws logs tail /aws/lambda/detection-outcome-handler --follow

Look for PagerDuty-related errors in the logs.
Multiple Services
Create separate destinations for different services or teams:
Example Use Cases:
- Critical alerts → On-call engineering service
- Security alerts → Security operations service
- Compliance alerts → Compliance team service
Each destination uses a different integration key from different services.
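One way to picture this setup is a lookup from alert category to destination. The category and destination names below are hypothetical placeholders based on the example use cases; in practice each destination is configured by your Query.ai administrator with its own integration key.

```python
# Hypothetical category-to-destination mapping based on the use cases above.
DESTINATIONS = {
    "critical": "on-call-engineering",
    "security": "security-operations",
    "compliance": "compliance-team",
}


def destination_for(category: str) -> str:
    """Return the destination name for an alert category."""
    try:
        return DESTINATIONS[category]
    except KeyError:
        raise ValueError(f"no PagerDuty destination configured for {category!r}")
```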
Configuration Options
Required
routing_key (secret)
- Integration Key from PagerDuty service
- 32-character alphanumeric string
- Stored securely in AWS Secrets Manager
Optional
severity
- Override automatic severity mapping
- Values: critical, error, warning, info
- Use to force all alerts to a specific severity
- Example: Force all to critical for a high-priority service
component
- Component of your system that is broken
- Example: "authentication-service", "network-gateway"
- Helps identify affected system area
group
- Logical grouping of components
- Example: "security", "infrastructure", "applications"
- Helps organize incidents
class
- Class or type of the event
- Example: "detection-alert", "security-event"
- Helps categorize incidents
custom_details
- Additional custom details (JSON object)
- Merged with default detection fields
- Example: {"environment": "production", "team": "security"}
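A minimal sketch of the merge, assuming configured values take precedence when a key collides with a default detection field. That precedence is an assumption; the text above only says the two are merged.

```python
from typing import Any, Dict, Optional


def merge_custom_details(
    defaults: Dict[str, Any], configured: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Combine default detection fields with configured custom details."""
    merged = dict(defaults)  # copy so the caller's dictionary is not mutated
    merged.update(configured or {})  # assumed: configured values win on conflict
    return merged
```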
timeout
- Request timeout in seconds
- Default: 30
- Increase if experiencing timeouts
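To show how these options might end up in an event, the sketch below assembles an Events API v2 style payload and leaves out any optional field that is not configured. The function name and the `event_class` parameter name are hypothetical (`class` is a reserved word in Python); the payload keys are the option names listed above.

```python
from typing import Any, Dict, Optional


def build_payload(
    summary: str,
    source: str,
    severity: str,
    component: Optional[str] = None,
    group: Optional[str] = None,
    event_class: Optional[str] = None,
    custom_details: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build the payload object, omitting optional fields that are unset."""
    payload: Dict[str, Any] = {"summary": summary, "source": source, "severity": severity}
    optional = {
        "component": component,
        "group": group,
        "class": event_class,
        "custom_details": custom_details,
    }
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```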
Advanced Configuration
Custom Details Example
Add environment and runbook information:

    {
      "routing_key": {"is_secret": true},
      "custom_details": {
        "value": {
          "environment": "production",
          "team": "security-operations",
          "runbook": "https://wiki.example.com/runbooks/suspicious-logins"
        }
      }
    }

Component and Group
Organize alerts by system component:

    {
      "routing_key": {"is_secret": true},
      "component": {"value": "authentication-service"},
      "group": {"value": "identity-platform"}
    }

Fixed Severity
Force all alerts to critical severity:

    {
      "routing_key": {"is_secret": true},
      "severity": {"value": "critical"}
    }

Alert Grouping
Configure alert grouping in PagerDuty service settings:
Intelligent Grouping (Recommended)
Uses machine learning to group related alerts:
- Automatically identifies patterns
- Groups similar incidents
- Reduces alert noise
Time-Based Grouping
Groups alerts within time window:
- Configure window duration (e.g., 5 minutes)
- All alerts in window grouped together
- Simple and predictable
Content-Based Grouping
Groups by custom fields:
- Configure grouping fields
- Alerts with same field values grouped
- Useful for detection-specific grouping
Note: Since dedup_key is based on detection name, the same detection updates the same incident rather than creating duplicates.
Escalation Policies
Configure escalation policies for different alert types:
Example: Security Operations
- Level 1: On-call security analyst (immediate)
- Level 2: Security team lead (after 15 minutes)
- Level 3: Security manager (after 30 minutes)
Example: Critical Infrastructure
- Level 1: On-call engineer (immediate)
- Level 2: Engineering manager (after 10 minutes)
- Level 3: VP Engineering (after 20 minutes)
Configure in PagerDuty: People → Escalation Policies
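To reason about who has been paged at a given moment, the Security Operations example can be modeled as a list of (delay, responder) pairs. This is only a conceptual sketch of the timing and assumes the incident stays unacknowledged; PagerDuty itself evaluates the real policy.

```python
from typing import List, Tuple

# (minutes after trigger, who is notified), from the Security Operations example.
SECURITY_OPERATIONS: List[Tuple[int, str]] = [
    (0, "On-call security analyst"),
    (15, "Security team lead"),
    (30, "Security manager"),
]


def notified_so_far(policy: List[Tuple[int, str]], minutes_unacknowledged: int) -> List[str]:
    """List everyone notified once an incident has been open this long."""
    return [who for delay, who in policy if minutes_unacknowledged >= delay]
```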
Response Plays
Create response plays for common detection types:
- Navigate to Incident Workflows → Response Plays
- Click New Response Play
- Configure:
  - Name: e.g., "Security Detection Response"
  - Steps: Investigation checklist
  - Stakeholders: Who to notify
  - Conference bridge: For coordination
- Link to service or configure auto-run rules
Integration with Query.ai
Incident Investigation
When a PagerDuty incident is created:
- Open incident in PagerDuty
- Click Source link (replay link)
- Opens Query.ai with detection results
- Investigate matching events
- Document findings in PagerDuty incident
Incident Updates
When the same detection triggers multiple times:
- Updates existing incident (via dedup_key)
- Adds note with new match count
- Does not create duplicate incidents
Incident Resolution
When detection stops matching:
- Incident remains open (manual resolution required)
- Review incident details
- Resolve when investigation complete
Security Best Practices
- Never Commit Keys: Always store routing keys in Secrets Manager
- Separate Services: Use different services for different severity levels
- Rotate Keys: Rotate integration keys every 90 days
- Monitor Usage: Review PagerDuty analytics for alert patterns
- Configure Escalations: Ensure appropriate escalation policies
- Test Regularly: Test integrations with manual detection runs
- Document Runbooks: Link runbooks in custom details
Key Rotation
To rotate the integration key:
- In PagerDuty, navigate to your service
- Go to Integrations tab
- Click on Events API V2 integration
- Click Regenerate Key
- Copy new integration key
- Update key in Query.ai configuration
- Test with a manual detection run
- Old key stops working immediately
Metrics and Analytics
Monitor PagerDuty integration effectiveness:
Key Metrics
Incident Volume:
- Track incidents created per day/week
- Identify trends and patterns
Response Time:
- Time to acknowledge
- Time to resolve
- Compare across detection types
Escalation Rate:
- Percentage of incidents escalated
- Indicates if Level 1 can handle alerts
Resolution Time:
- Average time to resolve
- Identify detections requiring tuning
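If incident data is exported for reporting, these metrics can be computed with a few lines of code. The sketch below assumes a hypothetical `Incident` record with creation, acknowledgement, and resolution timestamps plus an escalation flag; the field names are illustrative and do not reflect PagerDuty's export format.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean
from typing import Dict, List


@dataclass
class Incident:
    created_at: datetime
    acknowledged_at: datetime
    resolved_at: datetime
    escalated: bool


def summarize(incidents: List[Incident]) -> Dict[str, float]:
    """Compute mean acknowledge/resolve times (minutes) and escalation rate (%)."""
    return {
        "mean_minutes_to_acknowledge": mean(
            (i.acknowledged_at - i.created_at).total_seconds() / 60 for i in incidents
        ),
        "mean_minutes_to_resolve": mean(
            (i.resolved_at - i.created_at).total_seconds() / 60 for i in incidents
        ),
        "escalation_rate_percent": 100 * sum(i.escalated for i in incidents) / len(incidents),
    }
```

Running `summarize` separately per detection makes it easier to spot detections that need tuning.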
PagerDuty Analytics
Access in PagerDuty:
- Navigate to Analytics → Incidents
- Filter by service
- Review metrics and trends
- Export data for reporting