Threat Signal Pipeline
Serverless pipeline for ingesting and correlating threat intelligence indicators (IOCs) using GCP Cloud Run and BigQuery.
GCP Cloud Run, BigQuery, Python, Pub/Sub, Terraform
Problem
We needed a way to aggregate threat intelligence from multiple open-source feeds and correlate it with our internal logs without paying for an expensive commercial Threat Intelligence Platform (TIP).
Approach
I designed a cloud-native architecture on Google Cloud Platform:
- Ingestion: Cloud Scheduler triggers Cloud Run functions to fetch data from feeds (AlienVault OTX, Abuse.ch).
- Normalization: Python scripts normalize data into a standard JSON schema.
- Streaming: Data is published to Pub/Sub.
- Storage: A Dataflow job (or a simple Pub/Sub subscriber) reads the messages and writes them to BigQuery.
- Analysis: Scheduled SQL queries in BigQuery match IOCs against VPC Flow Logs.
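The normalization and streaming steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the feed names (`feed_a`, `feed_b`), their raw field names, and the output schema (`indicator`, `indicator_type`, `source`, `first_seen`) are all assumptions made for the example, and real feed payloads will differ.

```python
import json
from datetime import datetime, timezone

# Hypothetical per-feed field mappings. Real feeds use their own payload
# shapes; these keys are illustrative assumptions only.
FIELD_MAPS = {
    "feed_a": {"value": "indicator", "type": "type"},
    "feed_b": {"value": "ioc_value", "type": "ioc_type"},
}

# Map feed-specific type labels onto a small shared vocabulary.
TYPE_ALIASES = {
    "ip": "ip",
    "ipv4": "ip",
    "domain": "domain",
    "hostname": "domain",
    "url": "url",
}


def normalize(raw, source, seen_at=None):
    """Convert one raw feed record into the assumed common schema."""
    mapping = FIELD_MAPS[source]
    raw_type = str(raw[mapping["type"]]).strip().lower()
    indicator_type = TYPE_ALIASES.get(raw_type, "unknown")

    value = str(raw[mapping["value"]]).strip()
    # URL paths can be case-sensitive, so only IPs and domains are lowercased.
    if indicator_type in ("ip", "domain"):
        value = value.lower()

    seen_at = seen_at or datetime.now(timezone.utc)
    return {
        "indicator": value,
        "indicator_type": indicator_type,
        "source": source,
        "first_seen": seen_at.isoformat(),
    }


def to_pubsub_payload(record):
    """Serialize a normalized record to bytes, since Pub/Sub message data is bytes."""
    return json.dumps(record, sort_keys=True).encode("utf-8")
```

In a deployed version, the bytes returned by `to_pubsub_payload` would be handed to a Pub/Sub publisher client. Keeping normalization free of any cloud dependency makes it easy to unit test each feed mapping on its own.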
Tools
- GCP Cloud Run: Serverless compute for ingestion scripts.
- Pub/Sub: Decoupling ingestion from processing.
- BigQuery: Data warehousing and high-speed querying.
- Terraform: Infrastructure as Code (IaC).
Output & Impact
- Processed 50k+ IOCs daily for under $5/month.
- Detected a compromised host communicating with a known C2 server within 2 hours of infection.
What I Learned
- Serverless architectures are extremely cost-effective for bursty workloads.
- Schema design in BigQuery is critical for query performance and cost control.
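As one illustration of the schema point, the sketch below builds a BigQuery DDL statement for a partitioned, clustered IOC table and a matching query that limits how much flow-log data is scanned. All table and column names are hypothetical. In particular, the flat `dest_ip` and `start_time` columns on the flow table are assumed to come from a flattening view, since exported VPC Flow Logs typically nest connection fields.

```python
def ioc_table_ddl(table):
    """Return DDL for an IOC table partitioned by day and clustered for lookups.

    Column names follow an assumed schema, not the project's actual one.
    """
    return (
        f"CREATE TABLE IF NOT EXISTS `{table}` (\n"
        "  indicator STRING NOT NULL,\n"
        "  indicator_type STRING NOT NULL,\n"
        "  source STRING,\n"
        "  first_seen TIMESTAMP NOT NULL\n"
        ")\n"
        "PARTITION BY DATE(first_seen)\n"
        "CLUSTER BY indicator_type, indicator"
    )


def match_query(ioc_table, flow_table, lookback_days=1):
    """Return SQL that joins recent flow records against IP indicators.

    The time filter on the flow table only reduces scanned bytes if that
    table is partitioned on start_time (an assumption in this sketch).
    """
    days = int(lookback_days)
    if days < 1:
        raise ValueError("lookback_days must be at least 1")
    return (
        "SELECT f.dest_ip, f.start_time, i.source\n"
        f"FROM `{flow_table}` AS f\n"
        f"JOIN `{ioc_table}` AS i\n"
        "  ON f.dest_ip = i.indicator\n"
        "WHERE i.indicator_type = 'ip'\n"
        "  AND f.start_time >= "
        f"TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL {days} DAY)"
    )
```

Clustering on `indicator_type` and `indicator` is meant to help the filter and join touch fewer blocks of the IOC table, while the lookback window keeps each scheduled run from rescanning the full flow-log history.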