Threat Signal Pipeline

Serverless pipeline for ingesting and correlating threat intelligence indicators (IOCs) using GCP Cloud Run and BigQuery.

GCP Cloud Run, BigQuery, Python, Pub/Sub, Terraform

Problem

We needed a way to aggregate threat intelligence from multiple open-source feeds and correlate it with our internal logs without paying for an expensive commercial Threat Intelligence Platform (TIP).

Approach

I designed a cloud-native architecture on Google Cloud Platform:

  1. Ingestion: Cloud Scheduler triggers Cloud Run functions to fetch data from feeds (AlienVault OTX, Abuse.ch).
  2. Normalization: Python scripts normalize data into a standard JSON schema.
  3. Streaming: Data is published to Pub/Sub.
  4. Storage: A Dataflow job (or a simple Pub/Sub subscriber) writes the messages to BigQuery.
  5. Analysis: Scheduled SQL queries in BigQuery match IOCs against VPC Flow Logs.
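
The normalization, streaming, and analysis steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the schema fields (indicator, type, source, ingested_at), the raw record keys (type, value), the flow record keys (src_ip, dest_ip), and all function names are assumptions. It uses only the standard library, and the in-memory match stands in for the scheduled BigQuery join.

```python
import json
from datetime import datetime, timezone

# Hypothetical standard schema; the allowed types are an assumption.
ALLOWED_TYPES = {"ipv4", "domain", "url", "sha256"}


def normalize_indicator(raw: dict, source: str) -> dict:
    """Map one raw feed record onto the assumed common schema."""
    ioc_type = str(raw.get("type", "")).strip().lower()
    if ioc_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported indicator type: {ioc_type!r}")
    value = str(raw.get("value", "")).strip()
    if not value:
        raise ValueError("empty indicator value")
    # Domains and hex hashes are case-insensitive, so lowercase them
    # to keep later equality matches stable.
    if ioc_type in {"domain", "sha256"}:
        value = value.lower()
    return {
        "indicator": value,
        "type": ioc_type,
        "source": source,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }


def to_pubsub_payload(record: dict) -> bytes:
    """Pub/Sub message data is bytes, so serialize the record as UTF-8 JSON."""
    return json.dumps(record, sort_keys=True).encode("utf-8")


def match_flows(flows: list[dict], iocs: list[dict]) -> list[dict]:
    """In-memory stand-in for joining flow logs against IP indicators."""
    bad_ips = {i["indicator"] for i in iocs if i["type"] == "ipv4"}
    return [f for f in flows if f.get("dest_ip") in bad_ips]


if __name__ == "__main__":
    # Addresses come from the reserved documentation ranges.
    ioc = normalize_indicator({"type": "IPv4", "value": " 203.0.113.7 "}, "example-feed")
    print(to_pubsub_payload(ioc))
    flows = [
        {"src_ip": "10.0.0.5", "dest_ip": "203.0.113.7"},
        {"src_ip": "10.0.0.6", "dest_ip": "198.51.100.1"},
    ]
    print(match_flows(flows, [ioc]))
```

Rejecting unknown types at normalization time keeps malformed records out of the topic, so the downstream writer and queries can rely on a single shape.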

Tools

  • GCP Cloud Run: Serverless compute for ingestion scripts.
  • Pub/Sub: Decoupling ingestion from processing.
  • BigQuery: Data warehousing and high-speed querying.
  • Terraform: Infrastructure as Code (IaC).

Output & Impact

  • Processed 50k+ IOCs daily for under $5/month.
  • Detected a compromised host communicating with a known C2 server within 2 hours of infection.

What I Learned

  • Serverless architectures are extremely cost-effective for bursty workloads.
  • Schema design in BigQuery is critical for query performance and cost control.
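
As one illustration of the schema point, BigQuery can skip data a query does not need when the table is partitioned and clustered on the columns used for filtering and joining. The sketch below builds such a DDL statement in Python. The table name, column names, and 90-day expiration are assumptions for illustration, not the schema used in this project.

```python
def build_ioc_table_ddl(table: str, expiration_days: int = 90) -> str:
    """Return BigQuery DDL for a hypothetical partitioned, clustered IOC table."""
    if expiration_days <= 0:
        raise ValueError("expiration_days must be positive")
    return (
        f"CREATE TABLE IF NOT EXISTS `{table}` (\n"
        "  indicator STRING NOT NULL,\n"
        "  type STRING NOT NULL,\n"
        "  source STRING,\n"
        "  ingested_at TIMESTAMP NOT NULL\n"
        ")\n"
        # Daily partitions let date-filtered queries skip older data.
        "PARTITION BY DATE(ingested_at)\n"
        # Clustering co-locates rows that are filtered or joined on these columns.
        "CLUSTER BY type, indicator\n"
        # Expiring old partitions bounds storage for stale indicators.
        f"OPTIONS (partition_expiration_days = {expiration_days})"
    )


if __name__ == "__main__":
    # Hypothetical fully qualified table name.
    print(build_ioc_table_ddl("my-project.threat_intel.indicators"))
```

With a layout like this, a scheduled query that filters on a recent ingested_at range and joins on indicator should scan fewer bytes than one over an unpartitioned table, which matters because on-demand queries are billed by bytes scanned.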