System Architecture

Deep dive into TaskFlow's distributed, event-driven architecture

Overview

TaskFlow implements a Producer-Consumer pattern with event-driven autoscaling, fault tolerance, and dual-priority task queues. The system is built on Kubernetes with KEDA for dynamic scaling based on queue depth.

System Architecture Flow

graph TD
    %% --- STYLING ---
    classDef k8s fill:#326ce5,stroke:#fff,stroke-width:2px,color:#fff;
    classDef db fill:#ffffff,stroke:#336791,stroke-width:2px,color:#333;
    classDef redis fill:#ffffff,stroke:#d82c20,stroke-width:2px,color:#333;
    classDef component fill:#e1f5fe,stroke:#333,stroke-width:1px;
    classDef plain fill:#fff,stroke:#333,stroke-width:1px;

    subgraph "Cluster: TaskFlow Namespace"
        direction TB

        %% --- AUTOSCALING ---
        subgraph "Autoscaling Logic"
            KEDA("KEDA Operator"):::plain
        end

        %% --- DATA LAYER ---
        subgraph "Data Layer"
            RedisHigh("Redis (High)"):::redis
            RedisLow("Redis (Low)"):::redis
            Postgres("PostgreSQL"):::db
            PgBouncer("PgBouncer"):::db
        end

        %% --- APP LAYER ---
        subgraph "Application Layer"
            API[("TaskFlow API<br/>(2 Replicas)")]:::k8s
            QueueManager[("Queue Manager<br/>(1 Replica)")]:::k8s
            Worker[("Worker<br/>(1-20 Replicas)")]:::k8s
        end

        %% --- CONNECTIONS ---
        %% User -> API
        User((👤 User)) --> |"POST /tasks"| API

        %% API -> Redis Low
        API -.-> |"Cache / Rate Limit"| RedisLow

        %% API -> Redis
        API --> |"PUSH task"| PgBouncer

        %% Queue Manager
        QueueManager -->|Tasks QUEUED| PgBouncer
        QueueManager -->|PUSH Task| RedisHigh

        %% KEDA -> Redis (Monitoring)
        KEDA -.-> |"Checks Queue Depth"| RedisHigh

        %% KEDA -> Worker (Scaling)
        KEDA -.-> |"Scales Deployment"| Worker

        %% Worker -> Redis (Fetching)
        Worker --> |"POP Task"| RedisHigh

        %% Worker/API -> PgBouncer
        Worker --> |"Connection Pool"| PgBouncer
        API --> |"GET /status"| PgBouncer

        %% PgBouncer -> Postgres
        PgBouncer --> |"Real Connection"| Postgres
    end

    %% Link Styling
    linkStyle 0,2,3,4 stroke-width:2px,fill:none,stroke:green
    linkStyle 1,5,6 stroke-width:2px,fill:none,stroke:orange,stroke-dasharray:5 5
    linkStyle 7 stroke-width:2px,fill:none,stroke:#2563eb

Core Components

🌐

API Gateway

FastAPI • Python 3.11
  • Authentication: JWT token validation
  • Rate Limiting: Per-user limits via Redis
  • Task Submission: Validates & pushes to Redis
  • Status Queries: Returns results from DB
k8s/apps/api.yaml • 2 replicas
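
One way the per-user rate limit could work is a fixed-window counter, the pattern usually built in Redis with INCR plus a key TTL. The sketch below is an assumption, not TaskFlow's actual code: it replaces Redis with an in-memory dictionary so it runs without a server, and the class name `FixedWindowLimiter`, the method `allow`, and the limit values are hypothetical.

```python
import time
from typing import Callable


class FixedWindowLimiter:
    """In-memory stand-in for a Redis INCR + EXPIRE fixed-window counter."""

    def __init__(self, limit: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        # user_id -> (window_start, count); Redis would hold this as a key with a TTL.
        self._windows: dict[str, tuple[float, int]] = {}

    def allow(self, user_id: str) -> bool:
        """Count one request and report whether it is within the limit."""
        now = self.clock()
        start, count = self._windows.get(user_id, (now, 0))
        if now - start >= self.window_s:
            # Window expired (the Redis key's TTL would have elapsed): start over.
            start, count = now, 0
        count += 1
        self._windows[user_id] = (start, count)
        return count <= self.limit
```

A fixed window is simple but allows short bursts at window boundaries; a sliding window or token bucket would smooth that out at the cost of more state per user.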

Redis Cluster

Redis 7 • Dual-Priority
  • redis-high: Queue for critical tasks
  • redis-low: Caching, Rate Limits, Bulk tasks
  • Operations: RPUSH, BRPOPLPUSH, LLEN
  • Leader Election: SET NX PX for locking
k8s/infrastructure/redis.yaml

PostgreSQL + PgBouncer

PostgreSQL 15 • PgBouncer
  • PgBouncer: Connection pooling (port 6432)
  • Purpose: Multiplexes many client connections onto fewer real PostgreSQL connections
  • Tables: Users, Tasks, TaskEvents, ApiKeys
  • Storage: 1Gi Persistent Volume
k8s/infrastructure/postgres.yaml
🔧

Worker Pool

Python 3.11 • AsyncIO
  • Scaling: 1-20 replicas via KEDA
  • Atomic Pop: BRPOPLPUSH to processing queue
  • Execution: CPU-intensive business logic
  • Heartbeat: Periodic health signals to Redis
k8s/apps/worker.yaml
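
The atomic pop moves a task from the main queue into a processing list in one step, so a task is never only in a worker's memory. The sketch below models that pattern with in-memory deques rather than a Redis client; the class and method names are hypothetical. Two caveats: BRPOPLPUSH has been deprecated since Redis 6.2 in favor of BLMOVE, and because both RPUSH and BRPOPLPUSH operate on the tail of the list, pairing them yields newest-first order. The sketch pushes at the head instead to get FIFO, which is an assumption about the intended ordering.

```python
from collections import deque
from typing import Optional


class ReliableQueue:
    """In-memory model of the Redis reliable-queue pattern (not a Redis client)."""

    def __init__(self) -> None:
        self.pending: deque[str] = deque()     # models the main task list
        self.processing: deque[str] = deque()  # models the processing list

    def push(self, task_id: str) -> None:
        # Models LPUSH: new tasks enter at the head.
        self.pending.appendleft(task_id)

    def pop_to_processing(self) -> Optional[str]:
        # Models RPOPLPUSH: take the tail of pending and put it at the head of
        # processing. In Redis this is one atomic command; here it is two steps.
        if not self.pending:
            return None
        task_id = self.pending.pop()
        self.processing.appendleft(task_id)
        return task_id

    def ack(self, task_id: str) -> None:
        # Models LREM on the processing list once the task has finished.
        self.processing.remove(task_id)
```

Anything still in `processing` after its worker dies is what a recovery scanner would look for and re-queue.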

KEDA Autoscaler

KEDA 2.10+ • Event-Driven
  • Trigger: Redis queue length (LLEN)
  • Formula: replicas = ceil(queue_length / 10)
  • Polling: Every 5 seconds
  • Cooldown: 30 seconds before scale-down
k8s/autoscaling/worker-scaledobject.yaml
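
The scaling formula combined with the 1-20 replica range can be written as a small function. This is a sketch of the documented arithmetic only: it ignores the polling interval, the cooldown, and any smoothing the autoscaler applies, and the function and parameter names are hypothetical.

```python
import math


def desired_replicas(queue_length: int, tasks_per_replica: int = 10,
                     min_replicas: int = 1, max_replicas: int = 20) -> int:
    """replicas = ceil(queue_length / 10), clamped to the 1-20 replica range."""
    raw = math.ceil(queue_length / tasks_per_replica)
    return max(min_replicas, min(max_replicas, raw))
```

For example, a queue of 95 tasks asks for 10 workers, while a queue of 500 is capped at the maximum of 20.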
🎯

Queue Manager

Python • Multi-threaded
  • Leader Election: Redis-based distributed lock
  • Scheduler: Pushes tasks whose scheduled_at time has passed to Redis (5s interval)
  • PEL Scanner: Re-queues tasks held by dead workers (10s interval)
  • Reconciliation: Re-syncs QUEUED tasks to Redis (30s interval)
k8s/apps/queue-manager.yaml
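
The scheduler step amounts to selecting tasks whose scheduled_at time has passed on each tick. A minimal sketch of that selection, assuming scheduled tasks are available as a mapping from task ID to a timezone-aware timestamp; the function name `due_tasks` and the data shape are hypothetical, and the real service presumably queries the Tasks table instead.

```python
from datetime import datetime, timezone


def due_tasks(scheduled: dict[str, datetime], now: datetime) -> list[str]:
    """Return IDs of tasks whose scheduled_at is at or before now, oldest first."""
    due = [task_id for task_id, at in scheduled.items() if at <= now]
    return sorted(due, key=lambda task_id: scheduled[task_id])


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(due_tasks({"report": now}, now))
```

Each ID returned would then be pushed to Redis by the leader on its 5-second tick.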

Fault Tolerance

Worker Crash

Detection: Missing heartbeat (10s)
Recovery: PEL Scanner re-queues task
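
The detection and recovery steps above can be sketched as two small functions. This assumes heartbeats are stored as worker ID to last-seen timestamp and in-flight tasks as worker ID to task IDs; those data shapes and the function names are hypothetical, not TaskFlow's actual schema.

```python
HEARTBEAT_TIMEOUT_S = 10.0  # from the "Missing heartbeat (10s)" rule


def find_dead_workers(last_heartbeat: dict[str, float], now: float,
                      timeout_s: float = HEARTBEAT_TIMEOUT_S) -> list[str]:
    """Return workers whose last heartbeat is older than the timeout."""
    return sorted(w for w, ts in last_heartbeat.items() if now - ts > timeout_s)


def requeue_orphans(in_flight: dict[str, list[str]], dead: list[str],
                    queue: list[str]) -> list[str]:
    """Move tasks held by dead workers back onto the queue; return what moved."""
    moved: list[str] = []
    for worker in dead:
        tasks = in_flight.pop(worker, [])
        queue.extend(tasks)
        moved.extend(tasks)
    return moved
```

Because a slow worker can look dead, re-queued tasks may run twice, so task handlers are safest when they are idempotent.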

Redis Crash

Detection: Connection error
Recovery: Reconciliation re-pushes QUEUED tasks to Redis
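
Reconciliation is essentially a set difference: anything the database marks as QUEUED that Redis no longer holds gets pushed again. A minimal sketch, assuming the database is the source of truth and that `redis_ids` covers both the main queue and the processing list; the function name and arguments are hypothetical.

```python
def tasks_to_repush(db_queued_ids: list[str], redis_ids: set[str]) -> list[str]:
    """Return QUEUED task IDs that exist in the database but not in Redis."""
    return [task_id for task_id in db_queued_ids if task_id not in redis_ids]
```

After a Redis restart with no persisted data, `redis_ids` is empty and every QUEUED task is returned for re-pushing.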

Database Crash

Detection: Connection pool timeout
Recovery: Workers retry on next task

Queue Manager Crash

Detection: Lease expiry (10s)
Recovery: New leader elected via lock
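
Lease-based leader election can be modeled as a lock that expires unless its holder keeps renewing it. The sketch below imitates the semantics of SET with NX and PX in memory so it runs without Redis; the class `LeaseLock`, its methods, and the node IDs are hypothetical. The 10-second TTL comes from the lease expiry stated above. In real Redis the renewal would need to be an atomic compare-and-extend, typically done with a Lua script.

```python
import time
from typing import Callable, Optional


class LeaseLock:
    """In-memory model of SET key value NX PX ttl (not a Redis client)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        self.holder: Optional[str] = None
        self.expires_at = 0.0

    def try_acquire(self, node_id: str, ttl_s: float = 10.0) -> bool:
        now = self.clock()
        if self.holder is not None and now < self.expires_at:
            # Key exists and has not expired; NX refuses to overwrite it.
            return False
        self.holder, self.expires_at = node_id, now + ttl_s
        return True

    def renew(self, node_id: str, ttl_s: float = 10.0) -> bool:
        # Only the current, unexpired holder may extend the lease.
        now = self.clock()
        if self.holder != node_id or now >= self.expires_at:
            return False
        self.expires_at = now + ttl_s
        return True
```

If the leader crashes it stops renewing, the lease runs out, and the next `try_acquire` from a standby succeeds.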