
# Istanbul Municipality Traffic Prediction System

A comprehensive system for collecting, analyzing, and predicting Istanbul's traffic data using machine learning with advanced logging and model persistence.

## Architecture Overview

```mermaid
flowchart TD
    %% ===== TITLE =====
    TITLE([Istanbul Traffic Prediction System]):::title

    %% ===== EXTERNAL ENTITIES =====
    subgraph EXTERNAL[External Entities]
        direction TB
        EXT[🌐 Istanbul Traffic API]:::external
        USER[👥 End Users]:::external
    end

    %% ===== CORE SYSTEM =====
    subgraph DOCKER[ Dockerized Services ]
        direction LR
        subgraph DOCKER1[🐳 FastAPI Container]
            API[FastAPI Backend<br/>• REST API Endpoints<br/>• Prediction Requests]:::container
            SCHEDULER[Scheduler Service<br/>• Trigger Data Collection<br/>• Initiate ML Training]:::container
            COLLECTOR[Data Collector<br/>• ETL Pipeline<br/>• Data Validation]:::container
            ML[ML Predictor<br/>• Multi-Horizon Forecasting<br/>• Model Inference]:::container
        end
        subgraph DOCKER2[💾 PostgreSQL Container]
            DB[(Traffic Database<br/>• Historical Records<br/>• Real-time Metrics)]:::database
        end
    end

    %% ===== MODEL STORAGE =====
    subgraph MODELS[📦 Model Storage]
        M15[15-min Model]:::model
        M30[30-min Model]:::model
        M60[60-min Model]:::model
        M120[120-min Model]:::model
    end

    %% ===== DATA FLOW =====
    USER -->|HTTP Requests<br/>Prediction Queries| API
    EXT -->|Live Traffic Data<br/>JSON/CSV| COLLECTOR
    COLLECTOR -->|Cleaned Data<br/>Batch Insert| DB
    DB -->|Training Data| SCHEDULER
    SCHEDULER -->|Trigger ETL| COLLECTOR
    SCHEDULER -->|Start Training| ML
    ML -->|Save Trained Models| MODELS
    ML -->|Load Models| MODELS
    API -->|Request Predictions| ML
    API -->|Manual Triggers| SCHEDULER

    %% ===== STYLING =====
    classDef title fill:#2c3e50,stroke:none,color:white,font-size:20px,font-weight:bold
    classDef external fill:#3498db,stroke:#2980b9,color:white,stroke-width:2px
    classDef container fill:#9b59b6,stroke:#8e44ad,color:white,stroke-width:2px
    classDef database fill:#27ae60,stroke:#2ecc71,color:white,stroke-width:2px
    classDef model fill:#e67e22,stroke:#d35400,color:white,stroke-width:2px
    linkStyle default stroke:#95a5a6,stroke-width:2px
```

## Quick Start

1. Clone and setup:

   ```bash
   cd /home/utku/ibb-traffic-prediction
   ```

2. Start services:

   ```bash
   cd docker
   docker-compose up -d
   ```

3. Access the API:

   - API: http://localhost:8000
   - API Docs: http://localhost:8000/docs

## API Endpoints

### Traffic Data

- `GET /` - Root endpoint
- `GET /health` - Health check
- `GET /traffic/latest?limit=10` - Get latest traffic data
- `GET /traffic/stats` - Get traffic statistics

### Predictions

- `GET /prediction` - Get legacy single-point traffic prediction (next minute)
- `GET /predictions` - Get multi-horizon traffic predictions (15, 30, 60, and 120 minutes)
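
A minimal Python sketch of calling these endpoints from a client. The base URL is the local deployment from Quick Start; the helper names (`endpoint_url`, `get_json`) are illustrative, not part of the project:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # assumed local deployment

def endpoint_url(path: str, **params) -> str:
    """Build a full URL for an API endpoint, with optional query parameters."""
    query = f"?{urlencode(params)}" if params else ""
    return f"{BASE_URL}{path}{query}"

def get_json(path: str, **params) -> dict:
    """Fetch an endpoint and decode its JSON body (requires a running server)."""
    with urlopen(endpoint_url(path, **params)) as resp:
        return json.load(resp)

# Example URLs (no network access needed to build them):
latest_url = endpoint_url("/traffic/latest", limit=10)
predictions_url = endpoint_url("/predictions")
```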

### Prediction Response Examples

Legacy endpoint (`/prediction`):

```json
{
  "prediction": 42,
  "timestamp": "2025-06-22T17:47:25.813456",
  "status": "prediction_available"
}
```

Multi-horizon endpoint (`/predictions`):

```json
{
  "predictions": {
    "15": 41,
    "30": 39,
    "60": 36,
    "120": 34
  },
  "timestamp": "2025-06-22T17:47:18.235315",
  "status": "predictions_available",
  "training_status": {
    "15": true,
    "30": true,
    "60": true,
    "120": false
  }
}
```
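
When consuming the multi-horizon response, a client may want to keep only the horizons whose model has actually been trained. A hedged sketch (field names taken from the example above; the helper itself is hypothetical):

```python
def trained_predictions(payload: dict) -> dict:
    """Return {horizon_minutes: prediction} for horizons whose model is trained."""
    predictions = payload.get("predictions", {})
    training_status = payload.get("training_status", {})
    return {
        int(horizon): value
        for horizon, value in predictions.items()
        if training_status.get(horizon, False)
    }

# Using the example response above: the 120-minute model is not yet
# trained, so its prediction is dropped.
example = {
    "predictions": {"15": 41, "30": 39, "60": 36, "120": 34},
    "training_status": {"15": True, "30": True, "60": True, "120": False},
}
```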

## Configuration

Configuration is managed in `config.py`:

- `TRAFFIC_API_URL`: Istanbul Municipality API endpoint
- `DATA_FETCH_INTERVAL`: Data collection interval (60 seconds)
- `ML_TRIGGER_THRESHOLD`: ML training trigger (10 data points)
- `DATABASE_URL`: PostgreSQL connection string
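
A sketch of what such a `config.py` might look like. The environment-variable override pattern and all placeholder values (including the URL and credentials) are illustrative assumptions, not the project's exact code:

```python
import os

# Istanbul Municipality API endpoint (placeholder URL)
TRAFFIC_API_URL = os.getenv("TRAFFIC_API_URL", "https://api.example.invalid/traffic")

# How often the collector fetches new data, in seconds
DATA_FETCH_INTERVAL = int(os.getenv("DATA_FETCH_INTERVAL", "60"))

# Retrain models after every N new data points
ML_TRIGGER_THRESHOLD = int(os.getenv("ML_TRIGGER_THRESHOLD", "10"))

# PostgreSQL connection string (credentials here are placeholders)
DATABASE_URL = os.getenv(
    "DATABASE_URL", "postgresql://user:password@localhost:5432/traffic"
)
```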

## Database Schema

```sql
CREATE TABLE traffic_data (
    id SERIAL PRIMARY KEY,
    inserted_timestamp TIMESTAMP DEFAULT NOW(),
    ti INTEGER NOT NULL,
    ti_an INTEGER NOT NULL,
    ti_av INTEGER NOT NULL
);
```

## Machine Learning

### Multi-Horizon Prediction System

- Algorithm: Separate Random Forest Regressor models for each time horizon
- Horizons: 15-, 30-, 60-, and 120-minute predictions
- Features: Enhanced feature engineering with:
  - Lag features (1, 2, and 3 minutes back) for TI, TI_AN, and TI_AV
  - Time-of-day features (hour, minute)
  - Day-of-week patterns
  - Rolling averages and statistical measures
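
The lag features described above can be sketched without any ML libraries. This illustrative helper (not the project's actual code) emits, for each row with enough history, the values from 1-3 steps back:

```python
def build_lag_features(series, lags=(1, 2, 3)):
    """For each index with enough history, emit {"lag_k": value k steps back}."""
    rows = []
    max_lag = max(lags)
    for i in range(max_lag, len(series)):
        rows.append({f"lag_{k}": series[i - k] for k in lags})
    return rows

# Example: per-minute TI readings
ti = [40, 42, 41, 45, 43]
features = build_lag_features(ti)
# The first usable row is index 3 (it needs three minutes of history).
```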

### Training Requirements

- 15-minute horizon: minimum 20 data points
- 30-minute horizon: minimum 35 data points
- 60-minute horizon: minimum 65 data points
- 120-minute horizon: minimum 125 data points
- Training trigger: every 10 new data points collected
- Model persistence: automatic saving/loading with performance metrics
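
The per-horizon minimums above amount to a simple readiness check. A sketch with the thresholds copied from the list (the function and constant names are hypothetical):

```python
# Minimum data points required per prediction horizon (minutes -> points)
MIN_POINTS = {15: 20, 30: 35, 60: 65, 120: 125}

def trainable_horizons(n_points: int):
    """Return the horizons (in minutes) that have enough data to train."""
    return sorted(h for h, minimum in MIN_POINTS.items() if n_points >= minimum)
```

For example, with 70 collected data points, the 15-, 30-, and 60-minute models can train, while the 120-minute model still waits for more history.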

## Monitoring

- Health check endpoint for service monitoring
- Comprehensive logging for debugging
- Database connection health checks

## Code Quality

Before pushing code, run the linting script:

```bash
# Auto-format and fix all linting issues
./lint.sh
```

### Configuration Files

- `pyproject.toml`: Project configuration with Ruff and pytest settings
- `requirements.txt`: Production dependencies
- `lint.sh`: Linting and code quality script

### Code Style Guidelines

- Line length: 100 characters maximum
- Import organization: automated with Ruff's isort functionality
- Formatting: handled by the Ruff formatter
- Linting: comprehensive checks with Ruff