# Istanbul Municipality Traffic Prediction System

A comprehensive system for collecting, analyzing, and predicting Istanbul's traffic data using machine learning, with advanced logging and model persistence.
## Architecture Overview
```mermaid
flowchart TD
    %% ===== TITLE =====
    TITLE([Istanbul Traffic Prediction System]):::title

    %% ===== EXTERNAL ENTITIES =====
    subgraph EXTERNAL[External Entities]
        direction TB
        EXT[🌐 Istanbul Traffic API]:::external
        USER[👥 End Users]:::external
    end

    %% ===== CORE SYSTEM =====
    subgraph DOCKER[ Dockerized Services ]
        direction LR
        subgraph DOCKER1[🐳 FastAPI Container]
            API[FastAPI Backend<br/>• REST API Endpoints<br/>• Prediction Requests]:::container
            SCHEDULER[Scheduler Service<br/>• Trigger Data Collection<br/>• Initiate ML Training]:::container
            COLLECTOR[Data Collector<br/>• ETL Pipeline<br/>• Data Validation]:::container
            ML[ML Predictor<br/>• Multi-Horizon Forecasting<br/>• Model Inference]:::container
        end
        subgraph DOCKER2[💾 PostgreSQL Container]
            DB[(Traffic Database<br/>• Historical Records<br/>• Real-time Metrics)]:::database
        end
    end

    %% ===== MODEL STORAGE =====
    subgraph MODELS[📦 Model Storage]
        M15[15-min Model]:::model
        M30[30-min Model]:::model
        M60[60-min Model]:::model
        M120[120-min Model]:::model
    end

    %% ===== DATA FLOW =====
    USER -->|HTTP Requests<br/>Prediction Queries| API
    EXT -->|Live Traffic Data<br/>JSON/CSV| COLLECTOR
    COLLECTOR -->|Cleaned Data<br/>Batch Insert| DB
    DB -->|Training Data| SCHEDULER
    SCHEDULER -->|Trigger ETL| COLLECTOR
    SCHEDULER -->|Start Training| ML
    ML -->|Save Trained Models| MODELS
    MODELS -->|Load Models| ML
    API -->|Request Predictions| ML
    API -->|Manual Triggers| SCHEDULER

    %% ===== STYLING =====
    classDef title fill:#2c3e50,stroke:none,color:white,font-size:20px,font-weight:bold
    classDef external fill:#3498db,stroke:#2980b9,color:white,stroke-width:2px
    classDef container fill:#9b59b6,stroke:#8e44ad,color:white,stroke-width:2px
    classDef database fill:#27ae60,stroke:#2ecc71,color:white,stroke-width:2px
    classDef model fill:#e67e22,stroke:#d35400,color:white,stroke-width:2px
    linkStyle default stroke:#95a5a6,stroke-width:2px
```
## Quick Start

1. Clone and set up:

   ```bash
   cd /home/utku/ibb-traffic-prediction
   ```

2. Start the services:

   ```bash
   cd docker
   docker-compose up -d
   ```

3. Access the API:

   - API: http://localhost:8000
   - API Docs: http://localhost:8000/docs
## API Endpoints

### Traffic Data

- `GET /` - Root endpoint
- `GET /health` - Health check
- `GET /traffic/latest?limit=10` - Get latest traffic data
- `GET /traffic/stats` - Get traffic statistics

### Predictions

- `GET /prediction` - Legacy single-point traffic prediction (next minute)
- `GET /predictions` - Multi-horizon traffic predictions (15, 30, 60, and 120 minutes)
### Prediction Response Examples

Legacy endpoint (`/prediction`):

```json
{
  "prediction": 42,
  "timestamp": "2025-06-22T17:47:25.813456",
  "status": "prediction_available"
}
```

Multi-horizon endpoint (`/predictions`):

```json
{
  "predictions": {
    "15": 41,
    "30": 39,
    "60": 36,
    "120": 34
  },
  "timestamp": "2025-06-22T17:47:18.235315",
  "status": "predictions_available",
  "training_status": {
    "15": true,
    "30": true,
    "60": true,
    "120": false
  }
}
```
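
A minimal client sketch showing how these responses might be consumed; it uses the `requests` library and assumes the service runs at the default `localhost:8000`. Filtering on `training_status` follows from the response shape above, where a horizon can appear before its model has finished training:

```python
import requests

BASE_URL = "http://localhost:8000"  # assumed local deployment

# Fetch multi-horizon predictions from the /predictions endpoint
resp = requests.get(f"{BASE_URL}/predictions", timeout=10)
resp.raise_for_status()
payload = resp.json()

# Keep only horizons whose models are marked as trained
trained = {
    horizon: value
    for horizon, value in payload["predictions"].items()
    if payload["training_status"].get(horizon)
}
print(f"Predictions at {payload['timestamp']}: {trained}")
```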
## Configuration
Configuration is managed in `config.py`:
- `TRAFFIC_API_URL`: Istanbul Municipality API endpoint
- `DATA_FETCH_INTERVAL`: Data collection interval in seconds (default: 60)
- `ML_TRIGGER_THRESHOLD`: Number of new data points that triggers model retraining (default: 10)
- `DATABASE_URL`: PostgreSQL connection string
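
A sketch of what `config.py` might look like. Only the four setting names above come from the project; the default values shown, the environment-variable overrides, and the URLs are assumptions for illustration:

```python
import os

# Istanbul Municipality traffic API endpoint (placeholder URL)
TRAFFIC_API_URL = os.environ.get("TRAFFIC_API_URL", "https://api.ibb.gov.tr/traffic")

# How often the collector polls the API, in seconds
DATA_FETCH_INTERVAL = int(os.environ.get("DATA_FETCH_INTERVAL", 60))

# Retrain the models after this many newly collected data points
ML_TRIGGER_THRESHOLD = int(os.environ.get("ML_TRIGGER_THRESHOLD", 10))

# PostgreSQL connection string (placeholder credentials)
DATABASE_URL = os.environ.get(
    "DATABASE_URL", "postgresql://user:password@localhost:5432/traffic"
)
```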
## Database Schema
```sql
CREATE TABLE traffic_data (
    id SERIAL PRIMARY KEY,
    inserted_timestamp TIMESTAMP DEFAULT NOW(),
    ti INTEGER NOT NULL,
    ti_an INTEGER NOT NULL,
    ti_av INTEGER NOT NULL
);
```
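
A hedged sketch of how the collector's batch insert into this table might look, assuming `psycopg2`. The column names come from the schema above; the connection string and sample rows are placeholders:

```python
import psycopg2
from psycopg2.extras import execute_values

# Placeholder connection string; the real system would read DATABASE_URL
conn = psycopg2.connect("postgresql://user:password@localhost:5432/traffic")

# (ti, ti_an, ti_av) tuples produced by the ETL/validation step
rows = [(42, 38, 45), (40, 37, 44)]

with conn, conn.cursor() as cur:
    # execute_values expands the VALUES clause into one efficient batch insert
    execute_values(
        cur,
        "INSERT INTO traffic_data (ti, ti_an, ti_av) VALUES %s",
        rows,
    )
conn.close()
```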
## Machine Learning

### Multi-Horizon Prediction System

- Algorithm: Separate Random Forest Regressor models for each time horizon
- Horizons: 15-, 30-, 60-, and 120-minute predictions
- Features: Enhanced feature engineering (see the sketch after this list) with:
  - Lag features (1, 2, and 3 minutes back) for TI, TI_AN, and TI_AV
  - Time-of-day features (hour, minute)
  - Day-of-week patterns
  - Rolling averages and statistical measures
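
A minimal pandas sketch of the feature construction listed above. The column names follow the database schema; the rolling window size and the helper's exact shape are assumptions, not the project's actual code:

```python
import pandas as pd

def build_features(df: pd.DataFrame) -> pd.DataFrame:
    """df has columns inserted_timestamp, ti, ti_an, ti_av (one row per minute)."""
    df = df.sort_values("inserted_timestamp").copy()

    # Lag features: values 1, 2, and 3 minutes back for each traffic metric
    for col in ("ti", "ti_an", "ti_av"):
        for lag in (1, 2, 3):
            df[f"{col}_lag{lag}"] = df[col].shift(lag)

    # Time-of-day and day-of-week features
    ts = pd.to_datetime(df["inserted_timestamp"])
    df["hour"] = ts.dt.hour
    df["minute"] = ts.dt.minute
    df["day_of_week"] = ts.dt.dayofweek

    # Rolling statistics over the last 5 observations (window size assumed)
    df["ti_roll_mean"] = df["ti"].rolling(5).mean()
    df["ti_roll_std"] = df["ti"].rolling(5).std()

    # Drop the rows at the start that lack full lag/rolling history
    return df.dropna()
```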
### Training Requirements

- 15-minute horizon: Minimum 20 data points
- 30-minute horizon: Minimum 35 data points
- 60-minute horizon: Minimum 65 data points
- 120-minute horizon: Minimum 125 data points
- Training trigger: Every 10 new data points collected
- Model persistence: Automatic saving/loading with performance metrics
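
A hedged sketch of the per-horizon training loop with model persistence. The minimum sample counts come from the list above (one data point per minute, per the 60-second fetch interval); the feature matrix shape, file layout, and choice of metric are assumptions:

```python
from pathlib import Path

import joblib
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# Minimum data points per horizon, as documented above
MIN_SAMPLES = {15: 20, 30: 35, 60: 65, 120: 125}

def train_horizon_models(features, target):
    """features: 2D array of engineered features; target: per-minute TI series."""
    Path("models").mkdir(exist_ok=True)  # assumed model storage directory
    for horizon, min_samples in MIN_SAMPLES.items():
        # Align each feature row with the target value `horizon` minutes ahead
        X, y = features[:-horizon], target[horizon:]
        if len(X) < min_samples:
            continue  # not enough history for this horizon yet

        model = RandomForestRegressor(n_estimators=100, random_state=42)
        model.fit(X, y)

        # Persist the model together with a simple performance metric
        mae = mean_absolute_error(y, model.predict(X))
        joblib.dump({"model": model, "mae": mae}, f"models/model_{horizon}min.joblib")
```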
## Monitoring

- Health check endpoint for service monitoring
- Comprehensive logging for debugging
- Database connection health checks
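
A sketch of what the `/health` endpoint could look like given the bullets above; the database ping, the response shape, and the connection string are assumptions, not the project's actual implementation:

```python
from fastapi import FastAPI
import psycopg2

app = FastAPI()

@app.get("/health")
def health():
    # Report overall service status plus a database connectivity check
    try:
        conn = psycopg2.connect("postgresql://user:password@localhost:5432/traffic")
        conn.close()
        db_status = "ok"
    except psycopg2.OperationalError:
        db_status = "unreachable"
    return {
        "status": "ok" if db_status == "ok" else "degraded",
        "database": db_status,
    }
```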
## Code Quality

Before pushing code, run the linting script:

```bash
# Auto-format and fix all linting issues
./lint.sh
```
### Configuration Files

- `pyproject.toml`: Project configuration with Ruff and pytest settings
- `requirements.txt`: Production dependencies
- `lint.sh`: Linting and code quality script
### Code Style Guidelines

- Line length: 100 characters maximum
- Import organization: Automated with Ruff's isort functionality
- Formatting: Handled by the Ruff formatter
- Linting: Comprehensive checks with Ruff