# Istanbul Municipality Traffic Prediction System

A comprehensive system for collecting, analyzing, and predicting Istanbul's traffic data using machine learning, with advanced logging and model persistence.
## Architecture Overview
```mermaid
flowchart TD
    %% ===== TITLE =====
    TITLE([Istanbul Traffic Prediction System]):::title

    %% ===== EXTERNAL ENTITIES =====
    subgraph EXTERNAL[External Entities]
        direction TB
        EXT[🌐 Istanbul Traffic API]:::external
        USER[👥 End Users]:::external
    end

    %% ===== CORE SYSTEM =====
    subgraph DOCKER[ Dockerized Services ]
        direction LR
        subgraph DOCKER1[🐳 FastAPI Container]
            API[FastAPI Backend<br/>• REST API Endpoints<br/>• Prediction Requests]:::container
            SCHEDULER[Scheduler Service<br/>• Trigger Data Collection<br/>• Initiate ML Training]:::container
            COLLECTOR[Data Collector<br/>• ETL Pipeline<br/>• Data Validation]:::container
            ML[ML Predictor<br/>• Multi-Horizon Forecasting<br/>• Model Inference]:::container
        end
        subgraph DOCKER2[💾 PostgreSQL Container]
            DB[(Traffic Database<br/>• Historical Records<br/>• Real-time Metrics)]:::database
        end
    end

    %% ===== MODEL STORAGE =====
    subgraph MODELS[📦 Model Storage]
        M15[15-min Model]:::model
        M30[30-min Model]:::model
        M60[60-min Model]:::model
        M120[120-min Model]:::model
    end

    %% ===== DATA FLOW =====
    USER -->|HTTP Requests<br/>Prediction Queries| API
    EXT -->|Live Traffic Data<br/>JSON/CSV| COLLECTOR
    COLLECTOR -->|Cleaned Data<br/>Batch Insert| DB
    DB -->|Training Data| SCHEDULER
    SCHEDULER -->|Trigger ETL| COLLECTOR
    SCHEDULER -->|Start Training| ML
    ML -->|Save Trained Models| MODELS
    MODELS -->|Load Models| ML
    API -->|Request Predictions| ML
    API -->|Manual Triggers| SCHEDULER

    %% ===== STYLING =====
    classDef title fill:#2c3e50,stroke:none,color:white,font-size:20px,font-weight:bold
    classDef external fill:#3498db,stroke:#2980b9,color:white,stroke-width:2px
    classDef container fill:#9b59b6,stroke:#8e44ad,color:white,stroke-width:2px
    classDef database fill:#27ae60,stroke:#2ecc71,color:white,stroke-width:2px
    classDef model fill:#e67e22,stroke:#d35400,color:white,stroke-width:2px
    linkStyle default stroke:#95a5a6,stroke-width:2px
```
## Quick Start

1. Clone and set up:

   ```bash
   cd /home/utku/ibb-traffic-prediction
   ```

2. Start the services:

   ```bash
   cd docker
   docker-compose up -d
   ```

3. Access the API:

   - API: http://localhost:8000
   - API Docs: http://localhost:8000/docs
## API Endpoints

### Traffic Data

- `GET /` - Root endpoint
- `GET /health` - Health check
- `GET /traffic/latest?limit=10` - Get latest traffic data
- `GET /traffic/stats` - Get traffic statistics

### Predictions

- `GET /prediction` - Legacy single-point traffic prediction (next minute)
- `GET /predictions` - Multi-horizon traffic predictions (15, 30, 60, and 120 minutes)
### Prediction Response Examples

Legacy endpoint (`/prediction`):

```json
{
  "prediction": 42,
  "timestamp": "2025-06-22T17:47:25.813456",
  "status": "prediction_available"
}
```

Multi-horizon endpoint (`/predictions`):

```json
{
  "predictions": {
    "15": 41,
    "30": 39,
    "60": 36,
    "120": 34
  },
  "timestamp": "2025-06-22T17:47:18.235315",
  "status": "predictions_available",
  "training_status": {
    "15": true,
    "30": true,
    "60": true,
    "120": false
  }
}
```
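
A minimal client sketch showing how these responses might be consumed; it uses the `requests` library and assumes the service runs at the default `localhost:8000`. Filtering on `training_status` follows from the response shape above, where a horizon can appear before its model has finished training:

```python
import requests

BASE_URL = "http://localhost:8000"  # assumed local deployment

# Fetch multi-horizon predictions from the /predictions endpoint
resp = requests.get(f"{BASE_URL}/predictions", timeout=10)
resp.raise_for_status()
payload = resp.json()

# Keep only horizons whose models are marked as trained
trained = {
    horizon: value
    for horizon, value in payload["predictions"].items()
    if payload["training_status"].get(horizon)
}
print(f"Predictions at {payload['timestamp']}: {trained}")
```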
## Configuration
Configuration is managed in `config.py`:
- `TRAFFIC_API_URL`: Istanbul Municipality API endpoint
- `DATA_FETCH_INTERVAL`: Data collection interval in seconds (default: 60)
- `ML_TRIGGER_THRESHOLD`: Number of new data points that triggers model retraining (default: 10)
- `DATABASE_URL`: PostgreSQL connection string
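
A sketch of what `config.py` might look like. Only the four setting names above come from the project; the default values shown, the environment-variable overrides, and the URLs are assumptions for illustration:

```python
import os

# Istanbul Municipality traffic API endpoint (placeholder URL)
TRAFFIC_API_URL = os.environ.get("TRAFFIC_API_URL", "https://api.ibb.gov.tr/traffic")

# How often the collector polls the API, in seconds
DATA_FETCH_INTERVAL = int(os.environ.get("DATA_FETCH_INTERVAL", 60))

# Retrain the models after this many newly collected data points
ML_TRIGGER_THRESHOLD = int(os.environ.get("ML_TRIGGER_THRESHOLD", 10))

# PostgreSQL connection string (placeholder credentials)
DATABASE_URL = os.environ.get(
    "DATABASE_URL", "postgresql://user:password@localhost:5432/traffic"
)
```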
## Database Schema
```sql
CREATE TABLE traffic_data (
    id SERIAL PRIMARY KEY,
    inserted_timestamp TIMESTAMP DEFAULT NOW(),
    ti INTEGER NOT NULL,
    ti_an INTEGER NOT NULL,
    ti_av INTEGER NOT NULL
);
```
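
A hedged sketch of how the collector's batch insert into this table might look, assuming `psycopg2`. The column names come from the schema above; the connection string and sample rows are placeholders:

```python
import psycopg2
from psycopg2.extras import execute_values

# Placeholder connection string; the real system would read DATABASE_URL
conn = psycopg2.connect("postgresql://user:password@localhost:5432/traffic")

# (ti, ti_an, ti_av) tuples produced by the ETL/validation step
rows = [(42, 38, 45), (40, 37, 44)]

with conn, conn.cursor() as cur:
    # execute_values expands the VALUES clause into one efficient batch insert
    execute_values(
        cur,
        "INSERT INTO traffic_data (ti, ti_an, ti_av) VALUES %s",
        rows,
    )
conn.close()
```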
## Machine Learning

### Multi-Horizon Prediction System

- Algorithm: Separate Random Forest Regressor models for each time horizon
- Horizons: 15-, 30-, 60-, and 120-minute predictions
- Features: Enhanced feature engineering (see the sketch after this list) with:
  - Lag features (1, 2, and 3 minutes back) for TI, TI_AN, and TI_AV
  - Time-of-day features (hour, minute)
  - Day-of-week patterns
  - Rolling averages and statistical measures
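
A minimal pandas sketch of the feature construction listed above. The column names follow the database schema; the rolling window size and the helper's exact shape are assumptions, not the project's actual code:

```python
import pandas as pd

def build_features(df: pd.DataFrame) -> pd.DataFrame:
    """df has columns inserted_timestamp, ti, ti_an, ti_av (one row per minute)."""
    df = df.sort_values("inserted_timestamp").copy()

    # Lag features: values 1, 2, and 3 minutes back for each traffic metric
    for col in ("ti", "ti_an", "ti_av"):
        for lag in (1, 2, 3):
            df[f"{col}_lag{lag}"] = df[col].shift(lag)

    # Time-of-day and day-of-week features
    ts = pd.to_datetime(df["inserted_timestamp"])
    df["hour"] = ts.dt.hour
    df["minute"] = ts.dt.minute
    df["day_of_week"] = ts.dt.dayofweek

    # Rolling statistics over the last 5 observations (window size assumed)
    df["ti_roll_mean"] = df["ti"].rolling(5).mean()
    df["ti_roll_std"] = df["ti"].rolling(5).std()

    # Drop the rows at the start that lack full lag/rolling history
    return df.dropna()
```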
### Training Requirements

- 15-minute horizon: Minimum 20 data points
- 30-minute horizon: Minimum 35 data points
- 60-minute horizon: Minimum 65 data points
- 120-minute horizon: Minimum 125 data points
- Training trigger: Every 10 new data points collected
- Model persistence: Automatic saving/loading with performance metrics
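
A hedged sketch of the per-horizon training loop with model persistence. The minimum sample counts come from the list above (one data point per minute, per the 60-second fetch interval); the feature matrix shape, file layout, and choice of metric are assumptions:

```python
from pathlib import Path

import joblib
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# Minimum data points per horizon, as documented above
MIN_SAMPLES = {15: 20, 30: 35, 60: 65, 120: 125}

def train_horizon_models(features, target):
    """features: 2D array of engineered features; target: per-minute TI series."""
    Path("models").mkdir(exist_ok=True)  # assumed model storage directory
    for horizon, min_samples in MIN_SAMPLES.items():
        # Align each feature row with the target value `horizon` minutes ahead
        X, y = features[:-horizon], target[horizon:]
        if len(X) < min_samples:
            continue  # not enough history for this horizon yet

        model = RandomForestRegressor(n_estimators=100, random_state=42)
        model.fit(X, y)

        # Persist the model together with a simple performance metric
        mae = mean_absolute_error(y, model.predict(X))
        joblib.dump({"model": model, "mae": mae}, f"models/model_{horizon}min.joblib")
```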
## Monitoring

- Health check endpoint for service monitoring
- Comprehensive logging for debugging
- Database connection health checks
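
A sketch of what the `/health` endpoint could look like given the bullets above; the database ping, the response shape, and the connection string are assumptions, not the project's actual implementation:

```python
from fastapi import FastAPI
import psycopg2

app = FastAPI()

@app.get("/health")
def health():
    # Report overall service status plus a database connectivity check
    try:
        conn = psycopg2.connect("postgresql://user:password@localhost:5432/traffic")
        conn.close()
        db_status = "ok"
    except psycopg2.OperationalError:
        db_status = "unreachable"
    return {
        "status": "ok" if db_status == "ok" else "degraded",
        "database": db_status,
    }
```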
## Code Quality

Before pushing code, run the linting script:

```bash
# Auto-format and fix all linting issues
./lint.sh
```
### Configuration Files

- `pyproject.toml`: Project configuration with Ruff and pytest settings
- `requirements.txt`: Production dependencies
- `lint.sh`: Linting and code quality script
### Code Style Guidelines

- Line length: 100 characters maximum
- Import organization: Automated with Ruff's isort functionality
- Formatting: Handled by the Ruff formatter
- Linting: Comprehensive checks with Ruff