Project Overview

This project applies advanced data analysis techniques to a dataset of exchanges, each characterized by over 30 data points, to identify key patterns and provide strategic insights. Due to the sensitive nature of the data, the following is a generalized description of the approach and methodologies employed.

Key Functionalities and Process Flow:

Data Preprocessing:

Initial cleaning of the dataset to correct numerical formats, ensuring data integrity for subsequent analysis.

Weighted Ratings Calculation:

Computation of nuanced weighted ratings across various platforms to reflect both user sentiment and engagement levels.

Advanced Clustering with K-Means:

Employment of K-Means clustering, with manual cluster definition to fulfill specific analysis goals.Clusters are optimized and sorted based on three-month average visitor metrics.Detailed insights are provided for each cluster, highlighting distinctive features and strategic implications.

Feature Importance Analysis:

Utilization of Random Forest Classifier to evaluate the significance of each feature in the clustering process.

Visualization:

Clusters and exchanges are illustrated through a treemap visualization.Feature importances are plotted to convey their impact visually.




Code Structure Overview: Four Primary Classes Defined


1 - DataCleaner

Ensures data integrity by transforming and cleaning DataFrame inputs.


2 - DataProcessor

Calculates weighted metrics, optimizing DataFrame structure.


3 - ClusteringHandler

Implements and refines data segmentation using KMeans clustering.


4 - Reporting

Visualizes data insights and evaluates cluster performance metrics.


Main Function

Outputs

Mean values for each cluster:


Visualizations


Feature Importances
 Histogram Output


Treemap of Exchanges (Exchange Names Censored)
 Treemap Output