Cold Data Archiving for Data Workflows
Frostbyte is a lightweight tool that compresses, versions, and manages your large data files (CSV, Parquet, Excel). Perfect for data scientists and analysts who need to save disk space while keeping data organized.
# From source
git clone https://github.com/utkuyucel/Frostbyte.git
cd Frostbyte
# Setup environment
python -m venv frostbyte_venv
source frostbyte_venv/bin/activate
# Install
pip install -e .
# Archive your file
fb archive dataset.csv
# Later, after changes, archive again
fb archive dataset.csv # Creates version 2
# List all versions of a specific file
fb ls dataset.csv
# Restore a specific version
fb restore dataset.csv -v 1
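To make the versioning model behind the commands above concrete, here is a minimal Python sketch of "archive again to get version 2, then restore version 1". It is not Frostbyte's implementation: the function names, the `name.vN.gz` naming scheme, and the use of gzip are all assumptions chosen for illustration.

```python
import gzip
import tempfile
from pathlib import Path
from typing import List, Optional


def list_versions(name: str, store: Path) -> List[int]:
    """Return the version numbers stored for a file name, oldest first."""
    prefix, suffix = f"{name}.v", ".gz"  # hypothetical naming scheme
    versions: List[int] = []
    if not store.is_dir():
        return versions
    for entry in store.iterdir():
        middle = entry.name[len(prefix):-len(suffix)]
        if (entry.name.startswith(prefix)
                and entry.name.endswith(suffix)
                and middle.isdigit()):
            versions.append(int(middle))
    return sorted(versions)


def archive(src: Path, store: Path) -> int:
    """Compress src into store as the next version and return its number."""
    store.mkdir(parents=True, exist_ok=True)
    version = max(list_versions(src.name, store), default=0) + 1
    target = store / f"{src.name}.v{version}.gz"
    target.write_bytes(gzip.compress(src.read_bytes()))
    return version


def restore(name: str, store: Path, dest: Path,
            version: Optional[int] = None) -> Path:
    """Decompress one stored version (the latest by default) into dest."""
    versions = list_versions(name, store)
    if not versions:
        raise FileNotFoundError(f"no archived versions of {name}")
    chosen = versions[-1] if version is None else version
    if chosen not in versions:
        raise FileNotFoundError(f"{name} has no version {chosen}")
    blob = (store / f"{name}.v{chosen}.gz").read_bytes()
    dest.write_bytes(gzip.decompress(blob))
    return dest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        src = root / "dataset.csv"
        store = root / "archive_store"
        src.write_bytes(b"id,value\n1,10\n")
        archive(src, store)                      # first version
        src.write_bytes(b"id,value\n1,10\n2,20\n")
        archive(src, store)                      # second version
        print(list_versions(src.name, store))
        restored = restore(src.name, store, root / "restored.csv", version=1)
        print(restored.read_bytes())
```

Each archive call stores a full compressed copy rather than a diff, which keeps restore trivial at the cost of some extra space. Whether Frostbyte makes the same trade-off is not stated here.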
| Command | Description | Example |
|---|---|---|
| `fb init` | Set up Frostbyte in your project | `fb init` |
| `fb archive <file>` | Compress and store a file | `fb archive data.csv` |
| `fb ls [file_name]` | List archived files. Optionally specify a file name to see all its versions. | `fb ls` or `fb ls my_data.csv` |
| `fb stats [file]` | Show compression statistics | `fb stats` or `fb stats data.csv` |
| `fb restore <file>` | Restore a file from archive | `fb restore data.csv` or `fb restore data.csv -v 2` |
| `fb purge <file>` | Remove archive versions | `fb purge old_data.csv` |
Frostbyte is a cold storage tool for data scientists who need a lightweight way to version and manage their datasets. It offers compression, version tracking, and restoration through a simple CLI, with no cloud dependencies.
Frostbyte tracks changes to your datasets over time, allowing you to maintain a history of modifications without relying on cloud services.
Automatically compresses datasets to minimize storage requirements while preserving data integrity.
Designed with data scientists in mind, the intuitive command-line interface makes versioning and retrieving data straightforward.
Operates entirely on local storage, eliminating the need for cloud services and associated costs.
Easily restore datasets to previous versions with a single command, allowing you to revert changes when needed.
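The "compress while preserving data integrity" idea can be sketched as a checksum round trip: hash the original bytes, compress them, and verify the hash again after decompression. This document does not say which codec or integrity check Frostbyte uses, so gzip and SHA-256 from the Python standard library stand in here, and every function name is hypothetical.

```python
import gzip
import hashlib
from typing import Tuple


def compress_with_checksum(data: bytes) -> Tuple[bytes, str]:
    """Return the gzip-compressed bytes plus a SHA-256 digest of the original."""
    digest = hashlib.sha256(data).hexdigest()
    return gzip.compress(data), digest


def decompress_verified(blob: bytes, expected_digest: str) -> bytes:
    """Decompress blob and raise ValueError if the content hash differs."""
    data = gzip.decompress(blob)
    if hashlib.sha256(data).hexdigest() != expected_digest:
        raise ValueError("checksum mismatch: archived data is corrupt")
    return data


def savings_percent(original_size: int, compressed_size: int) -> float:
    """Percentage of space saved by compression (0.0 for empty input)."""
    if original_size == 0:
        return 0.0
    return 100.0 * (1 - compressed_size / original_size)


if __name__ == "__main__":
    # Repetitive CSV-like content, which general-purpose codecs compress well.
    raw = b"id,value\n" + b"1,0.5\n" * 1000
    blob, digest = compress_with_checksum(raw)
    assert decompress_verified(blob, digest) == raw
    print(f"saved {savings_percent(len(raw), len(blob)):.1f}% of the original size")
```

Storing the digest alongside the compressed file means a restore can detect silent corruption instead of handing back damaged data.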
Install Frostbyte using pip:
pip install frostbyte
# Initialize Frostbyte in your project
fb init
# Archive a dataset
fb archive my_dataset.csv
# After changes, archive again to create a new version
fb archive my_dataset.csv
# List all versions of the dataset
fb ls my_dataset.csv
# Restore a previous version
fb restore my_dataset.csv -v 1
Data scientists often work with large datasets that undergo numerous transformations during analysis. Frostbyte addresses the common challenge of tracking these changes without the complexity of full-featured version control systems like Git (which aren't optimized for large data files) or the expense of cloud-based solutions. With Frostbyte, you can maintain a complete history of your datasets locally, making your data science workflow more reproducible and organized.