Frostbyte - A lightweight data versioning tool

Frostbyte

Cold Data Archiving for Data Workflows

Frostbyte is a lightweight tool that compresses, versions, and manages large data files (CSV, Parquet, Excel). It is aimed at data scientists and analysts who need to save disk space while keeping their data organized.

Version Control

flowchart LR
    V1["Version 1"] -->|"fb archive"| V2["Version 2"]
    V2 -->|"fb archive"| V3["Version 3"]
    V3 -.->|"fb restore -v 1"| V1

Features

  • Space-Saving Compression: Reduce storage needs for large datasets
  • Simple Versioning: Track changes in your data files
  • Easy Commands: Intuitive CLI with short aliases
  • Local First: No cloud dependencies, works completely offline

Quick Installation

# From source
git clone https://github.com/utkuyucel/Frostbyte.git
cd Frostbyte

# Setup environment
python -m venv frostbyte_venv
source frostbyte_venv/bin/activate

# Install
pip install -e .

Work with Multiple Versions

# Archive your file
fb archive dataset.csv

# Later, after changes, archive again
fb archive dataset.csv  # Creates version 2

# List all versions of a specific file
fb ls dataset.csv

# Restore a specific version
fb restore dataset.csv -v 1
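
The versioning model behind these commands can be illustrated with a short Python sketch. This is not Frostbyte's implementation: the `VersionedArchive` class, the `<name>.v<N>.gz` naming scheme, and the use of gzip are all assumptions made for illustration. It only shows the idea that each archive call adds a new numbered copy and that a restore brings back one specific number.

```python
import gzip
import tempfile
from pathlib import Path


class VersionedArchive:
    """Toy store (hypothetical): each archive() call adds a numbered gzip copy."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def _versions(self, name: str) -> list[int]:
        # Stored copies are named "<file name>.v<N>.gz" (an assumed layout).
        prefix = f"{name}.v"
        found = []
        for path in self.root.glob(f"{name}.v*.gz"):
            number = path.name[len(prefix):-len(".gz")]
            if number.isdigit():
                found.append(int(number))
        return sorted(found)

    def archive(self, source: Path) -> int:
        """Compress `source` as the next version and return its number."""
        versions = self._versions(source.name)
        next_version = versions[-1] + 1 if versions else 1
        target = self.root / f"{source.name}.v{next_version}.gz"
        target.write_bytes(gzip.compress(source.read_bytes()))
        return next_version

    def list_versions(self, source: Path) -> list[int]:
        """Return all stored version numbers for `source`, oldest first."""
        return self._versions(source.name)

    def restore(self, source: Path, version: int) -> None:
        """Overwrite `source` with the contents of the given version."""
        stored = self.root / f"{source.name}.v{version}.gz"
        source.write_bytes(gzip.decompress(stored.read_bytes()))


def demo() -> tuple[list[int], str]:
    """Archive a file twice, restore version 1, and report the result."""
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "dataset.csv"
        store = VersionedArchive(Path(tmp) / "archive")

        data.write_text("id,value\n1,10\n")
        store.archive(data)  # becomes version 1

        data.write_text("id,value\n1,10\n2,20\n")
        store.archive(data)  # becomes version 2

        store.restore(data, version=1)
        return store.list_versions(data), data.read_text()
```

Calling `demo()` returns the list of stored versions together with the restored file contents, which match what was archived first.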

Command Reference

| Command | Description | Example |
|---|---|---|
| `fb init` | Set up Frostbyte in your project | `fb init` |
| `fb archive <file>` | Compress and store a file | `fb archive data.csv` |
| `fb ls [file_name]` | List archived files. Optionally specify a file name to see all its versions. | `fb ls` or `fb ls my_data.csv` |
| `fb stats [file]` | Show compression statistics | `fb stats` or `fb stats data.csv` |
| `fb restore <file>` | Restore a file from the archive | `fb restore data.csv` or `fb restore data.csv -v 2` |
| `fb purge <file>` | Remove archive versions | `fb purge old_data.csv` |
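
If you want to drive these commands from a Python script, one option is to build the argument list first and hand it to `subprocess.run`. The helper below is a hypothetical sketch, not part of Frostbyte. It only encodes the command forms listed in the table above, and it assumes `-v` applies to `fb restore` alone because that is the only place the table shows it.

```python
# Command groups taken from the reference table; the grouping itself is
# an assumption of this sketch.
REQUIRES_FILE = {"archive", "restore", "purge"}
OPTIONAL_FILE = {"ls", "stats"}
NO_FILE = {"init"}


def fb_command(action, file=None, version=None):
    """Build the argument list for one documented `fb` command (hypothetical helper)."""
    if action not in REQUIRES_FILE | OPTIONAL_FILE | NO_FILE:
        raise ValueError(f"unknown command: {action}")
    if action in REQUIRES_FILE and file is None:
        raise ValueError(f"'fb {action}' needs a file argument")
    if action in NO_FILE and file is not None:
        raise ValueError(f"'fb {action}' takes no file argument")
    if version is not None and action != "restore":
        raise ValueError("-v is only documented for 'fb restore'")

    args = ["fb", action]
    if file is not None:
        args.append(str(file))
    if version is not None:
        args += ["-v", str(version)]
    return args
```

With Frostbyte installed, `subprocess.run(fb_command("restore", "data.csv", version=2), check=True)` would run the same command as the last restore example in the table.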

License

MIT License



Frostbyte is a cold storage tool for data scientists who need a lightweight way to version and manage their datasets. It compresses, tracks, and restores data files through a simple CLI, with no cloud dependencies.




Key Features:


1. Local Version Control for Data

Frostbyte tracks changes to your datasets over time, allowing you to maintain a history of modifications without relying on cloud services.



2. Efficient Compression

Automatically compresses datasets to minimize storage requirements while preserving data integrity.
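
As an illustration of compressing without losing data, the sketch below compresses a payload, records a SHA-256 checksum of the original bytes, and verifies the checksum after decompression. The codec and integrity mechanism Frostbyte actually uses are not stated here, so gzip, SHA-256, and both function names are stand-in assumptions.

```python
import gzip
import hashlib


def compress_with_checksum(raw: bytes) -> tuple[bytes, str]:
    """Return the gzip-compressed payload and the SHA-256 of the original bytes."""
    return gzip.compress(raw), hashlib.sha256(raw).hexdigest()


def restore_verified(compressed: bytes, expected_sha256: str) -> bytes:
    """Decompress, raising ValueError if the result does not match the checksum."""
    raw = gzip.decompress(compressed)
    if hashlib.sha256(raw).hexdigest() != expected_sha256:
        raise ValueError("restored data does not match the recorded checksum")
    return raw


# A repetitive CSV-like payload, the kind of data that tends to compress well.
original = b"id,value\n" + b"".join(b"%d,%d\n" % (i, i % 7) for i in range(10_000))

compressed, digest = compress_with_checksum(original)
restored = restore_verified(compressed, digest)

ratio = len(compressed) / len(original)
print(f"{len(original)} bytes -> {len(compressed)} bytes (ratio {ratio:.2f})")
```

The ratio depends heavily on the data: repetitive text formats such as CSV usually shrink far more than formats that are already compressed internally, such as Parquet.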



3. Simple CLI

Designed with data scientists in mind, the intuitive command-line interface makes versioning and retrieving data straightforward.



4. Zero-Cloud Dependencies

Operates entirely on local storage, eliminating the need for cloud services and associated costs.



5. Quick Dataset Recovery

Easily restore datasets to previous versions with a single command, allowing you to revert changes when needed.



Installation

Install Frostbyte using pip:

pip install frostbyte


Basic Usage

# Set up Frostbyte in your project
fb init

# Archive a dataset (creates version 1)
fb archive data/my_dataset.csv

# After changing the file, archive it again (creates version 2)
fb archive data/my_dataset.csv

# List all versions of the dataset
fb ls data/my_dataset.csv

# Restore a previous version
fb restore data/my_dataset.csv -v 1


Why Use Frostbyte?

Data scientists often work with large datasets that undergo numerous transformations during analysis. Frostbyte addresses the common challenge of tracking these changes without the complexity of full-featured version control systems like Git (which aren't optimized for large data files) or the expense of cloud-based solutions. With Frostbyte, you can maintain a complete history of your datasets locally, making your data science workflow more reproducible and organized.