# Forked Lingua

## Setup

```bash
bash setup/create_env.sh
```

Once that is done, you can activate the environment:

```bash
source ~/envs/lingua_/bin/activate
```

## Data

```bash
python setup/download_prepare_hf_data.py dclm_baseline_1.0_10prct --data_dir /mnt/bn/tiktok-mm-5/aiic/users/linzheng/data/dclm_10prct --seed 42 --nchunks
```

## Training

```bash
torchrun --nproc-per-node 8 -m apps.evabyte.train config=apps/evabyte/configs/evabyte_7b.yaml
```
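The training command launches 8 processes, one per GPU. On a machine with fewer GPUs, `--nproc-per-node` has to match the number of devices actually visible. Below is a minimal sketch of a hypothetical helper, `launch_train`, that is not part of this repo. It assumes `CUDA_VISIBLE_DEVICES` is a comma-separated list of GPU ids and falls back to 8 when that variable is unset. Setting `DRY_RUN=1` prints the command without running it.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repo): run one torchrun process per visible GPU.
launch_train() {
  # Config path defaults to the one used in the training command above.
  local config="${1:-apps/evabyte/configs/evabyte_7b.yaml}"
  local nproc=8  # assumed fallback when CUDA_VISIBLE_DEVICES is unset or empty

  if [[ -n "${CUDA_VISIBLE_DEVICES:-}" ]]; then
    # Count comma-separated GPU ids, e.g. "0,1,2,3" -> 4
    local -a gpus
    IFS=',' read -r -a gpus <<< "$CUDA_VISIBLE_DEVICES"
    nproc="${#gpus[@]}"
  fi

  local -a cmd=(torchrun --nproc-per-node "$nproc" -m apps.evabyte.train "config=$config")
  echo "${cmd[*]}"

  # Only execute when not in dry-run mode
  if [[ "${DRY_RUN:-0}" != "1" ]]; then
    "${cmd[@]}"
  fi
}
```

For example, `CUDA_VISIBLE_DEVICES=0,1,2,3 launch_train` would start 4 processes with the 7B config. Note that the per-GPU batch size in the config may also need adjusting when the GPU count changes.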