How did you get it to train with Axolotl on a Pro 6000?

#1 opened by OwenArli

I can't seem to avoid OOM errors even on 2x Pro 6000 when trying to train with Axolotl. Curious how this model was trained, if you're willing to share. Thanks!

My dataset is about 2 million tokens in total. With these settings, training peaks at about 78GB of VRAM.

```yaml
base_model: zai-org/GLM-4.5-Air
load_in_4bit: true
load_in_8bit: false
bnb_4bit_use_double_quant: false
qlora_sharded_model_loading: false

datasets:
  - path: /dataset.json
    type: alpaca

val_set_size: 0.1
output_dir: ./outputs/lora-out
adapter: qlora
lora_model_dir:
sequence_len: 16384
sample_packing: false
eval_sample_packing: false
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_modules:
  - gate_proj
  - down_proj
  - up_proj
  - q_proj
  - v_proj
  - k_proj
  - o_proj

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_clipping: 1.0
gradient_accumulation_steps: 8
micro_batch_size: 1
num_epochs: 2
optimizer: adamw_8bit
lr_scheduler: constant
learning_rate: 0.000008
bf16: auto
tf32: false
gradient_checkpointing: true
activation_offloading: false
resume_from_checkpoint:
logging_steps: 1
flash_attention: false
sdp_attention: true
loss_watchdog_threshold: 5.0
loss_watchdog_patience: 3
warmup_ratio: 0.1
evals_per_epoch: 4
saves_per_epoch: 1
weight_decay: 0.0
dtype: bfloat16
low_cpu_mem_usage: true

special_tokens:
  pad_token: "<|end_of_text|>"

save_first_step: true
```
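For anyone copying this config: with Axolotl it's typically launched with `accelerate launch -m axolotl.cli.train config.yml` (or `axolotl train config.yml` on newer versions); the exact invocation may vary depending on your Axolotl version.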

Oh nice! Thanks for sharing. I eventually figured out a config that works for me too: I can apparently train at 32K context with two cards and DeepSpeed ZeRO-2.
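For anyone curious what that kind of setup looks like, here is a minimal sketch of the changes on top of the config above, assuming the stock `deepspeed_configs/zero2.json` that ships with Axolotl; the exact values (32K sequence length, batch sizes) are illustrative, not the actual settings used in this thread.

```yaml
# Sketch only: deltas on top of the QLoRA config above for a 2-GPU DeepSpeed ZeRO-2 run.
# All values here are illustrative assumptions, not the settings actually used in this thread.
sequence_len: 32768                        # 32K context as mentioned above
deepspeed: deepspeed_configs/zero2.json    # ZeRO-2 config bundled with Axolotl
micro_batch_size: 1
gradient_accumulation_steps: 8
gradient_checkpointing: true
```

ZeRO-2 shards optimizer states and gradients across the two GPUs, which is usually what frees up enough memory to push the context length higher.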
