See axolotl config

axolotl version: `0.13.0.dev0`

```yaml
base_model: Qwen/Qwen3-VL-8B-Instruct
processor_type: AutoProcessor

# these 3 lines are needed for now to handle vision chat templates w images
skip_prepare_dataset: true
# remove_unused_columns: false
# sample_packing: false
chat_template: qwen2_vl
datasets:
  - path: yurui983/reference-parsing-finetuning
    split: train
    type: chat_template
    roles_to_train: [assistant]
test_datasets:
  - path: yurui983/reference-parsing-finetuning
    split: validation
    type: chat_template
    roles_to_train: [assistant]
dataset_prepared_path: last_run_prepared
output_dir: ./outputs/qwen3-vl-8b-out

adapter: lora
lora_model_dir:

sequence_len: 10000
excess_length_strategy: drop

lora_r: 64
lora_alpha: 128
lora_dropout: 0.1
lora_target_modules: 'model.language_model.layers.[\d]+.(mlp|cross_attn|self_attn).(up|down|gate|q|k|v|o)_proj'
# lora_target_linear: true # Don't use this for a VLM - it targets vision layers too

wandb_project: Qwen-lora-ref-parsing
wandb_entity: zhuyurui0323-odoma
wandb_watch: gradients
wandb_name: Qwen3-VL-8B-lora-ref-parsing
wandb_log_model: 'false'

gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 3
optimizer: adamw_torch
lr_scheduler: cosine
learning_rate: 0.00005

bf16: auto
tf32: false

gradient_checkpointing: false
logging_steps: 5
flash_attention: true
eager_attention:

ddp_find_unused_parameters: true
loss_watchdog_threshold: 5.0
loss_watchdog_patience: 3

warmup_ratio: 0.1
evals_per_epoch: 4
saves_per_epoch: 3
weight_decay: 0.0
max_grad_norm: 1.0

# Additional training stability settings
special_tokens:
  pad_token: "<|endoftext|>"
# save_first_step: true # uncomment this to validate checkpoint saving works with your config
```
## Model description
This model is a fine-tuned version of Qwen/Qwen3-VL-8B-Instruct on the odoma/reference-parsing-finetuning dataset.
This repository releases a LoRA adapter tailored to bibliographic reference parsing. Given reference strings, the model produces structured JSON covering authors, title, date, venue/publisher, and related fields, with a particular focus on Social Sciences and Humanities (SSH) citation practices. The adapter is trained on a curated mix of CEX, EXCITE, and LinkedBooks references, and consistently improves field-level F1 over the base model, especially on multilingual and humanities-style citations. It is intended to plug into citation indexing and linking pipelines that already provide reference strings (e.g., from PDF/layout tools or curated text).
It achieves the following results on the evaluation set:
- Loss: 0.4088
- Max active memory: 41.41 GiB
- Max allocated memory: 41.41 GiB
- Device reserved memory: 42.71 GiB
## Intended uses & limitations

### Intended uses
- Reference parsing in citation indexing pipelines: convert already-extracted reference strings into structured JSON for downstream linking (e.g., OpenAlex/Wikidata matching) and analytics.
- SSH-oriented citation processing, including multilingual and stylistically diverse references (e.g., humanities monographs, footnote-like formats, abbreviated venues/publishers).
- Batch parsing of reference lists when each reference string is provided separately (recommended: one reference per call).
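The one-reference-per-call recommendation above can be sketched as follows. `parse_reference` is a hypothetical stub standing in for the actual model call, so only the batching pattern is shown here:

```python
def parse_reference(ref_string: str) -> dict:
    # Hypothetical stub: a real pipeline would prompt the fine-tuned
    # adapter with this single reference string and parse its JSON reply.
    return {"raw": ref_string}

def parse_reference_list(references: list[str]) -> list[dict]:
    # One reference string per model call, as recommended above;
    # empty/whitespace-only entries are skipped.
    return [parse_reference(ref.strip()) for ref in references if ref.strip()]

refs = [
    "Smith, J. (2020). A Study of Things. Journal of Examples, 12(3), 45-67.",
    "",  # blank lines from upstream extraction are dropped
    "Müller, A. Geschichte der Dinge. Berlin: Beispiel Verlag, 1999.",
]
print(len(parse_reference_list(refs)))  # 2
```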
### Limitations
- Not a reference extractor: it does not detect or segment references from raw PDFs or full-text documents. Use a layout/OCR or extraction step first.
- Schema sensitivity: the adapter is optimized for a specific target JSON schema; changing field names or required fields may reduce quality unless prompts are updated (or the adapter is re-tuned).
- Underspecified citations: when key information is missing or ambiguous in the input string, the model may output partial JSON or infer fields (hallucinations). For high-precision applications, apply validation rules (e.g., required fields, date formats) and consider human review on low-confidence cases.
- Formatting noise: line breaks, hyphenation, OCR artifacts, or multiple references concatenated into one string can degrade performance.
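The validation rules suggested above can be sketched with a few lines of standard-library Python. The required fields and the date pattern here are illustrative assumptions, not the dataset's actual schema:

```python
import re

# Assumed schema subset for illustration; adjust to your target schema.
REQUIRED_FIELDS = {"authors", "title", "date"}
DATE_RE = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")  # YYYY[-MM[-DD]]

def validate_parsed_reference(record: dict) -> list[str]:
    """Return a list of validation errors; an empty list means it passes."""
    errors = [f"missing field: {name}"
              for name in sorted(REQUIRED_FIELDS - record.keys())]
    date = record.get("date")
    if date is not None and not DATE_RE.match(str(date)):
        errors.append(f"bad date format: {date!r}")
    return errors

ok = {"authors": ["Smith, J."], "title": "A Study", "date": "2020"}
bad = {"title": "No author", "date": "20th c."}
print(validate_parsed_reference(ok))   # []
print(validate_parsed_reference(bad))  # flags the missing author and bad date
```

Records that fail such checks can be routed to human review rather than discarded.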
## Training and evaluation data
The adapter was fine-tuned on odoma/reference-parsing-finetuning, a supervised dataset of (reference string → JSON) pairs formatted in an instruction/chat style. The data is curated from three complementary gold standards that reflect different citation regimes:
- CEX: English-language scientific articles with relatively regular bibliography formatting.
- EXCITE: German/English SSH documents with end-section, footnote-only, and mixed citation regimes.
- LinkedBooks: humanities references with strong stylistic variation and multilinguality; the schema coverage is more limited (typically authors/title/date/place).
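For illustration, one supervised (reference string → JSON) pair in chat format might look like the following. The field names are illustrative, not necessarily the dataset's exact schema:

```json
{
  "messages": [
    {
      "role": "user",
      "content": "Parse this reference into JSON:\nSmith, J. (2020). A Study of Things. Journal of Examples, 12(3), 45-67."
    },
    {
      "role": "assistant",
      "content": "{\"authors\": [\"Smith, J.\"], \"title\": \"A Study of Things\", \"date\": \"2020\", \"venue\": \"Journal of Examples\"}"
    }
  ]
}
```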
Evaluation uses a schema-constrained parsing setup aligned with training, and reports both generation quality (e.g., structured-field accuracy / F1 in downstream benchmarking) and training-time metrics (loss and memory footprint as shown above).
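Field-level scoring of the kind used in such benchmarking can be sketched as micro precision/recall/F1 over (field, value) pairs. This is a simplified illustration; real evaluations may normalise values or score list-valued fields element-wise:

```python
def field_level_prf(gold: dict, pred: dict) -> tuple[float, float, float]:
    """Micro precision/recall/F1 over exact (field, value) matches."""
    gold_items = {(k, str(v)) for k, v in gold.items()}
    pred_items = {(k, str(v)) for k, v in pred.items()}
    tp = len(gold_items & pred_items)  # fields predicted with the exact gold value
    precision = tp / len(pred_items) if pred_items else 0.0
    recall = tp / len(gold_items) if gold_items else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = {"title": "A Study", "date": "2020", "authors": "Smith, J."}
pred = {"title": "A Study", "date": "2021", "authors": "Smith, J."}
print(field_level_prf(gold, pred))  # 2 of 3 fields correct: all three scores are 2/3
```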
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 4
- total_train_batch_size: 8
- total_eval_batch_size: 2
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 64
- training_steps: 641
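The total train batch size above follows directly from the per-device settings; as a quick check:

```python
# Values taken from the hyperparameters listed above.
micro_batch_size = 1              # per-device train batch size
gradient_accumulation_steps = 4
num_devices = 2                   # multi-GPU

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)  # 8
```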
### Training results
| Training Loss | Epoch | Step | Validation Loss | Max Active (GiB) | Max Allocated (GiB) | Reserved (GiB) |
|---|---|---|---|---|---|---|
| No log | 0 | 0 | 2.1458 | 28.92 | 21.19 | 29.21 |
| 0.5256 | 0.2529 | 54 | 0.4538 | 30.71 | 30.71 | 42.29 |
| 0.4721 | 0.5059 | 108 | 0.4089 | 37.16 | 37.16 | 43.29 |
| 0.4555 | 0.7588 | 162 | 0.3978 | 33.16 | 33.16 | 47.03 |
| 0.4109 | 1.0094 | 216 | 0.3985 | 29.49 | 29.49 | 51.96 |
| 0.4537 | 1.2623 | 270 | 0.3949 | 22.49 | 22.49 | 43.8 |
| 0.3917 | 1.5152 | 324 | 0.3973 | 38.62 | 38.62 | 44.93 |
| 0.3674 | 1.7681 | 378 | 0.3992 | 30.63 | 30.63 | 53.1 |
| 0.3429 | 2.0187 | 432 | 0.3993 | 33.89 | 33.89 | 44.89 |
| 0.4119 | 2.2717 | 486 | 0.4042 | 24.19 | 24.19 | 53.04 |
| 0.3807 | 2.5246 | 540 | 0.4075 | 22.49 | 22.49 | 46.82 |
| 0.2968 | 2.7775 | 594 | 0.4088 | 41.41 | 41.41 | 42.71 |
### Framework versions
- PEFT 0.17.1
- Transformers 4.57.1
- PyTorch 2.7.1+cu126
- Datasets 4.0.0
- Tokenizers 0.22.1
## Credits
The dataset is being developed by Yurui Zhu (Odoma). This work is carried out in the context of the EU-funded GRAPHIA project (grant ID: 101188018).
## Model tree

odoma/Qwen3-VL-8B-LoRA-ref-parsing is a LoRA adapter for the base model Qwen/Qwen3-VL-8B-Instruct.