See axolotl config

axolotl version: `0.13.0.dev0`

```yaml
base_model: Qwen/Qwen3-VL-8B-Instruct
processor_type: AutoProcessor

# these 3 lines are needed for now to handle vision chat templates w images
skip_prepare_dataset: true
# remove_unused_columns: false
# sample_packing: false
chat_template: qwen2_vl
datasets:
  - path: yurui983/reference-parsing-finetuning
    split: train
    type: chat_template
    roles_to_train: [assistant]
test_datasets:
  - path: yurui983/reference-parsing-finetuning
    split: validation
    type: chat_template
    roles_to_train: [assistant]
dataset_prepared_path: last_run_prepared
output_dir: ./outputs/qwen3-vl-8b-out

adapter: lora
lora_model_dir:

sequence_len: 10000
excess_length_strategy: drop

lora_r: 64
lora_alpha: 128
lora_dropout: 0.1
lora_target_modules: 'model.language_model.layers.[\d]+.(mlp|cross_attn|self_attn).(up|down|gate|q|k|v|o)_proj'
# lora_target_linear: true # Don't use this for a VLM - it targets vision layers too

wandb_project: Qwen-lora-ref-parsing
wandb_entity: zhuyurui0323-odoma
wandb_watch: gradients
wandb_name: Qwen3-VL-8B-lora-ref-parsing
wandb_log_model: 'false'

gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 3
optimizer: adamw_torch
lr_scheduler: cosine
learning_rate: 0.00005

bf16: auto
tf32: false

gradient_checkpointing: false
logging_steps: 5
flash_attention: true
eager_attention:

ddp_find_unused_parameters: true
loss_watchdog_threshold: 5.0
loss_watchdog_patience: 3

warmup_ratio: 0.1
evals_per_epoch: 4
saves_per_epoch: 3
weight_decay: 0.0
max_grad_norm: 1.0

# Additional training stability settings
special_tokens:
  pad_token: "<|endoftext|>"
# save_first_step: true # uncomment this to validate checkpoint saving works with your config
```
## Model description
This model is a fine-tuned version of Qwen/Qwen3-VL-8B-Instruct on the odoma/reference-parsing-finetuning dataset.
This repository releases a LoRA adapter tailored to bibliographic reference parsing. Given reference strings, the model produces structured JSON covering authors, title, date, venue/publisher, and related fields, with a particular focus on Social Sciences and Humanities (SSH) citation practices. The adapter is trained on a curated mix of CEX, EXCITE, and LinkedBooks references, and consistently improves field-level F1 over the base model, especially on multilingual and humanities-style citations. It is intended to plug into citation indexing and linking pipelines that already provide reference strings (e.g., from PDF/layout tools or curated text).
It achieves the following results on the evaluation set:
- Loss: 0.4088
- Max active memory: 41.41 GiB
- Max allocated memory: 41.41 GiB
- Device reserved memory: 42.71 GiB
## Intended uses & limitations

### Intended uses
- Reference parsing in citation indexing pipelines: convert already-extracted reference strings into structured JSON for downstream linking (e.g., OpenAlex/Wikidata matching) and analytics.
- SSH-oriented citation processing, including multilingual and stylistically diverse references (e.g., humanities monographs, footnote-like formats, abbreviated venues/publishers).
- Batch parsing of reference lists when each reference string is provided separately (recommended: one reference per call).
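The one-reference-per-call recommendation above can be sketched as follows. `parse_reference` is a hypothetical stub standing in for the actual model call, so only the batching pattern is shown here:

```python
def parse_reference(ref_string: str) -> dict:
    # Hypothetical stub: a real pipeline would prompt the fine-tuned
    # adapter with this single reference string and parse its JSON reply.
    return {"raw": ref_string}

def parse_reference_list(references: list[str]) -> list[dict]:
    # One reference string per model call, as recommended above;
    # empty/whitespace-only entries are skipped.
    return [parse_reference(ref.strip()) for ref in references if ref.strip()]

refs = [
    "Smith, J. (2020). A Study of Things. Journal of Examples, 12(3), 45-67.",
    "",  # blank lines from upstream extraction are dropped
    "Müller, A. Geschichte der Dinge. Berlin: Beispiel Verlag, 1999.",
]
print(len(parse_reference_list(refs)))  # 2
```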
### Limitations
- Not a reference extractor: it does not detect or segment references from raw PDFs or full-text documents. Use a layout/OCR or extraction step first.
- Schema sensitivity: the adapter is optimized for a specific target JSON schema; changing field names or required fields may reduce quality unless prompts are updated (or the adapter is re-tuned).
- Underspecified citations: when key information is missing or ambiguous in the input string, the model may output partial JSON or infer fields (hallucinations). For high-precision applications, apply validation rules (e.g., required fields, date formats) and consider human review on low-confidence cases.
- Formatting noise: line breaks, hyphenation, OCR artifacts, or multiple references concatenated into one string can degrade performance.
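The validation rules suggested above can be sketched with a few lines of standard-library Python. The required fields and the date pattern here are illustrative assumptions, not the dataset's actual schema:

```python
import re

# Assumed schema subset for illustration; adjust to your target schema.
REQUIRED_FIELDS = {"authors", "title", "date"}
DATE_RE = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")  # YYYY[-MM[-DD]]

def validate_parsed_reference(record: dict) -> list[str]:
    """Return a list of validation errors; an empty list means it passes."""
    errors = [f"missing field: {name}"
              for name in sorted(REQUIRED_FIELDS - record.keys())]
    date = record.get("date")
    if date is not None and not DATE_RE.match(str(date)):
        errors.append(f"bad date format: {date!r}")
    return errors

ok = {"authors": ["Smith, J."], "title": "A Study", "date": "2020"}
bad = {"title": "No author", "date": "20th c."}
print(validate_parsed_reference(ok))   # []
print(validate_parsed_reference(bad))  # flags the missing author and bad date
```

Records that fail such checks can be routed to human review rather than discarded.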
## Training and evaluation data
The adapter was fine-tuned on odoma/reference-parsing-finetuning, a supervised dataset of (reference string → JSON) pairs formatted in an instruction/chat style. The data is curated from three complementary gold standards that reflect different citation regimes:
- CEX: English-language scientific articles with relatively regular bibliography formatting.
- EXCITE: German/English SSH documents with end-section, footnote-only, and mixed citation regimes.
- LinkedBooks: humanities references with strong stylistic variation and multilinguality; the schema coverage is more limited (typically authors/title/date/place).
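For illustration, one supervised (reference string → JSON) pair in chat format might look like the following. The field names are illustrative, not necessarily the dataset's exact schema:

```json
{
  "messages": [
    {
      "role": "user",
      "content": "Parse this reference into JSON:\nSmith, J. (2020). A Study of Things. Journal of Examples, 12(3), 45-67."
    },
    {
      "role": "assistant",
      "content": "{\"authors\": [\"Smith, J.\"], \"title\": \"A Study of Things\", \"date\": \"2020\", \"venue\": \"Journal of Examples\"}"
    }
  ]
}
```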
Evaluation uses a schema-constrained parsing setup aligned with training, and reports both generation quality (e.g., structured-field accuracy / F1 in downstream benchmarking) and training-time metrics (loss and memory footprint as shown above).
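Field-level scoring of the kind used in such benchmarking can be sketched as micro precision/recall/F1 over (field, value) pairs. This is a simplified illustration; real evaluations may normalise values or score list-valued fields element-wise:

```python
def field_level_prf(gold: dict, pred: dict) -> tuple[float, float, float]:
    """Micro precision/recall/F1 over exact (field, value) matches."""
    gold_items = {(k, str(v)) for k, v in gold.items()}
    pred_items = {(k, str(v)) for k, v in pred.items()}
    tp = len(gold_items & pred_items)  # fields predicted with the exact gold value
    precision = tp / len(pred_items) if pred_items else 0.0
    recall = tp / len(gold_items) if gold_items else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = {"title": "A Study", "date": "2020", "authors": "Smith, J."}
pred = {"title": "A Study", "date": "2021", "authors": "Smith, J."}
print(field_level_prf(gold, pred))  # 2 of 3 fields correct: all three scores are 2/3
```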
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 4
- total_train_batch_size: 8
- total_eval_batch_size: 2
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 64
- training_steps: 641
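The total train batch size above follows directly from the per-device settings; as a quick check:

```python
# Values taken from the hyperparameters listed above.
micro_batch_size = 1              # per-device train batch size
gradient_accumulation_steps = 4
num_devices = 2                   # multi-GPU

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)  # 8
```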
### Training results
| Training Loss | Epoch | Step | Validation Loss | Max Active (GiB) | Max Allocated (GiB) | Reserved (GiB) |
|---|---|---|---|---|---|---|
| No log | 0 | 0 | 2.1458 | 28.92 | 21.19 | 29.21 |
| 0.5256 | 0.2529 | 54 | 0.4538 | 30.71 | 30.71 | 42.29 |
| 0.4721 | 0.5059 | 108 | 0.4089 | 37.16 | 37.16 | 43.29 |
| 0.4555 | 0.7588 | 162 | 0.3978 | 33.16 | 33.16 | 47.03 |
| 0.4109 | 1.0094 | 216 | 0.3985 | 29.49 | 29.49 | 51.96 |
| 0.4537 | 1.2623 | 270 | 0.3949 | 22.49 | 22.49 | 43.8 |
| 0.3917 | 1.5152 | 324 | 0.3973 | 38.62 | 38.62 | 44.93 |
| 0.3674 | 1.7681 | 378 | 0.3992 | 30.63 | 30.63 | 53.1 |
| 0.3429 | 2.0187 | 432 | 0.3993 | 33.89 | 33.89 | 44.89 |
| 0.4119 | 2.2717 | 486 | 0.4042 | 24.19 | 24.19 | 53.04 |
| 0.3807 | 2.5246 | 540 | 0.4075 | 22.49 | 22.49 | 46.82 |
| 0.2968 | 2.7775 | 594 | 0.4088 | 41.41 | 41.41 | 42.71 |
### Framework versions
- PEFT 0.17.1
- Transformers 4.57.1
- PyTorch 2.7.1+cu126
- Datasets 4.0.0
- Tokenizers 0.22.1
## Credits
The dataset is being developed by Yurui Zhu (Odoma). This work is carried out in the context of the EU-funded GRAPHIA project (grant ID: 101188018).
## Model tree

odoma/Qwen3-VL-8B-LoRA-ref-parsing is a LoRA adapter for the base model Qwen/Qwen3-VL-8B-Instruct.