Built with Axolotl

Axolotl config used for this run:

axolotl version: 0.13.0.dev0

base_model: Qwen/Qwen3-VL-8B-Instruct
processor_type: AutoProcessor

# these 3 lines are needed for now to handle vision chat templates w images
skip_prepare_dataset: true
# remove_unused_columns: false
# sample_packing: false

chat_template: qwen2_vl

datasets:
  - path: yurui983/reference-parsing-finetuning
    split: train
    type: chat_template
    roles_to_train: [assistant]

test_datasets:
  - path: yurui983/reference-parsing-finetuning
    split: validation
    type: chat_template
    roles_to_train: [assistant]

dataset_prepared_path: last_run_prepared
output_dir: ./outputs/qwen3-vl-8b-out

adapter: lora
lora_model_dir:

sequence_len: 10000
excess_length_strategy: drop

lora_r: 64
lora_alpha: 128
lora_dropout: 0.1
lora_target_modules: 'model.language_model.layers.[\d]+.(mlp|cross_attn|self_attn).(up|down|gate|q|k|v|o)_proj'
# lora_target_linear: true  # Don't use this for vLM - it targets vision layers too

wandb_project: Qwen-lora-ref-parsing
wandb_entity: zhuyurui0323-odoma
wandb_watch: gradients
wandb_name: Qwen3-VL-8B-lora-ref-parsing
wandb_log_model: 'false'

gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 3
optimizer: adamw_torch
lr_scheduler: cosine
learning_rate: 0.00005

bf16: auto
tf32: false

gradient_checkpointing: false
logging_steps: 5
flash_attention: true
eager_attention:

ddp_find_unused_parameters: true

loss_watchdog_threshold: 5.0
loss_watchdog_patience: 3

warmup_ratio: 0.1
evals_per_epoch: 4
saves_per_epoch: 3
weight_decay: 0.0

max_grad_norm: 1.0

# Additional training stability settings
special_tokens:
  pad_token: "<|endoftext|>"

# save_first_step: true  # uncomment this to validate checkpoint saving works with your config
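The lora_target_modules regex above restricts LoRA to the language-model projection layers, which is why the config warns against lora_target_linear for a VLM. A quick sketch of what the pattern does and does not match (the module names below are illustrative; exact names depend on the model's module tree):

```python
import re

# Pattern copied verbatim from lora_target_modules in the config above.
PATTERN = re.compile(
    r"model.language_model.layers.[\d]+.(mlp|cross_attn|self_attn)."
    r"(up|down|gate|q|k|v|o)_proj"
)

# Illustrative module names; the vision-tower name is an assumption for contrast.
candidates = [
    "model.language_model.layers.0.self_attn.q_proj",   # language model: targeted
    "model.language_model.layers.17.mlp.down_proj",     # language model: targeted
    "model.visual.blocks.0.attn.qkv",                   # vision tower: not targeted
]
for name in candidates:
    print(name, bool(PATTERN.fullmatch(name)))
```

Using a broad setting such as lora_target_linear: true would instead attach adapters to every linear layer, including the vision tower.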

Model description

This model is a fine-tuned version of Qwen/Qwen3-VL-8B-Instruct on the odoma/reference-parsing-finetuning dataset.

This repository releases a LoRA adapter tailored to bibliographic reference parsing. Given reference strings, the model produces structured JSON covering authors, title, date, venue/publisher, and related fields, with a particular focus on Social Sciences and Humanities (SSH) citation practices. The adapter is trained on a curated mix of CEX, EXCITE, and LinkedBooks references, and consistently improves field-level F1 over the base model, especially on multilingual and humanities-style citations. It is intended to plug into citation indexing and linking pipelines that already provide reference strings (e.g., from PDF/layout tools or curated text).

It achieves the following results on the evaluation set:

  • Loss: 0.4088
  • Memory/max Active (GiB): 41.41
  • Memory/max Allocated (GiB): 41.41
  • Memory/device Reserved (GiB): 42.71

Intended uses & limitations

Intended uses

  • Reference parsing in citation indexing pipelines: convert already-extracted reference strings into structured JSON for downstream linking (e.g., OpenAlex/Wikidata matching) and analytics.
  • SSH-oriented citation processing, including multilingual and stylistically diverse references (e.g., humanities monographs, footnote-like formats, abbreviated venues/publishers).
  • Batch parsing of reference lists when each reference string is provided separately (recommended: one reference per call).
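A minimal sketch of submitting one reference per call, as recommended above. The system-prompt wording and schema field names here are assumptions for illustration, not the exact prompt used during fine-tuning:

```python
import json

# Hypothetical instruction; the exact training prompt is not part of this card.
SYSTEM_PROMPT = (
    "Parse the bibliographic reference into JSON with the fields: "
    "authors, title, date, venue, publisher."
)

def build_messages(reference: str) -> list[dict]:
    """Build a chat-template payload for a single reference string
    (one reference per call, as recommended in the model card)."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": reference},
    ]

messages = build_messages(
    "Ginzburg, C. Il formaggio e i vermi. Torino: Einaudi, 1976."
)
print(json.dumps(messages, indent=2))
```

The resulting messages list can then be passed through the processor's chat template for generation.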

Limitations

  • Not a reference extractor: it does not detect or segment references from raw PDFs or full-text documents. Use a layout/OCR or extraction step first.
  • Schema sensitivity: the adapter is optimized for a specific target JSON schema; changing field names or required fields may reduce quality unless prompts are updated (or the adapter is re-tuned).
  • Underspecified citations: when key information is missing or ambiguous in the input string, the model may output partial JSON or infer fields (hallucinations). For high-precision applications, apply validation rules (e.g., required fields, date formats) and consider human review on low-confidence cases.
  • Formatting noise: line breaks, hyphenation, OCR artifacts, or multiple references concatenated into one string can degrade performance.
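For the validation rules suggested above (required fields, date formats), a minimal post-processing sketch might look like this; the required-field set and date pattern are assumptions to adapt to your target schema:

```python
import re

REQUIRED_FIELDS = ("authors", "title")  # assumption: adjust to your schema

def validate_parsed_reference(record: dict) -> list[str]:
    """Return a list of validation problems for one parsed reference.
    An empty list means the record passes these basic checks."""
    problems = []
    # Required fields must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append(f"missing required field: {field}")
    # Dates should look like a 4-digit year, optionally with month/day.
    date = record.get("date")
    if date is not None and not re.fullmatch(r"\d{4}(-\d{2}(-\d{2})?)?", str(date)):
        problems.append(f"suspicious date format: {date!r}")
    return problems

ok = validate_parsed_reference(
    {"authors": ["C. Ginzburg"], "title": "Il formaggio e i vermi", "date": "1976"}
)
print(ok)  # []
print(validate_parsed_reference({"title": "", "date": "circa 1976"}))
```

Records that fail such checks can be routed to human review rather than passed to downstream linking.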

Training and evaluation data

The adapter was fine-tuned on odoma/reference-parsing-finetuning, a supervised dataset of (reference string → JSON) pairs formatted in an instruction/chat style. The data is curated from three complementary gold standards that reflect different citation regimes:

  • CEX: English-language scientific articles with relatively regular bibliography formatting.
  • EXCITE: German/English SSH documents with end-section, footnote-only, and mixed citation regimes.
  • LinkedBooks: humanities references with strong stylistic variation and multilinguality; the schema coverage is more limited (typically authors/title/date/place).

Evaluation uses a schema-constrained parsing setup aligned with training, and reports both generation quality (e.g., structured-field accuracy / F1 in downstream benchmarking) and training-time metrics (loss and memory footprint as shown above).
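The field-level scoring mentioned above can be sketched as micro precision/recall/F1 over (field, value) pairs of a parsed reference. Exact string match per field is a simplifying assumption here; the actual benchmark may normalise values first:

```python
def field_level_prf(pred: dict, gold: dict) -> tuple[float, float, float]:
    """Micro precision/recall/F1 over (field, value) pairs of one reference.
    A field counts as correct only on exact value match (simplifying assumption)."""
    pred_items = {(k, str(v)) for k, v in pred.items() if v}
    gold_items = {(k, str(v)) for k, v in gold.items() if v}
    tp = len(pred_items & gold_items)
    precision = tp / len(pred_items) if pred_items else 0.0
    recall = tp / len(gold_items) if gold_items else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

# "venue" vs "publisher" illustrates a schema mismatch penalised by this metric.
p, r, f = field_level_prf(
    {"title": "Il formaggio e i vermi", "date": "1976", "venue": "Einaudi"},
    {"title": "Il formaggio e i vermi", "date": "1976", "publisher": "Einaudi"},
)
print(round(p, 2), round(r, 2), round(f, 2))  # 0.67 0.67 0.67
```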

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 8
  • total_eval_batch_size: 2
  • optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 64
  • training_steps: 641
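The effective batch sizes above follow directly from the per-device settings:

```python
# Per-device settings from the config and hyperparameter list above.
micro_batch_size = 1            # train/eval batch size per device
gradient_accumulation_steps = 4
num_devices = 2

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
total_eval_batch_size = micro_batch_size * num_devices  # no accumulation at eval

print(total_train_batch_size, total_eval_batch_size)  # 8 2
```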

Training results

Training Loss   Epoch    Step   Validation Loss   Active (GiB)   Allocated (GiB)   Reserved (GiB)
No log          0        0      2.1458            28.92          21.19             29.21
0.5256          0.2529   54     0.4538            30.71          30.71             42.29
0.4721          0.5059   108    0.4089            37.16          37.16             43.29
0.4555          0.7588   162    0.3978            33.16          33.16             47.03
0.4109          1.0094   216    0.3985            29.49          29.49             51.96
0.4537          1.2623   270    0.3949            22.49          22.49             43.80
0.3917          1.5152   324    0.3973            38.62          38.62             44.93
0.3674          1.7681   378    0.3992            30.63          30.63             53.10
0.3429          2.0187   432    0.3993            33.89          33.89             44.89
0.4119          2.2717   486    0.4042            24.19          24.19             53.04
0.3807          2.5246   540    0.4075            22.49          22.49             46.82
0.2968          2.7775   594    0.4088            41.41          41.41             42.71

Framework versions

  • PEFT 0.17.1
  • Transformers 4.57.1
  • PyTorch 2.7.1+cu126
  • Datasets 4.0.0
  • Tokenizers 0.22.1

Credits

The dataset is being developed by Yurui Zhu (Odoma). This work is carried out in the context of the EU-funded GRAPHIA project (grant ID: 101188018).
