NLP Orange AttentionSeekers (PES1UG23AM016, PES1UG23AM053, PES1UG23AM059, PES1UG23AM222)

SmolVLM2-2.2B Fine-tuned on ChartQA (LoRA)

This is a LoRA adapter for SmolVLM2-2.2B-Instruct, fine-tuned on the ChartQA dataset to answer questions about charts and graphs.

GitHub repository: NLP_Orange_Problem_AttentionSeekers


Model Details

  • Base model: HuggingFaceTB/SmolVLM2-2.2B-Instruct
  • Fine-tuning method: LoRA (r=16, alpha=32)
  • Dataset: ChartQA (1,000 training samples)
  • Training hardware: Kaggle 2x T4 (32 GB VRAM total)
  • Final training loss: 1.855
  • Epochs: 2

How to Use

Installation

pip install transformers peft accelerate torch pillow

Load Adapters and Run Inference

import torch
from PIL import Image
from peft import PeftModel
from transformers import AutoProcessor, AutoModelForImageTextToText

MODEL_PATH   = "HuggingFaceTB/SmolVLM2-2.2B-Instruct"
ADAPTER_PATH = "pes1ug23am016/smolvlm2-chartqa-lora"

# 1. Load base model
model = AutoModelForImageTextToText.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.float16,
    device_map="auto",
    attn_implementation="eager"
)
# LoRA does not modify the processor, so load it from the base model
processor = AutoProcessor.from_pretrained(MODEL_PATH)

# 2. Load and merge LoRA adapters
model = PeftModel.from_pretrained(model, ADAPTER_PATH)
model = model.merge_and_unload()
model.eval()

# 3. Run inference
def predict(image, question):
    messages = [
        {"role": "user", "content": [
            {"type": "image"},
            {"type": "text", "text": question}
        ]}
    ]
    prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = processor(text=prompt, images=[[image]], return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
    new_tokens = out[0][inputs["input_ids"].shape[1]:]
    return processor.tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

# Example
image = Image.open("your_chart.png")
answer = predict(image, "What is the highest value in the chart?")
print(answer)

Training Details

LoRA Config

  • r = 16
  • lora_alpha = 32
  • lora_dropout = 0.05
  • target_modules: q_proj, k_proj, v_proj, o_proj
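
As a quick sanity check on these hyperparameters: peft scales the low-rank update B @ A by lora_alpha / r before adding it to the frozen weight, so with r=16 and alpha=32 the adapter contribution is multiplied by 2. A minimal sketch (plain Python, no peft required):

```python
# LoRA hyperparameters from the list above.
lora_config = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
}

# The low-rank update B @ A is scaled by lora_alpha / r,
# so here the adapter update is doubled relative to r = alpha.
scaling = lora_config["lora_alpha"] / lora_config["r"]
print(scaling)  # 2.0
```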

Training Arguments

  • Batch size: 1 (effective 16 with gradient accumulation)
  • Learning rate: 2e-4
  • Epochs: 2
  • fp16: True
  • Optimizer: AdamW
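
For reference, here is how the effective batch size falls out of the numbers above. This is a sketch: the accumulation step count of 16 and single-process training are assumptions (with both T4s under data parallelism, 8 accumulation steps would give the same effective size).

```python
per_device_batch_size = 1
gradient_accumulation_steps = 16  # assumption: 1 * 16 = 16
num_processes = 1                 # assumption: single training process

effective_batch_size = (
    per_device_batch_size * gradient_accumulation_steps * num_processes
)
print(effective_batch_size)  # 16
```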

Limitations

  • Fine-tuned on only 1,000 samples, so performance on complex or unseen chart types may be limited
  • Best suited for the chart types present in ChartQA (bar, line, and pie charts)