NLP Orange AttentionSeekers (PES1UG23AM016, PES1UG23AM053, PES1UG23AM059, PES1UG23AM222)

# SmolVLM2-2.2B Fine-tuned on ChartQA (LoRA)
This is a LoRA adapter for SmolVLM2-2.2B-Instruct, fine-tuned on the ChartQA dataset to answer questions about charts and graphs.
GitHub repository: NLP_Orange_Problem_AttentionSeekers
## Model Details
| Component | Details |
|---|---|
| Base model | HuggingFaceTB/SmolVLM2-2.2B-Instruct |
| Fine-tuning method | LoRA (r=16, alpha=32) |
| Dataset | ChartQA (1000 training samples) |
| Training hardware | Kaggle 2x T4 (32 GB VRAM) |
| Final training loss | 1.855 |
| Epochs | 2 |
## How to Use

### Installation
```bash
pip install transformers peft accelerate torch pillow
```
### Load Adapters and Run Inference
```python
import torch
from PIL import Image
from peft import PeftModel
from transformers import AutoProcessor, AutoModelForImageTextToText

MODEL_PATH = "HuggingFaceTB/SmolVLM2-2.2B-Instruct"
ADAPTER_PATH = "pes1ug23am016/smolvlm2-chartqa-lora"

# 1. Load base model
model = AutoModelForImageTextToText.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.float16,
    device_map="auto",
    attn_implementation="eager",
)
processor = AutoProcessor.from_pretrained(ADAPTER_PATH)

# 2. Load and merge LoRA adapters
model = PeftModel.from_pretrained(model, ADAPTER_PATH)
model = model.merge_and_unload()
model.eval()

# 3. Run inference
def predict(image, question):
    messages = [
        {"role": "user", "content": [
            {"type": "image"},
            {"type": "text", "text": question},
        ]}
    ]
    prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = processor(text=prompt, images=[[image]], return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
    new_tokens = out[0][inputs["input_ids"].shape[1]:]
    return processor.tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

# Example
image = Image.open("your_chart.png")
answer = predict(image, "What is the highest value in the chart?")
print(answer)
```
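If you don't have a chart image on hand, you can draw a minimal synthetic bar chart with Pillow to smoke-test the pipeline. This is purely an illustration; the drawing layout and the commented `predict` call assume the inference script above has already been run:

```python
from PIL import Image, ImageDraw

# Draw a minimal synthetic bar chart (three bars with value labels)
# just to have a test input; any real chart image works the same way.
img = Image.new("RGB", (320, 240), "white")
draw = ImageDraw.Draw(img)
values = {"A": 30, "B": 70, "C": 50}
for i, (label, v) in enumerate(values.items()):
    x0 = 40 + i * 90
    draw.rectangle([x0, 220 - 2 * v, x0 + 50, 220], fill="steelblue")
    draw.text((x0 + 20, 225), label, fill="black")
    draw.text((x0 + 15, 205 - 2 * v), str(v), fill="black")
img.save("synthetic_chart.png")

# Then, with the model loaded as above:
# print(predict(img, "Which category has the highest value?"))
```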
## Training Details

### LoRA Config
- r = 16
- lora_alpha = 32
- lora_dropout = 0.05
- target_modules: q_proj, k_proj, v_proj, o_proj
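The same hyperparameters can be written out as a `peft` config. This is a sketch: the `task_type` is not stated in this card and is an assumption.

```python
from peft import LoraConfig

# LoRA hyperparameters as listed above; task_type is an assumption.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",  # assumed, not stated in the card
)
```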
### Training Arguments
- Batch size: 1 (effective 16 with gradient accumulation)
- Learning rate: 2e-4
- Epochs: 2
- fp16: True
- Optimizer: AdamW
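The effective batch size is per-device batch size × gradient accumulation steps × number of GPUs. The card states only the per-device batch (1) and the effective batch (16), so the split below (8 accumulation steps across the 2 T4s) is one consistent assumption, not a documented setting:

```python
# Hypothetical split: only per-device batch 1 and effective batch 16
# are stated in the card.
per_device_batch = 1
grad_accum_steps = 8   # assumed
num_gpus = 2           # Kaggle 2x T4

effective_batch = per_device_batch * grad_accum_steps * num_gpus
print(effective_batch)  # -> 16
```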
## Limitations
- Fine-tuned on only 1,000 samples, so performance on complex or unseen chart types may be limited
- Best suited for the types of charts present in ChartQA (bar, line, pie charts)