# Nizami-1.7B
*A Lightweight Language Model*
## Model Description
Nizami-1.7B is a fine-tuned version of Qwen3-1.7B for Azerbaijani. It was trained on a curated dataset of 35,916 examples drawn from historical, legal, mathematical, philosophical, and social science texts.
## Key Features
- Architecture: Transformer-based language model
- Developed by: Rustam Shiriyev
- Language(s): Azerbaijani
- License: MIT
- Fine-Tuning Method: Supervised fine-tuning
- Domain: Academic texts (History, Math, Law, Philosophy, Social Sciences)
- Fine-tuned from model: unsloth/Qwen3-1.7B
## Intended Use
- Academic research assistance in Azerbaijani
- Question answering on humanities and social science topics
- Knowledge exploration in Azerbaijani
## Limitations
- Not intended for generating factual statements without verification
- Limited dataset size (35,916 examples): may not generalize well outside the training domains
- Possible hallucinations when asked for factual details
## Evaluation
Benchmark: AARA (khazarai/AARA_Azerbaijani_LLM_Benchmark)
| Model Name | AARA |
|---|---|
| khazarai/Nizami-1.7B | 40.0 |
| Qwen/Qwen3-1.7B | 39.0 |
| google/gemma-2-2b-it | 34.5 |
| Qwen/Qwen2.5-1.5B-Instruct | 13.5 |
| meta-llama/Llama-3.2-1B-Instruct | 11.0 |
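The table can be read programmatically as well; as a quick sanity check, this snippet (scores copied verbatim from the table above) computes Nizami's margin over each baseline, including the +1.0-point gain over its own base model:

```python
# AARA benchmark scores from the table above
scores = {
    "khazarai/Nizami-1.7B": 40.0,
    "Qwen/Qwen3-1.7B": 39.0,
    "google/gemma-2-2b-it": 34.5,
    "Qwen/Qwen2.5-1.5B-Instruct": 13.5,
    "meta-llama/Llama-3.2-1B-Instruct": 11.0,
}

nizami = scores["khazarai/Nizami-1.7B"]
# Margin of Nizami-1.7B over every other model in the table
margins = {
    name: round(nizami - s, 1)
    for name, s in scores.items()
    if name != "khazarai/Nizami-1.7B"
}
print(margins["Qwen/Qwen3-1.7B"])  # 1.0
```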
## How to Get Started with the Model
```python
from huggingface_hub import login  # call login() if your environment requires Hub authentication
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer
from peft import PeftModel

# Load the base model and attach the Nizami LoRA adapter
tokenizer = AutoTokenizer.from_pretrained("unsloth/Qwen3-1.7B")
base_model = AutoModelForCausalLM.from_pretrained(
    "unsloth/Qwen3-1.7B",
    device_map={"": 0},
)
model = PeftModel.from_pretrained(base_model, "khazarai/Nizami-1.7B")

# English: "Based on the archaeological excavation materials obtained, which concrete
# objects related to the first use of metal in Azerbaijan during the Eneolithic period
# have been found, and how did these objects influence the development of the social
# structure of society in that period? Additionally, what can you say about the economic
# and cultural aspects of the development of metallurgy and metalworking craftsmanship?"
question = """
Əldə olunan arxeoloji qazıntı materiallarına əsasən, Eneolit dövründə Azərbaycanda metalın ilk istifadəsi ilə bağlı hansı konkret obyektlər tapılmışdır və bu obyektlər həmin dövrdə cəmiyyətin sosial strukturunun inkişafına necə təsir etmişdir? Əlavə olaraq, həmin dövrdə metallurgiya və metalişləmə sənətkarlığının inkişafının iqtisadi və mədəni aspektləri haqqında nə deyə bilərsiniz?
"""

messages = [
    {"role": "user", "content": question},
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,  # disable Qwen3 "thinking" mode
)

# Stream the generated answer token by token
_ = model.generate(
    **tokenizer(text, return_tensors="pt").to("cuda"),
    max_new_tokens=1800,
    temperature=0.7,
    top_p=0.8,
    top_k=20,
    streamer=TextStreamer(tokenizer, skip_prompt=True),
)
```
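The `temperature`, `top_p`, and `top_k` arguments above control the sampling strategy. As a rough illustration of how the two filters interact (a toy distribution in pure Python, not the model's actual logits, and a simplification of the library's implementation):

```python
def filter_top_k_top_p(probs, top_k, top_p):
    """Keep the top_k most likely tokens, then the smallest prefix of
    those whose cumulative probability reaches top_p (nucleus sampling)."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    kept, cum = [], 0.0
    for token, p in ranked:
        kept.append(token)
        cum += p
        if cum >= top_p:
            break
    return kept

# Toy next-token distribution: with top_p=0.8 only the three most
# likely tokens survive, even though top_k would allow all five.
probs = {"a": 0.5, "b": 0.2, "c": 0.15, "d": 0.1, "e": 0.05}
print(filter_top_k_top_p(probs, top_k=20, top_p=0.8))  # ['a', 'b', 'c']
```

Sampling then draws the next token only from the surviving candidates, which is why a lower `top_p` or `top_k` makes generation more conservative.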
Alternatively, with the `pipeline` API:
```python
from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

tokenizer = AutoTokenizer.from_pretrained("unsloth/Qwen3-1.7B")
base_model = AutoModelForCausalLM.from_pretrained("unsloth/Qwen3-1.7B")
model = PeftModel.from_pretrained(base_model, "khazarai/Nizami-1.7B")

question = """
Əldə olunan arxeoloji qazıntı materiallarına əsasən, Eneolit dövründə Azərbaycanda metalın ilk istifadəsi ilə bağlı hansı konkret obyektlər tapılmışdır və bu obyektlər həmin dövrdə cəmiyyətin sosial strukturunun inkişafına necə təsir etmişdir? Əlavə olaraq, həmin dövrdə metallurgiya və metalişləmə sənətkarlığının inkişafının iqtisadi və mədəni aspektləri haqqında nə deyə bilərsiniz?
"""

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
messages = [
    {"role": "user", "content": question},
]
pipe(messages)
```
## Training Data
- **Dataset I** (az-llm/az_academic_qa-v1.0): a 7,000-example dataset for academic-style comprehension and reasoning in Azerbaijani.
- **Dataset II** (az-llm/az_creative-v1.0): a 4,000-example creative dataset with imaginative Azerbaijani prompts and expressive responses, including role-based instructions (e.g., Galileo, an interstellar assistant, a detective), fictional narratives, poetic reasoning, and emotional simulations.
- **Dataset III** (tahmaz/azerbaijani_text_math_qa1): 6,500 high school math examples in Azerbaijani.
- **Dataset IV** (omar07ibrahim/Alpaca_Stanford_Azerbaijan): an Azerbaijani version of the Stanford Alpaca dataset for instruction-following tasks.
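The three datasets with stated sizes account for 17,500 of the 35,916 training examples. Assuming the remainder all comes from the Alpaca translation (the card does not state how many of its examples were used, so this is an inference, not a documented figure), the mixture breaks down as:

```python
total = 35_916  # total training examples stated in the model description

sized = {
    "az_academic_qa-v1.0": 7_000,
    "az_creative-v1.0": 4_000,
    "azerbaijani_text_math_qa1": 6_500,
}
# Assumption: the unaccounted-for examples come from Alpaca_Stanford_Azerbaijan
sized["Alpaca_Stanford_Azerbaijan (inferred)"] = total - sum(sized.values())

# Percentage share of each dataset in the training mix
shares = {name: round(100 * n / total, 1) for name, n in sized.items()}
print(sized["Alpaca_Stanford_Azerbaijan (inferred)"])  # 18416
```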
## Framework versions
- PEFT 0.16.0