Note: This model serves as proof that a single individual, without any team or institutional backing, can develop an SLM that demonstrates competitive results. LIME-1B was trained for only ~$1,000, yet its quality approaches that of models trained with hundreds of thousands of dollars of compute, which demonstrates exceptional training efficiency.
LIME-1B
LIME-1B is a 1B-parameter, decoder-only Transformer language model trained from scratch on English web data and then instruction-tuned on a curated mixture of assistant-style datasets with and without retrieval context. It is designed as a compact, practical base model for:
- Building RAG systems (context + question → answer)
- Assistant-style Q&A and task completion
- Summarization, explanation, and rewriting tasks in English
⚠️ LIME-1B is not RLHF/DPO-aligned and does not have tool use or multi-turn chat training baked in. It is an instruction-tuned LM, not a fully aligned assistant like ChatGPT.
1. Model architecture
LIME-1B is a decoder-only Transformer with several quality-oriented design choices:
| Component | Value |
|---|---|
| Architecture | Decoder-only Transformer |
| Parameters | 1.0B |
| Layers (decoder blocks) | 32 |
| d_model | 1536 |
| FFN dimension (d_ff) | 6144 |
| Attention heads | 24 |
| Vocabulary size | 50,000 |
| Max sequence length | 512 tokens |
| Positional encoding | Sinusoidal |
| Norm | RMSNorm |
| FFN | SiLU MLP |
| Attention | FlashAttention |
| Tying of embeddings | Output head tied to embedding |
| Precision (training) | Mixed fp32/bf16 (autocast) + grad clipping |
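As a sanity check on the table, the parameter count can be estimated from the listed dimensions. This is a rough sketch, not the author's accounting: it assumes no bias terms, a two-matrix (non-gated) SiLU MLP, and one weight vector per RMSNorm, none of which the card states explicitly. Sinusoidal positions add no learned parameters, and the tied output head reuses the embedding matrix.

```python
# Rough parameter count from the architecture table.
# Assumptions (not stated in the model card): no bias terms, a two-matrix
# (non-gated) SiLU MLP, and one weight vector per RMSNorm.
d_model, n_layers, d_ff, n_heads, vocab = 1536, 32, 6144, 24, 50_000

head_dim = d_model // n_heads              # width of each attention head
embedding = vocab * d_model                # shared with the output head (tied)
attention = 4 * d_model * d_model          # Q, K, V and output projections
ffn = 2 * d_model * d_ff                   # up and down projections
norms = 2 * d_model                        # two RMSNorms per decoder block
per_block = attention + ffn + norms
total = embedding + n_layers * per_block + d_model  # plus a final RMSNorm

print(f"head_dim = {head_dim}")
print(f"total ~ {total / 1e9:.2f}B parameters")
```

Under these assumptions the estimate lands at roughly 0.98B, which is consistent with the 1.0B figure in the table.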
2. Training data
2.1 Pretraining
The base model is pretrained as a standard causal language model on English web data:
- Corpus: FineWeb-Edu (CC-MAIN-2025-05 split)
- Language filter: English-only subset
- Objective: next-token prediction (causal LM)
- Token budget: 20B tokens
- Context length: 512 tokens
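For a sense of scale, the token budget and context length above translate into the following back-of-the-envelope figures. The sketch assumes the corpus is packed into full 512-token sequences and uses the nominal 1.0B parameter count; both are simplifications rather than details taken from the training code.

```python
# Back-of-the-envelope figures from the pretraining numbers listed above.
# Assumption: documents are packed into full-length 512-token sequences.
token_budget = 20_000_000_000   # 20B pretraining tokens
context_length = 512            # tokens per training sequence
n_params = 1_000_000_000        # nominal 1.0B parameters

full_sequences = token_budget // context_length
tokens_per_param = token_budget / n_params

print(f"{full_sequences:,} full-length sequences")
print(f"{tokens_per_param:.0f} tokens per parameter")
```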
2.2 Instruction fine-tuning (SFT)
After pretraining, the model is fine-tuned on a unified instruction schema:
<user> instruction_text <assistant> response_text <eos>
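A minimal sketch of how one training example and one inference prompt could be rendered in this schema. The helper names are hypothetical, the literal `<eos>` string stands in for whatever end-of-sequence token the tokenizer defines, and the tags are concatenated without surrounding spaces to match the prompt builder in the Usage section; the exact whitespace used during SFT is not specified in the card.

```python
# Hypothetical helpers illustrating the instruction schema.
# Assumption: "<eos>" is a placeholder for the tokenizer's real EOS token.
USER_TAG = "<user>"
ASSISTANT_TAG = "<assistant>"
EOS_TAG = "<eos>"


def format_sft_example(instruction: str, response: str) -> str:
    """Render a full training example: prompt, response, end-of-sequence."""
    return f"{USER_TAG}{instruction}{ASSISTANT_TAG}{response}{EOS_TAG}"


def format_prompt(instruction: str) -> str:
    """Render an inference prompt that stops right after the assistant tag."""
    return f"{USER_TAG}{instruction}{ASSISTANT_TAG}"


example = format_sft_example("Name a primary color.", "Red is a primary color.")
print(example)
```

At inference time only `format_prompt` would be used, and generation continues from the assistant tag until the end-of-sequence token is produced.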
SFT Data Mixture (~97k examples total):
- HuggingFaceTB/everyday-conversations-llama3.1-2k
- databricks/databricks-dolly-15k
- HuggingFaceH4/no_robots
- teknium/GPT4-LLM-Cleaned
- Magpie-Align/Magpie-Pro-300K-Filtered
- Dahoas/synthetic-instruct-gptj-pairwise
Training Details
Hardware
- GPUs: 8 × NVIDIA A100 80GB (data parallel)
- Precision: bfloat16 with gradient clipping (max_norm = 1.0)
Pretraining
Objective: Cross-entropy loss on next-token prediction
Optimizer: AdamW
- β₁ = 0.9
- β₂ = 0.95
- Weight decay applied to non-norm/non-bias parameters
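One common way to apply weight decay only to non-norm, non-bias parameters is to split parameters into two optimizer groups. The sketch below shows that selection logic in plain Python; the parameter names, the name-based matching rule, and the decay value of 0.1 are all illustrative assumptions, since the card does not give the decay coefficient or the model's parameter naming.

```python
# Sketch of decay / no-decay grouping. Names, shapes and the 0.1 decay
# value are hypothetical; only the grouping rule comes from the text above.
def split_decay_groups(named_shapes, weight_decay=0.1):
    """Return two optimizer-style groups: decayed and non-decayed names."""
    decay, no_decay = [], []
    for name, shape in named_shapes:
        is_bias = name.endswith(".bias")
        is_norm = "norm" in name.lower()
        # 1-D tensors (norm gains, biases) are excluded from weight decay.
        if is_bias or is_norm or len(shape) < 2:
            no_decay.append(name)
        else:
            decay.append(name)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]


groups = split_decay_groups([
    ("blocks.0.attn.q_proj.weight", (1536, 1536)),
    ("blocks.0.ffn.up.weight", (6144, 1536)),
    ("blocks.0.norm1.weight", (1536,)),
    ("blocks.0.ffn.up.bias", (6144,)),
])
print(groups)
```

With a real model, the same rule would be applied to the actual parameter tensors and the two groups passed to AdamW.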
Learning Rate Schedule:
- Peak LR: ~5e-4
- Polynomial decay to 5e-6
- Warmup: ~5% of total steps
Instruction fine-tuning (SFT)
Objective: Cross-entropy loss on next-token prediction
Optimizer: AdamW
- β₁ = 0.9
- β₂ = 0.95
- Weight decay applied to non-norm/non-bias parameters
Learning Rate Schedule:
- Peak LR: 8e-5
- Polynomial decay to 1e-5
- Warmup: 10% of total steps
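Both learning rate schedules described above (warmup followed by polynomial decay to a floor) can be sketched with a single function. The warmup shape (linear), the polynomial power (1.0, i.e. linear decay), and the step count are assumptions chosen for illustration; the card only gives the peak rate, the final rate, and the warmup fraction.

```python
# Sketch of warmup + polynomial decay. Linear warmup, power=1.0 and the
# total step count are assumptions; peak/end LR and warmup fraction are
# the values listed in the card.
def lr_at(step, total_steps, peak_lr, end_lr, warmup_frac, power=1.0):
    """Learning rate at a given optimizer step (0-indexed)."""
    warmup_steps = max(1, int(warmup_frac * total_steps))
    if step < warmup_steps:
        # Ramp linearly up to the peak over the warmup steps.
        return peak_lr * (step + 1) / warmup_steps
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    # Polynomial interpolation from peak_lr down to end_lr.
    return end_lr + (peak_lr - end_lr) * (1.0 - progress) ** power


total_steps = 10_000  # hypothetical step count
pretrain = dict(peak_lr=5e-4, end_lr=5e-6, warmup_frac=0.05)
sft = dict(peak_lr=8e-5, end_lr=1e-5, warmup_frac=0.10)

for step in (0, 500, 5_000, 10_000):
    print(step, lr_at(step, total_steps, **pretrain), lr_at(step, total_steps, **sft))
```

A different `power` changes only the curvature of the decay; the endpoints stay at the peak and final rates.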
3. Evaluation Benchmarks
The following charts comparing LIME-1B against other models across 8 standard evaluation tasks can be viewed here: 
Usage
# Example usage
# pip install ukraine==0.2.0
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
model_name = "anarlavrenov/lime-1b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
# trust_remote_code=True is needed because the model ships custom modeling code.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
def build_prompt(question):
    # Wrap the question in the <user> ... <assistant> schema used for SFT.
    uid = "<user>"
    aid = "<assistant>"
    return uid + question + aid
question = "Write five questions for a Data Scientist interview."
prompt = build_prompt(question)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# Remember the prompt length so only newly generated tokens are decoded.
input_length = inputs['input_ids'].shape[1]
# Deterministic beam search with repetition controls (sampling disabled).
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    num_beams=4,
    early_stopping=True,
    repetition_penalty=1.15,
    no_repeat_ngram_size=3,
    min_new_tokens=16,
    do_sample=False,
    top_p=None,
    temperature=None,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
)
# Drop the prompt tokens and decode the completion.
generated_tokens = outputs[0][input_length:]
output = tokenizer.decode(generated_tokens, skip_special_tokens=True)
print(output)
# 1. Can you tell us about your experience with data analysis and modeling?
# 2. How do you approach data cleaning and preprocessing?
# 3. How do you approach data visualization and storytelling?
# 4. Can you walk us through a time when you used data to solve a problem?
# 5. How do you approach the ethical considerations of data science and machine learning?
If you use LIME-1B in academic work or public products, please consider citing the model and the underlying datasets according to their respective licenses and documentation.
Anar Lavrenov
Feel free to reach out with questions or feedback about LIME-1B!
Citation
@misc{lime1b2025,
title = {LIME-1B: A 1B-parameter English Causal Language Model},
author = {Anar Lavrenov},
year = {2025},
howpublished = {\url{https://huggingface.co/anarlavrenov/LIME-1B}}
}
