How to use anarlavrenov/lime-1b-instruct with libraries and local apps.

Libraries

How to use anarlavrenov/lime-1b-instruct with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="anarlavrenov/lime-1b-instruct", trust_remote_code=True)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("anarlavrenov/lime-1b-instruct", trust_remote_code=True, dtype="auto")

Local Apps

vLLM

How to use anarlavrenov/lime-1b-instruct with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "anarlavrenov/lime-1b-instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "anarlavrenov/lime-1b-instruct",
		"prompt": "Once upon a time,",
		"max_tokens": 256,
		"temperature": 0.5
	}'
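The same request can be sent from Python. This sketch uses only the standard library (`urllib.request`, `json`). It makes three assumptions: the server from the previous step is listening on `localhost:8000` (pass `base_url="http://localhost:30000"` for the SGLang server below), wrapping the prompt in the `<user>`/`<assistant>` template from the Usage section suits this instruction-tuned checkpoint, and `max_tokens` should stay well under 512 because the card lists a 512-token maximum sequence length for prompt plus completion. `make_completion_request` and `complete` are hypothetical helper names.

```python
import json
import urllib.request

MODEL_ID = "anarlavrenov/lime-1b-instruct"


def make_completion_request(question: str,
                            base_url: str = "http://localhost:8000",
                            max_tokens: int = 256,
                            temperature: float = 0.5) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/completions endpoint."""
    # Assumption: the served model expects the same <user>/<assistant> template
    # that build_prompt() uses in the Usage section.
    prompt = "<user>" + question + "<assistant>"
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,  # prompt + completion must fit in 512 tokens
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(question: str, **kwargs) -> str:
    """Send the request and return the generated text (needs a running server)."""
    request = make_completion_request(question, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-style completion responses carry the text in choices[0]["text"].
    return body["choices"][0]["text"]
```

With a server running, `complete("Explain what a decoder-only Transformer is.")` returns the generated answer as a string.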

Use Docker

docker model run hf.co/anarlavrenov/lime-1b-instruct

SGLang

How to use anarlavrenov/lime-1b-instruct with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "anarlavrenov/lime-1b-instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "anarlavrenov/lime-1b-instruct",
		"prompt": "Once upon a time,",
		"max_tokens": 256,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "anarlavrenov/lime-1b-instruct" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "anarlavrenov/lime-1b-instruct",
		"prompt": "Once upon a time,",
		"max_tokens": 256,
		"temperature": 0.5
	}'


LIME-1B Model Card

Note: This model serves as proof that a single individual, without any team or institutional backing, can develop a small language model (SLM) with competitive results. LIME-1B was trained for only ~$1,000, yet it delivers quality approaching that of models trained with hundreds of thousands of dollars of compute, demonstrating exceptional training efficiency.

LIME-1B

LIME-1B is a 1B-parameter, decoder-only Transformer language model trained from scratch on English web data and then instruction-tuned on a curated mixture of assistant-style datasets with and without retrieval context. It is designed as a compact, practical base model for:

Building RAG systems (context + question → answer)
Assistant-style Q&A and task completion
Summarization, explanation, and rewriting tasks in English
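For the RAG use case, the card does not document how retrieval context was laid out in the SFT data, so the sketch below is an assumption rather than the training format. It places the context before the question inside the `<user>` turn, using the same template as `build_prompt` in the Usage section. `build_rag_prompt` and the `Context:`/`Question:` labels are hypothetical.

```python
def build_rag_prompt(context: str, question: str) -> str:
    """Pack retrieved context and a question into a single user turn.

    Assumption: the "Context:"/"Question:" layout is illustrative; the card does
    not specify the format used for retrieval-augmented SFT examples.
    """
    user_turn = f"Context:\n{context.strip()}\n\nQuestion: {question.strip()}"
    return "<user>" + user_turn + "<assistant>"


prompt = build_rag_prompt(
    "LIME-1B is a 1B-parameter decoder-only Transformer.",
    "How many parameters does LIME-1B have?",
)
print(prompt)
```

Because the maximum sequence length is 512 tokens, the context, the question, and the generated answer together must fit in that window, so long documents need to be chunked or truncated before they reach the prompt.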

⚠️ LIME-1B is not RLHF/DPO-aligned and does not have tool use or multi-turn chat training baked in. It is an instruction-tuned LM, not a fully aligned assistant like ChatGPT.

1. Model architecture

LIME-1B is a decoder-only Transformer with several quality-oriented design choices:

| Component | Value |
|---|---|
| Architecture | Decoder-only Transformer |
| Parameters | 1.0B |
| Layers (decoder blocks) | 32 |
| d_model | 1536 |
| FFN dimension (d_ff) | 6144 |
| Attention heads | 24 |
| Vocabulary size | 50,000 |
| Max sequence length | 512 tokens |
| Positional encoding | Sinusoidal |
| Norm | RMSNorm |
| FFN | SiLU MLP |
| Attention | FlashAttention |
| Embedding tying | Output head tied to input embedding |
| Precision (training) | Mixed fp32/bf16 (autocast) + gradient clipping |
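As a sanity check, the table's hyperparameters can be turned into a rough parameter count. This sketch makes several assumptions that the card does not spell out: bias-free Q/K/V/output projections, a two-matrix (non-gated) SiLU MLP, two RMSNorm weight vectors per block plus a final norm, no learned positional parameters (the encoding is sinusoidal), and one embedding matrix shared with the output head. `LimeConfig` and `estimate_params` are illustrative names, not the model's actual code.

```python
from dataclasses import dataclass


@dataclass
class LimeConfig:
    n_layers: int = 32
    d_model: int = 1536
    d_ff: int = 6144
    n_heads: int = 24
    vocab_size: int = 50_000


def estimate_params(cfg: LimeConfig) -> int:
    attn = 4 * cfg.d_model * cfg.d_model      # Q, K, V, O projections (assumed bias-free)
    mlp = 2 * cfg.d_model * cfg.d_ff          # up + down projections (assumed non-gated)
    norms = 2 * cfg.d_model                   # two RMSNorm gain vectors per block
    per_block = attn + mlp + norms
    embedding = cfg.vocab_size * cfg.d_model  # counted once: the output head is tied
    final_norm = cfg.d_model
    # Sinusoidal positions add no trainable parameters.
    return cfg.n_layers * per_block + embedding + final_norm


cfg = LimeConfig()
head_dim = cfg.d_model // cfg.n_heads
total = estimate_params(cfg)
print(f"head_dim = {head_dim}")  # head_dim = 64
print(f"params   = {total:,}")   # params   = 982,869,504
```

That is about 0.98B, which rounds to the listed 1.0B. A gated (SwiGLU-style) MLP with the same d_ff would add a third 1536 × 6144 matrix per block and land near 1.28B, so under these assumptions the listed size fits a plain two-matrix MLP.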

2. Training data

2.1 Pretraining

The base model is pretrained as a standard causal language model on English web data:

Corpus: FineWeb-Edu (CC-MAIN-2025-05 split)
Language filter: English-only subset
Objective: next-token prediction (causal LM)
Token budget: 20B tokens
Context length: 512 tokens
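The pretraining numbers imply a few derived quantities. The sketch below computes them, taking the nominal 1.0B parameters at face value, assuming training sequences are packed to the full 512 tokens, and using the common `6 * N * D` rule of thumb for dense-Transformer training FLOPs. That FLOP figure is an approximation, not a number reported by the card.

```python
TOKEN_BUDGET = 20_000_000_000  # 20B training tokens (from the card)
CONTEXT_LEN = 512              # tokens per training sequence
N_PARAMS = 1_000_000_000       # nominal 1.0B parameters

# Number of full 512-token sequences in the budget (assumes packing).
num_sequences = TOKEN_BUDGET // CONTEXT_LEN

# Tokens seen per parameter.
tokens_per_param = TOKEN_BUDGET / N_PARAMS

# Rough training compute: ~6 FLOPs per parameter per token (forward + backward).
train_flops = 6 * N_PARAMS * TOKEN_BUDGET

print(f"{num_sequences:,} sequences")          # 39,062,500 sequences
print(f"{tokens_per_param:.0f} tokens/param")  # 20 tokens/param
print(f"{train_flops:.2e} FLOPs")              # 1.20e+20 FLOPs
```

Twenty tokens per parameter is roughly the compute-optimal ratio reported by the Chinchilla scaling study.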

2.2 Instruction fine-tuning (SFT)

After pretraining, the model is fine-tuned on a unified instruction schema:

<user> instruction_text <assistant> response_text <eos>
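One SFT example in this schema can be assembled as shown below. The card does not say whether prompt tokens are excluded from the loss. This sketch assumes the common practice of masking them with `-100` (the default `ignore_index` of PyTorch's cross-entropy), so that only the response and `<eos>` are trained on. It also assumes no spaces around the role markers, matching `build_prompt` in the Usage section. `format_sft_text`, `build_sft_example`, the token ids, and the `</s>` EOS string are hypothetical.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.functional.cross_entropy


def format_sft_text(instruction: str, response: str, eos_token: str) -> str:
    """Render <user> instruction <assistant> response <eos> as one string."""
    return "<user>" + instruction + "<assistant>" + response + eos_token


def build_sft_example(prompt_ids, response_ids, eos_id, mask_prompt=True):
    """Return (input_ids, labels) for one example.

    prompt_ids should already cover "<user>...<assistant>". Labels stay aligned
    with input_ids, assuming the model shifts them internally as Hugging Face
    causal-LM heads do.
    """
    input_ids = list(prompt_ids) + list(response_ids) + [eos_id]
    if mask_prompt:
        labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids) + [eos_id]
    else:
        labels = list(input_ids)
    return input_ids, labels


text = format_sft_text("Name a prime number.", "7", eos_token="</s>")
ids, labels = build_sft_example([11, 12, 13], [21, 22], eos_id=2)
print(text)    # <user>Name a prime number.<assistant>7</s>
print(labels)  # [-100, -100, -100, 21, 22, 2]
```

With a 512-token context, each formatted example (prompt plus response plus `<eos>`) must be truncated or filtered to at most 512 tokens.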

SFT Data Mixture (~97k examples total):

Training Details

Hardware

GPUs: 8 × NVIDIA A100 80GB (data parallel)
Precision: bfloat16 with gradient clipping (max_norm = 1.0)
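The precision setup above (bf16 compute with gradient clipping at `max_norm = 1.0`) corresponds to a standard PyTorch training step. The sketch below uses `torch.autocast` and `torch.nn.utils.clip_grad_norm_` with a tiny stand-in model instead of LIME-1B. It is not the author's training script, and it falls back to CPU autocast when no GPU is present.

```python
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"
torch.manual_seed(0)

# Tiny stand-in network; weights and optimizer state stay in fp32.
model = nn.Sequential(nn.Linear(32, 64), nn.SiLU(), nn.Linear(64, 10)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4, betas=(0.9, 0.95))


def train_step(x: torch.Tensor, y: torch.Tensor) -> tuple[float, float]:
    x, y = x.to(device), y.to(device)
    optimizer.zero_grad(set_to_none=True)
    # Autocast runs eligible ops (e.g. matmuls) in bf16; parameters remain fp32.
    with torch.autocast(device_type=device, dtype=torch.bfloat16):
        logits = model(x)
    loss = nn.functional.cross_entropy(logits.float(), y)  # loss in fp32
    loss.backward()
    # Rescale gradients so their global L2 norm is at most 1.0;
    # the returned value is the norm before clipping.
    grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    return loss.item(), grad_norm.item()


loss, grad_norm = train_step(torch.randn(8, 32), torch.randint(0, 10, (8,)))
print(f"loss={loss:.4f} grad_norm={grad_norm:.4f}")
```

Under `DistributedDataParallel` (the card's 8-GPU data-parallel setup), gradients are already all-reduced when `backward()` returns, so clipping at this point operates on the global gradient.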

Pretraining

Objective: Cross-entropy loss on next-token prediction

Optimizer: AdamW

β₁ = 0.9
β₂ = 0.95
Weight decay applied to non-norm/non-bias parameters
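A common way to apply weight decay only to non-norm, non-bias parameters is to split parameters into two AdamW groups by tensor rank, because biases and norm gains are 1-D. The sketch below assumes that heuristic. It uses a small stand-in module (`nn.LayerNorm` in place of RMSNorm so it also runs on older PyTorch versions) and `weight_decay=0.1` as a placeholder, since the card does not state the decay value. `build_param_groups` is a hypothetical helper.

```python
import torch
from torch import nn


def build_param_groups(model: nn.Module, weight_decay: float):
    """Split parameters into decayed (matrices) and non-decayed (1-D) groups."""
    decay, no_decay = [], []
    # named_parameters() yields a tied parameter (e.g. shared embedding) only once.
    for _, param in model.named_parameters():
        if not param.requires_grad:
            continue
        if param.ndim < 2:  # biases and norm weights
            no_decay.append(param)
        else:               # linear / embedding weight matrices
            decay.append(param)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]


# Stand-in block: two linear layers plus a norm layer.
model = nn.Sequential(
    nn.Linear(64, 256), nn.SiLU(), nn.Linear(256, 64), nn.LayerNorm(64)
)
optimizer = torch.optim.AdamW(
    build_param_groups(model, weight_decay=0.1),  # 0.1 is a placeholder value
    lr=5e-4,
    betas=(0.9, 0.95),
)
print([len(g["params"]) for g in optimizer.param_groups])  # [2, 4]
```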

Learning Rate Schedule:

Peak LR: ~5e-4
Polynomial decay to 5e-6
Warmup: ~5% of total steps
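The pretraining schedule (linear warmup over ~5% of steps to a peak of ~5e-4, then polynomial decay to 5e-6) can be written as a plain function of the step. The card does not give the polynomial power, so this sketch assumes `power=1.0` (linear decay), which is also the default of `transformers.get_polynomial_decay_schedule_with_warmup`. `lr_at` and the 10,000-step total are illustrative.

```python
def lr_at(step: int, total_steps: int, peak_lr: float = 5e-4, end_lr: float = 5e-6,
          warmup_frac: float = 0.05, power: float = 1.0) -> float:
    """Linear warmup to peak_lr, then polynomial decay to end_lr."""
    warmup_steps = max(1, round(warmup_frac * total_steps))
    if step < warmup_steps:
        # Linear ramp from 0 to peak_lr.
        return peak_lr * step / warmup_steps
    if step >= total_steps:
        return end_lr
    # Fraction of the decay phase still remaining (1.0 -> 0.0).
    remaining = 1.0 - (step - warmup_steps) / (total_steps - warmup_steps)
    return (peak_lr - end_lr) * remaining ** power + end_lr


total = 10_000  # hypothetical number of optimizer steps
for step in (0, 250, 500, 5_250, 10_000):
    print(step, f"{lr_at(step, total):.3e}")
```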

Instruction fine-tuning (SFT)

Objective: Cross-entropy loss on next-token prediction

Optimizer: AdamW

β₁ = 0.9
β₂ = 0.95
Weight decay applied to non-norm/non-bias parameters

Learning Rate Schedule:

Peak LR: 8e-5
Polynomial decay to 1e-5
Warmup: 10% of total steps
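For the SFT stage, the built-in `get_polynomial_decay_schedule_with_warmup` from `transformers` produces the same shape: warmup over 10% of steps to the optimizer's `lr` (8e-5 here), then polynomial decay to `lr_end=1e-5`. The sketch assumes `power=1.0` (the function's default; the card does not state it), a hypothetical 1,000-step run, and a stand-in `nn.Linear` model.

```python
import torch
from torch import nn
from transformers import get_polynomial_decay_schedule_with_warmup

model = nn.Linear(16, 16)          # stand-in for the fine-tuned model
total_steps = 1_000                # hypothetical SFT length
warmup_steps = total_steps // 10   # 10% warmup

optimizer = torch.optim.AdamW(model.parameters(), lr=8e-5, betas=(0.9, 0.95))
scheduler = get_polynomial_decay_schedule_with_warmup(
    optimizer,
    num_warmup_steps=warmup_steps,
    num_training_steps=total_steps,
    lr_end=1e-5,
    power=1.0,
)

lrs = []
for _ in range(total_steps):
    lrs.append(scheduler.get_last_lr()[0])
    # ... forward pass and loss.backward() would go here ...
    optimizer.step()  # no-op here because no gradients were computed
    scheduler.step()
lrs.append(scheduler.get_last_lr()[0])

print(f"start={lrs[0]:.1e} peak={max(lrs):.1e} end={lrs[-1]:.1e}")
```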

3. Evaluation Benchmarks

The following charts comparing LIME-1B against other models across 8 standard evaluation tasks can be viewed here:

Usage

# Example usage
# pip install ukraine==0.2.0

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "anarlavrenov/lime-1b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

def build_prompt(question):
    uid = "<user>"
    aid = "<assistant>"
    return uid + question + aid

question = "Write five questions for a Data Scientist interview."
prompt = build_prompt(question)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
input_length = inputs['input_ids'].shape[1]

outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    num_beams=4,
    early_stopping=True,
    repetition_penalty=1.15,
    no_repeat_ngram_size=3,
    min_new_tokens=16,
    do_sample=False,
    top_p=None,
    temperature=None,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
)

generated_tokens = outputs[0][input_length:]
output = tokenizer.decode(generated_tokens, skip_special_tokens=True)

print(output)

# 1. Can you tell us about your experience with data analysis and modeling? 
# 2. How do you approach data cleaning and preprocessing? 
# 3. How do you approach data visualization and storytelling? 
# 4. Can you walk us through a time when you used data to solve a problem? 
# 5. How do you approach the ethical considerations of data science and machine learning?

If you use LIME-1B in academic work or public products, please consider citing the model and the underlying datasets according to their respective licenses and documentation.

Anar Lavrenov

Feel free to reach out with questions or feedback about LIME-1B!

Citation

@misc{lime1b2025,
  title         = {LIME-1B: A 1B-parameter English Causal Language Model},
  author        = {Anar Lavrenov},
  year          = {2025},
  howpublished  = {\url{https://huggingface.co/anarlavrenov/LIME-1B}}
}


Weights: Safetensors, 1.0B params, tensor type F32
