Instructions to use AtlaAI/Selene-1-Mini-Llama-3.1-8B with libraries, inference providers, notebooks, and local apps.

Libraries

How to use AtlaAI/Selene-1-Mini-Llama-3.1-8B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="AtlaAI/Selene-1-Mini-Llama-3.1-8B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
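When the input is a list of chat messages, the text-generation pipeline returns one dict per input. In recent Transformers versions its `"generated_text"` field holds the whole conversation, with the model's reply appended as the last message. The helper below is a minimal sketch that assumes that output shape. `last_reply` is a hypothetical name, not part of Transformers, and the example result is hand-written so it runs without downloading the model.

```python
from typing import Any


def last_reply(pipe_output: list[dict[str, Any]]) -> str:
    """Return the assistant's reply from a chat-style pipeline result.

    Assumes the shape [{"generated_text": [..., {"role": "assistant", "content": ...}]}].
    """
    conversation = pipe_output[0]["generated_text"]
    if isinstance(conversation, str):
        # A plain-string prompt yields plain-string output; return it unchanged.
        return conversation
    reply = conversation[-1]
    if reply.get("role") != "assistant":
        raise ValueError("last message is not an assistant reply")
    return reply["content"]


# Hand-written result in the assumed shape (placeholder text, not real model output).
example = [{"generated_text": [
    {"role": "user", "content": "Who are you?"},
    {"role": "assistant", "content": "<model reply>"},
]}]
print(last_reply(example))  # → <model reply>
```

With the real pipeline, `print(last_reply(pipe(messages)))` would print only the model's answer instead of the full conversation.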

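# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "AtlaAI/Selene-1-Mini-Llama-3.1-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# torch_dtype="auto" keeps the checkpoint's stored precision instead of upcasting to float32
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Format the conversation with the model's chat template and tokenize it in one step
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant header so the model answers
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# generate() returns prompt + continuation; decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))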
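For a decoder-only model, `generate` returns the prompt tokens followed by the new tokens, which is why the code above slices from `inputs["input_ids"].shape[-1]` before decoding. The sketch below shows the same slicing on plain Python lists so it runs without a model. The token IDs are made up for illustration, and `split_generation` is a hypothetical helper that assumes a single, unpadded prompt.

```python
def split_generation(sequence: list[int], prompt_len: int) -> tuple[list[int], list[int]]:
    """Split one generated sequence into (prompt tokens, new tokens).

    Mirrors outputs[0][inputs["input_ids"].shape[-1]:] for a single,
    unpadded prompt.
    """
    if prompt_len > len(sequence):
        raise ValueError("prompt is longer than the generated sequence")
    return sequence[:prompt_len], sequence[prompt_len:]


prompt_ids = [101, 2040, 2024, 2017]        # illustrative IDs only
generated = prompt_ids + [1045, 2572, 102]  # prompt echoed back, then new tokens
prompt_part, new_part = split_generation(generated, len(prompt_ids))
print(new_part)  # → [1045, 2572, 102]
```

Decoding the full `outputs[0]` instead would print the chat-template prompt before the answer.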


vLLM

How to use AtlaAI/Selene-1-Mini-Llama-3.1-8B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "AtlaAI/Selene-1-Mini-Llama-3.1-8B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AtlaAI/Selene-1-Mini-Llama-3.1-8B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
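Because the server exposes an OpenAI-compatible chat API, any HTTP client can stand in for curl. Below is a minimal sketch using only the Python standard library. It assumes the default `http://localhost:8000` address from the curl example and the standard OpenAI chat-completion response layout (`choices[0].message.content`). `build_chat_request`, `extract_content`, and `chat` are hypothetical helpers, not part of vLLM.

```python
import json
import urllib.request
from typing import Any

BASE_URL = "http://localhost:8000/v1"  # default vLLM address, as in the curl call
MODEL = "AtlaAI/Selene-1-Mini-Llama-3.1-8B"


def build_chat_request(messages: list[dict[str, str]], base_url: str = BASE_URL,
                       model: str = MODEL) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /chat/completions route."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_content(response: dict[str, Any]) -> str:
    """Pull the first choice's message text out of a chat-completion response."""
    return response["choices"][0]["message"]["content"]


def chat(messages: list[dict[str, str]], base_url: str = BASE_URL) -> str:
    """Send the request to a running server and return the reply text."""
    request = build_chat_request(messages, base_url)
    with urllib.request.urlopen(request, timeout=120) as resp:
        return extract_content(json.load(resp))
```

With `vllm serve` running, `chat([{"role": "user", "content": "What is the capital of France?"}])` should return the reply text. The same helpers should also work against the SGLang server below by passing `base_url="http://localhost:30000/v1"`.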

Use Docker

docker model run hf.co/AtlaAI/Selene-1-Mini-Llama-3.1-8B

SGLang

How to use AtlaAI/Selene-1-Mini-Llama-3.1-8B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "AtlaAI/Selene-1-Mini-Llama-3.1-8B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AtlaAI/Selene-1-Mini-Llama-3.1-8B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
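Loading an 8B model takes a while, so a request sent immediately after `launch_server` may be refused. Below is a minimal sketch of waiting for the server first. It assumes the server answers `GET /health` with HTTP 200 once it is ready; check that route against your SGLang version. `http_status` and `wait_until_ready` are hypothetical helpers, and the `fetch_status` and `sleep` parameters exist only so the logic can be exercised without a live server.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str) -> int:
    """Return the HTTP status of a GET request, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # server is up but returned an error code
        return err.code
    except OSError:  # connection refused, DNS failure, timeout (URLError is an OSError)
        return 0


def wait_until_ready(base_url: str = "http://localhost:30000",
                     max_attempts: int = 120, interval_s: float = 5.0,
                     fetch_status: Callable[[str], int] = http_status,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll {base_url}/health until it returns 200 or the attempts run out."""
    health_url = f"{base_url}/health"
    for attempt in range(max_attempts):
        if fetch_status(health_url) == 200:
            return True
        if attempt < max_attempts - 1:  # no pointless sleep after the last try
            sleep(interval_s)
    return False
```

A script would call `wait_until_ready()` after starting the server and only send the chat request from the curl example once it returns `True`.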

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "AtlaAI/Selene-1-Mini-Llama-3.1-8B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AtlaAI/Selene-1-Mini-Llama-3.1-8B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use AtlaAI/Selene-1-Mini-Llama-3.1-8B with Docker Model Runner:
```
docker model run hf.co/AtlaAI/Selene-1-Mini-Llama-3.1-8B
```

Applying with Ragas/DeepEval evaluation

by tapos999 - opened Jan 30, 2025

Discussion

tapos999

Jan 30, 2025

Hi, I haven't had the chance to try it out yet, but I'm really curious: if I want to use RAGAS/DeepEval for evaluation, will this model work out of the box with their prompts, or do I need to account for anything specific to this model?

mathias-atla

Atla org Jan 30, 2025

Hi @tapos999, glad to hear you are curious!
I think it will just "work out of the box".
You can find a quickstart on the model card.
We also have a cookbook repo with some usage examples.
Let us know what you think, and we're happy to help if you run into any issues!

rex099

Mar 6, 2025

Hey, I'd like to know how much degradation I can expect in the model's judge capabilities if I use a Q4_K_M quant?

spisupat

Atla org Mar 7, 2025

Hey @rex099, great question! We haven't evaluated the Q4_K_M quant specifically, but the 4-bit quant we released recently loses only about 0.5 percentage points on average across benchmarks. We quantised that one using GPTQ, calibrated on our training data!
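The 0.5 figure is in percentage points, an absolute difference between percentage scores, not a relative percentage. Below is a worked sketch of how an average percentage-point drop is computed from paired full-precision and quantized scores. The benchmark names and scores are hypothetical placeholders, not Atla's results, and `mean_pp_drop` is a hypothetical helper.

```python
def mean_pp_drop(full: dict[str, float], quant: dict[str, float]) -> float:
    """Average absolute drop in percentage points over benchmarks present in both runs."""
    shared = full.keys() & quant.keys()
    if not shared:
        raise ValueError("no benchmarks in common")
    return sum(full[name] - quant[name] for name in shared) / len(shared)


# Hypothetical scores in percent (0-100), for illustration only.
full_precision = {"bench_a": 80.0, "bench_b": 70.0, "bench_c": 60.0}
quantized = {"bench_a": 79.4, "bench_b": 69.6, "bench_c": 59.5}
drop_pp = mean_pp_drop(full_precision, quantized)
print(round(drop_pp, 2))  # → 0.5
```

In relative terms, a 0.5-point drop on a score of 70% is 0.5 / 70, or about 0.7% of the original score.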
