Instructions to use GSAI-ML/LLaDA-8B-Instruct with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use GSAI-ML/LLaDA-8B-Instruct with Transformers:

```
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="GSAI-ML/LLaDA-8B-Instruct", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("GSAI-ML/LLaDA-8B-Instruct", trust_remote_code=True, dtype="auto")
```

Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use GSAI-ML/LLaDA-8B-Instruct with vLLM:

Install from pip and serve model

```
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "GSAI-ML/LLaDA-8B-Instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "GSAI-ML/LLaDA-8B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
```
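The same chat request can be sent from Python using only the standard library. This is a minimal sketch, assuming a server is listening on an OpenAI-compatible `/v1/chat/completions` route and returns the usual `choices[0].message.content` field; `build_chat_request`, `extract_reply` and `chat` are hypothetical helper names, not part of vLLM. Building the request needs no server; only `chat` actually opens a connection.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST to an OpenAI-compatible /v1/chat/completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body: dict) -> str:
    """Pull the assistant text out of an OpenAI-style chat completion response."""
    return response_body["choices"][0]["message"]["content"]


def chat(base_url: str, model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the reply text (requires a running server)."""
    req = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return extract_reply(json.loads(resp.read().decode("utf-8")))
```

With the vLLM server running, `chat("http://localhost:8000", "GSAI-ML/LLaDA-8B-Instruct", "What is the capital of France?")` would return the reply text. Because the SGLang server on this page exposes the same OpenAI-compatible route, pointing `base_url` at `http://localhost:30000` should work the same way.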

Use Docker

```
docker model run hf.co/GSAI-ML/LLaDA-8B-Instruct
```

SGLang

How to use GSAI-ML/LLaDA-8B-Instruct with SGLang:

Install from pip and serve model

```
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "GSAI-ML/LLaDA-8B-Instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "GSAI-ML/LLaDA-8B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
```

Use Docker images

```
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "GSAI-ML/LLaDA-8B-Instruct" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "GSAI-ML/LLaDA-8B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
```

Docker Model Runner
How to use GSAI-ML/LLaDA-8B-Instruct with Docker Model Runner:
```
docker model run hf.co/GSAI-ML/LLaDA-8B-Instruct
```

Model performance

by icoicqico - opened Mar 3, 2025

Discussion

icoicqico

Mar 3, 2025 (edited Mar 3, 2025)

I would like to know whether anyone has done some testing on this model. I have tried running a few sample prompts on it, and it does not seem as good as other models such as Llama 3, 3.1, 3.2 and Mistral. I used the same prompt format for all of them. The problem with this model is that its instruction following is poor. My test case is summarizing a document: I gave the model instructions such as what information to summarize, to use bullet points or list items, and what to focus on. The other models' outputs follow the instructions; some are of higher quality than others, but we can at least see each model trying to follow the format and capture the information I requested. LLaDA produced not a single bullet point or list item and did not focus on anything I requested; it looks like it completely ignored my instructions.

Besides that, I asked some coding questions, and it just replied with something like "as an AI model, I can't help you with that", whereas the other models don't have this issue.
Edit: After some prompt adjustments, I was able to get a "-" at the start of sentences, but the structure is still sentences and paragraphs, not the bullet-point list items that other models generate. I wonder whether there is another template format I have to follow to make this work. It would be great if someone who gets good instruction-following results could share how they do it. Thanks.
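To compare format compliance across models in a repeatable way, as in the summarization test above, one rough heuristic is to count how many lines of each reply look like list items. This is a minimal sketch, not an established metric: `count_list_items` and `follows_list_format` are hypothetical helpers, and the markers treated as list items (`-`, `*`, `•`, `1.`, `1)` at the start of a line) are assumptions. Dashes placed inline within a single paragraph count as at most one item, which matches the "dash before sentences, but still a paragraph" outcome described in the edit.

```python
import re

# Assumed list markers: "-", "*", "•", or a number followed by "." or ")",
# at the start of a line and followed by whitespace and some text.
_LIST_ITEM = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+\S")


def count_list_items(text: str) -> int:
    """Count lines of a model reply that look like bullet or numbered list items."""
    return sum(1 for line in text.splitlines() if _LIST_ITEM.match(line))


def follows_list_format(text: str, min_items: int = 2) -> bool:
    """Heuristic check: does the reply contain at least `min_items` list-item lines?"""
    return count_list_items(text) >= min_items
```

Running both helpers over each model's reply to the same prompt gives a crude, comparable number; it says nothing about whether the content matches what was requested, so it complements rather than replaces reading the outputs.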

nieshen

GSAI-ML org Mar 4, 2025

Thank you very much for your attention!

First of all, I would like to ask whether you are using the Base model or the Instruct model. If it is the Instruct model, LLaDA's performance is currently still behind that of LLaMA3. Possible reasons include:
1. LLaDA has not yet used Reinforcement Learning from Human Feedback (RLHF), whereas models such as LLaMA and Mistral have, which gives these autoregressive models stronger instruction-following capabilities.
2. The data quality of our Supervised Fine-Tuning (SFT) is still lacking.

In any case, thank you very much for your reply. We would be extremely grateful if you could share some bad cases (prompts where LLaDA fails), as this will be very helpful for us to improve LLaDA. We also look forward to working together with the community to enhance the performance of LLaDA.

icoicqico

Mar 5, 2025

Thanks for the reply. I guess I will try fine-tuning it with some of my own data.
