Inference Providers documentation
Cerebras
Get Started
Guides
Your First API CallBuilding Your First AI AppStructured Outputs with LLMsFunction CallingResponses API (beta)How to use OpenAI gpt-ossBuild an Image EditorAutomating Code Review with GitHub ActionsAgentic Coding Environments with OpenEnvEvaluating Models with Inspect
Integrations
OverviewAdd Your IntegrationClaude CodeHermes AgentNeMo Data DesignerMacWhisperOpenCodePiVision AgentsVS Code with GitHub Copilot
Inference Tasks
Providers
CerebrasCohereDeepInfraFal AIFeatherless AIFireworksGroqHyperbolicHF InferenceNovitaNscaleOVHcloud AI EndpointsPublic AIReplicateSambaNovaScalewayTogetherWaveSpeedAIZ.ai
Hub APIRegister as an Inference ProviderCerebras
All supported Cerebras models can be found here
Cerebras stands alone as the world’s fastest AI inference and training platform. Organizations across fields like medical research, cryptography, energy, and agentic AI use our CS-2 and CS-3 systems to build on-premise supercomputers, while developers and enterprises everywhere can access the power of Cerebras through our pay-as-you-go cloud offerings.
Supported tasks
Chat Completion (LLM)
Find out more about Chat Completion (LLM) here.
Language
Client
Provider
Copied
import os
from openai import OpenAI
client = OpenAI(
base_url="https://huggingface.co/proxy/router.huggingface.co/v1",
api_key=os.environ["HF_TOKEN"],
)
completion = client.chat.completions.create(
model="zai-org/GLM-4.7:cerebras",
messages=[
{
"role": "user",
"content": "What is the capital of France?"
}
],
)
print(completion.choices[0].message)
