Instructions to use nectec/thai-research-gemma-3-27b-it with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use nectec/thai-research-gemma-3-27b-it with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="nectec/thai-research-gemma-3-27b-it",
    filename="thai-research-gemma-3-27b-it-Q4_K_M.gguf",
)
llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
- Inference
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use nectec/thai-research-gemma-3-27b-it with llama.cpp:
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf nectec/thai-research-gemma-3-27b-it:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf nectec/thai-research-gemma-3-27b-it:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf nectec/thai-research-gemma-3-27b-it:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf nectec/thai-research-gemma-3-27b-it:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf nectec/thai-research-gemma-3-27b-it:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf nectec/thai-research-gemma-3-27b-it:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf nectec/thai-research-gemma-3-27b-it:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf nectec/thai-research-gemma-3-27b-it:Q4_K_M
Use Docker
docker model run hf.co/nectec/thai-research-gemma-3-27b-it:Q4_K_M
- LM Studio
- Jan
- vLLM
How to use nectec/thai-research-gemma-3-27b-it with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "nectec/thai-research-gemma-3-27b-it"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "nectec/thai-research-gemma-3-27b-it",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
Use Docker
docker model run hf.co/nectec/thai-research-gemma-3-27b-it:Q4_K_M
- Ollama
How to use nectec/thai-research-gemma-3-27b-it with Ollama:
ollama run hf.co/nectec/thai-research-gemma-3-27b-it:Q4_K_M
- Unsloth Studio
How to use nectec/thai-research-gemma-3-27b-it with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for nectec/thai-research-gemma-3-27b-it to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for nectec/thai-research-gemma-3-27b-it to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for nectec/thai-research-gemma-3-27b-it to start chatting
- Docker Model Runner
How to use nectec/thai-research-gemma-3-27b-it with Docker Model Runner:
docker model run hf.co/nectec/thai-research-gemma-3-27b-it:Q4_K_M
- Lemonade
How to use nectec/thai-research-gemma-3-27b-it with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull nectec/thai-research-gemma-3-27b-it:Q4_K_M
Run and chat with the model
lemonade run user.thai-research-gemma-3-27b-it-Q4_K_M
List all available models
lemonade list
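The curl call in the vLLM instructions above can also be issued from Python. The following is a minimal sketch using only the standard library; it assumes a server started with `vllm serve` is listening on `localhost:8000` and returns an OpenAI-style response body. The helper names `build_chat_request`, `extract_reply`, and `chat` are illustrative and not part of any library.

```python
import json
import urllib.request

MODEL_ID = "nectec/thai-research-gemma-3-27b-it"
# Assumption: the server from the vLLM example is listening on localhost:8000.
BASE_URL = "http://localhost:8000/v1"


def build_chat_request(prompt, model=MODEL_ID, base_url=BASE_URL):
    """Build (without sending) a POST request for the chat completions endpoint."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body):
    """Return the assistant text from an OpenAI-style chat completion body."""
    return response_body["choices"][0]["message"]["content"]


def chat(prompt):
    """Send the prompt to the running server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        return extract_reply(json.load(resp))
```

With the server running, `print(chat("What is the capital of France?"))` would print the model's reply. The same request shape should apply to `llama-server`, which the instructions above also describe as OpenAI-compatible, although its port may differ.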
Authors: NECTEC
Description
This model, nectec/thai-research-gemma-3-27b-it, also known as Pathumma LLM AI, is a fine-tuned version of Google's Gemma 3 27B model, specialized for the Thai language. It was developed by the National Electronics and Computer Technology Center (NECTEC) of Thailand.
Starting with the google/gemma-3-27b-pt checkpoint, the model underwent continued pre-training on a diverse corpus of approximately 8 billion Thai tokens. Following this, it was instruction fine-tuned on 3,052,736 high-quality Thai question-answer pairs to enhance its ability to follow instructions and engage in conversation.
Inputs and outputs
Input:
- Text string, such as a question, a prompt, or a document to be summarized, primarily in Thai.
- Total input context of 128K tokens
Output:
- Generated text in response to the input, such as an answer to a question, analysis of image content, or a summary of a document, primarily in Thai.
- Total output context of 8192 tokens
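As a rough illustration of these limits, the sketch below checks a prompt against the input context and caps the requested generation length at the output limit. It assumes "128K" means 131,072 tokens and that the two limits apply independently; the card does not state either detail, and `plan_generation` is a hypothetical helper.

```python
MAX_INPUT_TOKENS = 128 * 1024  # assumption: "128K" read as 131,072 tokens
MAX_OUTPUT_TOKENS = 8192       # output limit stated above


def plan_generation(prompt_tokens, requested_new_tokens):
    """Return a max_new_tokens value that respects the stated limits.

    Raises ValueError if the prompt alone exceeds the input context.
    """
    if prompt_tokens > MAX_INPUT_TOKENS:
        raise ValueError(
            f"prompt has {prompt_tokens} tokens; limit is {MAX_INPUT_TOKENS}"
        )
    # Cap the request at the output limit and never return a negative value.
    return max(0, min(requested_new_tokens, MAX_OUTPUT_TOKENS))
```

The returned value could be passed as `max_new_tokens` to a generation call, with `prompt_tokens` taken from the tokenized input length.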
Usage
Below are some code snippets to help you get started quickly with running the model. First, install the necessary libraries.
Running with transformers on a GPU
$ pip install -U transformers accelerate torch
You can use the model with the transformers library as follows. The EOSLogitsBiasProcessor can be helpful if the model has trouble generating the end-of-sequence token.
from transformers import AutoTokenizer, LogitsProcessor, LogitsProcessorList, AutoModelForImageTextToText
import torch

model_id = "nectec/thai-research-gemma-3-27b-it"
model = AutoModelForImageTextToText.from_pretrained(
    model_id, device_map="auto", torch_dtype=torch.bfloat16
).eval()
tokenizer = AutoTokenizer.from_pretrained(model_id)


class EOSLogitsBiasProcessor(LogitsProcessor):
    """Adds a fixed bias to the EOS token's score at every generation step."""

    def __init__(self, tokenizer, eos_token_id, bias_value=5.0, space_bias=-0.3):
        self.eos_token_id = eos_token_id
        self.bias_value = bias_value
        # Stored for reference; __call__ below does not use it.
        self.space_bias = space_bias

    def __call__(self, input_ids, scores):
        scores[:, self.eos_token_id] += self.bias_value
        # Token id 107 and the value 0.3 are hard-coded, as in the original snippet.
        scores[:, 107] += 0.3
        return scores


logits_processor = LogitsProcessorList([
    EOSLogitsBiasProcessor(tokenizer, eos_token_id=tokenizer.eos_token_id, bias_value=10.0)
])

messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are a helpful Thai assistant. You are Pathumma LLM AI that build by National Electronics and Computer Technology Center (NECTEC)."}]
    },
    {
        "role": "user",
        "content": [
            # Thai for "Please give me a som tam (papaya salad) recipe"
            {"type": "text", "text": "ขอสูตรส้มตำหน่อย"}
        ]
    },
]

inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt"
).to(model.device)
input_len = inputs["input_ids"].shape[-1]

with torch.inference_mode():
    generation = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,
        num_beams=2,
        repetition_penalty=1.1,
        temperature=0.4,
        logits_processor=logits_processor,  # optional: add if the model fails to generate the EOS token
    )

# Decode only the newly generated tokens, skipping the prompt
generated_text = tokenizer.decode(generation[0][input_len:], skip_special_tokens=True)
print(generated_text)
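To see why adding a constant to the EOS score makes the model more likely to stop, the following toy sketch applies the same kind of bias to a hypothetical four-token vocabulary and compares the EOS probability under a plain softmax. It ignores temperature, beam search, and the repetition penalty used in the snippet above, so it only illustrates the direction of the effect, not the model's actual probabilities.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def eos_probability(logits, eos_index, bias=0.0):
    """Probability of the EOS token after adding `bias` to its logit."""
    biased = list(logits)
    biased[eos_index] += bias
    return softmax(biased)[eos_index]


# Hypothetical 4-token vocabulary; index 3 plays the role of EOS.
toy_logits = [2.0, 1.0, 0.5, -3.0]
p_before = eos_probability(toy_logits, eos_index=3)
p_after = eos_probability(toy_logits, eos_index=3, bias=10.0)
print(f"P(EOS) without bias: {p_before:.4f}")
print(f"P(EOS) with +10 bias: {p_after:.4f}")
```

Adding a bias b to one logit multiplies that token's odds by e^b, which is why a value such as 10.0 is a strong push toward ending the sequence; smaller values give a gentler nudge.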
Running with llama.cpp
You can also run a quantized version of the model using llama-cpp-python for efficient inference on CPU or GPU.
%pip install llama-cpp-python transformers

import transformers
from llama_cpp import Llama

# Download the GGUF model file
!wget -O './thai-research-gemma-3-27b-it-Q4_K_M.gguf' "https://huggingface.co/nectec/thai-research-gemma-3-27b-it/resolve/main/thai-research-gemma-3-27b-it-Q4_K_M.gguf?download=true"

PROMPT = "You are a helpful Thai assistant. You are Pathumma LLM AI that build by National Electronics and Computer Technology Center (NECTEC)."
llm = Llama(model_path='thai-research-gemma-3-27b-it-Q4_K_M.gguf', n_gpu_layers=-1, n_ctx=4096, verbose=False)
tokenizer = transformers.AutoTokenizer.from_pretrained("nectec/thai-research-gemma-3-27b-it")

# Conversation history, starting with the system prompt
memory = [{'content': PROMPT, 'role': 'system'}]


def generate(instruction, memory=memory):
    # Append the user turn, then render the whole history with the chat template
    memory.append({'content': instruction, 'role': 'user'})
    p = tokenizer.apply_chat_template(
        memory,
        tokenize=False,
        add_generation_prompt=True
    )
    response = llm(
        p,
        max_tokens=4096,
        temperature=0.4,  # creativity: low values answer the question directly; high values are more creative but may be wrong or off-topic
        repeat_penalty=1.1,  # discourages repeated words
        stop=["<end_of_turn>"]
    )
    output = response['choices'][0]['text']
    # Store the reply so later calls see the full conversation
    memory.append({'content': output, 'role': 'assistant'})
    return output


# Thai for "Please give me a recipe for making som tam (papaya salad)"
print(generate("ขอสูตรทำส้มตำ", memory=memory))
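Because `memory` grows with every call while `n_ctx` is fixed at 4096, a long conversation will eventually exceed the context window. One possible way to handle this, sketched below, is to drop the oldest non-system turns before rendering the prompt. `trim_memory` and its `count_tokens` argument are hypothetical helpers, not part of llama-cpp-python or transformers, and the rule that the history should start with a user turn is an assumption about what the chat template expects.

```python
def trim_memory(memory, count_tokens, budget):
    """Drop the oldest non-system turns until the history fits the budget.

    memory: list of {"role", "content"} dicts, as in the snippet above.
    count_tokens: callable returning the token count of a string
        (hypothetical; it could wrap the tokenizer's encode method).
    budget: maximum total number of content tokens to keep.
    Returns a new list; the input list is left unchanged.
    """
    system = [m for m in memory if m["role"] == "system"]
    turns = [m for m in memory if m["role"] != "system"]

    def total(messages):
        return sum(count_tokens(m["content"]) for m in messages)

    # Remove the oldest turns first until the total fits.
    while turns and total(system + turns) > budget:
        turns.pop(0)
    # Assumption: the remaining history should begin with a user turn.
    while turns and turns[0]["role"] != "user":
        turns.pop(0)
    return system + turns
```

A call such as `trim_memory(memory, lambda s: len(tokenizer.encode(s)), budget=3000)` would leave headroom for the reply. The count ignores the extra tokens the chat template adds around each turn, so the budget should be set conservatively.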
Citation
If you use this model, please cite the base Gemma model and acknowledge NECTEC's contribution.
@article{gemma_2025,
title={Gemma 3},
url={https://goo.gle/Gemma3Report},
publisher={Kaggle},
author={Gemma Team},
year={2025}
}