TIGER-Lab/Mantis-Instruct
How to use TIGER-Lab/Mantis-8B-Fuyu with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="TIGER-Lab/Mantis-8B-Fuyu")
# Load model directly
from transformers import AutoProcessor, AutoModelForCausalLM
processor = AutoProcessor.from_pretrained("TIGER-Lab/Mantis-8B-Fuyu")
model = AutoModelForCausalLM.from_pretrained("TIGER-Lab/Mantis-8B-Fuyu")
How to use TIGER-Lab/Mantis-8B-Fuyu with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "TIGER-Lab/Mantis-8B-Fuyu"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "TIGER-Lab/Mantis-8B-Fuyu",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
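The curl call above can also be issued from Python. The following is a minimal sketch using only the standard library. The helper names `build_completion_request` and `complete` are hypothetical, and the `choices[0]["text"]` response shape is assumed from the OpenAI-compatible completions format rather than confirmed by this card.

```python
import json
import urllib.request


def build_completion_request(base_url, model, prompt, max_tokens=512, temperature=0.5):
    """Build a POST request mirroring the curl example (hypothetical helper)."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(base_url, model, prompt, **kwargs):
    """Send the request and return the text of the first completion.

    Assumes the server replies in the OpenAI completions format,
    i.e. {"choices": [{"text": ...}, ...]}.
    """
    request = build_completion_request(base_url, model, prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


# Usage (needs a running server; not executed here):
# complete("http://localhost:8000", "TIGER-Lab/Mantis-8B-Fuyu", "Once upon a time,")
```

The base URL is a parameter so the same helper can target either port used in this card: 8000 in the vLLM example and 30000 in the SGLang examples.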
How to use TIGER-Lab/Mantis-8B-Fuyu with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "TIGER-Lab/Mantis-8B-Fuyu" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "TIGER-Lab/Mantis-8B-Fuyu",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
# Or start the SGLang server with Docker:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "TIGER-Lab/Mantis-8B-Fuyu" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "TIGER-Lab/Mantis-8B-Fuyu",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
How to use TIGER-Lab/Mantis-8B-Fuyu with Docker Model Runner:
docker model run hf.co/TIGER-Lab/Mantis-8B-Fuyu
Paper | Website | GitHub | Models | Demo
# Installs only the minimal packages needed for inference (torch, transformers, accelerate).
pip install git+https://github.com/TIGER-AI-Lab/Mantis.git
from mantis.models.mllava import chat_mllava
from PIL import Image
import torch
image1 = "image1.jpg"
image2 = "image2.jpg"
images = [Image.open(image1), Image.open(image2)]
# load processor and model
from mantis.models.mfuyu import MFuyuForCausalLM, MFuyuProcessor
processor = MFuyuProcessor.from_pretrained("TIGER-Lab/Mantis-8B-Fuyu")
attn_implementation = None # or "flash_attention_2"
model = MFuyuForCausalLM.from_pretrained("TIGER-Lab/Mantis-8B-Fuyu", device_map="cuda", torch_dtype=torch.bfloat16, attn_implementation=attn_implementation)
generation_kwargs = {
"max_new_tokens": 1024,
"num_beams": 1,
"do_sample": False,
"pad_token_id": processor.tokenizer.eos_token_id,
}
# chat
text = "Describe the difference of <image> and <image> as much as you can."
response, history = chat_mllava(text, images, model, processor, **generation_kwargs)
print("USER: ", text)
print("ASSISTANT: ", response)
text = "How many wallets are there in image 1 and image 2 respectively?"
response, history = chat_mllava(text, images, model, processor, history=history, **generation_kwargs)
print("USER: ", text)
print("ASSISTANT: ", response)
"""
USER: Describe the difference of <image> and <image> as much as you can.
ASSISTANT: The second image has more variety in terms of colors and designs. While the first image only shows two brown leather pouches, the second image features four different pouches in various colors and designs, including a purple one with a gold coin, a red one with a gold coin, a black one with a gold coin, and a brown one with a gold coin. This variety makes the second image more visually interesting and dynamic.
USER: How many wallets are there in image 1 and image 2 respectively?
ASSISTANT: There are two wallets in image 1, and four wallets in image 2.
"""
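The two-turn exchange above passes the `history` returned by one call into the next. A minimal sketch of that pattern as a loop follows. `run_dialogue` is a hypothetical helper, not part of the Mantis package, and it assumes only what the example shows: the chat function returns `(response, history)` and accepts a `history` keyword on follow-up turns.

```python
def run_dialogue(questions, images, model, processor, chat_fn, **generation_kwargs):
    """Ask each question in order, feeding the returned history into the next call.

    chat_fn is assumed to behave like chat_mllava in the example above.
    Returns the list of (question, response) pairs and the final history.
    """
    history = None
    transcript = []
    for turn, text in enumerate(questions):
        if turn == 0:
            # First turn: no history keyword, matching the first call in the example.
            response, history = chat_fn(text, images, model, processor, **generation_kwargs)
        else:
            # Later turns: pass along the history returned by the previous call.
            response, history = chat_fn(
                text, images, model, processor, history=history, **generation_kwargs
            )
        transcript.append((text, response))
    return transcript, history
```

With the model and processor loaded as above, the same conversation could be run as `run_dialogue([question_1, question_2], images, model, processor, chat_mllava, **generation_kwargs)`. Taking the chat function as an argument keeps the helper testable without loading the model.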
Training: see mantis/train in the Mantis GitHub repository for details.
Evaluation: see mantis/benchmark in the Mantis GitHub repository for details.
@inproceedings{Jiang2024MANTISIM,
title={MANTIS: Interleaved Multi-Image Instruction Tuning},
author={Dongfu Jiang and Xuan He and Huaye Zeng and Cong Wei and Max W.F. Ku and Qian Liu and Wenhu Chen},
publisher={arXiv2405.01483},
year={2024},
}
Base model
adept/fuyu-8b