Lemur 🦥
Lemur is a chatbot model based on the LLaMA model, further fine-tuned using LoRA on several openly available datasets (Alpaca-GPT4, Baize, ShareGPT and Vicuna-Dummy-Conversation).
Note: Lemur is specifically designed and trained for the exclusive purpose of our final project for CSE 256 Statistical Natural Lang Processing at UCSD. It is not intended for any commercial usage or widespread deployment. Testing and exploration are permitted, but we request that you limit your use to this purpose only. Please respect these guidelines to maintain the integrity of our project and its intended use.
Warning: The usage of ShareGPT data might pose copyright issues, which are currently under dispute. If you are considering the use of this model, we strongly advise you to proceed with caution.
Demo on 🤗 Hugging Face Spaces
https://huggingface.co/spaces/tianyang/lemur-7B
The demo runs on a free 🤗 Hugging Face Spaces CPU instance with 16 GB of RAM, so inference is very slow: about 30 s to process a single question, then about 1 s per subsequent token.
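To get a feel for what these figures mean in practice, here is a rough back-of-the-envelope estimate. It assumes a simple linear model (a fixed cost to process the question plus a constant cost per generated token); the constants are the approximate numbers quoted above, and the function name `estimate_latency` is only illustrative.

```python
# Approximate figures quoted for the free CPU demo; real timings will vary.
PROMPT_SECONDS = 30.0      # time to process a single question
SECONDS_PER_TOKEN = 1.0    # time per subsequent generated token


def estimate_latency(new_tokens: int) -> float:
    """Estimate total response time in seconds for a reply of `new_tokens` tokens."""
    if new_tokens < 0:
        raise ValueError("new_tokens must be non-negative")
    return PROMPT_SECONDS + SECONDS_PER_TOKEN * new_tokens


print(estimate_latency(100))  # 130.0
```

Under this assumption, a 100-token answer takes a little over two minutes on the demo, so short questions and short answers are the practical choice there.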
Format
MSG = "Hi, how are you?"
prompt = f"""A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.
[Human]: {MSG}
[AI]:"""
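The format above can be wrapped in small helpers, sketched below. `build_prompt` reproduces the template exactly as shown. `extract_reply` and `generate_reply` are hypothetical helpers, not part of the project: they assume the model is loadable through the Transformers `text-generation` pipeline under the id `tianyang/lemur-7B`, and that a reply ends where the model starts a new `[Human]:` turn.

```python
SYSTEM = (
    "A chat between a curious human and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the human's questions."
)


def build_prompt(msg: str) -> str:
    """Wrap a single user message in the prompt format shown above."""
    return f"{SYSTEM}\n[Human]: {msg}\n[AI]:"


def extract_reply(generated: str, prompt: str) -> str:
    """Strip the prompt from generated text and return only the reply.

    Assumption: the reply ends where the model begins a new "[Human]:" turn.
    """
    reply = generated[len(prompt):] if generated.startswith(prompt) else generated
    return reply.split("[Human]:", 1)[0].strip()


def generate_reply(msg: str, max_new_tokens: int = 128) -> str:
    """Hypothetical helper: generate a reply with the Transformers pipeline."""
    # Imported lazily so the prompt helpers work without transformers installed.
    from transformers import pipeline

    # Assumption: this model id loads as a text-generation model.
    pipe = pipeline("text-generation", model="tianyang/lemur-7B")
    prompt = build_prompt(msg)
    # By default the pipeline returns the prompt followed by the continuation.
    generated = pipe(prompt, max_new_tokens=max_new_tokens)[0]["generated_text"]
    return extract_reply(generated, prompt)
```

Calling `generate_reply("Hi, how are you?")` downloads the model weights on first use, so expect it to need substantial memory and time on CPU.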
Notice
Lemur is specifically designed and trained for the exclusive purpose of our final project. It is not intended for commercial usage or widespread deployment. Testing and exploration are permitted, but we request that you limit your use to this purpose only. Please respect these guidelines to maintain the integrity of our project and its intended use.