What is Cadet-Medium?

Inspired by Allen AI's Cosmo-XL, Cadet-Medium is a relatively small conversational model trained on the SODA dataset. It is intended for inference at the edge, on devices as small as a Raspberry Pi with 2 GB of RAM.

Cadet-Medium is fine-tuned from Google's pretrained t5-base model.
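
As a rough sanity check on the edge-inference target, the sketch below estimates how much memory the weights alone would need at different precisions. It assumes about 220 million parameters, the size commonly cited for t5-base; the exact parameter count of the Cadet-Medium checkpoint is not stated here, and `PARAM_COUNT` and `weight_memory_gib` are illustrative names.

```python
# Back-of-envelope estimate of weight memory.
# PARAM_COUNT is an assumption (the figure commonly cited for t5-base),
# not a value measured from the Cadet-Medium checkpoint.
PARAM_COUNT = 220_000_000

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gib(param_count: int, dtype: str) -> float:
    """Return the memory needed for the weights alone, in GiB."""
    return param_count * BYTES_PER_PARAM[dtype] / 1024**3


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_memory_gib(PARAM_COUNT, dtype):.2f} GiB")
```

Under that assumption the float32 weights come to roughly 0.82 GiB. Activations, the tokenizer, and the Python runtime add to this, so treat the number as a lower bound rather than a measured footprint.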

If you have any questions or suggestions for improvement, please contact me at: [email protected]

Google Colab Link

Here is the link to the Google Colab notebook, where I walk through training the model on the public SODA dataset from AI2.

https://colab.research.google.com/drive/1uekZ0gO3GqjPwno16tV1A4Gitrl7p3ur?usp=sharing

Get Started With Cadet-Medium

Use the code snippet below to get started with Cadet-Medium!

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import colorful as cf

cf.use_true_colors()
cf.use_style('monokai')


class CadetMedAgent:
    def __init__(self):
        print(cf.bold | cf.purple("Waking up Cadet-Medium..."))
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.tokenizer = AutoTokenizer.from_pretrained("t5-base", model_max_length=512)
        self.model = AutoModelForSeq2SeqLM.from_pretrained("ToddGoldfarb/Cadet-Medium", low_cpu_mem_usage=True).to(self.device)
        self.conversation_history = ""

    def observe(self, observation):
        self.conversation_history = self.conversation_history + observation
        # Safety net: conversation_history is a string, so both numbers below count
        # characters, not tokens. Past 400 characters, the oldest 112 are dropped.
        if len(self.conversation_history) > 400:
            self.conversation_history = self.conversation_history[112:]

    def set_input(self, situation_narrative="", role_instruction=""):
        input_text = "dialog: "

        if situation_narrative != "":
            input_text = input_text + situation_narrative

        if role_instruction != "":
            input_text = input_text + " <SEP> " + role_instruction

        input_text = input_text + " <TURN> " + self.conversation_history

        # Uncomment the line below to see what is fed to the model.
        # print(input_text)

        return input_text

    def generate(self, situation_narrative, role_instruction, user_response):
        user_response = user_response + " <TURN> "
        self.observe(user_response)

        input_text = self.set_input(situation_narrative, role_instruction)

        inputs = self.tokenizer([input_text], return_tensors="pt").to(self.device)
        
        # I encourage you to change the hyperparameters of the model! Start by trying to modify the temperature.
        # Unpacking `inputs` passes the attention mask along with the input IDs.
        outputs = self.model.generate(**inputs, max_new_tokens=512, temperature=1.0, top_p=0.95,
                                      do_sample=True)
        cadet_response = self.tokenizer.decode(outputs[0], skip_special_tokens=True, clean_up_tokenization_spaces=False)
        added_turn = cadet_response + " <TURN> "
        self.observe(added_turn)

        return cadet_response

    def reset_history(self):
        # The history is a string everywhere else, so reset it to an empty string.
        self.conversation_history = ""

    def run(self):
        def get_valid_input(prompt, default):
            while True:
                user_input = input(prompt)
                if user_input in ["Y", "N", "y", "n"]:
                    return user_input
                if user_input == "":
                    return default

        while True:
            continue_chat = ""

            # MODIFY THESE STRINGS TO YOUR LIKING :)
            situation_narrative = "Imagine you are Cadet-Medium talking to ???."
            role_instruction = "You are Cadet-Medium, and you are talking to ???."

            self.chat(situation_narrative, role_instruction)
            continue_chat = get_valid_input(cf.purple("Start a new conversation with new setup? [Y/N]:"), "Y")
            if continue_chat in ["N", "n"]:
                break

        print(cf.blue("CM: See you!"))

    def chat(self, situation_narrative, role_instruction):
        print(cf.green(
            "Cadet-Medium is running! Input [RESET] to reset the conversation history and [END] to end the conversation."))
        while True:
            user_input = input("You: ")
            if user_input == "[RESET]":
                self.reset_history()
                print(cf.green("[Conversation history cleared. Chat with Cadet-Medium!]"))
                continue
            if user_input == "[END]":
                break
            response = self.generate(situation_narrative, role_instruction, user_input)
            print(cf.blue("CM: " + response))


def main():
    print(cf.bold | cf.blue("LOADING MODEL"))

    CadetMed = CadetMedAgent()
    CadetMed.run()


if __name__ == '__main__':
    main()
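
The agent above feeds the model a single string of the form `dialog: <situation> <SEP> <role> <TURN> <turn 1> <TURN> <turn 2> <TURN> `. The sketch below rebuilds that string from a list of turns, without loading the model, and trims the history by whole turns instead of slicing characters, so a turn is never cut in the middle. `trim_turns` and `build_input` are hypothetical helpers, not part of the original script, and the 400-character budget simply mirrors the threshold used in `observe`.

```python
SEP = " <SEP> "
TURN = " <TURN> "


def trim_turns(turns, max_chars=400):
    """Drop the oldest turns until the serialized history fits in max_chars.

    Hypothetical helper: lengths are in characters, like the original
    script, not in tokens. The newest turn is always kept.
    """
    kept = list(turns)
    while len(kept) > 1 and sum(len(t) + len(TURN) for t in kept) > max_chars:
        kept.pop(0)
    return kept


def build_input(situation_narrative, role_instruction, turns):
    """Assemble the model input in the same layout as CadetMedAgent.set_input."""
    text = "dialog: "
    if situation_narrative:
        text += situation_narrative
    if role_instruction:
        text += SEP + role_instruction
    # Every turn, including the last one, is followed by a <TURN> marker.
    text += TURN + "".join(turn + TURN for turn in turns)
    return text


if __name__ == "__main__":
    history = trim_turns(["Hi there!", "Hello! How are you?", "Great, thanks."])
    print(build_input("Cadet-Medium is chatting with a friend.",
                      "You are Cadet-Medium.", history))
```

Printing the assembled string is a quick way to check what the model would see before changing the situation narrative or role instruction.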

Citations and Special Thanks

Special thanks to Hyunwoo Kim for discussing with me the best way to use the SODA dataset. If you haven't looked into their work on SODA, Prosocial-Dialog, or COSMO, I recommend you do so, and that you read the SODA paper, cited below.

@article{kim2022soda,
    title={SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization},
    author={Hyunwoo Kim and Jack Hessel and Liwei Jiang and Peter West and Ximing Lu and Youngjae Yu and Pei Zhou and Ronan Le Bras and Malihe Alikhani and Gunhee Kim and Maarten Sap and Yejin Choi},
    journal={ArXiv},
    year={2022},
    volume={abs/2212.10465}
}
