---
license: apache-2.0
tags:
- reinforcement-learning
- sentence-similarity
- tinyllama
---

# TinyLLaMA 1.1B Fine-Tuned

This model is a fine-tuned version of TinyLLaMA-1.1B, trained to align generated outputs with semantically similar target embeddings derived from Pinecone-enriched content.

## Use Case

Given a context paragraph (drawn from nearest neighbors), the model generates responses similar to a specific target paragraph. The reward is the cosine similarity between the Sentence-BERT embeddings of the generated and target text.

## Training Setup

- Base model: `TinyLLaMA-1.1B`
- Fine-tuning method: SFT
- Reward model: `all-MiniLM-L6-v2`
- Prompt: a single context passage from `neighbor_contents[0]`

## Limitations

This model is optimized for short output completions and may not generalize well outside the Pinecone-enriched structure used during training.
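The reward described above reduces to the cosine similarity between two embedding vectors. A minimal sketch of that computation, using plain NumPy on placeholder vectors (during actual training the embeddings would come from `all-MiniLM-L6-v2` via the `sentence-transformers` library; the function name here is illustrative, not part of the released code):

```python
import numpy as np

def cosine_reward(gen_emb: np.ndarray, target_emb: np.ndarray) -> float:
    """Reward = cosine similarity between generated and target embeddings."""
    denom = float(np.linalg.norm(gen_emb) * np.linalg.norm(target_emb))
    if denom == 0.0:
        return 0.0  # degenerate zero vector: no signal, no reward
    return float(np.dot(gen_emb, target_emb)) / denom

# Toy vectors standing in for 384-dim MiniLM embeddings.
a = np.array([1.0, 0.0, 1.0])
b = np.array([1.0, 0.0, 1.0])
c = np.array([0.0, 1.0, 0.0])
print(cosine_reward(a, b))  # identical direction → 1.0
print(cosine_reward(a, c))  # orthogonal → 0.0
```

A reward in [-1, 1] like this is typically rescaled or clipped before being fed to an RL-style update, but the card does not specify that detail.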