---
license: apache-2.0
tags:
- reinforcement-learning
- sentence-similarity
- tinyllama
---

# TinyLLaMA 1.1B Fine-Tuned

This model is a fine-tuned version of TinyLLaMA-1.1B, trained to align generated outputs with semantically similar target embeddings derived from Pinecone-enriched content.

## Use Case

Given a context paragraph (drawn from nearest neighbors), the model generates responses similar to a specific target paragraph. The reward is the cosine similarity between the Sentence-BERT embeddings of the generated and target text.

## Training Setup

- Base model: `TinyLLaMA-1.1B`
- Fine-tuning method: SFT
- Reward model: `all-MiniLM-L6-v2`
- Prompt: a single context passage from `neighbor_contents[0]`

## Limitations

This model is optimized for short output completions and may not generalize well outside the Pinecone-enriched structure used during training.
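The reward described above reduces to the cosine similarity between two embedding vectors. A minimal sketch of that computation, using plain NumPy on placeholder vectors (during actual training the embeddings would come from `all-MiniLM-L6-v2` via the `sentence-transformers` library; the function name here is illustrative, not part of the released code):

```python
import numpy as np

def cosine_reward(gen_emb: np.ndarray, target_emb: np.ndarray) -> float:
    """Reward = cosine similarity between generated and target embeddings."""
    denom = float(np.linalg.norm(gen_emb) * np.linalg.norm(target_emb))
    if denom == 0.0:
        return 0.0  # degenerate zero vector: no signal, no reward
    return float(np.dot(gen_emb, target_emb)) / denom

# Toy vectors standing in for 384-dim MiniLM embeddings.
a = np.array([1.0, 0.0, 1.0])
b = np.array([1.0, 0.0, 1.0])
c = np.array([0.0, 1.0, 0.0])
print(cosine_reward(a, b))  # identical direction → 1.0
print(cosine_reward(a, c))  # orthogonal → 0.0
```

A reward in [-1, 1] like this is typically rescaled or clipped before being fed to an RL-style update, but the card does not specify that detail.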