FairSteer BAD Classifier (Secure)

A Biased Activation Detection (BAD) classifier optimized for TinyLlama-1.1B. It reads an LLM's internal activations and predicts whether they indicate biased reasoning.

For security, the model weights in this repository are distributed only in the SafeTensors format.

Model Details

  • Base Model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
  • Target Layer: 14
  • Architecture: Linear Probe (Dropout -> Linear); a sketch follows this list
  • Performance: 67.90% Balanced Accuracy
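
The probe is simple enough to reconstruct from the description above. Below is a minimal PyTorch sketch of a Dropout -> Linear probe; the class name BADProbe, the dropout rate, and the two-class output head are illustrative assumptions, while the hidden size of 2048 corresponds to TinyLlama-1.1B.

```python
import torch
import torch.nn as nn


class BADProbe(nn.Module):
    """Linear probe (Dropout -> Linear) over layer-14 activations.

    hidden_size=2048 matches TinyLlama-1.1B; the dropout rate and the
    two-class output head are assumptions for illustration.
    """

    def __init__(self, hidden_size: int = 2048, dropout: float = 0.1, num_classes: int = 2):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        self.linear = nn.Linear(hidden_size, num_classes)

    def forward(self, activations: torch.Tensor) -> torch.Tensor:
        # activations: (batch, hidden_size) hidden states from the target layer
        return self.linear(self.dropout(activations))
```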

Artifacts

  • model.safetensors: Weights (SafeTensors only)
  • scaler.pkl: StandardScaler, required to normalize activations before inference (see the loading sketch after this list)
  • config.json: Architecture configuration
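
A minimal sketch of loading these three artifacts is shown below. It assumes the BADProbe class from the previous sketch, that config.json exposes a hidden_size field, and that the SafeTensors state-dict keys match the probe's parameter names; check the actual files before relying on these details.

```python
import json
import pickle

from safetensors.torch import load_file

# Architecture configuration (the hidden_size field name is an assumption).
with open("config.json") as f:
    config = json.load(f)

# SafeTensors weights: tensors only, no arbitrary code execution on load.
state_dict = load_file("model.safetensors")

# StandardScaler used to normalize activations before the probe.
# Note: this is a pickle file; only load it from a trusted source.
with open("scaler.pkl", "rb") as f:
    scaler = pickle.load(f)

probe = BADProbe(hidden_size=config.get("hidden_size", 2048))
probe.load_state_dict(state_dict)
probe.eval()
```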

Usage (FairSteer)

This model is designed to be loaded via the FairSteer Inference pipeline. A standalone sketch of the detection step, independent of that pipeline, is shown below.
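
The sketch below is an approximation of the detection step only, not the FairSteer pipeline itself. It feeds a prompt through TinyLlama-1.1B-Chat, extracts the layer-14 hidden state of the last token, normalizes it with the StandardScaler loaded in the previous sketch, and scores it with the probe. The last-token pooling and the "class 1 = biased" label convention are assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
TARGET_LAYER = 14

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, output_hidden_states=True)
model.eval()

prompt = "The nurse told the doctor that she"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
    # hidden_states[0] is the embedding output, so index 14 is layer 14.
    activation = outputs.hidden_states[TARGET_LAYER][:, -1, :]  # (1, hidden_size)

# Normalize with the StandardScaler and score with the probe loaded above.
scaled = scaler.transform(activation.float().numpy())
with torch.no_grad():
    logits = probe(torch.from_numpy(scaled).float())

is_biased = logits.argmax(dim=-1).item() == 1  # assumed label convention
print("biased activation detected" if is_biased else "activation looks unbiased")
```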
