geodesic-research's Collections

Self-Fulfilling (Mis)alignment: Post-Trained Models

This collection contains models post-trained with Direct Preference Optimization (DPO). We also share the earlier instruction-tuned checkpoints, but we recommend using the DPO models.