Model Card - LFM2-1.2B (PEFT, Legal Domain)
A PEFT/LoRA fine-tune of LiquidAI/LFM2-1.2B for legal multiple-choice tasks from LexGLUE (CaseHOLD, ECtHR-A, ECtHR-B). The model is part of the thesis Exploring Knowledge Boundaries of LLMs in Specialized Domains, where layerwise entropy is proposed and analyzed as an internal uncertainty probe.
- Author: Ren Jeik Ong
- Layerwise entropy code: https://github.com/ongxx107/layerwise-entropy
Summary
This model specializes a general instruction-tuned LLM to legal MCQ formats using PEFT/LoRA. It is evaluated on held-out test sets for CaseHOLD (accuracy) and ECtHR-A/B (macro-F1). The thesis reports consistent gains vs. the base model and analyzes internal uncertainty patterns across depth using layerwise entropy.
Headline results (test set):
- CaseHOLD: 0.5920 accuracy (baseline 0.3890)
- ECtHR-A: 0.5034 macro-F1 (baseline 0.0641)
- ECtHR-B: 0.5151 macro-F1 (baseline 0.0762)
Intended Use & Scope
- Direct use: legal MCQ-style inference following the standardized prompt/option interface described below.
- Out-of-scope: open-ended legal advice, factual retrieval without verification, or non-MCQ tasks; layerwise-entropy tooling is a research probe, not a safety guarantee.
Data & Tasks
Benchmark: LexGLUE subsets
- CaseHOLD (single-choice, 5 options A-E) - choose the correct legal holding.
- ECtHR-A / ECtHR-B (multi-label, options A-J map to ECHR articles) - select all articles that apply.
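The single-choice interface for CaseHOLD can be sketched as below (an illustrative template; the exact prompt wording used in the thesis may differ):

```python
def format_casehold_prompt(context: str, holdings: list[str]) -> str:
    """Build a CaseHOLD-style single-choice prompt with options A-E.

    Illustrative format only; the thesis's standardized template
    may use different instruction wording.
    """
    letters = "ABCDE"
    lines = [
        "Read the case excerpt and choose the correct holding.",
        "",
        f"Excerpt: {context}",
        "",
    ]
    for letter, holding in zip(letters, holdings):
        lines.append(f"({letter}) {holding}")
    lines.append("")
    lines.append("Answer with a single letter (A-E):")
    return "\n".join(lines)
```

ECtHR-A/B prompts would follow the same pattern with options A-J and an instruction to list every applicable letter.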
Training Details
Method: LoRA fine-tuning (via PEFT) of LiquidAI/LFM2-1.2B. Targets: GLU projections (w1, w2, w3), multi-head attention (q_proj, k_proj, v_proj, out_proj), and conv projections (in_proj, out_proj).
Hyperparameters:
- LoRA: rank r=16, α=16, dropout=0.05; bias=none.
- Optimizer: 8-bit AdamW; lr=5e-5, wd=0.01, cosine-with-restarts, warmup=0.0.
- Epochs: 3.
- Batching: per-device batch 2, gradient accumulation 5 (effective batch 10).
- Precision: bf16 where supported; gradient checkpointing.
- Max context: 4,096 tokens; packing disabled.
Hardware: a single NVIDIA GeForce RTX 3080 Ti; Ubuntu; automatic mixed precision.
Validation/checkpoints: evaluate and save every 200 steps, keep the last 2 checkpoints, select the best by validation loss.
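The LoRA settings above can be collected into a single configuration (a sketch: in practice these kwargs would be passed to `peft.LoraConfig`; the module names follow the LFM2 naming given in this card and may need adjustment for other architectures):

```python
# LoRA configuration mirroring the hyperparameters listed above.
lora_kwargs = {
    "r": 16,             # LoRA rank
    "lora_alpha": 16,    # scaling factor alpha
    "lora_dropout": 0.05,
    "bias": "none",
    "task_type": "CAUSAL_LM",
    # Target modules per the LFM2 layout described in this card;
    # "out_proj" matches both the attention and conv output projections.
    "target_modules": [
        "w1", "w2", "w3",                       # GLU projections
        "q_proj", "k_proj", "v_proj", "out_proj",  # attention
        "in_proj",                               # conv input projection
    ],
}
```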
Layerwise Entropy (Research Context)
The thesis introduces layerwise entropy/varentropy as internal probes to distinguish known vs. unknown inputs and relate mid-to-late layer uncertainty to downstream accuracy/F1.
Note: These probes are analysis tools, not required for normal inference.
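For a given next-token distribution, entropy and varentropy can be computed as below. This is a minimal sketch: it assumes per-layer logits are obtained by projecting each layer's hidden state through the LM head (a logit-lens-style probe); the thesis's exact procedure may differ.

```python
import numpy as np

def entropy_and_varentropy(logits: np.ndarray) -> tuple[float, float]:
    """Entropy H = E[-log p] and varentropy Var[-log p] of a
    next-token distribution given 1-D logits over the vocabulary."""
    logits = logits - logits.max()                 # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()  # softmax
    surprisal = -np.log(probs + 1e-12)             # -log p(x)
    ent = float((probs * surprisal).sum())
    varent = float((probs * (surprisal - ent) ** 2).sum())
    return ent, varent
```

For a uniform distribution over V tokens, entropy is log V and varentropy is 0; peaked distributions drive both toward 0, while mixed confident/uncertain mass raises varentropy.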
Risks, Biases & Limitations
- Trained/evaluated on English legal corpora; domain shift and jurisdictional differences may reduce reliability.
- Out-of-distribution prompts can degrade performance; consider abstention/deferral when uncertainty is high.
- This checkpoint is not a substitute for professional legal advice.
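The abstention recommendation above can be sketched as a simple rule (illustrative only; the threshold is arbitrary and would need calibration on validation data):

```python
import numpy as np

def answer_or_abstain(probs: np.ndarray, labels: list[str],
                      max_entropy: float = 1.0) -> str:
    """Return the top label, or abstain when the answer distribution
    over the options is too flat (entropy above a chosen threshold)."""
    ent = float(-(probs * np.log(probs + 1e-12)).sum())
    if ent >= max_entropy:
        return "ABSTAIN"  # defer to a human reviewer
    return labels[int(probs.argmax())]
```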
Citation
If you use this model, please cite the thesis:
@mastersthesis{ong2025layerwise-entropy,
title = {Exploring Knowledge Boundaries of LLMs in Specialized Domains},
author = {Ren Jeik Ong},
school = {Technical University of Munich},
year = {2025},
type = {Master's Thesis},
address = {Munich, Germany},
url = {https://github.com/ongxx107/layerwise-entropy}
}
License
MIT (model card & training code in this repo). Pretrained base model and datasets retain their original licenses.