Towards a Unified View of Large Language Model Post-Training • arXiv:2509.04419 • Published Sep 4, 2025
Softpick: No Attention Sink, No Massive Activations with Rectified Softmax • arXiv:2504.20966 • Published Apr 29, 2025
A Refined Analysis of Massive Activations in LLMs • arXiv:2503.22329 • Published Mar 28, 2025
How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training • arXiv:2502.11196 • Published Feb 16, 2025
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training • arXiv:2501.17161 • Published Jan 28, 2025
Emergent properties with repeated examples • arXiv:2410.07041 • Published Oct 9, 2024
Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction • arXiv:2409.17422 • Published Sep 25, 2024
EuroLLM: Multilingual Language Models for Europe • arXiv:2409.16235 • Published Sep 24, 2024
BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline • arXiv:2408.15079 • Published Aug 27, 2024
Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models • arXiv:2408.15518 • Published Aug 28, 2024
Better Alignment with Instruction Back-and-Forth Translation • arXiv:2408.04614 • Published Aug 8, 2024
Gemma 2: Improving Open Language Models at a Practical Size • arXiv:2408.00118 • Published Jul 31, 2024
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention • arXiv:2407.02490 • Published Jul 2, 2024
Self-Play Preference Optimization for Language Model Alignment • arXiv:2405.00675 • Published May 1, 2024