Collections
Discover the best community collections!
Collections including paper arxiv:2501.06425

- FAN: Fourier Analysis Networks
  Paper • 2410.02675 • Published • 29
- Tensor Product Attention Is All You Need
  Paper • 2501.06425 • Published • 90
- Scalable-Softmax Is Superior for Attention
  Paper • 2501.19399 • Published • 24
- EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling
  Paper • 2502.09509 • Published • 8

- Nuclear Norm Regularization for Deep Learning
  Paper • 2405.14544 • Published • 1
- Token embeddings violate the manifold hypothesis
  Paper • 2504.01002 • Published • 1
- Approximate Nullspace Augmented Finetuning for Robust Vision Transformers
  Paper • 2403.10476 • Published • 1
- ElaLoRA: Elastic & Learnable Low-Rank Adaptation for Efficient Model Fine-Tuning
  Paper • 2504.00254 • Published • 1

- Tensor Product Attention Is All You Need
  Paper • 2501.06425 • Published • 90
- Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models
  Paper • 2501.11873 • Published • 66
- Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
  Paper • 2502.11089 • Published • 166
- MoBA: Mixture of Block Attention for Long-Context LLMs
  Paper • 2502.13189 • Published • 17

- FAST: Efficient Action Tokenization for Vision-Language-Action Models
  Paper • 2501.09747 • Published • 27
- Tensor Product Attention Is All You Need
  Paper • 2501.06425 • Published • 90
- SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training
  Paper • 2501.06842 • Published • 16
- LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token
  Paper • 2501.03895 • Published • 52

- Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
  Paper • 2404.08801 • Published • 66
- RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
  Paper • 2404.07839 • Published • 47
- Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
  Paper • 2404.05892 • Published • 40
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
  Paper • 2312.00752 • Published • 148

- Adapters: A Unified Library for Parameter-Efficient and Modular Transfer Learning
  Paper • 2311.11077 • Published • 29
- Tensor Product Attention Is All You Need
  Paper • 2501.06425 • Published • 90
- LoRA: Low-Rank Adaptation of Large Language Models
  Paper • 2106.09685 • Published • 55
- ShortGPT: Layers in Large Language Models are More Redundant Than You Expect
  Paper • 2403.03853 • Published • 66

- Evolving Deeper LLM Thinking
  Paper • 2501.09891 • Published • 115
- ProcessBench: Identifying Process Errors in Mathematical Reasoning
  Paper • 2412.06559 • Published • 84
- AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling
  Paper • 2412.15084 • Published • 13
- The Lessons of Developing Process Reward Models in Mathematical Reasoning
  Paper • 2501.07301 • Published • 99