SeanLee97 
posted an update 17 days ago
Our lab recently released a paper where we introduce ShadowPEFT, a new Parameter-Efficient Fine-Tuning (PEFT) paradigm tailored for edge computing scenarios.

Unlike traditional approaches such as LoRA and its variants, which inject trainable parameters directly into the Transformer's weights and therefore require tight coupling with the backbone, ShadowPEFT enhances the frozen base model by adding a lightweight, centralized, pretrainable, and detachable shadow network.
This shadow network operates in parallel with the base model, delivering learned corrections to each decoder layer. Because the shadow module is architecturally decoupled from the backbone, it can be independently trained, stored, and deployed, which benefits edge computing and edge-cloud collaborative computing.
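
To make the decoupling concrete, here is a minimal numpy sketch of the idea that all trainable parameters live in a separate shadow module that can be serialized and re-attached on its own. The class name `ShadowModule` and the weight names `W_inject` / `W_update` are hypothetical illustrations, not the paper's actual API:

```python
import numpy as np

class ShadowModule:
    """Hypothetical shadow module shared across decoder layers (assumed shapes)."""

    def __init__(self, d_model, d_shadow, seed=0):
        rng = np.random.default_rng(seed)
        # Maps the shadow state into a correction for the backbone's hidden state.
        self.W_inject = rng.standard_normal((d_shadow, d_model)) * 0.1
        # Maps the backbone's hidden state into an update of the shadow state.
        self.W_update = rng.standard_normal((d_model, d_shadow)) * 0.1

    def state_dict(self):
        # Only shadow weights are serialized -- the frozen backbone is never stored here,
        # which is what makes the module independently shippable and attachable.
        return {"W_inject": self.W_inject, "W_update": self.W_update}

    @classmethod
    def from_state_dict(cls, sd, d_model, d_shadow):
        m = cls(d_model, d_shadow)
        m.W_inject, m.W_update = sd["W_inject"], sd["W_update"]
        return m

shadow = ShadowModule(d_model=8, d_shadow=4)
restored = ShadowModule.from_state_dict(shadow.state_dict(), d_model=8, d_shadow=4)
```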

- HF Paper: ShadowPEFT: Shadow Network for Parameter-Efficient Fine-Tuning (2604.19254)
- GitHub: https://github.com/ShadowLLM/shadow-peft
- HF Collection: https://huggingface.co/collections/shadow-llm/shadow-peft-models

This sounds really cool, definitely adding it to my weekend experiment list (kudos for providing the github repo as well).

Also... ShadowPEFT? Badass name.

Thx for following! @cahlen

For reproducibility, we have published the scripts and hyperparameters used in the paper:
https://github.com/ShadowLLM/shadow-peft/blob/main/experiment/best_script.md

Here is a playground notebook that shows how the detached and attached shadow modes work:
https://github.com/ShadowLLM/shadow-peft/blob/main/examples/robot_intent_playground.ipynb

The core motivation of this work is to rethink how mainstream PEFT methods perform adaptation. Methods such as LoRA typically achieve downstream adaptation by injecting mutually independent low-rank updates into multiple linear layers; mechanistically, this is a relatively distributed, local weight-space parameterization. In this work, we explore an alternative: shifting the adaptation process from distributed weight perturbation to centralized, layer-level representation refinement.

Building on this idea, we propose ShadowPEFT. On top of a frozen backbone, the framework introduces a shadow network that is reused across layers and maintains a shadow state that evolves in parallel along the depth dimension. At each layer, the model continuously refines the backbone's hidden representation through three steps: Shadow Injection, Base Encoding, and Shadow Update. In contrast to the independent local weight corrections of traditional low-rank methods, ShadowPEFT emphasizes a shared, stateful, cross-layer coordinated adaptation mechanism.
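
The three-step loop above can be sketched in a few lines of numpy. This is a toy illustration under assumed shapes, not the paper's implementation: the frozen layers are stand-in linear maps, and `W_inject` / `W_update` are hypothetical names for the shadow's shared weights:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_shadow, n_layers = 8, 4, 3

# Frozen base decoder layers (stand-ins: near-identity linear maps).
base_layers = [np.eye(d_model) + rng.standard_normal((d_model, d_model)) * 0.1
               for _ in range(n_layers)]

# One lightweight shadow module, shared across all layers.
W_inject = rng.standard_normal((d_shadow, d_model)) * 0.1
W_update = rng.standard_normal((d_model, d_shadow)) * 0.1

def forward(x, shadow_attached=True):
    h = x
    h_s = np.zeros(d_shadow)  # shadow state, evolving along the depth dimension
    for W in base_layers:
        if shadow_attached:
            h = h + h_s @ W_inject        # Shadow Injection: correct the hidden state
        h = h @ W                          # Base Encoding: frozen backbone layer
        if shadow_attached:
            h_s = np.tanh(h_s + h @ W_update)  # Shadow Update: evolve the shadow state
    return h

x = rng.standard_normal(d_model)
y_detached = forward(x, shadow_attached=False)  # pure frozen backbone
y_attached = forward(x, shadow_attached=True)   # backbone refined by the shadow
```

With `shadow_attached=False`, the loop degenerates to the unmodified frozen backbone, which mirrors the detached deployment mode described above.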

Experiments show that on backbones of different scales (Qwen3 0.6B / 4B / 8B), ShadowPEFT achieves competitive, and on average superior, performance under a trainable-parameter budget comparable to LoRA / DoRA. More importantly, because the shadow module is structurally decoupled from the backbone, it can participate in full inference in attached mode and also supports detached deployment, opening up new possibilities for flexible deployment in edge/cloud scenarios.

We also examined the effect of shadow pretraining. The results show that when Qwen3 8B is paired with a pretrained 0.5B shadow model, overall performance improves further, while the shadow retains strong standalone capability in detached mode. This suggests that the shadow module is not merely an attached adapter, but can be viewed as a transferable, reusable, independently deployable functional adaptation unit.

In parameter-scaling experiments, we also observed that ShadowPEFT uses additional parameter capacity in a noticeably different way from traditional low-rank PEFT methods: in contrast to LoRA's relative plateau and DoRA's degradation at larger parameter scales, ShadowPEFT benefits more stably from larger shadow modules. This suggests that scaling PEFT capability need not rely solely on rank increases; it can also be achieved by scaling up a centralized functional module.

Overall, this work aims to show that PEFT can be understood not only as lightweight parameter injection, but can also be designed as a modular, stateful, detachable function-level adaptation mechanism.

This is very interesting. Can we create multiple shadow modules, attach the appropriate one when needed, and detach it afterwards?

Theoretically, yes, but the current version does not support multiple shadows.

Oh, wonderful work! Nice job, guys.

Here is a demo of ShadowPEFT deployed on a #Unitree Go2 robot dog. With a 0.5B shadow model deployed, the dog understands commands on an NVIDIA Jetson Orin GPU and performs actions within two seconds.
A new option for #embodied #AI.
(Speech recognition is done on an iPhone.)