ICML Spotlight 2026 🏆

DecodeShare: Tracing the Shared Subspace of LLM Decode-Time Decisions

Zishan Shao¹, Lixun Zhang¹,², Kangning Cui³, Yixiao Wang², Ting Jiang¹, Hancheng Ye¹, Qinsi Wang¹, Zhixu Du¹, Yuzhe Fu¹, Fan Yang³, Danyang Zhuo², Yiran Chen¹, Hai Helen Li¹

¹Duke ECE ²Duke CS ³Wake Forest CS


DecodeShare identifies compact task-general subspaces in KV-cached decode-time hidden states, then tests their causal role with matched decode-only interventions.

Figure: Steering demo comparing baseline generation with prefill-estimated and decode-estimated steering vectors.

Steering Reliability

Decode-time validation is a better proxy for held-out steering utility than prefill-time validation.

Steering vectors can overlap the task-general decode channel. DecodeShare separates shared and residual components, then compares validation signals against held-out KV-cached decoding behavior.

This matters because many steering workflows choose directions using prefill activations, while deployment decisions are made one token at a time during decoding. DecodeShare makes that mismatch visible and provides a decode-aligned way to test whether a vector is likely to survive real generation.
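The split of a steering vector into shared and residual components can be illustrated with a plain orthogonal projection. This is a minimal sketch, not the repository's implementation: the function names, the random stand-in for the shared basis, and the "shared energy fraction" diagnostic are all assumptions made for illustration.

```python
import numpy as np

def split_steering_vector(v, basis):
    """Split v into its part inside span(basis) and the orthogonal residual.

    basis: (d, k) array with orthonormal columns. Here it is a hypothetical
    stand-in for an estimated decode-shared subspace.
    """
    shared = basis @ (basis.T @ v)   # projection onto the subspace
    residual = v - shared            # what is left outside the subspace
    return shared, residual

def shared_energy_fraction(v, basis):
    """Fraction of the squared norm of v that lies inside span(basis)."""
    shared, _ = split_steering_vector(v, basis)
    return float(shared @ shared / (v @ v))

rng = np.random.default_rng(0)
d, k = 64, 4
# Random orthonormal basis via QR; a real basis would come from decode states.
basis, _ = np.linalg.qr(rng.standard_normal((d, k)))
v = rng.standard_normal(d)           # hypothetical steering vector

shared, residual = split_steering_vector(v, basis)
frac = shared_energy_fraction(v, basis)
print(f"shared energy fraction: {frac:.3f}")
```

A fraction near 1 would indicate that the steering vector lies mostly inside the shared subspace; a fraction near 0 would indicate little overlap. The fraction is only a diagnostic of overlap, not a prediction of steering quality.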

Prefill can be misleading

A direction that looks useful on the full prompt may not align with the hidden state that actually produces the next token.

Shared channels can interfere

Steering directions sometimes borrow task-general decode structure, so projection is a diagnostic rather than a universal repair.

Decode checks track deployment

Evaluating vectors in the same KV-cached regime used at generation time gives a more realistic selection signal.

Method

Decode-time states are the intervention target.

Modern decoder-only LLM inference separates prompt prefill from single-token decode steps. DecodeShare treats the hidden state used during KV-cached decoding as the decision state, estimates cross-task shared directions there, and removes only that component during decoding.
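One way to restrict an intervention to decode steps is to gate it on sequence length, since a KV-cached decode step processes one new token while prefill processes the whole prompt. The sketch below assumes hidden states arrive as a `(batch, seq_len, d)` array; the hook name and this gating rule are illustrative assumptions, not the paper's code. A one-token prompt would be misclassified as a decode step under this rule.

```python
import numpy as np

def make_decode_only_projection(basis):
    """Build a hook that removes span(basis) only on single-token steps.

    basis: (d, k) array with orthonormal columns (hypothetical shared subspace).
    """
    def hook(hidden):
        # hidden: (batch, seq_len, d). Assumption: seq_len == 1 means a
        # KV-cached decode step; anything longer is treated as prefill.
        if hidden.shape[1] != 1:
            return hidden                      # leave prefill untouched
        coords = hidden @ basis                # (batch, 1, k)
        return hidden - coords @ basis.T       # remove the subspace component
    return hook

rng = np.random.default_rng(0)
d, k = 16, 2
basis, _ = np.linalg.qr(rng.standard_normal((d, k)))
hook = make_decode_only_projection(basis)

prefill_states = rng.standard_normal((2, 5, d))   # 5-token prompt
decode_states = rng.standard_normal((2, 1, d))    # one decode step

prefill_out = hook(prefill_states)
decode_out = hook(decode_states)
```

In a framework with forward hooks, the same gate could wrap whatever function edits the layer output, so the prompt pass and the decode pass share one code path.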

Collect

Pool decode-time hidden states across tasks and prompt variants.

Estimate

Select directions consistently shared across tasks, not just high-energy PCs.

Intervene

Run decode-only projection removal against dimension- and energy-matched controls.
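The Estimate and Intervene steps can be sketched on synthetic data. Everything here is an assumption made for illustration: the sharedness rule (keep a pooled principal direction only if it explains at least `min_ratio` of the variance in every task), the per-task centering, and the reading of "energy-matched" as removing a perturbation of equal norm along a random subspace of the same dimension. The paper's actual criteria may differ.

```python
import numpy as np

def estimate_shared_basis(states_by_task, n_components=8, min_ratio=0.05):
    """Keep pooled PCs whose variance share is >= min_ratio in EVERY task."""
    # Center per task so task-mean offsets do not dominate the pooled PCA.
    centered = [x - x.mean(axis=0) for x in states_by_task.values()]
    pooled = np.concatenate(centered, axis=0)
    _, _, vt = np.linalg.svd(pooled, full_matrices=False)  # rows are PCs
    keep = []
    for direction in vt[:n_components]:
        ratios = [np.sum((x @ direction) ** 2) / np.sum(x ** 2) for x in centered]
        if min(ratios) >= min_ratio:       # shared across tasks, not just big
            keep.append(direction)
    if not keep:
        return np.zeros((pooled.shape[1], 0))
    return np.stack(keep, axis=1)          # (d, k), orthonormal columns

def remove_subspace(hidden, basis):
    """Project span(basis) out of each row of hidden, shape (n, d)."""
    return hidden - (hidden @ basis) @ basis.T

def matched_random_removal(hidden, shared_basis, rng):
    """Control: remove a random k-dim component with the same per-row norm."""
    d, k = shared_basis.shape
    ctrl, _ = np.linalg.qr(rng.standard_normal((d, k)))     # same dimension
    target = np.linalg.norm((hidden @ shared_basis) @ shared_basis.T,
                            axis=1, keepdims=True)
    delta = (hidden @ ctrl) @ ctrl.T
    norm = np.linalg.norm(delta, axis=1, keepdims=True)
    delta = delta * (target / np.maximum(norm, 1e-12))       # same energy
    return hidden - delta

rng = np.random.default_rng(0)
d, n = 32, 200

def make_task(specific_scale):
    x = 0.1 * rng.standard_normal((n, d))
    x[:, 0] += 3.0 * rng.standard_normal(n)             # shared direction
    x[:, 1] += specific_scale * rng.standard_normal(n)  # task-specific
    return x

# task_a has a high-energy direction that the other tasks do not share.
states = {"task_a": make_task(8.0), "task_b": make_task(0.0),
          "task_c": make_task(0.0)}
basis = estimate_shared_basis(states)

hidden = states["task_b"]
ablated = remove_subspace(hidden, basis)
control = matched_random_removal(hidden, basis, rng)
```

In this toy setup the highest-energy pooled direction belongs to one task only, so the sharedness rule rejects it and keeps the lower-energy direction present in all three tasks. The control perturbs each state by the same amount as the ablation, so any difference in downstream behavior would be attributable to the direction removed rather than the size of the edit.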

Core Results

Shared decode subspaces are compact and causally important.

Across models and benchmarks, the discovered subspace occupies a small fraction of the hidden dimension, but removing it changes model decisions much more than matched controls. The main point is not the exact number of dimensions; it is that a compact, task-general channel can matter causally at decode time.

Figure: Shared core dimensions across Qwen, Llama, and Falcon models. Compact shared cores appear across model families and pass permutation and scramble tests.

Figure: Leave-one-task-out decode ablation accuracy results. Removing the shared subspace hurts held-out tasks more than random controls.
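The leave-one-task-out protocol can be sketched without a model by using captured variance as a stand-in for accuracy. This is a simplified illustration under stated assumptions: the basis is taken from the top principal components of the training tasks (a simplification of any sharedness criterion), the data are synthetic, and captured variance is only a proxy for the accuracy drop the page reports.

```python
import numpy as np

def top_pcs(x, k):
    """Top-k principal directions of x as a (d, k) orthonormal basis."""
    x = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return vt[:k].T

def captured_fraction(x, basis):
    """Share of the centered variance of x that lies inside span(basis)."""
    x = x - x.mean(axis=0)
    return float(np.sum((x @ basis) ** 2) / np.sum(x ** 2))

def leave_one_task_out(states_by_task, k, rng):
    """For each task, fit a basis on the other tasks and score the held-out one."""
    results = {}
    for held_out, held_states in states_by_task.items():
        train = np.concatenate(
            [x - x.mean(axis=0) for t, x in states_by_task.items() if t != held_out]
        )
        basis = top_pcs(train, k)
        # Dimension-matched random control subspace.
        rand, _ = np.linalg.qr(rng.standard_normal((basis.shape[0], k)))
        results[held_out] = (captured_fraction(held_states, basis),
                             captured_fraction(held_states, rand))
    return results

rng = np.random.default_rng(0)
d, n = 32, 200

def make_task():
    x = 0.3 * rng.standard_normal((n, d))
    x[:, :2] += 3.0 * rng.standard_normal((n, 2))   # two shared directions
    return x

states = {name: make_task() for name in ("task_a", "task_b", "task_c")}
results = leave_one_task_out(states, k=2, rng=rng)
for task, (shared_frac, random_frac) in results.items():
    print(f"{task}: shared={shared_frac:.2f} random={random_frac:.2f}")
```

The key property of the protocol is that the held-out task never contributes to the basis it is scored against, so a large gap over the random control reflects structure that transfers across tasks.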

Patchback

The shared subspace carries recoverable decision information.

DecodeShare also tests whether the shared subspace can patch corrupted decisions back toward the baseline answer. Targeted patching succeeds where random-vector, random-subspace, and nonshared-patch controls largely fail, suggesting the channel stores more than incidental variance.
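Patching in a subspace can be written as swapping coordinates: keep the corrupted state outside the subspace and copy the clean state's coordinates inside it. The sketch below assumes an orthonormal basis and uses hypothetical names; how the paper constructs corrupted states and scores recovery is not specified here.

```python
import numpy as np

def patch_back(corrupted, clean, basis):
    """Replace the span(basis) coordinates of `corrupted` with those of `clean`.

    corrupted, clean: (n, d) hidden states; basis: (d, k) orthonormal columns.
    """
    delta = (clean - corrupted) @ basis      # (n, k) coordinate difference
    return corrupted + delta @ basis.T       # edit only inside the subspace

rng = np.random.default_rng(0)
d, k, n = 32, 3, 4
# Hypothetical shared basis and a dimension-matched random control basis.
shared_basis, _ = np.linalg.qr(rng.standard_normal((d, k)))
random_basis, _ = np.linalg.qr(rng.standard_normal((d, k)))

clean = rng.standard_normal((n, d))
corrupted = clean + rng.standard_normal((n, d))   # stand-in corruption

patched = patch_back(corrupted, clean, shared_basis)
control = patch_back(corrupted, clean, random_basis)  # random-subspace control

# Outside the shared subspace, the patched state is still the corrupted one.
outside = lambda h: h - (h @ shared_basis) @ shared_basis.T
```

Running the same function with `random_basis` gives a random-subspace control of equal dimension, which is the comparison needed to argue that recovery depends on the specific subspace rather than on patching any `k` directions.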

Robustness

The claim is checked against energy and threshold confounds.

The paper varies PCA retention, sharedness thresholds, and matched intervention budgets. These controls keep the focus on decode-shared structure rather than simply selecting the largest or highest-energy directions.

Reproduce

Run smoke checks first, then section-level reruns.

The repository contains lightweight smoke checks and full GPU reproduction wrappers for the main paper sections.

conda env create -f environment.yml
conda activate decodeshare
bash scripts/run_all_smoke_tests.sh

DRY_RUN=1 bash scripts/reproduce_ablation_tables.sh
bash scripts/reproduce_h1_tables.sh
bash scripts/reproduce_table_1_patchback.sh

Citation

Cite DecodeShare

@inproceedings{shao2026decodeshare,
  title     = {DecodeShare: Tracing the Shared Subspace of LLM Decode-Time Decisions},
  author    = {Shao, Zishan and Zhang, Lixun and Cui, Kangning and Wang, Yixiao
               and Jiang, Ting and Ye, Hancheng and Wang, Qinsi and Du, Zhixu
               and Fu, Yuzhe and Yang, Fan and Zhuo, Danyang and Chen, Yiran
               and Li, Hai Helen},
  booktitle = {Proceedings of the 43rd International Conference on Machine Learning},
  year      = {2026}
}