Update README.md
parent 227dbb1532
commit f261645e5f
1 changed file with 1 addition and 0 deletions
@@ -414,6 +414,7 @@ Theoretical analysis suggests 2-3x improvements in inference throughput. For a d
- Training Large Language Models to Reason in a Continuous Latent Space: https://arxiv.org/abs/2412.06769
- Relaxed Recursive Transformers — Effective Parameter Sharing with Layer-wise LoRA: https://arxiv.org/pdf/2410.20672
- Mixture-of-Depths Attention: https://arxiv.org/abs/2603.15619
- Hyperloop Transformers: https://arxiv.org/abs/2604.21254

---