Update README.md

parent f261645e5f
commit 8c68c1fcc7

1 changed file with 1 addition and 0 deletions
@@ -415,6 +415,7 @@ Theoretical analysis suggests 2-3x improvements in inference throughput. For a d
- Relaxed Recursive Transformers — Effective Parameter Sharing with Layer-wise LoRA: https://arxiv.org/pdf/2410.20672
- Mixture-of-Depths Attention: https://arxiv.org/abs/2603.15619
- Hyperloop Transformers: https://arxiv.org/abs/2604.21254
- The Recurrent Transformer: Greater Effective Depth and Efficient Decoding: https://arxiv.org/abs/2604.21215
---