Microsoft Research’s Post

YOCO is a novel decoder-decoder architecture for LLMs that improves memory efficiency by caching key-value pairs only once. It reduces KV cache memory and prefilling time by orders of magnitude, making 1M-token-context LLMs practical. https://msft.it/6040YnEVM

  • Overview of the YOCO decoder-decoder architecture. The self-decoder generates the global KV cache; the cross-decoder then employs cross-attention to reuse that shared cache. Both the self-decoder and the cross-decoder use causal masking, so the overall architecture behaves like a decoder-only Transformer, generating tokens autoregressively.
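
For readers who want a concrete picture, here is a minimal PyTorch sketch of the decoder-decoder idea described above. It is not the paper's implementation: YOCO's actual self-decoder uses efficient attention variants, while this sketch substitutes plain causal self-attention, and the class name `YOCOSketch`, layer counts, and dimensions are illustrative assumptions. What it demonstrates is the core claim of the post: one KV projection feeds every cross-decoder layer, so the cache is built once rather than per layer.

```python
import torch
import torch.nn as nn

class YOCOSketch(nn.Module):
    """Toy decoder-decoder: the self-decoder builds one global KV cache,
    and the cross-decoder layers reuse it via causal cross-attention.
    (Hypothetical sketch; not the paper's implementation.)"""

    def __init__(self, d_model=256, n_heads=4, n_self=2, n_cross=2):
        super().__init__()
        # Self-decoder: stands in for YOCO's efficient-attention layers.
        self.self_layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
             for _ in range(n_self)]
        )
        # Single projection -> the one and only KV cache ("cache once").
        self.to_kv = nn.Linear(d_model, 2 * d_model)
        # Cross-decoder: every layer attends to the same shared K and V.
        self.cross_layers = nn.ModuleList(
            [nn.MultiheadAttention(d_model, n_heads, batch_first=True)
             for _ in range(n_cross)]
        )

    def forward(self, x):
        T = x.size(1)
        # True above the diagonal: position i may not attend to j > i.
        causal = torch.triu(torch.ones(T, T), diagonal=1).bool()
        h = x
        for layer in self.self_layers:
            h = layer(h, src_mask=causal)
        # KV pairs are computed exactly once and shared across all
        # cross-decoder layers, so cache memory does not grow with depth.
        k, v = self.to_kv(h).chunk(2, dim=-1)
        out = h
        for attn in self.cross_layers:
            a, _ = attn(out, k, v, attn_mask=causal)  # causal cross-attention
            out = out + a
        return out

model = YOCOSketch()
tokens = torch.randn(1, 16, 256)   # (batch, seq_len, d_model)
print(model(tokens).shape)         # torch.Size([1, 16, 256])
```

With L layers and sequence length N, a standard decoder-only model stores O(L·N) KV entries, while the shared cache above stores the equivalent of one layer's worth, which is where the memory reduction claimed in the post comes from.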
Berowne Hlavaty

Senior Quant Analyst at J.P. Morgan

Innovative, but I do wonder what happens when a word late in the text changes the context of an earlier word. Example: "I was at the farm, then I went to the store and bought an apple, but I was disappointed when I found the M4 chip was only available in the iPad, and they still don't offer touch screens on laptops."

Ömer A.

Technical Lead Generative AI / Senior Data Scientist / AI Consultant at Lufthansa Group

Jon-Paul Boyd
