Releases · LMCache/lmcache-vllm
v0.6.2.3
What's Changed
- Fix async store problem by @YaoJiayi in #36
- Drop special tokens. by @XbzOnGit in #33
- Add Docker Build Files by @qyy2003 in #35
- Support Turing GPU by @Second222None in #40
- Fix online multi-turn decode cache saving by @YaoJiayi in #41
- Bump version number to 0.6.2.3 by @ApostaC in #46
New Contributors
- @qyy2003 made their first contribution in #35
- @Second222None made their first contribution in #40
Full Changelog: v0.6.2.2...v0.6.2.3
v0.6.2.2
New version: v0.6.2.2
Compatibility:
- vLLM: 0.6.1.post2, 0.6.2
- LMCache: 0.1.3
Key Features
- Support for chunked prefill in vLLM (see the usage sketch after this list)
- Faster KV loading for multi-turn conversation by saving KV at the decoding time
- Experimental KV blending feature to enable reusing non-prefix KV caches
- New model support: llama-3.1 and qwen-2
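The sketch below illustrates how these features come together in an offline-inference script. It is a hedged illustration, not an official example: it assumes the `lmcache_vllm` package exposes a drop-in `lmcache_vllm.vllm` module mirroring vLLM's `LLM`/`SamplingParams` API, and the model name and prompts are placeholders.

```python
# Minimal sketch (assumptions: lmcache_vllm provides a drop-in vLLM wrapper
# under lmcache_vllm.vllm, compatible vLLM 0.6.2 and LMCache 0.1.3 are
# installed, and the model name is only illustrative).
from lmcache_vllm.vllm import LLM, SamplingParams

# enable_chunked_prefill is a standard vLLM engine option; this release adds
# LMCache compatibility with chunked prefill and also saves KV produced at
# decode time so multi-turn conversations load faster.
llm = LLM(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    enable_chunked_prefill=True,
)

sampling_params = SamplingParams(temperature=0.0, max_tokens=64)

shared_context = "A long shared document or system prompt...\n"

# Turn 1: the KV cache for the shared context is computed and stored.
print(llm.generate([shared_context + "Question 1: ..."], sampling_params))

# Turn 2: the stored KV cache is loaded instead of being recomputed.
print(llm.generate([shared_context + "Question 2: ..."], sampling_params))
```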
What's Changed
- typo fix on retrive --> retrieve by @KuntaiDu in #14
- bugfix by @YaoJiayi in #15
- Bugfix by @YaoJiayi in #16
- Support saving decode cache by @YaoJiayi in #17
- vllm internal prefix cache compatibility by @XbzOnGit in #19
- Support chunk prefill by @YaoJiayi in #18
- [Refactor] Add support for "dtype" (KV cache storage data type) in LMCacheEngineMetadata by @Alex-q-z in #10
- Fix TP by @YaoJiayi in #22
- Add model specific patches by @YaoJiayi in #23
- Fix store's compatibility with suffix prefill by @YaoJiayi in #24
- Cacheblend integration by @ApostaC in #13
- Reduce memory copy on store by @XbzOnGit in #21
- Fix bug for chunk prefill and vllm internal prefix caching by @YaoJiayi in #27
- Optimize chunk prefill performance by @YaoJiayi in #30
- Bump version number to 0.6.2.2, working with vllm 0.6.2 + lmcache 0.1.3 by @ApostaC in #31
New Contributors
Full Changelog: v0.6.2.post1...v0.6.2.2
v0.6.2.post1
v0.1.1-alpha
Tagging with LMCache v0.1.1-alpha