DRAM Requirement for DeepSeek-R1 Q4_K_M (OOM on DRAM) #300
-
Hello, development team & community. I'm trying to load the DeepSeek-R1 Q4_K_M GGUFs using ktransformers v0.2 (0.2.0+cu121torch24fancy). Here is the setup of my server:

And here is the startup command, which leads to an OOM of DRAM:

python ./ktransformers/local_chat.py --model_path deepseek-ai/Deepseek-R1 --gguf_path /LLM/models/unsloth/Deepseek-R1-Q4_K_M/ --cpu_infer 30 --max_new_tokens 1000 --optimize_rule_path DeepSeek-V3-Chat.yaml

While loading the GGUFs, memory usage climbed continuously past 500 GB until the process was killed, while GPU memory usage grew much more slowly. Is this due to insufficient system DRAM? What is the approximate memory requirement for Q4_K_M under such a setup? Would using four GPUs reduce the DRAM demand? Thank you for any help.
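For a rough sense of the DRAM floor, the quantized weights alone can be estimated from parameter count times effective bits per weight. A minimal sketch, assuming ~671e9 total parameters for DeepSeek-R1 and an effective ~4.85 bits/weight for Q4_K_M (both numbers are assumptions, not from this thread; runtime adds loader and KV-cache overhead on top):

```python
# Back-of-envelope DRAM estimate for a Q4_K_M GGUF.
# Assumptions: ~671e9 total params (DeepSeek-R1), ~4.85 bits/weight
# effective for Q4_K_M. Actual usage is higher due to loader,
# activations, and KV-cache overhead.

def gguf_weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate in-memory size of the quantized weights in GB."""
    return n_params * bits_per_weight / 8 / 1e9

estimate = gguf_weight_gb(671e9, 4.85)
print(f"~{estimate:.0f} GB for the weights alone")  # ~407 GB
```

Under these assumptions the weights alone already approach ~400 GB, so peaking past 500 GB during loading is plausible if tensors are staged in DRAM before being placed.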
Replies: 1 comment
-
same question: #375