
/parallel often produces truncated outputs #5601

Closed
k-gyuhak opened this issue Feb 20, 2024 · 3 comments

Comments

@k-gyuhak

I encountered unexpected behavior when running the following command:

./parallel -m ./models/llama_7b/llama-2-7b/ggml-model-f16.gguf -t 1 -ngl 100 -c 4096 -b 512 -s 1 -np 8 -ns 128 -n 100 -cb

following the instructions for serving multiple clients with parallel decoding and continuous batching (#3749 (comment)).

The model truncates the outputs for some prompts. For instance, the model stopped generating output after "... for gettting started:", as shown in the image below:
[Screenshot 2024-02-19 at 4:30:03 PM]

A similar behavior is observed with mixtral-8x7b-instruct using the following command:

./parallel -m ./models--mistralai--Mixtral-8x7B-Instruct-v0.1/ggml-model-Q4_K_M.gguf -c 8192 -ngl 100 -f ~/data1_10.txt -n 2000 --temp 0.5 --top-p 0.9 --color --in-prefix "[INST]" --in-suffix "[/INST]" -b 8192 -np 2 -ns 10 -cb -t 1

As shown in the image below, the model truncates the output after "... The transcript '".

[Screenshot 2024-02-19 at 4:40:20 PM]

This behavior disappears when I provide the same prompt to ./main.

I am using 4 A100s.

@slaren
Member

slaren commented Feb 20, 2024

The parallel example has a few hard-coded stop strings, including the newline character. You are also limiting the sequences to 100 tokens with -n 100.

https://github.com/ggerganov/llama.cpp/blob/633782b8d949f24b619e6c68ee37b5cc79167173/examples/parallel/parallel.cpp#L357-L361
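In other words, those lines amount to a check roughly like the following (a paraphrased sketch, not the exact source; the function name and the exact stop-string set shown here are illustrative):

```cpp
#include <string>

// Paraphrased sketch of the hard-coded stop check in the parallel example:
// besides the end-of-sequence token and the -n token budget, a client's
// generation also ends once its accumulated response contains a hard-coded
// stop string -- notably a newline -- which is why multi-line answers
// appear truncated. The "User:" entry here is an assumption for illustration.
static bool hits_hardcoded_stop(const std::string & response) {
    return response.find("User:") != std::string::npos ||
           response.find('\n')    != std::string::npos;
}
```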

@k-gyuhak
Author

> The parallel example has a few hard-coded stop strings, including the newline character. You are also limiting the sequences to 100 tokens with -n 100.
>
> https://github.com/ggerganov/llama.cpp/blob/633782b8d949f24b619e6c68ee37b5cc79167173/examples/parallel/parallel.cpp#L357-L361

The behavior still persists with large -n values (e.g., -n 5000). Is there a way to specify stop strings in the model response to prevent the truncation?
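A possible workaround I'm considering (untested sketch; the helper name and parameters are mine, not the example's) is to patch the example so the stop strings come from a user-supplied list instead of being hard-coded, e.g. leaving out the newline:

```cpp
#include <string>
#include <vector>

// Untested sketch of a local patch to examples/parallel/parallel.cpp:
// match only a user-supplied stop-string list (which could be filled from
// -r/--reverse-prompt if the example parses the common params -- an
// assumption). With a list that omits "\n", multi-line responses would no
// longer be cut off at the first newline.
static bool hits_user_stop(const std::string & response,
                           const std::vector<std::string> & stop_strings) {
    for (const auto & s : stop_strings) {
        if (!s.empty() && response.find(s) != std::string::npos) {
            return true;
        }
    }
    return false;
}

// Example: stop only on an explicit end marker instead of every newline.
// bool stop = hits_user_stop(client_response, { "</s>", "[/INST]" });
```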

k-gyuhak reopened this Feb 20, 2024
github-actions bot added the stale label Mar 22, 2024
Contributor

github-actions bot commented Apr 6, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions bot closed this as completed Apr 6, 2024