Expected Behavior
When using ./main with the --grammar flag, llama.cpp generates output that conforms to the supplied grammar string.
The same behavior is expected to carry over to ./parallel.
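For reference, a minimal ./main invocation that honors a grammar looks like the following (the model path is a placeholder, and the grammar is a trimmed version of the one from the reproduction steps below):

./main --model <model_path> --prompt "What's your favorite number?" --grammar 'root ::= "1" | "2" | "3"' --n-predict 8

With the grammar active, ./main emits only "1", "2", or "3".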
Current Behavior
./parallel <args> ... --grammar <grammar_string> doesn't respect the grammar, so llama.cpp generates free-form text.
Environment and Context
MacBook Pro, M1 Pro chip, macOS Sonoma
Operating System:
$ uname -a
Darwin <my_username>.local 23.0.0 Darwin Kernel Version 23.0.0: Fri Sep 15 14:41:43 PDT 2023; root:xnu-10002.1.13~1/RELEASE_ARM64_T6000 arm64
SDK version:
$ python3 --version
$ make --version
GNU Make 3.81
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
This program built for i386-apple-darwin11.3.0
$ g++ --version
Apple clang version 15.0.0 (clang-1500.0.40.1)
Target: arm64-apple-darwin23.0.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
Failure Information (for bugs)
I'm not sure if it's related, but I noticed that parallel decoding treats each line of the prompt as a separate prompt (one per sequence).
Also, parallel decoding seems to run in a chat setting rather than a completion setting.
Steps to Reproduce
For example, try this:
./parallel --prompt "What's your favorite number?" --in-prefix '' --in-suffix '' --model <model_path> --ctx-size 8192 --color --n-predict 128 --keep 0 --temp 0.8 --repeat-penalty 1.1 --repeat-last-n 64 --grammar '# `root` specifies the pattern for the overall output
root ::= (
value
)
value ::= "1" | "2" | "3"
' --parallel 1 --sequences 1 --threads 10 --n-gpu-layers 128 --main-gpu 0
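With this grammar, root can only match one of the literal characters "1", "2", or "3", so any longer free-form answer from ./parallel indicates the --grammar argument is being ignored.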