
grammar doesn't work in parallel decoding even when np = 1 #3650

Closed
ibehnam opened this issue Oct 17, 2023 · 1 comment

Comments

ibehnam (Contributor) commented Oct 17, 2023

Expected Behavior

When running ./main with the --grammar flag, llama.cpp constrains generation so that the output conforms to the supplied grammar string.

It is expected that this behavior transfers to ./parallel as well.
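For context, grammars are supplied in llama.cpp's GBNF format. A minimal sketch of the kind of grammar involved (the rule names here are illustrative, mirroring the repro command below rather than any specific file in the repo):

```
# `root` is the entry rule: the entire output must match it
root ::= value

# restrict the output to one of three literal digits
value ::= "1" | "2" | "3"
```

With ./main, passing such a grammar via --grammar restricts sampling to tokens that keep the output inside the grammar, so the model can only emit "1", "2", or "3"; the expectation is that ./parallel applies the same constraint per sequence.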

Current Behavior

./parallel <args> ... --grammar <grammar_string> doesn't respect the grammar, so llama.cpp generates free-form text.

Environment and Context

MacBook Pro, M1 Pro chip, macOS Sonoma

  • Operating System, e.g. for Linux:

$ uname -a

Darwin <my_username>.local 23.0.0 Darwin Kernel Version 23.0.0: Fri Sep 15 14:41:43 PDT 2023; root:xnu-10002.1.13~1/RELEASE_ARM64_T6000 arm64

  • SDK version, e.g. for Linux:
$ python3 --version
$ make --version

GNU Make 3.81
Copyright (C) 2006  Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.

This program built for i386-apple-darwin11.3.0

$ g++ --version

Apple clang version 15.0.0 (clang-1500.0.40.1)
Target: arm64-apple-darwin23.0.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

Failure Information (for bugs)

I'm not sure if it's related, but I noticed that parallel decoding treats each line of the prompt as a separate prompt (one per sequence).

Also, parallel decoding appears to run in a chat setting, not a completion setting.

Steps to Reproduce

For example, try this:

./parallel --prompt "What's your favorite number?" --in-prefix '' --in-suffix '' --model <model_path> --ctx-size 8192 --color --n-predict 128 --keep 0 --temp 0.8 --repeat-penalty 1.1 --repeat-last-n 64 --grammar '# `root` specifies the pattern for the overall output
root ::= (
    value
)

value ::= "1" | "2" | "3"
' --parallel 1 --sequences 1 --threads 10 --n-gpu-layers 128 --main-gpu 0
ggerganov (Member) commented:

This is fixed in #3624 (pending merge).
