
Include DeepSeek's reasoning_content when present #592

Open
z80dev wants to merge 25 commits into master
Conversation

@z80dev commented Jan 30, 2025

Creates a separate backend for DeepSeek that inherits from the (mostly compatible) OpenAI backend.

Adds parsing and formatting of reasoning content, and a constructor argument for the backend to configure reasoning inclusion.
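
For illustration, configuring the new backend might look something like the following sketch; the constructor name and the :show-reasoning keyword are inferred from this PR (via the gptel-backend-show-reasoning accessor mentioned below) and are not final API.

    ;; Hypothetical usage of this PR's backend; the constructor name and
    ;; :show-reasoning keyword are inferred, not confirmed.
    (gptel-make-deepseek "Deepseek"
      :key (getenv "DEEPSEEK_API_KEY")
      :stream t
      :show-reasoning t)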

@karthink (Owner) commented Jan 30, 2025

@z80dev Thanks for the PR!

I have some suggestions to simplify the parser functions; I'll explain when I next have some time.

For now I wanted to mention some thoughts about the show-reasoning configuration. I'm going out on a limb here:

  • I expect that the popularity of deepseek-r1 means that we will soon see many more models that emit chains of thought/reasoning blocks. This might become a semi-standard feature of LLMs.
  • This might also be something that users may want to hide/show on demand. For example, they might want it to be hidden by default, but easily toggleable before/after a request. (Yes, toggling after a request can work too.)

Making the switch part of the backend definition makes it difficult to do this. Essentially you have to run

(setf (gptel-backend-show-reasoning (gptel-get-backend "Deepseek")) t)

Considering the above points, perhaps it makes sense to make this configurable at the gptel level instead of the gptel-backend level? Perhaps via a switch like gptel-include-reasoning, or gptel-include-chain-of-thought, similar to how we have gptel-include-tool-results or gptel-stream.
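
With a gptel-level variable, the same switch becomes a plain setq, settable globally or buffer-locally; a sketch using the proposed (not final) name:

    ;; Proposed gptel-level switch; the variable name is not final.
    (setq gptel-include-reasoning t)          ; enable everywhere
    (setq-local gptel-include-reasoning nil)  ; disable only in this buffer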

@pirminj (Contributor) commented Jan 30, 2025

Regarding the UI, I agree that a quick toggle option is needed. In my testing I often find the reasoning output interesting but not really relevant to the work, so showing a side buffer *gptel-reasoning* on demand might be a better solution.

@z80dev (Author) commented Jan 30, 2025

Good idea! I first started implementing with a separate variable (scoped only to deepseek), but then migrated to backend-level config because sometimes I wanted one backend to show reasoning, and another not to.

For this to work with a global toggle, it would have to be flipped each time I make a request, which is not ideal.

And yes, the point about having reasoning in the main gptel buffer or in a side buffer is another good point. Another option to consider is having the reasoning be put under a sub-heading (so #### Reasoning in markdown format) and having that heading be collapsed by default, with a separate, uncollapsed sub-heading (#### Response) for the main response. I haven't checked if this is something that's actually possible, but I'd be surprised if it wasn't.

All this shows how different our use cases can be; we should prioritize flexibility so as not to get in developers' way.

I will start by refactoring the toggle to be a variable instead of part of the backend config, and make it so it can also be set per-buffer via the transient. This supports the use case I mentioned above (some backends I want the reasoning to show, others not) without making it part of the backend config.

Then I will look into "reasoning output destinations" like a collapsed or uncollapsed heading, a separate buffer, or just in-place.

@karthink (Owner) commented:

> ...sometimes I wanted one backend to show reasoning, and another not to.

> For this to work with a global toggle, it would have to be flipped each time I make a request, which is not ideal.

I'm not sure what you are comparing this with, because it's the same amount of work in both approaches, but a gptel-level toggle can be more versatile.

  • If it's part of the backend, you have to switch backends, possibly buffer-locally.
  • If it's a gptel option, you have to flip that switch, possibly buffer-locally.

The advantage of the latter is that you can additionally use a single backend both with and without reasoning included. In the former you'd have to define two identical deepseek backends (but for the show-reasoning property) to be able to do that.

> And yes, the point about having reasoning in the main gptel buffer or in a side buffer is another good point. Another option to consider is having the reasoning be put under a sub-heading (so #### Reasoning in markdown format) and having that heading be collapsed by default, with a separate, uncollapsed sub-heading (#### Response) for the main response. I haven't checked if this is something that's actually possible, but I'd be surprised if it wasn't.

You can do exactly this with a post-response hook. See my comment here.
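
A rough sketch of such a hook, assuming the reasoning lands under a "#### Reasoning" sub-heading in an outline-aware markdown buffer, and that gptel-post-response-functions receives the bounds of the inserted response:

    ;; Sketch: collapse the "Reasoning" sub-heading after each response.
    ;; Assumes an outline-aware markdown buffer and a parser that emits
    ;; the reasoning under a "#### Reasoning" heading.
    (defun my/gptel-collapse-reasoning (beg end)
      (save-excursion
        (goto-char beg)
        (when (re-search-forward "^#+ Reasoning" end t)
          (outline-hide-subtree))))

    (add-hook 'gptel-post-response-functions #'my/gptel-collapse-reasoning)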

> I will start by refactoring the toggle to be a variable instead of part of the backend config, and make it so it can also be set per-buffer via the transient.

You just have to use the gptel-lisp-variable class in the transient infix; the rest is taken care of. Also, I think most people miss the fact that you can also set these options per request, and not just buffer-locally.
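
Something along these lines, loosely modeled on gptel's existing switches (the slot choices here are assumptions, not the final infix):

    ;; Sketch of a transient infix for the proposed option; the exact
    ;; slots and keybinding are assumptions.
    (transient-define-infix gptel--infix-include-reasoning ()
      "Toggle inclusion of reasoning/chain-of-thought content."
      :description "Include reasoning"
      :class 'gptel-lisp-variable
      :variable 'gptel-include-reasoning
      :key "-r")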

> Then I will look into "reasoning output destinations" like a collapsed or uncollapsed heading, a separate buffer, or just in-place.

Perhaps gptel-include-reasoning can have three values (t, buffer and nil) to determine if the chain of thought should be shown with the response, in a separate buffer, or not shown at all. I'm not sure yet if a fourth value (hide or collapse) to include and collapse the CoT is necessary.
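
Written out as a user option, that proposal might look like this sketch of the discussed semantics:

    ;; Sketch of the proposed option and its three values:
    (defcustom gptel-include-reasoning t
      "Whether to include reasoning/chain-of-thought in responses.
    t means include it with the response, the symbol buffer means
    redirect it to a separate buffer, and nil means drop it."
      :type '(choice (const :tag "With response" t)
                     (const :tag "In a separate buffer" buffer)
                     (const :tag "Don't include" nil))
      :group 'gptel)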

@karthink (Owner) commented Feb 6, 2025

@z80dev I'd like to merge this, as I'm getting many issues/discussions requesting this feature on the tracker. Do you think you'll be able to incorporate the changes we discussed soon?

@z80dev (Author) commented Feb 6, 2025

> @z80dev I'd like to merge this, as I'm getting many issues/discussions requesting this feature on the tracker. Do you think you'll be able to incorporate the changes we discussed soon?

@karthink Yep, I can do it today! I've been waiting for the tips on the parser functions that you mentioned in your first comment, but I can push the other changes today.

@karthink (Owner) commented Feb 6, 2025

> I've been waiting for the tips on the parser functions that you mentioned in your first comment, but I can push the other changes today.

It's a little involved so I'll explain when I get a reasonably sized chunk of free time, hopefully soon.

Commit: "…and redefine gptel-deepseek to use this variable in its parsing to determine whether to include reasoning"

@karthink (Owner) commented:
Can you provide me with the log for a medium-sized deepseek-reasoner response? Both streaming and non-streaming, although the former is more important.

To propose changes to the parsers, I need to understand what the output looks like. I don't have Deepseek access so I'm flying blind right now.

    (when (and reasoning-content
               (not (equal reasoning-content :null))
               gptel-include-reasoning)
      (unless (plist-get info :header-printed)
A Contributor commented on these lines:
Headers feel like a front-end detail for sure. This would be a block / drawer in org and a source block in markdown.

@karthink this is where I think a TYPE argument in the callback would be useful for allowing the front-end to differentiate reasoning as a message type.

@karthink (Owner) replied:

> Headers feel like a front-end detail for sure.

Yes, my plan for Deepseek integration is different from this, and in line with separating the response parser from the UI. I need to look at some Deepseek response logs to make a proposal.

> @karthink this is where I think a TYPE argument in the callback would be useful for allowing the front-end to differentiate reasoning as a message type.

TYPE will not fly, but I've already covered alternatives in detail in my response to your suggestion.

@sg-qwt commented Feb 11, 2025

> Can you provide me with the log for a medium-sized deepseek-reasoner response? Both streaming and non-streaming, although the former is more important.
>
> To propose changes to the parsers, I need to understand what the output looks like. I don't have Deepseek access so I'm flying blind right now.

Hi @karthink, right now deepseek-r1 is free on Github Models; maybe you can test with that instead?
https://github.com/marketplace/models/azureml-deepseek/DeepSeek-R1/playground

@psionic-k (Contributor) commented:

Changes in #626 are highly relevant to new response types and the frontend / backend separation.

A #+begin_reasoning block, or a markdown fence with reasoning as the language, would work roughly the same way as the explicit tool-call handling.
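
That is, the reasoning would be delimited in the buffer as one of (illustrative):

    #+begin_reasoning
    ...chain of thought...
    #+end_reasoning

in org, or in markdown:

    ```reasoning
    ...chain of thought...
    ```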

I'm likely to begin hacking on this PR. It looks okay except for bundling reasoning in with chat.

> right now deepseek-r1 is free on Github Models

@sg-qwt did you actually authenticate against it? Please share details so I can catch up.

@sg-qwt commented Feb 13, 2025

> @sg-qwt did you actually authenticate against it? Please share details so I can catch up.

Just get a Github PAT (no need to tick any permissions) and you're good to go. The model always wraps its reasoning in <think> </think> tags.
Though I found it mostly unusable for free users because the rate limit is tight; I got blocked with HTTP 429 after just a few tries.
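
For anyone catching up: Github Models speaks the OpenAI-compatible API, so a gptel backend for it is an ordinary gptel-make-openai definition. A sketch, where the host and endpoint are assumptions based on the Github Models docs at the time, and the PAT goes where an API key normally would:

    ;; Sketch: Github Models as an OpenAI-compatible gptel backend.
    ;; Host and endpoint are assumptions; use your PAT as the key.
    (gptel-make-openai "Github Models"
      :host "models.inference.ai.azure.com"
      :endpoint "/chat/completions"
      :stream t
      :key "your-github-pat"
      :models '(DeepSeek-R1))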

@psionic-k (Contributor) commented:

I'm about to crash. If you are awake, I need an ELI5. Put the PAT in auth-info? How can I add some headers to set super low token and reasoning limits, if I just want to be sure I get enough sample requests?

@karthink (Owner) commented Feb 13, 2025

I tested it with Github Models, and the situation is a little more complicated than it appears: the Github Models API just adds literal <think> and </think> tags to the reasoning part of the response, while the Deepseek API includes the reasoning chunk as a separate field in the JSON object.

This PR is for the Deepseek API. I can't get access to the API, so I need to see some logs (both streaming and non-streaming) before I can decide how best to handle the parsing.

Handling the <think> tags can be done in the OpenAI parser. It needs to be done with care, as we want to avoid scanning every response chunk for <think> when streaming responses.
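
Concretely, a streaming parser has to account for two delta shapes; a minimal sketch (gptel decodes response JSON into plists with keyword keys; the helper name is hypothetical):

    ;; Sketch: the two places reasoning can appear in a stream delta.
    ;; Deepseek API:  the delta carries a separate :reasoning_content field.
    ;; Github Models: the delta's :content embeds literal <think>...</think>.
    (defun my/delta-reasoning-and-content (delta)
      "Return (REASONING . CONTENT) from a decoded streaming DELTA plist."
      (cons (plist-get delta :reasoning_content)
            (plist-get delta :content)))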
