
Include DeepSeek's reasoning_content when present #592

Open
z80dev wants to merge 25 commits into master
Conversation

@z80dev commented Jan 30, 2025

Creates a separate backend for DeepSeek that inherits from the (mostly compatible) OpenAI backend.

Adds parsing and formatting of reasoning content, and a constructor argument for the backend to configure reasoning inclusion.
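
For illustration, configuring the new backend might look something like the following sketch; the constructor name and the :show-reasoning keyword are inferred from this PR (via the gptel-backend-show-reasoning accessor mentioned below) and are not final API.

    ;; Hypothetical usage of this PR's backend; the constructor name and
    ;; :show-reasoning keyword are inferred, not confirmed.
    (gptel-make-deepseek "Deepseek"
      :key (getenv "DEEPSEEK_API_KEY")
      :stream t
      :show-reasoning t)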

@karthink (Owner) commented Jan 30, 2025

@z80dev Thanks for the PR!

I have some suggestions to simplify the parser functions; I'll explain when I next have some time.

For now I wanted to mention some thoughts about the show-reasoning configuration. I'm going out on a limb here:

  • I expect that the popularity of deepseek-r1 means that we will soon see many more models that emit chains of thought/reasoning blocks. This might become a semi-standard feature of LLMs.
  • This might also be something that users may want to hide/show on demand. For example, they might want it to be hidden by default, but easily toggleable before/after a request. (Yes, toggling after a request can work too.)

Making the switch part of the backend definition makes it difficult to do this. Essentially you have to run

(setf (gptel-backend-show-reasoning (gptel-get-backend "Deepseek")) t)

Considering the above points, perhaps it makes sense to make this configurable at the gptel level instead of the gptel-backend level? Perhaps via a switch like gptel-include-reasoning, or gptel-include-chain-of-thought, similar to how we have gptel-include-tool-results or gptel-stream.
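
With a gptel-level variable, the same switch becomes a plain setq, settable globally or buffer-locally; a sketch using the proposed (not final) name:

    ;; Proposed gptel-level switch; the variable name is not final.
    (setq gptel-include-reasoning t)          ; enable everywhere
    (setq-local gptel-include-reasoning nil)  ; disable only in this buffer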

@pirminj (Contributor) commented Jan 30, 2025

Regarding the UI, I agree that a quick toggle option is needed. In my testing I often find the reasoning output interesting but not really relevant to the work, so showing a side buffer *gptel-reasoning* on demand might be a better solution.

@z80dev (Author) commented Jan 30, 2025

Good idea! I first started implementing with a separate variable (scoped only to deepseek), but then migrated to backend-level config because sometimes I wanted one backend to show reasoning, and another not to.

For this to work with a global toggle, it would have to be flipped each time I make a request, which is not ideal.

And yes, the point about having reasoning in the main gptel buffer or in a side buffer is another good point. Another option to consider is having the reasoning be put under a sub-heading (so #### Reasoning in markdown format) and having that heading be collapsed by default, with a separate, uncollapsed sub-heading (#### Response) for the main response. I haven't checked if this is something that's actually possible, but I'd be surprised if it wasn't.

All this shows how different our use cases can be; we should prioritize flexibility so as not to get in developers' way.

I will start by refactoring the toggle to be a variable instead of part of the backend config, and make it so it can also be set per-buffer via the transient. This supports the use case I mentioned above (some backends I want the reasoning to show, others not) without making it part of the backend config.

Then I will look into "reasoning output destinations" like a collapsed or uncollapsed heading, a separate buffer, or just in-place.

@karthink (Owner) commented:

> ...sometimes I wanted one backend to show reasoning, and another not to.

> For this to work with a global toggle, it would have to be flipped each time I make a request, which is not ideal.

I'm not sure what you are comparing this with, because it's the same amount of work in both approaches, but a gptel-level toggle can be more versatile.

  • If it's part of the backend, you have to switch backends, possibly buffer-locally.
  • If it's a gptel option, you have to flip that switch, possibly buffer-locally.

The advantage of the latter is that you can additionally use a single backend both with and without reasoning included. In the former you'd have to define two identical deepseek backends (but for the show-reasoning property) to be able to do that.

> And yes, the point about having reasoning in the main gptel buffer or in a side buffer is another good point. Another option to consider is having the reasoning be put under a sub-heading (so #### Reasoning in markdown format) and having that heading be collapsed by default, with a separate, uncollapsed sub-heading (#### Response) for the main response. I haven't checked if this is something that's actually possible, but I'd be surprised if it wasn't.

You can do exactly this with a post-response hook. See my comment here.
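
A rough sketch of such a hook, assuming the reasoning lands under a "#### Reasoning" sub-heading in an outline-aware markdown buffer, and that gptel-post-response-functions receives the bounds of the inserted response:

    ;; Sketch: collapse the "Reasoning" sub-heading after each response.
    ;; Assumes an outline-aware markdown buffer and a parser that emits
    ;; the reasoning under a "#### Reasoning" heading.
    (defun my/gptel-collapse-reasoning (beg end)
      (save-excursion
        (goto-char beg)
        (when (re-search-forward "^#+ Reasoning" end t)
          (outline-hide-subtree))))

    (add-hook 'gptel-post-response-functions #'my/gptel-collapse-reasoning)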

> I will start by refactoring the toggle to be a variable instead of part of the backend config, and make it so it can also be set per-buffer via the transient.

You just have to use the gptel-lisp-variable class in the transient infix; the rest is taken care of. Also, I think most people miss the fact that you can also set these options per request, and not just buffer-locally.
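
Something along these lines, loosely modeled on gptel's existing switches (the slot choices here are assumptions, not the final infix):

    ;; Sketch of a transient infix for the proposed option; the exact
    ;; slots and keybinding are assumptions.
    (transient-define-infix gptel--infix-include-reasoning ()
      "Toggle inclusion of reasoning/chain-of-thought content."
      :description "Include reasoning"
      :class 'gptel-lisp-variable
      :variable 'gptel-include-reasoning
      :key "-r")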

> Then I will look into "reasoning output destinations" like a collapsed or uncollapsed heading, a separate buffer, or just in-place.

Perhaps gptel-include-reasoning can have three values (t, buffer and nil) to determine if the chain of thought should be shown with the response, in a separate buffer, or not shown at all. I'm not sure yet if a fourth value (hide or collapse) to include and collapse the CoT is necessary.
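
Written out as a user option, that proposal might look like this sketch of the discussed semantics:

    ;; Sketch of the proposed option and its three values:
    (defcustom gptel-include-reasoning t
      "Whether to include reasoning/chain-of-thought in responses.
    t means include it with the response, the symbol buffer means
    redirect it to a separate buffer, and nil means drop it."
      :type '(choice (const :tag "With response" t)
                     (const :tag "In a separate buffer" buffer)
                     (const :tag "Don't include" nil))
      :group 'gptel)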

@karthink (Owner) commented Feb 6, 2025

@z80dev I'd like to merge this, as I'm getting many issues/discussions requesting this feature on the tracker. Do you think you'll be able to incorporate the changes we discussed soon?

@z80dev (Author) commented Feb 6, 2025

> @z80dev I'd like to merge this, as I'm getting many issues/discussions requesting this feature on the tracker. Do you think you'll be able to incorporate the changes we discussed soon?

@karthink Yep, I can do it today! I've been waiting for the tips on the parser functions that you mentioned in your first comment, but I can push the other changes today.

@karthink (Owner) commented Feb 6, 2025

> I've been waiting for the tips on the parser functions that you mentioned in your first comment, but I can push the other changes today.

It's a little involved so I'll explain when I get a reasonably sized chunk of free time, hopefully soon.

Commit: "…and redefine gptel-deepseek to use this variable in its parsing to determine whether to include reasoning"

@karthink (Owner) commented:
Can you provide me with the log for a medium-sized deepseek-reasoner response? Both streaming and non-streaming, although the former is more important.

To propose changes to the parsers, I need to understand what the output looks like. I don't have Deepseek access so I'm flying blind right now.

    (when (and reasoning-content
               (not (equal reasoning-content :null))
               gptel-include-reasoning)
      (unless (plist-get info :header-printed)
A Contributor commented on these lines:
Headers feel like a front-end detail for sure. This would be a block / drawer in org and a source block in markdown.

@karthink this is where I think a TYPE argument in the callback would be useful for allowing the front-end to differentiate reasoning as a message type.

@karthink (Owner) replied:

> Headers feel like a front-end detail for sure.

Yes, my plan for Deepseek integration is different from this, and in line with separating the response parser from the UI. I need to look at some Deepseek response logs to make a proposal.

> @karthink this is where I think a TYPE argument in the callback would be useful for allowing the front-end to differentiate reasoning as a message type.

TYPE will not fly, but I've already covered alternatives in detail in my response to your suggestion.

@sg-qwt commented Feb 11, 2025

> Can you provide me with the log for a medium-sized deepseek-reasoner response? Both streaming and non-streaming, although the former is more important.
>
> To propose changes to the parsers, I need to understand what the output looks like. I don't have Deepseek access so I'm flying blind right now.

Hi @karthink, right now deepseek-r1 is free on Github Models; maybe you can test with that instead?
https://github.com/marketplace/models/azureml-deepseek/DeepSeek-R1/playground

@psionic-k (Contributor) commented:

Changes in #626 are highly relevant to new response types and the frontend / backend separation.

A #+begin_reasoning block, or a markdown fence with reasoning as the language, would work roughly the same way as the explicit tool-call handling.
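
That is, the reasoning would be delimited in the buffer as one of (illustrative):

    #+begin_reasoning
    ...chain of thought...
    #+end_reasoning

in org, or in markdown:

    ```reasoning
    ...chain of thought...
    ```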

I'm likely to begin hacking on this PR. It looks okay except for bundling reasoning in with chat.

> right now deepseek-r1 is free on Github Models

@sg-qwt did you actually authenticate against it? Please share details so I can catch up.

@sg-qwt commented Feb 13, 2025

> @sg-qwt did you actually authenticate against it? Please share details so I can catch up.

Just get a Github PAT (no need to tick any permissions) and you're good to go. The model always wraps its reasoning in <think> </think> tags.
Though I found it mostly unusable for free users because the rate limit is tight; I got blocked with HTTP 429 after just a few tries.
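
For anyone catching up: Github Models speaks the OpenAI-compatible API, so a gptel backend for it is an ordinary gptel-make-openai definition. A sketch, where the host and endpoint are assumptions based on the Github Models docs at the time, and the PAT goes where an API key normally would:

    ;; Sketch: Github Models as an OpenAI-compatible gptel backend.
    ;; Host and endpoint are assumptions; use your PAT as the key.
    (gptel-make-openai "Github Models"
      :host "models.inference.ai.azure.com"
      :endpoint "/chat/completions"
      :stream t
      :key "your-github-pat"
      :models '(DeepSeek-R1))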

@psionic-k (Contributor) commented:

I'm about to crash. If you are awake, I need an ELI5. Put the PAT in auth-info? How can I add some headers to set super low token and reasoning limits, if I just want to be sure I get enough sample requests?

@karthink (Owner) commented Feb 13, 2025

I tested it with Github Models, and the situation is a little more complicated than it appears: the Github Models API just adds literal <think> and </think> tags to the reasoning part of the response, while the Deepseek API includes the reasoning chunk as a separate field in the JSON object.

This PR is for the Deepseek API. I can't get access to the API, so I need to see some logs (both streaming and non-streaming) before I can decide how best to handle the parsing.

Handling the <think> tags can be done in the OpenAI parser. It needs to be done with care, as we want to avoid scanning every response chunk for <think> when streaming responses.
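
Concretely, a streaming parser has to account for two delta shapes; a minimal sketch (gptel decodes response JSON into plists with keyword keys; the helper name is hypothetical):

    ;; Sketch: the two places reasoning can appear in a stream delta.
    ;; Deepseek API:  the delta carries a separate :reasoning_content field.
    ;; Github Models: the delta's :content embeds literal <think>...</think>.
    (defun my/delta-reasoning-and-content (delta)
      "Return (REASONING . CONTENT) from a decoded streaming DELTA plist."
      (cons (plist-get delta :reasoning_content)
            (plist-get delta :content)))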
