Problem Statement

Current LLM development is moving toward structured output, which has been shown to improve model performance across a variety of tasks. Training with structured output would also let us explore long-context training, which we have not done yet.

Idea

Structured responses significantly improve the model's systematic reasoning ability. To achieve this, the authors design <SUMMARY>, <CAPTION>, <REASONING>, and <CONCLUSION> tags that help the model recognize which stage of reasoning it is in, and they build the LLaVA-o1-100k dataset by using GPT-4o to generate stage-level reasoning annotations (see the tag-format sketch below).

Inference-time scaling: unlike previous methods such as best-of-N search and sentence-level beam search, the paper proposes a stage-level beam search. Multiple candidate responses are generated for each stage (marked by the tags), and the best one is selected before proceeding to the next stage (see the search sketch below).
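A minimal sketch of what the staged output format could look like and how to split it back into stages. The exact open/close tag layout and the example response here are assumptions for illustration, not taken verbatim from the paper.

```python
import re

# The four reasoning stages described in the paper.
STAGES = ["SUMMARY", "CAPTION", "REASONING", "CONCLUSION"]

def parse_staged_response(text: str) -> dict:
    """Split a response into its four stages.

    Assumes each stage is wrapped in literal <STAGE>...</STAGE> tags.
    Missing stages come back as empty strings.
    """
    stages = {}
    for name in STAGES:
        match = re.search(rf"<{name}>(.*?)</{name}>", text, flags=re.DOTALL)
        stages[name] = match.group(1).strip() if match else ""
    return stages

# Hypothetical model output used only to exercise the parser.
example = (
    "<SUMMARY>Identify the tallest bar in the chart.</SUMMARY>"
    "<CAPTION>A bar chart with four bars labeled A-D.</CAPTION>"
    "<REASONING>Bar C is visibly taller than A, B, and D.</REASONING>"
    "<CONCLUSION>C</CONCLUSION>"
)
print(parse_staged_response(example)["CONCLUSION"])  # -> "C"
```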
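A minimal sketch of the stage-level selection loop, assuming greedy selection of one winner per stage. `generate` and `score` are hypothetical placeholders for "sample one candidate for the current stage" and "judge candidate quality" (the paper has the model itself compare candidates); they are not real APIs.

```python
from typing import Callable, List

STAGES = ["SUMMARY", "CAPTION", "REASONING", "CONCLUSION"]

def stage_level_beam_search(
    prompt: str,
    generate: Callable[[str, str], str],  # (context, stage_name) -> one candidate for that stage
    score: Callable[[str, str], float],   # (context, candidate) -> quality score
    num_candidates: int = 4,
) -> str:
    """Sample several candidates per stage, keep the best one,
    append it to the context, then move on to the next stage."""
    context = prompt
    for stage in STAGES:
        candidates: List[str] = [generate(context, stage) for _ in range(num_candidates)]
        best = max(candidates, key=lambda c: score(context, c))
        context += best  # commit the winning stage before generating the next one
    return context
```

Compared with best-of-N (which scores only complete responses) or sentence-level beam search (which branches at every sentence), this commits to one candidate at each tagged stage boundary, so errors in an early stage can be pruned before the later stages are generated.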
Reference: https://arxiv.org/pdf/2411.10440