QwenLong-L1 solves the long-context reasoning challenge that stumps today’s LLMs

Alibaba Group has introduced QwenLong-L1, a new framework that enables large language models (LLMs) to reason over very long inputs. This advance could open up a new class of enterprise applications that require models to understand and draw insights from extensive documents, such as detailed corporate filings, lengthy financial statements, or complex legal contracts.

The challenge of long-form reasoning for AI

Recent advances in large reasoning models (LRMs), particularly through reinforcement learning (RL), have significantly improved their problem-solving capabilities. Research shows that when trained with RL fine-tuning, LRMs acquire skills similar to human “slow thinking,” developing sophisticated strategies to tackle complex tasks.

However, these improvements have mostly been observed when models work with relatively short pieces of text, typically around 4,000 tokens. The ability of these models to scale their reasoning to much longer contexts (e.g., 120,000 tokens) remains a major challenge. Such long-form reasoning requires a robust understanding of the entire context and the ability to perform multi-step analysis. “This limitation poses a significant barrier to practical applications requiring interaction with external knowledge, such as deep research, where LRMs must collect and process information from knowledge-intensive environments,” the developers of QwenLong-L1 write in their paper.

The researchers formalize this challenge as “long-context reasoning RL.” Unlike short-context reasoning, which often relies on knowledge already stored inside the model, long-context reasoning RL requires models to retrieve and ground relevant information from lengthy inputs. Only then can they generate chains of reasoning based on that incorporated information.

Training models for this via RL is tricky and often results in inefficient learning and unstable optimization. Models struggle to converge on good solutions or lose their ability to explore diverse reasoning paths.

QwenLong-L1: a multi-stage approach

QwenLong-L1 is a reinforcement learning framework designed to help LRMs transition from proficiency with short texts to robust generalization across long contexts. The framework enhances existing short-context LRMs through a carefully structured, multi-stage process:

Warm-up Supervised Fine-Tuning (SFT): The model first goes through an SFT phase, in which it is trained on examples of long-context reasoning. This stage establishes a solid foundation, enabling the model to ground information accurately in long inputs. It helps develop fundamental capabilities in understanding context, generating logical reasoning chains, and extracting answers.

Curriculum-Guided Phased RL: At this stage, the model is trained in several phases, with the target length of the input documents gradually increasing. This systematic, step-by-step approach helps the model stably adapt its reasoning strategies from shorter to progressively longer contexts. It avoids the instability often seen when models are abruptly trained on very long texts.

Difficulty-Aware Retrospective Sampling: The final training stage incorporates challenging examples from the preceding training phases, ensuring the model continues to learn from the hardest problems. This prioritizes difficult instances and encourages the model to explore more diverse and complex reasoning paths (a simplified sketch of the overall pipeline follows below).
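To make the multi-stage recipe more concrete, here is a minimal Python sketch of a curriculum-style RL loop with hard-example carryover, assuming a schedule of growing context-length budgets. The token budgets, function names, and the 30% hard-example fraction are illustrative assumptions, not values from the paper.

```python
import random

# Illustrative phase schedule: the RL context-length budget grows per phase.
# These token counts are assumptions, not the paper's actual settings.
PHASES = [20_000, 60_000, 120_000]

def run_rl_phase(model, examples, max_input_tokens):
    """Stand-in for one RL phase: train only on examples whose context fits
    the current length budget, and return a reward per example so that hard
    cases can be re-sampled in later phases."""
    in_budget = [ex for ex in examples if ex["num_tokens"] <= max_input_tokens]
    rewards = {}
    for ex in in_budget:
        # ...generate a response, score it with the hybrid reward, and apply
        # the policy-gradient update here; a random score stands in below...
        rewards[ex["id"]] = random.random()
    return rewards

def curriculum_rl(model, rl_examples, hard_fraction=0.3):
    # Stage 1 (warm-up SFT on long-context reasoning traces) happens before
    # this function and is omitted here.
    hard_pool = []  # lowest-reward examples retained from earlier phases
    for max_len in PHASES:
        # Stage 2: curriculum-guided phased RL, mixing retained hard examples
        # back in (a rough reading of difficulty-aware retrospective sampling).
        batch = rl_examples + hard_pool
        rewards = run_rl_phase(model, batch, max_len)

        # Carry the lowest-reward examples forward into the next phase.
        ranked = sorted(rewards, key=rewards.get)  # example ids, hardest first
        hard_ids = set(ranked[: int(len(ranked) * hard_fraction)])
        hard_pool = [ex for ex in batch if ex["id"] in hard_ids]
    return model
```

The key idea the sketch tries to capture is that each phase only sees inputs up to its length budget, while the hardest examples from earlier phases keep reappearing so the model does not settle into strategies that only work on easy, short cases.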

The QwenLong-L1 training process. Source: arXiv

Beyond this structured training, QwenLong-L1 also uses a distinct reward system. While training for short-context reasoning tasks often relies on strict rule-based rewards (e.g., a correct answer to a math problem), QwenLong-L1 employs a hybrid reward mechanism. It combines rule-based verification, which ensures precision by checking for strict adherence to correctness criteria, with an “LLM-as-a-judge.” The judge model compares the semantics of the generated answer with the ground truth, allowing more flexibility and better handling of the diverse ways correct answers can be expressed in long, nuanced documents.
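As an illustration of how such a hybrid reward could be wired up, the Python sketch below combines a strict rule-based check with an LLM-as-a-judge call. The function names, prompt wording, and the take-the-maximum style combination are assumptions made for illustration, not the paper’s exact implementation.

```python
import re

def normalize(text: str) -> str:
    """Lowercase and collapse punctuation/whitespace for exact-match checks."""
    return re.sub(r"\W+", " ", text).strip().lower()

def rule_based_reward(prediction: str, reference: str) -> float:
    """Strict verification: 1.0 only if the answers match after normalization."""
    return 1.0 if normalize(prediction) == normalize(reference) else 0.0

def judge_reward(question: str, prediction: str, reference: str) -> float:
    """LLM-as-a-judge: ask a judge model whether the prediction is semantically
    equivalent to the ground truth. `call_judge_model` is a hypothetical helper
    wrapping whatever inference client is available."""
    prompt = (
        f"Question:\n{question}\n\n"
        f"Ground-truth answer:\n{reference}\n\n"
        f"Predicted answer:\n{prediction}\n\n"
        "Are the two answers semantically equivalent? Reply YES or NO."
    )
    verdict = call_judge_model(prompt)
    return 1.0 if verdict.strip().upper().startswith("YES") else 0.0

def hybrid_reward(question: str, prediction: str, reference: str) -> float:
    """Full credit if either the strict check or the judge accepts the answer
    (this max-style combination is an assumption of the sketch)."""
    strict = rule_based_reward(prediction, reference)
    return strict if strict == 1.0 else judge_reward(question, prediction, reference)

def call_judge_model(prompt: str) -> str:
    # Stub so the sketch runs offline; replace with a real judge-model call
    # (e.g., a chat-completions request to a hosted model) in practice.
    return "NO"
```

The rule-based branch keeps verifiable cases cheap and precise, while the judge branch catches answers that are correct but phrased differently, which is what matters when answers are drawn from long, nuanced documents.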

Putting QwenLong-L1 to the test

The Alibaba team evaluated QwenLong-L1 using document question-answering (DocQA) as the primary task. This scenario maps closely to enterprise needs, where AI must understand dense documents to answer complex questions.

Experimental results across seven long-context DocQA benchmarks demonstrate QwenLong-L1’s capabilities. Notably, the QwenLong-L1-32B model (based on DeepSeek-R1-Distill-Qwen-32B) achieved performance comparable to Anthropic’s Claude-3.7 Sonnet Thinking and outperformed models such as OpenAI’s o3-mini and Qwen3-235B-A22B. The smaller QwenLong-L1-14B model also outperformed Google’s Gemini 2.0 Flash Thinking and Qwen3-32B.

Source: arXiv

An important finding for real-world applications is how RL training leads the model to develop specialized long-context reasoning behaviors. The paper reports that models trained with QwenLong-L1 become better at “grounding” (linking answers to specific parts of a document), “subgoal setting” (breaking down complex questions), “backtracking” (recognizing and correcting their own mistakes mid-reasoning), and “verification” (double-checking their answers).

For example, while a base model might get sidetracked by irrelevant details in a financial document or stuck in a loop of over-analyzing unrelated information, the QwenLong-L1-trained model showed an ability to engage in effective self-reflection. It could successfully filter out these distractor details, backtrack from incorrect paths, and arrive at the correct answer.

Techniques such as QwenLong-L1 could significantly expand AI’s utility in the enterprise. Potential applications include legal tech (analyzing thousands of pages of legal documents), finance (deep analysis of investment reports and financial filings), and other document-heavy workflows. The researchers have released the code for the QwenLong-L1 recipe and the weights for the trained models.
