Why AI Gets “Lost” in Multi-Turn Conversations: Causes and Solutions Explained

Have you ever had an extended conversation with an AI, only to feel like it’s getting confused or stubbornly refusing to adjust its answers? Maybe you noticed it going in circles or giving inconsistent responses as the chat went on. If so, you’re not alone. This phenomenon, where AI seems to get “lost” or less reliable during multi-turn dialogues, has been rigorously studied by researchers from Microsoft and Salesforce. Their recent paper, “LLMs Get Lost In Multi-Turn Conversation”, sheds light on why AI struggles with step-by-step instructions and how we can better interact with these models to get the most out of them.

In this article, we’ll break down the key findings of that research, explain the concepts of single-turn and multi-turn interactions, and provide practical tips on how to work effectively with AI based on these insights. Whether you’re a casual user or someone relying on AI for complex tasks, understanding this behavior will help you avoid frustration and unlock more reliable AI outputs.

Understanding Single-Turn vs. Multi-Turn Interactions

Before diving into the study’s findings, it’s essential to clarify two fundamental concepts: single-turn and multi-turn interactions with AI.

  • Single-turn means giving the AI all the necessary information and instructions in one go. You provide a comprehensive prompt, and the AI responds based on that complete input.
  • Multi-turn mimics a natural human conversation, where you interact with the AI in multiple steps—gradually providing information, adjusting instructions, or correcting misunderstandings over several back-and-forth exchanges.

For example, when I ask AI to help with a project, I might initially give a basic instruction. Later, I realize there are additional considerations or constraints, so I add those in follow-up messages. This iterative process is typical in multi-turn conversations.

[Image: single-turn vs. multi-turn conversation, explained]

The Core Finding: AI’s Strength vs. Stability in Multi-Turn Conversations

The study reveals a fascinating yet counterintuitive result: while the AI’s ability to produce good answers drops only slightly in multi-turn scenarios, the reliability or stability of those answers plummets significantly.

What do these terms mean exactly?

  • Ability refers to the AI’s potential to generate high-quality responses. The research quantified this as the 90th-percentile score over many runs of the same task—roughly, the level the model reaches on its best attempts.
  • Reliability measures how consistent the AI’s answers are. It’s calculated from the gap between the 90th- and 10th-percentile scores. A small gap means the AI consistently delivers similar-quality responses, while a large gap indicates high variability—sometimes excellent answers, other times poor ones.

In multi-turn conversations, the AI occasionally produces near-best results but often swings widely in quality, making it less trustworthy overall. This means while the AI hasn’t lost much raw capability, it becomes “moody” or “unstable” as the conversation progresses.
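To make these two metrics concrete, here is a minimal sketch of how they could be computed from repeated runs of the same prompt. The score lists below are invented for illustration (not data from the paper); the point is that both conditions can share a similar ceiling while differing wildly in spread.

```python
import statistics

def ability_and_reliability(scores):
    """Estimate 'ability' and 'unreliability' from repeated runs of one task.

    scores: per-run quality scores (e.g. on a 0-100 scale) for the same prompt.
    Following the definitions summarized above:
      - ability       ~ the 90th-percentile score (what good runs achieve)
      - unreliability ~ the gap between the 90th and 10th percentiles
    """
    deciles = statistics.quantiles(scores, n=10)  # 9 cut points: deciles 1..9
    p10, p90 = deciles[0], deciles[8]
    return {"ability": p90, "unreliability": p90 - p10}

# Hypothetical scores: similar capability ceiling, very different spread.
single_turn = [82, 85, 88, 84, 86, 87, 83, 85, 86, 84]
multi_turn  = [30, 88, 45, 86, 25, 87, 50, 85, 35, 84]

print(ability_and_reliability(single_turn))  # small unreliability gap
print(ability_and_reliability(multi_turn))   # similar ability, huge gap
```

Run on these toy numbers, the two conditions come out with nearly identical ability but an order-of-magnitude difference in unreliability, which mirrors the paper’s headline finding.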

[Image: graph of ability vs. reliability in AI responses]

How Was This Tested? The Experiments Behind the Findings

To understand why this happens, the researchers compared AI performance on the same tasks under different prompting styles:

  1. Single-turn prompting: All task-related information was given at once in a fully detailed prompt.
  2. Multi-turn prompting: The same information was split into smaller pieces and fed to the AI sequentially in a conversation.

Within multi-turn prompting, they tested several strategies:

  • Random small chunks: the information was delivered piece by piece across turns, one fragment at a time.
  • Unstructured summary: the fragments were concatenated and sent all at once, without being rewritten into a coherent prompt.
  • Summary after stepwise input: the information was given step by step, followed by a final turn recapping everything stated so far.
  • Snowballing: each new turn repeated all previously given pieces and added one more, gradually building up toward the full prompt.
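The basic single-turn vs. multi-turn split can be sketched in a few lines. This is an illustrative reconstruction, not the paper’s actual harness; the task text below is made up, and the paper refers to the split pieces as “shards.”

```python
def to_single_turn(shards):
    """Single-turn: send every piece of information in one user message."""
    return [{"role": "user", "content": "\n".join(shards)}]

def to_multi_turn(shards):
    """Multi-turn (sharded): reveal one piece of information per user turn."""
    return [{"role": "user", "content": s} for s in shards]

# Hypothetical task, broken into the fragments a real user might drip-feed.
shards = [
    "Write a function that merges two sorted lists.",
    "It should run in O(n) time.",
    "Duplicates must be kept.",
    "Return a new list; do not modify the inputs.",
]

single = to_single_turn(shards)  # 1 message containing everything
multi  = to_multi_turn(shards)   # 4 messages, revealed gradually
```

Both message lists carry identical information; the experiments show that how it is delivered is what moves the scores.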

They applied these strategies across several task categories, such as coding, math, and summarization, to see how well the AI models performed.

Results: Multi-Turn Prompts Lead to Lower Scores and Less Stability

The results were striking. While summary-style multi-turn prompts saw only a modest drop in performance compared to single-turn, the small-chunk random prompts caused a dramatic fall in scores.

Looking closely at popular AI models such as GPT-4o, Gemini 2.5 Pro, and Claude 3.7 Sonnet:

  • All models experienced a significant decline in accuracy during multi-turn interactions.
  • The average score across all test categories dropped by about 39% compared to single-turn prompts.

This confirms that the problem is not isolated to a single AI but is general across large language models (LLMs). Better multi-turn strategies like snowballing and summarizing helped somewhat, but still lagged behind single-turn prompting.

[Image: performance comparison of AI models under different prompting methods]

Why Does AI Struggle with Multi-Turn Conversations?

The core reason, according to the paper, is how AI forms internal assumptions early in a dialogue and then stubbornly clings to them.

At the start of a conversation, the AI has limited information. Unlike humans who might ask clarifying questions when unsure, AI tends to make the best guess it can and generates an initial hypothesis about what the user wants.

Once this internal hypothesis is set, the AI heavily biases its subsequent outputs based on that first guess. Even when later turns provide new or contradicting information, the AI struggles to revise or discard its initial assumptions. This “anchoring” effect leads to confusion and instability as the conversation continues.

[Image: diagram of AI anchoring on its initial hypothesis in a conversation]

Practical Tips: How to Work with AI More Effectively

Understanding this behavior helps us adjust our approach to AI interactions to achieve better results. Here are some actionable recommendations:

1. Avoid Long Multi-Turn Conversations When Accuracy Matters

If you notice the AI’s responses becoming inconsistent or off-track, don’t try to fix it by continuing the same chat. Instead, start a new chat session to reset the AI’s internal assumptions. This “fresh start” often leads to faster and more accurate answers.

[Image: starting a new chat to reset the AI’s assumptions]

2. Compose Detailed, Structured Prompts Upfront

Whenever possible, gather all relevant information, constraints, background, and examples and include them in your initial prompt. Use formats like Markdown or YAML to organize the details clearly. This reduces the need for the AI to guess or fill in gaps later.
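As an illustration, an upfront structured prompt might look like the template below. This is an invented example of the pattern, not a format prescribed by the paper; adapt the section names to your task.

```markdown
## Task
Refactor the attached Python script to remove duplicated logic.

## Constraints
- Keep the public function names unchanged.
- Target Python 3.11; standard library only.

## Background
The script runs in a nightly batch job; runtime matters more than memory.

## Expected Output
The full revised script, followed by a short summary of what changed.
```

Putting constraints and expected output in the first message means the model never has to form—and then defend—a premature guess about them.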

3. Summarize Past Conversations When Switching Chats

If you must switch to a new chat but want to maintain context, ask the AI to summarize the previous conversation and copy that summary into the new session. This helps the AI start with a more accurate understanding.
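One way to sketch this hand-off, assuming you have pasted the old chat’s summary into a variable (the function and wording here are illustrative, not a standard API):

```python
def handoff_prompt(summary, new_request):
    """Build the opening message of a fresh chat from a carried-over summary.

    `summary` is what the old chat returned when asked something like:
    "Summarize this conversation: the goal, every constraint given so far,
    and the decisions we have already made."
    """
    return (
        "Context from a previous conversation (treat as settled):\n"
        f"{summary}\n\n"
        f"New request: {new_request}"
    )

prompt = handoff_prompt(
    summary="Goal: CSV report generator. Constraints: Python 3.11, no pandas.",
    new_request="Add a --since DATE filter to the report.",
)
```

Because the summary arrives in the very first turn, the new session anchors on the distilled context instead of on a partial guess.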

4. Use Multi-Turn Interactions for Casual or Exploratory Tasks

Multi-turn conversations are still great for brainstorming, simple questions, or informal idea generation. The instability generally doesn’t matter much in these cases, and the back-and-forth can be enjoyable and productive.

[Image: using multi-turn conversation for casual AI interactions]

Conclusion: Use AI Conversations Wisely by Knowing Their Limits

While large language models have made tremendous strides, their internal mechanisms mean they often get “lost” or inconsistent in multi-turn conversations when tasked with gradually absorbing complex instructions. This research clearly shows that AI’s ability remains strong but its reliability suffers as the conversation lengthens.

By understanding this, we can tailor our interactions to maximize AI’s potential—favoring well-structured single-turn prompts for precise work and reserving multi-turn chats for lighter, more flexible uses. Remember to reset chats when things get confusing and always try to pack your instructions carefully at the start.

These insights are invaluable for anyone looking to harness AI effectively, whether for coding, writing, research, or business tasks. Embracing these best practices will save you time and frustration while unlocking the true power of AI assistance.

For those interested in diving deeper, the original paper “LLMs Get Lost In Multi-Turn Conversation” is a fascinating read. And if you want a comprehensive guide to AI usage basics, consider checking out the “AI Utilization Textbook”, which covers everything from foundational mechanisms to prompt engineering.

Happy AI collaborating!
