Have you ever had an extended conversation with an AI, only to feel like it’s getting confused or stubbornly refusing to adjust its answers? Maybe you noticed it going in circles or giving inconsistent responses as the chat went on. If so, you’re not alone. This phenomenon, where AI seems to get “lost” or less reliable during multi-turn dialogues, has been rigorously studied by researchers from Microsoft and Salesforce. Their recent paper, “LLMs Get Lost In Multi-Turn Conversation”, sheds light on why AI struggles with step-by-step instructions and how we can better interact with these models to get the most out of them.
In this article, we’ll break down the key findings of that research, explain the concepts of single-turn and multi-turn interactions, and provide practical tips on how to work effectively with AI based on these insights. Whether you’re a casual user or someone relying on AI for complex tasks, understanding this behavior will help you avoid frustration and unlock more reliable AI outputs.
Understanding Single-Turn vs. Multi-Turn Interactions
Before diving into the study’s findings, it’s essential to clarify two fundamental concepts: single-turn and multi-turn interactions with AI.
- Single-turn means giving the AI all the necessary information and instructions in one go. You provide a comprehensive prompt, and the AI responds based on that complete input.
- Multi-turn mimics a natural human conversation, where you interact with the AI in multiple steps—gradually providing information, adjusting instructions, or correcting misunderstandings over several back-and-forth exchanges.
For example, when I ask AI to help with a project, I might initially give a basic instruction. Later, I realize there are additional considerations or constraints, so I add those in follow-up messages. This iterative process is typical in multi-turn conversations.
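To make the distinction concrete, here is a small illustration of the same task expressed both ways. The message format below mirrors the role/content structure most chat APIs use, but no real API is called and the task itself is made up:

```python
# Hypothetical illustration: the same request as a single-turn prompt
# versus a multi-turn conversation. Only the user side is shown.

single_turn = [
    {"role": "user", "content": (
        "Write a Python function that parses a CSV file, "
        "skips blank lines, ignores the header row, "
        "and returns each row as a dictionary."
    )},
]

multi_turn = [
    {"role": "user", "content": "Write a Python function that parses a CSV file."},
    {"role": "user", "content": "Actually, it should skip blank lines."},
    {"role": "user", "content": "Oh, and ignore the header row."},
    {"role": "user", "content": "Return each row as a dictionary, please."},
]

# The total information is identical; only how it arrives differs.
print(len(single_turn), len(multi_turn))
```

The single-turn version hands over everything at once; the multi-turn version drips the same requirements across four exchanges.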

The Core Finding: AI’s Strength vs. Stability in Multi-Turn Conversations
The study reveals a fascinating yet counterintuitive result: while the AI’s ability to produce good answers drops only slightly in multi-turn scenarios, the reliability or stability of those answers plummets significantly.
What do these terms mean exactly?
- Ability refers to the AI’s potential to generate high-quality responses. In the research, this was quantified by the 90th percentile score—the score that the best 10% of attempts reach or exceed across many repeated runs of the same task.
- Reliability measures how consistent the AI’s answers are. It’s calculated as the difference between the 10th and 90th percentile scores. A small difference means the AI consistently delivers similar-quality responses, while a large gap indicates high variability—sometimes excellent answers, other times poor ones.
In multi-turn conversations, the AI occasionally produces near-best results but often swings widely in quality, making it less trustworthy overall. This means while the AI hasn’t lost much raw capability, it becomes “moody” or “unstable” as the conversation progresses.
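These two metrics are easy to compute yourself. The sketch below uses made-up score data (not the paper’s numbers) to show how aptitude and unreliability are derived from repeated runs of the same task:

```python
# A minimal sketch of the two metrics, computed over repeated runs of the
# same task. The score lists below are fabricated for illustration only.

def quantile(scores, q):
    """Linear-interpolation quantile of a list of scores (0 <= q <= 1)."""
    s = sorted(scores)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def aptitude(scores):
    # 90th percentile: how good the model's best attempts are
    return quantile(scores, 0.90)

def unreliability(scores):
    # Gap between the 90th and 10th percentiles: how much quality swings
    return quantile(scores, 0.90) - quantile(scores, 0.10)

single_turn_scores = [88, 90, 85, 92, 87, 91, 89, 86, 90, 88]
multi_turn_scores  = [90, 45, 82, 30, 88, 55, 76, 40, 85, 60]

print(aptitude(single_turn_scores), unreliability(single_turn_scores))
print(aptitude(multi_turn_scores),  unreliability(multi_turn_scores))
```

In this toy data the best-case scores barely change between the two conditions, but the unreliability gap balloons—exactly the pattern the study describes.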

How Was This Tested? The Experiments Behind the Findings
To understand why this happens, the researchers compared AI performance on the same tasks under different prompting styles:
- Single-turn prompting: All task-related information was given at once in a fully detailed prompt.
- Multi-turn prompting: The same information was split into smaller pieces and fed to the AI sequentially in a conversation.
Within multi-turn prompting, they tested several strategies:
- Random small chunks: Information was delivered piece-by-piece in no particular order.
- Concatenated fragments: All the fragmented pieces were combined into a single message—a loose list of instructions sent at once, without connecting prose.
- Summary after stepwise input: Information was given step-by-step, then a final summary prompt was provided.
- Snowballing: Each turn restated all the information given so far plus one new piece, so the final turn contained the complete instruction.
They applied these strategies across various problem categories such as coding, math, and summarization to see how well AI models performed.
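The prompting conditions above can be sketched as simple transformations of the same list of information “shards.” This is an illustrative reconstruction, not the paper’s actual code, and the example task is invented:

```python
# Building the prompting conditions from one shared list of "shards".

shards = [
    "Write a function that merges two sorted lists.",
    "It should run in linear time.",
    "Duplicates must be kept.",
    "Return a new list; do not mutate the inputs.",
]

# Single-turn: everything in one fully specified prompt.
single_turn = [" ".join(shards)]

# Sharded multi-turn: one shard per conversational turn.
sharded = list(shards)

# Concatenated fragments: all shards at once, as a loose bullet list
# with no connecting prose.
concat = ["\n".join(f"- {s}" for s in shards)]

# Recap: sharded turns followed by a final turn restating everything.
recap = list(shards) + ["To recap: " + " ".join(shards)]

# Snowball: each turn repeats all earlier shards plus the new one,
# so the last turn equals the full single-turn prompt.
snowball = [" ".join(shards[: i + 1]) for i in range(len(shards))]

print(len(single_turn), len(sharded), len(concat), len(recap), len(snowball))
```

Each list represents the sequence of user turns sent to the model; the information content is identical across all five conditions.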
Results: Multi-Turn Prompts Lead to Lower Scores and Less Stability
The results were striking. While summary-style multi-turn prompts saw only a modest drop in performance compared to single-turn, the small-chunk random prompts caused a dramatic fall in scores.
Looking closely at popular AI models like GPT-4.1, Gemini 2.5 Pro, and Claude 3.7 Sonnet:
- All models experienced a significant decline in accuracy during multi-turn interactions.
- The average score across all test categories dropped by about 39% compared to single-turn prompts.
This confirms that the problem is not isolated to a single AI but is general across large language models (LLMs). Mitigations like snowballing and recap summaries helped somewhat but still lagged behind single-turn prompting.

Why Does AI Struggle with Multi-Turn Conversations?
The core reason, according to the paper, is how AI forms internal assumptions early in a dialogue and then stubbornly clings to them.
At the start of a conversation, the AI has limited information. Unlike humans who might ask clarifying questions when unsure, AI tends to make the best guess it can and generates an initial hypothesis about what the user wants.
Once this internal hypothesis is set, the AI heavily biases its subsequent outputs based on that first guess. Even when later turns provide new or contradicting information, the AI struggles to revise or discard its initial assumptions. This “anchoring” effect leads to confusion and instability as the conversation continues.

Practical Tips: How to Work with AI More Effectively
Understanding this behavior helps us adjust our approach to AI interactions to achieve better results. Here are some actionable recommendations:
1. Avoid Long Multi-Turn Conversations When Accuracy Matters
If you notice the AI’s responses becoming inconsistent or off-track, don’t try to fix it by continuing the same chat. Instead, start a new chat session to reset the AI’s internal assumptions. This “fresh start” often leads to faster and more accurate answers.

2. Compose Detailed, Structured Prompts Upfront
Whenever possible, gather all relevant information, constraints, background, and examples and include them in your initial prompt. Use formats like Markdown or YAML to organize the details clearly. This reduces the need for the AI to guess or fill in gaps later.
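One lightweight way to do this is to assemble the prompt programmatically from its parts. The section names and content below are illustrative, not a prescribed template:

```python
# A minimal sketch of assembling a detailed single-turn prompt in Markdown.

def build_prompt(task, constraints=None, background=None, examples=None):
    """Combine task details into one structured Markdown prompt."""
    parts = [f"## Task\n{task}"]
    if background:
        parts.append(f"## Background\n{background}")
    if constraints:
        parts.append("## Constraints\n" + "\n".join(f"- {c}" for c in constraints))
    if examples:
        parts.append("## Examples\n" + "\n".join(examples))
    return "\n\n".join(parts)

prompt = build_prompt(
    task="Summarize the attached meeting notes in five bullet points.",
    constraints=["Plain language, no jargon", "Under 120 words total"],
    background="The notes cover a quarterly planning meeting.",
    examples=["Input: raw notes ... Output: five concise bullets"],
)
print(prompt)
```

The point is not the specific headings but the habit: decide the task, constraints, background, and examples before sending anything, so the model never has to guess.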
3. Summarize Past Conversations When Switching Chats
If you must switch to a new chat but want to maintain context, ask the AI to summarize the previous conversation and copy that summary into the new session. This helps the AI start with a more accurate understanding.
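A sketch of that handoff workflow, assuming you paste the old chat’s summary by hand (the summary text and request wording here are placeholders, not anything the paper prescribes):

```python
# The "summarize and carry over" workflow. No API is called; the summary
# stands in for what you would ask the old chat session to produce.

SUMMARY_REQUEST = (
    "Summarize our conversation so far: the goal, all constraints and "
    "decisions made, and what remains to be done. Be complete but concise."
)

def handoff_prompt(summary, next_step):
    """Build the first message for the fresh chat session."""
    return (
        "Context from a previous conversation:\n"
        f"{summary}\n\n"
        f"Continuing from there: {next_step}"
    )

summary = "Goal: a CSV parser. Decisions: skip blank lines, ignore header."
msg = handoff_prompt(summary, "Now add error handling for malformed rows.")
print(msg)
```

Because the new session starts with a clean internal state plus a compact, accurate summary, you get the best of both: fresh assumptions and preserved context.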
4. Use Multi-Turn Interactions for Casual or Exploratory Tasks
Multi-turn conversations are still great for brainstorming, simple questions, or informal idea generation. The instability generally doesn’t matter much in these cases, and the back-and-forth can be enjoyable and productive.

Conclusion: Use AI Conversations Wisely by Knowing Their Limits
While large language models have made tremendous strides, their internal mechanisms mean they often get “lost” or inconsistent in multi-turn conversations when tasked with gradually absorbing complex instructions. This research clearly shows that AI’s ability remains strong but its reliability suffers as the conversation lengthens.
By understanding this, we can tailor our interactions to maximize AI’s potential—favoring well-structured single-turn prompts for precise work and reserving multi-turn chats for lighter, more flexible uses. Remember to reset chats when things get confusing and always try to pack your instructions carefully at the start.
These insights are invaluable for anyone looking to harness AI effectively, whether for coding, writing, research, or business tasks. Embracing these best practices will save you time and frustration while unlocking the true power of AI assistance.
For those interested in diving deeper, the original paper “LLMs Get Lost In Multi-Turn Conversation” is a fascinating read. And if you want a comprehensive guide to AI usage basics, consider checking out the “AI Utilization Textbook”, which covers everything from foundational mechanisms to prompt engineering.
Happy AI collaborating!

























