{"id":1671,"date":"2025-07-08T22:44:35","date_gmt":"2025-07-08T13:44:35","guid":{"rendered":"https:\/\/www.aicritique.org\/us\/?p=1671"},"modified":"2025-07-08T22:44:35","modified_gmt":"2025-07-08T13:44:35","slug":"why-ai-gets-lost-in-multi-turn-conversations-causes-and-solutions-explained","status":"publish","type":"post","link":"https:\/\/www.aicritique.org\/us\/2025\/07\/08\/why-ai-gets-lost-in-multi-turn-conversations-causes-and-solutions-explained\/","title":{"rendered":"Why AI Gets &#8220;Lost&#8221; in Multi-Turn Conversations: Causes and Solutions Explained"},"content":{"rendered":"\n<p>Have you ever had an extended conversation with an AI, only to feel like it\u2019s getting confused or stubbornly refusing to adjust its answers? Maybe you noticed it going in circles or giving inconsistent responses as the chat went on. If so, you\u2019re not alone. This phenomenon, where AI seems to get &#8220;lost&#8221; or less reliable during multi-turn dialogues, has been rigorously studied by researchers from Microsoft and Salesforce. Their recent paper, <strong>&#8220;LLMs Get Lost In Multi-Turn Conversation&#8221;<\/strong>, sheds light on why AI struggles with step-by-step instructions and how we can better interact with these models to get the most out of them.<\/p>\n\n\n\n<p>In this article, we\u2019ll break down the key findings of that research, explain the concepts of single-turn and multi-turn interactions, and provide practical tips on how to work effectively with AI based on these insights. Whether you\u2019re a casual user or someone relying on AI for complex tasks, understanding this behavior will help you avoid frustration and unlock more reliable AI outputs.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Understanding Single-Turn vs. 
Multi-Turn Interactions<\/h2>\n\n\n\n<p>Before diving into the study\u2019s findings, it\u2019s essential to clarify two fundamental concepts: <strong>single-turn<\/strong> and <strong>multi-turn<\/strong> interactions with AI.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Single-turn<\/strong> means giving the AI all the necessary information and instructions in one go. You provide a comprehensive prompt, and the AI responds based on that complete input.<\/li>\n\n\n\n<li><strong>Multi-turn<\/strong> mimics a natural human conversation, where you interact with the AI in multiple steps\u2014gradually providing information, adjusting instructions, or correcting misunderstandings over several back-and-forth exchanges.<\/li>\n<\/ul>\n\n\n\n<p>For example, when I ask AI to help with a project, I might initially give a basic instruction. Later, I realize there are additional considerations or constraints, so I add those in follow-up messages. This iterative process is typical in multi-turn conversations.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/firebasestorage.googleapis.com\/v0\/b\/videotoblog-35c6e.appspot.com\/o\/%2Fusers%2Fmd8Dlk0UBldiM3vkKK7kilzP23z2%2Fblogs%2FbVFvJnFwNg1DvCgxlcIu%2Fscreenshots%2F0fcbde53-fd1b-431e-b0ef-4aae34d61ed4.webp?alt=media&amp;token=83bda902-7b6e-4706-9bd5-2aeeb0c632bb\" alt=\"Single-turn vs multi-turn conversation explanation\"\/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">The Core Finding: AI&#8217;s Strength vs. 
Stability in Multi-Turn Conversations<\/h2>\n\n\n\n<p>The study reveals a fascinating yet counterintuitive result: while the AI\u2019s <strong>ability<\/strong> to produce good answers drops only slightly in multi-turn scenarios, the <strong>reliability<\/strong> or <em>stability<\/em> of those answers plummets significantly.<\/p>\n\n\n\n<p>What do these terms mean exactly?<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Ability<\/strong> refers to the AI\u2019s potential to generate high-quality responses. In the research, this was quantified by the 90th percentile score\u2014the performance level achieved by the top 10% of responses in many tests.<\/li>\n\n\n\n<li><strong>Reliability<\/strong> measures how consistent the AI\u2019s answers are. It\u2019s calculated as the difference between the 10th and 90th percentile scores. A small difference means the AI consistently delivers similar-quality responses, while a large gap indicates high variability\u2014sometimes excellent answers, other times poor ones.<\/li>\n<\/ul>\n\n\n\n<p>In multi-turn conversations, the AI occasionally produces near-best results but often swings widely in quality, making it less trustworthy overall. This means while the AI hasn\u2019t lost much raw capability, it becomes \u201cmoody\u201d or \u201cunstable\u201d as the conversation progresses.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/firebasestorage.googleapis.com\/v0\/b\/videotoblog-35c6e.appspot.com\/o\/%2Fusers%2Fmd8Dlk0UBldiM3vkKK7kilzP23z2%2Fblogs%2FbVFvJnFwNg1DvCgxlcIu%2Fscreenshots%2F21f72e30-5416-4054-aa5b-5a2126c428f3.webp?alt=media&amp;token=464fd1cd-5971-4d70-b632-059a2396162b\" alt=\"Graph showing ability vs reliability in AI responses\"\/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">How Was This Tested? 
The Experiments Behind the Findings<\/h2>\n\n\n\n<p>To understand why this happens, the researchers compared AI performance on the same tasks under different prompting styles:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Single-turn prompting:<\/strong> All task-related information was given at once in a fully detailed prompt.<\/li>\n\n\n\n<li><strong>Multi-turn prompting:<\/strong> The same information was split into smaller pieces and fed to the AI sequentially in a conversation.<\/li>\n<\/ol>\n\n\n\n<p>Within multi-turn prompting, they tested several strategies:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Random small chunks:<\/strong> Information was delivered piece by piece in no particular order.<\/li>\n\n\n\n<li><strong>Unstructured summary:<\/strong> The fragmented pieces were combined and sent all at once, without logical connections between them.<\/li>\n\n\n\n<li><strong>Summary after stepwise input:<\/strong> Information was given step-by-step, then a final summary prompt was provided.<\/li>\n\n\n\n<li><strong>Snowballing:<\/strong> Each turn repeated all previously supplied information plus one new piece, gradually building up to the complete prompt.<\/li>\n<\/ul>\n\n\n\n<p>They applied these strategies across various task categories, including coding, math, and summarization, to see how well AI models performed.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Results: Multi-Turn Prompts Lead to Lower Scores and Less Stability<\/h2>\n\n\n\n<p>The results were striking. 
While <strong>summary-style multi-turn prompts<\/strong> saw only a modest drop in performance compared to single-turn, the <strong>small-chunk random prompts<\/strong> caused a dramatic fall in scores.<\/p>\n\n\n\n<p>Looking closely at popular AI models like GPT-4o, Gemini 2.5 Pro, and Claude 3.7 Sonnet:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>All models experienced a significant decline in accuracy during multi-turn interactions.<\/li>\n\n\n\n<li>The average score across all test categories dropped by about 39% compared to single-turn prompts.<\/li>\n<\/ul>\n\n\n\n<p>This confirms that the problem is not isolated to a single AI but is general across large language models (LLMs). Refined multi-turn methods like snowballing and summarizing recovered some of the lost performance but still lagged behind single-turn prompting.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/firebasestorage.googleapis.com\/v0\/b\/videotoblog-35c6e.appspot.com\/o\/%2Fusers%2Fmd8Dlk0UBldiM3vkKK7kilzP23z2%2Fblogs%2FbVFvJnFwNg1DvCgxlcIu%2Fscreenshots%2F2e5df871-9478-4111-8516-da77ead36e12.webp?alt=media&amp;token=7d8edadd-3d32-414e-b161-87f2f748ddfb\" alt=\"Performance comparison chart of AI models with different prompting methods\"\/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Why Does AI Struggle with Multi-Turn Conversations?<\/h2>\n\n\n\n<p>The core reason, according to the paper, is how AI forms internal assumptions early in a dialogue and then stubbornly clings to them.<\/p>\n\n\n\n<p>At the start of a conversation, the AI has limited information. Unlike humans, who might ask clarifying questions when unsure, AI tends to make its best guess and generate an initial hypothesis about what the user wants.<\/p>\n\n\n\n<p>Once this internal hypothesis is set, the AI heavily biases its subsequent outputs based on that first guess. 
Even when later turns provide new or contradicting information, the AI struggles to revise or discard its initial assumptions. This &#8220;anchoring&#8221; effect leads to confusion and instability as the conversation continues.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/firebasestorage.googleapis.com\/v0\/b\/videotoblog-35c6e.appspot.com\/o\/%2Fusers%2Fmd8Dlk0UBldiM3vkKK7kilzP23z2%2Fblogs%2FbVFvJnFwNg1DvCgxlcIu%2Fscreenshots%2F0074ce3e-42b4-444e-b517-3be587038e19.webp?alt=media&amp;token=3ebdf2e3-f0a5-4e82-8e1a-f0a4d91785fa\" alt=\"Diagram illustrating AI anchoring on initial hypothesis in conversation\"\/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Practical Tips: How to Work with AI More Effectively<\/h2>\n\n\n\n<p>Understanding this behavior helps us adjust our approach to AI interactions to achieve better results. Here are some actionable recommendations:<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1. Avoid Long Multi-Turn Conversations When Accuracy Matters<\/h3>\n\n\n\n<p>If you notice the AI\u2019s responses becoming inconsistent or off-track, don\u2019t try to fix it by continuing the same chat. Instead, start a new chat session to reset the AI\u2019s internal assumptions. This \u201cfresh start\u201d often leads to faster and more accurate answers.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/firebasestorage.googleapis.com\/v0\/b\/videotoblog-35c6e.appspot.com\/o\/%2Fusers%2Fmd8Dlk0UBldiM3vkKK7kilzP23z2%2Fblogs%2FbVFvJnFwNg1DvCgxlcIu%2Fscreenshots%2F44d500e3-557a-47ec-b6d0-7ec8694f86b1.webp?alt=media&amp;token=c4e99f83-7ad5-4770-9c24-e12f18c67e33\" alt=\"Start a new chat to reset AI assumptions\"\/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">2. Compose Detailed, Structured Prompts Upfront<\/h3>\n\n\n\n<p>Whenever possible, gather all relevant information, constraints, background, and examples and include them in your initial prompt. 
Use formats like Markdown or YAML to organize the details clearly. This reduces the need for the AI to guess or fill in gaps later.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">3. Summarize Past Conversations When Switching Chats<\/h3>\n\n\n\n<p>If you must switch to a new chat but want to maintain context, ask the AI to summarize the previous conversation and copy that summary into the new session. This helps the AI start with a more accurate understanding.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">4. Use Multi-Turn Interactions for Casual or Exploratory Tasks<\/h3>\n\n\n\n<p>Multi-turn conversations are still great for brainstorming, simple questions, or informal idea generation. The instability generally doesn\u2019t matter much in these cases, and the back-and-forth can be enjoyable and productive.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/firebasestorage.googleapis.com\/v0\/b\/videotoblog-35c6e.appspot.com\/o\/%2Fusers%2Fmd8Dlk0UBldiM3vkKK7kilzP23z2%2Fblogs%2FbVFvJnFwNg1DvCgxlcIu%2Fscreenshots%2F59caad90-718e-4d51-8980-adbd51e2f6cf.webp?alt=media&amp;token=d4b8e52b-c1b8-4557-929b-b03a23d4c6ac\" alt=\"Using multi-turn conversation for casual AI interactions\"\/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion: Use AI Conversations Wisely by Knowing Their Limits<\/h2>\n\n\n\n<p>While large language models have made tremendous strides, their internal mechanisms mean they often get \u201clost\u201d or inconsistent in multi-turn conversations when tasked with gradually absorbing complex instructions. This research clearly shows that AI\u2019s ability remains strong but its reliability suffers as the conversation lengthens.<\/p>\n\n\n\n<p>By understanding this, we can tailor our interactions to maximize AI\u2019s potential\u2014favoring well-structured single-turn prompts for precise work and reserving multi-turn chats for lighter, more flexible uses. 
Remember to reset chats when things get confusing and always try to pack your instructions carefully at the start.<\/p>\n\n\n\n<p>These insights are invaluable for anyone looking to harness AI effectively, whether for coding, writing, research, or business tasks. Embracing these best practices will save you time and frustration while unlocking the true power of AI assistance.<\/p>\n\n\n\n<p>For those interested in diving deeper, the original paper <a href=\"https:\/\/arxiv.org\/abs\/2505.06120\" target=\"_blank\" rel=\"noreferrer noopener\">\u201cLLMs Get Lost In Multi-Turn Conversation\u201d<\/a> is a fascinating read. And if you want a comprehensive guide to AI usage basics, consider checking out the <strong>&#8220;AI Utilization Textbook&#8221;<\/strong>, which covers everything from foundational mechanisms to prompt engineering.<\/p>\n\n\n\n<p>Happy AI collaborating!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Have you ever had an extended conversation with an AI, only to feel like it\u2019s getting confused or stubbornly refusing to adjust its answers? 
Maybe you noticed it going in circles or giving inconsistent responses as the chat went on.&hellip;<\/p>\n","protected":false},"author":5,"featured_media":1672,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[23,3],"tags":[],"class_list":["post-1671","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-academic","category-llm"],"_links":{"self":[{"href":"https:\/\/www.aicritique.org\/us\/wp-json\/wp\/v2\/posts\/1671","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.aicritique.org\/us\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aicritique.org\/us\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aicritique.org\/us\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aicritique.org\/us\/wp-json\/wp\/v2\/comments?post=1671"}],"version-history":[{"count":1,"href":"https:\/\/www.aicritique.org\/us\/wp-json\/wp\/v2\/posts\/1671\/revisions"}],"predecessor-version":[{"id":1673,"href":"https:\/\/www.aicritique.org\/us\/wp-json\/wp\/v2\/posts\/1671\/revisions\/1673"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aicritique.org\/us\/wp-json\/wp\/v2\/media\/1672"}],"wp:attachment":[{"href":"https:\/\/www.aicritique.org\/us\/wp-json\/wp\/v2\/media?parent=1671"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aicritique.org\/us\/wp-json\/wp\/v2\/categories?post=1671"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aicritique.org\/us\/wp-json\/wp\/v2\/tags?post=1671"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}