{"id":2122,"date":"2026-05-25T10:06:27","date_gmt":"2026-05-25T01:06:27","guid":{"rendered":"https:\/\/www.aicritique.org\/us\/?p=2122"},"modified":"2026-05-25T10:12:06","modified_gmt":"2026-05-25T01:12:06","slug":"corpus2skill-new-standard-of-knowledge-architecture-for-the-llm-era","status":"publish","type":"post","link":"https:\/\/www.aicritique.org\/us\/2026\/05\/25\/corpus2skill-new-standard-of-knowledge-architecture-for-the-llm-era\/","title":{"rendered":"Corpus2Skill &#8212; New Standard of Knowledge Architecture for the LLM Era"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">Executive Summary<\/h2>\n\n\n\n<p class=\"has-medium-font-size wp-block-paragraph\">The core shift in enterprise knowledge systems is no longer just from \u201cdocuments\u201d to \u201cLLMs.\u201d It is from&nbsp;<strong>retrieving snippets<\/strong>&nbsp;toward&nbsp;<strong>structuring, navigating, editing, and exploring knowledge<\/strong>&nbsp;in forms that fit different kinds of work. Standard Retrieval-Augmented Generation, or RAG, remains the practical baseline because it is relatively easy to deploy, easy to update, and strong when the answer already exists in a small number of document fragments. Canonical and official sources describe the now-familiar pipeline: chunk documents, embed them, index them, retrieve relevant chunks, rerank them, inject them into context, and generate an answer. This improves grounding and can reduce hallucination relative to generation without retrieval, but it does not automatically solve search misses, fragmented context, or weak understanding of cross-document structure. 
(<a href=\"https:\/\/arxiv.org\/abs\/2005.11401\">Lewis et al., 2020<\/a>;&nbsp;<a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/search\/retrieval-augmented-generation-overview\">Azure RAG overview<\/a>;&nbsp;<a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/search\/vector-search-how-to-chunk-documents\">Azure chunking guide<\/a>;&nbsp;<a href=\"https:\/\/docs.aws.amazon.com\/bedrock\/latest\/userguide\/rerank.html\">Bedrock reranking<\/a>;&nbsp;<a href=\"https:\/\/arxiv.org\/abs\/2307.03172\">Lost in the Middle<\/a>;&nbsp;<a href=\"https:\/\/arxiv.org\/abs\/2212.10509\">IRCoT<\/a>)<\/p>\n\n\n\n<p class=\"has-medium-font-size wp-block-paragraph\">The newer architectures each address a different blind spot.&nbsp;<strong>GraphRAG<\/strong>&nbsp;adds an explicit graph layer so that the system can reason over entities, relations, communities, and whole-corpus themes, not just top-k chunks.&nbsp;<strong>LLM Wiki<\/strong>&nbsp;treats knowledge not as something to rediscover on every query, but as something an LLM continuously rewrites into a readable Markdown knowledge base.&nbsp;<strong>Corpus2Skill<\/strong>&nbsp;goes one step further toward agent operations: instead of retrieving top-ranked chunks at runtime, it compiles a corpus offline into a hierarchical skill directory that an agent can navigate through&nbsp;<code>SKILL.md<\/code>&nbsp;and&nbsp;<code>INDEX.md<\/code>&nbsp;files.&nbsp;<strong>Mindware Research Institute\u2019s GNG+MST concept-structure approach<\/strong>&nbsp;points in a different but complementary direction: not \u201cfind the answer,\u201d but \u201cfind the structure,\u201d by visualizing clusters, concept distances, bridges, and latent exploration axes. 
(<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/from-local-to-global-a-graph-rag-approach-to-query-focused-summarization\/\">Microsoft GraphRAG paper<\/a>;&nbsp;<a href=\"https:\/\/microsoft.github.io\/graphrag\/\">Microsoft GraphRAG docs<\/a>;&nbsp;<a href=\"https:\/\/gist.github.com\/karpathy\/442a6bf555914893e9891c11519de94f\">Karpathy, \u201cLLM Wiki\u201d gist<\/a>;&nbsp;<a href=\"https:\/\/arxiv.org\/html\/2604.14572v1\">Corpus2Skill paper<\/a>;&nbsp;<a href=\"https:\/\/github.com\/dukesun99\/Corpus2Skill\/blob\/main\/README.md\">Corpus2Skill GitHub<\/a>;&nbsp;<a href=\"https:\/\/www.thinknavi.ai\/user-manual-2\/chapter-01\/\">ThinkNavi manual<\/a>;&nbsp;<a href=\"https:\/\/www.mindware-jp.com\/files\/Conceptual_Investigation\/%E6%A6%82%E5%BF%B5%E8%AA%BF%E6%9F%BB%E3%81%A8%E6%BD%9C%E5%9C%A8%E7%A9%BA%E9%96%93.pdf\">Mindware concept-investigation PDF<\/a>)<\/p>\n\n\n\n<p class=\"has-medium-font-size wp-block-paragraph\">The practical conclusion is that there is no single \u201cpost-RAG\u201d architecture. The next phase is&nbsp;<strong>combinatorial by purpose<\/strong>. Use RAG for fast, grounded answers; GraphRAG when relationships and whole-corpus sensemaking matter; LLM Wiki when the organization needs an editable, compounding knowledge artifact; Corpus2Skill when agents must reliably traverse enterprise knowledge as an operational hierarchy; and GNG+MST when the goal is concept discovery, qualitative analysis, or research exploration rather than narrow question answering. 
The strategic design question is no longer \u201cWhich one replaces RAG?\u201d but \u201cWhich representation of knowledge best fits the work we need to do?\u201d (<a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/search\/retrieval-augmented-generation-overview\">Azure RAG overview<\/a>;&nbsp;<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/blog\/graphrag-unlocking-llm-discovery-on-narrative-private-data\/\">Microsoft GraphRAG blog<\/a>;&nbsp;<a href=\"https:\/\/www.thinknavi.ai\/user-manual-2\/chapter-01\/\">ThinkNavi manual<\/a>)<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Title Proposals<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li class=\"has-medium-font-size\"><strong>Beyond Retrieval: Where Knowledge Architecture in the LLM Era Is Heading<\/strong><\/li>\n\n\n\n<li class=\"has-medium-font-size\"><strong>From Search to Structure: RAG, GraphRAG, LLM Wiki, Corpus2Skill, and Concept Models<\/strong><\/li>\n\n\n\n<li class=\"has-medium-font-size\"><strong>The Next Knowledge Stack: Retrieval, Graphs, Wikis, Skills, and Concept Structure<\/strong><\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Why Knowledge Architecture Is Shifting<\/h2>\n\n\n\n<p class=\"has-medium-font-size wp-block-paragraph\">Standard RAG remains the enterprise default because it is conceptually simple and operationally useful. In its common form, documents are chunked, chunk embeddings are computed, an index is built, a query triggers vector or hybrid retrieval, the candidate set is reranked, the top evidence is injected into the prompt, and the model generates an answer grounded in those retrieved fragments. Lewis et al.\u2019s original RAG paper established the idea of combining a parametric model with non-parametric external memory, while Azure and Amazon Bedrock documentation make explicit the modern production pipeline: chunking, embedding, retrieval, reranking, and response generation. 
(<a href=\"https:\/\/arxiv.org\/abs\/2005.11401\">Lewis et al., 2020<\/a>;\u00a0<a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/search\/retrieval-augmented-generation-overview\">Azure RAG overview<\/a>;\u00a0<a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/architecture\/ai-ml\/guide\/rag\/rag-information-retrieval\">Azure information retrieval guide<\/a>;\u00a0<a href=\"https:\/\/docs.aws.amazon.com\/bedrock\/latest\/userguide\/rerank.html\">Bedrock reranking<\/a>)<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"907\" height=\"118\" src=\"https:\/\/www.aicritique.org\/us\/wp-content\/uploads\/2026\/05\/image-5.png\" alt=\"\" class=\"wp-image-2124\" srcset=\"https:\/\/www.aicritique.org\/us\/wp-content\/uploads\/2026\/05\/image-5.png 907w, https:\/\/www.aicritique.org\/us\/wp-content\/uploads\/2026\/05\/image-5-300x39.png 300w, https:\/\/www.aicritique.org\/us\/wp-content\/uploads\/2026\/05\/image-5-768x100.png 768w\" sizes=\"auto, (max-width: 907px) 100vw, 907px\" \/><\/figure>\n\n\n\n<p class=\"has-medium-font-size wp-block-paragraph\">This pipeline is good at&nbsp;<strong>finding local answers<\/strong>. If the question is \u201cWhat does the return policy say?\u201d or \u201cWhich API parameter controls retries?\u201d RAG often works well because the answer can be grounded in one or a few fragments. In that sense, RAG is like a strong search engine that can also write. It improves freshness and grounding relative to relying only on model parameters, and it can reduce hallucination when relevant evidence is actually retrieved. 
(<a href=\"https:\/\/arxiv.org\/abs\/2005.11401\">Lewis et al., 2020<\/a>;&nbsp;<a href=\"https:\/\/arxiv.org\/abs\/2310.11511\">Self-RAG<\/a>;&nbsp;<a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/search\/retrieval-augmented-generation-overview\">Azure RAG overview<\/a>)<\/p>\n\n\n\n<p class=\"has-medium-font-size wp-block-paragraph\">But the limits have become increasingly visible. First, retrieval can simply miss the right evidence. BM25 is robust for lexical matches and remains a strong baseline, but it depends on term overlap and ranking heuristics; dense retrieval depends on embedding geometry and nearest-neighbor search; both can fail when the question is underspecified or when the necessary evidence is distributed across multiple documents. Second, chunking fragments context. A document may make sense as a whole even when no single chunk cleanly answers the question. Third, long context windows do not magically solve the issue, because models can still underuse relevant information placed in the middle of long prompts. Research such as IRCoT and Lost in the Middle shows why single-shot retrieve-and-read pipelines often struggle on multi-step reasoning and long-context use. (<a href=\"https:\/\/nlp.stanford.edu\/IR-book\/html\/htmledition\/okapi-bm25-a-non-binary-model-1.html\">Stanford IR book on BM25<\/a>;&nbsp;<a href=\"https:\/\/arxiv.org\/abs\/2004.04906\">DPR<\/a>;&nbsp;<a href=\"https:\/\/arxiv.org\/abs\/2212.10509\">IRCoT<\/a>;&nbsp;<a href=\"https:\/\/arxiv.org\/abs\/2307.03172\">Lost in the Middle<\/a>)<\/p>\n\n\n\n<p class=\"has-medium-font-size wp-block-paragraph\">That is why the field is moving from pure retrieval toward&nbsp;<strong>knowledge forms<\/strong>&nbsp;better suited to different tasks. If RAG is a search box, GraphRAG is a map of connections, LLM Wiki is an edited handbook, Corpus2Skill is an agent-operable directory tree, and GNG+MST is a terrain map of meaning. 
The shift is not away from retrieval entirely, but away from assuming retrieval should be the only organizing principle. (<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/from-local-to-global-a-graph-rag-approach-to-query-focused-summarization\/\">Microsoft GraphRAG paper<\/a>;&nbsp;<a href=\"https:\/\/gist.github.com\/karpathy\/442a6bf555914893e9891c11519de94f\">Karpathy gist<\/a>;&nbsp;<a href=\"https:\/\/arxiv.org\/html\/2604.14572v1\">Corpus2Skill paper<\/a>;&nbsp;<a href=\"https:\/\/www.mindware-jp.com\/files\/Conceptual_Investigation\/%E6%A6%82%E5%BF%B5%E8%AA%BF%E6%9F%BB%E3%81%A8%E6%BD%9C%E5%9C%A8%E7%A9%BA%E9%96%93.pdf\">Mindware concept-investigation PDF<\/a>)<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Corpus2Skill in Detail<\/h2>\n\n\n\n<p class=\"has-medium-font-size wp-block-paragraph\">Corpus2Skill, introduced in the preprint&nbsp;<strong>\u201cDon\u2019t Retrieve, Navigate: Distilling Enterprise Knowledge into Navigable Agent Skills for QA and RAG\u201d<\/strong>&nbsp;(arXiv: 2604.14572, April 2026) and implemented in the public repository&nbsp;<strong>dukesun99\/Corpus2Skill<\/strong>, proposes a very specific inversion of the usual RAG assumption: do more work&nbsp;<strong>offline<\/strong>, so the runtime agent needs less retrieval and more guided navigation. The paper\u2019s key claim is that enterprise knowledge can be distilled into a&nbsp;<strong>navigable skill hierarchy<\/strong>&nbsp;instead of a search index. The repository README describes the same idea in practical terms: take an arbitrary document corpus, compile it into a hierarchically arranged skill tree, and let the serving-time agent traverse that tree instead of querying a retrieval stack on every question. The repository is explicitly labeled an&nbsp;<strong>early release \/ work in progress<\/strong>, so it should be treated as promising but experimental. 
(<a href=\"https:\/\/arxiv.org\/html\/2604.14572v1\">Corpus2Skill paper<\/a>;&nbsp;<a href=\"https:\/\/github.com\/dukesun99\/Corpus2Skill\/blob\/main\/README.md\">Corpus2Skill GitHub README<\/a>)<\/p>\n\n\n\n<p class=\"has-medium-font-size wp-block-paragraph\">Mechanistically, Corpus2Skill has an offline&nbsp;<strong>compile phase<\/strong>&nbsp;and a runtime&nbsp;<strong>serve phase<\/strong>. In the compile phase, the system reads the corpus, computes embeddings, clusters the corpus hierarchically, and uses an LLM to generate concise labels and summaries for each level. The public implementation specifies default models in the README, including&nbsp;<code>Qwen\/Qwen3-Embedding-0.6B<\/code>&nbsp;for embeddings and&nbsp;<code>claude-sonnet-4-6<\/code>&nbsp;for summarization at the time of the public README reviewed here. The output is a file-system-style hierarchy containing&nbsp;<code>SKILL.md<\/code>,&nbsp;<code>INDEX.md<\/code>, and document references. The paper explains that hard assignment is used so that each document lives on one path in the tree, making the hierarchy materializable as a directory structure. (<a href=\"https:\/\/github.com\/dukesun99\/Corpus2Skill\/blob\/main\/README.md\">Corpus2Skill GitHub README<\/a>;&nbsp;<a href=\"https:\/\/arxiv.org\/html\/2604.14572v1\">Corpus2Skill paper<\/a>)<\/p>\n\n\n\n<p class=\"has-medium-font-size wp-block-paragraph\">At runtime, the agent does not begin by asking a vector database for the top-k nearest chunks. Instead, it begins with a high-level overview of the available skills and descends the tree. It reads a top-level&nbsp;<code>SKILL.md<\/code>, chooses a branch, opens lower-level&nbsp;<code>INDEX.md<\/code>&nbsp;files, and only then fetches source documents through a document lookup when needed. In other words, the agent is following a&nbsp;<strong>knowledge directory<\/strong>, not issuing repeated blind searches. 
That is the practical meaning of \u201cnavigate, don\u2019t retrieve.\u201d The knowledge base is pre-shaped into an explorable hierarchy, like moving through a company\u2019s internal operations manual rather than typing keywords into a search bar. (<a href=\"https:\/\/arxiv.org\/html\/2604.14572v1\">Corpus2Skill paper<\/a>)<\/p>\n\n\n\n<p class=\"has-medium-font-size wp-block-paragraph\">This makes Corpus2Skill fundamentally different from both BM25 and standard vector-database RAG. BM25 ranks documents by lexical relevance using frequency-based scoring; dense retrieval ranks by embedding similarity; standard RAG usually turns those rankings into a top-k context pack for the model. Corpus2Skill still uses embeddings during compilation, but&nbsp;<strong>its runtime interface is not a ranked list of chunks<\/strong>. It is a navigation problem over a tree of summaries and directories. That matters because it gives the agent explicit \u201cbranching choices.\u201d If a chosen path looks wrong, the agent can back up and try another branch, rather than being silently constrained by a possibly poor top-k retrieval result. (<a href=\"https:\/\/nlp.stanford.edu\/IR-book\/html\/htmledition\/okapi-bm25-a-non-binary-model-1.html\">Stanford IR book on BM25<\/a>;&nbsp;<a href=\"https:\/\/arxiv.org\/abs\/2004.04906\">DPR<\/a>;&nbsp;<a href=\"https:\/\/arxiv.org\/html\/2604.14572v1\">Corpus2Skill paper<\/a>)<\/p>\n\n\n\n<p class=\"has-medium-font-size wp-block-paragraph\">The paper evaluates Corpus2Skill on&nbsp;<strong>WixQA<\/strong>, a benchmark for enterprise support-style QA based on the Wix Help Center snapshot dated 2024-12-02. The Corpus2Skill paper reports, on the 200-query ExpertWritten split, a&nbsp;<strong>Token F1 of 0.460<\/strong>, compared with&nbsp;<strong>0.342 for BM25<\/strong>,&nbsp;<strong>0.363 for Dense<\/strong>,&nbsp;<strong>0.389 for RAPTOR<\/strong>, and&nbsp;<strong>0.388 for Agentic RAG<\/strong>. 
It also reports&nbsp;<strong>Factuality 0.729<\/strong>&nbsp;and&nbsp;<strong>Context Recall 0.652<\/strong>, again above the reported baselines in that setting. The associated WixQA benchmark paper and dataset card describe the benchmark composition, including expert-written, simulated, and synthetic queries over the Wix knowledge base. These are strong results, but they should still be read as benchmark-specific evidence rather than universal proof. (<a href=\"https:\/\/arxiv.org\/html\/2604.14572v1\">Corpus2Skill paper<\/a>;&nbsp;<a href=\"https:\/\/arxiv.org\/abs\/2505.08643\">WixQA paper<\/a>)<\/p>\n\n\n\n<p class=\"has-medium-font-size wp-block-paragraph\">The trade-offs are equally important. Corpus2Skill\u2019s runtime can be expensive because the skill files themselves consume input tokens. The paper reports roughly&nbsp;<strong>$0.172 per query<\/strong>&nbsp;in its main setting, and the public README says Anthropic prompt caching reduced that to&nbsp;<strong>$0.089 per query<\/strong>&nbsp;on the WixQA benchmark in repository testing. The paper also names several limitations: dependence on Anthropic\u2019s Skills-style serving pattern, top-level routing errors caused by hard single-path clustering, and lack of mature support for incremental updates without recompilation. Those are not minor footnotes. They mean Corpus2Skill is best understood as a serious architectural idea at an early stage, not yet a drop-in replacement for every production RAG system. (<a href=\"https:\/\/arxiv.org\/html\/2604.14572v1\">Corpus2Skill paper<\/a>;&nbsp;<a href=\"https:\/\/github.com\/dukesun99\/Corpus2Skill\/blob\/main\/README.md\">Corpus2Skill GitHub README<\/a>)<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Comparative Map of RAG, GraphRAG, LLM Wiki, and GNG+MST<\/h2>\n\n\n\n<p class=\"has-medium-font-size wp-block-paragraph\"><strong>Standard RAG<\/strong>&nbsp;is the retrieval-centered baseline. 
Its advantages are operational familiarity, modular indexing, relatively easy updates, and good performance on questions whose answers already exist in locally retrievable evidence. Its weaknesses are equally well known: search misses, chunk fragmentation, weak global structure, and the tendency to confuse \u201cmore retrieved text\u201d with \u201cbetter understanding.\u201d (<a href=\"https:\/\/arxiv.org\/abs\/2005.11401\">Lewis et al., 2020<\/a>;&nbsp;<a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/search\/retrieval-augmented-generation-overview\">Azure RAG overview<\/a>)<\/p>\n\n\n\n<p class=\"has-medium-font-size wp-block-paragraph\"><strong>GraphRAG<\/strong>, especially in Microsoft\u2019s implementation, adds another layer of knowledge representation: extract entities, relations, and claims; detect communities; generate community reports; and support both local and global search modes. Global search is aimed at whole-corpus sensemaking, while local search starts from semantically relevant entities and expands into related neighborhoods. One practical detail from the official docs matters: GraphRAG does&nbsp;<strong>not necessarily require a dedicated graph database<\/strong>. Microsoft\u2019s OSS implementation stores outputs in Parquet tables plus a vector store. Its real cost lies elsewhere: extraction quality, prompt tuning, report generation, hierarchy design, re-indexing, and operational maintenance. The docs openly state that out-of-the-box settings are not always optimal and that some features, such as claim extraction, require tuning to be useful. 
(<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/from-local-to-global-a-graph-rag-approach-to-query-focused-summarization\/\">Microsoft GraphRAG paper<\/a>;&nbsp;<a href=\"https:\/\/microsoft.github.io\/graphrag\/index\/overview\/\">Microsoft GraphRAG docs overview<\/a>;&nbsp;<a href=\"https:\/\/microsoft.github.io\/graphrag\/\">Microsoft GraphRAG docs<\/a>)<\/p>\n\n\n\n<p class=\"has-medium-font-size wp-block-paragraph\"><strong>LLM Wiki<\/strong>, from Andrej Karpathy\u2019s gist published on April 4, 2026, is not a product and not a benchmarked framework. It is a pattern. The key move is to convert raw sources into a persistent&nbsp;<strong>Markdown wiki<\/strong>&nbsp;that the LLM can read from, edit, and lint over time. Karpathy separates the system into raw sources, the wiki itself, and a schema or conventions layer. He also proposes three recurring operations: ingest new sources, answer questions from wiki pages, and lint the wiki for contradictions, stale claims, and missing links. For personal or small-to-medium corpora, at a scale of roughly a hundred sources and hundreds of pages, he argues that simple page indexes can work surprisingly well without a full embedding retrieval stack. That makes LLM Wiki especially relevant where the goal is&nbsp;<strong>knowledge compression, editing, and accumulation<\/strong>, not just one-shot retrieval. (<a href=\"https:\/\/gist.github.com\/karpathy\/442a6bf555914893e9891c11519de94f\">Karpathy gist<\/a>)<\/p>\n\n\n\n<p class=\"has-medium-font-size wp-block-paragraph\"><strong>Mindware\u2019s GNG+MST concept-structure approach<\/strong>&nbsp;is documented in official ThinkNavi and ConceptMiner materials rather than in a single canonical research paper. The public manuals and product pages describe a pipeline that uses embeddings and related data, organizes them with Growing Neural Gas and a Minimum Spanning Tree, and uses LLMs to help label dimensions, clusters, and semantic interpretations. 
The ThinkNavi manual contains the succinct formulation,&nbsp;<strong>\u201cRAG finds answers. ThinkNavi finds structure.\u201d<\/strong>&nbsp;That phrase is analytically useful. It captures the fact that this architecture is not primarily about point-answer retrieval. It is about surfacing conceptual neighborhoods, bridges, latent themes, exploration axes, and the relation between an embedding space and human-interpretable conceptual structure. Public official materials emphasize applications in strategic thinking, market or technology exploration, qualitative data analysis, and research support. What is&nbsp;<strong>not publicly confirmed<\/strong>&nbsp;in the official materials reviewed is a standardized external benchmark comparable to enterprise QA benchmarks such as WixQA. (<a href=\"https:\/\/www.thinknavi.ai\/user-manual-2\/chapter-01\/\">ThinkNavi manual, chapter 1<\/a>;&nbsp;<a href=\"https:\/\/www.thinknavi.ai\/user-manual-2\/chapter-20\/\">ThinkNavi glossary\/manual<\/a>;&nbsp;<a href=\"https:\/\/conceptminer.ai\/?lang=ja&amp;page_id=111\">ConceptMiner developer page<\/a>;&nbsp;<a href=\"https:\/\/www.mindware-jp.com\/files\/Conceptual_Investigation\/%E6%A6%82%E5%BF%B5%E8%AA%BF%E6%9F%BB%E3%81%A8%E6%BD%9C%E5%9C%A8%E7%A9%BA%E9%96%93.pdf\">Mindware concept-investigation PDF<\/a>)<\/p>\n\n\n\n<p class=\"has-medium-font-size wp-block-paragraph\">The simplest way to remember the difference is by metaphor. RAG is a&nbsp;<strong>search engine<\/strong>. GraphRAG is a&nbsp;<strong>map of relations<\/strong>. LLM Wiki is a&nbsp;<strong>living handbook<\/strong>. Corpus2Skill is an&nbsp;<strong>agent-operable directory tree<\/strong>. GNG+MST is a&nbsp;<strong>terrain map of concepts<\/strong>. 
These are not interchangeable metaphors; they correspond to different knowledge representations and different operational burdens.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th class=\"has-text-align-left\" data-align=\"left\">Axis<\/th><th class=\"has-text-align-left\" data-align=\"left\">Standard RAG<\/th><th class=\"has-text-align-left\" data-align=\"left\">GraphRAG<\/th><th class=\"has-text-align-left\" data-align=\"left\">LLM Wiki<\/th><th class=\"has-text-align-left\" data-align=\"left\">Corpus2Skill<\/th><th class=\"has-text-align-left\" data-align=\"left\">GNG+MST<\/th><th class=\"has-text-align-left\" data-align=\"left\">Representative primary sources<\/th><\/tr><\/thead><tbody><tr><td>Primary purpose<\/td><td>Ground answers with retrieved evidence<\/td><td>Capture relations and whole-corpus themes<\/td><td>Edit and accumulate knowledge into readable pages<\/td><td>Let agents traverse enterprise knowledge as skills<\/td><td>Visualize and explore conceptual structure<\/td><td><a href=\"https:\/\/arxiv.org\/abs\/2005.11401\">Lewis et al.<\/a>;&nbsp;<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/from-local-to-global-a-graph-rag-approach-to-query-focused-summarization\/\">GraphRAG paper<\/a>;&nbsp;<a href=\"https:\/\/gist.github.com\/karpathy\/442a6bf555914893e9891c11519de94f\">Karpathy gist<\/a>;&nbsp;<a href=\"https:\/\/arxiv.org\/html\/2604.14572v1\">Corpus2Skill<\/a>;&nbsp;<a href=\"https:\/\/www.thinknavi.ai\/user-manual-2\/chapter-01\/\">ThinkNavi<\/a><\/td><\/tr><tr><td>Input data<\/td><td>Documents, PDFs, FAQs, tickets<\/td><td>Unstructured text converted into entities\/relations\/claims<\/td><td>Raw source corpus<\/td><td>Document corpus compiled offline<\/td><td>Embeddings, text, and optionally related structured data<\/td><td><a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/search\/retrieval-augmented-generation-overview\">Azure RAG<\/a>;&nbsp;<a 
href=\"https:\/\/microsoft.github.io\/graphrag\/index\/overview\/\">GraphRAG docs<\/a>;&nbsp;<a href=\"https:\/\/conceptminer.ai\/?lang=ja&amp;page_id=111\">ConceptMiner<\/a><\/td><\/tr><tr><td>Knowledge representation<\/td><td>Chunks + embeddings + index<\/td><td>Graph + communities + reports + embeddings<\/td><td>Markdown pages + index\/log + schema<\/td><td><code>SKILL.md<\/code>,&nbsp;<code>INDEX.md<\/code>, document store, skill tree<\/td><td>GNG nodes, MST backbone, clusters, labeled dimensions<\/td><td><a href=\"https:\/\/github.com\/dukesun99\/Corpus2Skill\/blob\/main\/README.md\">Corpus2Skill README<\/a>;&nbsp;<a href=\"https:\/\/gist.github.com\/karpathy\/442a6bf555914893e9891c11519de94f\">Karpathy gist<\/a>;&nbsp;<a href=\"https:\/\/www.mindware-jp.com\/files\/Conceptual_Investigation\/%E6%A6%82%E5%BF%B5%E8%AA%BF%E6%9F%BB%E3%81%A8%E6%BD%9C%E5%9C%A8%E7%A9%BA%E9%96%93.pdf\">Mindware PDF<\/a><\/td><\/tr><tr><td>Search or exploration method<\/td><td>Retrieve top-k, often hybrid, then rerank<\/td><td>Global, local, and community-based graph search<\/td><td>Read indexes and linked pages; optional search<\/td><td>Descend a hierarchy, then fetch documents<\/td><td>Explore clusters, distances, bridges, sparse regions<\/td><td><a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/architecture\/ai-ml\/guide\/rag\/rag-information-retrieval\">Azure retrieval<\/a>;&nbsp;<a href=\"https:\/\/microsoft.github.io\/graphrag\/\">GraphRAG docs<\/a>;&nbsp;<a href=\"https:\/\/arxiv.org\/html\/2604.14572v1\">Corpus2Skill paper<\/a><\/td><\/tr><tr><td>Role of the LLM<\/td><td>Final answer generation; sometimes reranking and query rewrites<\/td><td>Extraction, summarization, report generation, answering<\/td><td>Writing, editing, querying, linting the wiki<\/td><td>Summarizing clusters and navigating the hierarchy<\/td><td>Labeling and interpreting concept structure<\/td><td><a href=\"https:\/\/docs.aws.amazon.com\/bedrock\/latest\/userguide\/rerank.html\">Bedrock 
rerank<\/a>;&nbsp;<a href=\"https:\/\/gist.github.com\/karpathy\/442a6bf555914893e9891c11519de94f\">Karpathy gist<\/a>;&nbsp;<a href=\"https:\/\/www.thinknavi.ai\/user-manual-2\/chapter-20\/\">ThinkNavi<\/a><\/td><\/tr><tr><td>Explicitness of structure<\/td><td>Low to moderate<\/td><td>High<\/td><td>High<\/td><td>High<\/td><td>High, but geometric rather than symbolic<\/td><td><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/from-local-to-global-a-graph-rag-approach-to-query-focused-summarization\/\">GraphRAG paper<\/a>;&nbsp;<a href=\"https:\/\/www.mindware-jp.com\/files\/Conceptual_Investigation\/%E6%A6%82%E5%BF%B5%E8%AA%BF%E6%9F%BB%E3%81%A8%E6%BD%9C%E5%9C%A8%E7%A9%BA%E9%96%93.pdf\">Mindware PDF<\/a><\/td><\/tr><tr><td>Update ease<\/td><td>Usually good<\/td><td>Moderate to difficult<\/td><td>Moderate; depends on wiki discipline<\/td><td>Lower; recompilation is an issue<\/td><td>Not publicly confirmed in detail<\/td><td><a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/search\/retrieval-augmented-generation-overview\">Azure RAG<\/a>;&nbsp;<a href=\"https:\/\/arxiv.org\/html\/2604.14572v1\">Corpus2Skill paper<\/a><\/td><\/tr><tr><td>Scalability<\/td><td>High in common enterprise settings<\/td><td>Potentially high, but costly to build and maintain<\/td><td>Best for personal to medium corpora<\/td><td>Moderate; runtime pattern and hierarchy constraints matter<\/td><td>Public enterprise claim exists, external benchmark evidence limited<\/td><td><a href=\"https:\/\/microsoft.github.io\/graphrag\/\">GraphRAG docs<\/a>;&nbsp;<a href=\"https:\/\/gist.github.com\/karpathy\/442a6bf555914893e9891c11519de94f\">Karpathy gist<\/a>;&nbsp;<a href=\"https:\/\/www.thinknavi.ai\/about\/\">ThinkNavi<\/a><\/td><\/tr><tr><td>Implementation and ops cost<\/td><td>Low to moderate<\/td><td>High<\/td><td>Moderate<\/td><td>Moderate to high<\/td><td>Moderate to high, platform-dependent<\/td><td><a 
href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/search\/retrieval-augmented-generation-overview\">Azure RAG<\/a>;&nbsp;<a href=\"https:\/\/microsoft.github.io\/graphrag\/\">GraphRAG docs<\/a>;&nbsp;<a href=\"https:\/\/github.com\/dukesun99\/Corpus2Skill\/blob\/main\/README.md\">Corpus2Skill README<\/a><\/td><\/tr><tr><td>Best use cases<\/td><td>FAQ, internal search, support answers<\/td><td>Cross-document reasoning, relationship analysis, thematic summarization<\/td><td>Personal\/team knowledge bases, reading and synthesis<\/td><td>Agent-facing enterprise support and procedural knowledge<\/td><td>Strategy, concept exploration, qualitative analysis, research support<\/td><td><a href=\"https:\/\/arxiv.org\/abs\/2505.08643\">WixQA<\/a>;&nbsp;<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/blog\/graphrag-unlocking-llm-discovery-on-narrative-private-data\/\">GraphRAG blog<\/a>;&nbsp;<a href=\"https:\/\/www.mindware-jp.com\/files\/Conceptual_Investigation\/%E6%A6%82%E5%BF%B5%E8%AA%BF%E6%9F%BB%E3%81%A8%E6%BD%9C%E5%9C%A8%E7%A9%BA%E9%96%93.pdf\">Mindware PDF<\/a><\/td><\/tr><tr><td>Weak use cases<\/td><td>Global sensemaking, latent-theme discovery<\/td><td>Cheapest simple FAQ serving<\/td><td>Strict real-time retrieval at large scale<\/td><td>Highly dynamic corpora with frequent updates<\/td><td>Precision FAQ answer serving<\/td><td><a href=\"https:\/\/arxiv.org\/abs\/2307.03172\">Lost in the Middle<\/a>;&nbsp;<a href=\"https:\/\/arxiv.org\/html\/2604.14572v1\">Corpus2Skill paper<\/a>;&nbsp;<a href=\"https:\/\/www.thinknavi.ai\/user-manual-2\/chapter-01\/\">ThinkNavi manual<\/a><\/td><\/tr><tr><td>Hallucination mitigation<\/td><td>Moderate if retrieval succeeds<\/td><td>Moderate to high when graph evidence is good<\/td><td>Moderate; edited knowledge helps, but summarization bias remains<\/td><td>Potentially strong for navigable grounding, but routing errors remain<\/td><td>Indirect; primary goal is structure, not answer grounding<\/td><td><a 
href=\"https:\/\/arxiv.org\/abs\/2310.11511\">Self-RAG<\/a>;&nbsp;<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/from-local-to-global-a-graph-rag-approach-to-query-focused-summarization\/\">GraphRAG paper<\/a>;&nbsp;<a href=\"https:\/\/arxiv.org\/html\/2604.14572v1\">Corpus2Skill paper<\/a><\/td><\/tr><tr><td>Enterprise suitability<\/td><td>High<\/td><td>Moderate to high<\/td><td>Moderate to high<\/td><td>High for stable knowledge domains<\/td><td>Moderate to high for analysis and exploration<\/td><td><a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/search\/retrieval-augmented-generation-overview\">Azure RAG<\/a>;&nbsp;<a href=\"https:\/\/arxiv.org\/html\/2604.14572v1\">Corpus2Skill paper<\/a>;&nbsp;<a href=\"https:\/\/www.thinknavi.ai\/about\/\">ThinkNavi<\/a><\/td><\/tr><tr><td>Research suitability<\/td><td>Moderate<\/td><td>High<\/td><td>High<\/td><td>Moderate<\/td><td>Very high<\/td><td><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/from-local-to-global-a-graph-rag-approach-to-query-focused-summarization\/\">GraphRAG paper<\/a>;&nbsp;<a href=\"https:\/\/gist.github.com\/karpathy\/442a6bf555914893e9891c11519de94f\">Karpathy gist<\/a>;&nbsp;<a href=\"https:\/\/www.mindware-jp.com\/files\/Conceptual_Investigation\/%E6%A6%82%E5%BF%B5%E8%AA%BF%E6%9F%BB%E3%81%A8%E6%BD%9C%E5%9C%A8%E7%A9%BA%E9%96%93.pdf\">Mindware PDF<\/a><\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Risks, Selection Guidance, and a Combined Hypothesis<\/h2>\n\n\n\n<p class=\"has-medium-font-size wp-block-paragraph\">Every architecture has a characteristic failure mode. In RAG, the classic problem is&nbsp;<strong>search omission<\/strong>: if retrieval misses the right evidence, grounding collapses. In GraphRAG, the main risk is&nbsp;<strong>upstream structuring error<\/strong>: bad entity extraction, poor relation extraction, or weak community summaries can distort downstream reasoning. 
In LLM Wiki, the risk is&nbsp;<strong>editorial bias<\/strong>: once the LLM rewrites and condenses the source material, errors can become persistent and hard to detect because they now read as \u201corganized knowledge.\u201d In Corpus2Skill, the main risks are&nbsp;<strong>routing mistakes<\/strong>&nbsp;and&nbsp;<strong>recompilation burden<\/strong>: a document forced into one path may belong in several, and the hierarchy is not trivial to update incrementally. In GNG+MST, the difficulty is often&nbsp;<strong>evaluation<\/strong>: the value may lie in revealing a structure, white space, or bridge concept that no standard QA metric captures. (<a href=\"https:\/\/arxiv.org\/abs\/2212.10509\">IRCoT<\/a>;&nbsp;<a href=\"https:\/\/microsoft.github.io\/graphrag\/\">GraphRAG docs<\/a>;&nbsp;<a href=\"https:\/\/gist.github.com\/karpathy\/442a6bf555914893e9891c11519de94f\">Karpathy gist<\/a>;&nbsp;<a href=\"https:\/\/arxiv.org\/html\/2604.14572v1\">Corpus2Skill paper<\/a>;&nbsp;<a href=\"https:\/\/www.mindware-jp.com\/files\/Conceptual_Investigation\/%E6%A6%82%E5%BF%B5%E8%AA%BF%E6%9F%BB%E3%81%A8%E6%BD%9C%E5%9C%A8%E7%A9%BA%E9%96%93.pdf\">Mindware PDF<\/a>)<\/p>\n\n\n\n<p class=\"has-medium-font-size wp-block-paragraph\">For use-case selection, the practical pattern is straightforward. For&nbsp;<strong>FAQ and customer support<\/strong>, start with standard or hybrid RAG, and consider Corpus2Skill when the corpus is relatively stable and questions require agents to traverse several related support topics. For&nbsp;<strong>internal document search<\/strong>, standard RAG is still the sensible first layer, with GraphRAG added when cross-team relationships or thematic structure matter. For&nbsp;<strong>literature review or due diligence<\/strong>, LLM Wiki is powerful because knowledge compounds instead of being re-synthesized from scratch. 
For&nbsp;<strong>technical documentation exploration<\/strong>, use RAG for pinpoint lookup and Corpus2Skill if the user journey behaves more like following a structured troubleshooting tree. For&nbsp;<strong>strategy and market research<\/strong>, and especially for&nbsp;<strong>qualitative data analysis<\/strong>&nbsp;or&nbsp;<strong>new-business concept exploration<\/strong>, GNG+MST is especially attractive because the goal is often to expose latent themes, conceptual adjacency, and underexplored spaces rather than to answer a single factual question. (<a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/search\/retrieval-augmented-generation-overview\">Azure RAG overview<\/a>;&nbsp;<a href=\"https:\/\/arxiv.org\/html\/2604.14572v1\">Corpus2Skill paper<\/a>;&nbsp;<a href=\"https:\/\/gist.github.com\/karpathy\/442a6bf555914893e9891c11519de94f\">Karpathy gist<\/a>;&nbsp;<a href=\"https:\/\/conceptminer.ai\/?lang=ja&amp;page_id=111\">ConceptMiner<\/a>)<\/p>\n\n\n\n<p class=\"has-medium-font-size wp-block-paragraph\">The most interesting design hypothesis is that&nbsp;<strong>Corpus2Skill and GNG+MST may be complementary rather than competitive<\/strong>. This is a hypothesis, not a publicly confirmed integration. Corpus2Skill organizes knowledge into an agent-usable hierarchy. GNG+MST aims to discover the latent conceptual structure of a corpus: clusters, bridges, sparse zones, and exploration axes. If GNG+MST is good at revealing the hidden terrain of a knowledge space, then LLM Wiki or a similar editing layer could turn those discoveries into curated, human-readable knowledge pages, and Corpus2Skill could then compile those pages into an agent-operable skill tree. In that stack, concept discovery informs knowledge editing, and knowledge editing informs agent navigation. 
That would be a plausible architecture for enterprise research support, strategy work, and advanced knowledge platforms, but it remains a hypothesis until a public implementation or case study appears. (<a href=\"https:\/\/www.mindware-jp.com\/files\/Conceptual_Investigation\/%E6%A6%82%E5%BF%B5%E8%AA%BF%E6%9F%BB%E3%81%A8%E6%BD%9C%E5%9C%A8%E7%A9%BA%E9%96%93.pdf\">Mindware PDF<\/a>;&nbsp;<a href=\"https:\/\/gist.github.com\/karpathy\/442a6bf555914893e9891c11519de94f\">Karpathy gist<\/a>;&nbsp;<a href=\"https:\/\/arxiv.org\/html\/2604.14572v1\">Corpus2Skill paper<\/a>)<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"894\" height=\"76\" src=\"https:\/\/www.aicritique.org\/us\/wp-content\/uploads\/2026\/05\/image-6.png\" alt=\"\" class=\"wp-image-2125\" srcset=\"https:\/\/www.aicritique.org\/us\/wp-content\/uploads\/2026\/05\/image-6.png 894w, https:\/\/www.aicritique.org\/us\/wp-content\/uploads\/2026\/05\/image-6-300x26.png 300w, https:\/\/www.aicritique.org\/us\/wp-content\/uploads\/2026\/05\/image-6-768x65.png 768w\" sizes=\"auto, (max-width: 894px) 100vw, 894px\" \/><\/figure>\n\n\n\n<p class=\"has-medium-font-size wp-block-paragraph\">That is also why \u201cthe next thing after RAG\u201d is the wrong question. The better question is:&nbsp;<strong>Which knowledge form best matches the work?<\/strong>&nbsp;Support answers need grounding. Strategy work needs pattern discovery. Agent workflows need navigation. Research synthesis needs editable accumulation. No single architecture does all of these equally well. 
(<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/blog\/graphrag-unlocking-llm-discovery-on-narrative-private-data\/\">Microsoft GraphRAG blog<\/a>;&nbsp;<a href=\"https:\/\/www.thinknavi.ai\/user-manual-2\/chapter-01\/\">ThinkNavi manual<\/a>)<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion and References<\/h2>\n\n\n\n<p class=\"has-medium-font-size wp-block-paragraph\">The most defensible conclusion from primary and official sources is this:&nbsp;<strong>the future of knowledge architecture in the LLM era is plural, layered, and purpose-specific<\/strong>. RAG remains the default answer engine. GraphRAG extends that engine toward explicit relational structure and multi-document sensemaking. LLM Wiki treats knowledge as a living editorial artifact that compounds over time. Corpus2Skill reframes enterprise knowledge as a navigable hierarchy optimized for agent behavior. Mindware\u2019s GNG+MST approach frames knowledge as conceptual terrain that can be explored for bridges, themes, and strategic white space. None of these makes the others obsolete. They solve different knowledge problems. (<a href=\"https:\/\/arxiv.org\/abs\/2005.11401\">Lewis et al., 2020<\/a>;&nbsp;<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/from-local-to-global-a-graph-rag-approach-to-query-focused-summarization\/\">GraphRAG paper<\/a>;&nbsp;<a href=\"https:\/\/gist.github.com\/karpathy\/442a6bf555914893e9891c11519de94f\">Karpathy gist<\/a>;&nbsp;<a href=\"https:\/\/arxiv.org\/html\/2604.14572v1\">Corpus2Skill paper<\/a>;&nbsp;<a href=\"https:\/\/www.thinknavi.ai\/user-manual-2\/chapter-01\/\">ThinkNavi manual<\/a>)<\/p>\n\n\n\n<p class=\"has-medium-font-size wp-block-paragraph\">For practitioners, the decision rule is simple. If your question is \u201cHow do I answer correctly and cite evidence?\u201d begin with RAG. If it becomes \u201cHow are these things connected across the corpus?\u201d add GraphRAG. 
If it becomes \u201cHow do we build a durable, editable knowledge artifact instead of re-answering the same questions forever?\u201d consider LLM Wiki. If it becomes \u201cHow do we let agents move through enterprise knowledge reliably?\u201d evaluate Corpus2Skill. If it becomes \u201cHow do we discover themes, clusters, bridges, and unexplored directions?\u201d look seriously at concept-structure approaches such as GNG+MST. In practice, the best enterprise architectures will often combine two or more of these layers rather than betting everything on one. (<a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/search\/retrieval-augmented-generation-overview\">Azure RAG overview<\/a>;&nbsp;<a href=\"https:\/\/arxiv.org\/html\/2604.14572v1\">Corpus2Skill paper<\/a>;&nbsp;<a href=\"https:\/\/www.mindware-jp.com\/files\/Conceptual_Investigation\/%E6%A6%82%E5%BF%B5%E8%AA%BF%E6%9F%BB%E3%81%A8%E6%BD%9C%E5%9C%A8%E7%A9%BA%E9%96%93.pdf\">Mindware PDF<\/a>)<\/p>\n\n\n\n<p class=\"has-medium-font-size wp-block-paragraph\">A concise practitioner guide is below.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th class=\"has-text-align-left\" data-align=\"left\">Situation<\/th><th class=\"has-text-align-left\" data-align=\"left\">Best starting point<\/th><th class=\"has-text-align-left\" data-align=\"left\">Add next when needed<\/th><\/tr><\/thead><tbody><tr><td>FAQ, help center, support answers<\/td><td>Standard or hybrid RAG<\/td><td>Corpus2Skill for agent-navigable support flows<\/td><\/tr><tr><td>Internal document search<\/td><td>Standard RAG<\/td><td>GraphRAG for relation-level exploration<\/td><\/tr><tr><td>Literature review, due diligence<\/td><td>LLM Wiki<\/td><td>GraphRAG or GNG+MST for thematic structure<\/td><\/tr><tr><td>Product manuals and technical docs<\/td><td>RAG<\/td><td>Corpus2Skill for guided troubleshooting paths<\/td><\/tr><tr><td>Strategy and market research<\/td><td>GNG+MST<\/td><td>LLM Wiki for editorial 
consolidation<\/td><\/tr><tr><td>Qualitative data analysis<\/td><td>GNG+MST<\/td><td>GraphRAG for relation mapping<\/td><\/tr><tr><td>Agent-facing enterprise knowledge<\/td><td>Corpus2Skill<\/td><td>RAG fallback for direct evidence lookup<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p class=\"has-medium-font-size wp-block-paragraph\"><strong>References &amp; Links<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th class=\"has-text-align-left\" data-align=\"left\">Source<\/th><th class=\"has-text-align-left\" data-align=\"left\">Type<\/th><th class=\"has-text-align-left\" data-align=\"left\">Publication or update date<\/th><th class=\"has-text-align-left\" data-align=\"left\">Notes<\/th><\/tr><\/thead><tbody><tr><td><a href=\"https:\/\/arxiv.org\/abs\/2005.11401\">Lewis et al., \u201cRetrieval-Augmented Generation for Knowledge-Intensive NLP Tasks\u201d<\/a><\/td><td>Paper<\/td><td>2020<\/td><td>Canonical RAG paper<\/td><\/tr><tr><td><a href=\"https:\/\/arxiv.org\/abs\/2004.04906\">Karpukhin et al., \u201cDense Passage Retrieval\u201d<\/a><\/td><td>Paper<\/td><td>2020<\/td><td>Canonical dense retrieval reference<\/td><\/tr><tr><td><a href=\"https:\/\/arxiv.org\/abs\/2310.11511\">Asai et al., \u201cSelf-RAG\u201d<\/a><\/td><td>Paper<\/td><td>2023<\/td><td>Retrieval, generation, critique loop<\/td><\/tr><tr><td><a href=\"https:\/\/arxiv.org\/abs\/2212.10509\">Trivedi et al., \u201cIRCoT\u201d<\/a><\/td><td>Paper<\/td><td>2022<\/td><td>Retrieval interleaved with chain-of-thought<\/td><\/tr><tr><td><a href=\"https:\/\/arxiv.org\/abs\/2307.03172\">Liu et al., \u201cLost in the Middle\u201d<\/a><\/td><td>Paper<\/td><td>2023<\/td><td>Long-context placement effects<\/td><\/tr><tr><td><a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/search\/retrieval-augmented-generation-overview\">Azure AI Search: RAG overview<\/a><\/td><td>Official documentation<\/td><td>Update date not publicly confirmed here<\/td><td>Standard 
enterprise RAG pipeline<\/td><\/tr><tr><td><a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/search\/vector-search-how-to-chunk-documents\">Azure AI Search: document chunking<\/a><\/td><td>Official documentation<\/td><td>Update date not publicly confirmed here<\/td><td>Chunking guidance<\/td><\/tr><tr><td><a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/architecture\/ai-ml\/guide\/rag\/rag-information-retrieval\">Azure architecture guide: RAG information retrieval<\/a><\/td><td>Official documentation<\/td><td>Update date not publicly confirmed here<\/td><td>Retrieval patterns<\/td><\/tr><tr><td><a href=\"https:\/\/docs.aws.amazon.com\/bedrock\/latest\/userguide\/rerank.html\">Amazon Bedrock: reranking<\/a><\/td><td>Official documentation<\/td><td>Update date not publicly confirmed here<\/td><td>Reranking stage<\/td><\/tr><tr><td><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/from-local-to-global-a-graph-rag-approach-to-query-focused-summarization\/\">Microsoft Research, \u201cFrom Local to Global: A Graph RAG Approach to Query-Focused Summarization\u201d<\/a><\/td><td>Paper \/ research page<\/td><td>2024<\/td><td>Foundational GraphRAG paper<\/td><\/tr><tr><td><a href=\"https:\/\/microsoft.github.io\/graphrag\/\">Microsoft GraphRAG documentation<\/a><\/td><td>Official documentation<\/td><td>Update date not publicly confirmed here<\/td><td>Current OSS reference<\/td><\/tr><tr><td><a href=\"https:\/\/microsoft.github.io\/graphrag\/index\/overview\/\">Microsoft GraphRAG overview<\/a><\/td><td>Official documentation<\/td><td>Update date not publicly confirmed here<\/td><td>Pipeline and data outputs<\/td><\/tr><tr><td><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/blog\/graphrag-unlocking-llm-discovery-on-narrative-private-data\/\">Microsoft Research blog on GraphRAG<\/a><\/td><td>Official blog<\/td><td>2024<\/td><td>Conceptual motivations and use cases<\/td><\/tr><tr><td><a href=\"https:\/\/arxiv.org\/html\/2604.14572v1\">Yiqun 
Sun et al., \u201cDon\u2019t Retrieve, Navigate: Distilling Enterprise Knowledge into Navigable Agent Skills for QA and RAG\u201d<\/a><\/td><td>Paper<\/td><td>April 2026<\/td><td>Corpus2Skill primary paper<\/td><\/tr><tr><td><a href=\"https:\/\/github.com\/dukesun99\/Corpus2Skill\/blob\/main\/README.md\">dukesun99\/Corpus2Skill README<\/a><\/td><td>Official repository<\/td><td>Public repo; exact last-update date not confirmed here<\/td><td>Implementation details; early release status; cost note<\/td><\/tr><tr><td><a href=\"https:\/\/arxiv.org\/abs\/2505.08643\">WixQA benchmark paper<\/a><\/td><td>Paper<\/td><td>May 2025<\/td><td>Enterprise support-style RAG benchmark<\/td><\/tr><tr><td><a href=\"https:\/\/gist.github.com\/karpathy\/442a6bf555914893e9891c11519de94f\">Andrej Karpathy, \u201cLLM Wiki\u201d gist<\/a><\/td><td>Primary gist<\/td><td>April 4, 2026<\/td><td>Pattern proposal, not a product spec<\/td><\/tr><tr><td><a href=\"https:\/\/www.thinknavi.ai\/user-manual-2\/chapter-01\/\">ThinkNavi user manual, chapter 1<\/a><\/td><td>Official manual<\/td><td>Update date not publicly confirmed here<\/td><td>Contains \u201cRAG finds answers. 
ThinkNavi finds structure.\u201d<\/td><\/tr><tr><td><a href=\"https:\/\/www.thinknavi.ai\/user-manual-2\/chapter-20\/\">ThinkNavi glossary\/manual chapter<\/a><\/td><td>Official manual<\/td><td>Update date not publicly confirmed here<\/td><td>GNG\/MST terminology<\/td><\/tr><tr><td><a href=\"https:\/\/www.thinknavi.ai\/about\/\">ThinkNavi about page<\/a><\/td><td>Official site<\/td><td>Update date not publicly confirmed here<\/td><td>Product positioning<\/td><\/tr><tr><td><a href=\"https:\/\/conceptminer.ai\/?lang=ja&amp;page_id=111\">ConceptMiner developer page<\/a><\/td><td>Official site<\/td><td>Update date not publicly confirmed here<\/td><td>GNG+MST and developer-facing description<\/td><\/tr><tr><td><a href=\"https:\/\/www.mindware-jp.com\/files\/Conceptual_Investigation\/%E6%A6%82%E5%BF%B5%E8%AA%BF%E6%9F%BB%E3%81%A8%E6%BD%9C%E5%9C%A8%E7%A9%BA%E9%96%93.pdf\">Mindware Research Institute concept-investigation PDF<\/a><\/td><td>Official PDF<\/td><td>Publication date not publicly confirmed in the file path reviewed<\/td><td>Core conceptual explanation of GNG+MST as concept investigation<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p class=\"has-medium-font-size wp-block-paragraph\"><strong>Public-detail caveats.<\/strong>&nbsp;Corpus2Skill is publicly presented as an early release and experimentally promising; its portability beyond the current skills-centric serving pattern and its incremental-update story are not yet mature in the public materials. Karpathy\u2019s LLM Wiki is intentionally abstract and pattern-oriented, not a standardized benchmarked system. GraphRAG is well documented but operationally heavy and tuning-sensitive. 
For Mindware\u2019s GNG+MST materials, external benchmark-style validation comparable to enterprise QA benchmarks was&nbsp;<strong>not publicly confirmed<\/strong>&nbsp;in the official sources reviewed for this report.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Executive Summary The core shift in enterprise knowledge systems is no longer just from \u201cdocuments\u201d to \u201cLLMs.\u201d It is from&nbsp;retrieving snippets&nbsp;toward&nbsp;structuring, navigating, editing, and exploring knowledge&nbsp;in forms that fit different kinds of work. Standard Retrieval-Augmented Generation, or RAG, remains the&hellip;<\/p>\n","protected":false},"author":4,"featured_media":2128,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3,21,9,59],"tags":[],"class_list":["post-2122","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-llm","category-main","category-rag","category-trende"],"_links":{"self":[{"href":"https:\/\/www.aicritique.org\/us\/wp-json\/wp\/v2\/posts\/2122","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.aicritique.org\/us\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aicritique.org\/us\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aicritique.org\/us\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aicritique.org\/us\/wp-json\/wp\/v2\/comments?post=2122"}],"version-history":[{"count":2,"href":"https:\/\/www.aicritique.org\/us\/wp-json\/wp\/v2\/posts\/2122\/revisions"}],"predecessor-version":[{"id":2129,"href":"https:\/\/www.aicritique.org\/us\/wp-json\/wp\/v2\/posts\/2122\/revisions\/2129"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aicritique.org\/us\/wp-json\/wp\/v2\/media\/2128"}],"wp:attachment":[{"href":"https:\/\/www.aicritique.org\/us\/wp-json\/wp\/v2\/media?parent=2122"}],"wp:term":[{"taxonomy":"c
ategory","embeddable":true,"href":"https:\/\/www.aicritique.org\/us\/wp-json\/wp\/v2\/categories?post=2122"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aicritique.org\/us\/wp-json\/wp\/v2\/tags?post=2122"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}