{"id":30787,"date":"2025-12-24T21:10:00","date_gmt":"2025-12-24T21:10:00","guid":{"rendered":"https:\/\/www.engineernewsnetwork.com\/blog\/?p=30787"},"modified":"2025-12-22T16:49:52","modified_gmt":"2025-12-22T16:49:52","slug":"shrinking-ai-memory-boosts-accuracy-study-finds","status":"publish","type":"post","link":"https:\/\/www.engineernewsnetwork.com\/blog\/shrinking-ai-memory-boosts-accuracy-study-finds\/","title":{"rendered":"Shrinking AI memory boosts accuracy, study finds"},"content":{"rendered":"\n<p>Researchers have developed a new way to compress the memory used by AI models, which can either increase their accuracy on complex tasks or save significant amounts of energy.<\/p>\n\n\n\n<p>Experts from the University of Edinburgh and NVIDIA found that large language models (LLMs) using memory eight times smaller than that of an uncompressed LLM scored better on maths, science and coding tests while spending the same amount of time reasoning. Alternatively, the method can help LLMs respond to more user queries simultaneously, reducing the amount of power needed per task.<\/p>\n\n\n\n<p>Beyond energy savings, experts say the improvements could benefit AI systems that tackle complicated tasks, as well as devices with slow or limited memory, such as smart home devices and wearable technology.<\/p>\n\n\n\n<p>AI models improve their problem-solving abilities by \u201cthinking\u201d through more complex hypotheses or by exploring more hypotheses concurrently. 
In practice, this is achieved by generating more reasoning threads in text form, each a step-by-step logical process used to solve a problem.<\/p>\n\n\n\n<p>The model\u2019s memory, called the KV (key-value) cache, stores the parts of the threads generated so far. It can act as a bottleneck because its size slows down the generation of reasoning threads during inference \u2013 the process by which AI models respond to an input prompt, such as answering a user query.<\/p>\n\n\n\n<p>The more threads there are, and the longer they are, the more memory is required. And the larger the KV cache, the longer the LLM takes to read it from the memory where it is stored at each step of generation.<\/p>\n\n\n\n<p>To overcome this, the team developed a method to compress the models\u2019 memory, called Dynamic Memory Sparsification (DMS). Instead of keeping every token (the units of data that an AI model processes), DMS decides which ones are important enough to keep and which can be deleted.<\/p>\n\n\n\n<p>There is a slight delay between the decision to evict a token and its actual removal. This gives the model a chance to pass any valuable information from the evicted tokens on to the ones it keeps.<\/p>\n\n\n\n<p>By managing which tokens to keep and which to discard, DMS lets the AI model \u201cthink\u201d in more depth or explore more possible solutions without needing extra computing power.<\/p>\n\n\n\n<p>The researchers tested DMS on different versions of the Llama and Qwen AI models and compared their performance with that of uncompressed models.<\/p>\n\n\n\n<p>The models\u2019 performance was assessed using standardised tests. 
They found that even with memory compressed to one eighth of its original size, the LLMs fully retained their original accuracy on difficult tasks while reasoning faster than uncompressed models.<\/p>\n\n\n\n<p>On AIME 24, a standardised maths test that served as a qualifier for the United States Mathematical Olympiad, the compressed models scored twelve points higher on average while using the same number of KV cache reads to produce an answer.<\/p>\n\n\n\n<p>On GPQA Diamond, a set of complex biology, chemistry and physics questions written by PhD-level experts, the compressed models scored more than eight points higher.<\/p>\n\n\n\n<p>The models were also tested on LiveCodeBench, which measures how well AI models can write code. Here, the compressed models scored ten points higher on average than uncompressed models.<\/p>\n\n\n\n<p>The peer-reviewed findings were presented at NeurIPS, a leading AI conference. A copy of the paper is available <strong><a href=\"https:\/\/openreview.net\/pdf?id=8ZiElzQxf1\">HERE<\/a><\/strong>.<\/p>\n\n\n\n<p>Dr Edoardo Ponti, GAIL Fellow and Lecturer in Natural Language Processing at the University\u2019s School of Informatics, said: \u201cIn a nutshell, our models can reason faster but with the same quality. Hence, for an equivalent time budget for reasoning, they can explore more and longer reasoning threads. 
This improves their ability to solve complex problems in maths, science, and coding.\u201d<\/p>\n\n\n\n<p>Dr Ponti and his team will continue to investigate how large AI systems represent and remember information, with the aim of making them far more efficient and sustainable, as part of AToM-FM, a \u20ac1.5 million project funded by the European Research Council.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Researchers have developed a new way to compress the memory used by AI models to increase their accuracy in complex tasks or help save significant amounts of energy.\u00a0 Experts from University of Edinburgh and NVIDIA found that large language models (LLMs) using memory eight times smaller than an uncompressed LLM scored better on maths, science &hellip;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[199],"tags":[353,14209,14208,13669,159],"class_list":["post-30787","post","type-post","status-publish","format-standard","","category-news-views-and-opinion","tag-ai","tag-large-language-models-llms","tag-memory","tag-nvidia","tag-university-of-edinburgh"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Shrinking AI memory boosts accuracy, study finds - Engineer News Network<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.engineernewsnetwork.com\/blog\/shrinking-ai-memory-boosts-accuracy-study-finds\/\" \/>\n<meta property=\"og:locale\" content=\"en_GB\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Shrinking AI memory boosts accuracy, study finds - Engineer News Network\" \/>\n<meta property=\"og:description\" content=\"Researchers have developed a new way to 
compress the memory used by AI models to increase their accuracy in complex tasks or help save significant amounts of energy.\u00a0 Experts from University of Edinburgh and NVIDIA found that large language models (LLMs) using memory eight times smaller than an uncompressed LLM scored better on maths, science &hellip;\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.engineernewsnetwork.com\/blog\/shrinking-ai-memory-boosts-accuracy-study-finds\/\" \/>\n<meta property=\"og:site_name\" content=\"Engineer News Network\" \/>\n<meta property=\"article:published_time\" content=\"2025-12-24T21:10:00+00:00\" \/>\n<meta name=\"author\" content=\"admin\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin\" \/>\n\t<meta name=\"twitter:label2\" content=\"Estimated reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.engineernewsnetwork.com\\\/blog\\\/shrinking-ai-memory-boosts-accuracy-study-finds\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.engineernewsnetwork.com\\\/blog\\\/shrinking-ai-memory-boosts-accuracy-study-finds\\\/\"},\"author\":{\"name\":\"admin\",\"@id\":\"https:\\\/\\\/www.engineernewsnetwork.com\\\/blog\\\/#\\\/schema\\\/person\\\/4477342aea8e299c6a21761e513ea8e1\"},\"headline\":\"Shrinking AI memory boosts accuracy, study finds\",\"datePublished\":\"2025-12-24T21:10:00+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.engineernewsnetwork.com\\\/blog\\\/shrinking-ai-memory-boosts-accuracy-study-finds\\\/\"},\"wordCount\":667,\"keywords\":[\"AI\",\"large language models (LLMs)\",\"memory\",\"Nvidia\",\"University of Edinburgh\"],\"articleSection\":[\"News, Views and 
Opinion\"],\"inLanguage\":\"en-GB\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.engineernewsnetwork.com\\\/blog\\\/shrinking-ai-memory-boosts-accuracy-study-finds\\\/\",\"url\":\"https:\\\/\\\/www.engineernewsnetwork.com\\\/blog\\\/shrinking-ai-memory-boosts-accuracy-study-finds\\\/\",\"name\":\"Shrinking AI memory boosts accuracy, study finds - Engineer News Network\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.engineernewsnetwork.com\\\/blog\\\/#website\"},\"datePublished\":\"2025-12-24T21:10:00+00:00\",\"author\":{\"@id\":\"https:\\\/\\\/www.engineernewsnetwork.com\\\/blog\\\/#\\\/schema\\\/person\\\/4477342aea8e299c6a21761e513ea8e1\"},\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.engineernewsnetwork.com\\\/blog\\\/shrinking-ai-memory-boosts-accuracy-study-finds\\\/#breadcrumb\"},\"inLanguage\":\"en-GB\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.engineernewsnetwork.com\\\/blog\\\/shrinking-ai-memory-boosts-accuracy-study-finds\\\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.engineernewsnetwork.com\\\/blog\\\/shrinking-ai-memory-boosts-accuracy-study-finds\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.engineernewsnetwork.com\\\/blog\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Shrinking AI memory boosts accuracy, study finds\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.engineernewsnetwork.com\\\/blog\\\/#website\",\"url\":\"https:\\\/\\\/www.engineernewsnetwork.com\\\/blog\\\/\",\"name\":\"Engineer News Network\",\"description\":\"The ultimate online news and information resource for today's 
engineer\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.engineernewsnetwork.com\\\/blog\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-GB\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.engineernewsnetwork.com\\\/blog\\\/#\\\/schema\\\/person\\\/4477342aea8e299c6a21761e513ea8e1\",\"name\":\"admin\",\"url\":\"https:\\\/\\\/www.engineernewsnetwork.com\\\/blog\\\/author\\\/admin\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Shrinking AI memory boosts accuracy, study finds - Engineer News Network","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.engineernewsnetwork.com\/blog\/shrinking-ai-memory-boosts-accuracy-study-finds\/","og_locale":"en_GB","og_type":"article","og_title":"Shrinking AI memory boosts accuracy, study finds - Engineer News Network","og_description":"Researchers have developed a new way to compress the memory used by AI models to increase their accuracy in complex tasks or help save significant amounts of energy.\u00a0 Experts from University of Edinburgh and NVIDIA found that large language models (LLMs) using memory eight times smaller than an uncompressed LLM scored better on maths, science &hellip;","og_url":"https:\/\/www.engineernewsnetwork.com\/blog\/shrinking-ai-memory-boosts-accuracy-study-finds\/","og_site_name":"Engineer News Network","article_published_time":"2025-12-24T21:10:00+00:00","author":"admin","twitter_card":"summary_large_image","twitter_misc":{"Written by":"admin","Estimated reading time":"3 
minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.engineernewsnetwork.com\/blog\/shrinking-ai-memory-boosts-accuracy-study-finds\/#article","isPartOf":{"@id":"https:\/\/www.engineernewsnetwork.com\/blog\/shrinking-ai-memory-boosts-accuracy-study-finds\/"},"author":{"name":"admin","@id":"https:\/\/www.engineernewsnetwork.com\/blog\/#\/schema\/person\/4477342aea8e299c6a21761e513ea8e1"},"headline":"Shrinking AI memory boosts accuracy, study finds","datePublished":"2025-12-24T21:10:00+00:00","mainEntityOfPage":{"@id":"https:\/\/www.engineernewsnetwork.com\/blog\/shrinking-ai-memory-boosts-accuracy-study-finds\/"},"wordCount":667,"keywords":["AI","large language models (LLMs)","memory","Nvidia","University of Edinburgh"],"articleSection":["News, Views and Opinion"],"inLanguage":"en-GB"},{"@type":"WebPage","@id":"https:\/\/www.engineernewsnetwork.com\/blog\/shrinking-ai-memory-boosts-accuracy-study-finds\/","url":"https:\/\/www.engineernewsnetwork.com\/blog\/shrinking-ai-memory-boosts-accuracy-study-finds\/","name":"Shrinking AI memory boosts accuracy, study finds - Engineer News 
Network","isPartOf":{"@id":"https:\/\/www.engineernewsnetwork.com\/blog\/#website"},"datePublished":"2025-12-24T21:10:00+00:00","author":{"@id":"https:\/\/www.engineernewsnetwork.com\/blog\/#\/schema\/person\/4477342aea8e299c6a21761e513ea8e1"},"breadcrumb":{"@id":"https:\/\/www.engineernewsnetwork.com\/blog\/shrinking-ai-memory-boosts-accuracy-study-finds\/#breadcrumb"},"inLanguage":"en-GB","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.engineernewsnetwork.com\/blog\/shrinking-ai-memory-boosts-accuracy-study-finds\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.engineernewsnetwork.com\/blog\/shrinking-ai-memory-boosts-accuracy-study-finds\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.engineernewsnetwork.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Shrinking AI memory boosts accuracy, study finds"}]},{"@type":"WebSite","@id":"https:\/\/www.engineernewsnetwork.com\/blog\/#website","url":"https:\/\/www.engineernewsnetwork.com\/blog\/","name":"Engineer News Network","description":"The ultimate online news and information resource for today's 
engineer","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.engineernewsnetwork.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-GB"},{"@type":"Person","@id":"https:\/\/www.engineernewsnetwork.com\/blog\/#\/schema\/person\/4477342aea8e299c6a21761e513ea8e1","name":"admin","url":"https:\/\/www.engineernewsnetwork.com\/blog\/author\/admin\/"}]}},"_links":{"self":[{"href":"https:\/\/www.engineernewsnetwork.com\/blog\/wp-json\/wp\/v2\/posts\/30787","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.engineernewsnetwork.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.engineernewsnetwork.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.engineernewsnetwork.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.engineernewsnetwork.com\/blog\/wp-json\/wp\/v2\/comments?post=30787"}],"version-history":[{"count":1,"href":"https:\/\/www.engineernewsnetwork.com\/blog\/wp-json\/wp\/v2\/posts\/30787\/revisions"}],"predecessor-version":[{"id":30788,"href":"https:\/\/www.engineernewsnetwork.com\/blog\/wp-json\/wp\/v2\/posts\/30787\/revisions\/30788"}],"wp:attachment":[{"href":"https:\/\/www.engineernewsnetwork.com\/blog\/wp-json\/wp\/v2\/media?parent=30787"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.engineernewsnetwork.com\/blog\/wp-json\/wp\/v2\/categories?post=30787"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.engineernewsnetwork.com\/blog\/wp-json\/wp\/v2\/tags?post=30787"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}