Did Milla Jovovich use AI to build a “perfect-score project”? Developers put it to the test: genuinely impressive, or exaggerated hype?

CryptoCity

MemPalace, an AI memory system developed by Milla Jovovich with developer Ben Sigman, went viral after claiming perfect benchmark scores. The community soon pushed back, alleging test-set cheating and misleading figures. Hands-on testing found that the results were exaggerated and that the code contains numerous bugs. The team has acknowledged the flaws and is working on fixes.

Milla Jovovich builds an AI memory palace, drawing outside attention

Yesterday (4/7), there was big news in the AI community: Hollywood actress Milla Jovovich (known for Resident Evil and The Fifth Element), along with developer Ben Sigman, used Claude Code to help develop an open-source AI memory system called “MemPalace.”

The headline “Hollywood superstar crosses over to deliver a perfect-score project” spread quickly, and MemPalace has since collected more than 20k stars on GitHub. Before long, though, the developer community began asking whether the project has real substance or is mostly hype.

Start with why MemPalace exists. According to the official documentation, in most AI systems the content of conversations between user and AI, including decision-making and architecture discussions, disappears once a work session ends, so months of accumulated context go to waste.

To solve this problem, MemPalace stores memories in a spatial hierarchy: information is sorted into wings that represent individuals or projects, then into finer levels such as corridors, rooms, and drawers, while the original dialogue text is preserved for later semantic retrieval.
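The hierarchy described above can be pictured as nested containers. The following is a minimal Python sketch under stated assumptions: the names Palace, Drawer, store, and drawers are hypothetical and are not taken from MemPalace's code, the corridor level is omitted for brevity, and semantic retrieval is left out entirely.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Drawer:
    """Holds one chunk of original conversation text, unmodified."""
    text: str


def _new_wings():
    # wing name -> room name -> list of drawers
    return defaultdict(lambda: defaultdict(list))


@dataclass
class Palace:
    """Hypothetical container; not MemPalace's actual data model."""
    wings: dict = field(default_factory=_new_wings)

    def store(self, wing: str, room: str, text: str) -> None:
        """File a piece of text under a wing (person/project) and a room (topic)."""
        self.wings[wing][room].append(Drawer(text))

    def drawers(self, wing: str, room: Optional[str] = None) -> list:
        """Return the drawers of one room, or of every room in the wing."""
        rooms = self.wings.get(wing, {})
        if room is not None:
            return list(rooms.get(room, []))
        return [d for room_drawers in rooms.values() for d in room_drawers]


palace = Palace()
palace.store("project-x", "auth", "We decided to use short-lived tokens.")
palace.store("project-x", "database", "Schema migration is planned for Q3.")
print(len(palace.drawers("project-x")))
```

Scoping a lookup to one wing or room narrows the candidate set before any similarity search runs, which is the idea the project promotes; whether that narrowing helps or hurts accuracy is exactly what the later tests in this article dispute.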

The development team claims that MemPalace achieved a perfect score of 100% on the long-term memory evaluation benchmark LongMemEval, and also reached 96.6% accuracy without calling any external APIs. It can run completely locally, doesn’t require subscribing to cloud services, and includes an AAAK dialect system claimed to achieve 30x lossless compression.

Image source: GitHub. Hollywood star Milla Jovovich builds an AI memory palace, drawing outside attention.

Peers and the community question the test methods and the promotional claims

However, MemPalace’s claimed perfect performance on LongMemEval quickly drew skepticism from peers.

PenfieldLabs, which also builds AI memory systems, pointed out that MemPalace's further claim of a perfect score on the LoCoMo dataset is mathematically impossible: the dataset's reference answers themselves contain 99 incorrect entries, so even a system that answered every question correctly would be marked wrong on those items.

After analysis, PenfieldLabs found that MemPalace's 100% score came from setting the number of retrieved results to 50, while the largest conversation in the test dataset contains only 32 sessions. In other words, the system effectively bypasses the retrieval stage and hands all of the data directly to the AI model to read.
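Why a retrieval count of 50 makes retrieval meaningless on a 32-session conversation can be shown with a few lines of Python. This is an illustrative sketch, not the benchmark's actual code: the function recall_at_k and the session IDs are made up, and the "retriever" here is deliberately the worst possible one.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of relevant items that appear in the first k ranked results."""
    top_k = set(ranked_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)


# 32 sessions, the maximum mentioned above (IDs are illustrative).
sessions = [f"s{i}" for i in range(32)]

# A deliberately bad retriever: the session holding the answer is ranked last.
worst_ranking = list(reversed(sessions))
relevant = ["s0"]

print(recall_at_k(worst_ranking, relevant, k=5))   # 0.0
print(recall_at_k(worst_ranking, relevant, k=50))  # 1.0
```

Once k is at least as large as the number of sessions, every session is returned regardless of ranking quality, so the score measures the reading model rather than the memory system.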

As for the 100% score on LongMemEval, the development team was found to have singled out the three specific questions where its errors were concentrated and written dedicated fix code for them, raising suspicions that the system was tuned to the test set.

Image source: Reddit. Peer PenfieldLabs points out that MemPalace's claim of a perfect score on the LoCoMo dataset is mathematically impossible.

Real-world testing by GitHub users shows the benchmark includes misleading elements

GitHub user hugooconnor reported findings after hands-on testing. MemPalace advertises retrieval accuracy as high as 96.6%, but that figure was obtained without using the “memory palace” architecture the project promotes at all. According to hugooconnor, the benchmark simply calls the default functionality of the underlying database, ChromaDB, and involves none of the categorization logic the project emphasizes, such as wings, rooms, or drawers.

hugooconnor also found that when the palace-specific categorization logic is actually enabled, retrieval performance gets worse. In room mode, accuracy drops to 89.4%, and with AAAK compression enabled it drops further to 84.2%. Both figures are lower than the default database performance.

hugooconnor also criticized the test methodology. In MemPalace's test environment, the search space for each question is deliberately limited to about 50 sessions, and finding an answer in such a small pool is too easy.

If the search space is expanded to the more than 19,000 sessions of a realistic scenario, the accuracy of traditional keyword search plummets to 30%, which shows that MemPalace's current test setup masks the real difficulty of retrieval.
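A rough way to see why the size of the search space matters is to compute the hit rate of a retriever that guesses at random. The sketch below is only an illustration: the function name chance_hit_rate and the cutoff k=5 are assumptions, since the article does not state which cutoff the benchmark used, and it assumes exactly one relevant session per question.

```python
def chance_hit_rate(k: int, haystack_size: int) -> float:
    """Probability that k items drawn at random include the one relevant item."""
    return min(k, haystack_size) / haystack_size


# Assumed cutoff of k=5; the two haystack sizes are the ones cited above.
for haystack_size in (50, 19_000):
    rate = chance_hit_rate(5, haystack_size)
    print(f"{haystack_size:>6} sessions: {rate:.4%}")
```

With 50 candidates, pure guessing already lands on the right session 10% of the time at this cutoff; with 19,000 candidates the same guess succeeds roughly 0.03% of the time. A benchmark run on the small pool therefore starts from a far easier baseline.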

Image source: GitHub. GitHub users tested in real-world conditions, showing that MemPalace's benchmark tests contain misleading elements.

Meanwhile, the development team has released a correction statement acknowledging that AAAK was confirmed to be lossy compression and promising to revise the documentation and system design in response to the community's harsh criticism. Even so, the project's main documentation still contains several exaggerated claims that have not been corrected, including 30x lossless compression and a 34% retrieval improvement. The comparison charts against competitors also cite no sources or references at all.

MemPalace's source code turns up multiple bugs

As more developers download and test the project, a large number of bug reports about MemPalace's source code have appeared on GitHub.

User cktang88 listed multiple serious issues: the compression command fails to run and crashes the system, the word-count logic for summaries is wrong, the room statistics reported by the mining step are inaccurate, and the server loads all metadata into memory on every call, which consumes resources heavily.

Other reported issues include the name of a developer's family member hard-coded into the default configuration file and a hard display cap of 10k records when checking status.

The open-source community has already begun fixing these problems. User adv3nt3 submitted multiple fix requests that correct the mining statistics, remove the default family member name, and defer initialization of the knowledge graph until it is needed. The development team later acknowledged these errors and is gradually resolving the code issues in collaboration with the community.
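Both the "loads everything on every call" report and the deferred knowledge-graph fix come down to the same pattern: do the expensive load once, and only when it is first needed. The following Python sketch shows that pattern with the standard library's functools.cached_property. The class MemoryServer, its loader argument, and the load_count counter are hypothetical and do not reflect MemPalace's real code or the actual patches.

```python
from functools import cached_property


class MemoryServer:
    """Hypothetical server used only to illustrate lazy, one-time loading."""

    def __init__(self, loader):
        self._loader = loader  # callable that performs the expensive load
        self.load_count = 0    # counts how often the expensive load ran

    @cached_property
    def knowledge_graph(self):
        # Runs on first access only; the result is then cached on the instance.
        self.load_count += 1
        return self._loader()

    def query(self, key):
        # Reuses the cached graph instead of reloading it on every call.
        return self.knowledge_graph.get(key)


server = MemoryServer(loader=lambda: {"project-x": "uses short-lived tokens"})
print(server.load_count)  # nothing has been loaded at construction time
server.query("project-x")
server.query("project-x")
print(server.load_count)  # the loader ran once despite two queries
```

Constructing the server stays cheap, and repeated queries share one in-memory copy. A real fix would also need a way to invalidate the cache when the stored data changes, which this sketch leaves out.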

Milla Jovovich's vibe coding is cool; the marketing isn't

Hacker News user darkhanakh summed up the MemPalace project this way: it is reminiscent of OpenClaw, in that benchmark results are artificially manipulated to make everything look flawless and are then packaged as a major breakthrough for marketing.

He believes the underlying technology behind MemPalace may well be interesting, but that, given the flaws in the testing methodology, promoting it as “the highest publicly available score in history” is not appropriate. “But as for the fact that Milla Jovovich is vibe coding, I still think that's pretty cool.”
