Crashing RPC: Analysis of a New Type of Vulnerability in Memory-Safe Blockchain RPC Nodes

金色财经 (Golden Finance)

CertiK’s Skyfall team recently discovered multiple vulnerabilities in the Rust-based RPC nodes of several blockchains, including Aptos, Starcoin, and Sui. RPC nodes are critical infrastructure that connects dApps to the underlying blockchain, so their robustness is essential for seamless operation. Blockchain designers understand the importance of stable RPC services, which is why they adopt memory-safe languages such as Rust to avoid common vulnerabilities that can take down RPC nodes.

Adopting a memory-safe language such as Rust helps RPC nodes avoid many attacks based on memory corruption vulnerabilities. However, through a recent audit, we found that even memory-safe Rust implementations, if not carefully designed and vetted, can be vulnerable to certain security threats that can disrupt the availability of RPC services.

In this article, we walk through the series of vulnerabilities we discovered, using concrete cases.

The role of blockchain RPC nodes

The remote procedure call (RPC) service is a core infrastructure component of a Layer 1 blockchain. It provides users with an important API front end and acts as a gateway to the back-end blockchain network. However, a blockchain RPC service differs from a traditional RPC service in that it serves user interaction without authentication. Continuous availability of the service is critical, and any disruption can severely impact the availability of the underlying blockchain.

Audit perspective: traditional RPC server vs. blockchain RPC server

Audits of traditional RPC servers mainly focus on input validation, authorization/authentication, cross-site request forgery/server-side request forgery (CSRF/SSRF), injection vulnerabilities (such as SQL injection and command injection), and information leakage.

However, the situation is different for blockchain RPC servers. As long as the transaction is signed, there is no need to authenticate the requesting client at the RPC layer. As the front end of the blockchain, one of the main goals of the RPC service is to guarantee its availability. If it fails, users cannot interact with the blockchain: they can no longer query on-chain data, submit transactions, or call contract functions.

Therefore, the most vulnerable aspect of a blockchain RPC server is “availability”. If the server goes down, users lose the ability to interact with the blockchain. Worse, some attacks can propagate across the chain, affect a large number of nodes, and even paralyze the entire network.

Why newer blockchains use memory-safe RPC

Some well-known Layer 1 blockchains, such as Aptos and Sui, use the memory-safe programming language Rust to implement their RPC services. Thanks to its strong safety guarantees and strict compile-time checks, Rust makes programs virtually immune to memory corruption vulnerabilities such as stack buffer overflows, null pointer dereferences, and use-after-free bugs.

To further secure the codebase, developers follow strict best practices, such as not introducing unsafe code. Placing #![forbid(unsafe_code)] in the source code makes the compiler reject any unsafe code.

Examples of blockchain developers implementing Rust programming practices

To prevent integer overflow, developers usually use functions such as checked_add, checked_sub, saturating_add, and saturating_sub instead of plain addition and subtraction (+, -). They also mitigate resource exhaustion by setting appropriate timeouts, request size limits, and request item limits.
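As a minimal sketch of the overflow-safe arithmetic described above: with default build settings, plain + and - panic on overflow in debug builds and wrap in release builds, whereas the checked_* and saturating_* methods make the outcome explicit. The helper names credit and charge_fee are hypothetical and are not taken from any of the audited codebases.

```rust
/// Hypothetical helper: adds `amount` to `balance`.
/// Returns `None` on overflow instead of panicking or wrapping.
fn credit(balance: u64, amount: u64) -> Option<u64> {
    balance.checked_add(amount)
}

/// Hypothetical helper: subtracts `fee` from `balance`,
/// clamping the result at zero instead of underflowing.
fn charge_fee(balance: u64, fee: u64) -> u64 {
    balance.saturating_sub(fee)
}
```

The checked variant forces the caller to decide what an overflow means, while the saturating variant is suitable only when clamping is an acceptable result.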

Threats to memory-safe RPC in Layer 1 blockchains

Although memory-safe RPC nodes are not vulnerable to memory corruption in the traditional sense, they are still exposed to input that attackers can easily manipulate. In a memory-safe RPC implementation, several situations can lead to a denial of service. For example, memory amplification may exhaust the service’s memory, while logic issues may introduce infinite loops. Race conditions are another threat: concurrent operations can interleave in an unexpected order and leave the system in an undefined state. Finally, improperly managed dependencies and third-party libraries can introduce unknown vulnerabilities into the system.

In this post, our aim is to draw attention to more immediate ways that Rust’s runtime protections can be triggered, causing services to abort themselves.

Explicit Rust Panic: A way to terminate RPC services directly

Developers can introduce explicit panic code, intentionally or unintentionally. Such code is mainly used to handle unexpected or exceptional conditions. Some common examples include:

assert!(): Use this macro when a condition must be met. If the asserted condition fails, the program will panic, indicating that there is a serious error in the code.

panic!(): This macro is invoked when the program encounters an error from which it cannot recover and cannot continue.

unreachable!(): Use this macro when a piece of code should not be executed. If this macro is invoked, it indicates a serious logic error.

unimplemented!() and todo!(): These macros are placeholders for unimplemented functionality. If execution reaches one of them, the program will panic.

unwrap(): This method is used on Option or Result types. When it encounters an Err variant or None, the program will panic.
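To make the last item concrete, here is a small sketch contrasting unwrap() with explicit error propagation. The function names and the "limit" parameter are hypothetical; they stand in for any request field that an RPC handler parses from client input.

```rust
/// Panicking version: `unwrap()` panics if the input is not a valid number.
fn parse_limit_unchecked(raw: &str) -> u32 {
    raw.trim().parse::<u32>().unwrap()
}

/// Non-panicking version: the caller receives an error value that it can
/// turn into an ordinary RPC error response.
fn parse_limit(raw: &str) -> Result<u32, String> {
    raw.trim()
        .parse::<u32>()
        .map_err(|e| format!("invalid limit {:?}: {}", raw, e))
}
```

With the second form, malformed client input becomes a recoverable error rather than a panic in the handler.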

Vulnerability 1: Triggering assert! in the Move Verifier

The Aptos blockchain uses the Move bytecode verifier to perform reference safety analysis through abstract interpretation of the bytecode. The execute() function is part of the implementation of the TransferFunctions trait and simulates the execution of bytecode instructions in a basic block.

The task of execute_inner() is to interpret the current bytecode instruction and update the state accordingly. Once execution reaches the last instruction in the basic block, as indicated by index == last_index, the function calls assert!(self.stack.is_empty()) to ensure the stack is empty. The intention behind this check is to guarantee that all operations are balanced, meaning that every push has a corresponding pop.

In the normal flow of execution, the stack is always balanced during abstract interpretation. This is guaranteed by the Stack Balance Checker, which verifies the bytecode before it is interpreted. However, once we look at how the abstract interpreter itself handles errors, we see that the stack balance assumption does not always hold.

Patch for the analyze_function vulnerability in AbstractInterpreter

At its core, an abstract interpreter emulates bytecode at the basic block level. In the original implementation, encountering an error during execute_block would prompt the analysis process to log the error and continue with the next block in the control flow graph. As a result, an error in one block can leave the stack unbalanced. If execution continues in this state, the assert! check runs while the stack is not empty and causes a panic.

This gives attackers an opportunity. An attacker can craft bytecode that triggers an error in execute_block(); execute() may then reach the assert while the stack is not empty, causing the check to fail. The resulting panic terminates the RPC service and affects its availability.

To prevent this, the fix stops the entire analysis as soon as execute_block first encounters an error, which avoids the subsequent crash caused by continuing the analysis with an unbalanced stack. This modification removes the conditions that could cause the panic and improves the robustness and safety of the abstract interpreter.
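The failure mode and the fix can be illustrated with a toy model. This is only a sketch under stated assumptions: the Op enum, the shared Vec-based stack, and the functions analyze_lenient and analyze_strict are hypothetical simplifications, not the actual Move verifier code. Only the end-of-block assert!(stack.is_empty()) mirrors the check described above.

```rust
#[derive(Debug)]
enum Op {
    Push,
    Pop,
    Fail, // stands in for any instruction that fails verification
}

/// Toy `execute_block`: interprets one basic block against a shared stack and
/// asserts that the stack is empty after the last instruction.
fn execute_block(block: &[Op], stack: &mut Vec<u8>) -> Result<(), String> {
    for (index, op) in block.iter().enumerate() {
        match op {
            Op::Push => stack.push(0),
            Op::Pop => {
                stack
                    .pop()
                    .ok_or_else(|| "pop on empty stack".to_string())?;
            }
            Op::Fail => return Err("verification error".to_string()),
        }
        if index == block.len() - 1 {
            assert!(stack.is_empty());
        }
    }
    Ok(())
}

/// Pre-patch shape: the error is recorded and analysis continues, so the next
/// block starts with whatever the failed block left on the stack.
fn analyze_lenient(blocks: &[Vec<Op>]) -> Vec<String> {
    let mut stack = Vec::new();
    let mut errors = Vec::new();
    for block in blocks {
        if let Err(e) = execute_block(block, &mut stack) {
            errors.push(e);
        }
    }
    errors
}

/// Patched shape: the first error stops the whole analysis.
fn analyze_strict(blocks: &[Vec<Op>]) -> Result<(), String> {
    let mut stack = Vec::new();
    for block in blocks {
        execute_block(block, &mut stack)?;
    }
    Ok(())
}
```

For the input [[Push, Fail], [Push, Pop]], the lenient version leaves one item on the stack after the first block and then fails the assertion at the end of the second block, while the strict version returns the error from the first block and never reaches the assertion.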

Vulnerability 2: Triggering panic! in Starcoin

The Starcoin blockchain has its own fork of the Move implementation. In this Move repo, there is a panic! in the constructor of the Struct type. If the provided StructDefinition has Native field information, the panic! is triggered explicitly.

Explicit panic when initializing a struct in the normalization routine

This risk arises when republishing modules. If the published module already exists in the data store, module normalization is required for both the existing module and the attacker-controlled input module. During this process, the normalized::Module::new function builds the module structure from the attacker-controlled input module, triggering the panic!.

Prerequisites for the normalization routine

This panic can be triggered by submitting a specially crafted payload from the client. Therefore, malicious actors can disrupt the availability of RPC services.

Struct initialization panic patch

The Starcoin patch introduces a new behavior to handle the Native case. Now, rather than panicking, it returns an empty Vec. This reduces the possibility of user-submitted data causing panics.
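The shape of this change can be sketched as follows. The types Field and StructFields and both function names are hypothetical stand-ins for the real Move definitions, and the panic message is invented for illustration; the only detail taken from the description above is that the Native case now yields an empty Vec instead of a panic.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
}

/// Hypothetical stand-in for the field information of a struct definition.
enum StructFields {
    Native,
    Declared(Vec<Field>),
}

/// Pre-patch shape: attacker-controlled input can reach the `panic!`.
fn fields_or_panic(info: &StructFields) -> Vec<Field> {
    match info {
        StructFields::Native => panic!("native structs are not supported"),
        StructFields::Declared(fields) => fields.clone(),
    }
}

/// Patched shape: the Native case returns an empty Vec.
fn fields_or_empty(info: &StructFields) -> Vec<Field> {
    match info {
        StructFields::Native => Vec::new(),
        StructFields::Declared(fields) => fields.clone(),
    }
}
```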

Implicit Rust Panic: An easily overlooked way to terminate RPC services

Explicit panics are easily identifiable in source code, while implicit panics are more likely to be ignored by developers. Implicit panics usually occur when using APIs provided by standard or third-party libraries. Developers need to read and understand the API documentation thoroughly, or their Rust programs may stop unexpectedly.

Implicit panic in BTreeMap

Let’s take BTreeMap from the Rust standard library as an example. BTreeMap is a commonly used data structure that stores key-value pairs in a sorted B-tree. BTreeMap provides two methods for retrieving values by key: get(&self, key: &Q) and index(&self, key: &Q).

The method get(&self, key: &Q) retrieves the value for the key and returns an Option. If the key exists, it returns Some(&V), a reference to the value; if the key is not found in the BTreeMap, it returns None.

On the other hand, index(&self, key: &Q) directly returns a reference to the value corresponding to the key. However, it has a big risk: it will trigger an implicit panic if the key does not exist in the BTreeMap. If not handled properly, the program can crash unexpectedly, making it a potential vulnerability.

In fact, index(&self, key: &Q) is the method required by the std::ops::Index trait. This trait provides convenient syntactic sugar for indexing operations in immutable contexts (i.e., container[index]). Developers can write btree_map[key] directly, which calls index(&self, key: &Q). However, they may overlook the fact that this usage panics if the key is not found, which poses an implicit threat to the stability of the program.
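The difference between the two lookups can be shown in a few lines. The function names here are hypothetical; the BTreeMap calls are the standard library methods discussed above.

```rust
use std::collections::BTreeMap;

/// Uses the `Index` sugar: panics if `key` is not present in the map.
fn lookup_unchecked(map: &BTreeMap<u16, &'static str>, key: u16) -> &'static str {
    map[&key]
}

/// Uses `get`: a missing key is reported as `None` instead of a panic.
fn lookup(map: &BTreeMap<u16, &'static str>, key: u16) -> Option<&'static str> {
    map.get(&key).copied()
}
```

Both functions behave identically for keys that exist; they differ only in what happens when the key is missing, which is exactly the case an attacker tries to reach.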

Vulnerability 3: Trigger an implicit panic in Sui RPC

The Sui module publish routine allows users to submit module payloads via RPC. The RPC handler uses the SuiCommand::Publish function to disassemble the received module directly, before forwarding the request to the back-end validator network for bytecode verification.

During this disassembly, the code_unit section of the submitted module is used to build a VMControlFlowGraph. The build process creates basic blocks, which are stored in a BTreeMap named blocks. While this map is being created and manipulated, an implicit panic is triggered under certain conditions.

Here is a simplified version of the code:

Implicit panic when creating VMControlFlowGraph

In that code, a new VMControlFlowGraph is created by walking through the code and creating a new basic block for each code unit. The basic blocks are stored in a BTreeMap named blocks.

The blocks map is indexed using blocks[&block] in a loop that iterates over a stack, which has been initialized with ENTRY_BLOCK_ID. The assumption here is that the blocks map always contains an entry for ENTRY_BLOCK_ID.

However, this assumption does not always hold. For example, if the submitted code is empty, the blocks map is still empty after the basic block creation step. When the code later traverses the map using for succ in &blocks[&block].successors, an implicit panic is raised because the key is not found. This is because the blocks[&block] expression is essentially a call to the index() method, which, as mentioned earlier, panics if the key does not exist in the BTreeMap.
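The pattern can be reproduced in a self-contained sketch. This is not the Sui or Move source: create_blocks is a toy stand-in that makes one block per code byte with a single fall-through successor, and the function names are hypothetical. Only the names blocks, ENTRY_BLOCK_ID, successors, and the blocks[&block] lookup follow the description above.

```rust
use std::collections::BTreeMap;

type BlockId = u16;
const ENTRY_BLOCK_ID: BlockId = 0;

struct BasicBlock {
    successors: Vec<BlockId>,
}

/// Toy block creation: one block per code byte, each falling through to the
/// next one. Empty code yields an empty map.
fn create_blocks(code: &[u8]) -> BTreeMap<BlockId, BasicBlock> {
    let mut blocks = BTreeMap::new();
    for i in 0..code.len() {
        let next = i + 1;
        let successors = if next < code.len() {
            vec![next as BlockId]
        } else {
            Vec::new()
        };
        blocks.insert(i as BlockId, BasicBlock { successors });
    }
    blocks
}

/// Traversal using the `Index` sugar: panics when the entry block is missing.
fn count_reachable_unchecked(code: &[u8]) -> usize {
    let blocks = create_blocks(code);
    let mut stack = vec![ENTRY_BLOCK_ID];
    let mut visited = 0;
    while let Some(block) = stack.pop() {
        visited += 1;
        for succ in &blocks[&block].successors {
            stack.push(*succ);
        }
    }
    visited
}

/// Traversal using `get`: a missing block becomes an error value.
fn count_reachable(code: &[u8]) -> Result<usize, String> {
    let blocks = create_blocks(code);
    let mut stack = vec![ENTRY_BLOCK_ID];
    let mut visited = 0;
    while let Some(block) = stack.pop() {
        let basic_block = blocks
            .get(&block)
            .ok_or_else(|| format!("missing basic block {}", block))?;
        visited += 1;
        for succ in &basic_block.successors {
            stack.push(*succ);
        }
    }
    Ok(visited)
}
```

With empty input, the unchecked traversal panics on the first lookup of ENTRY_BLOCK_ID, while the checked traversal returns an error that a handler could report to the client.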

An attacker with remote access could exploit the vulnerability in this function by submitting a malformed module payload with an empty code_unit field. This simple RPC request crashes the entire JSON-RPC process. If an attacker continues to send such malformed payloads with minimal effort, it will result in a sustained interruption of service. In a blockchain network, this means that the network may not be able to confirm new transactions, resulting in a denial of service (DoS) situation. Network functionality and user trust in the system will be severely affected.

Sui’s fix: remove disassembly from the RPC publish routine

It is worth noting that the CodeUnitVerifier in the Move Bytecode Verifier is responsible for ensuring that the code_unit section is never empty. However, the order of operations exposes RPC handlers to potential vulnerabilities. This is because the validation process takes place on the Validator node, which is a stage after the RPC processes the input modules.

In response, Sui fixed the vulnerability by removing the disassembly step from the module publish RPC routine. This is an effective way to prevent RPC services from processing potentially dangerous, unvalidated bytecode.

It is also worth noting that other RPC methods related to object lookups contain disassembly functionality as well, but they are not vulnerable to empty code units. This is because they always query and disassemble existing published modules. Published modules must already have been verified, so the assumption of non-empty code units always holds when building a VMControlFlowGraph.

Suggestions for developers

After understanding the threats to the stability of RPC services in blockchains from explicit and implicit panics, developers must master strategies to prevent or mitigate these risks. These strategies can reduce the possibility of unplanned service outages and increase the resiliency of the system. Therefore, CertiK’s expert team puts forward the following suggestions and lists them as best practices for Rust programming.

Rust Panic Abstraction: Whenever possible, consider using Rust’s catch_unwind function to catch panics and convert them into error messages. This prevents the entire program from crashing and allows developers to handle errors in a controlled manner.

Use APIs with caution: Implicit panics usually occur due to misuse of APIs provided by standard or third-party libraries. Therefore, it is crucial to fully understand the API and learn to handle potential errors appropriately. Developers should always assume that APIs may fail and prepare for such situations.

Proper error handling: use Result and Option types for error handling instead of resorting to panic. The former provides a more controlled way of handling errors and special cases.

Add documentation and comments: Make sure your code is well-documented and add comments to critical sections (including those where panics may occur). This will help other developers understand potential risks and deal with them effectively.
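The first suggestion above can be sketched as follows. The handler names are hypothetical. Note that std::panic::catch_unwind only catches unwinding panics; it has no effect if the binary is built with panic = "abort", and it is best treated as a last line of defense at a request boundary rather than a substitute for returning Result.

```rust
use std::panic;

/// Hypothetical handler that panics on attacker-controlled input.
fn risky_handler(payload: &[u8]) -> usize {
    assert!(!payload.is_empty(), "empty payload");
    payload.len()
}

/// Wraps the handler so that a panic becomes an error value instead of
/// unwinding past the request boundary.
fn guarded_handler(payload: &[u8]) -> Result<usize, String> {
    panic::catch_unwind(|| risky_handler(payload)).map_err(|cause| {
        // Panic payloads are usually a `&str` or a `String`.
        if let Some(msg) = cause.downcast_ref::<&str>() {
            msg.to_string()
        } else if let Some(msg) = cause.downcast_ref::<String>() {
            msg.clone()
        } else {
            "unknown panic".to_string()
        }
    })
}
```

A server would typically convert the Err value into an error response for the offending request and keep serving other clients.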

Summary

Rust-based RPC nodes play an important role in blockchain systems such as Aptos, Starcoin, and Sui. Since they connect dApps to the underlying blockchain, their reliability is critical to the smooth operation of the blockchain system. Although these systems use the memory-safe language Rust, poor design can still put their availability at risk. CertiK’s research team explored these risks with real-world examples that demonstrate the need for careful and rigorous design in memory-safe programming.
