Can TensorRT AI Models Be Reverse-Engineered?
TensorRT for accelerating your AI models
TensorRT is a tool developed by NVIDIA to reduce inference time and optimize memory usage for GPU deployment.
It works by taking a model written in a standard format like ONNX and transforming it into a special, optimized binary file called an engine file. This file is tailored to a specific GPU and enables the model to run much more efficiently. TensorRT is one of the most widely used inference tools for deploying AI models on NVIDIA GPUs, from embedded hardware like Jetson to on-premises servers.
One of the compilation options available when converting an ONNX model into a TensorRT engine is profiling_verbosity, which controls the level of metadata embedded in the engine. When set to DETAILED, the engine includes detailed information about the model structure: layer names, parameters, input/output metadata, and more. Conversely, setting this flag to NONE is supposed to strip all this information, preventing it from being accessed through the TensorRT API.
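For reference, here is a minimal sketch of how such a build might look with the TensorRT Python API on TensorRT 10.x; the file names model.onnx and model.engine are placeholders:

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(0)  # explicit batch is the default in TensorRT 10
parser = trt.OnnxParser(network, logger)

# Parse the ONNX model
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError("Failed to parse the ONNX model")

# Build the engine with metadata supposedly stripped
config = builder.create_builder_config()
config.profiling_verbosity = trt.ProfilingVerbosity.NONE

engine_bytes = builder.build_serialized_network(network, config)
with open("model.engine", "wb") as f:
    f.write(engine_bytes)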
However, a simple observation challenges this assumption.
A simple binary edit to reveal the model structure
While compiling ONNX models into TensorRT engines, we noticed that the resulting files were the same size whether compiled with profiling_verbosity=NONE or DETAILED. This strongly suggested that the metadata was still present in both cases. Upon closer inspection, we found that the detailed model metadata is always embedded in the binary; the TensorRT API merely hides it by checking the value of a single flag stored in the file.
This flag is encoded in a single byte. It is therefore easy to trick TensorRT into believing that the engine has been compiled with the DETAILED verbosity level simply by changing this byte, which will unlock access to the metadata.
Example
In the hexadecimal representation of the engine file, we can find a pattern like: 9480010007800100
The byte that follows determines the verbosity level:
- 01 = NONE
- 02 = DETAILED
- 03 = LAYER_NAMES_ONLY
By simply replacing 01 with 02, the engine becomes fully introspectable again via the TensorRT API, without any recompilation!
Here is a little Python script to reproduce the trick for TensorRT 10.10. The script takes an engine file as argument and scans the engine binary until it finds the byte string corresponding to the verbosity. It then replaces the byte following this byte string to set the verbosity to DETAILED:
import os
import sys

if len(sys.argv) != 2:
    print("This script takes an engine file as input")
    sys.exit(1)

engine_file = sys.argv[1]

# Load the engine and unlock the verbosity
with open(engine_file, "rb") as f:
    engine = bytearray(f.read())

# Byte pattern that precedes the verbosity byte in TensorRT 10.10 engines
verbosity_tag = bytes.fromhex("9480010007800100")

for i in range(0, len(engine) - 9):
    extracted_tag = engine[i : i + 8]
    if verbosity_tag == extracted_tag:
        # Change 01 to 02
        # 01 = NONE
        # 02 = DETAILED
        engine[i + 8] = 0x02
        break

# Save the unlocked engine
file_name, extension = os.path.splitext(engine_file)
unlocked_engine_file = file_name + "_unlocked" + extension
with open(unlocked_engine_file, "wb") as f:
    f.write(engine)
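Once the byte has been flipped, the hidden metadata can be read back through the standard TensorRT engine inspector API. The following sketch assumes the patched engine file produced above and a machine with the GPU the engine was built for; it dumps the layer-by-layer description as JSON:

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)

# Deserialize the patched engine
with open("model_unlocked.engine", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

# The inspector now returns the full per-layer information
inspector = engine.create_engine_inspector()
print(inspector.get_engine_information(trt.LayerInformationFormat.JSON))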
Security Implications
This behavior highlights an important security detail: TensorRT engines compiled with profiling_verbosity=NONE still contain the full model metadata; it is merely hidden from the API.
The implications are significant:
- Sensitive information such as model structure, architecture, and layer parameters remains embedded in the file.
- Anyone with access to the engine file can expose this metadata with a simple binary edit.
- Most importantly, this information can assist in reverse-engineering proprietary AI models, potentially leading to intellectual property leakage or security issues.
Is it possible to extract the weights from an engine?
Yes, “compiled” does not mean “hidden”! In a TensorRT engine, the weights are stored unencrypted at the end of the file, layer by layer. They are identical (up to a few optimizations) to those of the original model and can therefore be extracted!
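This is easy to verify when the original model is at hand: if the raw bytes of an ONNX weight tensor appear verbatim inside the engine file, the weights are stored in the clear. A minimal sketch follows; the file names are placeholders, and an exact match may fail for tensors that TensorRT has fused or converted:

import numpy as np
import onnx
from onnx import numpy_helper

# Pick one weight tensor from the original ONNX model
model = onnx.load("model.onnx")
weights = numpy_helper.to_array(model.graph.initializer[0]).astype(np.float32)

# Read the compiled engine as raw bytes
with open("model.engine", "rb") as f:
    engine = f.read()

# A non-negative offset means the weight tensor is present verbatim in the engine
print(engine.find(weights.tobytes()))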
By combining weights and model metadata, an attackable version of the model can be created and used to carry out white-box attacks such as generating adversarial examples.
Skyld's AI protection helps you protect the engine against this type of attack, both before and during execution, by using mathematical transformations to keep the model weights confidential.