AutoPentest-DRL is an open-source framework developed by the Cyber Range Organization and Design (CROND) at the Japan Advanced Institute of Science and Technology (JAIST). It uses Deep Reinforcement Learning (DRL) to automate the determination and execution of attack paths in a network environment.

Core Functionality

The system is designed to handle both logical simulations and real-world network testing:

Logical Attack Mode: Analyzes a network topology to determine the optimal attack path without performing actual exploits. This is primarily used for educational and research purposes.
Real Attack Mode: Conducts automated penetration testing on a live network by integrating with standard security tools.
Methodology: A two-stage process. First, it gathers data (using tools like Shodan) to build a topology and attack tree (using MulVAL); then, it applies DRL algorithms to find the most efficient attack paths.

Key Technical Components
The framework relies on a specific stack of security and machine learning tools:

Nmap: Used for initial network scanning to identify active hosts and open ports.
Metasploit: Serves as the primary engine for executing the attacks suggested by the DRL engine.
Pymetasploit3: A Python-based RPC API that allows the framework to communicate with and control Metasploit.
Deep Reinforcement Learning Engine: Typically utilizes Deep Q-Networks (DQN) to make decisions based on the current state of the network.

Installation & Setup

The project is primarily developed for Ubuntu 18.04 LTS and requires a Python environment.

Repository: Source code is available on the AutoPentest-DRL GitHub repository.
Requirements: Use the requirements.txt file to install the necessary Python packages.
Infrastructure: A pre-configured Docker image (whichard/autopentest-drl) is also available to simplify environment setup.

Limitations and Research Context
nmap -sS), brute-forcing (Hydra), exploiting (Metasploit modules), lateral movement (PsExec, WinRM), and privilege escalation.

Unlike supervised learning (which needs labeled attack graphs) or supervised fine-tuned LLMs (which lack true sequential decision-making), AutoPentest-DRL learns optimal attack paths through millions of simulated episodes.
We created three network scenarios of increasing complexity:
| Scenario | Hosts | Vulnerabilities | Goal |
|----------|-------|----------------|------|
| Simple | 3 | EternalBlue, weak SSH creds | Compromise host 3 |
| Medium | 7 | 15 (mix of web, SMB, SQLi) | Root access on database server |
| Complex | 12 | 28 (including pivoting) | Domain controller compromise |
Baselines:
autopentest-drl is a custom or specific tool, ensure it's properly installed and configured.

Table 1: Performance Metrics (averaged over 30 runs)
| Method | Success Rate (%) | Avg. Steps | Time (min) | Coverage (%) |
|-------------------|-----------------|------------|------------|--------------|
| Random | 12.3 | 147 | 28.4 | 34.1 |
| Metasploit Autopwn | 45.6 | 62 | 12.3 | 58.7 |
| Q-learning | 52.1 | 58 | 11.8 | 63.2 |
| OpenVAS + Manual | 78.4 | N/A | 89.0 | 81.5 |
| AutoPentest-DRL | 91.7 | 33 | 7.4 | 92.3 |
Key Findings:
--script in Nmap) before launching actual exploits.

For security researchers and engineering teams, here’s a minimal roadmap:
Step 1: Choose a simulator
CAGEChallenge scenario.

Step 2: Define action and observation spaces
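# Stable-Baselines3 2.x expects gymnasium; `from gymnasium import spaces` is the equivalent import there.
from gym import spaces

# These assignments belong inside the environment's __init__ method.
self.action_space = spaces.Discrete(512)  # 512 common pentest commands
# spaces.Dict takes a mapping of name -> space, so the entries must be wrapped in braces.
self.observation_space = spaces.Dict({
    "scan_results": spaces.Box(0, 1, shape=(100,)),
    "current_priv": spaces.Discrete(3),  # user, root, service
    "compromised_hosts": spaces.Box(0, 1, shape=(10,)),
})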
Step 3: Implement PPO from Stable-Baselines3
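from stable_baselines3 import PPO

# env is assumed to be an instance of the environment whose spaces were defined in Step 2.
# "MultiInputPolicy" is the policy type for Dict observation spaces.
model = PPO("MultiInputPolicy", env, verbose=1)
model.learn(total_timesteps=200_000)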
Step 4: Reward normalization – Use a running mean and std for rewards to avoid oscillation.
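One way to keep a running mean and standard deviation is Welford's online algorithm, sketched below in plain Python. This is an illustrative assumption, not code from the framework: the class name `RunningRewardNormalizer` and its `epsilon` parameter are hypothetical, and the choice of Welford's method is ours.

```python
import math


class RunningRewardNormalizer:
    """Hypothetical helper: running mean/std of rewards via Welford's algorithm."""

    def __init__(self, epsilon=1e-8):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean
        self.epsilon = epsilon  # avoids division by zero when std is tiny

    def update(self, reward):
        """Fold one reward into the running statistics."""
        self.count += 1
        delta = reward - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (reward - self.mean)

    @property
    def std(self):
        """Population standard deviation; 1.0 until two samples are seen."""
        if self.count < 2:
            return 1.0
        return math.sqrt(self.m2 / self.count)

    def normalize(self, reward):
        """Update the statistics, then return the standardized reward."""
        self.update(reward)
        return (reward - self.mean) / (self.std + self.epsilon)


normalizer = RunningRewardNormalizer()
scaled = [normalizer.normalize(r) for r in [1.0, 2.0, 3.0, 4.0, 5.0]]
```

In a training loop, each raw reward would pass through `normalize` before reaching the agent. Stable-Baselines3 also provides a `VecNormalize` wrapper with a `norm_reward` option that serves a similar purpose for vectorized environments.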
Step 5: Validate – Run 100 episodes and measure:

Core Components
Traditional automated penetration testing tools follow static, rule-based decision trees (e.g., Metasploit, OpenVAS). While efficient for known vulnerabilities, they fail to adapt to dynamic, multi-stage attack surfaces. This article introduces AutoPentest-DRL, a novel framework that models the penetration testing process as a Markov Decision Process (MDP) and optimizes attack paths using Deep Q-Networks (DQN) and Proximal Policy Optimization (PPO).
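To make the MDP framing concrete, here is a minimal sketch that uses tabular Q-learning on a toy three-state network. It is a deliberately simpler stand-in for the DQN and PPO agents named above, not the framework's implementation. The states, action names, rewards, and hyperparameters are all hypothetical.

```python
import random

# Hypothetical toy MDP: state 0 = external foothold, 1 = web server owned,
# 2 = database server owned (goal).
ACTIONS = ["scan", "exploit_web", "exploit_db"]
GOAL = 2


def step(state, action):
    """Deterministic toy transition: returns (next_state, reward, done)."""
    next_state = state
    if state == 0 and ACTIONS[action] == "exploit_web":
        next_state = 1
    elif state == 1 and ACTIONS[action] == "exploit_db":
        next_state = 2
    done = next_state == GOAL
    reward = -1.0 + (10.0 if done else 0.0)  # step cost plus goal bonus
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Epsilon-greedy tabular Q-learning; returns the learned Q-table."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(3)]
    for _ in range(episodes):
        state = 0
        for _step in range(20):  # cap episode length
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))
            else:
                action = max(range(len(ACTIONS)), key=lambda a: q[state][a])
            next_state, reward, done = step(state, action)
            # Bellman target: no bootstrapping from terminal states.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_path(q, max_steps=10):
    """Follow the highest-valued action from the start state."""
    state, path = 0, []
    for _ in range(max_steps):
        action = max(range(len(ACTIONS)), key=lambda a: q[state][a])
        path.append(ACTIONS[action])
        state, _reward, done = step(state, action)
        if done:
            break
    return path


q_table = train()
print(greedy_path(q_table))
```

A DQN replaces the table with a neural network that estimates the same action values, which is what makes larger state spaces tractable.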
The framework consists of four core modules:
[Reconnaissance] → [Attack Planner (DRL Agent)] → [Exploit Executor] → [State Tracker]
                    ↑                                                          |
                    └─────────────────── Reward Signal ────────────────────────┘
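The control loop in the diagram can be sketched roughly as follows. Every function here is a hypothetical stub: the host names, the fixed planning rule, and the reward values are placeholders for illustration, and a real planner would be a trained DRL policy that uses the reward to update itself.

```python
from dataclasses import dataclass, field


@dataclass
class NetworkState:
    """State Tracker data: hosts discovered so far and hosts compromised."""
    discovered: set = field(default_factory=set)
    compromised: set = field(default_factory=set)


def reconnaissance(state):
    """Reconnaissance stub: pretend a scan found two hosts."""
    state.discovered.update({"web", "db"})


def plan_action(state):
    """Attack Planner stub: a fixed rule stands in for the DRL policy."""
    remaining = sorted(state.discovered - state.compromised)
    return remaining[0] if remaining else None


def execute_exploit(target):
    """Exploit Executor stub: always reports success in this sketch."""
    return True


def track_and_reward(state, target, success):
    """Update the tracked state and compute the reward signal."""
    if success:
        state.compromised.add(target)
        return 1.0
    return -0.1


def run_episode(max_steps=10):
    """One pass through the loop: recon, then plan/execute/track until done."""
    state = NetworkState()
    reconnaissance(state)
    total_reward = 0.0
    for _ in range(max_steps):
        target = plan_action(state)
        if target is None:  # nothing left to attack
            break
        success = execute_exploit(target)
        # In a learning agent, this reward would drive the policy update.
        total_reward += track_and_reward(state, target, success)
    return state, total_reward


final_state, total = run_episode()
```

Keeping the four modules as separate functions mirrors the diagram and makes it straightforward to swap a stub for a real scanner, policy, or exploit backend.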