PROMPTFLUX is an experimental VBScript dropper that issues runtime POST requests to the Gemini API (specifying gemini-1.5-flash-latest) using an embedded API credential. It requests code-only VBScript obfuscations, logs the model responses locally, and is built to persist regenerated scripts to the Windows Startup folder and copy itself to removable media and mapped network shares.
According to pentesting services experts, observed samples are implemented as Visual Basic Script (VBScript) droppers. Notable on-disk artifacts include a log file capturing raw LLM responses at %TEMP%\thinking_robot_log.txt. Filenames and decoy installers in the samples (for example, crypted_ScreenRec_webinstall) indicate social-engineering lures embedded in the dropper.
LLM integration and network behaviour
The malware performs HTTP POST requests from the infected host to a vendor LLM endpoint and targets the gemini-1.5-flash-latest model tag, which resolves to the most recent model release. Samples contain an embedded API key used for authentication when calling the Gemini endpoint. The observable telemetry produced by the sample therefore includes outbound requests to LLM APIs and local logs of the AI responses.
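For defenders, the most durable network observable here is the destination host itself. Below is a minimal hunting sketch, assuming a hypothetical space-delimited proxy log with the destination host in the third field (adjust the field index to your proxy's actual schema); the hostname is the documented Gemini REST endpoint:

```python
# Hunt proxy logs for outbound calls to the Gemini API endpoint.
# Assumes a hypothetical space-delimited log: timestamp src_ip dest_host url ...
# Adjust field positions to match your proxy's actual schema.
import sys

SUSPECT_HOSTS = {
    "generativelanguage.googleapis.com",  # documented Gemini REST endpoint
}

def hunt(log_path: str) -> None:
    with open(log_path, encoding="utf-8", errors="replace") as f:
        for lineno, line in enumerate(f, start=1):
            fields = line.split()
            if len(fields) < 3:
                continue
            dest_host = fields[2].lower()
            if dest_host in SUSPECT_HOSTS:
                print(f"line {lineno}: {line.rstrip()}")

if __name__ == "__main__":
    hunt(sys.argv[1])
```

Because legitimate applications also call this endpoint, hits should be corroborated with process telemetry; a scripting host such as wscript.exe originating the request is far more suspicious than a browser.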
Prompting mechanics and expected outputs
Prompts are constructed to be machine-parsable and to force the model to output code only, with no narrative or explanatory text. At least one observed variant frames the model as an “expert VBScript obfuscator” and requests full-source rewrites on a periodic cadence. The prompt design explicitly instructs the model to preserve the decoy payload, the embedded API credential, and the self-regeneration logic in the regenerated code.
Self-modification lifecycle (observed flow)
Reported execution flow in analyzed samples is: (1) the infected host issues a POST with a machine-parsable prompt, (2) the model returns VBScript code per the prompt, (3) the sample writes the returned code to disk (targeting persistence locations such as the Windows Startup folder), and (4) the runtime can execute or replace the running payload with the regenerated script. Some samples included a function named AttemptToUpdateSelf (commented out in the analyzed artifacts), and at least one variant requested hourly full-source rewrites to produce a metamorphic, continually changing code footprint.
Persistence and propagation vectors
Regenerated and obfuscated VBScript files were written to Startup locations to ensure execution at user login. The samples attempted propagation by copying the dropper to removable media (USB) and to mapped network shares; filenames in sample deployments used social-engineering conventions consistent with broad distribution attempts.
Logging and observable artifacts
Samples explicitly wrote the raw LLM responses to disk (example: %TEMP%\thinking_robot_log.txt), creating a persistent record of the AI outputs. Outbound HTTP(S) traffic to vendor LLM endpoints, new or changed VBScript files in Startup, copies on removable media, and the presence of LLM response logs are described as primary observables in the GTIG analysis.
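These indicators are specific enough to hunt for directly on a suspect host. The following triage sketch checks for the reported %TEMP% log and enumerates VBScript files in the per-user Startup folder; the log filename comes from the reporting above, while the rest of the script is illustrative:

```python
# Triage a Windows host for PROMPTFLUX-style artifacts:
# the reported LLM-response log and VBScript files written to Startup.
import os
import time
from pathlib import Path

def check_artifacts() -> None:
    temp = Path(os.environ.get("TEMP", r"C:\Windows\Temp"))
    log = temp / "thinking_robot_log.txt"
    if log.exists():
        print(f"[!] LLM response log present: {log} ({log.stat().st_size} bytes)")

    startup = (
        Path(os.environ["APPDATA"])
        / "Microsoft" / "Windows" / "Start Menu" / "Programs" / "Startup"
    )
    for script in startup.glob("*.vbs"):
        mtime = time.strftime(
            "%Y-%m-%d %H:%M:%S", time.localtime(script.stat().st_mtime)
        )
        print(f"[!] Startup VBScript: {script} (modified {mtime})")

if __name__ == "__main__":
    check_artifacts()
```

Any hit should be treated as a lead for full forensic review rather than a verdict, since administrators and legitimate installers also place scripts in Startup.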
Related LLM-assisted families documented by GTIG
GTIG’s reporting enumerates additional LLM-assisted artefacts and families with distinct behaviours:
- PROMPTSTEAL (LAMEHUG): Python/PyInstaller data miner that queries a Qwen2.5-Coder instruct model (via Hugging Face) to generate one-line Windows commands for data collection; prompts request command-only outputs.
- FRUITSHELL: A PowerShell reverse shell that embeds hard-coded prompts apparently intended to bypass detection by LLM-based security analysis.
- PROMPTLOCK: Proof-of-concept cross-platform ransomware (Go) reported to use an LLM to generate and execute Lua scripts at runtime.
- QUIETVAULT: JavaScript credential stealer that leverages on-host AI tooling and prompts to locate and exfiltrate secrets.
Operational context and conclusions
GTIG characterizes PROMPTFLUX as experimental in the samples they analyzed; the public report did not document confirmed large-scale compromises associated with the observed artifacts. The analysis notes commented-out self-update code in samples and highlights that the prompt designs assume reliable, AV-evasive output from the model—an assumption that the report and external commentators caution is not guaranteed. GTIG also reported disabling assets/accounts tied to observed activity and using the findings to tighten model protections.
AI Tools in Underground Forums and Capability Matrix
| Tool / Capability | Deepfake & Image Generation | Malware Development | Phishing | Research & Reconnaissance | Technical Support & Code Generation | Vulnerability Exploitation |
|---|---|---|---|---|---|---|
| DarkDev | ||||||
| EvilAI | ✔︎ | |||||
| FraudGPT | ✔︎ | ✔︎ | ✔︎ | ✔︎ | ||
| LoopGPT | ✔︎ | ✔︎ | ||||
| MalwareGPT | ✔︎ | ✔︎ | ✔︎ | |||
| NYTHEON AI | ✔︎ | ✔︎ | ||||
| SpamGPT | ✔︎ | |||||
| SpamirMailer Bot | ✔︎ | |||||
| WormGPT | ✔︎ | ✔︎ | ✔︎ | ✔︎ | ✔︎ | ✔︎ |
| Xanthorox | ✔︎ | ✔︎ | ✔︎ | ✔︎ |
FraudGPT
What it is (high level): an alleged “black-hat” chatbot marketed to help criminals craft fraud: spear-phishing copy, spoof pages, social-engineering scripts and reconnaissance summaries. It is notable because it was marketed as operating without safety constraints. Risk: scaled, highly persuasive phishing and fraud content.
Defensive guidance: tighten email filtering and BEC rules, increase multi-factor authentication (MFA) resilience (phishing-resistant MFA), apply targeted anti-phishing training and simulate AI-style phishing templates in red-team exercises. Instrument logs for unusual mass-email patterns and outbound credential harvesting indicators.
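As a concrete example of the log instrumentation suggested above, here is a minimal burst detector over an exported mail log; it assumes a hypothetical CSV with timestamp and sender columns, and the threshold is illustrative and should be tuned against your own baseline:

```python
# Flag senders whose per-hour message volume exceeds a threshold,
# a crude signal for AI-generated mass-phishing or fraud bursts.
import csv
import sys
from collections import Counter
from datetime import datetime

THRESHOLD = 50  # messages per sender per hour; illustrative, tune to baseline

def flag_bursts(csv_path: str) -> None:
    counts = Counter()
    with open(csv_path, newline="", encoding="utf-8") as f:
        # Expects "timestamp" (ISO 8601) and "sender" columns (hypothetical schema).
        for row in csv.DictReader(f):
            ts = datetime.fromisoformat(row["timestamp"])
            counts[(row["sender"], ts.strftime("%Y-%m-%d %H"))] += 1
    for (sender, hour), n in sorted(counts.items()):
        if n > THRESHOLD:
            print(f"[!] {sender}: {n} messages in hour {hour}")

if __name__ == "__main__":
    flag_bursts(sys.argv[1])
```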
WormGPT
What it is (high level): a generative AI service advertised for crafting phishing and BEC campaigns and other social-engineering content; researchers who tested it produced convincing malicious emails. Risk: improves the scale and quality of BEC and targeted phishing.
Defensive guidance: deploy robust DMARC/DKIM/SPF posture, escalate anomalies from HR/finance mailflows, use behavioral email analysis (look at sending patterns, reply-chain anomalies) and block known attacker infrastructure. Prioritize kill-chain controls (least privilege for payment processes, multi-party approvals).
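A basic posture check on SPF and DMARC can be scripted. The sketch below uses the dnspython package to confirm that the records exist; record presence alone is not a full assessment, since policy strings (for example, p=reject in DMARC) still need review:

```python
# Verify that a domain publishes SPF and DMARC TXT records.
# Requires dnspython: pip install dnspython
import dns.resolver

def txt_records(name: str) -> list[str]:
    try:
        return [r.to_text().strip('"') for r in dns.resolver.resolve(name, "TXT")]
    except dns.exception.DNSException:
        return []

def check_domain(domain: str) -> None:
    spf = [r for r in txt_records(domain) if r.startswith("v=spf1")]
    dmarc = [r for r in txt_records(f"_dmarc.{domain}") if r.startswith("v=DMARC1")]
    print(f"SPF:   {'present' if spf else 'MISSING'}")
    print(f"DMARC: {'present' if dmarc else 'MISSING'}")

if __name__ == "__main__":
    check_domain("example.com")  # illustrative domain
```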
MalwareGPT / Malware-style GPTs (generic)
What they claim (high level): marketed as LLMs that can assist with malware development, obfuscation ideas or exploit discovery. Many claims are unverified and some vendors scam buyers; still, the concept is concerning because it tries to lower the skill barrier to creating harmful code.
Defensive guidance: focus on behavior-based detection (endpoint telemetry, EDR behavioral rules) rather than signatures alone; harden development pipelines, restrict code execution, require signed build artifacts, and monitor for suspicious outbound connections from dev/test environments.
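One lightweight way to approximate the outbound-connection monitoring suggested above is a periodic sweep of live connections against an allowlist. The sketch below uses the psutil package; the watched process names and allowed ports are illustrative, and production-grade monitoring belongs in EDR or netflow tooling:

```python
# Sweep established outbound connections and flag watched processes talking
# to destinations outside an allowlist. Requires psutil: pip install psutil
import psutil

ALLOWED_REMOTE_PORTS = {443, 80, 22}  # illustrative allowlist
WATCHED_PROCS = {"python.exe", "powershell.exe", "wscript.exe", "node.exe"}

def sweep() -> None:
    for conn in psutil.net_connections(kind="inet"):
        if conn.status != psutil.CONN_ESTABLISHED or not conn.raddr or conn.pid is None:
            continue
        try:
            name = psutil.Process(conn.pid).name()
        except psutil.NoSuchProcess:
            continue
        if name.lower() in WATCHED_PROCS and conn.raddr.port not in ALLOWED_REMOTE_PORTS:
            print(f"[!] {name} (pid {conn.pid}) -> {conn.raddr.ip}:{conn.raddr.port}")

if __name__ == "__main__":
    sweep()
```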
EvilAI
What it is (high level): observed in analyst telemetry as a campaign that disguises malware as legitimate “AI tools” or apps; the operators use convincing UI and social engineering to get users to install trojans. Risk: supply-chain or user-installed backdoors disguised as helpful software.
Defensive guidance: enforce application allow-lists, require code signing for internal software, augment user awareness for “too good to be true” tools, and scan incoming binary artifacts with multiple AV/EDR engines. Monitor telemetry for new persistent services and anomalous process trees.
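The code-signing requirement can be spot-checked from a script. The sketch below shells out to PowerShell's Get-AuthenticodeSignature cmdlet to audit executables in a folder (Windows-only; the target directory is illustrative):

```python
# Spot-check Authenticode signatures on executables by invoking PowerShell's
# Get-AuthenticodeSignature (Windows only). Unsigned or invalid results
# warrant quarantine and deeper inspection.
import subprocess
from pathlib import Path

def signature_status(exe: Path) -> str:
    cmd = [
        "powershell", "-NoProfile", "-Command",
        f"(Get-AuthenticodeSignature -FilePath '{exe}').Status",
    ]
    return subprocess.run(cmd, capture_output=True, text=True).stdout.strip()

def audit(folder: Path) -> None:
    for exe in folder.glob("*.exe"):
        status = signature_status(exe)
        marker = "ok " if status == "Valid" else "[!]"
        print(f"{marker} {exe.name}: {status}")

if __name__ == "__main__":
    audit(Path.home() / "Downloads")  # illustrative target directory
```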
DarkDev
What it is (high level): a label commonly used on underground forums to group scripts, toolkits and services for dark-web development (could include payload generators, exploit wrappers and marketplace listings). Exact capabilities vary by vendor/post.
Defensive guidance: monitor dark-web chatter for targeted mentions of your organization (threat intel feeds), prioritize patching of the specific CVEs observed in those threads, and validate third-party code before deployment.
LoopGPT
What it is (high level): an emergent name appearing in forum lists of illicit tools; generally positioned as a generative assistant for repetitive fraud tasks (message templates, automation scripts). Public detail is sparse and reliability variable.
Defensive guidance: treat as a content-generation risk vector; expand phishing simulations to include templated and multi-step scams that mimic automation-style messaging.
NYTHEON AI
What it is (high level): another forum-marketed project name — often one of several branded AI services sold in criminal communities. Specific capabilities and authenticity vary; many such brands are transient.
Defensive guidance: integrate dark-web monitoring into threat intel, and prioritize generic mitigations (MFA, email auth, EDR, network segmentation).
SpamGPT / SpamirMailer Bot
What they are (high level): advertised as tools to generate large volumes of targeted spam/phishing content and to manage mail-drop/irmailer infrastructure. Risk: highly automated, personalized spam that evades simple filters.
Defensive guidance: strengthen inbound mail throttling and reputation-based blocking, use real-time blacklists, and implement anomaly detection for high-volume or engineered message bursts.
Xanthorox
What it is (high level): a name seen listed on forum inventories — likely a niche toolkit or moniker for a malware/automation pack. Details are sparse in public reporting; these names can be either active tools or pump-and-dump scams.
Defensive guidance: focus on the observable outcomes (malicious binaries, callbacks, C2 patterns) rather than the brand name; tune EDR/XDR for behavior-based detection.
PROMPTFLUX represents an early but noteworthy development in the integration of large language models directly into malware execution workflows. Although the analyzed samples are classified as experimental and no widespread operational impact has been confirmed, the dropper’s design—particularly its ability to request regenerated, code-only VBScript from a model while preserving embedded logic—demonstrates a shift toward runtime code modification driven by LLM services. In this model, obfuscation is no longer a static, pre-packaged phase but an on-demand, externally sourced function.
The implementation shows a clear intent to produce continuous structural variability, complicating traditional static detection approaches, while still leveraging long-established persistence and propagation vectors such as Windows Startup, removable media, and mapped network shares. While the stability and practical efficacy of the regenerated code depend on the model’s responses and the reliability of the prompting routine, PROMPTFLUX represents one of the first documented patterns of malware that incorporates a generative model into its ongoing operational cycle.
In summary, PROMPTFLUX should not be interpreted as a fully mature or widely deployed threat, but rather as a directional signal: certain malware developers are now engineering tooling around generative models as dynamic obfuscation and self-regeneration engines, indicating a broader strategic shift toward LLM-assisted malware architectures.

Information security specialist, currently working as risk infrastructure specialist & investigator.
15 years of experience in risk and control process, security audit support, business continuity design and support, workgroup management and information security standards.