What is XML?

XML explained: element structure, attributes, well-formed vs valid, XML vs JSON, and where XML is still used today.

5 min read · Updated June 2026

XML (eXtensible Markup Language) is a text format for storing and transporting structured data. Unlike HTML, which defines a fixed set of tags, XML lets you define your own tags — hence "extensible". It was standardized by the W3C in 1998 and was the dominant data interchange format for web services and configuration before JSON rose to prominence.

Basic structure

An XML document has a simple, consistent structure:

<?xml version="1.0" encoding="UTF-8"?>
<user id="42">
  <name>Alice Martin</name>
  <email>alice@example.com</email>
  <roles>
    <role>admin</role>
    <role>editor</role>
  </roles>
  <active>true</active>
</user>

| Concept | Description | Example |
|---|---|---|
| Declaration | Optional opening line specifying XML version and encoding | <?xml version="1.0"?> |
| Element | A named pair of opening and closing tags; the basic building block | <name>Alice</name> |
| Attribute | Key-value pair inside an opening tag | <user id="42"> |
| Root element | Every XML document must have exactly one root element | <user>...</user> |
| Self-closing tag | Empty element with no content | <br /> |
| CDATA section | Raw text that should not be parsed as markup | <![CDATA[<b>raw</b>]]> |
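
As a rough illustration of how these pieces look from a program's point of view, the sketch below reads the sample document with Python's standard-library xml.etree.ElementTree module. The choice of Python and the `user` dictionary are assumptions made for this example, not something the format prescribes.

```python
import xml.etree.ElementTree as ET

# The sample document from above, as bytes so the encoding
# declaration is honoured by the parser.
USER_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<user id="42">
  <name>Alice Martin</name>
  <email>alice@example.com</email>
  <roles>
    <role>admin</role>
    <role>editor</role>
  </roles>
  <active>true</active>
</user>"""

root = ET.fromstring(USER_XML)  # root is the single <user> element

user = {
    # Attributes are read with .get(); the value is always a string.
    "id": int(root.get("id")),
    # Child element text is read with .findtext().
    "name": root.findtext("name"),
    "email": root.findtext("email"),
    # Repeated sibling elements play the role of a list.
    "roles": [role.text for role in root.findall("roles/role")],
    # XML has no boolean type, so the text is compared explicitly.
    "active": root.findtext("active") == "true",
}

print(root.tag)
print(user)
```

Note that the conversions to `int` and `bool` are done by hand: without a schema, everything the parser returns is text.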

Well-formed vs valid XML

These two terms are often confused:

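  • Well-formed: the document follows XML syntax rules. Every tag is closed and properly nested, attribute values are quoted, there is exactly one root element, and the markup characters < and & are escaped as &lt; and &amp; wherever they appear as literal text. The entities &gt;, &quot;, and &apos; are also predefined: > must be escaped only in the sequence ]]>, and " or ' only inside an attribute value delimited by the same quote character. Any conforming XML parser can read a well-formed document.
  • Valid: the document is well-formed and also conforms to a schema (DTD or XSD) that defines which elements and attributes are allowed and in what order. Checking validity requires the schema definition and a validating parser.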

Most applications only require well-formed XML. Schemas are mainly used where sender and receiver need a formal contract for the data, as in SOAP-based enterprise integrations.
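
A minimal sketch of a well-formedness check, assuming Python and its standard-library xml.etree.ElementTree parser. The helper name `is_well_formed` and the sample strings are hypothetical. This checks syntax only; Python's standard library does not validate against a DTD or XSD, so schema validation would need a separate validating parser.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts `text` as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


good = "<note><to>Bob</to></note>"
unclosed = "<note><to>Bob</note>"        # <to> is never closed
two_roots = "<a/><b/>"                   # more than one root element
raw_amp = "<note>Fish & chips</note>"    # bare & in text content

# escape() replaces &, < and > with their entities before the text
# is placed inside an element.
escaped = f"<note>{escape('Fish & chips <fresh>')}</note>"

for label, doc in [
    ("good", good),
    ("unclosed", unclosed),
    ("two_roots", two_roots),
    ("raw_amp", raw_amp),
    ("escaped", escaped),
]:
    print(label, is_well_formed(doc))
```

Only `good` and `escaped` pass. When the escaped document is parsed, the entities are turned back into the original characters, so the application sees the literal text again.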

XML vs JSON

| Feature | XML | JSON |
|---|---|---|
| Syntax | Tag-based, verbose | Key-value, concise |
| Comments | Yes: <!-- comment --> | No |
| Attributes | Yes: metadata on elements | No: everything is a value |
| Data types | All text by default; types from schema | Native strings, numbers, booleans, null |
| Arrays | Repeated sibling elements (no explicit array syntax) | First-class [] syntax |
| Namespaces | Yes: for merging vocabularies | No |
| Human readability | Moderate (verbose) | High (concise) |
| Browser support | XPath, XSLT, DOM | Native JSON.parse() |
| Typical payload size | Larger (tag overhead) | Smaller |
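
The data-type, array, and size rows can be seen in a small side-by-side sketch. It assumes Python with the standard-library json and xml.etree.ElementTree modules; `xml_doc` and `json_doc` are made-up documents carrying the same record, not a standard mapping between the two formats.

```python
import json
import xml.etree.ElementTree as ET

# The same illustrative record in both formats.
xml_doc = (
    "<user><id>42</id><active>true</active>"
    "<role>admin</role><role>editor</role></user>"
)
json_doc = '{"id": 42, "active": true, "roles": ["admin", "editor"]}'

root = ET.fromstring(xml_doc)
from_xml = {
    "id": root.findtext("id"),          # text: the string "42"
    "active": root.findtext("active"),  # text: the string "true"
    # No array syntax: collect the repeated <role> siblings.
    "roles": [role.text for role in root.findall("role")],
}

from_json = json.loads(json_doc)        # types come from the syntax itself

print(type(from_xml["id"]).__name__, type(from_json["id"]).__name__)
print(type(from_xml["active"]).__name__, type(from_json["active"]).__name__)
print(len(xml_doc), len(json_doc))      # character counts of each payload
```

Without a schema, the XML side yields strings that the application must convert itself, while the JSON parser returns an integer, a boolean, and a list directly.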

Where XML is still used

Despite JSON dominating REST APIs, XML is still widely used in:

  • SOAP web services — enterprise systems, banking, ERP (SAP, Oracle) still rely on SOAP/XML heavily.
  • RSS and Atom feeds — blog and news syndication formats are XML-based.
  • SVG — scalable vector graphics are XML documents.
  • Android layouts — UI layouts in Android apps are defined in XML.
  • Build tools — Maven (pom.xml), Ant, and older Spring configs.
  • Microsoft Office — .docx, .xlsx, and .pptx files are ZIP archives of XML files.
  • XHTML — stricter HTML that follows XML rules.

Namespaces

When combining XML vocabularies from different sources, element names can clash. XML namespaces solve this by binding a short prefix to a URI with an xmlns attribute and then qualifying element names with that prefix:

<root
  xmlns:html="http://www.w3.org/1999/xhtml"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <html:p>A paragraph</html:p>
  <dc:title>A Dublin Core title</dc:title>
</root>

Namespaces are common in enterprise XML (SOAP envelopes, XSLT stylesheets) but rarely needed in simple XML documents.
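
A short sketch of how a parser sees the namespaced document above, assuming Python's standard-library xml.etree.ElementTree. The `NS` mapping is a lookup table defined for this example; its prefixes only have to match the URIs, not the prefixes used in the document.

```python
import xml.etree.ElementTree as ET

DOC = """<root
  xmlns:html="http://www.w3.org/1999/xhtml"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <html:p>A paragraph</html:p>
  <dc:title>A Dublin Core title</dc:title>
</root>"""

# Prefix-to-URI mapping used only for lookups in this script.
NS = {
    "html": "http://www.w3.org/1999/xhtml",
    "dc": "http://purl.org/dc/elements/1.1/",
}

root = ET.fromstring(DOC)

paragraph = root.find("html:p", NS)
title = root.find("dc:title", NS)

# ElementTree stores the expanded name: the URI in braces, then the local name.
print(paragraph.tag)
print(title.text)

# An unqualified lookup does not match the namespaced element.
print(root.find("p"))
```

The parser identifies elements by URI plus local name, so the prefix itself is just shorthand; a search for plain `p` returns `None` because that element lives in the XHTML namespace.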

For a side-by-side format comparison, see the JSON vs YAML guide — XML, JSON, and YAML all represent the same data in different ways.

Frequently asked questions