What is XML?
XML explained: element structure, attributes, well-formed vs valid, XML vs JSON, and where XML is still used today.
XML (eXtensible Markup Language) is a text format for storing and transporting structured data. Unlike HTML, which defines a fixed set of tags, XML lets you define your own tags — hence "extensible". It was standardized by the W3C in 1998 and was the dominant data interchange format for web services and configuration before JSON rose to prominence.
Basic structure
An XML document has a simple, consistent structure:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<user id="42">
  <name>Alice Martin</name>
  <email>alice@example.com</email>
  <roles>
    <role>admin</role>
    <role>editor</role>
  </roles>
  <active>true</active>
</user>
```

| Concept | Description | Example |
|---|---|---|
| Declaration | Optional first line specifying the XML version and encoding | `<?xml version="1.0"?>` |
| Element | A named opening and closing tag pair; the basic building block | `<name>Alice</name>` |
| Attribute | Key-value pair inside an opening tag | `<user id="42">` |
| Root element | The single top-level element; every XML document must have exactly one | `<user>...</user>` |
| Self-closing tag | Shorthand for an empty element with no content | `<br />` |
| CDATA section | Raw text that should not be parsed as markup | `<![CDATA[<b>raw</b>]]>` |
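
To make these concepts concrete, the sketch below parses the `<user>` example with Python's standard-library `xml.etree.ElementTree` module. The choice of parser is only illustrative, because any XML parser exposes the same tree of elements and attributes. Note that both the `id` attribute and the `<active>` value come back as plain strings, since XML itself has no data types.

```python
import xml.etree.ElementTree as ET

# The <user> document from the example above. It is passed as bytes so the
# parser reads the encoding from the XML declaration. The declaration must be
# the very first thing in the input, with no leading whitespace.
USER_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<user id="42">
  <name>Alice Martin</name>
  <email>alice@example.com</email>
  <roles>
    <role>admin</role>
    <role>editor</role>
  </roles>
  <active>true</active>
</user>"""

root = ET.fromstring(USER_XML)                        # the single root element, <user>
user_id = root.get("id")                              # attribute value, a string: "42"
name = root.findtext("name")                          # text content of a child element
roles = [r.text for r in root.findall("roles/role")]  # repeated sibling elements
active = root.findtext("active")                      # still the string "true"

print(root.tag, user_id, name, roles, active)
# prints: user 42 Alice Martin ['admin', 'editor'] true
```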
Well-formed vs valid XML
These two terms are often confused:
- Well-formed: the document follows XML syntax rules. Every element is closed and properly nested, attribute values are quoted, there is exactly one root element, and markup characters in text are escaped as entities. `<` and `&` must always be written as `&lt;` and `&amp;`. XML also predefines `&gt;`, `&quot;`, and `&apos;` for `>`, `"`, and `'`. Any XML parser can read a well-formed document.
- Valid: the document is well-formed and also conforms to a schema (DTD or XSD) that defines which elements and attributes are allowed and in what order. Checking validity requires a schema definition and a validating parser.
Most applications only require well-formed XML. Schemas are used in enterprise integrations like SOAP.
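The following sketch shows the difference between well-formed and malformed input in practice. It uses Python's standard-library `xml.etree.ElementTree` and the `escape` function from `xml.sax.saxutils`. The helper name `is_well_formed` is our own. The function only tests well-formedness, because ElementTree does not validate against a DTD or XSD.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

raw = "Tom & Jerry"
print(is_well_formed("<note>" + raw + "</note>"))                # False: unescaped '&'
print(is_well_formed("<note>" + escape(raw) + "</note>"))        # True: '&' became '&amp;'
print(is_well_formed("<note><![CDATA[" + raw + "]]></note>"))    # True: CDATA is not parsed as markup
print(is_well_formed("<a><b></a></b>"))                          # False: improper nesting
print(is_well_formed("<a/><b/>"))                                # False: two root elements
```

Validating against a schema needs a validating parser. One option is the third-party lxml package, which provides `lxml.etree.XMLSchema` for XSD validation.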
XML vs JSON
| | XML | JSON |
|---|---|---|
| Syntax | Tag-based, verbose | Key-value, concise |
| Comments | Yes: `<!-- comment -->` | No |
| Attributes | Yes: metadata on elements | No: everything is a value |
| Data types | All text by default; types come from a schema | Native strings, numbers, booleans, null |
| Arrays | Repeated sibling elements (no explicit array syntax) | First-class `[]` syntax |
| Namespaces | Yes: for merging vocabularies | No |
| Human readability | Moderate (verbose) | High (concise) |
| Browser support | DOM parsing (`DOMParser`), XPath, XSLT | Native `JSON.parse()` |
| Typical payload size | Larger (tag overhead) | Smaller |
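
The "Data types" and "Arrays" rows have a practical consequence. Converting XML to JSON requires mapping rules that the XML alone does not supply. The sketch below uses Python's standard library and a hypothetical `user_to_dict` helper. Its rules are assumptions made for this example: `id` becomes a number, `<roles>` becomes a list, and the text `"true"` becomes a boolean.

```python
import json
import xml.etree.ElementTree as ET

USER_XML = """<user id="42">
  <name>Alice Martin</name>
  <email>alice@example.com</email>
  <roles><role>admin</role><role>editor</role></roles>
  <active>true</active>
</user>"""

def user_to_dict(xml_text: str) -> dict:
    """Map the example <user> element to a JSON-ready dict.

    Hypothetical mapping rules: the id attribute becomes an int, <roles>
    becomes a list, and "true" becomes a bool. XML itself does not say which
    elements are arrays or what type a value has.
    """
    root = ET.fromstring(xml_text)
    return {
        "id": int(root.get("id")),
        "name": root.findtext("name"),
        "email": root.findtext("email"),
        "roles": [r.text for r in root.findall("roles/role")],
        "active": root.findtext("active") == "true",
    }

print(json.dumps(user_to_dict(USER_XML)))
# prints: {"id": 42, "name": "Alice Martin", "email": "alice@example.com", "roles": ["admin", "editor"], "active": true}
```

A generic converter cannot tell that a single `<role>` child should still become a one-element array. For this reason, XML-to-JSON conversion usually needs a schema or explicit rules like the ones above.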
Where XML is still used
Despite JSON dominating REST APIs, XML is still widely used in:
- SOAP web services: enterprise systems, banking, and ERP platforms (SAP, Oracle) still rely heavily on SOAP/XML.
- RSS and Atom feeds: blog and news syndication formats are XML-based.
- SVG: scalable vector graphics are XML documents.
- Android layouts: UI layouts in Android apps are defined in XML.
- Build tools: Maven (`pom.xml`), Ant, and older Spring configurations.
- Microsoft Office: .docx, .xlsx, and .pptx files are ZIP archives of XML files.
- XHTML: a stricter form of HTML that follows XML rules.
Namespaces
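When combining XML vocabularies from different sources, element names can clash. XML namespaces solve this by qualifying element names with a namespace URI. Each URI is bound to a short prefix through an `xmlns:prefix` attribute, and that prefix is then written before the element name: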
```xml
<root
    xmlns:html="http://www.w3.org/1999/xhtml"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <html:p>A paragraph</html:p>
  <dc:title>A Dublin Core title</dc:title>
</root>
```

Namespaces are common in enterprise XML (SOAP envelopes, XSLT stylesheets) but rarely needed in simple XML documents.
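In a parser, the element is identified by its namespace URI, not by its prefix. Python's standard-library ElementTree, used here purely as an illustration, makes this visible by rewriting each prefixed name as `{URI}localname`. The prefix `d` in the lookup map is an arbitrary choice for this sketch.

```python
import xml.etree.ElementTree as ET

DOC = """<root
    xmlns:html="http://www.w3.org/1999/xhtml"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <html:p>A paragraph</html:p>
  <dc:title>A Dublin Core title</dc:title>
</root>"""

root = ET.fromstring(DOC)

# Each prefix is replaced by its URI, so the prefix is not part of the parsed name.
print([child.tag for child in root])
# prints: ['{http://www.w3.org/1999/xhtml}p', '{http://purl.org/dc/elements/1.1/}title']

# Lookups use a prefix -> URI map; the prefix here need not match the document's.
ns = {"d": "http://purl.org/dc/elements/1.1/"}
print(root.findtext("d:title", namespaces=ns))
# prints: A Dublin Core title
```

Matching happens by URI. As a result, code that reads namespaced XML keeps working even if a producer chooses different prefixes.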
For a side-by-side format comparison, see the JSON vs YAML guide — XML, JSON, and YAML all represent the same data in different ways.