Namespaces
- XPath `/root/child` fails if the document has a default namespace; use `//*[local-name()='child']` or register a prefix
- Default namespace (`xmlns="..."`) applies to elements, not attributes; attributes need an explicit prefix
- Namespace prefix is arbitrary: `<foo:element>` and `<bar:element>` are identical if both prefixes map to the same URI
- Child elements don't inherit a parent's prefixed namespace; each must declare or use the prefix explicitly
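A minimal sketch of the default-namespace pitfall, using Python's standard `xml.etree.ElementTree`. The document, the URI `http://example.com/ns`, and the prefixes `ex` and `zz` are placeholders invented for illustration. ElementTree has no `local-name()`, so the `{*}` wildcard (Python 3.8+) stands in for it here.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the namespace URI is a placeholder.
DOC = '<root xmlns="http://example.com/ns"><child>hello</child></root>'
root = ET.fromstring(DOC)

# An unprefixed path looks for "child" in no namespace, so nothing matches.
unqualified = root.find("child")

# Binding any prefix to the document's URI makes the path match.
qualified = root.find("ex:child", {"ex": "http://example.com/ns"})

# The prefix name is arbitrary; only the URI it maps to matters.
other_prefix = root.find("zz:child", {"zz": "http://example.com/ns"})

# Namespace wildcard, roughly what local-name()='child' does in full XPath.
any_namespace = root.find("{*}child")

print(unqualified, qualified.text, other_prefix is qualified)
```

Registering a prefix is usually the better choice when the namespace is known, because the wildcard also matches same-named elements from unrelated vocabularies.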
Encoding
- `<?xml version="1.0" encoding="UTF-8"?>` must match the actual file encoding; a mismatch corrupts non-ASCII characters
- XML declaration (which carries the encoding) must be the first thing in the file: no whitespace or comments before it; only a byte order mark may precede it
- Default encoding is UTF-8 if the declaration is omitted and no UTF-16 BOM is present; explicit is still safer across parsers
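The encoding rules above can be checked with a short sketch, again assuming Python's standard `xml.etree.ElementTree` (backed by expat). The sample document and the helper `parse_or_error` are made up for illustration; other parsers may report these failures differently or silently produce mojibake instead of raising.

```python
import codecs
import xml.etree.ElementTree as ET

# Sample document that declares UTF-8 and contains one non-ASCII character.
DECLARED_UTF8 = '<?xml version="1.0" encoding="UTF-8"?><name>caf\u00e9</name>'


def parse_or_error(data):
    """Hypothetical helper: return the root text, or a short error string."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError as exc:
        return f"ParseError: {exc}"


# Declaration and bytes agree.
good = parse_or_error(DECLARED_UTF8.encode("utf-8"))

# Bytes are Latin-1 while the declaration still claims UTF-8.
mismatch = parse_or_error(DECLARED_UTF8.encode("latin-1"))

# A single space before the declaration is already an error.
leading_space = parse_or_error(b" " + DECLARED_UTF8.encode("utf-8"))

# A UTF-8 byte order mark before the declaration is accepted.
with_bom = parse_or_error(codecs.BOM_UTF8 + DECLARED_UTF8.encode("utf-8"))

for label, result in [("good", good), ("mismatch", mismatch),
                      ("leading_space", leading_space), ("with_bom", with_bom)]:
    print(label, "->", result)
```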
Escaping & CDATA
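- Five predefined entities: `&amp;` `&lt;` `&gt;` `&quot;` `&apos;`; in text, `&` and `<` must always be escaped
- CDATA sections `<![CDATA[...]]>` for blocks with many special chars; but `]]>` inside CDATA ends the section early
- Attribute values: use `&quot;` if delimited by `"`, or `&apos;` if delimited by `'`
- Numeric character references `&#60;` and `&#x3C;` work in text and attribute values (not inside CDATA); useful for edge cases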
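A sketch of the escaping rules using `xml.sax.saxutils` from the Python standard library. The `cdata` helper is hypothetical (it is not a library function); it shows one common way to handle `]]>` by splitting the content across two CDATA sections.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Text content: escape() handles &, < and > (quotes are left alone).
raw_text = 'if a < b && c > "d"'
escaped_text = escape(raw_text)

# Attribute values: quoteattr() picks a delimiter and escapes as needed.
raw_attr = 'say "hi" & \'bye\''
quoted_attr = quoteattr(raw_attr)


def cdata(s):
    """Hypothetical helper: wrap s in CDATA, splitting any ']]>' terminator."""
    return "<![CDATA[" + s.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# Round-trip everything through a parser to confirm nothing was lost.
doc = f"<e a={quoted_attr}>{escaped_text}</e>"
parsed = ET.fromstring(doc)
cdata_roundtrip = ET.fromstring("<e>" + cdata("a ]]> b") + "</e>").text

# Decimal and hexadecimal character references for '<'.
numeric = ET.fromstring("<e>&#60;&#x3C;</e>").text

print(escaped_text)
print(parsed.get("a") == raw_attr, parsed.text == raw_text)
print(cdata_roundtrip, numeric)
```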
Whitespace
- Whitespace between elements is preserved by default; pretty-printing adds text nodes that may break processing
- `xml:space="preserve"` attribute signals whitespace significance; but not all parsers respect it
- Normalize-space in XPath: `normalize-space(text())` trims leading and trailing whitespace and collapses internal runs
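To illustrate how pretty-printing changes the node tree, here is a small sketch with `xml.dom.minidom` from the Python standard library. The two sample documents are invented, and `normalize_space` is a hypothetical plain-Python stand-in for the XPath function of the same name, not a library call.

```python
from xml.dom import minidom

COMPACT = "<list><item>a</item><item>b</item></list>"
PRETTY = "<list>\n  <item>a</item>\n  <item>b</item>\n</list>"


def child_count(xml_text):
    """Number of direct child nodes of the root, whitespace text included."""
    return len(minidom.parseString(xml_text).documentElement.childNodes)


def normalize_space(s):
    """Hypothetical equivalent of XPath normalize-space(): trim and collapse."""
    return " ".join(s.split())


compact_children = child_count(COMPACT)  # only the two <item> elements
pretty_children = child_count(PRETTY)    # items plus whitespace text nodes

print(compact_children, pretty_children)
print(repr(normalize_space("  hello \n   world  ")))
```

Code that indexes children by position (for example "the second child") is the typical victim: the same index points at a whitespace text node once the file has been reformatted.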
XPath Pitfalls
- `//element` is expensive: it traverses the entire document; use specific paths when the structure is known
- Position is 1-indexed: `[1]` is first, not `[0]`
- `text()` returns direct text children only; use `string()` or `.` for concatenated descendant text
- Boolean in predicates: `[@attr]` tests existence, `[@attr='']` tests empty value; different results
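A sketch of the indexing and predicate pitfalls with `xml.etree.ElementTree`. Note the assumption: ElementTree implements only a subset of XPath 1.0 and has no `text()` or `string()`, so the direct-versus-descendant text distinction is approximated with `.text` and `itertext()`. The sample document is invented.

```python
import xml.etree.ElementTree as ET

DOC = """<items>
  <item id="a">first</item>
  <item id="">second <b>bold</b> tail</item>
  <item>third</item>
</items>"""
root = ET.fromstring(DOC)

# XPath positions start at 1; Python list indexes start at 0.
first_by_xpath = root.find("item[1]")
first_by_index = root.findall("item")[0]

# [@id] matches any item carrying the attribute, even with an empty value.
has_id = root.findall("item[@id]")

# [@id=''] matches only items whose attribute is present and empty.
empty_id = root.findall("item[@id='']")

# .text holds only the text before the first child element;
# itertext() walks all descendant text, similar to string() in XPath.
second = root.findall("item")[1]
direct_text = second.text
all_text = "".join(second.itertext())

print(first_by_xpath.text, len(has_id), len(empty_id))
print(repr(direct_text), repr(all_text))
```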
Structure
- Self-closing `<tag/>` and empty `<tag></tag>` are semantically identical; but some legacy systems choke on self-closing
- Comments cannot contain `--`; it will break the parser even inside string content
- Processing instructions `<?target data?>` cannot have `?>` in data
- Root element required; a document with only comments/PIs and no element is not well-formed
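The structural rules above can be probed with a small well-formedness check. This is a sketch built on `xml.etree.ElementTree`; `is_well_formed` is a hypothetical helper, and the results reflect the expat-based parser that ships with Python.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Hypothetical helper: True if the parser accepts the document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Self-closing and empty elements serialize identically after parsing.
same_serialization = (
    ET.tostring(ET.fromstring("<tag/>"))
    == ET.tostring(ET.fromstring("<tag></tag>"))
)

ok_comment = is_well_formed("<a><!-- fine --></a>")
double_hyphen = is_well_formed("<a><!-- not -- fine --></a>")
no_root = is_well_formed("<?pi data?><!-- only a comment -->")

print(same_serialization, ok_comment, double_hyphen, no_root)
```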
Validation
- Well-formed ≠ valid: a parser may accept the structure while the document still fails against its schema
- DTD validates but can't express complex constraints; prefer XSD or RelaxNG for new projects
- XSD namespace `xmlns:xs="http://www.w3.org/2001/XMLSchema"` is commonly confused with the instance namespace `xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"`
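A brief sketch of why the two schema-related namespaces must not be swapped, using `xml.etree.ElementTree`. The `order` element and the file name `order.xsd` are invented for illustration. The Python standard library does not validate against XSD, so this only reads the schema hint; actual validation would need a separate library.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"            # vocabulary used inside .xsd files
XSI = "http://www.w3.org/2001/XMLSchema-instance"  # attributes used in instance documents

# Hypothetical instance document pointing at a schema file.
INSTANCE = (
    '<order xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:noNamespaceSchemaLocation="order.xsd"/>'
)
root = ET.fromstring(INSTANCE)

# Looking the attribute up under the instance namespace finds it.
schema_hint = root.get(f"{{{XSI}}}noNamespaceSchemaLocation")

# Looking it up under the XSD namespace finds nothing.
wrong_namespace = root.get(f"{{{XS}}}noNamespaceSchemaLocation")

print(schema_hint, wrong_namespace)
```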