XML and XSD

To validate an XML document against an XSD schema and find the elements that do not comply with it, you can use the lxml library in Python. lxml is a Python binding for libxml2 and exposes schema validation through its etree.XMLSchema class.

Here’s a step-by-step guide to achieve this:

Step 1: Install the lxml library

If you don't have lxml installed, you can install it using pip:

pip install lxml

Step 2: Write the Python script

Below is a Python script that validates an XML file against an XSD schema and reports every element that violates the schema:

from lxml import etree

def validate_xml(xml_file, xsd_file):
    try:
        # Parse the XSD schema
        with open(xsd_file, 'rb') as f:
            schema_root = etree.XML(f.read())
        schema = etree.XMLSchema(schema_root)

        # Parse the XML file
        with open(xml_file, 'rb') as f:
            xml_tree = etree.parse(f)

        # Validate the XML against the XSD schema
        schema.assertValid(xml_tree)
        print("XML is valid against the XSD schema.")

    except etree.DocumentInvalid as e:
        print("XML is NOT valid against the XSD schema.")
        print("Validation errors:")
        print(e)

    except etree.XMLSyntaxError as e:
        print("XML syntax error:")
        print(e)

    except Exception as e:
        print("An error occurred:")
        print(e)

def find_out_of_template_tags(xml_file, xsd_file):
    try:
        # Parse the XSD schema
        with open(xsd_file, 'rb') as f:
            schema_root = etree.XML(f.read())
        schema = etree.XMLSchema(schema_root)

        # Parse the XML file
        with open(xml_file, 'rb') as f:
            xml_tree = etree.parse(f)

        # Validate the XML against the XSD schema
        if not schema.validate(xml_tree):
            print("XML is NOT valid against the XSD schema.")
            print("Tags out of template:")
            for error in schema.error_log:
                print(f"Line {error.line}: {error.message}")

    except etree.XMLSyntaxError as e:
        print("XML syntax error:")
        print(e)

    except Exception as e:
        print("An error occurred:")
        print(e)

# Example usage
xml_file = 'example.xml'
xsd_file = 'schema.xsd'

validate_xml(xml_file, xsd_file)
find_out_of_template_tags(xml_file, xsd_file)

Step 3: Explanation

  • validate_xml function: Calls schema.assertValid, which raises etree.DocumentInvalid when the XML does not conform to the schema. The exception message typically describes only the first violation.
  • find_out_of_template_tags function: Calls schema.validate, which returns False instead of raising, then walks schema.error_log to print every violation with its line number. It prints nothing when the XML is valid.
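
As a minimal sketch of the error_log approach that find_out_of_template_tags relies on, the following validates a small in-memory document instead of files. The schema, the sample document, and the collect_errors helper are illustrative assumptions and are not part of the script above.

```python
from lxml import etree

# Hypothetical schema: <root> must contain exactly one <expected_tag>.
XSD = b"""<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="expected_tag" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

# Hypothetical document that uses a tag the schema does not declare.
XML = b"""<root>
  <invalid_tag>oops</invalid_tag>
</root>"""

def collect_errors(xml_bytes, xsd_bytes):
    """Return a list of (line, message) pairs, empty if the document is valid."""
    schema = etree.XMLSchema(etree.fromstring(xsd_bytes))
    doc = etree.fromstring(xml_bytes)
    schema.validate(doc)  # returns a bool; details land in schema.error_log
    return [(entry.line, entry.message) for entry in schema.error_log]

errors = collect_errors(XML, XSD)
for line, message in errors:
    print(f"Line {line}: {message}")
```

Returning the errors as data rather than printing them inside the function makes it easier to filter or count them later.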

Step 4: Run the script

Replace 'example.xml' and 'schema.xsd' with the paths to your XML and XSD files, respectively, and run the script.

Example Output

If the XML file contains tags that are not defined in the XSD schema, find_out_of_template_tags will print something like:

XML is NOT valid against the XSD schema.
Tags out of template:
Line 10: Element 'invalid_tag': This element is not expected. Expected is (expected_tag).

This output indicates that the tag invalid_tag at line 10 is not expected according to the XSD schema.
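
If you only need the names of the unexpected tags, one option is to pull them out of the error messages with a regular expression. This is a rough sketch that assumes the message wording shown in the example output; the wording comes from libxml2 and may differ between versions, and the unexpected_tags helper and sample messages are hypothetical.

```python
import re

# Matches messages of the form shown in the example output.
# The exact wording is an assumption and may vary by libxml2 version.
UNEXPECTED = re.compile(r"Element '(?P<tag>[^']+)': This element is not expected\.")

def unexpected_tags(messages):
    """Return the tag names from 'This element is not expected' messages."""
    tags = []
    for message in messages:
        match = UNEXPECTED.search(message)
        if match:
            tags.append(match.group("tag"))
    return tags

# Illustrative messages; in practice these would come from schema.error_log.
sample = [
    "Element 'invalid_tag': This element is not expected. Expected is (expected_tag).",
    "Element 'price': 'abc' is not a valid value of the atomic type 'xs:decimal'.",
]
print(unexpected_tags(sample))
```

Other kinds of violations, such as the invalid value in the second sample message, are deliberately skipped by this pattern.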

Notes

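  • Ensure that both the XML and XSD files are well-formed XML. A malformed file raises etree.XMLSyntaxError before any validation takes place.
  • An XSD that is well-formed but is not a valid schema raises etree.XMLSchemaParseError, which the script above reports through its generic except Exception branch.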
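
To make the difference between these failure modes concrete, here is a small sketch that reports which stage failed. The classify helper, its return strings, and the in-memory samples are assumptions made for illustration.

```python
from lxml import etree

def classify(xml_bytes, xsd_bytes):
    """Report which stage fails: schema parsing, document parsing, or validation."""
    try:
        schema = etree.XMLSchema(etree.fromstring(xsd_bytes))
    except etree.XMLSyntaxError:
        return "schema is not well-formed XML"
    except etree.XMLSchemaParseError:
        return "schema is well-formed XML but not a valid XSD"
    try:
        doc = etree.fromstring(xml_bytes)
    except etree.XMLSyntaxError:
        return "document is not well-formed XML"
    return "valid" if schema.validate(doc) else "invalid"

# Hypothetical schema: a single <root> element holding a string.
GOOD_XSD = (
    b'<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
    b'<xs:element name="root" type="xs:string"/>'
    b'</xs:schema>'
)

print(classify(b"<root>hi</root>", GOOD_XSD))
print(classify(b"<other/>", GOOD_XSD))
print(classify(b"<root><a></root>", GOOD_XSD))
print(classify(b"<root/>", b"<notaschema/>"))
```

Catching the two exception types separately lets you tell the user whether to fix the schema or the document.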

This approach should help you validate your XML against an XSD schema and identify any non-compliant tags.