
Building XPath expression from XML node

When programmatically dealing with large XML (or DXL) documents, it is often useful to record which node processing stopped at, or where the "thing" you are logging was found. For XML, the simplest way to do this is with an XPath expression. The code below is from a library I wrote. It constructs an XPath expression that addresses the org.w3c.dom.Node supplied to the method.

Consider an XML document like the one below, together with the following table. The left column describes the node supplied to the method, and the right column shows the XPath it returns. Notice how the method tries to use "known" attributes (id, then name) to address a specific node, which makes the XPath more readable. If no known attribute is found, it falls back to the node's position among preceding sibling elements with the same name.

Supplied node                                                    XPath
Title node of "Harry Potter and the Chamber of Secrets"          bookstore/book[@id='2']/title[1]
Second tag node of "Harry Potter and the Prisoner of Azkaban"    bookstore/book[@id='3']/tags[1]/tag[2]
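
One caveat with the [@id='value'] form is that the attribute value is pasted between single quotes. XPath 1.0 string literals have no escape character, so an id or name containing an apostrophe produces an invalid expression. The following is a minimal sketch of a hypothetical helper (XPathLiteral.quote is not part of the original library) that avoids this. It assumes XPath 1.0, as implemented by javax.xml.xpath, and uses double quotes when possible. Otherwise it falls back to concat().

```java
// Hypothetical helper, not part of the original ElementUtil: turns an
// arbitrary attribute value into a valid XPath 1.0 string literal.
public class XPathLiteral {

    public static String quote(String value) {
        // No apostrophe: the single-quoted form shown in the table is fine.
        if (value.indexOf('\'') < 0) {
            return "'" + value + "'";
        }
        // Apostrophe but no double quote: wrap the value in double quotes.
        if (value.indexOf('"') < 0) {
            return "\"" + value + "\"";
        }
        // Both quote kinds present: split on apostrophes and rejoin the
        // pieces with concat(), emitting each apostrophe as "'".
        StringBuilder sb = new StringBuilder("concat(");
        String[] parts = value.split("'", -1);
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append(", \"'\", ");
            }
            sb.append('\'').append(parts[i]).append('\'');
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        System.out.println("[@id=" + quote("2") + "]");
        System.out.println("[@name=" + quote("Philosopher's Stone") + "]");
        System.out.println("[@name=" + quote("say \"it's\"") + "]");
    }
}
```

Wiring this into getXPath would mean building the predicate from quote(e.getAttribute("id")) instead of hard-coding the surrounding single quotes.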

Combined with a logging framework such as log4j, this makes it much easier to pinpoint and reproduce parsing issues.

Use to your heart's content...

<?xml version="1.0" encoding="iso-8859-1" ?>
<bookstore>
  <book id="1">
    <title>Harry Potter and the Philosopher's Stone</title>
    <isbn>0747532745</isbn>
    <tags>
      <tag>children</tag>
      <tag>stone</tag>
    </tags>
  </book>
  <book id="2">
    <title>Harry Potter and the Chamber of Secrets</title>
    <isbn>0747538484</isbn>
    <tags>
      <tag>children</tag>
      <tag>secrets</tag>
    </tags>
  </book>
  <book id="3">
    <title>Harry Potter and the Prisoner of Azkaban</title>
    <isbn>0747546290</isbn>
    <tags>
      <tag>children</tag>
      <tag>prisoner</tag>
    </tags>
  </book>
</bookstore>
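
To check that the expressions in the table really select the intended nodes, you can evaluate them with the standard javax.xml.xpath API. The sketch below holds a trimmed copy of the sample document (books 2 and 3 only) in a string. The class and method names are illustrative. The generated paths start with the root element name rather than "/", so they are evaluated with the Document itself as the context node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class XPathRoundTrip {

    // Trimmed copy of the bookstore sample (books 2 and 3 only).
    static final String XML =
          "<bookstore>"
        + "<book id=\"2\"><title>Harry Potter and the Chamber of Secrets</title>"
        + "<tags><tag>children</tag><tag>secrets</tag></tags></book>"
        + "<book id=\"3\"><title>Harry Potter and the Prisoner of Azkaban</title>"
        + "<tags><tag>children</tag><tag>prisoner</tag></tags></book>"
        + "</bookstore>";

    static Document parse() throws Exception {
        return DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));
    }

    // Evaluates a relative XPath with the document node as context and
    // returns the text of the selected node, or null if nothing matched.
    static String select(Document doc, String expr) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Node hit = (Node) xpath.evaluate(expr, doc, XPathConstants.NODE);
        return hit == null ? null : hit.getTextContent();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse();
        // prints "Harry Potter and the Chamber of Secrets"
        System.out.println(select(doc, "bookstore/book[@id='2']/title[1]"));
        // prints "prisoner"
        System.out.println(select(doc, "bookstore/book[@id='3']/tags[1]/tag[2]"));
    }
}
```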
/* *********************************************************************
 *                    *** DISCLAIMER ***
 * This code is covered by the Creative Commons Attribution 2.5 License 
 * (http://creativecommons.org/licenses/by/2.5/).
 * 
 * You may use this code in any way you see fit as long as you realize 
 * that the code is provided AS IS without any warranties and confers 
 * no rights whatsoever! The author cannot be held accountable for 
 * any loss, direct or indirect, caused by using the code. 
 * 
 * *********************************************************************
 */

import java.util.Stack;

import org.w3c.dom.Element;
import org.w3c.dom.Node;

/**
 * Utility class for dealing with XML DOM elements.
 * 
 * 
 * @author Mikkel Heisterberg, lekkim@lsdoc.org
 */
public class ElementUtil {
   
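   /**
    * Constructs an XPath expression to the supplied node. Only element
    * nodes contribute steps, so for a text node the expression addresses
    * its parent element. The expression is relative to the document node.
    * 
    * @param n the node to build the expression for
    * @return the XPath expression, or <code>null</code> if <code>n</code> is <code>null</code>
    */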
   public static String getXPath(Node n) {
      // abort early
      if (null == n) return null;

      // declarations
      Node parent = null;
      Stack<Node> hierarchy = new Stack<Node>();
      StringBuilder buffer = new StringBuilder();

      // push element on stack
      hierarchy.push(n);

      parent = n.getParentNode();
      while (null != parent && parent.getNodeType() != Node.DOCUMENT_NODE) {
         // push on stack
         hierarchy.push(parent);

         // get parent of parent
         parent = parent.getParentNode();
      }

      // construct xpath, popping from the root element down to n
      while (!hierarchy.isEmpty()) {
         Node node = (Node) hierarchy.pop();
         boolean handled = false;

         // only consider elements
         if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element e = (Element) node;
            // getLocalName() returns null for elements built by a parser that
            // is not namespace aware (the JAXP default), so fall back to the
            // qualified name in that case
            String name = (null != e.getLocalName()) ? e.getLocalName() : e.getNodeName();

            // is this the root element?
            if (buffer.length() == 0) {
               // root element - simply append element name
               buffer.append(name);
            } else {
               // child element - append slash and element name
               buffer.append("/");
               buffer.append(name);
               
               if (node.hasAttributes()) {
                  // see if the element has a name or id attribute
                  if (e.hasAttribute("id")) {
                     // id attribute found - use that
                     buffer.append("[@id='" + e.getAttribute("id") + "']");
                     handled = true;
                  } else if (e.hasAttribute("name")) {
                     // name attribute found - use that
                     buffer.append("[@name='" + e.getAttribute("name") + "']");
                     handled = true;
                  }
               }

               if (!handled) {
                  // no known attribute we could use - count the preceding
                  // sibling elements with the same (case-sensitive) name
                  int prev_siblings = 1;
                  Node prev_sibling = node.getPreviousSibling();
                  while (null != prev_sibling) {
                     if (prev_sibling.getNodeType() == Node.ELEMENT_NODE) {
                        String sibling_name = (null != prev_sibling.getLocalName())
                              ? prev_sibling.getLocalName() : prev_sibling.getNodeName();
                        if (sibling_name.equals(name)) {
                           prev_siblings++;
                        }
                     }
                     prev_sibling = prev_sibling.getPreviousSibling();
                  }
                  buffer.append("[" + prev_siblings + "]");
               }
            }
         }
      }

      // return buffer
      return buffer.toString();
   }
}
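
The method builds each step from Node.getLocalName(). The DOM specification says this returns null for nodes created with DOM Level 1 methods. With the JDK's built-in parser, that is what you get unless the DocumentBuilderFactory is made namespace aware, which it is not by default. The hypothetical check below parses the same tiny document both ways so the difference is easy to see. On a stock JDK, it should print bookstore for the namespace-aware case and null otherwise. With a non-namespace-aware parser, the element name has to come from getNodeName() instead.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class LocalNameCheck {

    // Parses a one-element document and returns the root's local name.
    static String rootLocalName(boolean namespaceAware) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(namespaceAware); // JAXP default is false
        Document doc = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader("<bookstore/>")));
        return doc.getDocumentElement().getLocalName();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("namespace aware:     " + rootLocalName(true));
        System.out.println("not namespace aware: " + rootLocalName(false));
    }
}
```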



Re: Building XPath expression from XML node

Great code. Here you can find a way to use XPath from LotusScript: http://www.notessidan.se/A55B53/blogg.nsf/plink/TADN-74AGNZ - Thomas
Avatar: Tommy Valand

Re: Building XPath expression from XML node

If you need server-side XSLT on URLs (?ReadViewEntries, RSS, etc.) from LS (it also works with authentication), I just posted code here.

Re: Building XPath expression from XML node

A small adaptation of the code in JavaScript (for Mozilla Firefox). Hope it's helpful.

 

Gabryz

function GetXPath(GivenNode)
{
  if (null == GivenNode) return null;

  // declarations -----------------------------------------------
  var TempParent = null;
  var Hierarchy = new Array();
  var StringBuffer = "";
  var TempNode;
  var TempSibling;
  var PrevSiblings;
  var Handled = false;
  //-------------------------------------------------------------

  // push first element on stack
  Hierarchy.push(GivenNode);

  // search for all ancestors -----------------------------------
  TempParent = GivenNode.parentNode;
  while (null != TempParent && TempParent.nodeType != Node.DOCUMENT_NODE)
  {
    Hierarchy.push(TempParent);
    TempParent = TempParent.parentNode;
  }
  //-------------------------------------------------------------

  // construct xpath
  while (Hierarchy.length != 0 && null != (TempNode = Hierarchy.pop()))
  {
    Handled = false;
    // only consider elements
    if (TempNode.nodeType == Node.ELEMENT_NODE)
    {
      // is this the root element?
      if (StringBuffer.length == 0)
      {
        // root element - simply append element name
        StringBuffer = StringBuffer + TempNode.localName;
      }
      else
      {
        // child element - append slash and element name
        StringBuffer = StringBuffer + "/" + TempNode.localName;
        if (TempNode.hasAttributes())
        {
          // see if the element has a name or id attribute
          if (TempNode.attributes.getNamedItem("id"))
          {
            StringBuffer = StringBuffer + "[@id='" + TempNode.attributes.getNamedItem("id").nodeValue + "']";
            Handled = true;
          }
          else if (TempNode.attributes.getNamedItem("name"))
          {
            StringBuffer = StringBuffer + "[@name='" + TempNode.attributes.getNamedItem("name").nodeValue + "']";
            Handled = true;
          }
        }

        if (!Handled)
        {
          // no known attribute we could use - get sibling index
          PrevSiblings = 1;
          TempSibling = TempNode.previousSibling;
          while (null != TempSibling)
          {
            if (TempSibling.nodeType == TempNode.nodeType)
            {
              if (TempSibling.localName.toLowerCase() == TempNode.localName.toLowerCase())
              {
                PrevSiblings++;
              }
            }
            TempSibling = TempSibling.previousSibling;
          }
          StringBuffer = StringBuffer + "[" + PrevSiblings + "]";
        }
      }
    }
  }
  // return buffer
  return StringBuffer;
}

