  Recursion and the XSLT Result Tree  
 
 
 
    If you have any questions about XML, XSLT or our open source programs, or would like more information on our products, please send a message to OpenSource@artige.com and we will be happy to handle your request.
 
 
 
  Available on this page:   Overview   Recursion   Result Tree   Style vs. Content
 
 
 
Overview   This article covers a number of related XSLT tasks: tasks whose input is processed multiple times, tasks that affect the same part of the generated output multiple times, and tasks where the author wants the output to be manipulated multiple times. It discusses working with multiple source XML documents and how the result tree is generated. The impetus for this article was a recurring question about the inability to access the result tree from an XSLT style sheet. As explained below, the answer is simple: once written, the result tree CANNOT be accessed again from the style sheet that generated it.
 
    Now you may ask: what does the inability to access the XSLT result tree have to do with recursion and iteration? The answer is straightforward. If you were satisfied with making a single pass when applying a style sheet to an XML document, there would be no need to access the result tree; once it was written, you would be satisfied with the result.
 
    The desire to modify the output after it has been created is directly correlated to the desire to re-process the XML document or result tree. This is why accessing the result tree needs to be discussed in the same context as recursion and iteration, which are the only vehicles that the XSLT language provides for re-processing. This article will discuss the limitations that the XSLT language places upon the style sheet designer with regard to multiple accesses of source and output.
 
 
 
Recursion   There are at least four kinds of recursion to consider in XSLT style sheet processing. Two of them are defined in XSL Transformations (XSLT) Version 1.0 [W3C Recommendation 16 November 1999] and are described below:
 
    1-  Nodes from the XML source document are selected by templates in the XSLT style sheet and processed according to the rules in each template. The XSLT language is structured so that those rules can reference additional nodes in the XML document, which in turn trigger the templates that match the newly selected nodes. This process of seeking and transforming nodes continues until the referenced nodes are exhausted and no more nodes can be found. This processing model is laid out in sections 5.1 and 5.4 of the XSLT recommendation.
 
    2-  A template can also explicitly call itself, repeatedly. This is a special case of the first method, in which the template directs the processor back to itself to collect more nodes. It usually pertains to tree processing, where nodes of the same type are nested, so that the template can work its way down to the "bottom" of the nesting. The processing terminates the same way as above, once the set of nodes that can be collected has been exhausted. A named template that calls itself through xsl:call-template has no node set to exhaust, so it must stop on a condition it tests itself, such as a counter parameter reaching zero.
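
    Both kinds can be sketched with the JAXP API (javax.xml.transform) that ships with Java, running an XSLT 1.0 style sheet held in a string. This is a minimal sketch, assuming Java 8 or later; the table-of-contents document, its element and attribute names, and the indent template are hypothetical. The section rule recurses by selecting its own child sections (kind 1), and the named indent template calls itself with a decreasing counter (kind 2).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class RecursionDemo {
    // Hypothetical source: sections nested inside sections.
    static final String XML =
        "<toc><section title='Intro'><section title='Scope'>"
        + "<section title='Terms'/></section></section>"
        + "<section title='Usage'/></toc>";

    static final String XSL =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        // Kind 1: the rule selects the child sections, which trigger this same
        // rule again; it stops at a section with no child sections to select.
        + "<xsl:template match='section'>"
        + "<xsl:call-template name='indent'>"
        + "<xsl:with-param name='n' select='count(ancestor::section)'/>"
        + "</xsl:call-template>"
        + "<xsl:value-of select='@title'/><xsl:text>&#10;</xsl:text>"
        + "<xsl:apply-templates select='section'/>"
        + "</xsl:template>"
        // Kind 2: a named template that explicitly calls itself, emitting two
        // spaces per call and stopping once $n has counted down to zero.
        + "<xsl:template name='indent'>"
        + "<xsl:param name='n'/>"
        + "<xsl:if test='$n &gt; 0'>"
        + "<xsl:text>  </xsl:text>"
        + "<xsl:call-template name='indent'>"
        + "<xsl:with-param name='n' select='$n - 1'/>"
        + "</xsl:call-template>"
        + "</xsl:if>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Runs one XSLT 1.0 transformation with the JDK's built-in processor.
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString().trim(); // drop any trailing newline
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XSL, XML));
    }
}
```

    Running main prints each title indented by two spaces per level of nesting. The section recursion ends at Terms and Usage, which have no child sections left to select, while the indent recursion ends when its counter reaches zero.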
 
    Other kinds of XSLT recursion that have been described to us fall into the following categories:
 
    3-  The declarative model of XSLT style sheet processing has been equated with a kind of recursion. This view usually comes from developers who are more comfortable with procedural languages (by far the overwhelming majority of developers) and are not familiar with declarative programming models. The XSLT processing model is driven in a serial fashion, with templates processed in succession. This is typical of procedural processing models, so by design the XSLT processing model is not recursive in nature; it just so happens that the XSLT recommendation also offers a recursive ability. In addition, the replacement approach of a declarative process may seem recursive to a procedural developer.
 
    4-  Because XSLT uses the XPath language to locate, through pattern matching, the nodes processed by the templates declared in a style sheet, some developers expect to be able to modify the output tree by selecting its nodes with the same XPath methodology. After all, the result tree is just a collection of nodes maintained in machine memory, even when the output has been declared to be HTML or text. One reason for the confusion about the status of the result tree is that the XSLT recommendation is almost silent on the topic of accessing and modifying the result tree from the current style sheet. XSLT processor writers have therefore taken this silence to mean a write-once, never-modify policy for the result tree.
 
Iteration / Repetition   When discussing iteration, the concern is whether a task can be processed multiple times. The XSLT recommendation offers only a single method for accomplishing iteration:
 
    1-  The XSLT recommendation provides repetition through the xsl:for-each instruction. Its select expression collects a node set, which presumably contains nodes that all need the same style applied. The content of the instruction is a template that tells the XSLT processor which rules to apply to each member of the node set. This is pretty straightforward: the instruction behaves much like a for-loop, except that there is no loop variable to update, and the current node's index is available from the position() function instead. The ability to collect a set of nodes using XPath makes this single instruction quite powerful, which is probably why no other repetitive instruction was included in the XSLT recommendation.
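
    A minimal xsl:for-each sketch, again run through the JDK's built-in JAXP processor; the order document and its id and total attributes are hypothetical. The select expression gathers the node set, the body is applied to each member in document order, and position() stands in for a loop counter.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ForEachDemo {
    // Hypothetical source: three orders that all get the same style.
    static final String XML =
        "<orders><order id='3' total='15'/><order id='1' total='40'/>"
        + "<order id='2' total='25'/></orders>";

    static final String XSL =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        // The select expression collects the node set; the body is the
        // template applied to each member, in document order.
        + "<xsl:for-each select='orders/order'>"
        + "<xsl:value-of select='position()'/>" // 1-based index in the node set
        + "<xsl:text>: order </xsl:text><xsl:value-of select='@id'/>"
        + "<xsl:text> = </xsl:text><xsl:value-of select='@total'/>"
        + "<xsl:text>&#10;</xsl:text>"
        + "</xsl:for-each>"
        // Variables cannot be updated inside the loop, so the total
        // comes from an XPath function instead of an accumulator.
        + "<xsl:text>sum = </xsl:text>"
        + "<xsl:value-of select='sum(orders/order/@total)'/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Runs one XSLT 1.0 transformation with the JDK's built-in processor.
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString().trim(); // drop any trailing newline
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XSL, XML));
    }
}
```

    The orders come out numbered 1 to 3 in document order, followed by their total. An xsl:sort child as the first element inside xsl:for-each would change the processing order without changing the rest of the loop.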
 
Multiple Sources   Sources are another XSLT resource that can be used in multiples. While the XSLT recommendation does not allow the result tree to be accessed or modified after it has been created, it does offer ways to access multiple XML source documents simultaneously.
 
    1-  If you want to insert XML content into the result tree as-is, without any modification, then the xsl:copy-of instruction might suffice. This no-nonsense instruction copies a node set in its entirety, from the XML source document or any other loaded document, into the result tree. It is useful when you have boilerplate that needs to be inserted into the result tree. Since the nodes come from a parsed XML document, they are well-formed by definition; the only caveat is that if the select expression yields a string or number instead of nodes, just its text value is inserted.
 
    2-  If you need XML source from a completely separate document from the one being transformed, then the XSLT document() function is of interest. It selects a node set from a separate document, or merges nodes from multiple documents if it is given a list of documents. Being a function rather than an instruction, it can be used in any XPath expression that selects nodes. It is often more efficient to store the returned node set in an xsl:variable and then reference the variable whenever the data from the external document is needed. The function can be called as often as desired, so any number of external XML documents can be referenced during the transformation.
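
    The two features combine naturally: load the external document once with document(), keep it in a variable, and copy from it with xsl:copy-of. A sketch under these assumptions: footer.xml is a hypothetical file name, and a JAXP URIResolver (the standard hook for resolving URIs passed to document()) serves it from a string, so the example needs no file on disk.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class MultipleSourcesDemo {
    // The document being transformed.
    static final String XML = "<page><title>Welcome</title></page>";

    // Hypothetical external boilerplate, served as "footer.xml" below.
    static final String FOOTER = "<footer><p>All rights reserved.</p></footer>";

    static final String XSL =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        // Load the external document once and keep its node set in a variable.
        + "<xsl:variable name='boiler' select=\"document('footer.xml')\"/>"
        + "<xsl:template match='/'>"
        + "<html><body>"
        + "<h1><xsl:value-of select='page/title'/></h1>"
        // Copy the footer element, with all of its children, verbatim.
        + "<xsl:copy-of select='$boiler/footer'/>"
        + "</body></html>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String transform() {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            // Answer document('footer.xml') from memory; returning null
            // lets the processor resolve any other URI itself.
            t.setURIResolver((href, base) -> href.endsWith("footer.xml")
                ? new StreamSource(new StringReader(FOOTER)) : null);
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
            return out.toString().trim();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform());
    }
}
```

    Everything under footer, the p element and its text, arrives in the output unchanged, right after the h1 that was calculated from the main document.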
 
    In summary, the XSLT recommendation offers recursive methods for processing XML source documents, a way to repeat some of those tasks, and ways to access multiple XML sources simultaneously, but it does not allow the result tree to be modified once it has been created. If the output of an XSLT transformation needs to be modified using XSLT, it must be reprocessed with another style sheet.
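
    Reprocessing with a second style sheet is easy to arrange from a host program. A minimal sketch, assuming Java and JAXP, with two hypothetical style sheets: the first pass writes its result tree into an in-memory DOM (DOMResult), and the second pass reads that DOM as an ordinary source (DOMSource), so it can select and change the li elements that the first pass created.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TwoPassDemo {
    static final String XML = "<items><item>one</item><item>two</item></items>";

    // Pass 1: turn each item into a list entry.
    static final String PASS1 =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='/items'><ul><xsl:apply-templates/></ul></xsl:template>"
        + "<xsl:template match='item'><li><xsl:value-of select='.'/></li></xsl:template>"
        + "</xsl:stylesheet>";

    // Pass 2: an identity transform that modifies the li elements written
    // by pass 1, which pass 1 itself could never select.
    static final String PASS2 =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match='li'>"
        + "<li class='item'><xsl:apply-templates/></li>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String run() {
        try {
            TransformerFactory tf = TransformerFactory.newInstance();
            // Pass 1 writes its result tree into an in-memory DOM ...
            DOMResult firstResult = new DOMResult();
            tf.newTransformer(new StreamSource(new StringReader(PASS1)))
                .transform(new StreamSource(new StringReader(XML)), firstResult);
            // ... which pass 2 then reads as its source document.
            StringWriter out = new StringWriter();
            tf.newTransformer(new StreamSource(new StringReader(PASS2)))
                .transform(new DOMSource(firstResult.getNode()), new StreamResult(out));
            return out.toString().trim();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

    The same chaining works by writing the first result to a file and transforming that file; the in-memory DOM simply avoids the round trip through serialization and parsing.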
 
 
 
Result Tree   The result tree is created by applying an XSLT style sheet to an XML document using an XSLT processor. Microsoft provides such a processor in its MSXML library, while Sun has one in its JAXP package. These are not the only XSLT processors available; many more can be located with a simple web search for the term "XSLT processor". The result tree is built in sequential order, abiding by the following rules, located in section 7 of the XSLT recommendation:
 
    1-  The XSLT style sheet is processed sequentially, in serial fashion. There is no way to jump out of sequence, but templates can be called in a manner similar to a subroutine call: control returns to the point of the call once the called template has finished processing its nodes. The result tree grows as each template is encountered and processed.
 
    2-  Literal elements are processed in the sequence they are encountered and copied as-is to the result tree. Literal elements are the tags and text written into the style sheet that are not XSLT tags and therefore are not treated as instructions by the XSLT processor.
 
    3-  If the source XML document, or any document referenced through the XSLT document() function, contains structure and content that is ready to use, then the xsl:copy-of instruction can copy those nodes from the source verbatim. The copy is inserted into the result tree sequentially, right after the output of the preceding XSLT instruction or literal element. Note that this instruction copies each node together with all of its children and attributes. If you only need the node itself, without children or attributes, then the xsl:copy instruction might suffice.
 
    4-  If the result tree is expected to hold XML or HTML nodes for eventual serialization, the source XML document does not contain those nodes in a form that can be copied as-is, and the style sheet cannot hold them statically as literal elements, then the desired nodes must be calculated with XSLT instructions such as xsl:element, xsl:attribute, xsl:processing-instruction, xsl:comment, xsl:text and xsl:value-of. In other words, if an XML or HTML structural item (like a tag) needs to be appended to the result tree through calculation, then an XSLT instruction must be used to carry out this task.
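
    The four rules can be seen together in one small template. This sketch uses a hypothetical source document and the JDK's JAXP processor; the comments mark which rule produces each part of the output, and the parts appear in exactly the order the instructions are encountered.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ResultTreeDemo {
    // Hypothetical source document.
    static final String XML =
        "<doc><note lang='en'><b>Keep</b> me</note>"
        + "<link target='next.html'>Next</link></doc>";

    static final String XSL =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/'>"           // rule 1: template processed in sequence
        + "<div>"                              // rule 2: literal element, copied as-is
        + "<xsl:copy-of select='doc/note'/>"   // rule 3: node plus attributes and children
        + "<xsl:for-each select='doc/note'>"
        + "<xsl:copy><xsl:text>(shallow)</xsl:text></xsl:copy>" // xsl:copy: element only
        + "</xsl:for-each>"
        + "<xsl:element name='a'>"             // rule 4: nodes built by calculation
        + "<xsl:attribute name='href'>"
        + "<xsl:value-of select='doc/link/@target'/>"
        + "</xsl:attribute>"
        + "<xsl:value-of select='doc/link'/>"
        + "</xsl:element>"
        + "</div>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Runs one XSLT 1.0 transformation with the JDK's built-in processor.
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString().trim();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XSL, XML));
    }
}
```

    Note that xsl:copy reproduces the note element without its lang attribute or its children, so the only text inside the second note is what the xsl:text instruction supplies.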
 
    The above exercise shows that building up the result tree is an accumulation of four rule sets: read in content as-is from the style sheet, process templates, copy nodes from various XML sources, and build new nodes through calculation. This is all done sequentially while reading the XSLT style sheet. The result tree is built in memory and can be serialized as XML, HTML, or text. Nowhere in the XSLT recommendation is there any reference to the result tree being modified once it has been written to memory or serialized.
 
 
 
Style vs. Content   Separating style from structure from content
 
    What follows is the reason this article was written in the first place. During various client engagements, and also in our XML classes, we were approached with the following XSLT problem. The style sheet author would write a style sheet containing xsl:copy-of instructions followed by various templates with XPath selectors for nodes that existed both in the original document and, after the copy, in the result tree (with xsl:output set to XML). The complaint was that, in the XSLT processor's output, the copied nodes did not have the desired changes applied to them. The author expected the template changes to be applied to those nodes, but that did not happen.
 
    Having gone through the above analysis, we now know that the result tree cannot be touched once it has been written. So the nodes copied over with the xsl:copy-of instruction cannot be touched again by the original style sheet. This is the case even though the templates referenced nodes identical to those in the result tree: the templates matched the nodes in the source XML document, not their copies. As a matter of fact, had the xsl:output method been set to text, the output would have shown the copied node set plus, without tags, the content of the nodes being manipulated by the subsequent template instructions. With the XSLT processor used (MSXML), text output would have shown the entire result, while XML output contained only well-formed output, stripping the "extraneous" text and hiding the problem from the style sheet author.
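
    The problem can be reproduced in a few lines. In this hypothetical sketch (Java with its built-in JAXP processor, made-up element names), the page is copied first and templates are then applied to page/title. That path is evaluated against the source document, so the title rule fires for the source title and its output is appended after the copy rather than replacing it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class CopyThenMatchDemo {
    static final String XML = "<page><title>Draft</title></page>";

    static final String XSL =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/'>"
        + "<result>"
        // Step 1: copy the page into the result tree.
        + "<xsl:copy-of select='page'/>"
        // Step 2: hoping to change the copy, apply templates to page/title.
        // This selects the SOURCE title; the copy is out of reach.
        + "<xsl:apply-templates select='page/title'/>"
        + "</result>"
        + "</xsl:template>"
        + "<xsl:template match='title'>"
        + "<title><xsl:value-of select=\"concat(., ' (final)')\"/></title>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Runs one XSLT 1.0 transformation with the JDK's built-in processor.
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString().trim();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XSL, XML));
    }
}
```

    The copied title still reads Draft; the changed version appears as a separate title element after the copied page.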
 
    When asked why this approach was being used, the author cited the celebrated concept of "separating style from content". The external XML document that was first copied over in its entirety with xsl:copy-of was a skeleton, or boilerplate, of the desired structure and style. The primary XML document, to which the XSLT style sheet was applied, held the content for a specific instance of the output document (an XHTML web page). The author wanted one XML document containing the content, another XML document serving as an example of the output with the desired structure, and a style sheet used only to house the rules for merging the two into a single XHTML document.
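
    Within the limits described above, one way to let templates act on such a skeleton is to apply templates to the skeleton's nodes, loaded with document(), instead of copying them with xsl:copy-of: templates may transform any source node, just never a result-tree node. This is a sketch of that alternative, not the approach the style sheet author used; skeleton.xml, the slot placeholder element and the content document are hypothetical, and a JAXP URIResolver supplies the skeleton from a string.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class SkeletonDemo {
    // Content document: the instance-specific data.
    static final String XML = "<content><title>Spring sale</title></content>";

    // Hypothetical skeleton with a placeholder, served as "skeleton.xml".
    static final String SKELETON =
        "<html><body><h1><slot name='title'/></h1><p>Static footer</p></body></html>";

    static final String XSL =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        // Remember the content document before switching to the skeleton.
        + "<xsl:variable name='content' select='/content'/>"
        // Apply templates to the skeleton's nodes instead of copying them.
        + "<xsl:template match='/'>"
        + "<xsl:apply-templates select=\"document('skeleton.xml')/*\"/>"
        + "</xsl:template>"
        // Identity rule: copy each skeleton node, then visit its children.
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        // Placeholder rule: replace the slot with data from the content document.
        + "<xsl:template match=\"slot[@name='title']\">"
        + "<xsl:value-of select='$content/title'/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String transform() {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            // Serve document('skeleton.xml') from memory; null defers to the processor.
            t.setURIResolver((href, base) -> href.endsWith("skeleton.xml")
                ? new StreamSource(new StringReader(SKELETON)) : null);
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
            return out.toString().trim();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform());
    }
}
```

    The identity rule copies the skeleton node by node, and the more specific slot rule takes priority wherever a placeholder appears, so the content lands inside the h1 instead of after it.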
 
    This seemed to be a laudable goal at first. The style sheet author wanted to avoid interspersing XHTML tags within the style sheet, as they did not seem to have anything to do with the rules that needed to be processed and were "just coming along for the ride". This especially pertained to the literal elements described above. The tags to be copied were cluttering the style sheet and made it difficult to locate the XSLT node calculation statements that modified content and generated tags based on instance-specific requirements, and those calculation statements are quite verbose. What this author was seeking was a system that separated structure - style - content, not just style from content. The author had an excellent point, but at this time it seems that the XSLT recommendation is not designed to perform a separation of structure - style - content.
 
    One thing XSLT does bring to the table is a way to make the desired separation of structure - style - content feasible. As we all know, the "X" in XSLT stands for eXtensible. One could create an extension to the XSLT language and add the feature to an XSLT processor (most likely an open source one), then run the documents and the extended style sheet through the extended processor, relying upon tags defined to access the result tree. Investigating this method in depth, we found that the boilerplate did not cover enough of the desired output to meet the needs of a dynamically generated web page: at most half, if even that much, of the desired output could be defined in an XML document and copied over as-is. Using our favorite 80/20 rule as a guideline, the benefit would not justify extending an XSLT processor to access the result tree through special tags. Even filling out the details of a navigation bar, footer, and so on turned out to be a much larger effort than the boilerplate XML could have saved. So, for the time being, we recommended sticking with the typical, if verbose, XSLT style sheet method of generating XHTML web pages dynamically.
 
 
 
 
 
 


All site content copyright © 1997-2005 Artige Company. All rights reserved.
Last updated: 17-April-2005 03:49z