XSL Translation: Powerful, Misunderstood, and Mislabeled

Back in the year 1999, XSL and CSS (both dubbed “stylesheet languages”) were unofficial competitors. CSS ended up becoming the “winner”. This was actually a good thing for XSL, which was considered to be more powerful and complex than CSS.
CSS is all well and good for the applying of styles to online content (such as Web pages), but the true power of XSL lies in its ability to translate XML into any other text-based format. XSL can be used completely outside of the context of browsers and Web pages; all you need is a good XSLT processor that you can use in your code (sample code provided below).

An important thing to note, at this point, is that there will be no discussion of any “browser” or “HTML” in this post, nor will there be any discussion of behavior that one would describe as “stylesheet-like”. Instead, the true power of XSL translation will be illustrated.

Example: Converting XML Into CSV Text

This example will illustrate conversion of XML data into CSV (Comma Separated Value, an input format that is supported by Excel, for example).

Here is the input XML:

<?xml version="1.0" encoding="utf-8"?>
<DataTable name="The Data">
 <Row name="Header Row">
  <Column name="Column 1" />
  <Column name="Column 2" />
  <Column name="Column 3" />
 </Row>
 <Row name="Row 1">
  <Column value="Alpha"/>
  <Column value="Bravo"/>
  <Column value="Charlie"/>
 </Row>
 <Row name="Row 2">
  <Column value="Delta"/>
  <Column value="Echo"/>
  <Column value="Foxtrot"/>
 </Row>
 <Row name="Row 3">
  <Column value="Golf"/>
  <Column value="Hotel"/>
  <Column value="India"/>
 </Row>
</DataTable>

Here is the desired spreadsheet layout:

Column 1 Column 2 Column 3
Alpha Bravo Charlie
Delta Echo Foxtrot
Golf Hotel India

 

Here is the XSL translation code (more information on XSLT available here):

<?xml version='1.0'?>
<xsl:stylesheet
version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output
    method="text"
    omit-xml-declaration="yes"/>

    <xsl:variable
    name="dataTableName"
    select="/DataTable/@name"/>

    <xsl:template match="/DataTable">
        <xsl:choose>
            <xsl:when test="$dataTableName">
                Data Table Name: "<xsl:value-of select="$dataTableName"/>"&#13;&#10;
            </xsl:when>
            <xsl:otherwise>Data Table Name: (no name)&#13;&#10;</xsl:otherwise>
        </xsl:choose>
        <xsl:apply-templates select="Row"/>
    </xsl:template>

    <xsl:template match="Row">
        <xsl:if test="@name">
            <xsl:value-of select="@name"/>,
        </xsl:if>
        <xsl:apply-templates select="Column"/>
        <!--  add a line break (0D0A) between rows -->
        <xsl:if test="following-sibling::Row">&#13;&#10;</xsl:if>
    </xsl:template>

    <xsl:template match="Column">
        <xsl:choose>
            <xsl:when test="@name">
                <xsl:value-of select="@name"/>
            </xsl:when>
            <xsl:otherwise>
                <xsl:if test="@value">
                    <xsl:value-of select="@value"/>
                </xsl:if>
            </xsl:otherwise>
        </xsl:choose>
        <!--  add a comma between columns -->
        <xsl:if test="following-sibling::Column">,</xsl:if>
    </xsl:template>

</xsl:stylesheet>

Here is C++ source code that will perform the translation (using MSXML):

#define WIN32_LEAN_AND_MEAN

#include <stdio.h>
#import <msxml3.dll> named_guids
using namespace MSXML2;

int main(int argc, char* argv[])
{
   CoInitialize(NULL);	

   variant_t vResult;
   void *output  = NULL;
   MSXML2::IXMLDOMDocumentPtr pXml(MSXML2::CLSID_DOMDocument);
   MSXML2::IXMLDOMDocumentPtr pXslt(CLSID_FreeThreadedDOMDocument);
   IXSLTemplatePtr pTemplate(CLSID_XSLTemplate);
   IXSLProcessorPtr pProcessor;
   IStream *pOutStream;

   try
   {
      vResult = pXml->load(_bstr_t("input.xml"));
      vResult = pXslt->load(_bstr_t("translate.xslt"));
   }
   catch(_com_error &e)
   {
      printf(
         "Error loading XML/XSL: %s\n",
         (const char*)_bstr_t(e.Description()));
      exit(-1);
   }

   try
   {
      vResult = pTemplate->putref_stylesheet(pXslt);
      pProcessor = pTemplate->createProcessor();
   }
   catch(_com_error &e)
   {
      printf(
         "Error initializing XSL translator: %s\n",
         (const char*)_bstr_t(e.Description()));
      exit(-1);
   }

   CreateStreamOnHGlobal(NULL,TRUE,&pOutStream);
   pProcessor->put_output(_variant_t(pOutStream));

   vResult = pProcessor->put_input(_variant_t((IUnknown*)pXml));
   pProcessor->transform();

   HGLOBAL hg = NULL;
   pOutStream->Write((void const*)"\0",1,0);
   GetHGlobalFromStream(pOutStream,&hg);
   output = GlobalLock(hg);
   // advance past (and ignore) BOM:
   printf("%S",(((const char*)(output)) + 2));
   GlobalUnlock(hg);

   pXml.Release();
   pXslt.Release();
   pTemplate.Release();
   pProcessor.Release();

   CoUninitialize();

   return 0;
(original source available here)

Here is the command-line to use (with the above program):

convert > output.csv

Here is an example of the output:

Data Table Name: "The Data"
Header Row,Column 1,Column 2,Column 3
Row 1,Alpha,Bravo,Charlie
Row 2,Delta,Echo,Foxtrot
Row 3,Golf,Hotel,India

Happy translating!

Queue Class

If you are programming with .Net, there is a queue class already implemented for you. If you have ever endured a “white board” programming interview, however, you may have discovered that it helps to have implementations already committed to memory for queues, trees, lists, stacks, and tables (among other things). There is a unique opportunity here to take ownership of your success, at the lowest level, and bring your own best work to the table. Sure, you could look up a queue implementation online (like the one illustrated in this post), memorize it, and use it for an interview question or even incorporate it into a program; but your chances of success will be much higher if you develop and test your own queue class.

In the interest of brevity (and the minimization of keystrokes or whiteboard characters), I have used shorthand code in the following example.

This queue class is far from optimal. It assumes an ideal processing environment in which the cost of allocating/deallocating memory is low and can be called repeatedly. A far better implementation would either have a fixed number of possible elements or an increment by which the element count can grow (other than 1 for each enqueue and -1 for each dequeue, as in this example).

Content of Queue.h:

#define O void // for Object pointers (O*)
#define Z NULL // shorthand for NULL
 
class E // represents a queue Element
{
public:
    O* m_pO; // pointer to an Object
    E* m_pN; // pointer to the Next element
 
    E(O* pO) // Element constructor
    {
        m_pO = pO;
        m_pN = Z;
    };
};
 
class Q // represents a Queue
{
public:
    E* m_pF; // pointer to the Front element
    E* m_pB; // pointer to the Back element
 
    Q() // Queue constructor
    {
        m_pF = Z;
        m_pB = Z;
    };
 
    ~Q() // Queue destructor
    {
        while (m_pF) // while the front element is not NULL,
        {
            DQ(); // dequeue elements
        }
    };
 
    E* NQ(O* pO) // Enqueue function
    {
        if (!pO) // if the object pointer is NULL,
        {
            return Z; // do not enqueue
        }
 
        E* pE = new E(pO); // create a new element
 
        if (pE) // if the element was allocated,
        {
            if (m_pB) // if the back element is not NULL,
            {
                m_pB->m_pN = pE; // uptate the next pointer of the back element
            }
 
            m_pB = pE; // place the new element at the back of the queue
 
            if (!m_pF) // if the front element is NULL,
            {
                m_pF = m_pB; // set the front element equal to the back element
            }
        }
 
        return pE; // return the new element pointer
    };
 
    O* DQ() // Dequeue function
    {
        E* pE = m_pF; // get the element at the front of the queue
 
        if (pE) // if the element is not NULL,
        {
            m_pF = m_pF->m_pN; // point the front element at the next element
 
            if (!m_pF) // if the front element is NULL,
            {
                m_pB = Z; // set the back element to NULL
            }
 
            O* pO = pE->m_pO; // get the object pointer from the element
 
            delete pE; // delete the element
 
            return pO; // return the object pointer
        }
        else // if the element is NULL,
        {
            return Z; // return NULL, as the queue is empty
        }
    };
};

Content of a test program (windows console application):

#define WIN32_LEAN_AND_MEAN
#define WINVER 0x0400
#define _WIN32_WINNT 0x0400
#define _CRTDBG_MAP_ALLOC
 
#include 
#include 
#include 
#include "Queue.h"
 
#define I 7
 
_TCHAR szA[] = _T("alpha");
_TCHAR szB[] = _T("bravo");
_TCHAR szC[] = _T("charlie");
_TCHAR szD[] = _T("delta");
_TCHAR szE[] = _T("echo");
_TCHAR szF[] = _T("foxtrot");
_TCHAR szG[] = _T("golf");
 
_TCHAR* items[I] = { szA, szB, szC, szD, szE, szF, szG };
 
int _tmain(int argc, _TCHAR* argv[])
{
        Q q;
    int i = 0;
    _TCHAR* szOut = NULL;
 
    //
    // verify output items match input items
    //
 
    // fill the queue:
    for (i = 0; i < I; i++)
    {
        if (NULL == q.NQ((O*)items[i]))
        {
            _tcprintf(_T("Fail: verify output items match input items; enqueue\r\n"));
            return 0;
        }
    }
    // test the queue contents:
    for (i = 0; (NULL != (szOut = (_TCHAR*)q.DQ())); i++)
    {
        if (_tcscmp(szOut, items[i]) != 0)
        {
            _tcprintf(_T("Fail: verify output items match input items; content\r\n"));
            return 0;
        }
    }
    // test the item count:
    if (i != I)
    {
        _tcprintf(_T("Fail: verify output items match input items; count\r\n"));
        return 0;
    }
    // write pass message:
    _tcprintf(_T("Pass: verify output items match input items\r\n"));
 
    //
    // verify empty queue
    //
 
    // the queue should now be empty:
    if (NULL != q.DQ())
    {
        _tcprintf(_T("Fail: verify empty queue\r\n"));
        return 0;
    }
    // write pass message:
    _tcprintf(_T("Pass: verify empty queue\r\n"));
 
    //
    // verify enqueue after partial dequeue
    //
 
    // fill the queue with 3 items:
    for (i = 0; i < 3; i++)
    {
        if (NULL == q.NQ((O*)items[i]))
        {
            _tcprintf(_T("Fail: verify enqueue after partial dequeue; enqueue\r\n"));
            return 0;
        }
    }

    // dequeue 2 items:
    for (i = 0; i < 2; i++)
    {
        if (NULL == q.DQ())
        {
            _tcprintf(_T("Fail: verify enqueue after partial dequeue; dequeue\r\n"));
            return 0;
        }
    }

    // add 4 more items to the queue:
    for (i = 0; i < 4; i++)
    {
        if (NULL == q.NQ((O*)items[i]))
        {
            _tcprintf(_T("Fail: verify enqueue after partial dequeue; enqueue\r\n"));
            return 0;
        }
    }
    // write pass message:
    _tcprintf(_T("Pass: verify enqueue after partial dequeue\r\n"));
 
    // queue now contains 5 items;
    // the queue destructor should clean these up;
    // otherwise, memory leaks will be detected below:
    q.~Q();
 
    // trigger memory leak detection:
    if (_CrtDumpMemoryLeaks())
    {
        _tcprintf(_T("Fail: verify queue destructor\r\n"));
    }
    else
    {
        _tcprintf(_T("Pass: verify queue destructor\r\n"));
    }
    return 0;
}

Example output:

Pass: verify output items match input items
Pass: verify empty queue
Pass: verify enqueue after partial dequeue
Pass: verify queue destructor

I strongly encourage all programmers, regardless of experience level, to implement and test classes similar to, but more optimal than, this. Use them in actual product code, and skip taking dependencies on pre-fabricated container classes. Run some comparative performance testing. See if you can create something better, faster, and smaller! Dare to go native!