Fast XML parsing using SAX (Simple API for XML)



Introduction

SAX (Simple API for XML) is a fast, event-based interface that parses an XML document sequentially and reports what it finds through a set of callback methods. Unlike DOM (Document Object Model), SAX does not build an in-memory representation of the document, so its memory use stays low even for very large files. It is best suited to extracting small amounts of data from a large XML file.

Main Benefits:

  • Supports local or web-based XML files.
  • Parsing can be aborted at any point (before the whole document is parsed).
  • Includes methods for completing transformations and validation.
  • Lightweight.

Using SAX

Using SAX is fairly simple. First we implement the abstract COM interface ISAXContentHandler, whose members are called to notify us of the different parts of the XML document as it is parsed. Then we create a SAXXMLReader as we would any COM object (using CoCreateInstance) and set our implementation as the reader's content handler:
	CoInitialize(NULL);


	ISAXXMLReader* pRdr = NULL;

	HRESULT hr = CoCreateInstance(
					__uuidof(SAXXMLReader), 
					NULL, 
					CLSCTX_ALL, 
					__uuidof(ISAXXMLReader), 
					(void **)&pRdr);

	if (SUCCEEDED(hr)) 
	{
		ISAXContentHandler* pContentHandler = new CSaxContentHandlerImp;
		hr = pRdr->putContentHandler(pContentHandler);

		//SAXErrorHandlerImpl * pEc = new SAXErrorHandlerImpl();
		//hr = pRdr->putErrorHandler(pEc);

		// SAXDTDHandlerImpl * pDc = new SAXDTDHandlerImpl();
		// hr = pRdr->putDTDHandler(pDc);

		if (FAILED((hr = pRdr->parseURL((unsigned short*)pPath))))
			wprintf(L"\nError parsing file, code: %08X\n\n", hr);
		else
			wprintf(L"\n\nSuccess\n\n");

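		// Release the reader first, since it may still hold a pointer to the handler.
		pRdr->Release();
		// Plain delete assumes the handler's AddRef/Release do not manage its lifetime.
		delete pContentHandler;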

	}
	else
	{
		wprintf(L"\nError creating COM object, code: %08X\n\n", hr);
	}

	CoUninitialize();



Now, as the reader works through the document, it calls a member of the content handler for each event it encounters. For example, the following method is called when the start tag of an element is found:
HRESULT STDMETHODCALLTYPE CSaxContentHandlerImp::startElement( 
			/* [in] */ unsigned short __RPC_FAR *pwchNamespaceUri,
			/* [in] */ int cchNamespaceUri,
			/* [in] */ unsigned short __RPC_FAR *pwchLocalName,
			/* [in] */ int cchLocalName,
			/* [in] */ unsigned short __RPC_FAR *pwchRawName,
			/* [in] */ int cchRawName,
			/* [in] */ ISAXAttributes __RPC_FAR *pAttributes)
{
	int nAttrCount = 0;
	pAttributes->getLength(&nAttrCount);


	unsigned short *pAttr = NULL, *pValue = NULL;
	int nAttrLen = 0, nValueLen = 0;


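	// The name is length-counted and may not be null-terminated,
	// so print exactly cchLocalName characters.
	wprintf(L"Local Name: %.*s\n", cchLocalName, (wchar_t*)pwchLocalName);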


	wstring str;

	for (int i = 0; i < nAttrCount; ++i)
	{
		pAttributes->getLocalName(i, &pAttr, &nAttrLen);
		str.assign((wchar_t*)pAttr, nAttrLen);
		wprintf(L"Attribute Name: %s\n", str.c_str());

		pAttributes->getValue(i, &pValue, &nValueLen);
		str.assign((wchar_t*)pValue, nValueLen);
		wprintf(L"Attribute Value: %s\n", str.c_str());
	}


	return S_OK;
}



