Copying an XML document in a streaming fashion

I needed to make small modifications to large XML documents. Because of their size, I did not want to load them into memory with XmlDocument. Instead I wanted to read and write them in a streaming fashion with XmlTextReader and XmlTextWriter. This turned out to be non-trivial. I found this example (http://msdn.microsoft.com/en-us/magazine/cc164142.aspx):
 
XmlTextReader reader = new XmlTextReader(inputFile);
XmlTextWriter writer = new XmlTextWriter(outputFile);
// Configure reader and writer
writer.Formatting = Formatting.Indented;
reader.MoveToContent();
// Write the root
writer.WriteStartElement(reader.LocalName);
// Read and output every other node
int i=0;
while(reader.Read())
{
    if (i % 2)
        writer.WriteNode(reader, false);
    i++;
}
// Close the root
writer.WriteEndElement();
// Close reader and writer
writer.Close();
reader.Close();
 
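This code did not work for me. The problem is that writer.WriteNode does not just copy the current node: it also advances the reader to the node that follows it (the next sibling). The reader.Read() call in the loop condition then advances the reader once more, so that sibling is skipped without ever being written, and this happens on every iteration regardless of the counter. The effect is therefore not what the example states. The solution was to drop reader.Read() from the loop condition and test !reader.EOF instead, leaving all the advancing to WriteNode.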
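The difference between the two loops can be seen on a tiny input. The following is a minimal sketch, not part of the original example: the class name WriteNodeLoopDemo, the method Copy and its readInLoopCondition flag are made up for illustration, and it copies from a string to a string so that it runs without any files.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical demo class: compares the two loop styles on an in-memory document.
public static class WriteNodeLoopDemo
{
    // Copies the children of the root element using one of the two loops
    // and returns whatever the writer produced.
    public static string Copy(string xml, bool readInLoopCondition)
    {
        var output = new StringWriter();
        var reader = new XmlTextReader(new StringReader(xml));
        var writer = new XmlTextWriter(output);

        // Position on the root element and copy its start tag.
        reader.MoveToContent();
        writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);

        if (readInLoopCondition)
        {
            // Read() advances the reader, and WriteNode advances it again,
            // so the node WriteNode stopped on is never written.
            while (reader.Read())
            {
                writer.WriteNode(reader, false);
            }
        }
        else
        {
            // Step from the root start tag to the first node inside it.
            reader.Read();
            // WriteNode is the only thing that moves the reader from here on.
            while (!reader.EOF)
            {
                writer.WriteNode(reader, false);
            }
        }

        writer.Flush();
        reader.Close();
        return output.ToString();
    }
}
```

Calling WriteNodeLoopDemo.Copy("<root><a/><b/><c/><d/></root>", true) should lose the b and d elements, while passing false should keep all four children. The BizTalk method that follows uses the same !reader.EOF loop on a stream.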
 
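        // Requires: using System.IO; using System.Text; using System.Xml; using Microsoft.BizTalk.Streaming;
        // _Prefix, _Name, _Namespace and _Value are fields of the enclosing class describing the attribute to add.
        private Stream ModifyXMLUsingVirtualStream(Stream originalStream)
        {
            // For large messages, use Microsoft.BizTalk.Streaming.VirtualStream.
            // If using this approach, you should also:
            // 1) Move the TEMP folder of the BizTalk Host Instance account to a large drive that is not used by the OS.
            // 2) Make sure that the BizTalk Host Instance account has appropriate permissions (read, write, delete) on that folder.
            VirtualStream outStream = new VirtualStream(VirtualStream.MemoryFlag.AutoOverFlowToDisk);
            XmlTextReader reader = new XmlTextReader(originalStream);
            reader.WhitespaceHandling = WhitespaceHandling.None;
            XmlTextWriter writer = new XmlTextWriter(outStream, Encoding.UTF8);
            // Position the reader on the root element.
            reader.MoveToContent();
            // Copy the root start tag together with its existing attributes.
            writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
            writer.WriteAttributes(reader, false);
            // Add the new attribute to the root element.
            writer.WriteAttributeString(_Prefix, _Name, _Namespace, _Value);
            // Move to the first node inside the root.
            reader.Read();
            // WriteNode copies the current node (including its subtree) and advances the reader itself,
            // so the loop must not call reader.Read() again. The root end tag is copied the same way.
            while (!reader.EOF)
            {
                writer.WriteNode(reader, false);
            }
            writer.Flush();
            // Rewind so the caller reads from the beginning (harmless if the caller rewinds as well).
            outStream.Seek(0, SeekOrigin.Begin);
            return outStream;
        }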
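To try the same technique outside BizTalk, here is a sketch that swaps VirtualStream for a plain MemoryStream. The class name RootAttributeDemo, the method AddRootAttribute and its parameters are hypothetical and only serve the illustration. Two things differ from the method above and are my own assumptions rather than part of the original: an empty root element such as <root/> is closed explicitly, because in that case there is no end tag for WriteNode to copy, and the output uses UTF-8 without a byte order mark so that it is easy to compare as text.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical demo class: streaming copy that adds one attribute to the root element.
public static class RootAttributeDemo
{
    public static MemoryStream AddRootAttribute(Stream input, string name, string value)
    {
        var output = new MemoryStream();
        var reader = new XmlTextReader(input);
        reader.WhitespaceHandling = WhitespaceHandling.None;
        // UTF8Encoding(false) suppresses the byte order mark (an assumption made for this demo).
        var writer = new XmlTextWriter(output, new UTF8Encoding(false));

        // Position on the root element and remember whether it is of the form <root/>.
        reader.MoveToContent();
        bool emptyRoot = reader.IsEmptyElement;

        // Copy the root start tag with its attributes, then add the new attribute.
        writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
        writer.WriteAttributes(reader, false);
        writer.WriteAttributeString(name, value);

        // An empty root has no end tag in the input, so close it here.
        if (emptyRoot)
        {
            writer.WriteEndElement();
        }

        // Copy everything else; WriteNode advances the reader on its own.
        reader.Read();
        while (!reader.EOF)
        {
            writer.WriteNode(reader, false);
        }

        writer.Flush();
        output.Position = 0; // rewind for the caller
        return output;
    }
}
```

A caller would wrap the source in any Stream, for example a FileStream over the input file, pass it to AddRootAttribute and read the result back from the returned stream. For a genuinely large document the MemoryStream defeats the purpose, which is why the method above uses VirtualStream with overflow to disk.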