Big XML files, REXML and learning about stream parsers
After taking the easy route and building some XML-checking test scripts using Ruby and REXML’s DOM access, I decided that I really didn’t want my computer grinding to a halt for a whole day while it parsed a gig and a half of XML. So it was time to try a streaming parser. Unfortunately, the REXML website seemed to be unavailable, which led me to this very nice tutorial on Jan Vereecken’s blog:
http://www.janvereecken.com/2007/4/11/event-driven-xml-parser-in-ruby
I’m pretty sure it’s nicer than the one on the REXML site, but I will have to wait and see.
Anyway, thanks Jan!
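For anyone wondering what the streaming approach looks like in practice, here is a minimal sketch of my own (not taken from Jan's tutorial). It uses REXML's `StreamListener` module and `REXML::Document.parse_stream`, which call back into your object as each tag goes by instead of building the whole tree in memory. The `ElementCounter` class and the `count_elements` helper are names I made up for illustration, and the tiny inline document stands in for the real gig-and-a-half file. On recent Ruby versions REXML ships as a bundled gem, so you may need to add it to your Gemfile.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'
require 'stringio'

# Hypothetical listener: counts how many times each element name appears.
# REXML::StreamListener supplies empty defaults for every callback, so we
# only override the one we care about.
class ElementCounter
  include REXML::StreamListener

  attr_reader :counts

  def initialize
    @counts = Hash.new(0)
  end

  # Called once for every opening (or self-closing) tag.
  def tag_start(name, _attributes)
    @counts[name] += 1
  end
end

# Hypothetical helper: stream-parse anything IO-like and return the counts.
def count_elements(io)
  listener = ElementCounter.new
  REXML::Document.parse_stream(io, listener)
  listener.counts
end

# Small stand-in document; for a real file, pass File.open(path) instead.
sample = StringIO.new("<catalog><item id='1'/><item id='2'/></catalog>")
count_elements(sample).each { |name, n| puts "#{name}: #{n}" }
```

The listener only ever holds the running counts, so memory use should stay roughly flat no matter how big the input gets. The trade-off is that you have to track any context you need (such as which element you are currently inside) yourself.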