The task was to parse a 70-megabyte XML file and collect about 330k integer IDs from it. Each ID is passed to another worker over a message queue. This is a good job for XMLReader, and my first attempt did it the classic "push everything to an array" way. The result was a runtime of about 25 seconds, including the file read, with a memory footprint of about 180 megabytes.
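For illustration, a buffered first attempt could look roughly like the sketch below. It is not the original code: the function name `parseIdsBuffered`, the `<id>` element name and the sample XML are assumptions made only to keep the example self-contained.

```php
<?php
// Hypothetical buffered variant: every ID is collected before anything is returned,
// so the whole result set sits in memory at once.
function parseIdsBuffered(string $xmlContent): array
{
    $reader = new XMLReader();
    $reader->XML($xmlContent);

    $ids = [];
    while ($reader->read()) {
        // Assumed layout: each ID is the text content of an <id> element.
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'id') {
            $ids[] = (int) $reader->readString();
        }
    }
    $reader->close();

    return $ids;
}

// Tiny made-up document standing in for the 70-megabyte file.
$xml = '<list><data><id>1</id></data><data><id>2</id></data></list>';
$ids = parseIdsBuffered($xml);
```

With 330k entries the `$ids` array alone adds noticeably to the footprint, on top of the XML string itself.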
My primary goal was to distribute the received IDs via RabbitMQ, so I did not need the whole array of IDs at once. This was the point where yield came into play. The change from the buffered version to the yield version was small, so it was worth a try.
The resulting method looks like this:
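public function parse($xmlContent)
{
    $xmlReader = $this->xmlReader;
    $xmlReader->XML($xmlContent, $this->encoding, $this->options);

    // Advance to the first <data> element.
    while ($xmlReader->read() && $xmlReader->name !== 'data');

    // Jump from one <data> element to the next, skipping subtrees.
    // Note: next() moves on from the element found above before it checks
    // the name, so that first <data> element is not processed here.
    while ($xmlReader->next('data')) {
        if ($xmlReader->nodeType === XML_ELEMENT_NODE) {
            // Hand back one parsed ID at a time instead of buffering all of them.
            yield (new WrapperObject($this->encoding, $this->options))->parse($xmlReader->readOuterXml());
        }
    }
}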
The calling code is as easy as it could be:
foreach ($this->parser->parse($xmlListResult) as $id) {
    // do something
}
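To make the pattern easy to try in isolation, here is a minimal, self-contained sketch of a generator feeding a consumer one ID at a time. The function `parseIdsLazy`, the `<id>` element name, the sample XML and the `$publish` closure are all assumptions for the example; the closure merely stands in for the real message queue call.

```php
<?php
// Hypothetical generator variant: yields each ID as soon as it is read,
// so no array of all IDs is ever built inside the parser.
function parseIdsLazy(string $xmlContent): Generator
{
    $reader = new XMLReader();
    $reader->XML($xmlContent);

    while ($reader->read()) {
        // Assumed layout: each ID is the text content of an <id> element.
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'id') {
            yield (int) $reader->readString();
        }
    }
    $reader->close();
}

// Stand-in for the queue publisher. It records the IDs only so the example
// can be checked; a real publisher would send each ID and keep nothing.
$published = [];
$publish = function (int $id) use (&$published): void {
    $published[] = $id;
};

// Tiny made-up document standing in for the real input.
$xml = '<list><data><id>1</id></data><data><id>2</id></data></list>';
foreach (parseIdsLazy($xml) as $id) {
    $publish($id);
}
```

The parser body only runs while the foreach asks for the next value, which is why the consumer can forward each ID before the next one is even parsed.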
The result was amazing: an 86-megabyte footprint and a runtime of about 15 seconds, which is less than half the memory of the buffered version. I hope you find some useful ways to bring yield into your own application, too.