Monday, December 23, 2013

A use case for yield in PHP

With the arrival of PHP 5.5 some time ago came the new yield keyword as part of the language spec. I knew yield from other languages like Ruby and Python, but I never had a use case for it myself. One of my current projects at freshcells has now brought me to the point where I would not want to be without it.
The task was to parse a 70-megabyte XML file and collect about 330k integer IDs from it. Each ID is passed to another worker over a message queue. This is a good job for XMLReader, and my first attempt did it the classic "push everything to an array" way. The result was a runtime of ~25 seconds including the file read, but with a memory footprint of about 180 megabytes.
My primary goal was to distribute the received IDs via RabbitMQ, so I did not need the whole array of IDs at once. This was the point where yield came into play: a generator hands each ID to the caller as soon as it has been parsed, instead of buffering all of them first. The change from the buffered version to the yield version was small, so it was worth a try.
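To make the difference between the two styles concrete, here is a minimal sketch. The function names collectIds() and streamIds() and the counting loop that stands in for the XML source are made up for illustration; they are not part of the real parser.

```php
<?php

// Buffered style: build the complete array before returning it.
// (Hypothetical stand-in: a counting loop replaces the XML parsing.)
function collectIds($count)
{
    $ids = [];
    for ($i = 1; $i <= $count; $i++) {
        $ids[] = $i;
    }

    return $ids; // every ID is held in memory at the same time
}

// Generator style: hand out one ID at a time.
function streamIds($count)
{
    for ($i = 1; $i <= $count; $i++) {
        yield $i; // execution pauses here until the caller asks for the next ID
    }
}

// Both can be consumed with the same foreach loop.
foreach (streamIds(3) as $id) {
    echo $id, PHP_EOL; // in the real job, the ID would be published here
}
```

The caller does not have to change much: a function that contains yield returns a Generator object, which foreach can iterate just like an array.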
The resulting method looks like this:

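public function parse($xmlContent)
{
    $xmlReader = $this->xmlReader;
    $xmlReader->XML($xmlContent, $this->encoding, $this->options);

    // Advance to the first <data> element (the loop body is intentionally empty).
    while ($xmlReader->read() && $xmlReader->name !== 'data');

    // Yield one parsed result per <data> element. A do-while is used so that the
    // element found above is processed too; calling next('data') first would skip it.
    do {
        if ($xmlReader->nodeType === XML_ELEMENT_NODE && $xmlReader->name === 'data') {
            $wrapper = new WrapperObject($this->encoding, $this->options);
            yield $wrapper->parse($xmlReader->readOuterXml());
        }
    } while ($xmlReader->next('data'));
}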

The calling code is as easy as it could be:

foreach ($this->parser->parse($xmlListResult) as $id) {
    // do something
}
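The loop looks the same as a loop over an array, but the work is interleaved: the generator only advances when the loop asks for the next value. The following sketch illustrates this with a hypothetical produce() generator and a $log array that records the order of events; neither is part of the project code.

```php
<?php

$log = [];

// Hypothetical generator that notes when each value is produced.
function produce(array &$log)
{
    foreach ([1, 2, 3] as $id) {
        $log[] = "produced $id";
        yield $id;
    }
}

foreach (produce($log) as $id) {
    $log[] = "consumed $id"; // in the real job: publish $id to the queue
}

// The log alternates between "produced" and "consumed" entries.
echo implode(PHP_EOL, $log), PHP_EOL;
```

Because each value is consumed before the next one is produced, only one parsed result has to exist at any moment.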

The result was amazing: the memory footprint dropped from about 180 to 86 megabytes, and the runtime from ~25 to ~15 seconds. I hope you find some useful ways to bring yield into your own applications, too.
