Monday, December 23, 2013

A use case for yield in PHP

With the arrival of PHP 5.5 some time ago came the new yield keyword as part of the language spec. I knew yield from other languages like Ruby or Python, but I never had a use case for implementing it myself. One of my current projects at freshcells now brought me to the point where I do not want to miss it again.
The task was to parse a 70 megabyte file of XML data and collect about 330k integer IDs from it. Each ID gets passed to another worker over a message queue. So far this is a good job for XMLReader, and my first attempt was to do it in the classic "push everything to an array" way. The result was a runtime of ~25 seconds including the file read, but with a memory footprint of about 180 megabytes.
My primary intention was to distribute the received IDs via RabbitMQ, so I did not need the whole array of IDs at once. This was the point where yield came into play. The change from the buffer version to the yield version was small, so it was worth a try.
The resulting method looks like this:

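public function parse($xmlContent)
{
    $xmlReader = $this->xmlReader;
    $xmlReader->XML($xmlContent, $this->encoding, $this->options);

    // Advance the cursor to the first <data> element.
    while ($xmlReader->read() && $xmlReader->name !== 'data');

    // The cursor already sits on the first <data> element, so process it
    // before moving on; next('data') jumps to the following <data> sibling.
    do {
        if ($xmlReader->nodeType === XML_ELEMENT_NODE) {
            // Hand each parsed element to the caller right away instead of buffering it.
            yield (new WrapperObject($this->encoding, $this->options))->parse($xmlReader->readOuterXml());
        }
    } while ($xmlReader->next('data'));
}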

The calling code is as easy as it could be:

foreach ($this->parser->parse($xmlListResult) as $id) {
    // do something
}

The result was amazing: a footprint of 86 megabytes and a runtime of ~15 seconds. I hope you find some useful ways to bring yield into your own application, too.
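For readers who have not used generators yet, here is a minimal, self-contained sketch of the difference between the buffered and the yield approach. The function names and the counter loop that stands in for the XML source are assumptions made for illustration; they are not taken from the project code.

```php
<?php
// Hypothetical stand-in for the XML source: produces $count integer IDs.

// Buffered variant: every ID is held in memory before the caller sees the first one.
function collectIds($count)
{
    $ids = array();
    for ($i = 1; $i <= $count; $i++) {
        $ids[] = $i;
    }
    return $ids;
}

// Generator variant: each ID is handed to the caller as soon as it is produced,
// so only one ID needs to be alive at a time.
function streamIds($count)
{
    for ($i = 1; $i <= $count; $i++) {
        yield $i;
    }
}

$sum = 0;
foreach (streamIds(1000) as $id) {
    $sum += $id; // in a real worker this is where the ID would be published to the queue
}
```

The calling foreach loop is identical for both variants, which is why switching from the array to the generator needs so few changes.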

Thursday, July 11, 2013

The triangle of performance?

I'm thinking a lot about adopting a common project management pattern in the world of web performance. The iron triangle, about which Jeff Atwood already wrote back in 2006, is something we all know from our jobs. My question is whether there are other areas of our IT world where such a triangle choice pattern can be found. One rather unfinished example I found from the perspective of performance is the relation between data liveness, response time and server load. The relation is not perfect, as the pair "Server load" and "Liveness" does not fully oppose "Response time", but it might be a start for a discussion about whether such patterns exist and where they reside. Do you have other examples? I'd like to hear them.

Monday, June 10, 2013

Using Facebook's React with require.js

I recently stumbled upon Facebook's React library, a JavaScript library for building reusable frontend components. Even though the library is only at version 0.3.x, it is very stable, fast and fun to code with. I'm a big fan of require.js, so I tried to use React within the require.js ecosystem. It was not as hard as expected; here are some examples and some thoughts about it.



This is a menu component I use for a twitter bootstrap based project. The eventbus is based on EventEmitter2.





What I like most about using require.js together with React is that you can write a component in one require.js module definition and export only the visible part of the component. The second advantage is naming: if you use JSX, you can influence the tag names and do some kind of vendor prefixing for your required tags.



In the end, the construction of the menu looks like this:





update:
The way to get things working was quite easy. React is available via Bower, so just add it to your bower.json file:





Afterwards, add it to your require.js config. I decided to change my bower install path to /public/js/components, my application scripts reside within /public/js/app.





After setting things up like this, the previously mentioned examples work like a charm.

Thursday, May 16, 2013

Book review - PHPStorm Starter (Packt Instant)

The nice people at Packt recently asked me to review one of their new books from the INSTANT series, a focussed line of books that each concentrate on a single topic. Although a book about PHPStorm could easily become a big one, this book keeps things straight.


I am talking about: Instant PhpStorm Starter


The target group for this book is beginners, and I mean real beginners. After the typical "how to download" and "how to install" sections, the author gives a short introduction of about five steps to bootstrap the reader and get her/him ready to rock. This intro is quite OK, as long as you have never used anything other than notepad.exe before. Most of the material is very common and feels a bit like a transcript of the manual.


The chapter "Top features you need to know about" is mostly about code templates, file templates and shorthand operations in the editor. Those hints are useful, and even a power user might be surprised to find one or two little tricks that she/he did not know about.


The part about the "High-level programming operations for the PHP language" sounds better than it (sadly) really is. Accessing the documentation, instance diagrams and the autoformatter do not count as high-level programming functions for me. One thing I use every day in PHPStorm is missing: the function, file and symbol search.


The last big chapter is about using a VCS with PHPStorm. This is another chapter that I feel unsure about. I am a heavy console user when it comes to VCS, especially git, which makes it hard for me to judge whether this part of the book is helpful for beginners. I think it might be, especially if they come from the Eclipse world. From my point of view, the VCS chapter should have been replaced with a chapter about the plugin architecture of PHPStorm, a careful selection of helpful plugins and some more "navigating through your code" hints. Any hint about the structure and hierarchy views is missing entirely.


When I try to imagine the point of view of someone who comes from the Eclipse world, I would have been very interested to learn that PHPStorm handles Eclipse project files very well (this is because IntelliJ IDEA, the Java IDE from JetBrains, is a commercial competitor of the Eclipse Java Development Tools), and that I could use both at the same time without being forced to choose one over the other.


Conclusion:
If you have never used an IDE before, you will learn a lot of new things from this book. If you are an experienced IDE user of any flavor, this book will show you some low-level tricks in PHPStorm, but the things that are important from my personal point of view, like code navigation, are missing. If you have used any other JetBrains IDE product before (like RubyMine, PyCharm, WebIDE, etc.), most of this book is useless for you.

Monday, April 8, 2013

MySQL analysis with AnalyzeMySQL

I recently joined the scalability team at trivago, and my first job was to get into the current MySQL schema to discover which parts can be improved regarding performance and space consumption.


trivago's database consists of about 230 tables, many of them with a history of more than four years, some of them up to seven years. A few of them have historic flaws, like a signed bigint as primary key (because nobody could foresee how much space would be needed a few years later); others have indexes that are no longer used, etc. There are many tasks to do until you get to know the database schema very well.


To support my work I started to write a little tool, which was meant to perform some basic tasks on the schema. The first one was to see which column definitions are much bigger than the maximum value they actually hold.
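The idea behind that first check can be sketched in a few lines. This is not the code of the tool itself but a simplified illustration under two assumptions: the column only holds non-negative integers (an auto-increment key, for example), and the observed maximum has already been fetched, e.g. with a SELECT MAX(...) query. The function name is made up for this example; the limits are the documented upper bounds of MySQL's unsigned integer types.

```php
<?php
// Hypothetical helper: returns the smallest unsigned MySQL integer type
// that can still hold $maxValue (assumed to be a non-negative integer).
function smallestUnsignedType($maxValue)
{
    // Upper bounds of MySQL's unsigned integer types; BIGINT is the fallback.
    $limits = array(
        'TINYINT'   => 255,
        'SMALLINT'  => 65535,
        'MEDIUMINT' => 16777215,
        'INT'       => 4294967295,
    );

    foreach ($limits as $type => $limit) {
        if ($maxValue <= $limit) {
            return $type . ' UNSIGNED';
        }
    }

    return 'BIGINT UNSIGNED';
}
```

Comparing the returned type with the declared column type shows where a definition is larger than the data requires. Whether a column should really be shrunk still depends on the expected growth.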


What came out of this approach was a small framework-ish script collection that parses the table structures and gives programmatic access to them. The framework uses a plugin structure to keep it easy to extend.


I'll try to write another post explaining how to create such a plugin as soon as possible. In the meantime I would like to invite you to have a look at the (still unfinished and unpolished) code and give me some feedback.


You can find the code on github: https://github.com/xenji/AnalyzeMySQL

Saturday, March 30, 2013

Visualizing trivago's search traffic on a heatmap

While evaluating a message queue for trivago's architecture I wrote a little demo to present RabbitMQ in one of our internal meetings. The demo was meant to take live searches from the trivago platform and display them on a map using a heatmap layer. It should show that the MQ is able to cope with the amount of traffic that would hit the queues and that this type of technology solves some of our current near-realtime problems.


As I was not allowed to touch the live system's code for it, the data needed to come from somewhere else. The access log seemed a proper source of information, as our scribe logger delivers it in near-realtime. We use two different parameters to determine the location the visitor wants to search for: one is an integer, the other one is a list of integers. A "nice to have" feature is to show the visitor's location. The scripts already support this feature, but the screen recording only shows the destinations. Log parsing? AWK to the rescue!



To resolve the IDs to a geo coordinate we did some pre-caching using Redis and a simple ID => geo construct.


The AWK script writes its result to STDOUT, where a simple node.js script picks up the content, resolves the geo data from Redis and passes the result to the RabbitMQ exchange. The script already shows the capability to use the geo-lite module and resolve the sender's location via a geoip lookup.



After being sent to the exchange, the messages are collected from the queue by a simple node.js server daemon. This daemon has just one job: it pushes the messages from RabbitMQ via broadcast to all connected websocket clients.



The frontend is as simple as the rest of the stack. On the client-side we use a bit of jQuery, the wonderful gmap3 library and socket.io.



The only limitation of this small toy is the one the browser imposes on us. I've tried to raise the number of data points displayed simultaneously, but my MacBook Air (late 2012) ran hot with more than 5000 points at once. This is related to the interaction with the event list that controls the heatmap layer.


The average message rate in the video was between 300 and 600 messages per second, but my screen recorder is only capable of 10fps, sorry ;).


trivago search destinations on a heatmap from Mario Mueller on Vimeo.

Friday, February 22, 2013

Using tunnelbroker with Apple's Airport Extreme

Inspired by the IPv6 episode of ChaosRadioExpress I tried tunnelbroker.net as my 6in4 provider, because Unity Media cannot get native IPv6 running for long-term customers (ridiculous: new customers get it without any problem). tunnelbroker enables you to build a static tunnel, which is quite problematic if you have a DSL connection or something similar that changes its IPv4 address frequently. I use my Raspberry Pi, running Arch Linux ARM, together with this little Ruby script to keep my public IP updated.


