xenji.com - My Dev Life

Sunday, May 4, 2014

Deploying Symfony2 apps with magephp and file system acls

Releases still seem to be the most frightening step in the chain of development events. I recently had a discussion with a coworker about the best way to deploy a PHP app, specifically about permission management on the server. We use magephp for our deployments, which does the same job capistrano did back in the day, but is way cooler because it's written in PHP and installable via composer.

Based on the discussion, here is my attempt to solve all problems and bring peace to the world:

Our deployment root should be e.g. /srv/www. In this directory we optionally have a domain dir (named after the reversed DNS name, like com.xenji), and in all cases an application dir that matches the subdomain it is hosted on (like blog). In this application dir we have the default mage release folder structure: current (a symlink), releases and our own addition "shared". We've borrowed the latter from capistrano, as we needed somewhere to put the logs and shared config files.

The final structure looks like this:

lrwxrwxrwx deployer nobody current -> releases/20140305105721
drwxr-xr-x deployer nobody releases
drwxr-xr-x nobody nobody shared

So far, great. We have a structure everybody can remember, all cool. But wait! Nobody? Deployer? Shared is only writable for the server process? Let's peek into the folders:

The releases folder looks quite normal:

drwxr-xr-x 9 deployer nobody 4.0K May 1 16:15 20140501141458
drwxr-xr-x 9 deployer nobody 4.0K May 1 16:30 20140501142942

The shared folder looks a bit different in terms of permissions:

$> ls -lah shared/app/
drwxr-xr-x 3 nobody nobody 4.0K May 1 10:03 .
drwxr-xr-x 3 nobody nobody 4.0K May 1 10:03 ..
drwxrwxr-x+ 2 nobody nobody 4.0K May 1 10:03 logs

The logs folder is linked into the release folder right before the commands for that release start (generating caches, etc.). You might have noticed the plus sign to the right of the folder's permission bits. It indicates that additional ACLs are set. You can fetch them with:

$> getfacl logs
# file: logs
# owner: nobody
# group: nobody
user::rwx
user:deployer:rwx
group::r-x
mask::rwx
other::r-x

The group nobody has no write permission on that folder (the group bits in the ls output show the ACL mask, not the group entry). Besides the webserver user, only the deployer user is allowed to write to the logs folder. The same applies to the app/cache folder. This solves the conflict between the deploy user and the webserver user.

The application folder looks like this:

...
-rw-r--r-- 1 deployer nobody 90K Apr 7 15:28 bootstrap.php.cache
drwxrwxr-x+ 3 deployer nobody 4.0K May 1 16:30 cache
-rw-r--r-- 1 deployer nobody 1.8K Feb 7 18:08 check.php
drwxr-xr-x 3 deployer nobody 4.0K May 1 16:30 config
-rwxr-xr-x 1 deployer nobody 867 Feb 7 18:08 console
-rw-r--r-- 1 deployer nobody 13 Feb 7 18:08 .htaccess
lrwxrwxrwx 1 deployer nobody 49 May 1 16:30 logs -> /srv/www/some/path/shared/app/logs
...
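
ACL entries like the ones on logs and cache can be created with setfacl. The following is only a minimal sketch of how a deployment step might build those commands. It is not the ACL task from our fork, and the helper name buildAclCommands is made up for this example. The getfacl output above shows no default ACL, so the second command (which makes newly created files inherit the entry) is an assumption about what you probably want.

```php
<?php
/**
 * Builds the setfacl commands that grant one extra user full access
 * to a directory. Hypothetical helper, not part of Magallanes.
 *
 * @return string[] shell commands, meant to be run on the target host
 */
function buildAclCommands(string $user, string $dir): array
{
    $entry = escapeshellarg("u:{$user}:rwx");
    $path  = escapeshellarg($dir);

    return [
        // Access ACL: applies to the existing directory tree (-R = recursive).
        "setfacl -R -m {$entry} {$path}",
        // Default ACL (-d): inherited by files and directories created later.
        "setfacl -R -d -m {$entry} {$path}",
    ];
}

foreach (buildAclCommands('deployer', 'shared/app/logs') as $command) {
    echo $command, PHP_EOL;
}
```

For the cache folder, which is owned by deployer, the same helper would be called with the webserver user (nobody in our setup) instead.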

The original author of Magallanes seems to have stopped active development on the project, so we've added plenty of features to our fork of Magallanes. Check it out on github: https://github.com/freshcells/Magallanes. You will find the ACL task, a new deployment strategy via remote cache (similar to the one you might know from capistrano) and more.

Monday, December 23, 2013

A use case for yield in PHP

With the arrival of PHP 5.5 some time ago came the new yield keyword as part of the language spec. I knew yield from other languages like ruby or python, but I never had a use case for using it myself. One of my current projects at freshcells now brought me to the point where I do not want to miss it again.
The task was to parse a 70 megabyte file of XML data and collect about 330k integer IDs from it. Each ID gets passed to another worker over a message queue. So far this is a good job for XMLReader, and my first attempt was to do it in the classic "push everything to an array" way. The result was a runtime of ~25 seconds including the file read, but with a memory footprint of about 180 megabytes.
As my primary intention was to distribute the received IDs via RabbitMQ, I did not need the whole array of IDs at once. This was the point where yield came into play. The change from the buffer version to the yield version was small, so it was worth a try.
The resulting method looks like this:

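public function parse($xmlContent)
{
    $xmlReader = $this->xmlReader;
    $xmlReader->XML($xmlContent, $this->encoding, $this->options);

    // Advance the cursor to the first <data> element.
    while ($xmlReader->read() && $xmlReader->name !== 'data') {
        // intentionally empty
    }

    // Handle the current element first, then jump to the next <data> node;
    // calling next('data') before the first yield would skip the first element.
    do {
        if ($xmlReader->nodeType === XML_ELEMENT_NODE && $xmlReader->name === 'data') {
            yield (new WrapperObject($this->encoding, $this->options))->parse($xmlReader->readOuterXml());
        }
    } while ($xmlReader->next('data'));
}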

The calling code is as easy as it could be:

foreach ($this->parser->parse($xmlListResult) as $id) {
    // do something
}

The result was amazing: a footprint of 86 megabytes and a runtime of ~15 seconds, down from about 180 megabytes and ~25 seconds. I hope you find some useful ways to bring yield into your own application, too.
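
If you want to play with the idea without the surrounding class, here is a self-contained sketch. The tiny XML string, the id attribute and the function name readIds are made up for the example (the real feed looks different), and the echo stands in for publishing to the queue.

```php
<?php
// Hypothetical sample feed; the layout of the real 70 megabyte file is not shown here.
$xml = '<list><data id="11"/><data id="12"/><data id="13"/></list>';

/**
 * Yields one integer ID per <data> element instead of buffering them all.
 */
function readIds(string $xml): Generator
{
    $reader = new XMLReader();
    $reader->XML($xml);

    // Advance to the first <data> element (or to the end of the document).
    while ($reader->read() && $reader->name !== 'data') {
        // nothing to do, just moving the cursor
    }

    // Emit the current element, then jump to the next <data> node.
    do {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'data') {
            yield (int) $reader->getAttribute('id');
        }
    } while ($reader->next('data'));

    $reader->close();
}

foreach (readIds($xml) as $id) {
    echo $id, PHP_EOL; // a real consumer would publish the ID to the queue here
}
```

Only one ID exists in userland at any time, which is where the memory saving comes from. If you do need all IDs at once, iterator_to_array() turns the generator back into an array.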

Thursday, July 11, 2013

The triangle of performance?

I've been thinking a lot about carrying a common project management pattern over into the world of web performance. The iron triangle, which Jeff Atwood already wrote about back in 2006, is something we all know from our jobs. Now my question is whether there are other areas of our IT world where such a triangle of choices can be found. One rather unfinished example I found from the performance perspective is the relation between data liveness, response time and server load. The relation is not perfect, as the "server load" - "liveness" side does not fully oppose "response time", but it might be a start for a discussion about whether such patterns exist and where they reside. Do you have other examples? I'd like to hear them.

Monday, June 10, 2013

using facebook's React with require.js

I recently stumbled upon facebook's React library, which is a Javascript library for building reusable frontend components. Even though this lib is only at version 0.3.x, it is very stable, fast and fun to code with. I'm a big fan of require.js, so I tried to use React within the require.js ecosystem. It was not as hard as expected, and here are some examples and some thoughts about it.

This is a menu component I use for a twitter bootstrap based project. The eventbus is based on EventEmitter2.

What I like most about using require.js together with React is that you can write a component in one require.js module definition and export just the visible component part. The second advantage is the naming. If you use JSX, you can influence the tag names and do some kind of vendor prefixing for your required tags.

In the end, the construction of the menu looks like this:

update:
The way to get things working was quite easy. React is available via Bower, so just add it to your bower.json file:

Afterwards, add it to your require.js config. I decided to change my bower install path to /public/js/components, my application scripts reside within /public/js/app.

After setting things up like this, the previously mentioned examples work like a charm.

Thursday, May 16, 2013

Book review - PHPStorm Starter (Packt Instant)

The nice people at Packt recently asked me to review one of their new books from the INSTANT series, a focussed line of books concentrating on one single topic. Although PHPStorm can become a big one, this book keeps things straight.

I am talking about: Instant PhpStorm Starter

The target group for this book are beginners, and I mean real beginners. After the typical "how to download" and "how to install", the author gives a short introduction of about five steps to bootstrap the reader and get her/him ready to rock. This intro is quite OK, as long as you have never used anything other than notepad.exe before. Most of the stuff is very common and feels a bit like a transcript of the manual.

The chapter "Top features you need to know about" is mostly about code templates, file templates and shorthand operations in the editor. Those hints are useful and even a power user might feel surprised to find the one or the other little trick that she/he did not know about.

The part about the "High-level programming operations for the PHP language" sounds better than it (sadly) really is. Accessing the documentation, instance diagrams and the autoformatter do not count as high-level programming functions for me. One thing I use every day in PHPStorm is missing: the function, file and symbol search.

The last big chapter is about using a VCS with PHPStorm. This is another chapter that I feel unsure about. I am a very intense console user when it comes to VCS, esp. git. This makes it hard for me to judge if this part of the book is helpful for beginners. I think it might be, esp. if they come from the eclipse world. From my point of view, the VCS chapter should have been replaced with a chapter about the plugin architecture of PHPStorm, a careful selection of helpful plugins and some more "navigating through your code" hints. The hint for the structure- and hierarchy view is totally missing.

When I try to imagine the point of view of someone who comes from, say, the eclipse world, I would have been very interested to learn that PHPStorm handles eclipse project files very well (this is because IntelliJ IDEA, the Java IDE from JetBrains, is a commercial competitor of the eclipse Java Development Tools) and that I could use both side by side without being forced to choose one over the other.

Conclusion:
If you have never used an IDE before, you will learn a lot of new things from this book. If you are an experienced IDE user of any flavor, this book will show you some low-level tricks in PHPStorm, but things I personally consider important, like code navigation, are missing. If you have used any other JetBrains IDE product before (like RubyMine, PyCharm, WebIDE, etc.), most of this book is useless for you.

Monday, April 8, 2013

MySQL analysis with AnalyzeMySQL

I recently joined the scalability team at trivago and my first job was to get into the actual MySQL schema to discover which parts can be improved regarding performance and space consumption.

trivago's database consists of about 230 tables, many of them having a history of more than four years, some of them up to 7 years. A few of them have some historic flaws, like a signed bigint as primary key (because nobody could imagine how much space would be needed in a few years), others have indexes that are no longer used, etc. There are many tasks to do until you get to know the database schema very well.

In order to support my work I started to write a little tool, which was meant to perform some basic tasks on the schema. The first one was to see which column definitions are much bigger than the actual maximum value they hold.
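
To give an idea of what such a check boils down to, here is a minimal sketch. It is not taken from the AnalyzeMySQL code; the function name smallestUnsignedType is made up for this example, and it assumes a non-negative integer column whose maximum value was fetched beforehand, e.g. with a SELECT MAX(...) query.

```php
<?php
// Upper bounds of MySQL's unsigned integer types, smallest first.
// BIGINT UNSIGNED exceeds PHP's integer range, so it acts as the fallback.
const UNSIGNED_LIMITS = [
    'TINYINT'   => 255,
    'SMALLINT'  => 65535,
    'MEDIUMINT' => 16777215,
    'INT'       => 4294967295,
];

/**
 * Returns the smallest unsigned integer type that can hold $maxValue.
 * Hypothetical helper for illustration only.
 */
function smallestUnsignedType(int $maxValue): string
{
    if ($maxValue < 0) {
        throw new InvalidArgumentException('Negative values need a signed type.');
    }
    foreach (UNSIGNED_LIMITS as $type => $limit) {
        if ($maxValue <= $limit) {
            return $type . ' UNSIGNED';
        }
    }
    return 'BIGINT UNSIGNED';
}

// $maxValue would come from the database, e.g. SELECT MAX(id) FROM some_table.
echo smallestUnsignedType(1200000), PHP_EOL; // MEDIUMINT UNSIGNED
```

Comparing the returned type with the declared column type shows where a definition is larger than the data requires. Whether shrinking it is wise still depends on the expected growth of the table.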

What came out of this approach was a small framework-ish script collection that parses the table structures and gives programmatic access to them. The framework uses a plugin structure to keep it easily extensible.

I'll try to write another post to explain the creation of such a plugin as soon as possible. In the meantime I would like to invite you to have a look at the (yet unfinished and unpolished) code and provide me some feedback.

You can find the code on github: https://github.com/xenji/AnalyzeMySQL

Saturday, March 30, 2013

Visualizing trivago's search traffic on a heatmap

While evaluating a message queue for trivago's architecture I wrote a little demo to present RabbitMQ in one of our internal meetings. The demo should take live searches from the trivago platform and display them on a map using a heatmap layer. It should show that the MQ is able to cope with the amount of traffic that would hit the queues and that this type of technology solves some of our current near-realtime problems.

As I was not allowed to touch the live system's code for it, the data needed to come from somewhere else. The accesslog seemed a proper source of information, as our scribe logger delivers it in near-realtime. We use two different parameters to determine the location the visitor wants to search for: one is an integer, the other one is a list of integers. A "nice to have" feature is to show the visitor's location. The scripts already support this feature, but the screen recording just shows the destinations. Log parsing? AWK to the rescue!

To resolve the IDs to a geo coordinate we did some pre-caching using Redis and a simple ID => geo construct.

The AWK script writes its result to STDOUT, where a simple node.js script picks up the content, resolves the geo data from Redis and passes the result to the RabbitMQ exchange. The script already shows the capability to use the geo-lite module and resolve the sender's location via geoip lookup.

After sending it to the exchange, the messages are collected from the queue by a simple node.js server daemon. This daemon does only a simple job and pushes the messages from the RabbitMQ via broadcast to all connected websocket clients.

The frontend is as simple as the rest of the stack. On the client-side we use a bit of jQuery, the wonderful gmap3 library and socket.io.

The only limitation of this small toy is the one the browser imposes on us. I've tried to raise the number of data points displayed at the same time, but my MacBook Air (late 2012) ran hot with more than 5000 points at once. This is related to the interaction with the event-list that controls the heatmap layer.

The average message rate in the video was between 300 and 600 messages per second, but my screen recorder is only capable of 10fps, sorry ;).