Recent Journal Exploits

Wed Jun 15, 2011, 11:01 AM by sgrahamUK
Q. What happened?

Starting around 05:45 PST on Wednesday 15th June, we started to receive reports of "Journal Exploits" doing the rounds and we began to investigate them.

It quickly became apparent that someone had crafted a link, hidden behind a URL-shortening service, that caused people who visited it to post a journal on deviantART.

The content of the journal was sometimes an insulting message directed towards the journal's owner.  Other times, it was a copy of the link itself, making it more likely that more people would see it and click on it, causing the exploit to propagate further.

We quickly deployed a temporary fix that blocked the URL at our outgoing link page, to prevent people from clicking on the link if they saw it in someone's journal. While this wasn't a comprehensive fix, it was one that we could deploy within 10 minutes of the matter coming to our attention, while we worked on a full solution.

The simple explanation of the problem being exploited is that the link was crafting a hidden form submit to your journal page. Because it was submitted by your web-browser, it appeared to be you submitting a journal via the normal process, and so would be posted under your name.

This is what's known as a "confused deputy" attack, and we have protection against it as part of our standard suite of security tools. Unfortunately the journal submission process wasn't utilizing it.

Once the problem was diagnosed we added the missing protection to the journal, tested and then deployed it approximately 2 hours after the exploit came to our attention.
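The usual defence against this kind of forged submit (commonly called cross-site request forgery) is a secret token tied to the user's session that a third-party page cannot read. The following is only a minimal sketch of such a check, not deviantART's actual code; `session.csrfToken`, `form.token` and both function names are hypothetical.

```javascript
// Hypothetical sketch: reject form posts that lack the session's secret token.
// Assumes session.csrfToken was generated server-side by a secure random source
// and embedded as a hidden field in every legitimate journal form.

// Compare two strings without exiting early on the first mismatch.
function safeEqual(a, b) {
  if (typeof a !== "string" || typeof b !== "string" || a.length !== b.length) {
    return false;
  }
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}

// A forged cross-site submit carries the victim's cookies but not the token,
// because the attacking page cannot read the form that contains it.
function isGenuineSubmit(session, form) {
  return Boolean(session.csrfToken) && safeEqual(session.csrfToken, form.token);
}
```

With a check like this in front of the journal handler, a hidden form posted from another site would be refused because it has no way of supplying the matching token.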

We then began the process of cleaning up the journals that were added during the attack.

Q. How many people were affected?

The first journal to have been posted via this exploit appears to have been at 23:50 PST on Tuesday 14th June, almost 6 hours before the matter was brought to our attention.

During the 7 hours the exploit was active, just over 2300 journals were posted via it.

Q. Should I change my password? Are my personal details at risk?

You don't need to change your password and your personal details were not exposed in any way.

At no point was your account integrity compromised or your password available to the attacker.

The nature of the vulnerability was strictly limited to being able to prefill a journal form and act as if you had clicked on "submit" within your browser.

The Faving spam counter-measures

Mon Jun 13, 2011, 3:07 AM by allixsenos
The weekend before Memorial Day was an interesting one.

The TL;DR version:
  • A user was banned permanently for running an automated script against the site, faving the universe & more.
  • That kind of mass-faving just to draw attention to themselves provides zero value to the community and causes issues on the site.
  • To curb that behaviour, faving was put under the guard of the same anti-spam filter used throughout the site.
For a more detailed rundown of what happened, including information about the filtering that was implemented, you’ll have to read through the post. ;)

Site issues
At around 6am PST on Sunday, the dT team discovered an increasing number of database issues affecting one of our servers that stores deviant data (deviations, favourites). On deviantART, all of the user-generated data is spread evenly across a number of servers, so seeing only one of them affected by unusually high load to the point of causing issues raised some eyebrows, especially considering 6am on a Sunday is usually a calm period.

Looking into that specific server, it was immediately noticed that sporadic issues started happening around 1am, reached a steady rate of 20 issues per second at 2am, and had been rising by another 20 per second with every hour. By 6am it had reached 100 issues per second.

One hundred issues per second directly translates to 100 failed page views per second. Not good.

The activity log for the problematic server revealed a long list of seemingly identical queries all related to a single deviant, piling on at a rate that could not have been caused by regular use of deviantART, no matter how quick you are with the mouse. The activity? Faving of what seemed to be damn near everything in sight.

The rate at which these requests were coming in told us without a doubt this was not a human-generated activity but a script or a bot running against deviantART, impersonating a deviant. Since affecting the site’s stability, especially in this way, is explicitly against our Terms of Service, the deviant generating these requests was banned, after which the issues immediately subsided and remained at a flat 0 for the remainder of the morning, as can be seen in the following image.

The banned deviant had accumulated almost half a million favourites in the short 2 months they had been a member, which comes out to an average of one fave every 10 seconds, 24 hours a day, every day since they joined. This is not a rate a human being could sustain.
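As a quick sanity check of that average, here is the arithmetic spelled out. The figure of 61 days for "2 months" is an assumption, and the variable names are made up for the example.

```javascript
// Back-of-envelope check of the rate quoted above.
const totalFaves = 500000; // "almost half a million"
const daysAsMember = 61;   // assumption: two months taken as 61 days
const secondsPerFave = (daysAsMember * 24 * 60 * 60) / totalFaves;
console.log(secondsPerFave.toFixed(1)); // prints "10.5"
```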

Introducing the velocity filter on Faving
Favouriting deviations was one of the last remaining user actions on deviantART that had no anti-spam measures implemented because, unlike commenting on deviations and deviant profiles, and posting in forums, it gives a very limited exposure to the person doing it, making it very difficult to find motivation to abuse it. Even our own realitysquared recently made a Journal in which he explained that there’s no such thing as “abusing Faves” or giving out too many, and asking deviants to stop reporting instances of that behaviour to helpdesk.

After the incident described above and after looking into the amounts of faves other deviants have accumulated, it was decided a limit should exist, but be set sufficiently high, so that no normal user could trigger it. To accomplish this, a velocity filter - the same mechanism already in place all over the website - was to be implemented for faving. The limits were adjusted for faving to be much more forgiving than for other activity, taking into account the reduced visibility of faves and the fact that it’s legitimately possible to hand out dozens in the time it takes to write a single comment. Our intention was to curb scripted mass-faving (or “favbombing” as the community has come to call it), while allowing near-unlimited faving by genuine deviants, within reason.
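The velocity filter itself is not described in this post, and it is not a plain "this many in this much time" rule. Purely to illustrate the general idea of limiting how fast one user can repeat an action, here is a simple sliding-window sketch; the class name, the limit and the window length are all hypothetical.

```javascript
// Illustrative sliding-window limiter; NOT the filter deviantART uses.
// maxActions and windowMs are hypothetical values supplied by the caller.
class SlidingWindowLimiter {
  constructor(maxActions, windowMs) {
    this.maxActions = maxActions;
    this.windowMs = windowMs;
    this.history = new Map(); // userId -> timestamps of accepted actions
  }

  // Returns true if the action is allowed at time nowMs, false if denied.
  allow(userId, nowMs) {
    const cutoff = nowMs - this.windowMs;
    const recent = (this.history.get(userId) || []).filter((t) => t > cutoff);
    const allowed = recent.length < this.maxActions;
    if (allowed) {
      recent.push(nowMs);
    }
    this.history.set(userId, recent);
    return allowed;
  }
}
```

A limiter of this shape only counts accepted actions, so a script that keeps hammering after being denied does not extend its own lockout; whether the real filter behaves that way is not stated.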

At around 4pm the same day, the velocity filter was implemented and the first reports from users hitting the limits started coming in. The limits were obviously set too low, and the affected users were very vocal about it. Taking that feedback, another round of discussion was had and the conclusion was to increase the limits to a value high enough that it would be theoretically impossible to hit even during a faving frenzy. After the changes were implemented, only a handful of new complaints were registered, all from deviants with unusually high numbers of faves.

The weekend following the incident, almost a week after the initial rollout, we gathered some stats for this blog post, to give us a view of how many deviants were still hitting the limits. The stats were gathered from Friday night to Tuesday morning, and the velocity filter was hit a total of 30,900 times by 500 unique users. Of that, over 40% of the hits were produced by just 10 deviants, with one deviant in particular being responsible for 10% of the hits that weekend. The user was attempting to favourite at 300 times a minute (or 5 times a second) for several hours. The numbers here indicate the number of times a faving action was denied, once the user has reached the limit and continued trying to fave.

We are still seeing the occasional favbomb run; for example on Tuesday the 7th we saw a noticeable spike in velocity filter activity, caused by one or more deviants hitting the limits 8 times per second, as shown in the image below.

Since that unusual spike subsided, we’re seeing an average of fewer than 1 fave in 5 seconds being rejected, with obvious attempts at favbombing cropping up here and there and quickly giving up.

How much is too much?
It’s important to note that the velocity filter, which as mentioned is the same mechanism providing one of many anti-spam measures to all of dA including comments, devwatch and notes, is not a single number or a “this many in this much time” thing. It’s a special sauce that includes some fancy maths, cooked up by dT’s smartest. Because of this, there is no single answer to the question that will surely come up - “how many faves can I give out?”. There’s still no limit to the absolute number of faves a user can give out and we’re not looking to change that.

The limits are meant to prevent server issues like the ones we experienced on Memorial Day weekend, but also to stop users from using faving purely as a method of drawing attention to themselves at a massive scale while providing zero value to the rest of the community. We hope you’ll agree these changes are for the good of everyone on the site.

- allixsenos & KnightAR

We're all remote

Mon Mar 21, 2011, 10:09 AM by kemayo
Many companies allow you to work from home, either full-time or a few days a week. Sometimes it's a privilege, granted to the top performers. Unfortunately this often leads to a situation where the "core team" is in a central office, and the remote developers are marginalized because they're not visible. In that case choosing to use your remote working benefits can be very bad for your career.

For us remote is just the way we work. The deviantART hq is in Los Angeles, and no developers live close enough to visit casually. I think the closest developer is in San Francisco. The nearest developer to me is about 250 miles away. Excepting the search team (who are based in Vancouver), to the best of my knowledge no two developers even live in the same city.

Now, this does come at some cost to us! We have to make an effort to keep our intra-team communication working, whereas a company with all their employees sitting next to each other gets a certain baseline for free.

How do we work?

We use Skype. A lot. The entire company communicates through a number of Skype chat rooms. Each team gets a chat, some topics get a chat, and all projects wind up with one. There's a trac install in use for issue tracking and wiki. Some teams prefer using Basecamp or other tools instead of trac, and they're free to do so.

As an earlier article mentioned, all our scattered developers get a virtual machine running a functioning copy of deviantART to develop on, so they're not tied to development servers in a central office.

Agile development

We have a modified agile system in use, which has evolved over time to meet our needs, and works reasonably well for us. We tried standard agile methodologies, and found they were a bit awkward with a remote team. Pair programming or a scrum, for instance, just aren't the same without colocation.

Our system revolves around weekly iterations, of which we are on our 174th since starting this system.

Each week on Monday our teams demo their progress to each other and to their internal customers on one big demo call that everyone attends. The demos are short-and-sweet; 3-6 minutes is standard. The general format is "our expectations for this week were to do X; we did that / didn't do that; here's supporting information".

After the big demo we break off into team-specific calls where the lead developers and their customers work out the next iteration's expectations. Some teams will decide to iterate at a different rate for one reason or another; it's not uncommon to discover that you need to make a quick prototype or view the analytics on a quick change to decide what to do next.

Throughout the iteration it's up to the developers on a given project to decide how they communicate. Some projects wind up doing daily check-in Skype calls, others work entirely in text chats. Since we have developers in widely disparate time zones it's again up to the developers on a project to work out amongst themselves times when they're all available to exchange information; there's no standard "office hours" required by the company. (Though people do gravitate towards Pacific time, it seems...)

Project teams

A normal project consists of a customer, a lead developer, and 0-2 other developers. If it involves changing the site's UI then a member of the UI team is assigned to work with the project, because UI is hard. Similarly if it involves changes to our infrastructure then a member of the Tech Ops team may be attached.

"Lead developer" is a term which can mean almost anything in this industry. To us it means the developer whose job is to coordinate the work of other developers on the team, and to work out with the customer what the project's expectations are. The lead is still an active coder on the project; we've yet to have a project with enough overhead that a pure manager is required.

The "customer" can be almost anyone. We pick someone in the company who we think can represent the needs of the project. So on a purely technical project it might be someone in dT; for a new feature it might be the product lead who championed the feature; for changes to our print store it'd be someone in retail, etc.

Team size is flexible, and depends on the project. Some teams do wind up with just a lead developer, if the expected workload is low. However, we prefer to put more than that on a team to make sure that several people will go over the code produced.


We do have one long-running project team called Reactor. Its role is to fix bugs that aren't related to other active projects, and to implement features that aren't large enough to warrant spinning up a separate project. It's also where we assign all new hires initially, so its lead developer gets to mentor them and introduce them to our codebase. Reactor makes a good training ground since it's guaranteed to drag developers through disparate areas of deviantART.

Face time

Working remotely works well for us, but we like to supplement it with occasional face-to-face meetings to help make sure that everyone knows everyone else. When collaborating over Skype it's easy to just never interact with someone whose work isn't related to your area, after all.

Thus every year for the holidays deviantART flies everyone in to hq for our holiday party. We spend a week working a little and socializing a lot, to cement our sense of "team".

Particularly complicated projects may also involve getting everyone involved together in one place while the general shape of the project-to-come is hammered out. For instance, before the groups project launched in late 2008, the whole team spent a week in Canada debating what groups should be, and wrote the first prototype of the new profile page widget system while they were up there.

Better ways

A pseudo-regular activity in our off-topic developer chat is trying to find a better alternative to Skype for the text-chat portions of our communications. (Skype chat is great if you like the Skype client; if you don't then you're stuck with it anyway.) We've run through a number of options, but it's tough finding one which is:

  • non-technical enough for our non-developers
  • enough better than Skype to make it worth the hassle of switching everyone over

IRC is a common contender, but tends to fail on the latter point.

How have similar systems worked for you? Do you have a better pure-remote development methodology that we've not heard of?

I recently worked on a project that tried to improve the performance of DeviantArt muro.  As part of this project I did a lot of testing and benchmarking of various HTML5 operations.  I learned a lot about where one must be careful when writing web applications that use HTML5’s <canvas> element.  The following is a diary of sorts that I made while working on the project.  Of course your mileage may vary depending on the setup of your application, but as you will see, DeviantArt muro realized significant performance improvements when I applied the lessons I learned.


For those who don't want to read this whole long and rambling article, here are the main rules to live by as suggested by my testing:

  • Reduce direct pixel manipulation as much as possible.  Use the line drawing API when possible, and when you must sample pixels get as few as possible.

  • Rendering shadows, especially ones with a high blur, will greatly reduce your performance.  You can draw quite a few shadowless lines in the time it takes to draw a single one with a soft shadow.

  • Bundle as many lines as possible into a single call to stroke().

The Setup

Since I was interested in a scenario similar to what DeviantArt muro will often see, I ran all tests on a canvas that was 1200px wide and 500px tall.  This would be the approximate size of the drawing area if a user with an average sized monitor maximized their browser window.  All tests were run on my laptop (2009 MacBook Pro with 3.06GHz Intel Core 2 Duo processor, 8GB RAM, and Intel X25-M hard drive) with minimal other apps running (Terminal, vi, and standard background tasks).  The browsers that I tested on were: Firefox 3.6.13, Safari 5.0.3 (6533.19.4), and Google Chrome 8.0.552.237.  Late in the game a colleague asked me how Firefox 4 beta compared, so I re-ran some of the tests using Firefox 4.0b10.

Some people will surely ask how the Internet Explorer 9 release candidate stacks up with the rest of the browsers.  I apologize that I did not run the tests using IE9 because I did not have a Windows machine handy, and did not feel it would be accurate to run these tests on a virtual machine.

For all the tests I would time how long it took to run a small section of code a bunch of times, and then subtract the time needed to run the same code without the critical bit I was testing.  This would mean that little costs of doing things to prepare the test would not contribute to the time measured by the test itself.  The results shown here are averages of running the tests several hundred times each.  Though I did not calculate standard deviations for each result, I kept an eye on the time distributions to make sure that they did not change too much from test to test.  
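The subtract-the-baseline approach described above can be sketched roughly as follows. This is not the harness used for these tests; `timeLoop`, `netCost` and the injectable clock are hypothetical names introduced for the example.

```javascript
// Sketch of timing a critical operation by subtracting a setup-only baseline.
// timeLoop, netCost and the injected clock are hypothetical names.
function timeLoop(fn, iterations, now) {
  const start = now();
  for (let i = 0; i < iterations; i++) {
    fn(i);
  }
  return now() - start;
}

// Net cost per iteration of the critical code: time setup + critical code,
// then setup alone, and subtract the second from the first.
function netCost(withCritical, setupOnly, iterations, now = Date.now) {
  const full = timeLoop(withCritical, iterations, now);
  const baseline = timeLoop(setupOnly, iterations, now);
  return (full - baseline) / iterations;
}
```

In a browser, `withCritical` might pick two random points and draw a line between them, while `setupOnly` would only pick the points, so the cost of generating random numbers drops out of the result.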

These tests were meant to give me just a ballpark idea of what is important and what is not. They were not intended to be official benchmarks or the basis of an academic paper analyzing the algorithms various browsers are using, so please take the results with that grain of salt.  A browser's JavaScript performance depends on a wide variety of factors that were not part of these tests.  To judge a browser based on these results would be misguided.

Drawing Lines

The first thing I tested was basic line drawing: moveTo() a random location on the canvas, lineTo() a different random location.  I did the test once using a single fully opaque color, and again using various random colors and opacities.  This was meant to give me an idea for how much penalty one pays for making a canvas calculate more intricate blending.  I also did the same tests using quadraticCurveTo() and bezierCurveTo() so I could see how much more expensive it is to draw with smooth lines.  Of course, if you are using one of those functions in your app you will also have the overhead of having to calculate the proper control points to use.

There is not much that is surprising here.  The four browsers that were tested performed pretty similarly.  Using more mathematically complex curves comes at a cost.

Next I wanted to see if it makes a difference how often one calls stroke() when drawing with lines.  I ran tests where I compared calling stroke after each line segment was drawn and where I drew a number of line segments and then stroked them all at once.  As can be seen by the approximately linear graph, stroke() takes close to the same amount of time each time it is called.  If you can draw 50 line segments before calling stroke(), you can save in the neighborhood of 20% of the cost of drawing.
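To make the batching idea concrete, here is a small sketch that builds one path from many segments and strokes it once. The `segments` array shape and the function name are assumptions made for the example; the context methods are the standard canvas 2D ones.

```javascript
// Draw many segments with a single stroke() call instead of one per segment.
// ctx is a CanvasRenderingContext2D; segments is an array of
// {x1, y1, x2, y2} objects (a hypothetical shape chosen for this sketch).
function strokeBatched(ctx, segments) {
  ctx.beginPath();
  for (const s of segments) {
    ctx.moveTo(s.x1, s.y1);
    ctx.lineTo(s.x2, s.y2);
  }
  ctx.stroke(); // one stroke for the whole batch
}
```

Everything in one path shares a single strokeStyle and lineWidth, so this only applies to runs of segments drawn with the same settings.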


Shadows are a really useful tool, not only for drawing actual shadows, but for anything that needs a nice soft edge.  However, that soft edge comes at a really high price.  For a while now, deviantART has realized that WebKit browsers struggle when we use a lot of shadows.  I was really curious to get a handle on just what was going on that made Gecko and WebKit browsers behave so differently.  When one times how long it takes to draw straight lines with various amounts of shadowBlur, a really interesting graph appears.  WebKit browsers can draw small shadows quickly, but as shadowBlur increases their rendering time increases slightly worse than linearly.  Firefox, on the other hand, renders shadows at near constant speed.  If the shadows do not have much blur it is slower than the WebKit browsers, but when shadowBlur gets up to 100 it is four times faster.

Interestingly, Firefox 4 beta now has performance closer to that of the WebKit browsers.  The shadows of the same blur in Firefox 4 are also a lot wider than they were previously (Firefox 3 has always had smaller shadows than WebKit).  I do not know the details, but it would seem that the canvas spec must be settling on a softer, but more computationally intensive shadow as its reference.

Buffer Copying

Earlier profiling that I did showed that the worst performance issues in DeviantArt muro came from having to move buffers around at an inopportune time.  Any complex graphics app is going to have to store and/or move image data around, and I was really curious about what the best way to do this is.

There are a number of different ways to get at the data that is on a canvas.  One can use drawImage() to copy the contents of one canvas to another canvas.  You can get the contents of a canvas as a base64 encoded PNG by using toDataURL().  You can also get essentially an array of pixels using getImageData().  As can be seen below, the toDataURL() method is the clear loser; apparently the cost of encoding the data is pretty high.  Which of the other two methods to use is a little less clear until Firefox 4 usage is widespread.  As can be seen, Firefox 3 has some problems getting and putting image data quickly, but WebKit browsers are much faster at that than using another canvas element as a buffer.

An advantage of using getImageData() is that it can sample a portion of the canvas.  I did the same getImageData() test, but sampled squares of increasingly larger size, and for all four browsers tested, getImageData() had close to constant speed per pixel sampled.  Before I had thought that the overhead of getting any pixels would be large, so I would sometimes sample more than I needed if I thought there was information that I would be needing at a later point.  As this graph shows though, it is better to grab only what you need, because you do not pay a noticeable penalty for sampling a second time down the road.
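One way to "grab only what you need" is to compute the bounding rectangle of the points a stroke touched and read back just that region. The sketch below assumes at least one point and a padding greater than zero (for example the brush radius); the function names and the `points` shape are hypothetical.

```javascript
// Read back only the rectangle that a stroke actually touched.
// points is a non-empty array of {x, y}; pad (> 0) covers the brush radius.
// width and height are the canvas dimensions. All names are hypothetical.
function boundingRect(points, pad, width, height) {
  let minX = Infinity, minY = Infinity, maxX = -Infinity, maxY = -Infinity;
  for (const p of points) {
    minX = Math.min(minX, p.x);
    minY = Math.min(minY, p.y);
    maxX = Math.max(maxX, p.x);
    maxY = Math.max(maxY, p.y);
  }
  // Expand by the padding, then clamp to the canvas.
  const x = Math.max(0, Math.floor(minX - pad));
  const y = Math.max(0, Math.floor(minY - pad));
  const right = Math.min(width, Math.ceil(maxX + pad));
  const bottom = Math.min(height, Math.ceil(maxY + pad));
  return { x, y, w: right - x, h: bottom - y };
}

// ctx is a CanvasRenderingContext2D.
function sampleDirtyRegion(ctx, points, pad, width, height) {
  const r = boundingRect(points, pad, width, height);
  return ctx.getImageData(r.x, r.y, r.w, r.h);
}
```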

Applying to dA muro

While all this data is somewhat interesting, one must wonder how much the knowledge helps in performance tuning a web application.  Of course everybody’s mileage will vary on this; I am sure that there are programmers out there who have a much better intuition for speed optimizations than I do.  They would have written faster code than I did right from the get-go.  However, I think that the code I started from was probably fairly typical of what one might expect from an experienced coder who was fairly new to HTML5 and used all of the available APIs in the interest of making simple and straightforward code in preference to premature optimizations.

The main lesson I learned is that any kind of getImageData() call or canvas copy should be avoided if at all possible, delayed until a “down time” if it cannot be avoided, and, if all else fails, restricted to only the pixels that you absolutely need.  It is alright to call lineTo() many more times if it means you can avoid a call to getImageData().

The first place that I tried to optimize was measured by a test that simulates a user making a bunch of short strokes relatively close to one another.  An artist would typically do this if they were cross hatching, stippling, or applying a Van Gogh-esque texture to their drawing.  A lot of the changes that I made are particular to the internals of DeviantArt muro, and a description would not make sense to somebody unfamiliar with our codebase.  However, I will describe two of the optimizations.  When a user draws a line, the new line needs to be pushed into an undo buffer, and it also needs to be reflected in the zoom “navigator” panel that is in the corner of the screen.  These two tasks cannot avoid some kind of buffer copying, but smarter buffer copying led to some noticeable speed improvements as can be seen below.  Note that Firefox 4 is not shown in the first three sets of results because I did not start testing it until later (and I did not feel like re-coding all my inefficiencies just to see how much it improved).

The next place I turned my attention was individual brushes that were taking longer than they needed to.  In most cases this came from unnecessary calls to clearRect() or fillRect() (these calls perform similarly to putImageData() calls).  The bulk of deviantART muro’s brushes are now quite a bit faster.  Once again, Firefox 4 is not in these results because I did not benchmark it at the beginning of the project.

Web Workers

Next I looked at the filters in DeviantArt muro.  To give context to why filters are slow, one must understand that an imageData object consists mostly of an array of pixel values with R, G, B, and A values stored separately.  Thus, the imageData array has width*height*4 elements.  Most filters need to look at the data surrounding a pixel in order to determine the new value for a pixel.  Let’s say that the filter looks in a radius of 3 pixels (so a square of 7 pixels to a side). That means it needs to know the value of 196 array entries (7 × 7 pixels × 4 channels) in order to color a single pixel.  JavaScript array lookups are not particularly fast, so applying a filter to a large canvas can be painfully slow.
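To show where the 196 lookups come from, here is a sketch of a plain box blur for a single pixel that also counts its array reads. It is only an illustration, not one of muro's filters; the function name and the `reads` counter are made up for the example.

```javascript
// Box-blur one pixel of an ImageData-style RGBA array, counting array reads.
// data is a Uint8ClampedArray of width*height*4 entries, as in imageData.data.
// The function name and the returned "reads" counter are illustrative only.
function blurPixel(data, width, height, px, py, radius) {
  const sum = [0, 0, 0, 0];
  let samples = 0;
  let reads = 0;
  for (let y = py - radius; y <= py + radius; y++) {
    for (let x = px - radius; x <= px + radius; x++) {
      if (x < 0 || y < 0 || x >= width || y >= height) continue; // off-canvas
      const i = (y * width + x) * 4; // index of this pixel's R value
      for (let c = 0; c < 4; c++) {
        sum[c] += data[i + c];
        reads++;
      }
      samples++;
    }
  }
  return { rgba: sum.map((v) => Math.round(v / samples)), reads };
}
```

For a pixel away from the edges and a radius of 3 this performs 196 reads. Over a 1200 by 500 canvas that comes to roughly 117.6 million array lookups for one filter pass.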

I figured that the good news about the filter problem was that it is something that can be easily parallelized. Most modern computers have at least 2 processor cores, so it is a shame to leave one of those idle while a single browser UI thread is churning away.  So, I prototyped a change that split the canvas into several chunks and passed the filtering off to some web worker threads.
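Splitting the canvas into chunks could look something like the sketch below, which cuts it into horizontal bands and widens each band's read area by the filter radius so that edge pixels still see their neighbours. The prototype's actual chunking is not described, so `splitIntoBands` and its return shape are hypothetical.

```javascript
// Split a canvas of the given height into horizontal bands, one per worker.
// Each band's read area is extended by `radius` rows on both sides so a
// filter can see neighbouring rows. Name and return shape are hypothetical.
function splitIntoBands(height, workers, radius) {
  const bands = [];
  const rowsPerBand = Math.ceil(height / workers);
  for (let top = 0; top < height; top += rowsPerBand) {
    const bottom = Math.min(height, top + rowsPerBand);
    bands.push({
      top,                                // first row this worker writes
      bottom,                             // one past the last row it writes
      readTop: Math.max(0, top - radius), // first row it needs to read
      readBottom: Math.min(height, bottom + radius),
    });
  }
  return bands;
}
```

Each band's rows could then be read with getImageData(0, readTop, width, readBottom - readTop) and handed to a worker.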

The first problem I ran into is that web workers do not have a concept of shared memory.  Data passed to and from them must go through calls to postMessage().  An article on the Mozilla blog indicated that internally these messages are passed as JSON strings.  This is a problem for a task like filters that are operating quickly on a very large data set.  The cost of JSON encoding is not small compared to the cost of the actual computation.  Note also that in WebKit browsers you cannot assign an array reference into an imageData’s data, so you have to pay the penalty of doing a memory copy from the JSON decoded array into an imageData object.

The results of my experiment were mixed.  Firefox 3 was quite slow before the change, and sped up by a factor of 3 when it was parallelized.  Safari, on the other hand, spent a long time churning before the threads even started executing (I cannot be sure, but I suspect that this was while the JSON encoding was happening), and then for some unknown reason the multiple threads each took a lot longer than the single UI thread.  Chrome’s threads ran very quickly at first, but then it sat for a little while before returning the data to the UI thread.

I spent a little bit of time trying to debug these issues, but eventually gave up.  From my experiences I would say that web workers are a cool technology that will be really useful someday, but at the moment some browsers are not quite ready for this particular use case.

Below you can see the CPU utilization of the two cores of my machine when the filtering code is running in a single thread vs when it is running in web worker threads.

This article assumes that you are familiar with the concepts of MVC programming in the context of web development. If you are not, Wikipedia has a decent overview in its entry about it.

The problem

An issue that MVC doesn’t address directly is how to efficiently group data access calls together across scopes. By “grouping data access calls”, I mean going from several separate round trips for similar data to a single combined call.

When following MVC strictly, data access must happen in controllers. If two controllers need similar data from the same database table, in most MVC implementations the separation of code between controllers won’t even make it possible to group these similar data needs together.
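As a rough illustration of the difference between ungrouped and grouped data access, the sketch below fetches the same keys both ways. The `store` object with `get(key)` and `getMulti(keys)`, and the controller shape, are stand-ins invented for the example rather than any real API.

```javascript
// Ungrouped vs grouped reads against a key-value store.
// `store` is a stand-in with get(key) and getMulti(keys); both it and the
// controller shape ({ keys: [...] }) are hypothetical.
function fetchSeparately(store, controllers) {
  // One round trip per key per controller.
  return controllers.map((c) => c.keys.map((k) => store.get(k)));
}

function fetchGrouped(store, controllers) {
  // Collect every key first (deduplicated), then do a single round trip.
  const allKeys = [...new Set(controllers.flatMap((c) => c.keys))];
  const values = store.getMulti(allKeys); // assumed to return a Map of key -> value
  return controllers.map((c) => c.keys.map((k) => values.get(k)));
}
```

Both functions hand each controller the same values; the grouped version simply needs to see all controllers' keys before it talks to the store, which is exactly what strict separation between controllers makes difficult.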

Better MVC implementations manage to identify these common data needs across controllers and group the calls. For instance a meta-controller could iterate through all the controllers of a page and ask them what their data needs are. The meta-controller can then efficiently group the data access calls together, reducing the round trips to the data stores.

We already follow that strategy on modularized pages of deviantART, like user profiles, where each widget a user can install and configure (friends list, favourite deviations, featured artwork, etc.) has its own controller. The profile page itself has a meta-controller that coordinates data access for the controllers of all widget instances. If two widgets on a user profile need complementary data about the same artwork, the call will be combined into a single memcache get or database query.

That strategy is only possible because we know which widgets are on the page very early in the page generation, and consequently, which controllers will be involved. There are many situations where knowing that information in advance is impossible, or requires rewriting the code so much around this specific need that a massive layer of complexity is added for the sake of saving a few round trips to the database. It’s the age-old dilemma of maintainable easy-to-follow code versus less legible code optimized for data efficiency.

Our solution

Faced with this cross-controller data access grouping issue once again, we tried to push the boundaries imposed by MVC with a new approach. This time, instead of letting data access optimizations dictate the structure of our code, we decided that data access would get out of the way and be dealt with as late as possible in the page generation. The idea here is that sometimes you can’t tell what your data needs are until the logic in your controllers has run, or after you’ve already fetched some other data first.

The core of what allows us to perform that trick is something we call “datex”. Datex is a stream of intertwined DOM output and PHP objects. Instead of echoing, the views of our MVC code return datex. Whenever a piece of output is dependent on some data that can be grouped (for instance, basic user data), we “output” a PHP object in the datex stream. That object contains information about the data dependency and which partial view will render the DOM once the data needed is fetched. This way we make the data-dependent parts of our MVC code asynchronous.

As it’s being built, our page’s output looks like the following, a mix of DOM that could be rendered right away, and PHP objects that contain information about partial views that will be rendered later.

Then, once all the MVC code (in the traditional sense of the term) has run, we resolve this datex stream in passes. With each pass we can very easily group similar data access needs together.

Some objects are fully resolved on the first pass and return DOM, others return more datex with new objects in them. These are the situations that were very hard to optimize in more classic MVC implementations, when controllers that have received their initial data then require another round of data access.

After an initial pass, if we still have DOM and objects in the datex stream, we go through the same process again, and once more we can group similar data access together for efficiency. This process can go on for a few passes, until all PHP objects have eventually resolved to pure DOM. When the datex stream only contains DOM, we know that we can output the results, as they are the fully constructed web page.
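The pass-based resolution described above can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the real PHP implementation: `Deferred`, `fetch_many`, `FAKE_DB` and `FETCH_LOG` are hypothetical names, and the "database" is an in-memory dictionary.

```python
from collections import defaultdict


class Deferred:
    """Hypothetical stream object: a data need plus a partial view."""
    def __init__(self, kind, key, render):
        self.kind, self.key, self.render = kind, key, render


FETCH_LOG = []  # one entry per batched round trip, for illustration only

FAKE_DB = {
    "comment": {1: {"author": 7, "text": "Nice!"}, 2: {"author": 9, "text": "Wow"}},
    "user": {7: {"name": "alice"}, 9: {"name": "bob"}},
}


def fetch_many(kind, keys):
    # Stand-in for a grouped query or multi-get: one call per kind per pass.
    FETCH_LOG.append((kind, sorted(keys)))
    return {k: FAKE_DB[kind][k] for k in keys}


def resolve(stream):
    # Keep making passes until only strings are left in the stream.
    while any(isinstance(x, Deferred) for x in stream):
        groups = defaultdict(set)
        for x in stream:
            if isinstance(x, Deferred):
                groups[x.kind].add(x.key)
        data = {kind: fetch_many(kind, keys) for kind, keys in groups.items()}
        next_stream = []
        for x in stream:
            if isinstance(x, Deferred):
                # A partial view returns a new stream fragment, which may
                # itself contain more Deferred objects for the next pass.
                next_stream.extend(x.render(data[x.kind][x.key]))
            else:
                next_stream.append(x)
        stream = next_stream
    return "".join(stream)


def user_name(user_id):
    return Deferred("user", user_id, lambda u: [u["name"]])


def comment(comment_id):
    # The author is only known after the comment itself has been fetched.
    return Deferred("comment", comment_id,
                    lambda c: ["<p>", user_name(c["author"]), ": ", c["text"], "</p>"])


html = resolve([comment(1), comment(2)])
print(html)
print(FETCH_LOG)
```

In this sketch the two comments are fetched together in the first pass, and the two authors they reveal are fetched together in the second, so the page costs two round trips instead of four.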

Going from datex to fully resolved DOM

An example of a real-world usage is thumbnails on deviantART. Each thumbnail requires a memcache get. By grouping all thumbnails output on the page and treating them at once, we get all the data we need for rendering these thumbnails in a single memcache multi_get call.

Of course, that’s possible with template-based MVC, but it forces you to structure all your code around the fact that you need to know exactly which thumbnails will be on the page at the beginning of the page generation. With our system, you don’t need to know where thumbnails will be needed: any code deep inside a view can request a thumbnail at any time, simply by adding a thumbnail object to the datex stream. The system guarantees that if the data access needed for that thumbnail can be grouped with a similar one originating from a completely separate view, far away in scope, it will be.
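As a rough illustration of the thumbnail case, the Python sketch below has two unrelated views each drop thumbnail objects into the stream, and a single grouped cache lookup serves both. Everything here is hypothetical: `FakeCache` and its `get_multi` method stand in for a memcache client, and `Thumb`, `gallery_view` and `sidebar_view` are invented names. Cache misses are deliberately not handled.

```python
class FakeCache:
    """Stand-in for a memcache client; counts round trips."""
    def __init__(self, data):
        self.data = data
        self.calls = 0

    def get_multi(self, keys):
        self.calls += 1
        return {k: self.data[k] for k in keys if k in self.data}


class Thumb:
    """Hypothetical thumbnail object placed in the stream by any view."""
    def __init__(self, deviation_id):
        self.deviation_id = deviation_id

    def cache_key(self):
        return f"thumb:{self.deviation_id}"

    def render(self, meta):
        return f'<img src="{meta["src"]}" alt="{meta["title"]}">'


# Two views that know nothing about each other.
def gallery_view():
    return ["<main>", Thumb(1), Thumb(2), "</main>"]


def sidebar_view():
    return ["<aside>", Thumb(3), "</aside>"]


def resolve_thumbs(stream, cache):
    # Collect every thumbnail key on the page (deduplicated, order kept),
    # then fetch them all in one call. Misses would raise KeyError here.
    keys = list(dict.fromkeys(x.cache_key() for x in stream if isinstance(x, Thumb)))
    found = cache.get_multi(keys)
    return "".join(
        x.render(found[x.cache_key()]) if isinstance(x, Thumb) else x
        for x in stream
    )


cache = FakeCache({
    f"thumb:{i}": {"src": f"/t/{i}.jpg", "title": f"Art {i}"} for i in (1, 2, 3)
})
page = resolve_thumbs(gallery_view() + sidebar_view(), cache)
print(cache.calls)
```

Neither view had to announce its thumbnails up front; the grouping falls out of resolving the stream late.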

Another advantage this technique has over templates/string replacement is that our PHP objects in datex can have declared dependencies between each other. We can also prioritize them regardless of their order in the DOM. If the ad banner at the top of the page depends on content at the very bottom of the page being resolved into DOM first, this can be handled automatically, without having to structure the view code any differently from the logical output order of the page. It allows us to bring that logical top-to-bottom DOM order back into our views, where in the past we had the data logic take over and shuffle the order based on data optimization needs and interdependencies. With datex you can read the views’ code and follow the order of the final output, even when the actual resolution of the objects might happen in a completely different order, optimized for data.
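The ad banner example could look something like the following Python sketch. It only illustrates the idea of declared dependencies: `Node`, `depends_on` and `resolve_with_dependencies` are assumed names, not part of the actual library, and data fetching is left out entirely.

```python
class Node:
    """Hypothetical deferred object with declared dependencies."""
    def __init__(self, name, render, depends_on=()):
        self.name = name
        self.render = render              # called with the results so far
        self.depends_on = tuple(depends_on)


def resolve_with_dependencies(stream):
    results = {}  # name -> rendered DOM
    order = []    # order in which nodes were actually resolved
    pending = [x for x in stream if isinstance(x, Node)]
    while pending:
        # A node is ready once everything it depends on has been rendered.
        ready = [n for n in pending if all(d in results for d in n.depends_on)]
        if not ready:
            raise ValueError("circular or missing dependency")
        for n in ready:
            results[n.name] = n.render(results)
            order.append(n.name)
        pending = [n for n in pending if n.name not in results]
    # Output follows the stream order, not the resolution order.
    html = "".join(results[x.name] if isinstance(x, Node) else x for x in stream)
    return html, order


def ad_view(done):
    topic = "cats" if "cat" in done["content"] else "art"
    return f'<div class="ad">ads about {topic}</div>'


def content_view(done):
    return "<p>a cat drawing</p>"


# The view code is written top to bottom, in the order the page reads.
stream = [
    Node("ad", ad_view, depends_on=["content"]),
    "<hr>",
    Node("content", content_view),
]

html, order = resolve_with_dependencies(stream)
print(order)
print(html)
```

The ad is written first and appears first in the output, yet it is resolved last because it declares a dependency on the content.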

Additional benefits

So far we’ve only scratched the surface of what datex allows us to do. It’s a new way of thinking about building the page output. As such, it requires a new mindset to identify where and what it could be used for.

For a while we’ve bundled together AJAX requests that resulted from rapid user actions, in order to reduce the impact of network lag on these small requests. The backends of these independent AJAX calls can now seamlessly have their data access grouped, and this happens across completely separate scopes. Each AJAX request remains completely independent and isolated, but simply by virtue of happening at nearly the same time, datex allows us to group similar data fetching calls that happen in them. In a similar fashion, one could design a very service-oriented architecture and have the services return datex.

Another very powerful aspect is the ability to prioritize the resolution of some objects over others in the datex stream. When working with asynchronous data stores, one can decide to first resolve the objects that depend on these asynchronous calls, then spend time resolving other objects that depend on synchronous data stores, thereby making better use of time that might otherwise have been spent waiting for asynchronous data to come back.
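One way to picture this prioritization is the Python sketch below, which starts the slow asynchronous fetch first and resolves the synchronous needs while it is in flight. It is only an assumption about how such a pass could be ordered: `slow_remote_fetch`, `local_fetch` and `resolve_pass` are hypothetical, and a thread with a short sleep simulates the asynchronous store.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_remote_fetch(keys):
    time.sleep(0.05)  # simulated latency of an asynchronous data store
    return {k: f"remote:{k}" for k in keys}


def local_fetch(keys):
    # Simulated synchronous store that answers immediately.
    return {k: f"local:{k}" for k in keys}


def resolve_pass(async_keys, sync_keys):
    events = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Kick off the asynchronous needs first...
        future = pool.submit(slow_remote_fetch, async_keys)
        events.append("async started")
        # ...and use the waiting time for the synchronous ones.
        sync_data = local_fetch(sync_keys)
        events.append("sync resolved")
        async_data = future.result()
        events.append("async collected")
    return {**sync_data, **async_data}, events


data, events = resolve_pass(["feed"], ["user"])
print(events)
```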


One limitation people might see in this technique is that our output can’t be string-manipulated anymore as it’s being constructed. Since datex is a mix of strings and PHP objects, it’s impossible to do preg_replace() on it, for example. I believe that this limitation is in fact a great strength, because when developers start transforming DOM partials with string manipulations in PHP, they’re often trying to solve a frontend CSS/JS problem with backend PHP, or are just being lazy and prefer to do a preg_replace rather than adapt the existing code to support the new need. By forcing developers to avoid these patterns of string manipulation over partial views, we get cleaner code where content-dependent DOM transformation is forced to be in CSS/JS scope or to be rewritten to avoid any string replacement.


Since we didn’t want to heavily patch PHP to achieve this, we handle datex with a homegrown PHP library. Most of the functions in our code that produce DOM output thus return datex instead of echoing. We merge the datex streams into each other as the page is constructed, resulting in a tree of datex. At the very end of page processing, we resolve all the objects found in the datex tree in passes until we’re left with pure DOM, and we output it.
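The tree that results from merging streams can be pictured with the short Python sketch below, where nested lists stand in for merged datex streams. It is a simplified illustration rather than the actual library: `UserRef`, `iter_objects`, `render` and the view functions are all assumed names, and a dictionary comprehension stands in for the single grouped fetch.

```python
class UserRef:
    """Hypothetical object marking where a user name is needed."""
    def __init__(self, user_id):
        self.user_id = user_id


def header_view(viewer_id):
    return ["<header>", UserRef(viewer_id), "</header>"]


def comments_view(author_ids):
    return ["<ul>"] + [["<li>", UserRef(a), "</li>"] for a in author_ids] + ["</ul>"]


def page_view():
    # Child streams are merged in as sub-lists, which produces a tree.
    return ["<html>", header_view(1), comments_view([2, 3]), "</html>"]


def iter_objects(tree):
    # Walk the whole tree and yield every non-string leaf.
    for item in tree:
        if isinstance(item, list):
            yield from iter_objects(item)
        elif not isinstance(item, str):
            yield item


def render(tree, users):
    out = []
    for item in tree:
        if isinstance(item, list):
            out.append(render(item, users))
        elif isinstance(item, UserRef):
            out.append(users[item.user_id])
        else:
            out.append(item)
    return "".join(out)


tree = page_view()
needed = sorted({ref.user_id for ref in iter_objects(tree)})
users = {uid: f"user{uid}" for uid in needed}  # stands in for one grouped fetch
html = render(tree, users)
print(needed)
print(html)
```

Because the objects are collected from the whole tree at the end, it does not matter how deeply nested the view that requested them was.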

It’s possible that this technique already exists in other languages or frameworks; we just couldn’t find anything similar when we came up with the idea. It would certainly be more efficient and lighter in syntax if this technique were supported at the language level.

If you know of techniques or technologies that elegantly tackle the same issue of grouping data access calls across scopes, please share them. We’re always interested in comparing our experience with others. And if there isn’t anything like it, we hope that sharing this technique will be useful to others who face the same problems.

Edit: I've provided useful complementary information in response to questions on Hacker News about this article.
