Mixing DOM and PHP objects for fun and profit

Feb 3, 201110 min read

Deviation Actions

Published: Feb 3, 2011

9.1K Views

This article assumes that you are familiar with the concepts of MVC programming in the context of web development. If you are not, wikipedia has a decent overview in their entry about it.

The problem

An issue that MVC doesn’t address directly is how to efficiently group data access calls together across scopes. By “grouping data access calls”, I mean going from something like this:

To something like that:

When following MVC strictly, data access must happen in controllers. If two controllers need similar data from the same database table, in most MVC implementations the separation of code between controllers won’t even make it possible to group these similar data needs together.

Better MVC implementations manage to identify these common data needs across controllers and group the calls. For instance a meta-controller could iterate through all the controllers of a page and ask them what their data needs are. The meta-controller can then efficiently group the data access calls together, reducing the round trips to the data stores.

We already follow that strategy on modularized pages of deviantART, like user profiles, where each widget a user can install and configure (friends list, favourite deviations, featured artwork, etc.) has its own controller. The profile page itself has a meta-controller that coordinates data access for the controllers of all widget instances. If two widgets on a user profile need complimentary data about the same artwork, the call will be combined into a single memcache get or database query.

That strategy is only possible because we know which widgets are on the page very early in the page generation, and consequently, which controllers will be involved. There are many situations where knowing that information in advance is impossible, or requires rewriting the code so much around this specific need that a massive layer of complexity is added for the sake of saving a few round trips to the database. It’s the age-old dilemma of maintainable easy-to-follow code versus less legible code optimized for data efficiency.

Our solution

Faced with this cross-controller data access grouping issue once again, we tried to push the boundaries imposed by MVC with a new approach. This time, instead of letting data access optimizations dictate the structure of our code, we decided that data access would get out of the way and be dealt with as late as possible in the page generation. The idea here is that sometimes you can’t tell what your data needs are until the logic in your controllers has run, or after you’ve already fetched some other data first.

The core of what allows us to perform that trick is something we call “datex”. Datex is a stream of intertwined DOM output and PHP objects. Instead of echoing, the views of our MVC code return datex. Whenever a piece of output is dependent on some data that can be grouped (for instance, basic user data), we “output” a PHP object in the datex stream. That object contains information about the data dependency and which partial view will render the DOM once the data needed is fetched. This way we make the data-dependent parts of our MVC code asynchronous.

As it’s being built, our page’s output looks like the following, a mix of DOM that could be rendered right away, and PHP objects that contain information about partial views that will be rendered later.

Then, once all the MVC code (in the traditional sense of the term) has run, we resolve this datex stream in passes. With each pass we can very easily group similar data access needs together.

Some objects are fully resolved on the first pass and return DOM, others return more datex with new objects in them. These are the situations that were very hard to optimize in more classic MVC implementations, when controllers that have received their initial data then require another round of data access.

After an initial pass, if we still have DOM and objects in the datex stream, we go through the same process again, and once more we can group similar data access together for efficiency. This process can go on for a few passes, until all PHP objects have eventually resolved to pure DOM. When the datex stream only contains DOM, we know that we can output the results, as they are the fully constructed web page.

Going from datex to fully resolved DOM

An example of a real-world usage is thumbnails on deviantART. Each thumbnail requires a memcache get. By grouping all thumbnails output on the page and treating them at once, we get all the data we need for rendering these thumbnails in a single memcache multi_get call.

Of course, that’s possible with template-based MVC, but it forces you to structure all your code around the fact that you need to know exactly which thumbnails will be on the page at the beginning of the page generation. With our system, you don’t need to know where thumbnails will be needed, any code deep inside a view can request a thumbnail at any time, by simply adding a thumbnail object to the datex stream. The system guarantees that if the data access needed for that thumbnail can be grouped with a similar one originating from a completely separate view, far in scope, it will be.

Another advantage this technique has over templates/string replacement is that our PHP objects in datex can have declared dependencies between each other. We can also prioritize them regardless of their order in the DOM. If the ad banner at the top of the page depends on content at the very bottom of the page to be resolved into DOM first, this can be provided automatically, without having to structure the view code any differently than the logical output order of the page. It allows us to bring that logical top-to-bottom DOM order back into our views, where in the past we had the data logic take over and shuffle the order based on data optimizing needs and interdependencies. With datex you can read the views’ code and follow the order of the final output, even when the actual resolution of the objects might happen in a completely different order, optimized for data.

Additional benefits

So far we’ve only scratched the surface of what datex allows us to do. It’s a new way of thinking about building the page output. As such, it requires a new mindset to identify where and what it could be used for.

For a while we’ve bundled AJAX requests that resulted in rapid user actions together, in order to reduce the impact of network lag on these small requests. The backend of these independent AJAX calls can now seamlessly have their data access grouped, and this happens across completely separate scopes. Each AJAX request remains completely independent and isolated, but simply by virtue of happening near the same time, datex allows us to group similar data fetching calls that happen in them. In a similar fashion, one could design a very service-oriented architecture and have the services return datex.

Another very powerful aspect is the ability to prioritize resolutions of some objects over others in the datex stream. When working with asynchronous data stores, one can decide to resolve the objects depending on these asynchronous calls first, then spend time resolving other objects that depend on synchronous data stores; therefore making better use of time that might have been spent waiting for asynchronous data to come back.

Limitations?

One limitation people might see in this technique is that our output can’t be string-manipulated anymore as it’s being constructed. Since datex is a mix of strings and PHP objects, it’s impossible to do preg_replace() on it, for example. I believe that this limitation is in fact a great strength, because when developers start transforming DOM partials with string manipulations in PHP, they’re often trying to solve a frontend CSS/JS problem with backend PHP, or are just being lazy and prefer to do a preg_replace rather than adapt the existing code to support the new need. By forcing developers to avoid these patterns of string manipulation over partial views, we get cleaner code where content-dependent DOM transformation is forced to be in CSS/JS scope or to be rewritten to avoid any string replacement.

Implementation

Since we didn’t want to heavily patch PHP to achieve this, we handle datex with a homegrown PHP library. Most of the functions in our code that produce DOM output thus return datex instead of echoing. We merge the datex streams into each other as the page is constructed, resulting in a tree of datex. At the very end of page processing, we resolve all the objects found in the datex tree in passes until we’re left with pure DOM, and we output it.

It’s possible that this technique already exists in other languages or frameworks; we just couldn’t find anything similar when we came up with the idea. It would certainly be more efficient and lighter in syntax if this technique were supported at the language level.

If you know of techniques or technologies that tackle the same issue of grouping data access calls across scopes elegantly, please share, we’re always interested in comparing our experience with others. And if there isn’t anything like it, we hope that sharing this technique will be useful to others who face the same problems.

Edit: I've provided useful complimentary information in response to questions on Hacker News about this article.

Like the idea of working on stuff like this? Check out the open developer positions at deviantART.