
At deviantART, every developer gets their own virtual machine. Our VMs are our private laboratories. In them we work, test, and experiment, with no fear of harming the site.



The VM is also the most significant change we've made to our deployment chain in the past 6 years.






How Does a Virtual Machine Improve Development?
Fewer Commits

If you have to commit to the repository on a staging server every time you need to test a change, you end up with a lot of commits. Not only does this take time, but it creates a lot of noise. Before the VM, a single change could result in as many as 20 commit messages. We found it very difficult to keep track of what was changing on the site. Now, developers tend to commit code in single logical chunks with meaningful messages.



Reduced Contention for the Staging Server

Staging acts as a final check for code that's already been tested on a VM. It uses the production databases and other production daemons, and it gives us a good idea of what performance will be like. But it's a shared resource, and when it's locked it can very quickly prevent other developers from getting work done. Fewer, better-tested commits keep the staging server in a known-good state for longer periods of time.



Freedom to Experiment

Some code changes would be too tedious or dangerous to test on the staging server. Branches can help in some cases, but you also need to isolate or create copies of resources (such as the database), which can be impractical and time-consuming. Better to test the changes on a machine that doesn't have a connection to anything in production.



This isolation of the VM also helps developers learn new systems faster. Developers can break their VMs like kids breaking apart their parents' expensive electronics to figure out how they work. The worst that can happen is that the developer has to build a new VM.



How the VM is Made

Almost all deviantART servers run off of a common base netboot image. We took this netboot image, converted it to a VMware disk image, and added a boot loader. Then, we configured it to download and install the same binary packages used by the live servers. Most packages require some conditional tweaks in their configuration files, but they're all relatively straightforward.



Creating a new VM is mostly automatic. A developer downloads a disk image, adds it to VMware, and then runs a setup script from within the VM. The script downloads the correct versions of the necessary binaries, configures them, and starts any associated daemons. All that's left for the developer is to mount the source code directory on their host via SSHFS.



What to Do About Databases

Accessing the production databases from the VM is out of the question. The VM cannot be allowed to make changes to any production resource. Additionally, the added latency would make testing on the VM much more tedious. For example, if a page on production runs 45 queries in 30ms, the same page on the VM would take no less than 2 seconds to run the same queries (assuming 50ms of round-trip latency). Instead, we have to bring a copy of the databases to the VM.
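The arithmetic behind that estimate can be checked with a short sketch. It assumes the queries run one after another and that each one pays a full round trip; the function name is made up for illustration.

```python
def page_query_time_ms(num_queries, execution_ms, round_trip_ms):
    """Estimate total query time when queries run sequentially.

    Each query is assumed to pay one full network round trip on top
    of the time the database spends executing the whole batch.
    """
    return execution_ms + num_queries * round_trip_ms


# Figures from the text: 45 queries, 30ms total on production,
# 50ms of round-trip latency from the VM to the production databases.
on_production_ms = page_query_time_ms(45, 30, 0)
from_vm_ms = page_query_time_ms(45, 30, 50)

print(on_production_ms)  # 30
print(from_vm_ms)        # 2280, i.e. a little over 2 seconds
```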



Imagine a parallel universe in which deviantART only has a handful of members. That's essentially what the VM is. We take more than 2TB of database data and extract the approximately 50MB used by just the developers' accounts.



We wrote a program that pulls this data from the production database servers nightly. The program is driven by a configuration file editable by developers. The file begins with a list of usernames; only data on these users will be pulled. To protect users' data, we only add developer usernames. Next, the file defines how various data is related (sort of like foreign key relations, but not quite). These relationships are used to define the subsets of data required to be pulled from each of the database tables.
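As a rough illustration of the idea (not the actual program or its file format), the sketch below starts from a list of usernames and follows a list of relationships to decide which rows to keep from each table. The config layout, table names, column names, and sample rows are all hypothetical, and plain Python lists stand in for the database tables.

```python
# Hypothetical configuration: seed usernames, then relationships
# listed parent-first as (child table, child column, parent table, parent column).
CONFIG = {
    "usernames": ["dev_alice", "dev_bob"],
    "relations": [
        ("deviations", "userid", "users", "userid"),
        ("comments", "deviationid", "deviations", "deviationid"),
    ],
}


def extract_subset(tables, config):
    """Return only the rows reachable from the configured usernames."""
    wanted = set(config["usernames"])
    selected = {"users": [r for r in tables["users"] if r["username"] in wanted]}
    for child, child_col, parent, parent_col in config["relations"]:
        # Keys already selected in the parent table decide which child rows to keep.
        parent_keys = {row[parent_col] for row in selected.get(parent, [])}
        selected[child] = [r for r in tables[child] if r[child_col] in parent_keys]
    return selected


# Stand-in data: one developer account with content, one ordinary member.
TABLES = {
    "users": [
        {"userid": 1, "username": "dev_alice"},
        {"userid": 2, "username": "dev_bob"},
        {"userid": 3, "username": "some_member"},
    ],
    "deviations": [
        {"deviationid": 10, "userid": 1},
        {"deviationid": 11, "userid": 3},
    ],
    "comments": [
        {"commentid": 100, "deviationid": 10},
        {"commentid": 101, "deviationid": 11},
    ],
}

subset = extract_subset(TABLES, CONFIG)
print({table: len(rows) for table, rows in subset.items()})
```

Listing the relationships parent-first keeps the sketch to a single pass; a real tool would also need to handle tables reachable through more than one path.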




As the program runs, it outputs a .sql (text) file resembling the output of mysqldump. When complete, it places the file on a server and rotates the previous files as backups. At any time, a developer can run a command to automatically update their VM's database with the latest data (or optionally roll it back to some point in the past).
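The rotation step might look something like the following sketch. The file name, the numbered-suffix scheme, and the number of backups kept are assumptions for illustration; the text does not say how the real files are named or how many are retained.

```python
import os
import tempfile


def rotate_and_publish(new_dump_path, publish_dir, name="vm-data.sql", keep=7):
    """Publish a new dump as `name`, keeping up to `keep` numbered backups."""
    os.makedirs(publish_dir, exist_ok=True)
    current = os.path.join(publish_dir, name)
    # Shift older backups up by one (name.1 -> name.2, ...); the oldest is overwritten.
    for i in range(keep - 1, 0, -1):
        older = f"{current}.{i}"
        if os.path.exists(older):
            os.replace(older, f"{current}.{i + 1}")
    if os.path.exists(current):
        os.replace(current, f"{current}.1")
    os.replace(new_dump_path, current)


# Demonstration in a temporary directory: three nightly runs, two backups kept.
with tempfile.TemporaryDirectory() as tmp:
    publish_dir = os.path.join(tmp, "published")
    for night in ("night1", "night2", "night3"):
        incoming = os.path.join(tmp, "incoming.sql")
        with open(incoming, "w") as f:
            f.write(night)
        rotate_and_publish(incoming, publish_dir, keep=2)
    listing = sorted(os.listdir(publish_dir))
    with open(os.path.join(publish_dir, "vm-data.sql")) as f:
        latest = f.read()
    with open(os.path.join(publish_dir, "vm-data.sql.2")) as f:
        oldest = f.read()

print(listing)
print(latest, oldest)
```

Rolling a VM back to an earlier point would then amount to loading one of the numbered files instead of the newest one.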



What to Do with Resources

Even after reducing the database to 50MB, the associated image files would still be too large to download and store locally. Luckily, we don't need to. Instead, requests for images and other non-CSS/JS files are proxied (transparently) by Apache to the live site. For us, this is simplified by the fact that these files are hosted on distinct subdomains, but you can achieve the same with any regular directory structure.



Sometimes, though, you don't want the request to be proxied. For example, when testing uploads and submissions, we don't want to have to upload the test file to the site itself. Instead, we upload it to the VM. Apache, instead of proxying the request immediately, checks to see if the file exists locally first and serves it from the VM filesystem if it does.
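The decision itself is simple, and can be sketched outside of Apache. The function below is only an illustration of the "local file first, otherwise proxy" rule; the directory layout and the upstream URL are hypothetical, and in practice this logic lives in the web server configuration rather than in Python.

```python
import os
import tempfile


def resolve_resource(path, local_root, upstream_base):
    """Return ("local", file_path) if the file exists on the VM,
    otherwise ("proxy", url) pointing at the live site."""
    relative = path.lstrip("/")
    root = os.path.realpath(local_root)
    candidate = os.path.realpath(os.path.join(root, relative))
    # Only serve files that really live under the local root.
    inside_root = os.path.commonpath([root, candidate]) == root
    if inside_root and os.path.isfile(candidate):
        return ("local", candidate)
    return ("proxy", upstream_base.rstrip("/") + "/" + relative)


# Demonstration: one file uploaded to the VM, one that only exists on the live site.
with tempfile.TemporaryDirectory() as vm_root:
    os.makedirs(os.path.join(vm_root, "uploads"))
    with open(os.path.join(vm_root, "uploads", "test.png"), "wb") as f:
        f.write(b"fake image bytes")
    local_hit = resolve_resource("/uploads/test.png", vm_root, "http://static.example.com")
    proxied = resolve_resource("/art/missing.png", vm_root, "http://static.example.com")

print(local_hit[0])
print(proxied)
```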



Hosting the VM on a Separate Domain

Originally, we edited our /etc/hosts file to point deviantart.com to the VM when we wanted to work with it. But doing so was quite a hassle, especially considering you had to switch browsers or use a cookie switching plugin to avoid cookie collisions. Also, it wasn't always clear if you were browsing the VM or the live site. So we decided to move the VM to its own domain: deviantart.lan.



deviantART is unusual in that it has millions of subdomains: one for each registered member, as well as a collection of reserved subdomains. Enumerating all the possible subdomains in /etc/hosts would be tedious and error-prone. So instead, we set up an instance of tinydns on the VM to answer requests for *.deviantart.lan. Tinydns can be configured to then proxy DNS requests for domains it's not responsible for to another DNS server. Or, if you're using OS X you can create a file in /etc/resolver to tell OS X to pass any (and only) requests for deviantart.lan domains to the VM.
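Either way, the routing rule is a suffix match on the domain name. A minimal sketch of that rule follows; the IP addresses are placeholders from the documentation ranges, and the function name is invented for illustration.

```python
def nameserver_for(hostname, vm_ip, upstream_ip, vm_zone="deviantart.lan"):
    """Pick the DNS server that should answer a lookup for `hostname`."""
    name = hostname.rstrip(".").lower()
    # Match the zone itself or any subdomain of it, but not lookalike names.
    if name == vm_zone or name.endswith("." + vm_zone):
        return vm_ip
    return upstream_ip


VM_IP = "192.0.2.10"          # placeholder address for the VM
UPSTREAM_IP = "198.51.100.1"  # placeholder address for the regular DNS server

print(nameserver_for("jekor.deviantart.lan", VM_IP, UPSTREAM_IP))  # 192.0.2.10
print(nameserver_for("www.deviantart.com", VM_IP, UPSTREAM_IP))    # 198.51.100.1
print(nameserver_for("notdeviantart.lan", VM_IP, UPSTREAM_IP))     # 198.51.100.1
```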



Avoiding Code Changes

Making the VM respond to deviantart.lan posed a new problem. All of our code was written with the assumption that we're responding to deviantart.com. We first tried removing all literal occurrences of the string "deviantart.com" from the code and replacing it with a constant/global variable reference (which could vary between the VM and production). But there were too many places in the code to change, and it would be easy to forget and commit a literal "deviantart.com" later.



We decided to step back and solve the problem at a different layer: we'd translate the name at the HTTP level. We looked first at using an Apache directive or extension. Unfortunately, none handled rewriting both the body and the headers (necessary for catching cookies and redirects) in both directions. So we wrote a daemon: tcpbf.



tcpbf

tcpbf is our "TCP Bi-directional Filter". It's a simple program that allows us to make regexp-based replacements in the requests to and responses from the VM. For example, we change "deviantart.lan" to "deviantart.com" on the way in and vice versa on the way out. Additionally, we strip any Accept-Encoding header on requests to prevent the response coming back as gzipped data (which we couldn't do text replacement on).
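tcpbf itself is not shown here, so the following is only a sketch of the two transformations described, applied to whole messages held in memory. A real TCP filter works on a stream, where a match can straddle two reads; that complication is ignored. The function names are invented for illustration.

```python
import re

VM_NAME = re.compile(rb"deviantart\.lan")
LIVE_NAME = re.compile(rb"deviantart\.com")


def filter_request(raw):
    """Rewrite a browser request before it reaches the VM's web server."""
    head, separator, body = raw.partition(b"\r\n\r\n")
    # Drop Accept-Encoding so the response comes back as plain, rewritable text.
    lines = [
        line for line in head.split(b"\r\n")
        if not line.lower().startswith(b"accept-encoding:")
    ]
    rewritten = b"\r\n".join(lines) + separator + body
    return VM_NAME.sub(b"deviantart.com", rewritten)


def filter_response(raw):
    """Rewrite a response (headers and body) on its way back to the browser."""
    return LIVE_NAME.sub(b"deviantart.lan", raw)


request = (
    b"GET / HTTP/1.1\r\n"
    b"Host: jekor.deviantart.lan\r\n"
    b"Accept-Encoding: gzip\r\n"
    b"\r\n"
)
response = (
    b"HTTP/1.1 302 Found\r\n"
    b"Location: http://www.deviantart.com/\r\n"
    b"Set-Cookie: session=abc; domain=.deviantart.com\r\n"
    b"Content-Length: 0\r\n"
    b"\r\n"
)

filtered_request = filter_request(request)
filtered_response = filter_response(response)
print(filtered_request.decode())
print(filtered_response.decode())
```

Because the same substitution covers headers and body alike, redirects and cookie domains are translated along with links in the page.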




This explains why we chose ".lan" as the new TLD for the virtual deviantART: "lan" is the same length as "com", so swapping one for the other never changes the size of a response, and its Content-Length stays correct. If the lengths differed, we'd have a mismatch that would break browsers that use chunked transfer encoding, like Chrome and IE.
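The effect is easy to see on a small body. In the sketch below, ".local" is just a hypothetical longer TLD used for comparison.

```python
body = b'<a href="http://www.deviantart.com/">home</a>'
declared_length = len(body)  # what the server put in Content-Length

same_length = body.replace(b"deviantart.com", b"deviantart.lan")
longer = body.replace(b"deviantart.com", b"deviantart.local")

print(len(same_length) - declared_length)  # 0: Content-Length still matches
print(len(longer) - declared_length)       # 2: the body is now 2 bytes too long
```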



To handle SSL traffic, we have 2 copies of stunnel running with tcpbf sitting between them. That way both the browser and Apache see SSL packets while tcpbf sees cleartext.



Getting to 100% Accurate Emulation

The VM isn't perfect. Sometimes it's missing data. Occasionally changes are made on production that aren't compatible with it. Sometimes we hit an edge case on the VM we wouldn't have hit on the production servers due to a difference in resource limits, and vice versa. Luckily, the emulation doesn't have to be 100% accurate to produce benefits. The most important thing we did was make it good enough that developers wanted to use it. Since everyone adopted it, it's been improving naturally over time.




:iconmwhh73:
mwhh73 Featured By Owner Dec 3, 2010
I used to use a very similar setup with VirtualBox, but for now I'm back on the raw iron until I've solved the performance problems with this setup.

Are you guys working with IDEs like Netbeans or Eclipse on the host system? If so, how do you cope with the extreme slowness of SSHFS? I assume, you have a pretty large codebase, and for features like code completion and the likes the IDE needs to scan all project files to update its database. On a large project, this can take a lot of time (and at least on MacOS X, the SSHFS also eats a *lot* of CPU power in the process).

I experimented with other options, like having the code on the host OS and mounting it from inside the VM via Samba, but while that improves IDE responsiveness significantly, it opens an entirely different can of worms (say, file permissions or the host OS having a case-insensitive filesystem while the Linux guest doesn't...).

What's your take on this?
Reply
:iconsparklum:
SparkLum Featured By Owner Oct 14, 2010
I actually found myself reading that |D

Nice setup you guys have for that :eyes:
Reply
:iconseiscuatro:
seiscuatro Featured By Owner Oct 9, 2010
Hi! Is that tcpbf publicly available? It would be great for some similar concerns that we have.
Reply
:iconjekor:
jekor Featured By Owner Oct 9, 2010
Not currently, but maybe we'll make it available now that we know there's some interest.
Reply
:iconrehael:
rehael Featured By Owner Oct 9, 2010  Hobbyist Photographer
:w00t: Kudos to you for sharing the technical details. I love reading stuff like that. And seeing that you share more and more of it makes me… :aww:

I wonder — do you plan releasing some of your tools (e.g. tcpbf dæmon) as an OpenSource projects, following the herd?
Reply
:iconjekor:
jekor Featured By Owner Oct 9, 2010
I hope so. We'd like to share more with the broader development community, but maintaining and improving the site for the deviantART community is our top priority. We'll see what happens.
Reply
:iconmcdonaec:
mcdonaec Featured By Owner Oct 9, 2010
This is actually a lot like what Vagrant does with Virtualbox.
Reply
:iconjekor:
jekor Featured By Owner Oct 28, 2010
Neat. The Ruby community seems to be all over automated deployment. Thanks for the link.
Reply
:iconninjagrape:
NinjaGrape Featured By Owner Oct 9, 2010  Student General Artist
Hmm... Interesting! :D
Reply
:iconexe-q-tor:
exe-q-tor Featured By Owner Oct 8, 2010  Hobbyist General Artist
very very interesting, kudos!
Reply
:iconbranchewski:
Branchewski Featured By Owner Oct 8, 2010  Hobbyist Photographer
I love you guys. The Lone Gunman! :laughing:
Reply
:iconfourteenthstar:
fourteenthstar Featured By Owner Oct 8, 2010
I love it when you talk nerdy to me, $jekor :eyes:
Reply
:iconjekor:
jekor Featured By Owner Oct 8, 2010
Now you say so....
Reply
:iconzilla774:
zilla774 Featured By Owner Oct 8, 2010  Professional Interface Designer
mah humps, mah humps, my lovely lady database-dump.sql

:#1:
Reply
:iconyumeruby:
yumeruby Featured By Owner Oct 7, 2010  Hobbyist General Artist
This was very interesting to read!

We use VM's here too, though they are running in Hyper-V as we work in a Windows Server environment. We keep one VM for each component of the system, one for web servers, one for database, one for media streams, etc. We use the VM's to locally test drive the system as a whole before shipping it off to the client for acceptance testing. It is also good for installers, because we can just roll back to a fresh install before tests. We are a very small team, so locking is no problem yet, but your example is inspiring. :)
Reply
:iconjekor:
jekor Featured By Owner Oct 8, 2010
Neat. I did not realize just how widespread this is until I wrote about it, but it's good to see.
Reply
:icondeltawolf:
deltawolf Featured By Owner Oct 7, 2010
Nifty! I do something similar with my own rollouts. I didn't go as far as writing a daemon to rewrite HTTP headers though :D personally I just edited hosts for dev2.mysite.com and make all links relative, except for the CDN which doesn't really matter anyways.

Excellent though, I love this kind of article. :)
Reply
:iconjekor:
jekor Featured By Owner Oct 28, 2010
Hey, whatever works ;)
Reply
:iconepochwolf:
epochwolf Featured By Owner Oct 7, 2010
Very cool. I got to bring this idea up at work. Using sshfs to a shared dev server is getting beyond painful. :)
Reply
:iconjekor:
jekor Featured By Owner Oct 28, 2010
I feel your pain.
Reply
:iconoggyb:
oggyb Featured By Owner Oct 7, 2010
This is a great article. Not many internet companies would go into any detail at all about the way they work and what technology they use to get the job done.

Thanks for an interesting read :)
Reply
:iconazz-tech:
Azz-Tech Featured By Owner Jun 24, 2012  Hobbyist Photographer
yeah, that's right. at least with this, if anyone decides to apply for a position, its not all new to them. they know what they're up for and what they need to be able to use.
Reply
:iconswatkins:
swatkins Featured By Owner Oct 7, 2010
thanks, interesting ideas
Reply
:iconwynnesome:
wynnesome Featured By Owner Oct 7, 2010  Professional Photographer
Fascinating. Some of this is a couple of steps beyond my level of knowledge, but I can make general sense of the explanations.
Reply
:iconadanalvarado:
adanalvarado Featured By Owner Oct 7, 2010  Hobbyist Interface Designer
Great post. I've worked with a few vm's myself to aid development. we used the esx server so it could host 2 vm's. It didn't come out as planned, didn't know you had to put limits on the resources, or else both machines will crawl when they start using resources that the other one needs.
Reply
:iconjekor:
jekor Featured By Owner Oct 8, 2010
Ah. I think multiple VMs is the next logical step for testing, although you'd be surprised just how many services you can run on a single VM with low resources when you have small datasets. Thanks for the comment.
Reply
:iconparallellogic:
parallellogic Featured By Owner Oct 7, 2010
~To protect users' data, we only add developer usernames
Ah, I was going to say you'd likely run into statistical bias (and potentially overlook some bugs) if you're only looking at admin accounts - like if you're doing testing on, say, ad-placement you wouldn't see how it affects standard user accounts (which is what it sounds like you're doing with some of these tests).

~These relationships are used to define the subsets of data required to be pulled from each of the database tables.
So not only are you only pulling select admin data, but you're also only pulling data that you'll need for testing? Just making sure I'm understanding that correctly.

~To handle SSL traffic, we have 2 copies of stunnel running with tcpbf sitting between them. That way both the browser and Apache see SSL packets while tcpbf sees cleartext.
So is that: the browser encrypts the data to SSL, stunnel decrypts it, the Apache replaces the URL requests, encrypts it, sends it to the tcpbf switch which reads the .lan extension and directs it to the VM where it is decoded and used by the VM?

How do you do tests under load? If you're testing something like how the comments system would handle under 1 thousand comments coming in in a single millisecond, how can the VM emulate that? Or are you working under the precondition that the servers will spread that data out over time so that spikes in resource usage won't be a problem?
Reply
:iconjekor:
jekor Featured By Owner Oct 7, 2010
I agree. We have the potential to be blinded to non-admin accounts. We do have some non-admin test accounts (operated by developers) for testing stuff like ad placements, but it's still not authentic out-in-the-wild user data.

So, to be more concrete about the relationships in the data: We know the user IDs we want to grab. We then specify in the config file how to derive deviation IDs from those (deviations submitted by those users, pretty straightforward). Slightly more complicated is how to get the comment IDs on that deviation, etc.

We don't test load directly on the VM. We monitor PHP script execution time via sampling on the live servers (perhaps the topic for another blog post), so we have a good idea of performance overall once something launches. Before launch, we generally have a good gut sense of where potential performance problems will be. We can either create micro-benchmarks for the code or analyze the database queries in question. We also have a rough idea of how fast our VM databases perform compared to the production servers, and we can populate a VM database with a lot of test data to get a pretty good idea of how it will perform at scale.
Reply
:iconparallellogic:
parallellogic Featured By Owner Oct 7, 2010
~We do have some non-admin test accounts
Ah, I was thinking you were filtering by user symbol or something like that, but then the userID config file was discussed and I didn't make the connection.

~Slightly more complicated is how to get the comment IDs on that deviation, etc
So it's trickier to pull comments from the database based on user ID number? I'd presume you'd be using something akin to how you pull recent comments for the Recent Activity page - or is that not a database query? You just add items to the Recent Activity list as they are created rather than querying the database for recent items?

~perhaps the topic for another blog post
I'd really love to hear about that; I'm really curious about what kind of stats you're collecting, what kind of trends and stories have been drawn from it too.

~good gut sense of where potential performance problems will be ... We also have a rough idea of how fast our VM databases perform compared to the production servers
Ah, it sounds like you're extrapolating a bit, but sounds rigorous none-the-less.
Reply
:iconjpmeister:
jpmeister Featured By Owner Oct 7, 2010
My god, I think we, the common deviants take the site for granted. I understood 50% of what you said, but even the stuff I didn't understand gave me the logic to realize that you do so much stuff for us. That in the future, more people will write code using this easily adopted new way of writing code. What was once considered too dangerous to try can be learned through the breaking down of scripts, more people coding in new ways further improving the site. Do I have the bland idea? :D

And refraising the original sentence to be a little nicer... Thank you Devious Technologies for making our lives easier everyday! :heart:
Reply
:iconpyritie:
Pyritie Featured By Owner Oct 7, 2010  Hobbyist Digital Artist
as a computer science student, this is all very fascinating :la:
Reply
:iconaladrin:
aladrin Featured By Owner Oct 7, 2010
Thanks for the insight! It looks like you guys have thought this out quite well. :)
Reply
:iconsparkzbolt:
SparkZBolt Featured By Owner Oct 7, 2010  Hobbyist Digital Artist
Neat! I'm glad your using safe ways to prevent data loss.
(I'm not saying much because I didn't get that much of it. :P)
Reply
:iconurbanoantunes:
urbanoantunes Featured By Owner Oct 7, 2010  Professional Interface Designer
i wish i could understand what you just said :\
Reply
Details

Submitted on
October 7, 2010