At the close of my first week with the deviantART dev team, I thought I would take a moment to reflect on my observations.  My VM environment is set up (which is awesome, btw…  anyone w/ a complex PHP development environment and more than a handful of developers should invest in this route), and I have already fixed a few bugs on the site.

Getting set up

The setup process was actually a breeze: download the VM, follow the instructions on the "Getting Started" page of the internal wiki, and in less than an hour I had a copy of nearly the entire deviantart.com website running on my laptop.  This is truly amazing, especially considering that at my last job it took a new developer 1-2 days, sometimes more, to download and set up local copies of Apache, PHP, Java, Ruby, MySQL, and all the necessary libraries, plus get everything configured and working, even with clear instructions and scripts to help automate the process.

Other than that, with my deviant account upgraded to administrator privileges, access to email, the code repository and developer wiki, in practically no time I was ready to take a stab at a ticket or two, faster than I think I've ever been set up in a new company before.  Low overhead!

Organization

The deviantART dev team is a very modern, well-run, distributed team with developers collaborating all over the world, as far away as Europe and South America, in a variety of time zones (all speaking English, thankfully!).  Working within a flat organization hierarchy, team members are encouraged to raise ideas and ask questions with little reason to fear stepping on anybody's toes or going over somebody's head.  Most communication happens over chat (instead of email, which is growing outdated).  And while I find chat to be a distraction at times, I recognize it is incredibly efficient at disseminating knowledge rapidly.  The team does a good job of archiving its knowledge on the internal wiki pages.

As a new developer facing an advanced codebase over ten years in the making, I obviously had (and still have!) a number of questions.  Hitting up any number of the tech team's chat rooms for help usually results in me getting an answer within just a few minutes.   I am very impressed how happy everyone on the team is to help.  This is a culture of collaboration which makes me feel very much a part of a team, even as I write this many thousands of miles away from the main corporate office.

To maintain unity, everyone is brought together once per week in a massive conference call to review the progress of all active projects.  I was wary of such a large call at first, wondering how so many people could collaborate effectively on a single call; however, the participants are very respectful of an implied etiquette not to interrupt or "talk over" when someone has the "floor" and is speaking.  During the week, self-managing project teams meet as needed.

Cool technology

Delving into the internals of what makes deviantART tick reveals a treasure trove of really cool, "state-of-the-art" technology.  These are the guys behind the first skinnable mp3 players and online music communities from way back, and they are very talented, forward thinking computer scientists.  deviantART started building social networking technology before there was really much social networking technology around, and I find that there's innovation underneath the covers that predates modern convention in almost every corner.  

I've been delving deep into the JavaScript layer, for example, and I find a very advanced dependency framework, with dynamic loading and a super slick event handling system, never mind what Mike D's been doing w/ deviantART muro  …  that's hot!  

As I dig into more layers, from the front end JavaScript through to the app layer, search engine, database, server infrastructure, and more, I see more evidence of very modern, progressive thinking.  I'd love to pore over the details of it all, like the component-driven architecture system or the developer runtime environment, but all I can say right now is to stay tuned to the dT tech blog, as I know the developers are planning to share some of the details moving forward!

Close interaction with a passionate community

What impresses me the most so far is how well integrated the tech team is not just with each other, but with the users of the site.  This part is really cool, and I think it's the most valuable part of the deviantART business, which clearly puts community building and user satisfaction at the top of the priority chain.

I fixed a rather simple bug with the ShoutBox, a small public chat room that hangs off the side of the main chat area, as well as optionally on group pages.  When I entered the ShoutBox, just to confirm the ticket I was assigned, I was immediately identified as a developer and greeted by the users there.

See, deviantART's website places a unique character identifier in front of each user's username indicating their role on the site.  For example, a tilde ('~') represents a common user.  An asterisk ('*') represents a premium member.  There are others I have yet to understand, like the equals sign ('=') and more.  My user, $saladoche, has a dollar sign, which means I have admin privileges on the site.   Additionally, when you go to my deviant profile page, you'll instantly see I'm a member of the #dt tech team (note the pound sign to identify a group; for more, see this FAQ entry about symbols).

So basically, when I entered the public ShoutBox, everybody could see I was one of the site's admins, and greeted me kindly like anyone else, but since they could see I was a developer, one of the users informed me of an additional glitch that was interfering with their experience, different from the ticket I was assigned.  I told everybody I'd get on the case.

A few hours later, after making my way through the code, finding the bug, and fixing it in my local VM, I pushed the change live and returned to the ShoutBox to let everybody know it was fixed (actually, my partner $kemayo , who assisted me with my first deployment, let them know before me, but for the purposes of this blog post, I'll take credit ;)) Everybody was so happy.  Instant gratification -- I think the person who reported the problem to me was still in the ShoutBox.  I've worked on a number of websites, including top-ten portals (from eons ago), and small local community sites, but have never found one where the developers are so close to the users as they are at DA.  This is awesome, I thought to myself, and powerful.

This is the very customer-first mentality that the likes of Tony Hsieh recently wrote a book about, happening right here in front of my eyes.  Developers interacting with customers, and putting the customer needs at top priority.  That's rare!  Most places hide the developers as far away from the customers as possible.  In some cases, that's a good thing.  However, in other cases, it works really well to have the developers interacting with the customers, especially in cases where the developers are really passionate about the product they're building, which is exactly what I find at deviantART.

Open to ideas

I posted my first journal entry commenting that it took me three days to figure out how to change my avatar picture on the site.  I got a few comments on that journal post (I didn't think /anybody/ would read it), but what surprised me most was that the user interface team brought it up in the weekly developers meeting and committed to fixing it, as they'd heard other people with the same problem.  (Since then I've raised a few more issues internally and they are **on it**, working to solve problems and improve the user experience, with some really sleek items in the pipeline)

A work of art

My dad, a long time computer programmer, imparted to me that software development is very much an art form, an ideology I believe more and more with each passing year.  Donald Knuth expressed this concept in the title of  his book, influencing a number of famous computer scientists to agree (Richard Stallman, Guido van Rossum, Bjarne Stroustrup, just to name a few…)

Certainly my happiest realization of being a part of the deviantART dev team is that I've discovered deviantART's community of artists includes not just the users who contribute digital works on the public facing side, but also the software developers contributing their digital works on the back end.  When I take this into consideration, that I'm contributing to a piece of art, suddenly the work becomes so much more enjoyable.  

And here my conclusion lies: deviantART, the site and the community around it, is by its own composition a work of art.  From the digital media the users contribute, to the code the developers develop, to the digital interactions that take place between everyone involved, it is all creative in nature, and artistic.  I think the more I take this philosophy to heart, the more I will enjoy my job, and the more the website, company, and community as a whole will benefit.  And that feels fabulous.

Think you've got what it takes?

If you're passionate about programming and being creative, this is surely the place to be.  I'm happy I joined the team; now I've got to go be creative.  Hasta luego...

Quoting Donald Knuth, "A programmer who subconsciously views himself as an artist will enjoy what he does and will do it better."

deviantART muro and HTML5

Mon Oct 18, 2010, 9:34 AM by $mudimba
I was recently asked by the folks at HTML5Rocks.com to write a "case study" about deviantART muro and how it uses HTML5.  For those of you who don't know, HTML5 is a new web standard that quite a lot of the tech community is excited about.  Here is the article for your reading pleasure:

Case Study: HTML5 in deviantART muro



Embedding deviantART Media in Your Applications Made Easy


One of the most frequent activities in social applications is link sharing. Wouldn't it be nice if your application could display an image thumbnail or some info from the linked resource? Normally developers would use clunky proprietary APIs, if those are available at all, or resort to screen scraping the pages to extract bits and pieces of information. And they’d have to do it differently for each service. Needless to say it's a hassle.

oEmbed solves this common problem nicely.
oEmbed is a format for allowing an embedded representation of a URL on third party sites. The simple API allows a website to display embedded content (such as photos or videos) when a user posts a link to that resource, without having to parse the resource directly.
Many large services such as YouTube, Flickr, Hulu, Vimeo, Viddler, Qik, MyOpera, and now deviantART support this API. Our implementation of the oEmbed API makes it simple for application developers to embed media from deviantART given just a link to a deviation page. For example, a request like this:

http://backend.deviantart.com/oembed?url=http%3A%2F%2Ffav.me%2Fd2enxz7
will return this JSON response:
{"version":"1.0",
"type":"photo",
"title":"Cope",
"url":"[link]",
"author_name":"pachunka",
"author_url":"[link]",
"provider_name":"DeviantART",
"provider_url":"[link]",
"thumbnail_url":"[link]",
"thumbnail_width":300,
"thumbnail_height":450,
"width":448,
"height":672}

All that the application developer needs to do is to read the desired values from this response, be it thumbnail or preview url or author name, and output an image tag or text on the webpage.
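
As a rough illustration of that last step, here is a minimal Python sketch that reads the thumbnail fields from a response shaped like the one above and builds an image tag. The sample response is trimmed, and the example.invalid URLs are placeholders we made up to stand in for the "[link]" values; `render_thumbnail` is a hypothetical helper name, not part of any API.

```python
import html
import json

# Sample response in the shape shown above. The "[link]" placeholders are
# replaced with made-up example.invalid URLs purely for illustration.
SAMPLE_RESPONSE = """
{"version": "1.0", "type": "photo", "title": "Cope",
 "author_name": "pachunka",
 "author_url": "http://example.invalid/author",
 "thumbnail_url": "http://example.invalid/thumb.jpg",
 "thumbnail_width": 300, "thumbnail_height": 450}
"""


def render_thumbnail(oembed_json: str) -> str:
    """Build an <img> tag from the thumbnail fields of an oEmbed response."""
    data = json.loads(oembed_json)
    alt = f'{data["title"]} by {data["author_name"]}'
    # Escape every value that ends up inside an HTML attribute.
    return '<img src="{}" width="{}" height="{}" alt="{}">'.format(
        html.escape(data["thumbnail_url"], quote=True),
        int(data["thumbnail_width"]),
        int(data["thumbnail_height"]),
        html.escape(alt, quote=True),
    )


print(render_thumbnail(SAMPLE_RESPONSE))
```

Escaping the values matters because titles and author names are user-supplied text.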



Calling this API from Your Applications

By far the simplest way to use our oEmbed API is from client-side JavaScript using jQuery. Your application would need to recognize that a link in a status post points to a deviation page at deviantART and then make the oEmbed API call in JSONP (JSON with padding) format, using the JSONP endpoint listed under API Documentation below.


Of course, you can choose to use the API from the server side, parsing links and doing API calls when status updates are posted or possibly formatted for display. In that case, you would probably choose to use JSON or XML format. The oEmbed API specification page lists a variety of libraries for most popular languages and frameworks that simplify an implementation. And needless to say, if you choose to embed media from deviantART using this API, it's a very small step to provide the same integration with many other services that support it, such as Flickr and YouTube. Only the link format detection and url endpoints are specific to the services.
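
For the server-side route, a minimal Python sketch of building the request URL might look like the following. The endpoint and the `format` parameter come from the API documentation in this post; the function name `oembed_request_url` is our own, and actually fetching the URL (for example with `urllib.request.urlopen`) is left out on purpose.

```python
from urllib.parse import urlencode

# JSON endpoint as listed in the API documentation section of this post.
OEMBED_ENDPOINT = "http://backend.deviantart.com/oembed"


def oembed_request_url(deviation_url: str, fmt: str = "json") -> str:
    """Return the oEmbed request URL for a link to a deviation page."""
    params = {"url": deviation_url}
    if fmt != "json":
        # JSON is the default format, so only add the parameter for others.
        params["format"] = fmt
    return OEMBED_ENDPOINT + "?" + urlencode(params)


print(oembed_request_url("http://fav.me/d2enxz7"))
print(oembed_request_url("http://fav.me/d2enxz7", fmt="xml"))
```

The first call reproduces the example request shown earlier; `urlencode` takes care of percent-encoding the link.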



Who Uses deviantART oEmbed API?

Some of you may have heard that deviantART was one of several launch partners for the new version of Twitter. A big part of that announcement was the ability to display images and videos in the Twitter stream. Content from deviantART was there, from day one. How did they do it? By using our publicly available oEmbed API. Twitter detects that the URL present in the status update is a link to a deviation on deviantART, hits our API JSONP endpoint, reads the response and displays the preview image.





API Documentation
See the oEmbed API specification page.
deviantART provider implementation:
API JSON endpoint is: http://backend.deviantart.com/oembed
API XML endpoint is: http://backend.deviantart.com/oembed?format=xml
API JSONP endpoint is: http://backend.deviantart.com/oembed?format=jsonp&callback=yourCallback
URL scheme for url parameter is:
    http://*.deviantart.com/art/*
    http://fav.me/*
    http://*.deviantart.com/*#/d*
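
The link detection step could be sketched in Python by treating the three URL schemes as shell-style wildcard patterns. This is only an approximation: `fnmatchcase` lets `*` match any characters, including slashes, which is looser than a hand-written matcher might be. The helper name `is_deviation_link` and the sample URLs are ours.

```python
from fnmatch import fnmatchcase

# URL schemes copied from the provider documentation above.
URL_SCHEMES = (
    "http://*.deviantart.com/art/*",
    "http://fav.me/*",
    "http://*.deviantart.com/*#/d*",
)


def is_deviation_link(url: str) -> bool:
    """Return True if the URL matches any of the documented schemes."""
    return any(fnmatchcase(url, pattern) for pattern in URL_SCHEMES)


for candidate in (
    "http://fav.me/d2enxz7",
    "http://pachunka.deviantart.com/art/Cope-1",  # made-up example path
    "http://www.flickr.com/photos/someone",
):
    print(candidate, is_deviation_link(candidate))
```

Only links that pass this check need to be sent to the oEmbed endpoint.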


At deviantART, every developer gets their own virtual machine. Our VMs are our private laboratories. In them we work, test, and experiment, with no fear of harming the site.



The VM is also the most significant change we've made to our deployment chain in the past 6 years.






How Does a Virtual Machine Improve Development?
Fewer Commits

If you have to commit to the repository on a staging server every time you need to test a change, you end up with a lot of commits. Not only does this take time, but it creates a lot of noise. Before the VM, a single change could result in as many as 20 commit messages. We found it very difficult to keep track of what was changing on the site. Now, developers tend to commit code in single logical chunks with meaningful messages.



Reduced Contention for the Staging Server

Staging acts as a final check for code that's already been tested on a VM. It uses the production databases and other production daemons, and it gives us a good idea of what performance will be like. But it's a shared resource, and when it's locked it can very quickly prevent other developers from getting work done. Fewer, better-tested commits keep the staging server in a known-good state for longer periods of time.



Freedom to Experiment

Some code changes would be too tedious or dangerous to test on the staging server. Branches can help in some cases, but you also need to isolate or create copies of resources (such as the database), which can be impractical and time-consuming. Better to test the changes on a machine that doesn't have a connection to anything in production.



This isolation of the VM also helps developers learn new systems faster. Developers can break their VMs like kids breaking apart their parents' expensive electronics to figure out how they work. The worst that can happen is that the developer has to build a new VM.



How the VM is Made

Almost all deviantART servers run off of a common base netboot image. We took this netboot image, converted it to a VMware disk image, and added a boot loader. Then, we configured it to download and install the same binary packages used by the live servers. Most packages require some conditional tweaks in their configuration files, but they're all relatively straightforward.



Creating a new VM is mostly automatic. A developer downloads a disk image, adds it to VMware, and then runs a setup script from within the VM. The script downloads the correct versions of the necessary binaries, configures them, and starts any associated daemons. All that's left for the developer is to mount the source code directory on their host via SSHFS.



What to Do About Databases

Accessing the production databases from the VM is out of the question. The VM cannot be allowed to make changes to any production resource. Additionally, the added latency would make testing on the VM much more tedious. For example, if a page on production runs 45 queries in 30ms, the same page on the VM would take no less than 2 seconds to run the same queries (assuming 50ms of round-trip latency). Instead, we have to bring a copy of the databases to the VM.
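
The arithmetic behind that estimate, assuming the queries run one after another and each pays the full round trip:

```python
QUERIES_PER_PAGE = 45
ROUND_TRIP_MS = 50        # assumed latency between the VM and production
PRODUCTION_TOTAL_MS = 30  # time for all 45 queries when run on production

# Each sequential query waits one full round trip before the next can start.
added_ms = QUERIES_PER_PAGE * ROUND_TRIP_MS
print(added_ms)                        # 2250
print(added_ms / PRODUCTION_TOTAL_MS)  # 75.0
```

So latency alone adds 2.25 seconds, roughly 75 times the page's entire query time on production.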



Imagine a parallel universe in which deviantART only has a handful of members. That's essentially what the VM is. We take more than 2TB of database data and extract the approximately 50MB used by just the developers' accounts.



We wrote a program that pulls this data from the production database servers nightly. The program is driven by a configuration file editable by developers. The file begins with a list of usernames; only data on these users will be pulled. To protect users' data, we only add developer usernames. Next, the file defines how various data is related (sort of like foreign key relations, but not quite). These relationships are used to define the subsets of data required to be pulled from each of the database tables.
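
To give a feel for the idea, here is a heavily simplified Python sketch. It is not our actual program: it uses an in-memory SQLite database as a stand-in for the production MySQL servers, and the table names, column names, and config layout are all invented for the example.

```python
import sqlite3

# Hypothetical config: the usernames to keep, plus (table, column) pairs
# saying which column ties each table's rows back to a user id.
CONFIG = {
    "usernames": ["devone", "devtwo"],
    "relations": [("comments", "author_id"), ("journals", "owner_id")],
}


def pull_subset(conn, config):
    """Return {table: rows}, limited to data owned by the configured users."""
    marks = ",".join("?" * len(config["usernames"]))
    users = conn.execute(
        f"SELECT id, username FROM users WHERE username IN ({marks}) ORDER BY id",
        config["usernames"],
    ).fetchall()
    ids = [row[0] for row in users]
    id_marks = ",".join("?" * len(ids))
    subset = {"users": users}
    # Table and column names come from the trusted config file, not user input.
    for table, column in config["relations"]:
        subset[table] = conn.execute(
            f"SELECT * FROM {table} WHERE {column} IN ({id_marks}) ORDER BY id",
            ids,
        ).fetchall()
    return subset


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, author_id INTEGER, body TEXT);
    CREATE TABLE journals (id INTEGER PRIMARY KEY, owner_id INTEGER, title TEXT);
    INSERT INTO users VALUES (1, 'devone'), (2, 'member'), (3, 'devtwo');
    INSERT INTO comments VALUES (10, 1, 'hi'), (11, 2, 'private'), (12, 3, 'yo');
    INSERT INTO journals VALUES (20, 2, 'diary'), (21, 3, 'notes');
""")
subset = pull_subset(conn, CONFIG)
for table, rows in subset.items():
    print(table, rows)
```

Rows belonging to the non-developer account never leave the source database, which is the point of starting from a username list.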




As the program runs, it outputs a .sql (text) file resembling the output of mysqldump. When complete, it places the file on a server and rotates the previous files as backups. At any time, a developer can run a command to automatically update their VM's database with the latest data (or optionally roll it back to some point in the past).



What to Do with Resources

Even after reducing the database to 50MB, the associated image files would still be too large to download and store locally. Luckily, we don't need to. Instead, requests for images and other non-CSS/JS files are proxied (transparently) by Apache to the live site. For us, this is simplified by the fact that these files are hosted on distinct subdomains, but you can achieve the same with any regular directory structure.



Sometimes, though, you don't want the request to be proxied. For example, when testing uploads and submissions, we don't want to have to upload the test file to the site itself. Instead, we upload it to the VM. Apache, instead of proxying the request immediately, checks to see if the file exists locally first and serves it from the VM filesystem if it does.
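
The decision Apache makes can be sketched in a few lines of Python. This only models the "local file first, otherwise proxy" logic; it is not our Apache configuration, and `LIVE_BASE` and `resolve_resource` are placeholder names.

```python
import tempfile
from pathlib import Path

LIVE_BASE = "http://example.invalid"  # hypothetical stand-in for the live host


def resolve_resource(request_path: str, local_root: Path) -> str:
    """Serve from the VM filesystem if the file exists there, else proxy."""
    relative = request_path.lstrip("/")
    local_file = local_root / relative
    if local_file.is_file():
        return f"local:{local_file}"
    return f"proxy:{LIVE_BASE}/{relative}"


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "uploads").mkdir()
    (root / "uploads" / "test.png").write_bytes(b"fake image")
    print(resolve_resource("/uploads/test.png", root))  # exists on the VM
    print(resolve_resource("/images/logo.png", root))   # falls through to proxy
```

A real setup would also need to guard against paths that escape the local root, which this sketch ignores.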



Hosting the VM on a Separate Domain

Originally, we edited our /etc/hosts file to point deviantart.com to the VM when we wanted to work with it. But doing so was quite a hassle, especially considering you had to switch browsers or use a cookie switching plugin to avoid cookie collisions. Also, it wasn't always clear if you were browsing the VM or the live site. So we decided to move the VM to its own domain: deviantart.lan.



deviantART is unusual in that it has millions of subdomains: one for each registered member, as well as a collection of reserved subdomains. Enumerating all the possible subdomains in /etc/hosts would be tedious and error-prone. So instead, we set up an instance of tinydns on the VM to answer requests for *.deviantart.lan. Tinydns can be configured to then proxy DNS requests for domains it's not responsible for to another DNS server. Or, if you're using OS X you can create a file in /etc/resolver to tell OS X to pass any (and only) requests for deviantart.lan domains to the VM.



Avoiding Code Changes

Making the VM respond to deviantart.lan posed a new problem. All of our code was written with the assumption that we're responding to deviantart.com. We first tried removing all literal occurrences of the string "deviantart.com" from the code and replacing them with a constant/global variable reference (which could vary between the VM and production). But there were too many places in the code to change, and it would be easy to forget and commit a literal "deviantart.com" later.



We decided to step back and solve the problem at a different layer: we'd translate the name at the HTTP level. We looked first at using an Apache directive or extension. Unfortunately, none handled rewriting both the body and the headers (necessary for catching cookies and redirects) in both directions. So we wrote a daemon: tcpbf.



tcpbf

tcpbf is our "TCP Bi-directional Filter". It's a simple program that allows us to make regexp-based replacements in the requests to and responses from the VM. For example, we change "deviantart.lan" to "deviantart.com" on the way in and vice versa on the way out. Additionally, we strip any Accept-Encoding header on requests to prevent the response coming back as gzipped data (which we couldn't do text replacement on).




This explains why we chose ".lan" as the new TLD for the virtual deviantART: it is three letters long, the same as ".com", so every replacement preserves the byte count and the Content-Length of the response stays correct. If the lengths differed, we'd have a mismatch that would break browsers that use chunked transfer encoding, like Chrome and IE.
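
A minimal Python sketch of the two filters shows why the equal-length swap matters. This is not tcpbf itself: it works on whole messages held in memory, whereas a real TCP filter also has to cope with a match being split across two reads. The function names are ours.

```python
import re

# Matches a whole Accept-Encoding header line, including its CRLF.
ACCEPT_ENCODING = re.compile(
    rb"^Accept-Encoding:[^\r\n]*\r\n", re.IGNORECASE | re.MULTILINE
)


def filter_request(raw: bytes) -> bytes:
    """Browser -> VM: drop Accept-Encoding, then rewrite .lan to .com."""
    raw = ACCEPT_ENCODING.sub(b"", raw)
    return re.sub(rb"deviantart\.lan", b"deviantart.com", raw)


def filter_response(raw: bytes) -> bytes:
    """VM -> browser: rewrite .com back to .lan."""
    return re.sub(rb"deviantart\.com", b"deviantart.lan", raw)


request = (
    b"GET / HTTP/1.1\r\n"
    b"Host: www.deviantart.lan\r\n"
    b"Accept-Encoding: gzip, deflate\r\n"
    b"Cookie: a=1\r\n\r\n"
)
body = b'<a href="http://www.deviantart.com/">home</a>'

print(filter_request(request))
print(filter_response(body))
# Same-length replacement leaves the body size, and so Content-Length, intact.
print(len(body) == len(filter_response(body)))
```

Because the response body keeps its length, the Content-Length header that Apache already sent remains valid without tcpbf having to recompute it.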



To handle SSL traffic, we have 2 copies of stunnel running with tcpbf sitting between them. That way both the browser and Apache see SSL packets while tcpbf sees cleartext.



Getting to 100% Accurate Emulation

The VM isn't perfect. Sometimes it's missing data. Occasionally changes are made on production that aren't compatible with it. Sometimes we hit an edge case on the VM we wouldn't have hit on the production servers due to a difference in resource limits, and vice versa. Luckily, the emulation doesn't have to be 100% accurate to produce benefits. The most important thing we did was make it good enough that developers wanted to use it. Ever since everyone adopted it, it's been improving naturally over time.



Chaos Gerbils: An Explanation

Sat Sep 25, 2010, 1:08 AM by $kemayo
#hq already gave a non-technical summary of these events. We thought that you might enjoy a little more detail, though.

I warn you ahead of time that this entry assumes some basic familiarity with the techniques associated with data storage on large websites. If you've never heard the term "shard" applied to databases before, this might not make sense in places. In a later post we're going to do an overview of how deviantART's servers are structured which will cover this.

Breaking down

Last Monday at about 1:30am, people started noting some really weird things happening on deviantART. Their comments would appear somewhere other than where they were left. Their message centers filled up with notification of journals from people they didn't watch. They found inexplicable notes in their Message Center. Polls they posted had the options from someone else's poll. All sorts of things were failing: posting deviations, creating galleries and collections, almost anything that involved making something new on the site. Obviously, something was wrong.

Since it was the middle of the night for most of us, it took a little while for reports to filter through helpdesk tickets and reach someone who could react. The only people around were a few of our more insomniac and/or European developers: `20after4, $kouiskas, and $pachunka. All that had been reported so far was that people in #devBUG were getting journals in their message center from people they didn't watch. The natural assumption was that it was something to do with the new features we were having #devBUG beta test at the time. Before long the mixed up comments were noticed, and more reports from non-#devBUG people came in, so that was ruled out.

At that point we knew that something was really messed up, so the site was thrown into read-only mode. This is something we try to avoid doing, as you might imagine. We get somewhere close to 100,000 new deviations every day, well over a million comments, and thousands of new deviants... and the site being read-only shuts all of that down. But data corruption is serious business.

With some time to look into the issue in relative peace we found out that data with the same id was appearing on different database servers in our cluster. Since we shard the data by user, this was an indication that things were seriously mixed up; apparently we'd been giving out the same id to multiple pieces of content.

With this clue we investigated the source of our ids. Since we're using sharded servers, we can't rely on auto_increment for these, so we store sequence values on one database server and increment them whenever we assign an id. Looking at these, we saw that some of the sequence values had decreased at about 1:30am, when the trouble started. We then worked out that at about this time some fairly routine database maintenance had occurred, which involved swapping that server with its backup. These servers are supposed to be identical, and their replication was up to date when the swap occurred.

As far as we can tell, what happened was a failure in statement-based replication. When we update the sequence values we don't set them directly to a new number, we just send "value = value + 1", and rely on MySQL's LAST_INSERT_ID to get the new value. So if occasionally one of these queries just didn't get replicated, the backup's sequence values would slowly fall behind.
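
A toy Python simulation makes the failure mode concrete. It models only the counters, not MySQL: `SequenceServer` and `simulate` are invented names, and the "dropped" statements are chosen by hand.

```python
class SequenceServer:
    """Holds one sequence value, updated only by relative increments."""

    def __init__(self, value=0):
        self.value = value

    def apply_increment(self):
        # Mirrors "value = value + 1": because the update is relative,
        # a missed statement is never corrected by later ones.
        self.value += 1
        return self.value


def simulate(total_ids, dropped):
    """Issue ids on the master while some increments never reach the backup."""
    master, backup = SequenceServer(), SequenceServer()
    issued = []
    for n in range(1, total_ids + 1):
        issued.append(master.apply_increment())
        if n not in dropped:
            backup.apply_increment()
    return issued, backup


issued, backup = simulate(total_ids=10, dropped={4, 7})
print(backup.value)                  # 8: the backup has fallen two behind
reissued = backup.apply_increment()  # first id handed out after the swap
print(reissued, reissued in issued)  # 9 True: this id was already in use
```

Had the statements set the value absolutely, the next replicated update would have repaired the gap; with relative increments the backup stays behind forever.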

We immediately added a fairly large number to all of the sequence values by hand, making sure that they were all above the largest observed id value in use, and took the site out of read-only once we were sure that this had stopped new data corruption.

Cleaning up

Now we were stuck with the problem of fixing as much of the broken data as we could.

(This is about where I woke up, and can legitimately claim to be part of "we" in this story. Because I'm in charge of our "Reactor" team, I got to coordinate the cleanup effort.)

Fortunately, because of our sharded servers, the problem was often just that different data with the same id existed on multiple servers, and the wrong server was being read from. In those cases the solution was to find the duplicate ids, and to assign new ids to some of them. That was pretty easy; I wrote a quick tool to find the duplicate ids, and the new ids were seamlessly handed out.
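
In spirit, the tool did something like the following Python sketch. The shard names and ids are made up, ids are assumed unique within a shard, and the rule "the first shard keeps its id, later copies get new ones" is a simplification chosen for the example rather than a description of how we picked which copy to renumber.

```python
from collections import defaultdict


def find_duplicate_ids(shards):
    """Map each id that appears on more than one shard to those shards."""
    seen = defaultdict(list)
    for shard_name, ids in shards.items():
        for content_id in ids:
            seen[content_id].append(shard_name)
    return {cid: names for cid, names in seen.items() if len(names) > 1}


def reassign(duplicates, next_id):
    """Plan (shard, old_id, new_id) moves for every copy after the first."""
    moves = []
    for cid in sorted(duplicates):
        for shard_name in duplicates[cid][1:]:
            moves.append((shard_name, cid, next_id))
            next_id += 1
    return moves


shards = {
    "shard_a": [101, 102, 103],
    "shard_b": [102, 104],
    "shard_c": [103, 105],
}
duplicates = find_duplicate_ids(shards)
print(duplicates)
print(reassign(duplicates, next_id=1000))
```

The new ids must come from the corrected sequence, so `next_id` starts above every id already in use.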

However, some of the older parts of the site maintained data on a non-sharded server. This old code also often didn't check whether or not the initial insert succeeded... resulting in data loss when further queries went ahead. Journal entries got mixed up terribly, for instance. In these cases we unmixed them as well as we could, generally putting one of the entries back together, and deleting the shattered remnants of the other.

Notes turned out to be especially bad, because they had mixed together all the recipient lists for the original notes, and we had no way of telling what the original list was. Notes are some of our very few totally private pieces of information, and we absolutely couldn't risk anyone reading a note that wasn't meant for them. So we deleted the 2,552 notes that were sent during the incident.

Comments were intimidating just because of their scale. 33,212 comments were jumbled up, the text for one comment appearing in the thread where another was supposed to be. Luckily this turned out to be mostly fixable; 32,861 comments were put back in their proper place, and we only had to delete 351.

This could have been a lot worse. We were initially worried that deviant credit cards might have been getting mixed up, but this turned out to not be possible. Similarly, user/group widgets and privileges could also potentially have been mixed up... and we were happy to discover that they were immune.

Prevention

As you might imagine, this has made us start looking into ways to avoid problems with replication in the future. We'd never seen a replication bug this subtle before, and the preventative measures we had in place were for more blatant issues. To patch up this particular hole we've put some automatic monitoring on the sequence values, so that if the master and backup drift out of alignment we'll know immediately. More generally we're evaluating switching to row-based replication. We've also been considering moving to a system like Twitter's Snowflake to get ids, without having to rely on database integrity.
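
For readers unfamiliar with the Snowflake approach, here is a small Python sketch of a generator in that style. It follows the layout Twitter has described publicly (a millisecond timestamp, then 10 bits of worker number, then a 12-bit per-millisecond sequence); the epoch, class name, and injectable clock are our assumptions, and this is not something we run in production.

```python
import threading
import time

EPOCH_MS = 1262304000000  # assumed custom epoch: 2010-01-01T00:00:00Z
WORKER_BITS = 10
SEQUENCE_BITS = 12


class IdGenerator:
    """Snowflake-style ids: timestamp | worker id | per-millisecond sequence."""

    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self.clock = clock
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = self.clock()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & ((1 << SEQUENCE_BITS) - 1)
                if self.sequence == 0:
                    # Sequence used up for this millisecond: wait for the next.
                    while now <= self.last_ms:
                        now = self.clock()
            else:
                self.sequence = 0
            self.last_ms = now
            return (
                ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (self.worker_id << SEQUENCE_BITS)
                | self.sequence
            )


generator = IdGenerator(worker_id=1)
print([generator.next_id() for _ in range(3)])
```

Because the worker number is baked into every id, two servers can hand out ids at the same instant without consulting a shared sequence table, so a lagging replica cannot cause reuse.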

Because it's not paranoia when MySQL really is out to get you, while we were cleaning up we added error checking to a lot of old code which was assuming that its inserts couldn't possibly fail. This means that if our sequences ever do fall back again in spite of the precautions we mentioned there won't be any corruption occurring before we can respond.

This incident has added some twists to how we're going to investigate similar problems in the future. Normally we start out by looking at the last code to be launched before an issue began, since that's probably related. We also didn't immediately think it would be the database maintenance, because the sequence slip meant that timestamps on affected content were from before the maintenance. Now that we know this sort of replication bug is possible, we can check for it early instead of looking for it as a last resort.

A lot of credit must go to our insomniac heroes, `20after4, $kouiskas, and $pachunka who quickly realized how screwed we were and made the judgement call to pull the plug on the site until we could fix it.
