December 1, 2015
It’s easy to recall the things that went well during your startup. Here are three stories about times when things weren’t going so well.
In the first two years of Justin.tv’s existence, we had a bus number of 1: Emmett was the only person who knew how the application servers worked, likewise Bill for the chat servers and Kyle for the video system. Because we worked pretty much all the time, this was deemed an acceptable situation (we had bigger problems, for example: having no revenue and a very unstable product).
Because we were young and terrible managers, we had an “unlimited vacation” policy, which translated into passively discouraging people from taking vacation. Still, some people knew their limits and took some time off, including my cofounder, Kyle, who one weekend planned a trip to Tahoe.
Key people leaving on weekends was always scary to us because weekends were always when we had traffic spikes (and because we were live video, peak to baseline traffic could be 35:1). In preparation, Kyle fortified the video system, and promised us he would be online in the event of any emergencies.
If something can go wrong on the Internet, it always will, and the Friday that Kyle left, the video system went down. Because our site was live video, and because our backend was extremely unstable, we were always in a PTSD-inducing constant state of stress. Whenever things broke we would do the professional thing, and proceed to completely lose our shit. For the first two and a half years of the company’s life we went through cycles of one month of growth, and one month where the site would constantly break, invariably because we hit a limit on disk write speed, bandwidth, CPU, memory, or we forgot to up the number of file descriptors (a sign behind Emmett’s desk read “HAVE YOU CHECKED THE FILE DESCRIPTOR LIMITS?”).
The video system remained uncooperative, and so we used the only tool in the toolbox: we picked up the phone and called Kyle. No answer. We tried many more times: at least once per minute for the next ten minutes. Still nothing. The video player on our site spun circles on an otherwise black screen. Emmett tried logging into the video system and attempted to forensically diagnose the problem.
Time went by and our anxiety increased exponentially. Emmett was not getting anywhere. Kyle was still not picking up his phone. Then, Michael had an idea: we had Kyle’s vacation rental address. We should get someone to drive over there and get him on the phone.
There was no Uber, Instacart or Postmates back then, so we went with the original on-demand service: pizza delivery. Michael started calling up pizza places in Tahoe. The conversation went something like this:
“Hi. Can you send a driver to [Kyle’s address] and deliver a message?”
“What kind of pizza do you want?”
“We don’t want any pizza. Can you send someone to that address?”
“You need to order a pizza.”
“Ok, we’ll pay for a pizza, can you send someone to that address now?”
“What kind of pizza?”
“I don’t know, a large cheese.”
“Ok, it will take 15–20 minutes to cook.”
“!! Don’t wait for the pizza. Just charge us and send someone now.”
“You sure you don’t want the pizza?”
“Ok, it will be $22.90, ok?”
“Ok, what message do you want me to send?”
“The website is down.”
Fifteen minutes later, Kyle got a knock on his door. Groggy from a nap, he answered to a pizza delivery guy, who read “the website is down” to him off the back of a paper receipt. Of course, it took Kyle all of 45 seconds to log in to our backend and run the video server restart script to get the site running again.
After things were up and running, we got a call from Kyle: “Why didn’t you guys send the pizza? I’m hungry.”
Of course, that unstable video system hacked up by Kyle went on to become the fourth largest peak consumer of bandwidth in North America, many iterations later.
Our first really big series of broadcasts was with the Jonas Brothers, right as they were breaking out as stars. One member of their management team at Hollywood Records had heard about Justin.tv and thought live video would be a good way to promote the band online. He got in touch with us and they agreed to do a series of promotional broadcasts on the site around the time of their album release.
Little did we know that we were completely unprepared for the traffic apocalypse that is teenage girls.
The first broadcast had some downtime and latency problems, but overall the experience was ok, the fans seemed receptive, and the band agreed to continue.
The second broadcast was set to be much bigger. The Jonas Brothers’ team had been promoting it for longer and it was closer to their album release. We tried to take what we learned from the first broadcast and improve our system.
There is a concept in computing called the “thundering herd problem,” where many processes wait for the same event. When that event occurs, the processes are woken up, but only one can be served, and the rest go back to sleep, only to wake again and request access to the resource. This takes CPU cycles, eventually grinding the system to a halt.
This was the first time we really learned about the thundering herd problem. The site went down 30 minutes before the broadcast was scheduled to start, as many fans had gone to the page in advance and started doing stuff on the site: signing up, logging in, favoriting the Jonas Brothers’ channel page. All these dynamic actions, plus the constant refreshing of the page to check if the stream was up (our equivalent of processes “going back to sleep, only to wake up again”), created a massive strain on the application servers, and the site proceeded to fall over. Concurrently the video system fell over, because there were too many simultaneous requests, and we couldn’t bring online enough video servers to serve them.
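For readers who want to see the herd effect in numbers, here is a minimal sketch of one common mitigation: having each client wait a random, growing delay between retries (exponential backoff with jitter). This is not what we ran at Justin.tv; the function names, the 5-second refresh interval, and the client counts are all assumptions made up for illustration.

```python
import random
from collections import Counter


def next_delay(attempt, base=5.0, cap=30.0, rng=random):
    """Full-jitter backoff: a random wait in [0, min(cap, base * 2**attempt)] seconds."""
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


def peak_requests_per_second(schedules):
    """Given each client's request timestamps, return the size of the busiest one-second bucket."""
    buckets = Counter(int(t) for times in schedules for t in times)
    return max(buckets.values())


def fixed_schedule(polls, interval=5.0):
    # Hypothetical worst case: every client polls on the same fixed interval,
    # so all requests land in the same second.
    return [i * interval for i in range(polls)]


def jittered_schedule(polls, rng):
    # Each client waits a random, growing delay before every poll,
    # which spreads the requests out over time.
    t, times = 0.0, []
    for attempt in range(polls):
        t += next_delay(attempt, rng=rng)
        times.append(t)
    return times


rng = random.Random(42)  # seeded so the simulation is repeatable
clients, polls = 1000, 6

herd = [fixed_schedule(polls) for _ in range(clients)]
spread = [jittered_schedule(polls, rng) for _ in range(clients)]

herd_peak = peak_requests_per_second(herd)
spread_peak = peak_requests_per_second(spread)

print("synchronized peak requests/sec:", herd_peak)
print("jittered peak requests/sec:", spread_peak)
```

In this toy model the synchronized clients all hit the server in the same second, while the jittered clients produce a much lower peak for the same total number of requests. Real viewers mashing refresh are messier than either schedule, but the principle carries over to any client you control, such as a video player deciding when to ask for a stream.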
While Emmett and Kyle scrambled to statically cache the page, shut down any and all dynamic features on the site, and live release production changes to the video player to manually control when requests were made to the video system, Michael and I took turns on the phone with Jonas Brothers management explaining what was going on (or trying to). At first, when the site was down before the scheduled broadcast, we told them we had taken the site down for maintenance to make sure everything worked (I’m not proud of lying about this; I literally had no idea what to say). As time ticked by and the broadcast start time came and went, we ran out of excuses and started just telling whoever from management happened to be calling (different people were calling us angrily every few minutes) that they should call whichever of us wasn’t on the phone at the moment for the latest update (Michael and I were standing in the same room, sweating bullets).
Just at that moment, while Michael and I were at our peak freak-out, our office manager, Arram (who went on to found ZeroCater), walked by and casually said something that’s stuck with me to this day: “Officers don’t have morale problems.”
I wish I could say that I realized the truth of those words and immediately pulled myself together, and provided an example of calm and stability to a very stressed out team. Instead I think I screamed out something along the lines of “What the fuck are you talking about?!” and continued bemoaning whatever I’d done in a previous life to deserve having been so close to startup success only to see our prospects swirling down the drain.
After what felt like decades, Kyle and Emmett got the site in a functional state and the broadcast proceeded, late by 25 minutes or so. In retrospect, not the end of the world. Hollywood Records lost all faith in us and did the rest of their promotional broadcasts on Ustream. Eventually, we got better at scaling.
Any founder of a social site on the Internet knows that there exists a constant baseline traffic from perverted degenerates looking for filth. On Justin.tv, this manifested itself as a constant stream of (presumably) men going into the chatrooms of any channel that had a woman on camera and saying things spanning from the very awkward (“show feet”) to the outright horrible. We had built moderation tools to allow broadcasters to ban the trolls from chat, but, like terrorists, they kept finding ways in: creating new accounts, using new IP addresses, and figuring out ways to access the site despite our best efforts.
This behavior didn’t just extend to chat. In fact, you could bet that pretty much any text field that allowed user entry would be filled with sex-related terms. Our on-site search was no different; several of the top search terms were “porn”, “sex”, etc.
One day I had what I thought was a brilliant idea. Our community moderation tools had been very effective in taking down any x-rated content from the site, and yet, people were still searching for it. Why shouldn’t we just auto-redirect anyone entering a porn-related search query to a cam girl site? We would kill two birds with one stone: the user would get content they wanted (or at least be closer to it), and we would get them off Justin.tv.
As an afterthought: it turns out that because they monetize users so well, cam sites have hefty affiliate fees. Why not stick an affiliate code in our redirect and see if we made any money off them?
The search redirect was easy to build, and we implemented it and then promptly forgot about it. Until, almost a year later, an email from Jason Kincaid of TechCrunch came in, asking for a comment on how we were making money from redirecting users to a porn site.
Michael and I (the founders in charge of the “business” side of the company at the time) didn’t know how to respond. Morally, we thought we were in the right, but we realized that anything that sounds like “making money off of porn” has bad optics. We emailed Jason asking for a couple hours to respond. In the meantime, we pulled the redirect off the site.
Of course, he hit publish thirty minutes later, before we could come up with anything intelligent to say about it.
What I learned: if you’ve done something you think the public is going to react badly to, you can’t delay it, hide from it, or ignore it. You have to address it head on and take your beating. Always take your beating.