The easiest way to crash your site

With all the latest web trends, such as “Social Sharing”, “Distributed marketing campaigns” or just simple website tracking, people tend to forget one tiny, simple fact:

Your site depends on the stability of your third-party vendors.

If your website is your sales channel, or even your only product, you must aim to keep it running and fully functional 24/7, with 99.99% uptime per year.
Anything below that means a significant loss of revenue. My employer's shop (www.otto.de) receives 2 orders per second on average. If it had 99.9% (instead of 99.99%) uptime per year, well, do the math. A year has 31,536,000 seconds. The extra 0.09% of downtime a year thus means 28,382.4 seconds, or about 7.9 hours, of unavailability. Sounds harmless. But 28,382.4 seconds of downtime at 2 orders per second means a loss of about 56,765 orders!
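The same arithmetic as a small sketch. It assumes a 365-day year and a constant order rate around the clock, which is a simplification (real shop traffic has peaks and valleys); the function names are illustrative only.

```javascript
// Lost orders caused by the gap between two availability levels.
// Assumptions: 365-day year, constant order rate around the clock.
const SECONDS_PER_YEAR = 365 * 24 * 60 * 60; // 31,536,000

// Extra downtime per year, in seconds, when availability drops from
// `targetUptime` to `actualUptime` (both given as fractions, e.g. 0.9999).
function extraDowntimeSeconds(targetUptime, actualUptime) {
  return (targetUptime - actualUptime) * SECONDS_PER_YEAR;
}

// Orders that would have been placed during that downtime.
function lostOrders(downtimeSeconds, ordersPerSecond) {
  return downtimeSeconds * ordersPerSecond;
}

const seconds = extraDowntimeSeconds(0.9999, 0.999);
console.log(seconds.toFixed(1));                 // 28382.4
console.log((seconds / 3600).toFixed(1));        // 7.9 (hours)
console.log(Math.round(lostOrders(seconds, 2))); // 56765
```

Plug in your own order rate and availability numbers to see what a single extra "nine" is worth to your business.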

Now we get closer to the point I want to make.
Just 0.09% of extra time in which our website is “not fully available” can mean a loss of tens of thousands of orders and thus a lot of revenue. And that is only the loss you are aware of, because red lights are flashing in your data centre and your ops team is running around like headless chickens trying to get everything back online.

What about the downtime, or the “times of being unavailable”, that you don’t notice?!

That is the really interesting part.

Downtime caused by your own infrastructure is easy to spot and to measure. Your VP SiteOps may already produce a precise report for the C-level guys saying “we were up and running 99.97% of the time this month”.

Fine.

But after everyone has congratulated themselves on being so available, the marketing department requires a multi-channel tracking JavaScript from company XYZ, the CorpCom guys want some fancy new G+ share buttons, and your business intelligence department requires a new tracking library served from the Foo CDN. So you have just stabilized your own infrastructure and are proud of your 99.97%, but you have introduced several new points of failure: third-party content.

Here are three rules you need to be fully aware of:

1) Third-party content is not within your control.

2) In general, servers fail. Every server fails eventually. There is no 100% uptime. And a server at 98% CPU load still counts as “up”.

3) Thus third-party servers will fail, and if you haven’t done your homework, you will fail too, no matter how fancy your infrastructure failovers are. And then remember rule 1).

Homework

So let’s do our homework and understand what will fail!

Let’s assume we have two different types of third-party code: JavaScript and CSS. We leave out backend integrations here, because they usually come with good test coverage and failovers. If, for example, you want to use some marketing tracking on your website, the “marketing tracking provider” usually asks you to put their <script> block right below the opening <html> tag so that it works properly.

Now we mix some ingredients together:

– every plain <script src=""> tag (one without the async or defer attribute) is loaded *synchronously* by the browser. Always. By definition. That means the browser does not (!!) continue parsing the page until it has downloaded and executed the src file.

– third-party servers or network connections can crash, fail, slow down and so on

– and finally, some salt in the soup: you want to track your page views because you earn affiliate money or whatever

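A very simple, abstracted code example:

<html>
  <head>
    <!-- Blocking: the parser stops here until tracking.js has downloaded and run -->
    <script src="http://www.thirdparty.com/tracking.js"></script>
  </head>

  <body>
    Your website's content
    <!-- `affiliate` is expected to be defined by tracking.js -->
    <script>affiliate.trackAndGenerateMoney();</script>
  </body>
</html>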

Now, as mentioned above, let’s imagine thirdparty.com, or any of the magic between the client and thirdparty.com’s server, is broken and the HTTP request does not succeed. The first script block then keeps loading for something like 30 to 60 seconds, until the browser times out.
Until the browser gives up on that script, the user sees a blank page with a loading indicator.
Let me repeat: just because you embedded third-party JavaScript and THEY are down, YOUR users sit in front of a blank page.
On average, users wait about 10 seconds before they leave.
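The usual way out is to make the third-party request non-blocking: add the `async` attribute to the tag (`<script async src="...">`), or insert the script element from JavaScript, which browsers also load asynchronously. Below is a minimal sketch of the second approach. `loadScriptAsync` is a hypothetical helper, not a vendor API, and `doc` is passed in (normally `document`) only so the helper can be exercised outside a browser. The tracking call moves into the load callback because, as in the example above, `affiliate` is assumed to exist only once the vendor script has actually run.

```javascript
// Minimal sketch: load a third-party script without blocking HTML parsing.
// `doc` is normally `document`; `loadScriptAsync` is a hypothetical helper name.
function loadScriptAsync(doc, src, onLoad, onError) {
  const script = doc.createElement("script");
  script.src = src;
  script.async = true; // script-inserted scripts are async anyway; set explicitly for clarity

  // Only run code that depends on the vendor once the file has really arrived.
  if (onLoad) script.onload = onLoad;

  // Optional error hook: a failing vendor no longer holds up the rest of the page.
  if (onError) script.onerror = onError;

  doc.head.appendChild(script);
  return script;
}

// Usage in the page from the example above (URL and `affiliate` come from that example):
// loadScriptAsync(document, "http://www.thirdparty.com/tracking.js", function () {
//   affiliate.trackAndGenerateMoney();
// });
```

If thirdparty.com is down, the page renders normally and only the tracking call is skipped, instead of the whole page staying blank.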

Users who run into such a failure leave:

– with the worst possible user experience, and thus

– with a lower chance of returning and fewer recommendations

– most likely without generating any revenue

– and, even worse, you don’t know it, because your tracking did not fire 🙂. Consider: the user aborted the page load before the tracking code ran.

The situation is not much different when including third-party CSS.


<html>
  <head>
    <!-- Render-blocking: nothing is painted until this stylesheet has loaded or failed -->
    <link rel="stylesheet" href="http://www.thirdparty.com/some_widget_magic.css">
  </head>
  <body>
    Your website's content
    <script>affiliate.trackAndGenerateMoney();</script>
  </body>
</html>

When such a stylesheet is requested from an unavailable server, common browsers won’t start rendering the page at all (see e.g. http://www.phpied.com/rendering-styles/) until a browser timeout triggers (commonly after 30 seconds). This leaves the user staring at a white screen. Again, you won’t even notice, because your tracking relies on dom:ready and thus won’t fire.
An interesting question would be: what happens if a third-party web font is referenced from your own stylesheet? But that would go too far here.
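Stylesheets can be taken off the critical path too. A commonly used trick is to request the file with a media query that does not match the screen, which browsers fetch without holding back rendering, and to switch it to `all` once it has loaded. The sketch below does this from JavaScript; `loadStylesheetAsync` is a hypothetical helper and `doc` is normally `document`. The trade-off: the widget may briefly appear unstyled when its CSS arrives late.

```javascript
// Minimal sketch: load a third-party stylesheet without blocking first render.
// `doc` is normally `document`; `loadStylesheetAsync` is a hypothetical helper name.
function loadStylesheetAsync(doc, href) {
  const link = doc.createElement("link");
  link.rel = "stylesheet";
  link.href = href;

  // A non-matching media query keeps the stylesheet from blocking rendering...
  link.media = "print";
  // ...and once the file has arrived, it is applied to every medium.
  link.onload = function () {
    link.media = "all";
  };

  doc.head.appendChild(link);
  return link;
}

// Usage with the widget stylesheet from the example above:
// loadStylesheetAsync(document, "http://www.thirdparty.com/some_widget_magic.css");
```

If the vendor never answers, `onload` never fires and the page simply renders without the widget styles instead of staying white.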

Example

Here is a short video I made of the very popular website www.smashingmagazine.com. It gives you a good visualization of the effect of a broken third-party server.

On the left-hand side you see the page with everything working (your website and the third-party web servers); on the right-hand side, two third-party servers (an affiliate partner and Twitter) are down and don’t respond.

Got it?!

Not so nice, right?!

So how can you be safe from SPOFs (single points of failure) and third-party failures?

Tools & Tips

#1 Choose your third-party providers wisely! Ask them whether their script snippet, CSS include or web fonts load *asynchronously*. If the reply is something like “uhm, what?” or “well, this is not possible”, choose another partner. Seriously.

#2 Think twice about embedding such code in your platform. Do you really need it? Can you provide the feature yourself? Could you at least host the files on your own infrastructure?

#3 Install a browser plugin such as “SPOF-O-MATIC” (https://chrome.google.com/webstore/detail/spof-o-matic/plikhggfbplemddobondkeogomgoodeg). It lets you easily see whether your page has the potential to fail. And it is fun to browse around the web and see how blind website owners are, even at companies where the website is the only revenue channel.

#4 Browse your website’s code (locally, in your dev environment) for external references such as the ones above. Replace every third-party host with http://blackhole.webpagetest.org. This host produces a request that hangs for about 30 seconds, which conveniently simulates a third-party outage.

#5 Pro and advanced tip: edit your /etc/hosts file and point the host names of Facebook, Google+, Twitter and urlofyour3rdpartyprovider.com at the IP address of blackhole.webpagetest.org (the hosts file maps host names to IP addresses, not to URLs). Honestly, you shouldn’t browse FB and G+ at work anyway, so why not spend the whole day simulating that they are down?! You will be astonished how many websites appear broken or even down when we only simulate FB and G+ being down 😉
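Tips #4 and #5 are easy to script. The sketch below is one way to do it, under a few assumptions: the HTML is scanned with a simple regular expression (good enough for a local smoke test, not a real HTML parser), `ownHosts` is a list you fill in with your own domains, and both function names are hypothetical. For the hosts file you still need the current IP address of blackhole.webpagetest.org; look it up yourself (for example with `nslookup`). The address in the usage lines is only a documentation placeholder.

```javascript
// Sketch for tips #4 and #5. Function names and host lists are illustrative assumptions.
const BLACKHOLE_HOST = "blackhole.webpagetest.org";

// Tip #4: rewrite every absolute src/href that points to a host you don't own
// so that it points at the blackhole instead. The path is kept, so the page
// still makes the same requests; they just never get a proper answer.
function blackholeThirdParty(html, ownHosts) {
  const own = ownHosts.map((h) => h.toLowerCase());
  const absoluteUrl = /(\s(?:src|href)\s*=\s*["'])(https?:)?\/\/([^\/"'\s]+)/gi;
  return html.replace(absoluteUrl, (match, prefix, scheme, host) =>
    own.includes(host.toLowerCase())
      ? match // first-party reference: leave untouched
      : prefix + (scheme || "") + "//" + BLACKHOLE_HOST
  );
}

// Tip #5: /etc/hosts lines have the form "IP<whitespace>hostname",
// so each third-party host name gets a line pointing at the blackhole's IP.
function hostsFileLines(blackholeIp, hostnames) {
  return hostnames.map((name) => blackholeIp + "\t" + name).join("\n");
}

// Usage (192.0.2.1 is a documentation placeholder, not the blackhole's real IP):
const page =
  '<script src="http://www.thirdparty.com/tracking.js"></script>\n' +
  '<img src="https://www.otto.de/logo.png">';
console.log(blackholeThirdParty(page, ["www.otto.de"]));
console.log(hostsFileLines("192.0.2.1", ["www.facebook.com", "platform.twitter.com"]));
```

Relative URLs such as `/local.css` are left alone because they are served by your own infrastructure anyway.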

My personal favourites are #1 and #5.

#1 I want to work with awesome people. And if someone gives me code that could crash my site, they are not trustworthy.

#5 It is really eye-opening to see the impact of a SPOF first-hand, and if you browse your product or website frequently during the day, you will spot SPOFs before they take down your site.

In the end, don’t trust anyone but your own devs, ops and devops, and talk to your third-party vendors about SPOFs.

About

@BjoernKaiser, frontend evangelist, web performance Jedi, team lead Software Engineering @ otto.de, dad of two, husband of one.

Published in Architecture, Development
