Yesterday's outage
I'm generally quite happy with the site here at Dreamhost. Even so, every now and then the site(s) become unresponsive and go down. We're told that our site was using up a lot of memory, which crashed the (private) server. After a reboot, the site usually works fine again.
Most of the time I miss the "event"; I check up on the site at least once a day on average, but these outages always seem to happen when I am not looking. Well, yesterday I finally happened upon an unresponsive IVRPA site, so I could check it out first hand. The following graph is somewhat shocking:

Normally, the red line (memory usage) hovers around 50-100 MB. At around 8am (server time), things started to go haywire. I stumbled upon the site around 11:30. The downward spike after 12:00 is where I rebooted our server, which helped for a couple of minutes, but then the big spike to 1500 MB happened. That's a good 15-30 times our normal memory usage! At that point I decided to put the main site in maintenance mode and let our servers cool down. As you can see, that had an almost immediate effect. I had to get some sleep (server time != my local time), and this morning (local time) I put the main site back online without so much as a hiccup. We're running at our normal 50-100 MB again.
Looking through the server logs, I'm afraid what happened was that we were getting spidered by all conceivable web crawlers (search engines) at the same time. Argh! Yahoo, Google, MSN, AskJeeves (still alive?), Exabot (what?), Baidu (who?), they were all having a party at our server. What irks me, though, is that we are not robust against this type of spidering. Our Drupal 4.7 codebase is starting to show its age; we really need to step up to D5 and beyond. Interestingly, our MySQL server remained unaffected; even at (semi-) high load, we're not overloading the db.
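For anyone wanting to do the same kind of log check, here is a minimal Python sketch that tallies requests per crawler. It assumes the access log is in Apache's combined format, where the User-Agent is the last quoted field. The marker strings and the sample log lines are illustrative assumptions, not taken from our actual logs.

```python
import re
from collections import Counter

# Assumed User-Agent substrings for the crawlers named above (illustrative).
CRAWLER_MARKERS = {
    "googlebot": "Google",
    "slurp": "Yahoo",
    "msnbot": "MSN",
    "teoma": "AskJeeves",
    "exabot": "Exabot",
    "baiduspider": "Baidu",
}

# In the combined log format the User-Agent is the last quoted field.
USER_AGENT_RE = re.compile(r'"([^"]*)"\s*$')


def count_crawler_hits(lines):
    """Return a Counter mapping crawler name to number of requests."""
    hits = Counter()
    for line in lines:
        match = USER_AGENT_RE.search(line)
        if not match:
            continue  # skip lines that do not end in a quoted field
        agent = match.group(1).lower()
        for marker, name in CRAWLER_MARKERS.items():
            if marker in agent:
                hits[name] += 1
                break
    return hits


# Invented sample lines, for demonstration only.
SAMPLE_LOG = [
    '192.0.2.1 - - [12/Oct/2009:08:01:02 -0700] "GET /node/1 HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.2 - - [12/Oct/2009:08:01:03 -0700] "GET /node/2 HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    '192.0.2.1 - - [12/Oct/2009:08:01:04 -0700] "GET /node/3 HTTP/1.1" 200 6144 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.3 - - [12/Oct/2009:08:01:05 -0700] "GET / HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1) Firefox/3.5"',
]

if __name__ == "__main__":
    for name, count in count_crawler_hits(SAMPLE_LOG).most_common():
        print(name, count)
```

To run it on a real log, pass an open file object instead of SAMPLE_LOG; the function only needs an iterable of lines.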


Re: Yesterday's outage
Submitted by Thomas Rauscher on Mon, 2009-10-12 23:07.
Why not limit the number of Apache processes so that all processes still fit into main memory? The spiders will then get a server error, and maybe they will slow down their spidering.
This measure helped me with the PanoTools wiki (now up for 199 days!) that suffered from the exact same problem a year ago.
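The sizing idea above can be sketched as simple arithmetic: divide the memory left over for Apache by the typical size of one Apache process. The Python below is only a rough sketch; the function name and the example figures (300 MB total, 100 MB reserved, 40 MB per process) are hypothetical, and the comment does not name the directive, although with the prefork MPM the resulting cap would typically go into MaxClients.

```python
def max_apache_processes(total_ram_mb, reserved_mb, per_process_mb):
    """Estimate how many Apache processes fit into main memory.

    total_ram_mb   -- memory available on the server
    reserved_mb    -- memory kept free for the OS, database, etc.
    per_process_mb -- typical resident size of one Apache process
    """
    if per_process_mb <= 0:
        raise ValueError("per_process_mb must be positive")
    usable_mb = total_ram_mb - reserved_mb
    if usable_mb <= 0:
        return 0  # nothing left over for Apache at all
    return int(usable_mb // per_process_mb)


if __name__ == "__main__":
    # Hypothetical figures: 300 MB total, 100 MB reserved, 40 MB per process.
    print(max_apache_processes(300, 100, 40))
```

Measuring the per-process size under crawler load matters here, since the heaviest pages drive the worst case.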
Re: Yesterday's outage
Submitted by Aldo Hoeben on Tue, 2009-10-13 08:03.
Thanks for the tip. Interestingly, afaik, the processes are already limited, both in time and in memory usage. But I'll dig a bit further into httpd.conf...
Re: Yesterday's outage
Submitted by Thomas Rauscher on Tue, 2009-10-13 19:10.
I limited it to a really low number because the spiders tend to hit the pages with the biggest memory usage, several at a time. I also found that this has no real impact on response time, because blocking slows down the concurrent threads anyway.
Re: Yesterday's outage
Submitted by Aldo Hoeben on Tue, 2009-10-13 19:53.
So... are you talking about RLimitMem in httpd.conf?