Yikes, two Leaseweb server crashes in a row!
by: Freaking Wildchild @ 09 Mar 2009 [tags: crash IT LeaseWeb ]
I’ve been having a terrible headache from crashing disk drives lately, especially since it happened twice in a row: once on 2 March 2009 and again on 8 March 2009. All this with the brand new HP DL 180 server hosted at Leaseweb.
Mental note: whatever you administer:
KEEP BACKUPS READY! At all times!
The first crash, at 22:45, was swiftly taken care of: the disks were replaced within 24 hours with Western Digital ones, and the data loss was quite minimal. After 5 consecutive days of work all data was recovered, including my work for a second server which was ordered through Leaseweb. The restoration was visible from miles away, since the changes from ArtistBlog.ME/beta were implemented all at once on the production side of ArtistBlog.ME.
The second crash happened just a day before writing this, this time at night. At 05:00 I noticed the server was running very slowly, and I was able to send myself the latest backup snapshot, which was ready to be transported. The server was brought back online at 18:00 thanks to Leaseweb support, and after a bit of trouble with MySQL and permissions the ArtistBlog.ME component was back online in no time.
Leaseweb responded that “the HP controller firmware seems to be buggy, which will be upgraded tomorrow morning.” Hopefully this firmware will fix the server with an automagical potion of fairy dust, so it’ll be enchanted for use by the ArtistPlug.ME network for many years. I’ve already demanded a parallel solution in case this server corrupts its data one more time: a prompt, kosher replacement of the server, because this could hurt customers on the second server I’ve been preparing at Leaseweb.
So, Leaseweb, let’s really hope this HP DL 180 server will be healed by the firmware, for reliability’s sake…
update (19 June 2009): So far the update has been working great (knock on wood)! Leaseweb support has been swift and easy, and the server has lived long and happily since that famous HP controller firmware update!
You’ll find a bit of reference material below!
The first crash … where it started …
Mar 2 23:12:46 mother kernel: cciss: cmd f7002500 has CHECK CONDITION sense key = 0x3
Mar 2 23:12:46 mother kernel: Buffer I/O error on device cciss/c0d1p1, logical block 136478722
Mar 2 23:12:46 mother kernel: cciss: cmd f7002750 has CHECK CONDITION sense key = 0x3
Mar 2 23:12:46 mother kernel: Buffer I/O error on device cciss/c0d1p1, logical block 139591683
Mar 2 23:12:46 mother kernel: lost page write due to I/O error on cciss/c0d1p1
[...] And it doesn’t get better than that afterwards ! [...]
Mar 2 23:13:20 mother kernel: EXT3-fs error (device cciss/c0d1p1): ext3_journal_start_sb: Detected aborted journal
Mar 2 23:13:20 mother kernel: Remounting filesystem read-only
Mar 2 23:13:21 mother kernel: cciss: cmd f70006f0 has CHECK CONDITION sense key = 0x3
Mar 2 23:13:21 mother kernel: Buffer I/O error on device cciss/c0d1p1, logical block 166789121
Mar 2 23:13:21 mother kernel: cciss: cmd f7000940 has CHECK CONDITION sense key = 0x3
Mar 2 23:13:21 mother kernel: Buffer I/O error on device cciss/c0d1p1, logical block 166789172
Mar 2 23:13:21 mother kernel: lost page write due to I/O error on cciss/c0d1p1
Mar 2 23:13:21 mother kernel: cciss: cmd f7000b90 has CHECK CONDITION sense key = 0x3
Mar 2 23:13:21 mother kernel: Buffer I/O error on device cciss/c0d1p1, logical block 166793477
Mar 2 23:13:21 mother kernel: lost page write due to I/O error on cciss/c0d1p1
Mar 2 23:13:21 mother kernel: cciss: cmd f7000de0 has CHECK CONDITION sense key = 0x3
Mar 2 23:13:21 mother kernel: lost page write due to I/O error on cciss/c0d1p1
Mar 2 23:13:35 mother kernel: cciss: cmd f7000000 has CHECK CONDITION sense key = 0x3
Mar 2 23:13:55 mother last message repeated 2 times
Mar 2 23:13:55 mother kernel: EXT3-fs error (device cciss/c0d1p1): ext3_find_entry: reading directory #83396003 offset 0
The second crash …
Similar problems; for me this was the hint that it was NOT the hard disk but rather the controller!
Mar 9 04:32:20 mother kernel: cciss: cmd f7100000 has CHECK CONDITION sense key = 0x3
Mar 9 04:32:20 mother kernel: EXT3-fs error (device cciss/c0d2p1): read_inode_bitmap: Cannot read inode bitmap - block_group = 6656, inode_bitmap = 218103809
Mar 9 04:32:20 mother kernel: EXT3-fs error (device cciss/c0d2p1) in ext3_new_inode: IO failure
Mar 9 04:35:37 mother kernel: cciss: cmd f7100000 has CHECK CONDITION sense key = 0x3
Mar 9 04:35:37 mother kernel: EXT3-fs error (device cciss/c0d2p1): read_inode_bitmap: Cannot read inode bitmap - block_group = 6656, inode_bitmap = 218103809
Mar 9 04:35:37 mother kernel: EXT3-fs error (device cciss/c0d2p1) in ext3_new_inode: IO failure
Mar 9 04:37:40 mother kernel: cciss: cmd f7100250 has CHECK CONDITION sense key = 0x3
Mar 9 04:37:40 mother kernel: EXT3-fs error (device cciss/c0d2p1): read_inode_bitmap: Cannot read inode bitmap - block_group = 6656, inode_bitmap = 218103809
Mar 9 04:37:40 mother kernel: EXT3-fs error (device cciss/c0d2p1) in ext3_new_inode: IO failure
Mar 9 04:39:21 mother kernel: cciss: cmd f7100000 has CHECK CONDITION sense key = 0x3
Mar 9 04:40:36 mother kernel: cciss: cmd f7102500 has CHECK CONDITION sense key = 0x3
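The comparison between the two crashes can be made a little more systematic by counting how many error lines mention each cciss logical drive. The following is only a minimal sketch: the function name errors_per_drive and the sample list are made up for illustration, and the regular expression assumes the cciss/cXdY naming seen in the excerpts above (controller X, logical drive Y). Errors spread over several drives behind the same controller are a hint, not proof, that the controller is at fault.

```python
import re
from collections import Counter

# Matches names such as "cciss/c0d1" inside "cciss/c0d1p1";
# the trailing partition suffix (p1) is deliberately left out.
DRIVE_RE = re.compile(r"(cciss/c\d+d\d+)")

def errors_per_drive(lines):
    """Count log lines that mention each controller/logical-drive pair."""
    counts = Counter()
    for line in lines:
        match = DRIVE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# A few lines copied from the excerpts above, used as sample input.
sample = [
    "Mar 2 23:12:46 mother kernel: Buffer I/O error on device cciss/c0d1p1, logical block 136478722",
    "Mar 2 23:12:46 mother kernel: lost page write due to I/O error on cciss/c0d1p1",
    "Mar 9 04:32:20 mother kernel: EXT3-fs error (device cciss/c0d2p1) in ext3_new_inode: IO failure",
    "Mar 9 04:39:21 mother kernel: cciss: cmd f7100000 has CHECK CONDITION sense key = 0x3",
]

for drive, count in sorted(errors_per_drive(sample).items()):
    print(drive, count)
```

Lines that only carry the bare "cciss: cmd ..." message have no drive name in them, so this sketch skips them.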
I ended up here because someone left a link to a post in my blog regarding LeaseWeb.
Let me tell you, the problems I had with them never ended, and switching servers to LiquidWeb was the best decision I ever made. You lose search engine rankings when your server goes down. I lost clients because my server went down, and sadly, I actually consider myself lucky because I didn’t lose everything. I had to restart my site’s database, which was created in 2002, because their servers are useless.
Please, if you have another problem, consider changing. Usually I give people an affiliate link where I make money, but just so you know this is genuine, I’m not doing that here.
So far, knock on wood, I’ve had no problems with that server at all. I hope their firmware update really helped. My automatic cron/bash backup process sends a file every 6 hours to be sure. This firmware had better be worth something, because a crashing server is worth no server at all! For now, everything is running smoothly... let’s hope for the best…
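A periodic snapshot job of the kind mentioned above could be sketched roughly as follows. This is not the actual script used on the server (that one is a bash script run from cron and is not shown here). It is a hedged Python illustration, and the function name make_snapshot and its arguments are hypothetical. It only creates a timestamped archive. Sending it off the server is left out.

```python
import tarfile
import time
from pathlib import Path

def make_snapshot(source_dir, dest_dir):
    """Pack source_dir into a timestamped .tar.gz in dest_dir; return the archive path."""
    source = Path(source_dir)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the target directory if needed
    stamp = time.strftime("%Y%m%d-%H%M%S")   # e.g. 20090309-050000
    archive = dest / f"{source.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # Store the files under the directory's own name rather than its full path.
        tar.add(source, arcname=source.name)
    return archive
```

A script calling make_snapshot with real paths could be scheduled with a crontab time field of "0 */6 * * *" to run every 6 hours. Keeping the archives on a different machine is the part that actually helps when a controller or disk fails.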
I must be honest, I’ve had very good customer care at Leaseweb so far. The response, even at night, was splendid and everything was taken care of. In our last discussion I demanded a parallel solution, because a server crashing three times in a row is no reliability at all for a business. I’m a system administrator myself and know from experience that things can go wrong between controllers and disks…
Let’s hope this firmware really fixes everything, so this network will be saved from a lonely demise without disks on the internet! For now, I’ll put my trust in Leaseweb once more, hoping these problems get fixed, preferably sooner rather than later! Otherwise I’ll be ringing the alarm bell to get the server replaced instantly, since I am paying for “on-line” hosting, not an “off-line dedicated server”. I’m bracing myself and hope this problem can be prevented on the second server ordered through Leaseweb.
check out http://onlinemoneylab.com/another-bad-opinion-about-leaseweb and http://www.eyeonsilicon.co.uk/2008/08/26/the-tale-that-is-my-bad-experience-with-leaseweb/ !