Kyle's life in technology

Tuesday, March 07, 2006

WTF: why is clustering & redundancy so hard?

Our storage availability has been way worse since we switched from per-box RAID to a SAN.  Plus we get cute little notes from the vendor like, "We've discovered a flaw in our firmware.  If you don't [take your production systems out for two hours] and upgrade it within the next 7 business days, we will no longer support you."  Maybe we'll switch vendors.

We're also using a database clustering technology.  The cool thing, though, is that when one node goes down, it takes the other one out with it, without fail.  Very consistently.  And this is from a major, reputable vendor.  Why?  Got me.  It's getting better, I think, with upgrades, etc.  Too bad it depends on the aforementioned SAN.

We're using an application cluster from a reputable vendor.  Unfortunately, the tool that manages nodes/instances in the cluster tends to hang a lot, which may or may not mean a node is down.  When a node does go down, we typically don't know about it until we see our performance fall into the basement.  And of course, when we try to fix it, we can't, because the node manager is down.  We upgraded two major versions last year, and it got worse, not better.

Can I just say WTF?????

