Does storage have to be a headache?

One of the biggest challenges we face at work is managing storage. While we don’t run the day-to-day operations of, say, the EMC gear, we are responsible for engineering and maintaining some very high-traffic, mission-critical storage at the host level. Because of our matrix management structure, this means we have to be experts on the SAN, the filesystem, the OS, and even the EMC gear itself. Since we also provide storage for databases and “generic” network filesystems, the team has to understand the limitations and requirements of both. Throw in an extreme sensitivity to cost, and you have a real engineering challenge.

On the network file side, we’re currently using NFS, warts and all. Under high traffic and heavy use (say, SVN or CVS repositories on an NFS mount with active development and testing), things get hairy. File locking seems to be more of a problem on Linux than on Sun once the number of users or the traffic gets high. Keeping very large filesystems in sync across multiple sites is also a lot of work, even with the best Veritas has to offer.

The filesystems and their hosting servers require five 9s (99.999%) of availability. That throws in another wrench: high availability usually means a cluster solution, and most cluster solutions are not easy to manage. As you add more layers of complexity to a system, you run into a paradox: the extra moving parts that are supposed to make it highly available also make it more delicate.
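To put numbers on both points, here is a rough back-of-the-envelope sketch. It assumes components fail independently, that every part in a chain must be up for the service to be up (a "series" setup), and that redundant cluster nodes only need one survivor (a "parallel" setup). The availability figures used for nodes and cluster software are hypothetical, chosen purely for illustration, not measurements from our environment.

```python
# Back-of-the-envelope availability math (assumes independent failures).
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes


def downtime_minutes_per_year(availability: float) -> float:
    """Allowed downtime per year for a given availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def series_availability(components: list[float]) -> float:
    """Every component is required: availabilities multiply, so each added part lowers the total."""
    result = 1.0
    for a in components:
        result *= a
    return result


def parallel_availability(node: float, count: int) -> float:
    """Redundant nodes where any one survivor keeps the service up."""
    return 1.0 - (1.0 - node) ** count


if __name__ == "__main__":
    # Five 9s leaves only about 5.26 minutes of downtime per year.
    print(f"{downtime_minutes_per_year(0.99999):.2f} min/year")

    # Hypothetical: two 99.9% nodes clustered together look great on paper...
    nodes = parallel_availability(0.999, 2)
    # ...but the (hypothetical 99.99%) cluster software is one more part in series.
    total = series_availability([nodes, 0.9999])
    print(f"nodes: {nodes:.6f}, with cluster software: {total:.6f}")
```

With these made-up numbers, the clustered nodes alone reach 99.9999%, but adding the cluster layer in series pulls the whole system back below five 9s. That is the "more moving parts" paradox in arithmetic form.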

So, what to do? Right now, I am thinking of using some new technologies (and some not so new) and moving away from “we’ve always done it this way.” We need to create a new framework first, and then build on top of it with those technologies. I also want the result to be architecturally simple.