Total Cost of Ownership in the System Administration World

In the system administration world, the total cost of ownership (TCO) is the price you pay for equipment from before its purchase to after its demise. There are always hidden costs that you cannot anticipate, but through diligent evaluation and selection you can limit these unforeseen costs and minimize your TCO.

My main focus for this post is going to be vendor selection, specifically vendor plurality.

Unlike Google or Facebook, most systems administrators buy their server equipment from one of the following vendors: HP, IBM, Dell, or a re-seller of Supermicro. Each vendor offers different product lines, and while one product line may be more reliable than another, let’s assume that as a whole they are all equally reliable within some standard deviation. Why then would you ever buy equipment from more than one vendor? Price. IT and system administration are sunk costs; they are the cost of doing business (ignoring IaaS, PaaS, and SaaS). A cost-conscious company will want to get the best price on hardware.

Why You Should Support More Than One Vendor

Depending on your computing needs you could be spending millions on hardware, and one company’s margin could be tens of thousands of dollars. It is good practice to always get competing quotes from different vendors (not just different re-sellers). That way you keep the vendors honest (or at least more honest).

You can save a significant amount of money this way, which you should obviously make sure your bosses know.

But what hidden costs arise?

Why You Should Support as Few Vendors as Possible

Every vendor you support requires your time as a system administrator. To name a few considerations:

  • different RAID controllers with different syntaxes
  • different OOBM (IMM, iDRAC, iLO) with various nuances
  • support from vendors can differ widely in process and competence
  • automated hardware monitoring is different. While most vendors support IPMI, there are always little differences
  • all new vendor equipment needs evaluation and familiarization

What am I to do?

There is no optimal solution, only what works for you. I recommend limiting the number of supported vendors to the absolute minimum while still keeping a second around to make sure the other vendor’s pricing is honest and competitive.

Each vendor adds to a server’s TCO. Therefore, limiting the number of supported vendors minimizes every server’s TCO.

What I learned from Hurricane Sandy as a System Administrator

I just experienced my first real Business Continuity Plan (BCP) event. In the past my company has simulated BCP events to test our response and capabilities. The simulated events were always minor, like a fiber cut. These are a few observations and lessons I learned from being involved in support as a Systems Administrator.

1. Your one stupid decision will be brought to light. Two major data center providers had major issues because of one poor decision. They both had located their generator fuel pumps in the basement. Generators in NYC/Manhattan are typically located on the roof due to the cost of real estate. Pumps are used to transport the fuel from the street level up 15-50+ stories. Basements are the first places to flood, thereby making your fuel pumps useless, eventually leading to your generators running out of fuel.
Articles to read:
Flooded NY data centers survive Sandy on generator power, fuel deliveries | Ars Technica

New York Data Centers Battle Back from Storm Damage » Data Center Knowledge

Hurricane Sandy Topples New York Data Center, Gawker, Gizmodo

2. Manpower is important but tough to guarantee. You need the right people in the right places to keep things running or to fix things that break. But those same people have families and responsibilities. Key people may be busy dealing with more important matters. Business is important but life is paramount.

3. Make sure everyone has remote access and uses it periodically. After Sandy, employees needed to work from home because our offices still did not have power. That morning 10% of the company opened tickets requesting help with remote access. This could have been avoided if we had encouraged employees to work from home periodically, thereby ensuring that remote access worked.

4. Have more than two of important infrastructure or have more than one backup. A BCP event causes havoc, and in that chaos it will turn your redundant services into single points of failure (SPOF). You might have had two VPN servers before you lost power to an office, but afterwards you have a significant SPOF.

5. Be diligent in your BCP testing and preparation. Test those generators, test them again, and test them on real load a third time. You might have designed a system to have redundancy, but you need to be sure they built what you designed. I know of many fiber paths that were designed to be redundant but collapse together at some points (usually the last X feet). Be strict and follow up.

Set Outlet State on APC PDU

For various reasons I needed a script to change the state of an outlet on an APC PDU. You need to set your private (write) community string and enable SNMP on the PDU. For some reason snmpset doesn’t work while someone is logged into the management interface.

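#!/bin/bash
# Set the state of a single outlet on an APC PDU over SNMP.
PRIVATE='<fill this out>'   # SNMP write community string
IP='<fill this out>'        # default PDU address, overridden by -i
# Outlet-control OID prefix, ending in a dot; the outlet number is appended.
# The listing never defined OID, so take the value from APC's PowerNet MIB.
OID='<fill this out>'
SNMPCMD="/usr/bin/snmpset"
PROG=$(basename "${0}")

usage() {
        cat <<EOF
${PROG} [-h] -i <ip address> -o <outlet number> -s <on|off|reboot>
        -h: usage|help
        -i: ip address of apc powerstrip
        -o: outlet # from 1-8
        -s: desired state (on,off,reboot)
EOF
        exit 1
}

while getopts hi:o:s: f; do
    case "${f}" in
        h) usage ;;
        i) IP=${OPTARG} ;;
        o) OUTLET=${OPTARG} ;;
        s) STRING_STATE=${OPTARG} ;;
        *) usage ;;
    esac
done

#test for mandatory cmdline args
if [ -z "${IP}" ] || [ -z "${OUTLET}" ] || [ -z "${STRING_STATE}" ] ; then
        echo "missing necessary commandline arg"
        usage
fi

# 1 = on, 2 = off,  3 = reboot
case "${STRING_STATE}" in
        [Oo][Nn]) INT_STATE=1 ;;
        [Oo][Ff][Ff]) INT_STATE=2 ;;
        reboot | REBOOT) INT_STATE=3 ;;
        *) echo "${STRING_STATE} not a valid option"; exit 1 ;;
esac

# accept only a single digit from 1 to 8 (this also rejects non-numeric input)
case "${OUTLET}" in
        [1-8]) ;;
        *) echo "${OUTLET} is out of bounds, must be 1<=outlet<=8"; exit 1 ;;
esac

# -c and its value must be separate arguments; "i" is snmpset's integer type
"${SNMPCMD}" -v 2c -c "${PRIVATE}" "${IP}" "${OID}${OUTLET}" i "${INT_STATE}"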

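After setting an outlet it can be useful to read its state back. The following is only a sketch under assumptions that are not part of the script above: the READ community variable is hypothetical, and it assumes that a GET on the same outlet-control OID returns 1 for on and 2 for off, which you should confirm against the MIB for your PDU model.

```shell
#!/bin/bash
# Sketch: read an outlet's state back after setting it.
# Assumptions (not from the original script): the READ variable, and that
# a GET on the control OID returns 1 for on and 2 for off.
READ='<fill this out>'      # SNMP read community string (hypothetical name)
IP='<fill this out>'
OID='<fill this out>'       # same outlet-control OID prefix as the set script

# Map the integer returned by the PDU to a label.
state_name() {
    case "${1}" in
        1) echo "on" ;;
        2) echo "off" ;;
        *) echo "unknown (${1})" ;;
    esac
}

# Query one outlet; -Oqve prints only the value, with enums as numbers.
outlet_state() {
    local raw
    raw=$(/usr/bin/snmpget -v 2c -c "${READ}" -Oqve "${IP}" "${OID}${1}") || return 1
    state_name "${raw}"
}

# Example: outlet_state 3
```

Only functions are defined here, so nothing touches the PDU until you call outlet_state yourself.
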
I am on the Board of Directors for the League of Professional System Administrators

This past week I started my two-year tenure on the Board of Directors for the League of Professional System Administrators (LOPSA). My reasons for running for the board can be found in an earlier post.

I volunteered to become the new Local Chapter Committee Chairman. In this role I will have the opportunity to shape and direct LOPSA’s continuing effort to expand and support local chapters throughout the country. We currently have eight local chapters with a couple more in the up-and-coming phase. I have assembled a great committee with a number of volunteers who are gracious enough to donate time and effort.  This will take up most of my time.

I have also been put in charge of planning and coordinating LOPSA-Live, a periodic IRC members forum. During a LOPSA-Live the Board makes announcements and updates the community on LOPSA’s goals and initiatives. This is usually followed with a Q&A. I am working toward having these every other month. This is one of the main ways the Board communicates with LOPSA’s membership.

I believe we have a great set of people on the Board who will work to strengthen LOPSA and further its goals.

My online learning experience with Coursera

I have been working in the financial industry for six years now, and the closest thing to a finance class I have taken was economics. I also have not taken a class or course since college ended, just as long ago. When someone mentioned on our internal company forums that Coursera was offering an Introduction to Finance class, I saw an opportunity to both learn something about finance and try out online learning.

Coursera is “a social entrepreneurship company that partners with the top universities in the world to offer courses online for anyone to take, for free.” They offer a wide range of topics from a large number of schools.

Introduction to Finance is taught by Gautam Kaul, a professor at the University of Michigan. It is ten weeks long, and its description reads, “This course will introduce you to frameworks and tools to measure value; both for corporate and personal assets. It will also help you in decision-making, again at both the corporate and personal levels.”

What I like

The way the course is structured allows me to accomplish the necessary work within my schedule. There are usually 1-2 hours of video lectures divided into roughly 20-minute chunks, followed by an assignment that takes another 2 hours. I do some at lunch time, then more after the kids go to bed. Professor Kaul is engaging, considering he is lecturing to a video camera. He does manage to say some pretty wacky stuff and has his own Facebook page.

What I dislike

The weekly assignments contain questions that are considerably more difficult than the examples done in class. I would expect some increase in difficulty, but not this much. One anonymous student who had decided to drop the course explained,

“The difficulty of some of the problems far exceeded any of the examples Gautam provided in the online lectures. At least one of the examples provided during the lectures should have been at the same level of difficulty of these difficult homework exercises. (An analogy: The lecture taught me how to doggie-paddle the length of the pool. But I felt like at least two of the homework exercises were asking me to backstroke the length of a lake. And I wish we had been shown some backstroke techniques during the lectures.)”

Unfortunately, because of the online nature, the course doesn’t give you an answer sheet after the assignment is due. If you get a question wrong, you aren’t given the right answer or even an explanation of what you did incorrectly. The Professor has made it clear that a considerable amount of work goes into creating the questions for each week. They can’t use questions from textbooks because of copyright, so they have to come up with new original questions. If they gave out an answer sheet, they would not be able to use the questions again in a later iteration of the course. But I don’t think that is a valid excuse. I find it a major issue to give out assignments but no solutions.


Unfortunately, the problems do not end there for this new experiment in free online education. There have been reports of widespread plagiarism in other courses. Regardless, I think that what Coursera and its ilk offer is awesome and should be encouraged. Today’s colleges are scarily overpriced and only help out those who are mature enough to take advantage of the situation. Too many colleges churn out six-year party animals with useless degrees. Their focus is on certification, not education.

I will be taking a long break after this course is complete, but have already browsed Coursera’s catalog to see what I could take next.

Pertinent Articles

The New Public Ivies

Why Would Someone Cheat on a Free Online Class That Doesn’t Count Toward Anything?

Class Central: A complete list of free online courses offered by Stanford’s Coursera, MIT and Harvard led edX (MITx + Harvardx + BerkeleyX), and Udacity

Edgespace Colocation

I made my first webpage in 6th grade. It was horrible and had tons of animated gifs and blinking text, but I wrote all the HTML myself in notepad and hosted it on AOL. Since then I have written, hosted, and been responsible for countless websites. Since college I have been helping out various groups with their webhosting needs. Edgespace Colocation is an effort to standardize and codify my various hosting responsibilities.

Right now I host or help with a number of non-profit sites. I donate my time, expertise, and resources.

If you know of a business or non-profit that needs help, I would be grateful if you could refer them to Edgespace.

The Dreaded Work Slump

I’ve been an athlete all my life, and to an athlete the idea of a slump is accepted. No one told me that it happens in the non-athletic world as well. In an athletic slump a basketball player can’t hit free throws or a baseball player’s batting average drops; either way, you are performing below your average. In the business world this translates to malaise and general dissatisfaction with your daily work. For me this means regular tasks drag on and seem more difficult, projects stall, and my attention wanders. There is a short piece on getting out of a slump: How Can I Overcome a Work Slump?

I wanted to add a couple things that have helped me get out of slumps.

Change your scenery. If you usually work out of the office and have the opportunity to work from home, take it. Work out of your home office for a week. Changing your surroundings can help you refocus and keep your attention from straying.

Do something physical. A lot of us sit at our computers all day without much in terms of breaks or physical activity. Use this chance to get back into the habit of working out. Take a 20 minute break from work to walk around or hit the gym. Cardiovascular workouts like running, biking, or erging give your brain and body a chance to focus on something physically demanding.

As a systems administrator my job does entail a decent amount of monotony. I sometimes have to do the same thing multiple times on different servers to fix similar issues. This can get boring, but it can’t be ignored. It follows that professional system administrators are especially susceptible to slumps. Realizing this and identifying that you are in fact in a slump are the first steps to getting out of one.

Ignoring the fact that this does happen won’t keep it from happening. Willful ignorance isn’t a solution.

Ubuntu 12.04 LTS, LVM, KVM, and Fun

Occasionally my job affords me the opportunity to do some really cool stuff. For a myriad of reasons I am building myself a Linux KVM host using Ubuntu 12.04 LTS. I hurried through the install prompts and didn’t pay enough attention to the partitioning. Therefore I ended up with 96GB of swap space and a 40GB root partition.

My goal was to have no swap, a small root, and a large partition for the KVM guests. I disabled swap and destroyed the swap LVM partition.  I then rebooted into the always handy Rescue Is Possible (RIP) using PXE and resized the root partition and root logical volume, crossed my fingers and rebooted. Voila! It all came back up perfectly configured for my uses.
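
A minimal sketch of what those steps can look like, assuming an ext4 root that is being shrunk. The volume group name, logical volume names, and target size are hypothetical (check `lvs` for the real ones), and the script only prints the commands rather than running them, since shrinking a filesystem is destructive if done wrong.

```shell
#!/bin/bash
# Sketch: print (not run) the commands for dropping swap and shrinking root.
# VG, SWAP_LV, ROOT_LV and ROOT_SIZE are hypothetical; check `lvs` for real names.
VG="${VG:-vg0}"
SWAP_LV="${SWAP_LV:-swap_1}"
ROOT_LV="${ROOT_LV:-root}"
ROOT_SIZE="${ROOT_SIZE:-20G}"

plan() {
    # On the running system: stop using swap and delete its logical volume.
    echo "swapoff -a"
    echo "lvremove /dev/${VG}/${SWAP_LV}"
    echo "# also remove the swap entry from /etc/fstab"
    # From the rescue environment, with root unmounted: activate the volume
    # group, check the filesystem, then shrink it together with its volume.
    echo "vgchange -ay ${VG}"
    echo "e2fsck -f /dev/${VG}/${ROOT_LV}"
    echo "lvreduce --resizefs -L ${ROOT_SIZE} /dev/${VG}/${ROOT_LV}"
}

plan
```

Using `--resizefs` lets LVM shrink the filesystem and the logical volume in one step, which avoids cutting the volume smaller than the filesystem on it.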

Now onto creating KVM guests…