I’ve been in the IT world long enough to see some stupid stuff. A lot of really stupid stuff. I’ve done phone support, database administration, desktop support, and a bit of programming. IT really isn’t that hard, and some common sense will go a long way. So now, I’m going to rant.
Where I work isn’t exactly important. It’s a company that is large enough to have its own “Computer Helpline” and separate support divisions for network, Exchange, desktops, and so on. Of course, none of these groups communicate or even work very well together. I’m sure you’ve heard a phrase similar to “one hand doesn’t know what the other is doing.” Yeah, this place is a prime example of that. It’s so bad that when I was hired, I wasn’t hired into any of the IT divisions. No, I, along with two other people, was hired because the manager we support had no faith that the people who are supposed to be doing this job could actually do it… and so far, they’ve continued to fail time and time again.
#1 – Microsoft Exchange Server – Microsoft Exchange isn’t the most difficult product to set up, even with a few thousand users. Our setup is fairly simple: an Exchange server and a Storage Area Network that contains the mailboxes.
On Monday, we started experiencing issues with mail being slow and the connection between Outlook and Exchange disconnecting sporadically. We (users) would have been surprised except that this kind of thing happens all the time. Seriously, at least every other month. Right now, we have about a three-hour delay between sending a message and the recipient actually receiving it. Performing a search is nearly impossible, as Outlook restarts the search every time the connection to Exchange is lost. Viewing your calendar or accepting a meeting request usually results in Outlook crashing and needing to be restarted, which then takes forever to come back up and load your mailbox from the server.
Late yesterday afternoon, the manager over all of the dysfunctional IT departments posted a message stating that they had identified the issue and would have it resolved by Saturday. Really? In what kind of business would it be acceptable to have basically zero email service for an entire week??
It gets better once you read the manager’s explanation.
“The SAN for the email network has been nearing capacity for some months. Additional hard drives were ordered and received which will double the capacity of the SAN. Unfortunately, replacement of the hard drive system requires about twelve (12) hours due to the need to back up the system data, replace the hard drives, reload the data, and test the system.”
Wait, what? So, they are just going to replace the drives in the SAN rather than adding more drives to it? You’ve known about the issue for months (yes, plural) and yet have failed to do anything about it?? 12 hours to perform a backup, replace the drives, blah blah blah… does that mean that Exchange mailboxes aren’t being backed up right now? You know, they make software that will integrate with Exchange and perform real-time backups.
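If you know a volume has been "nearing capacity for some months," estimating when it will actually fill up is not hard. Here is a minimal sketch in Python of a straight-line projection. The function name and every number in it are hypothetical and made up for illustration; none of them come from our actual SAN.

```python
def days_until_full(capacity_gb, used_gb, growth_gb_per_day):
    """Straight-line estimate of the days left before a volume is full.

    Returns None when usage is flat or shrinking, since a linear
    projection never reaches capacity in that case.
    """
    if growth_gb_per_day <= 0:
        return None
    free_gb = max(capacity_gb - used_gb, 0)
    return free_gb / growth_gb_per_day


# Hypothetical figures, for illustration only.
remaining = days_until_full(capacity_gb=1000, used_gb=900, growth_gb_per_day=5)
print(f"About {remaining:.0f} days of headroom left")
```

Real mailbox growth is rarely linear, so treat a number like this as an early warning to order hardware, not as a forecast.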
#2 – Backups and hard disk space – I mentioned the Exchange issue first because it was the most recent example of failure of our IT staff, but also because it highlighted another issue. Space.
These days, storage is cheap. At home, between my Mac and my primary PC (which hasn’t been on in a month), I have 5TB of combined space. Once upon a time, I would have been in awe of so much storage. These days, I don’t think anything of it.
A few months ago, we brought in two new servers. These servers were internal to the department I support and not maintained by the IT department (more on that later). Each server had a mirrored 320GB set for the OS and a mirrored 500GB set for the Oracle database and application servers. Simple, right? We asked for a location on a central server where we could push database dumps and other files so they could be backed up along with the normal network stuff. When we told them that our servers had a combined space of about 1.6TB, their jaws literally hit the floor. We tried to tell them that out of everything on those servers, there was only about 50GB worth of data that needed to be backed up. The response we received was “we don’t have anything that can back that up.” Really? Why do I feel like I’m living in the IT stone age?
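To put the numbers side by side, here is a quick sketch in Python. It assumes the "about 1.6TB" figure is the usable space across two servers, and that each mirrored set is a two-disk mirror exposing the size of a single disk; both are my reading of the setup described above, and the variable and function names are just for illustration.

```python
SERVERS = 2                    # assumption: two identical servers
MIRRORED_SETS_GB = [320, 500]  # OS set and Oracle/application set, per server
DATA_TO_BACK_UP_GB = 50        # what actually needed backing up


def usable_capacity_gb(servers, mirrored_sets_gb):
    """Usable space, assuming each mirror exposes the size of one disk."""
    return servers * sum(mirrored_sets_gb)


usable_gb = usable_capacity_gb(SERVERS, MIRRORED_SETS_GB)
backup_share = DATA_TO_BACK_UP_GB / usable_gb

print(f"Usable space: {usable_gb} GB")
print(f"Data to back up: {DATA_TO_BACK_UP_GB} GB ({backup_share:.1%} of usable space)")
```

Under those assumptions, the backup request covers only a few percent of the space that caused the panic.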
#3 – Network Engineers – Years ago, I had plans to become an MCSE / MCSA and quite possibly even a CCNA / CCIE. That hasn’t happened yet, but thankfully our IT department has certified network engineers. Thankfully? Ha, have you been following along?
Those new servers I was talking about a moment ago? We actually tried to play nice and have our servers online and managed by the IT department. We sent them up, gave them the OS license, and expected to be up and running in a day or two. (The two others I work with and I can bring up Windows Server with Oracle and have it configured, hardened, and ready to go in less than a day.) Two weeks go by, and we get a call that one of the servers isn’t working. Funny, they worked just fine when we tested them in our office. So I head up there along with one of the guys I work with.
Turn the system on, all the lights come up, keyboard lights blink, no display on the monitor. Try to hit CTRL-ALT-DEL and nothing happens. No lights on the keyboard when we hit num or scroll lock. No, it cannot be this simple, can it? We swap the keyboard and mouse around (it was hooked to a KVM, so no color-coded ports) and what do you know, the system boots up just fine. Really, two weeks and they couldn’t figure that out?
It doesn’t stop there. Remember, the servers had a 320GB mirrored RAID set and a 500GB mirrored RAID set. Both RAID sets are controlled by individual Adaptec cards. This was all explained to the same network engineer that couldn’t figure out the keyboard/mouse issue. We get a call that the OS is installed, but it doesn’t see the 500GB RAID set in Windows (Server 2003). Excited to see what complicated issue this was, all three of us now head up to take a look at the server.
When you first boot the system, it does the normal POST stuff, gives you the option to enter the BIOS, and then you get the configuration options for the first RAID card. If you don’t do anything for the first card, you then get options for the second RAID card. Yes, our dear network engineer failed to set up the 2nd RAID set. After doing the network engineer’s job, we informed her that the other server was identical. Yes, we would eventually be back to set up that server as well.
The servers were soon removed from their location and brought back to our office, where they remain operational. We were later informed that they were not used to dealing with “these types of servers” and that they “usually only deal with HP servers.”
They’ve been heard calling our servers “Wal-Mart Servers,” which just makes us laugh. Our Wal-Mart servers have better uptime and reliability than their centrally managed HP servers. Even when one of the arrays on our primary application server died, we were back up and running in less than 8 hours. Most of that time was simply waiting on the RAID to rebuild. We didn’t wait for a weekend; we did the task during the evening and user impact was minimal. If things had gone south, we could have easily turned on our identical backup server, changed a few URLs, and been working like normal before morning.
These are hardly the only experiences I’ve had with the IT group. Sure, there are a handful of people in those groups that actually know a thing or two about what they are doing. However, the ratio is something like 25:1. Odds are not in our favor.
“I have come to expect a whole new level of incompetence.”
- Bryan McDaniel
Other issues that are all too common in my workplace:
- Failure to bring up a test system before performing software upgrades
- Pushing out untested router firmware that disables every location
- Implementing new security policies without checking software requirements
- Conflicting security requirements and procedures
- Failure to think about the future and plan accordingly
- Overloading the knowledgeable people to the point where they quit and go elsewhere
- Useless paperwork that provides no benefit and stands in the way of getting tasks accomplished
- Failure to listen to the users
- Lack of communication between IT departments
- Spending more money than needed, because no one wants to work together
- Not having people in roles that they are best qualified for
- Using SAP. This is a monstrous, complex beast. It may be good at something, but it sucks at everything we use it for.
That’s it. I’m done with my rant now. Going to go bang my head on the wall for a bit and then write a new program that does something available in SAP but faster and prettier.