Friday, April 4, 2008

Cool Technology of the Week

One of the challenges of being a CIO is the "application is slow, can you fix it" phone call. The network is usually blamed first, but many layers need to be examined: desktop, network, server, storage, database, Active Directory, Internet service provider, and so on. For example, a complaint that email is slow could stem from a problem in any one of those layers, or in several at once.

We recently worked with our electronic health record infrastructure partner, Concordant, to do an end-to-end application performance analysis.

The tools they employed were:

WhatsUp Gold for network and server monitoring
Windows Performance Monitor for server and client monitoring
OPNET Ace for end-to-end network traffic analysis
Computer Associates eHealth for network monitoring

The general approach they used covered three domains. They began by identifying and defining the problems from the user's perspective. This helped separate genuine system performance issues from non-technical issues, such as gaps in training or improper use of the application, that amplified the technical problems and shaped users' perception of performance. They assigned different subject matter experts to the different domains to ensure each one was evaluated with in-depth knowledge.

The three investigation domains and key focus areas within each domain were:

Client
End-user observation & interviews
Client device performance analysis
Device configuration & log review
Comparison of device specifications against the application vendor's recommendations

Network
WAN link utilization
Device performance analysis
Device configuration & log review
Packet loss & latency analysis
Traffic analysis

Infrastructure
Server and storage performance analysis
Device configuration & log review
Service and process performance analysis
Comparison of device specifications against the application vendor's recommendations
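Packet loss and latency analysis in the network domain usually starts with simple repeated measurements. The sketch below is a hypothetical illustration, not the tooling Concordant used: it times TCP connection setup to a server as a rough proxy for round-trip latency (ICMP ping typically needs elevated privileges) and counts failed or timed-out connections. A failed connection is not the same thing as a lost packet, so treat the failure rate as a coarse signal only. The names probe_tcp and ProbeResult are assumptions made for this example.

```python
import socket
import statistics
import time
from dataclasses import dataclass


@dataclass
class ProbeResult:
    attempts: int
    failures: int          # connections that were refused or timed out
    rtts_ms: list[float]   # TCP connect times for successful attempts, in ms

    @property
    def failure_rate(self) -> float:
        return self.failures / self.attempts if self.attempts else 0.0

    def summary(self) -> dict:
        """Failure rate plus min/median/95th-percentile connect time."""
        if not self.rtts_ms:
            return {"failure_rate": self.failure_rate}
        ordered = sorted(self.rtts_ms)
        p95 = ordered[min(len(ordered) - 1, int(0.95 * len(ordered)))]
        return {
            "failure_rate": self.failure_rate,
            "min_ms": ordered[0],
            "median_ms": statistics.median(ordered),
            "p95_ms": p95,
        }


def probe_tcp(host: str, port: int, attempts: int = 20,
              timeout: float = 2.0, interval: float = 0.1) -> ProbeResult:
    """Open and immediately close `attempts` TCP connections, timing each one."""
    rtts: list[float] = []
    failures = 0
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            # The three-way handshake costs roughly one network round trip.
            with socket.create_connection((host, port), timeout=timeout):
                rtts.append((time.perf_counter() - start) * 1000.0)
        except OSError:  # refused, unreachable, or timed out
            failures += 1
        time.sleep(interval)  # space out probes so they don't arrive as a burst
    return ProbeResult(attempts, failures, rtts)
```

For example, `probe_tcp("ehr-app.example.org", 443).summary()` (a placeholder host) run from a clinic desktop and again from the data center helps show whether latency or failures appear only across the WAN link.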

The assessment did not find a single "magic bullet" causing the performance problems. Instead, it identified multiple smaller issues that together degraded system performance.
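A simplified response-time model shows how several modest issues can add up to an application that feels slow. This is a minimal sketch under a stated assumption, that each step of a transaction runs strictly one after another: total time is the number of application turns (client/server round trips) multiplied by the round-trip time, plus the time to push the payload across the WAN link, plus server and client processing. The class, field names, and numbers are invented for illustration and are not findings from this assessment.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    """One user action, such as opening a patient chart (hypothetical)."""
    app_turns: int        # client/server round trips the application makes
    rtt_s: float          # network round-trip time, seconds
    payload_bytes: int    # total bytes transferred
    bandwidth_bps: float  # available WAN bandwidth, bits per second
    server_s: float       # server and database processing time, seconds
    client_s: float       # desktop processing and rendering time, seconds


def breakdown(t: Transaction) -> dict[str, float]:
    """Split the estimated response time into additive parts, in seconds."""
    return {
        "latency (turns x RTT)": t.app_turns * t.rtt_s,
        "transmission": t.payload_bytes * 8 / t.bandwidth_bps,
        "server": t.server_s,
        "client": t.client_s,
    }


if __name__ == "__main__":
    chart_open = Transaction(app_turns=40, rtt_s=0.025, payload_bytes=500_000,
                             bandwidth_bps=10_000_000, server_s=0.3, client_s=0.5)
    parts = breakdown(chart_open)
    for name, seconds in parts.items():
        print(f"{name:>22}: {seconds:.2f} s")
    print(f"{'total':>22}: {sum(parts.values()):.2f} s")
```

With these made-up inputs, no single component exceeds about a second, yet the total is over two seconds. The model ignores overlap such as parallel requests, so it is only a rough estimate, but it illustrates why fixing one layer alone may not make users notice a difference.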

In my experience troubleshooting complex IT systems, a comprehensive approach like the one outlined above works very well.

If I had to choose one simple approach to determine the cause of application performance issues, I would:

1. Check that the network cards in the desktop, the server, and the database server are all set to Auto (autonegotiate), since performance problems are often caused by a duplex mismatch between a network card and the switch port it connects to.

2. Install OPNET agents on the client and server. More often than not, OPNET rapidly identifies root causes of application performance issues.
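Part of the duplex check can be scripted. A minimal sketch, assuming Linux hosts: the kernel reports each interface's negotiated duplex and speed in files under /sys/class/net/<iface>/. A link reporting half duplex toward a modern switch is a common symptom of a mismatch, because when one end is forced to full duplex and the other autonegotiates, the autonegotiating side falls back to half duplex. The function names are hypothetical, and sysfs does not show whether autonegotiation itself is enabled, so confirm that with `ethtool <iface>` and check the switch port configuration as well. Windows desktops would need a different method.

```python
from pathlib import Path


def read_link(iface_dir: Path) -> dict:
    """Read duplex, speed, and operstate for one interface directory."""
    info = {"iface": iface_dir.name}
    for attr in ("duplex", "speed", "operstate"):
        try:
            info[attr] = (iface_dir / attr).read_text().strip()
        except OSError:
            # Missing for virtual interfaces such as lo, unreadable when the link is down.
            info[attr] = None
    return info


def check_duplex(sys_net: Path = Path("/sys/class/net")) -> list[dict]:
    """Return the interfaces that report half duplex (possible mismatch)."""
    findings = []
    for iface_dir in sorted(sys_net.iterdir()):
        info = read_link(iface_dir)
        if info["duplex"] == "half":
            findings.append(info)
    return findings
```

On a Linux server, `check_duplex()` with no arguments scans the real sysfs tree; an empty list means no interface reported half duplex, not that autonegotiation is configured correctly on both ends.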

Based on my positive experience with OPNET, including on this project, I'm naming OPNET the cool technology of the week. Now I can respond to the "application is slow" question with an OPNET answer.