Q&A: Testing the Reliability of Microsoft Windows Server 2003

REDMOND, Wash. — April 6, 2005 — Customers need hard, comprehensive data to help them make strategic IT decisions. Microsoft’s “Get the Facts” campaign focuses on highlighting factual, third-party information and evidence that can aid companies when making decisions about their IT solutions.

To help customers answer common questions about the differences in reliability between Microsoft Windows and Linux, Microsoft recently commissioned VeriTest, the independent testing and analysis division of Lionbridge (Nasdaq: LIOX), to compare the reliability of Microsoft Windows Server 2003 with that of Red Hat Enterprise Linux AS 3.0. Results of the test show that:

  • Windows Server 2003 is more effective at troubleshooting.

  • Windows Server 2003 is easier to configure and maintain.

The complete test report can be viewed at: http://www.veritest.com/clients/reports/microsoft

To learn how the test was conducted and to better understand the results, PressPass spoke with Katrina Teague, Vice President of Solutions for VeriTest.

PressPass: Can you tell us about VeriTest?

VeriTest: VeriTest provides independent testing solutions that enable our clients to maximize revenue opportunities and reduce costs. We provide our clients with solutions that address their challenges and goals around key issues including interoperability, performance, quality assurance, and certification.



Katrina Teague, Vice President, VeriTest

We are an independent, third-party organization. We conduct tests fairly and deliver unbiased results. Our fair testing practices are publicly available at http://www.veritest.com/about/fair_testing_statement.asp. Ethical concerns are of great importance to us. Clients turn to us for our testing expertise, objectivity and our ability to provide validation and testing free of the constraints or bias that can compromise in-house testing.

The study we’re talking about today is an example of competitive analysis, which, like performance analysis and some of our other services, is performed in our labs. We perform work for our clients on-site at one of their locations or in one of our 10 labs across the U.S., Europe, and Asia. This study was conducted in our test lab in Research Triangle Park, NC.

PressPass: What kind of conclusions can you draw from your tests about the reliability of Windows Server 2003 versus Red Hat Enterprise Linux AS 3.0?

VeriTest: Reliability in this context meant taking an environment that was in a known unreliable, non-robust state and documenting the processes followed to improve it. Throughout, we measured the impact in terms of time on task and loss of end-user service.

The report includes a great deal of detailed analysis. The general conclusion of this test is that the Windows Server 2003 environment allowed the participants to get more work done in less elapsed time than the comparable Red Hat Enterprise Linux AS 3.0 environment.

PressPass: Who are your clients?

VeriTest: Our clients include software and hardware manufacturers as well as the end users deploying these products in an enterprise environment.

PressPass: You do a significant amount of commissioned research. Can you explain what that means?

VeriTest: When a client contracts with us to conduct testing, we consider it commissioned work. At the beginning of every report we publish — in the very first paragraph — we state who commissioned the study and identify the sponsoring vendor. Disclosure is a very important aspect of the way we work. We’ll say, “Company XYZ commissioned VeriTest to perform this set of services” and we’ll describe the services. The report will go into depth as to what testing actually occurred and will describe our methodology in sufficient detail so that other parties can recreate our tests and confirm our results.

It’s very important to have an open methodology that can be understood by the people reading the report; you need to understand the context in which the results were achieved. Our reports are extensive: the one for the Windows-Linux comparison runs to 60 pages. We don’t just produce a marketing blurb that says, “A is better than B.” We provide an in-depth methodology describing the test procedures and test metrics, along with the results and an analysis of the results. The sponsoring party owns the final test report and decides how to disseminate the results publicly.

PressPass: How did you develop the methodology for the Windows-Linux test?

VeriTest: The idea was to simulate the activities of an IT administrator in a typical medium-sized business of about 200 seats. We designed a test environment that simulated the equipment and software of a real-life, medium-sized business and configured it in what we call a “failure-prone state,” representing a neglected environment with no redundancy set up, no patches installed and no data-access security configured. Then we brought in administrators to go through the typical tasks of improving the reliability and robustness of the IT environment while supporting a simulated user population and dealing with problems and issues that come up under normal conditions.

The environment was built in both a Windows Server 2003 flavor and a Red Hat Enterprise Linux AS 3.0 flavor. We brought in experienced Windows administrators to work on the Windows environment and experienced Linux administrators to work on the Linux environment and to deal with the types of issues that come up in each of the respective platforms. We wanted to show a slice of life of an administrator working in the environment of each platform and to identify strengths and weaknesses of each platform.

PressPass: How was the test conducted?

VeriTest: The test had two components. First was what we called “proactive tasks,” which were background activities primarily aimed at improving the robustness of the test environment: performing system backups, installing patches, implementing data-access security, implementing redundant services, configuring devices and setting up remote access. We designed those tasks at a high level, without specific requirements for either Windows or Linux; they were generic tasks that a typical administrator on any platform would face. Every administrator in the test carried out the same background activities.

We also set up what we called “reactive events,” which were problems that occurred in the test environment during the time that the administrator was working on it. Again, these were defined in such a way as to be generic problems that weren’t specific to one operating system or another. They were also typical problems that an administrator would see in day-to-day activity.

As the administrators performed the proactive tasks, a test proctor with remote access to the test environment introduced what we called “trouble tickets.” The proctor went into the test environment and deliberately broke things: caused a printer to stop working, caused a particular system service to stop working, deleted particular files. After performing these actions, the proctor sent an instant-message “trouble ticket” to the administrator, simulating a member of the user population saying, for example, “Oh, I’ve accidentally deleted a file, can you restore it for me?”

When we sent a trouble ticket, the administrator acknowledged receiving it, and it became his or her highest-priority activity. They would switch from whatever proactive task they were working on at the moment to the reactive event, diagnosing and troubleshooting the problem and attempting to fix it. Once they resolved the problem, they could move back to the proactive tasks they had been working on before. So the test simulated the life of an administrator: you’re trying to get some projects done, but in the meantime you have to deal with issues and problems from your user community.

PressPass: How could you be sure that the test didn’t favor one platform or the other?

VeriTest: We tested the test. First we did some initial testing of the methodology with internal staff from VeriTest. Then, before the actual test began, we brought in Windows administrators and Linux administrators and did a dry run of the test methodology to make sure that the administrators on both platforms were able to execute the activities. We got their feedback to see if the environment was configured in a reasonable way for the specific platform and to see if the components of the methodology were reasonable based on their experience. The answer from the dry runs was, “Yes, it is reasonable and fair.” So based on that feedback we moved forward with the production runs of the test.

PressPass: How did you choose the IT administrators who ran the test environments?

VeriTest: We needed 18 Windows administrators and 18 Linux administrators, and we had a structured process for selecting them. We went out through job-posting and recruiting services to gather resumes from candidates with administration experience. We looked for folks who had been hands-on administrators of either a Windows or a Linux environment for at least a couple of years.

We reviewed the resumes and validated that the applicants were what we consider IT generalists. We weren’t looking for directory-services experts or messaging-application experts; we wanted people who had been doing generalized IT administration on the respective platforms.

We weeded through the resumes based on the IT-generalist criteria, and then conducted a phone screen with each applicant, asking a set of standardized questions to probe their experience more deeply in areas of administration such as e-mail, DNS, DHCP, file management, printer management and security. Applicants who passed this screening were then given a written test with a set of standardized questions broken into what we considered easy, medium and difficult ranges.

Folks who missed too many easy questions were disqualified, and folks who correctly answered too many difficult questions were also disqualified, because, again, we were looking for applicants in the middle, generalist range. Then, based on the three steps of the screening process, we chose participants with equal skill sets on both sides. We brought in some advanced and some less experienced participants on each side, but the majority were of a solid intermediate skill level.

PressPass: How did you run the actual test?

VeriTest: Each administrator worked alone. We had multiple identical test environments, which allowed us to test several administrators per week. Administrators came in and were given an orientation that defined the goals and the processes to follow during the test. Then we did some hands-on work with them individually, issuing sample tasks and sample trouble tickets to make sure they followed the processes properly.

Following the orientation, each administrator spent about 26 hours, spread over four days, doing the actual test. At the end of the process, we conducted exit interviews to go over the tasks they had completed and to gather their insights and feedback on the test environment and the events they had executed over the previous four days.

For reproducibility, there was a prescribed schedule of events that is defined in the test report. We issued the same events at the same time to every participant. Participants typically had three or four reactive events to deal with per day, spread evenly between the morning and the afternoon sessions.
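
The full schedule is defined in the report itself; purely as an illustration, the sketch below shows one way such a prescribed schedule of reactive events could be represented so that every participant receives the same events at the same offsets. The event texts and times are hypothetical placeholders based on the examples mentioned above, not the actual schedule from the study.

```python
from dataclasses import dataclass
from datetime import timedelta

# Illustrative sketch only: a simple data structure for a prescribed schedule
# of reactive events. Event texts, days and offsets are hypothetical
# placeholders, not details taken from the VeriTest report.
@dataclass(frozen=True)
class ReactiveEvent:
    day: int              # test day (1-4)
    offset: timedelta     # time after the day's session starts
    ticket_text: str      # instant-message "trouble ticket" sent to the admin

SCHEDULE = [
    ReactiveEvent(1, timedelta(hours=1), "The shared printer has stopped working."),
    ReactiveEvent(1, timedelta(hours=4), "I accidentally deleted a file, can you restore it?"),
    ReactiveEvent(2, timedelta(hours=2, minutes=30), "A service on the server is not responding."),
]

def events_for_day(day: int) -> list[ReactiveEvent]:
    """Return that day's events in the order the proctor should issue them."""
    return sorted((e for e in SCHEDULE if e.day == day), key=lambda e: e.offset)
```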

PressPass: How did you gather test data?

VeriTest: We collected data from a number of sources. The participants kept online journals documenting their feedback and processes. We saved the instant-message logs between the participants and the test proctor, which documented the back-and-forth questions and issues associated with dealing with problems from the simulated user community. We also had a set of probing scripts running on the proctor machine, testing the availability of various services on the servers within each administrator’s environment, so we could see when particular services went down and when they came back up again. We used that data, along with the journal entries, the instant-message logs and the exit-interview information, to pull together results showing time spent on task, the processes followed and the particular issues that the participants ran into.
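
The probing scripts are described only at a high level in the interview; as a rough sketch, the following shows what a simple service-availability probe of this kind might look like, assuming the monitored services can be checked with a TCP connection attempt and that up/down transitions are logged with timestamps. The host names, ports and log file name are placeholders, not details from the study.

```python
import csv
import socket
import time
from datetime import datetime

# Hypothetical list of monitored services (placeholder hosts and ports).
SERVICES = {
    "file-server": ("fileserver.test.lab", 445),
    "web-server": ("webserver.test.lab", 80),
    "mail-server": ("mailserver.test.lab", 25),
}
PROBE_INTERVAL_SECONDS = 30


def is_up(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_probe(log_path: str = "availability_log.csv") -> None:
    """Poll each service and record every up/down transition with a timestamp."""
    last_state: dict[str, bool] = {}
    with open(log_path, "a", newline="") as f:
        writer = csv.writer(f)
        while True:
            for name, (host, port) in SERVICES.items():
                up = is_up(host, port)
                if last_state.get(name) != up:
                    writer.writerow(
                        [datetime.now().isoformat(), name, "up" if up else "down"]
                    )
                    f.flush()
                    last_state[name] = up
            time.sleep(PROBE_INTERVAL_SECONDS)


if __name__ == "__main__":
    run_probe()
```

From a transition log like this, the duration of each outage, and therefore end-user service loss, can be computed after the fact.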
