REDMOND, Wash., May 3, 2004 — What’s so remarkable about the reliability and availability of InDIMENSIONS’s Web sites is that nobody remarks about those subjects anymore.
The music-industry clients of InDIMENSIONS, a Toronto-based business development company, “are focused on their brand’s image both online and offline,” says Colin Bowern, executive vice president for technology. “When you have an artist’s management call you up and ask you why their site is down, it’s pretty embarrassing.”
After enduring intermittent Web server outages and generally unstable performance with a succession of Linux-based hosting providers in recent years, InDIMENSIONS last fall switched to a hosting partner with a platform built on the Microsoft Windows Server 2003 operating system and other Microsoft technologies. Since then, “Web-site uptime has been a complete non-issue for our clients,” Bowern says.
That’s sweet music not only to InDIMENSIONS, but also to Mario Garzia and his colleagues on the Microsoft Windows Reliability Team. The team is responsible for ensuring high reliability across the Windows operating system as well as assisting various Microsoft product development teams in making their technologies more reliable.
For IT professionals, computing systems are considered reliable when they are predictable, require minimal maintenance and are continuously available, which essentially means they run with minimal interruptions so users can access the resources they need in a timely manner. The team’s work on Microsoft Windows Server 2003 has helped the operating system earn praise from companies like InDIMENSIONS for its high reliability, stable performance and virtually uninterrupted availability.
“The team contributes directly to the reliability of Windows and also acts as a catalyst for reliability improvements carried out by others in the Windows product group,” Garzia says of the roughly 50 developers, product managers and testers who make up the Windows Reliability Team. “Our efforts focus on improving the products during their development life cycle as well as after they are shipped to customers.”
In the case of Windows Server 2003, he says, much of the team’s work has included collecting reliability data from customers’ server systems and surveying IT professionals within those customer organizations to identify issues affecting reliability. Team members then analyze the data and decide which problems are the biggest priorities, evaluate whether they stem from the operating system or another product, and look for ways to address them through new features or architectural changes.
In some cases, Garzia’s team can help supply a timely remedy. In other instances, the nature of the reliability issue dictates a longer-range approach. “We always look for opportunities to improve the current version of the product,” he says. “In addition, we work toward more comprehensive solutions, such as new features or changes to the architecture, to include in the next version of the operating system.”
Throughout the pre-launch phases of Windows Server 2003, and during the year since Microsoft released the product to general availability in April 2003, the Windows Reliability Team contributed to several projects aimed at minimizing server downtime due to either planned maintenance or an unplanned event such as an outage.
Tracking Events
Measuring the reliability and availability of an organization’s servers, and being able to quickly troubleshoot issues, is a linchpin of IT administrators’ jobs. Recognizing this need, the Windows Reliability Team developed the Shutdown Event Tracker feature for Windows Server 2003, as well as a complementary product called Microsoft Reliability Analysis Service (MRAS), which provides comprehensive reliability monitoring and reporting capabilities for IT professionals.
Customers can opt to enable the Shutdown Event Tracker to respond whenever a server running Windows Server 2003 is shut down and restarted, whether for planned or unplanned reasons. The tool prompts a system administrator to describe what happened in the course of the event. “Over time, this record can help IT pros better understand where their maintenance efforts are going and identify opportunities to improve availability,” says Garzia.
MRAS – which can be deployed to collect and analyze event log data from Microsoft Windows 2000 and Windows Server 2003 – interacts with the latter’s improved instrumentation features as well as the Shutdown Event Tracker to generate even more comprehensive breakdowns of the reasons behind events that affect server availability. In addition to providing this information for internal analysis, customers can choose to allow MRAS to feed reliability data back to Microsoft to help the Windows Reliability Team zero in on solutions to reliability and availability issues. The current version of MRAS is available to selected customers that are sharing data with Microsoft for product improvement purposes. It will be available to a broader customer base toward the end of this year.
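As a rough illustration of the kind of breakdown described above, the following Python sketch tallies shutdown records by whether they were planned and by stated reason. It is only a sketch under stated assumptions: the `ShutdownRecord` fields, the `summarize` helper and the sample data are all hypothetical and invented for illustration, and they do not reflect the actual format used by the Shutdown Event Tracker or MRAS.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ShutdownRecord:
    """Hypothetical record of one shutdown; not an actual Windows log format."""
    server: str
    planned: bool
    reason: str
    downtime_minutes: float


def summarize(records):
    """Count events and total downtime per (planned/unplanned, reason) pair."""
    summary = defaultdict(lambda: {"events": 0, "minutes": 0.0})
    for record in records:
        kind = "planned" if record.planned else "unplanned"
        entry = summary[(kind, record.reason)]
        entry["events"] += 1
        entry["minutes"] += record.downtime_minutes
    return dict(summary)


# Invented sample data, for illustration only.
records = [
    ShutdownRecord("web01", True, "security patch", 6.0),
    ShutdownRecord("web02", True, "security patch", 5.0),
    ShutdownRecord("db01", False, "power failure", 42.0),
]

for (kind, reason), totals in sorted(summarize(records).items()):
    print(f"{kind:9} {reason:15} events={totals['events']} "
          f"minutes={totals['minutes']:.1f}")
```

A summary of this shape would let an administrator see at a glance how much downtime comes from planned maintenance versus unplanned events, which is the sort of question the article says these records help answer.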
During each phase of the Windows Server 2003 development process, as part of the product shipment criteria, the team evaluated the technology as it ran in a production environment that simulated the mix of servers and the volumes of traffic that are present in typical customer deployments. For example, the management team of the Microsoft.com Web site contributed to the evaluation by running a pre-general-availability version of Windows Server 2003 on its production servers for several months. Throughout this process, monitoring tools measured the reliability of the servers and provided continual feedback to Garzia’s team regarding the times when a server had to be shut down, the reasons why the shutdowns occurred, and the actions required to restore Windows Server 2003 to normal operation.
“Feeding this data into the continuous improvement of pre-release versions of Windows Server 2003 helped ensure that the product was truly production-ready for customers by the time it shipped,” he says.
Windows Server 2003 customers attest to its strengths in reliability, availability and overall performance.
Keeping an Online Business Afloat
For Rentvillas.com, an online property-booking service that helps visitors locate and rent vacation homes in western and southern Europe, the reliability and productivity gains from running Windows Server 2003 and related Microsoft technologies have kept the business afloat amid downturns in travel over the past two years. Before the company relaunched its Web presence in mid-2002 with applications built using the Microsoft Visual Studio .NET development system and the Microsoft .NET Framework — which facilitates the use of XML Web services — the Rentvillas.com site periodically endured outages of up to a full day. Depending on the season, that cost up to US$20,000 per day in lost revenue as well as strained relations with customers and property management partners.
Those problems disappeared as soon as Rentvillas.com deployed its current architecture, in which two Windows Server 2003 servers with Internet Information Services (IIS) 6.0 support the Web site applications and connect to a Microsoft SQL Server database. Ken Pina, the company’s chief technology officer, says the network load-balancing capability of Windows Server 2003 allows his team to deploy new site functionality and perform routine maintenance on the servers without ever shutting down the Web site. While updates to the site sometimes require Rentvillas.com to stop processing customer requests online, these planned interruptions occur perhaps once every three months, never last more than an hour and no longer cause a complete outage of the site.
With fewer reboots and 99.9 percent overall uptime for the company’s Web presence, which includes a European partner’s Web site that runs on the same Microsoft technology base, Pina and his team have time to create dedicated sites for additional partners and to build new functionality that allows Rentvillas.com travel advisers to handle more requests per hour. “We’re able to focus more on our people and less on our machines, which leads to better offerings for our customers,” Pina says.
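To put uptime percentages in concrete terms, the short Python sketch below converts a percentage into the downtime it permits over a year. It is an illustrative calculation only: the function name `max_downtime_hours` is hypothetical, a year is assumed to be 365 days, and the inputs are the uptime figures quoted by customers in this article (99.9, 99.95 and 99.99 percent).

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored for simplicity


def max_downtime_hours(uptime_percent, period_hours=HOURS_PER_YEAR):
    """Return the downtime (in hours) permitted at a given uptime percentage."""
    return period_hours * (1 - uptime_percent / 100)


for percent in (99.9, 99.95, 99.99):
    hours = max_downtime_hours(percent)
    print(f"{percent}% uptime allows about {hours:.2f} hours "
          f"({hours * 60:.0f} minutes) of downtime per year")
```

Under these assumptions, 99.9 percent uptime corresponds to roughly 8.76 hours of downtime a year, 99.95 percent to about 4.38 hours, and 99.99 percent to under an hour.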
Key Issue: Web-site Uptime
At InDIMENSIONS, Bowern says he is relieved to be free of the intermittent Web site outages that plagued the company in its former Linux-based environment.
“Uptime was a major issue: The servers would go down under high loads, and in general we noticed a lot of fluctuations,” he says. “We’re a small company dealing with entertainment brands that have a big reputation in the market, so we have to continually deliver high-quality service in order to retain their trust.” When that goal became too difficult to achieve in the Linux environment, InDIMENSIONS started looking for a new provider last year and chose Interland Inc., an Atlanta-based company that specializes in Web hosting services for small and medium-sized businesses.
Since the move to Interland and its BlueHALO shared hosted platform in September 2003, InDIMENSIONS has seen 99.95-percent uptime for the Web servers running on Windows Server 2003. “That’s the best availability that we’ve ever had,” says Bowern, and the only interruptions have been for patches or other planned maintenance. “The speed of the reboots makes them virtually unnoticeable to our user communities,” he says.
The hours that Bowern and his team no longer spend troubleshooting issues with server availability — plus the weeks of work required to switch hosting providers in the past — are now dedicated to creating new and improved Web site features for InDIMENSIONS clients. “For us, availability and reliability are even more critical than introducing new functionality,” he says. “Windows Server 2003 and Interland have proved that they can deliver that uptime.”
30 Countries, 99.99-percent Uptime
The value of server reliability and availability is no less crucial for Enterasys Networks, which provides Secure Networks solutions for enterprise customers worldwide. With offices in more than 30 countries, Enterasys relies heavily on having its e-mail system, enterprise data centers and other network services available continuously in all time zones.
In migrating its Microsoft Windows NT 4.0 and Windows 2000 Server-based infrastructure (which included Active Directory servers, Exchange 2000 servers, ISA Server, SQL Server 2000 and numerous other systems) to Windows Server 2003, Enterasys has averaged more than 99.99-percent uptime and has seen other server performance benefits.
“Network security and reliability is the key to our business, and I don’t think we’ve ever had an unplanned shutdown that was related to Windows Server 2003,” says Rich Casselberry, head of IT system administration for Enterasys. Casselberry adds that planned server maintenance is much faster with the new operating system, and he has noticed that patches and upgrades require reboots less often than before. “Also, through improved scalability and consolidation, we’ve reduced our server environment from 700 to 310 boxes, so patches are not only faster but there are less of them to do,” says Casselberry. “Because we’re doing less maintenance work, our staff is handling at least 45 percent more projects for the company than we were a year ago.”
All of which encourages Microsoft’s Garzia and the rest of the Windows Reliability Team. Meanwhile, they’re continuing to develop improvements that will be incorporated into the Windows Server 2003 Service Pack 1 release due later this year, as well as into the next version of the operating system, code-named “Longhorn.”
“Building the greatest possible reliability into our products is really part of everyone’s job at Microsoft,” says Garzia. “My team is just at the core of that effort, and works to ensure we don’t miss any opportunities.”