Any chief information officer concerned with computer system reliability and application availability in the competitive financial markets is thinking one of two things: either make failure impossible, or make recovery very fast. Thanks to virtualisation, the trend is shifting toward fast recovery.
Virtualisation is an architectural strategy that makes an application's dependence on a particular operating system less important, if not trivial. By also making hardware resources more fungible, virtual machines help keep key revenue-generating applications running.
Steve Randich, CIO of Citigroup, says: “I’m leaning to the recovery school of thought. There’s an incredible by-product when you focus on recovery: the hardware and software you buy becomes less important.”
This is because robustness no longer needs to be engineered, at great expense, into each application or its operating system. The robustness of individual components hardly even figures.
The CIO’s goal is near-perfect uptime, also known in vendor parlance as “high availability”. Basically, this means systems stay on; but in financial markets, whether a system counts as “up” also depends on its capacity and performance. In an order-driven business where opportunities are captured and lost in fractions of a second, demands on compute power are dynamic and critical. Particularly during market trading hours, volume, queuing, capacity and latency (delay) issues define “up”. If a system is on but running slowly, it is, in effect, “down”.
Good uptime on Wall Street today means “five nines” – 99.999 per cent up and working to specification. Mr Randich says: “It was four nines. Now, five nines: that’s a couple of minutes down per year. There’s room for maybe one failure, and not a significant one.”
In the recovery-oriented view, uptime becomes a simple budget: five nines leaves about 5.3 minutes of downtime a year (0.001 per cent of roughly 525,960 minutes), enough for only two reboots of two minutes or so each.
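The downtime budget implied by a given number of nines is simple arithmetic. The Python sketch below is illustrative only. It assumes a 365.25-day year and takes a two-minute reboot as the unit of recovery, and the function names are hypothetical.

```python
# Downtime budget implied by an availability target ("number of nines").
# Assumptions: a 365.25-day year; a reboot costs about two minutes.
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes


def downtime_budget_minutes(nines: int) -> float:
    """Minutes of downtime per year allowed by an availability of `nines` nines."""
    unavailability = 10 ** -nines  # e.g. five nines -> 0.00001
    return MINUTES_PER_YEAR * unavailability


def reboots_allowed(nines: int, reboot_minutes: float = 2.0) -> int:
    """Whole reboots of `reboot_minutes` each that fit inside the annual budget."""
    return int(downtime_budget_minutes(nines) // reboot_minutes)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"{n} nines: {downtime_budget_minutes(n):7.2f} min/year, "
              f"{reboots_allowed(n)} two-minute reboots")
```

For five nines the budget comes to about 5.26 minutes a year, room for two two-minute reboots. Four nines allows about 52.6 minutes, or 26 such reboots.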
Virtualisation makes this feasible. Before virtualisation, if an application quit or a server or workstation failed, recovery could take anywhere from a few minutes to half a day or more. Now, virtualisation makes counting nines seem passé.
Virtual machine software can be installed on many different types of hardware, from an old junk whitebox PC to a high-end Solaris workstation or a blade server, with any of the common operating systems, from Windows to Linux to Unix, acting as the “host”.
Many virtual machines (VMs) can be set up on a single host, most typically a server. Each VM can in turn run its own choice of operating system and applications, isolated from its neighbours and unable to compromise the other local virtual machines’ use of the same hardware resources.
The beauty of virtual machines is their ease of provisioning: they can be set up quickly and can even be moved around, backed up, or diverted to adjacent or remote systems.
VMWare is a leading provider of virtualisation software. Xen, an open-source alternative, is being commercialised by XenSource. Their high-availability products are helping Wall Street firms manage systems and reduce recovery times. Most hardware vendors, including IBM, HP, Dell and Fujitsu Siemens, are working with VMWare and Xen on a wide range of solutions; XenExpress, an entry-level VM product, can be downloaded free of charge.
VMWare’s marketing director, Bogomil Balkansky, explains: “In a non-catastrophic case of a single server failing, we can automatically restart a new virtual machine on other servers already running somewhere. The time is essentially a one or two-minute reboot of the identical operating system and application configuration into the new virtual machine.”
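The restart-on-failure approach Mr Balkansky describes can be illustrated with a toy placement routine. This is a hypothetical sketch, not VMware's implementation. The `Host` class, the abstract capacity units and the placement rule (largest VMs first, each onto the surviving host with the most free capacity) are all assumptions made for illustration.

```python
# Hypothetical failover-by-restart sketch: when a host dies, each of its VMs
# is restarted on a surviving host with enough spare capacity. The Host model,
# capacity units and placement rule are illustrative assumptions, not VMware's
# HA implementation.
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    capacity: int  # abstract capacity units (assumption)
    vms: dict[str, int] = field(default_factory=dict)  # VM name -> units used
    alive: bool = True

    def free(self) -> int:
        """Spare capacity left on this host."""
        return self.capacity - sum(self.vms.values())


def fail_over(hosts: list[Host], failed: Host) -> dict[str, str]:
    """Restart every VM from `failed` on a live host; return VM -> new host.

    Raises RuntimeError if some VM cannot be placed anywhere.
    """
    failed.alive = False
    placements: dict[str, str] = {}
    # Place the largest VMs first, so they are less likely to be stranded.
    for vm, size in sorted(failed.vms.items(), key=lambda kv: -kv[1]):
        candidates = [h for h in hosts if h.alive and h.free() >= size]
        if not candidates:
            raise RuntimeError(f"no spare capacity to restart {vm}")
        target = max(candidates, key=Host.free)  # roomiest surviving host
        target.vms[vm] = size  # stands in for the one- or two-minute reboot
        placements[vm] = target.name
    failed.vms.clear()
    return placements


if __name__ == "__main__":
    a = Host("a", 10, {"orders": 4, "risk": 3})
    b = Host("b", 10, {"pricing": 2})
    c = Host("c", 10, {"web": 5})
    print(fail_over([a, b, c], failed=a))
```

With the sample hosts, the two VMs from the failed host land on different survivors, because each placement picks whichever host has the most free capacity at that moment.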
Mr Balkansky also says that VMWare’s Distributed Resource Scheduler (DRS) constantly monitors capacity utilisation and can automatically configure new VMs on spare resources if, for example, orders or web traffic back up or an application becomes bogged down.
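Capacity-triggered provisioning of the kind attributed to DRS can be sketched as a simple threshold rule. Again, this is a hypothetical illustration and not how DRS actually works. The function name, the load units and the 75 per cent threshold are assumptions.

```python
# Hypothetical threshold-based scale-out, loosely in the spirit of the
# behaviour described above. Names, units and the threshold are assumptions;
# this is not VMware DRS's actual algorithm.


def scale_out(load: float, instances: int, host_free: dict[str, float],
              vm_size: float, max_load_per_instance: float = 0.75) -> list[str]:
    """Return the hosts on which extra application instances should start.

    load: total demand, measured in instance-capacities (1.0 = one instance
          running flat out).
    instances: instances already running.
    host_free: spare capacity on each host, in the same units as vm_size.
    """
    free = dict(host_free)  # work on a copy; the caller's dict is unchanged
    new_hosts: list[str] = []
    # Keep adding instances while average load per instance is above threshold.
    while load > max_load_per_instance * (instances + len(new_hosts)):
        host = max(free, key=free.get, default=None)  # roomiest host, if any
        if host is None or free[host] < vm_size:
            break  # no spare resources left: the backlog persists
        free[host] -= vm_size
        new_hosts.append(host)
    return new_hosts


if __name__ == "__main__":
    # Demand of 4.0 across 2 busy instances; two hosts with spare room.
    print(scale_out(load=4.0, instances=2,
                    host_free={"a": 1.0, "b": 2.5}, vm_size=1.0))
```

Placing each new instance on the host with the most spare capacity is only one simple policy. A production scheduler would weigh many more factors.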
Near-perfect uptime can be achieved with less expensive commodity systems – such as VMWare or Xen running on a Linux host – as well as fancy ones.
One company taking virtualisation to the extreme is Egenera. Vern Brownell, the company’s founder, CEO and head strategist, started the company in 2000 to address the requirements he had faced as CIO of Goldman Sachs. “Our idea was to reduce complexity even further,” he says.
Egenera’s Processing Area Network (Pan) concept goes beyond virtualising the resources on single pieces of hardware: it runs virtual machines across banks of processor and memory units and also turns separate storage systems and networks – no matter where they are located – into virtual resources that can be managed and allocated by dragging and dropping icons in a remote management console.
By separating processor and memory resources from disc storage and networking, Egenera can use whatever Storage Area Network (San) or Network Attached Storage (Nas) the customer already runs, from any vendor. This makes enterprise-wide system development, provisioning, disaster recovery and fail-over of large and even small applications easy and largely automatic.
Mr Brownell confirms that some of the bulge-bracket Wall Street firms, where high availability is critical, use Egenera architecture to manage their large global trading systems.
Yet virtualisation helps small operations, too. Some smaller hedge funds and private equity firms are mixing and matching operating systems to run even their more commonplace Unix, Linux or Windows applications on a consistent platform.
A security expert who requested anonymity described one Wall Street shop that runs old Windows applications each inside its own copy of the Windows operating system, which acts as a “wrapper” around that single application, all running virtually on very cheap and stable Linux servers.
“When the application quits – as it does sometimes,” he says, “recovery time is only as long as it takes a trader to kill the window on their desktop and open another.”
Even frequent failure is not a problem when recovery is this fast. In quite a few cases, this is more cost-effective than rewriting old applications for a more stable operating system. Rainmakers no longer worry about the blue screen of death. Serious uptime is but a window and a click away.
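The point that fast recovery can outweigh frequent failure follows from the standard steady-state availability formula, A = MTTF / (MTTF + MTTR). Here MTTF is the mean time a system runs before failing and MTTR the mean time it takes to recover. The sketch below compares two hypothetical systems. Its failure and recovery figures are invented for illustration and are not drawn from any firm.

```python
# Steady-state availability: A = MTTF / (MTTF + MTTR).
# The two example systems below are hypothetical, chosen only to contrast
# "rare failure, slow recovery" with "frequent failure, fast recovery".

HOUR = 3600
DAY = 24 * HOUR


def availability(mttf_seconds: float, mttr_seconds: float) -> float:
    """Long-run fraction of time a system is up.

    mttf_seconds: mean time the system runs before it fails.
    mttr_seconds: mean time it takes to recover once it has failed.
    """
    return mttf_seconds / (mttf_seconds + mttr_seconds)


# One crash a month, half a day to restore service.
slow = availability(30 * DAY, 12 * HOUR)
# One crash a week, five seconds to close a window and open another.
fast = availability(7 * DAY, 5)

print(f"rare failure, slow recovery:     {slow:.5%}")
print(f"frequent failure, fast recovery: {fast:.5%}")
```

The weekly-crash system comes out at about 99.9992 per cent, inside five nines. The monthly-crash system with half-day recoveries manages only about 98.4 per cent.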
