WAN: Chipping away at long-distance data


20 August 2012

Philip Hunter

Wide-area network optimisers use smart ways to gain more value from WAN links through new ICs.

The dedicated wide-area network (WAN) optimisation device is alive but not particularly well, judging by the emerging strategies of leading vendors. They are all busy leapfrogging each other to be seen as pioneers of an emerging communications software age, where all the critical processes involved in speeding up transmission of data over the WAN between enterprise applications run on generic processors such as the Intel i7.

This may not seem a good start for a review of integrated circuit (IC) developments in WAN optimisation; but the link between respective optimisation processes and the underlying silicon found in such devices is compelling, especially as the demand for these products is growing. It is just that, increasingly, it is not what has been regarded as traditional dedicated silicon, either an ASIC (Application Specific IC), FPGA (Field Programmable Gate Array), or System on Chip (SoC).

The distinction between generic processors and dedicated ASICs has been blurring ever since the first floating point accelerator was incorporated in early integrated circuits in the mid to late 1970s. Now the so-called generic chips used in PCs, tablets, and smartphones, for instance, incorporate an array of graphics, video, maths, security, and network processing functions that at some time in their history were performed by bespoke chips. This multi-functionality, combined with huge increases in raw power, has turned WAN optimisation vendors away from dedicated silicon, as Qing Li, chief scientist at Blue Coat Systems, one of the field’s main vendors, indicated.

“These generic Intel processors are almost becoming giant SoCs (System on Chips) themselves,” says Li. The term SoC is a slight misnomer, as no system is ‘purely a chip’, and there will always be some subsidiary or additional components such as flash memory or a hard-drive; but it is useful in that it highlights the degree of convergence between generic and dedicated silicon.

First came the ASIC, which initially comprised purely custom logic designed to minimise surface area and therefore cost, while reducing energy consumption and maximising performance, because the signals have less distance to travel in executing the task than on a generic processor. However, the ASIC could not be reused, and so development cost had to be weighed against the likely volume of production and time to obsolescence. The FPGA then evolved as a compromise, allowing some field upgradeability by reconfiguring connections between the logic components of the chip. This is relevant for tasks where it is possible to foresee in broad terms how a standard or process will evolve. The SoC goes further by integrating multiple components of a complete self-contained system (such as a mobile phone) on a single chip substrate, with the same advantages as an FPGA of low manufacturing cost and energy efficiency, combined with some programmability.

One trend uniting ASICs, FPGAs and SoCs is increased use of common logic components slightly confusingly known as IP cores (‘confusingly’, because the ‘IP’ refers to Intellectual Property rather than the Internet Protocol, although they are often both). This means that, for a small sacrifice in size and performance, an ASIC can be made substantially from common components that reduce its development cost. The distinction then between, say, a dedicated SoC and a generic processor comes down to the remaining existence of some custom logic in the former, while some of the SoC’s IP cores may also be more specialised than could be justified for the latter.

When it comes to WAN optimisation there is still a role for dedicated silicon, and as before this tends to be at the cutting edge of speed, with 10Gb and then 40Gb Ethernet transmission being current examples. The difference now is that dedicated hardware no longer tends to be required for the cutting edge in terms of functionality. That is why there has been a flight to software by vendors in the field, which to an extent now prefer to call their products WAN accelerators rather than WAN optimisers.

But WAN optimisation vendors – as we shall continue calling them for consistency – still need to be aware of the hardware architecture at a higher level, given that processor performance is now being stepped up by integrating multiple cores into a chip, rather than increasing the clock rate, which would consume more power and generate greater amounts of heat. This makes little difference for typical desktop productivity applications where the multiple cores make it easy to support multitasking, with processes assigned to different cores.

Some tasks, though, require the combined power of multiple cores, and these include WAN acceleration across gigabit networks. To achieve this, the task has to be split up so that it can be run in parallel. If one task has to finish before the next one begins, then the whole process can only exploit a single core. So, as Blue Coat Systems’ Li pointed out, WAN optimisation is reviving the field of parallel processing, in which an application is divided into multiple components that can be executed independently of each other.

“Sometimes you have to completely redesign the software stack to take advantage of multi cores,” explains Li. “So there is renewing interest in old parallel processing algorithms.” Originally such algorithms were designed to exploit multiple chips within large-scale architectures as an alternative to single processor supercomputers for high-performance computing applications such as weather forecasting and seismic exploration. Now for WAN acceleration the same principles are being applied with suitable modifications for multiple cores within a single chip (see box out left for a glimpse at how this has been done for Blue Coat’s own ProxySG WAN accelerator).
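As a rough illustration of the principle (not of how any vendor's product works), the sketch below splits a byte stream into independent chunks so that each can be compressed on a separate worker. The chunk size, the worker count, and the choice of `zlib` are assumptions made for this example only.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size, chosen for illustration


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Cut the stream into pieces that do not depend on one another."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def compress_parallel(data: bytes, workers: int = 4) -> list[bytes]:
    """Compress every chunk independently, spreading the work over a pool."""
    chunks = split_chunks(data)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves chunk order, so the stream can be reassembled
        return list(pool.map(zlib.compress, chunks))


def decompress_all(compressed: list[bytes]) -> bytes:
    """Reverse the process: decompress each chunk and join in order."""
    return b"".join(zlib.decompress(c) for c in compressed)


if __name__ == "__main__":
    payload = b"branch office traffic " * 50_000
    packed = compress_parallel(payload)
    print(len(payload), "bytes in,", sum(len(c) for c in packed), "bytes out")
```

The trade-off is the one Li describes: because no chunk waits on another, the work can be spread over several cores, but the software has to be restructured around independent units, and compressing chunks separately gives up any redundancy that spans chunk boundaries.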

Data reduction and virtualisation

The ability to exploit multicore architectures to increase WAN performance is one area where vendors are seeking to differentiate themselves at present. Another major area that has grown in importance recently is data reduction and de-duplication. This is a big field where compression plays a huge role, particularly for video, where the bit-rate required to deliver full high-definition pictures to a large screen TV is only 0.5 per cent of that consumed by the original pictures coming off the camera.
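To give a feel for the scale of that 0.5 per cent figure, here is a back-of-envelope calculation. The frame size, colour depth, and frame rate are assumptions picked for illustration; the article does not specify the camera format.

```python
def raw_bitrate(width: int, height: int, bits_per_pixel: int, fps: int) -> int:
    """Uncompressed video bit rate in bits per second."""
    return width * height * bits_per_pixel * fps


# Assumed source format: 1920x1080 pixels, 24 bits per pixel, 25 frames/s
raw_bps = raw_bitrate(1920, 1080, 24, 25)

# The article's figure: delivery needs about 0.5 per cent of the raw rate
delivered_bps = raw_bps * 0.005

if __name__ == "__main__":
    print(f"raw:       {raw_bps / 1e9:.2f} Gbit/s")
    print(f"delivered: {delivered_bps / 1e6:.2f} Mbit/s")
```

Under these assumptions the raw stream is a little over 1.2 Gbit/s, and 0.5 per cent of that is roughly 6 Mbit/s.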

For enterprises the big drivers for data reduction are virtualisation, remote disaster recovery, and cloud computing, which are accelerating the rate of growth of network traffic and creating much larger pipes, requiring more sophisticated and higher performance acceleration techniques. WAN optimisation as a generic technology was first conceived in the 1990s, largely to deal with relatively low-volume branch office traffic as far as enterprises were concerned, but now has to address large links, increasing the need for, and benefits of, substantial data reduction.

“Robust data reduction is where the separation between different vendors starts,” says Donato Buccella, CTO at Certeon, one of the leading WAN acceleration vendors, which like the others has recently made the transition from a hardware to software focus.

One of the new twists here brought by the growth in virtualisation and cloud computing is network de-duplication, which avoids as far as possible sending the same data more than once over a given wide-area link. This involves storing the same content at each end of a link, trading disk-drive capacity for bandwidth, and checking data before transmission to ensure it has not already been sent. If it has been sent, then the system at the receiving end is informed that it already has the required content and can therefore replay it locally.
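The mechanism just described can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any vendor's implementation: it uses fixed-size chunks, SHA-256 digests as chunk identifiers, and an in-memory dictionary in place of the disk or SSD store. The class and function names are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size


class DedupEndpoint:
    """One end of the link, holding a local store of chunks keyed by digest."""

    def __init__(self) -> None:
        self.store: dict[bytes, bytes] = {}

    def encode(self, data: bytes) -> list[tuple[str, bytes]]:
        """Sender side: emit raw chunks once, short references afterwards."""
        messages = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).digest()
            if digest in self.store:
                messages.append(("ref", digest))   # already sent: 32 bytes
            else:
                self.store[digest] = chunk
                messages.append(("raw", chunk))    # first sighting: full chunk
        return messages

    def decode(self, messages: list[tuple[str, bytes]]) -> bytes:
        """Receiver side: store raw chunks, replay referenced ones locally."""
        out = []
        for kind, payload in messages:
            if kind == "raw":
                self.store[hashlib.sha256(payload).digest()] = payload
                out.append(payload)
            else:
                out.append(self.store[payload])
        return b"".join(out)


def wire_bytes(messages: list[tuple[str, bytes]]) -> int:
    """Payload bytes that would actually cross the link."""
    return sum(len(payload) for _, payload in messages)


if __name__ == "__main__":
    sender, receiver = DedupEndpoint(), DedupEndpoint()
    data = b"".join(bytes([i]) * CHUNK_SIZE for i in range(4))
    first, second = sender.encode(data), sender.encode(data)
    assert receiver.decode(first) == data and receiver.decode(second) == data
    print(wire_bytes(first), "bytes on first transfer,", wire_bytes(second), "on repeat")
```

Every chunk triggers a store lookup at each end, which is why the speed of the store matters once traffic volumes grow.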

This does not necessarily involve dedicated processing, but to handle large volumes of data at wire speed increasingly now calls for solid state (aka flash) storage rather than hard-disk drives. “The way WAN optimisation uses disks is akin to very high transaction rate databases,” notes Buccella. “For high-end implementations we recommend SSD configurations.” In practice this is usually Intel SSDs, according to Buccella, because they are better suited for high-end applications. “Most other vendors specialise in small devices for laptops.”

Another big trend in WAN optimisation that has driven the change towards a software-based approach is virtualisation, which by definition requires products that are ‘hardware agnostic’, as the aim is to separate applications from the underlying platform. “We are seeing a strong demand for virtual WAN optimisation solutions that are easy to deploy on various hardware devices,” reports Jeff Aaron, vice president of marketing at Silver Peak, another vendor in this market. “Our Virtual Acceleration Open Architecture (VXOA) was designed to be completely hardware-independent, enabling it to run on any hypervisor for maximum flexibility.”

As with Blue Coat’s ProxySG, Silver Peak has still had to be aware that the hardware will often comprise multicore chips, and may also have multiple processors. “VXOA [does have] the ability to leverage multiple underlying multicore processors,” Aaron says.

Latency issues

One other big change in the WAN optimisation scene is the growing importance of latency. This has long been critical for broadcasters, pay-TV operators, and increasingly telcos (telecommunications companies) delivering video, where there is usually no point retransmitting a dropped packet because the video frame it pertains to has already been played. But with data centres becoming distributed across multiple sites and reliant on fast network communications for real-time or near-real-time processes apart from video, latency has risen up the agenda in WAN optimisation. Some of the functions involved in latency reduction are now done in software, including various TCP acceleration methods like ‘read-aheads’ and ‘write-behinds’.

These are used for protocols such as CIFS (Common Internet File System), formerly known as SMB (Server Message Block), which dates back to the early days of PCs in the 1980s. Here again memory is traded for bandwidth. In this case the aim is to anticipate data that may be required later, and transmit it at a time when there is plenty of bandwidth, or in batches to reduce requests and acknowledgements.
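The batching side of a read-ahead can be illustrated with a toy model. The sketch below is an assumption-laden stand-in rather than real CIFS handling: `RemoteFile` plays the part of a file server across the WAN and simply counts round trips, while the hypothetical `ReadAheadCache` fetches a window of blocks whenever a requested block is missing.

```python
class RemoteFile:
    """Stand-in for a file server across the WAN; counts round trips."""

    def __init__(self, blocks: list[bytes]) -> None:
        self.blocks = blocks
        self.round_trips = 0

    def fetch(self, start: int, count: int) -> list[bytes]:
        """Return up to `count` blocks in a single request/response."""
        self.round_trips += 1
        return self.blocks[start:start + count]


class ReadAheadCache:
    """On a miss, fetch the wanted block plus the ones likely to follow."""

    def __init__(self, remote: RemoteFile, window: int = 8) -> None:
        self.remote = remote
        self.window = window  # hypothetical read-ahead depth
        self.cache: dict[int, bytes] = {}

    def read(self, index: int) -> bytes:
        if index not in self.cache:
            fetched = self.remote.fetch(index, self.window)
            for offset, block in enumerate(fetched):
                self.cache[index + offset] = block
        # Raises KeyError if the index lies beyond the end of the file
        return self.cache[index]


if __name__ == "__main__":
    blocks = [bytes([i]) * 512 for i in range(32)]
    remote = RemoteFile(blocks)
    cache = ReadAheadCache(remote, window=8)
    assert [cache.read(i) for i in range(32)] == blocks
    print("round trips:", remote.round_trips)
```

A sequential read of 32 blocks with a window of eight costs four round trips instead of 32, at the price of memory for blocks that may never be requested.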

One latency tool that does still use dedicated silicon in some cases is Forward Error Correction (FEC). The immediate impact of FEC is to slow traffic down by inserting extra bits to buffer against signal loss; but the effect is to reduce the number of IP packets that need retransmitting because of errors, and therefore to cut overall latency. FEC is still a research topic as algorithms continue to be refined; but, whatever technique is used, dedicated hardware can be required for high-speed networking. FEC requires wire-speed operation, because it operates on the whole packet stream in real time.
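The trade of extra bits for fewer retransmissions can be shown with the simplest possible scheme: one XOR parity packet per group of equal-length packets. This is a deliberately basic stand-in chosen for illustration; production FEC codes are considerably more sophisticated, and the function names here are hypothetical.

```python
from functools import reduce
from typing import Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))


def add_parity(packets: list[bytes]) -> list[bytes]:
    """Append one parity packet (the XOR of all data packets) to the group."""
    return packets + [reduce(xor_bytes, packets)]


def recover(received: list[Optional[bytes]]) -> list[bytes]:
    """Rebuild the data packets; None marks a packet lost in transit."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one loss per group")
    repaired = list(received)
    if missing:
        present = [p for p in received if p is not None]
        # XOR of all surviving packets (data and parity) equals the lost one
        repaired[missing[0]] = reduce(xor_bytes, present)
    return repaired[:-1]  # drop the parity packet


if __name__ == "__main__":
    data = [b"AAAA", b"BBBB", b"CCCC"]
    group = add_parity(data)
    group[1] = None  # simulate a dropped packet
    assert recover(group) == data
    print("lost packet rebuilt without retransmission")
```

With three data packets per group the overhead is one extra packet in four, and any single loss within the group is repaired at the receiver without waiting a round trip for a resend.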

More generally, dedicated hardware will continue to be required at the cutting edge of speed, as Certeon’s Buccella agrees. “The role of silicon is to trailblaze for the next iteration,” says Buccella. “This means 10Gb, and also last mile interfaces tend to be silicon based, but the application layer will be software-based.”

Over time, meanwhile, it looks like the distinction between dedicated and general hardware will be at the level of the board that goes into a rack rather than at the silicon level, as generic cores and components continue to proliferate. One example of this trend is the recently-launched Hitachi WAN Accelerator.