The future of bandwidth: securing supercomputer networks

26 June 2012

Dan Goodin

Thanks to super-charged networks like the US Department of Energy’s ESnet and the consortium known as Internet2, scientists crunching huge bodies of data finally have 10Gbps pipes at the ready to zap that information to their peers anywhere in the world. But what happens when firewalls and other security devices torpedo those blazing speeds?

That’s what Joe Breen, assistant director of networking at the University of Utah’s Center for High Performance Computing, asked two years ago as he diagnosed the barriers he found on his organization’s $262,500-per-year Internet2 backbone connection. The network—used to funnel the raw data used in astronomy, high-energy physics, and genomics—boasted a 10Gbps connection, enough bandwidth in theory to share a terabyte’s worth of information in 20 minutes. But there was a problem: “stateful” firewalls—the security appliances administrators use to monitor packets entering and exiting a network and to block those deemed malicious—brought maximum speeds down to just 500Mbps. In fact, it wasn’t uncommon for the network to drop all the way to 200Mbps. The degradation was even worse when transfers used IPv6, the next-generation Internet protocol.

“You’re impacting work at that point,” Breen remembers thinking at the time. “So when you’re trying to transport 200 gigabytes up to a terabyte of data, or even several terabytes of data, you can’t do it. It becomes faster to FedEx the science than it does to transport it over the network, and we’d like to see the network actually used.”
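The numbers above are easy to check. The short sketch below computes ideal transfer times for one terabyte at the three rates mentioned. It assumes decimal units (1 TB = 10^12 bytes, 1 Gbps = 10^9 bits per second) and ignores protocol and disk overhead, so real transfers would take somewhat longer.

```python
# Ideal transfer times for the link speeds discussed above.
# Assumptions: decimal units and no protocol or disk overhead.

TERABYTE_BYTES = 10**12


def transfer_seconds(data_bytes: int, rate_bps: float) -> float:
    """Return the ideal time in seconds to move data_bytes at rate_bps."""
    return data_bytes * 8 / rate_bps


if __name__ == "__main__":
    for label, rate in [("10Gbps", 10e9), ("500Mbps", 500e6), ("200Mbps", 200e6)]:
        minutes = transfer_seconds(TERABYTE_BYTES, rate) / 60
        print(f"1 TB at {label}: {minutes:.1f} minutes")
```

At the full 10Gbps line rate a terabyte takes about 13.3 minutes, so the 20-minute figure presumably leaves room for real-world overhead. At 500Mbps the same transfer stretches to roughly 4.4 hours, and at 200Mbps to about 11.1 hours.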

With technologies developed or funded by the National Energy Research Scientific Computing Center, ESnet, the National Science Foundation, and others, the University of Utah set out to find a new security design that wouldn’t put a crimp on bandwidth. Called “Science DMZs,” the architecture puts the routers and storage systems used for data-intensive computing into a “demilitarized zone” that sits outside the network firewall and beyond the reach of many of the intrusion detection systems (IDSes) protecting the rest of the campus network.

Unconstrained bandwidth

“What we’re trying to do with the Science DMZ concept is formalize the idea of: secure your campus, secure your student systems, secure your dorm networks, everything that you need to run the business of your network or your institution,” said Chris Robb, director of operations and engineering at Internet2, an alternative Internet maintained by a consortium of universities, governmental organizations and private companies. “Lock that down as much as possible, but for the love of God, give your researchers access to unconstrained bandwidth.”

The idea is simple. Move the gear storing and moving data as close as possible to the network edge, preferably into the data center itself. Unplug stateful firewalls and in-line IDSes. And install devices that give detailed information about the rate of data flows traversing the system so any bottlenecks that develop can be diagnosed and fixed quickly.
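The flow-rate visibility described above boils down to computing each flow’s throughput and flagging the ones that fall well short of what the path should carry. The sketch below assumes a hypothetical, heavily simplified flow record (real NetFlow or IPFIX exports carry many more fields), and the 50 percent tolerance is an illustrative choice, not a figure from the article.

```python
from dataclasses import dataclass


@dataclass
class FlowRecord:
    """Hypothetical, simplified flow record: endpoints, bytes moved, duration."""
    src: str
    dst: str
    bytes_sent: int
    duration_s: float


def flow_rate_bps(flow: FlowRecord) -> float:
    """Average throughput of one flow, in bits per second."""
    if flow.duration_s <= 0:
        raise ValueError("flow duration must be positive")
    return flow.bytes_sent * 8 / flow.duration_s


def slow_flows(flows, expected_bps: float, tolerance: float = 0.5):
    """Return flows running below tolerance * expected_bps, slowest first.

    Zero-length flows are skipped because no meaningful rate can be computed.
    """
    threshold = expected_bps * tolerance
    timed = [f for f in flows if f.duration_s > 0]
    return sorted((f for f in timed if flow_rate_bps(f) < threshold), key=flow_rate_bps)
```

A 10GB flow finishing in 20 seconds averages 4Gbps and passes against a 5Gbps expectation, while a 1GB flow that needs 40 seconds (200Mbps) is flagged as a likely bottleneck worth investigating.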

It may seem counterintuitive at first to run high-performance computing systems outside the firewall. It’s tempting to compare the idea to a medieval army keeping its equipment, archers, and other most prized assets outside the castle walls, which would be a bad idea. But frequently, the threats facing high-bandwidth systems carrying gigabytes of data concerning the Bolivian tree frog differ dramatically from those facing the point-of-sale terminals that process student credit cards. If a typical enterprise or medium-sized business network is a bundle of drinking straws, science networks are three or four firehoses. The idea of Science DMZs isn’t to ignore security, but to adapt it to an environment that’s free of e-mail, Web servers, and e-commerce applications.

To 10Gbps… and beyond

Since the University of Utah rebuilt its high-performance computing (HPC) network over the past 18 months, bandwidth has improved dramatically, Breen said. The system now achieves overall rates of 10Gbps, with single end-to-end connections regularly reaching 5Gbps. The university is in the process of transitioning to a 100Gbps network, and Breen estimates that lofty goal could be accomplished in the next 18 months.

Indiana University Chief Network Architect Matt Davy has achieved comparable results by following a similar path. He embarked on it more than ten years ago, before Science DMZs were even a part of the engineering vernacular.

The segmented subnetwork that hosts his university’s HPC and high-bandwidth storage systems has its own external connection to Internet2. It uses stateless firewalls that run mostly on Linux servers, along with access control lists that block IP addresses or port numbers observed to foster abuse. The system also draws on flow data from Cisco Systems’ NetFlow to spot patterns of attack.
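What makes a firewall or access control list stateless is that each packet is judged only on its own header fields, with no connection table to consult or update. The sketch below illustrates that idea; the blocked networks (drawn from documentation address ranges) and ports are hypothetical examples, not Indiana University’s actual rules.

```python
import ipaddress

# Hypothetical blocklists, of the kind an operator might build from abuse reports.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("2001:db8:bad::/48"),
]
BLOCKED_PORTS = {23, 445}


def permit(src_ip: str, dst_port: int) -> bool:
    """Stateless decision based only on this packet's source address and port.

    No per-connection state is kept, which avoids the bookkeeping that
    stateful firewalls must perform for every flow.
    """
    addr = ipaddress.ip_address(src_ip)
    # An IPv4 address is never "in" an IPv6 network (and vice versa),
    # so mixed lists are safe to check this way.
    if any(addr in net for net in BLOCKED_NETWORKS):
        return False
    return dst_port not in BLOCKED_PORTS
```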

The network, which is able to completely fill its 10Gbps connection, puts a cluster of IDS devices into passive mode, as opposed to the much more common “in-line” mode (when data passes directly through them). A passive IDS can’t monitor every packet, but it digests enough that it can quickly instruct routers to drop connections judged to be malicious. More recently, Davy’s team improved the architecture by building custom load balancers based on the OpenFlow switching specification. Their solution feeds an IDS cluster of 16 servers, each with a 10Gbps connection.
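The article doesn’t describe how Davy’s OpenFlow load balancers divide traffic. One common approach, sketched here in Python as an assumption rather than his implementation, is to hash each flow’s addresses and ports symmetrically, so that both directions of a connection land on the same IDS server, which needs to see the whole conversation. A switch would do the equivalent in hardware; SHA-256 is used here only for illustration.

```python
import hashlib

NUM_SENSORS = 16  # matches the 16-server IDS cluster described above


def sensor_for_flow(src_ip, src_port, dst_ip, dst_port, proto, num_sensors=NUM_SENSORS):
    """Pick an IDS sensor index so both directions of a flow map to the same one."""
    # Order the two endpoints canonically so A->B and B->A build the same key.
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = f"{a[0]}|{a[1]}|{b[0]}|{b[1]}|{proto}".encode()
    digest = hashlib.sha256(key).digest()
    # Use the first four bytes of the digest as an integer bucket selector.
    return int.from_bytes(digest[:4], "big") % num_sensors
```

Because the assignment depends only on the flow’s identity, adding sensors changes where flows go but never splits a single connection across two machines.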

“We greatly increased our ability to catch that traffic and analyze it,” he said. “If you normally make all your traffic flow through the intrusion detection system, that single box has to scale up to that level, whereas we’re using a passive system that’s offline and then we’re clustering it. So we can have a whole cluster of servers analyzing that data so we can scale it up.”

During a recent proof-of-concept demonstration, he used the architecture to achieve 60Gbps data flows that traveled from Seattle to Indiana University’s Bloomington data center. The university is also in the process of upgrading to a 100Gbps connection.

The University of Utah’s journey began about a decade after that at Indiana University, but using many of the techniques formalized by ESnet and NERSC, it has covered much of the same territory. The segmented HPC network Breen oversees doesn’t use stateful firewalls, either. In their place, engineers have deployed passive IDSes that assume many of the same functions by adopting some cutting-edge technologies.

One of them already in place is known as remotely triggered blackhole (RTBH) routing. Scripts that query NetFlow data identify malicious IP addresses, and RTBH blocks them in close to real time by injecting routes for those addresses into the routers’ Border Gateway Protocol (BGP) tables that send their traffic to a null interface, where it is discarded. An added benefit: RTBH relies on the same NetFlow-query scripts Breen’s decommissioned firewalls used.
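A rough sketch of that pipeline, under stated assumptions: flow summaries arrive as (source, destination) pairs, any source contacting more than a hypothetical threshold of distinct hosts is treated as malicious, and each offender gets a host route tagged with the well-known BLACKHOLE BGP community from RFC 7999. The article doesn’t say how Utah’s scripts score traffic or signal the routers, and this code only builds the route list; it does not speak BGP.

```python
import ipaddress
from collections import defaultdict

# Assumption: the RFC 7999 well-known BLACKHOLE community. Many networks use
# a private community instead; edge routers map it to a discard route.
BLACKHOLE_COMMUNITY = "65535:666"


def find_scanners(flows, max_distinct_dsts=100):
    """Return sources that contacted more than max_distinct_dsts distinct hosts.

    flows is an iterable of (src_ip, dst_ip) pairs summarized from flow data.
    """
    contacted = defaultdict(set)
    for src, dst in flows:
        contacted[src].add(dst)
    return sorted(src for src, dsts in contacted.items() if len(dsts) > max_distinct_dsts)


def blackhole_routes(addresses):
    """Build one host route per address (/32 for IPv4, /128 for IPv6)."""
    routes = []
    for ip in addresses:
        addr = ipaddress.ip_address(ip)
        routes.append({
            "prefix": f"{addr}/{addr.max_prefixlen}",
            "community": BLACKHOLE_COMMUNITY,
        })
    return routes
```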

“The end result of doing that is the ability to blackhole bad sources or bad destinations, people that are attacking you from the inside or outside,” Breen said. “You’re able to send them to /dev/null or say, ‘This is bad; flush it down the toilet,’ so to speak.”

The breakthrough of RTBH is that it gives users much of the benefit of stateful firewalls without the traditional penalty to bandwidth. But the system’s analytics still don’t provide as much granularity as Breen would like. That’s why his team is testing a new IDS originally developed by Vern Paxson while at the Lawrence Berkeley National Laboratory and now maintained by developers at the International Computer Science Institute in Berkeley, California, and the National Center for Supercomputing Applications in Urbana-Champaign, Illinois. It’s called the Bro Network Security Monitor. Bro keeps detailed records of application-layer state, and unlike RTBH, the Snort-based IDSes, and the commercial security packages the HPC system already has in place, its analytics can be mapped to the behavioral patterns of end users, so automated security scripts can respond in real time.

“Now it can find all sorts of interesting things, like people that are very slowly scanning you or people that are very slowly probing you in unique ways,” Breen said. “We’re already finding stuff we do not see with our other assortment of tools.”
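Slow scans evade tools that count events per second or per minute; catching them means counting distinct targets over a much longer window. Bro’s own detection logic is written in its scripting language and isn’t shown in the article, so the Python sketch below only illustrates the general long-window idea. The 24-hour window and 50-host threshold are hypothetical values.

```python
import time
from collections import defaultdict, deque


class SlowScanDetector:
    """Flag sources that touch many distinct hosts over a long time window."""

    def __init__(self, window_s=24 * 3600, threshold=50):
        self.window_s = window_s
        self.threshold = threshold
        self.events = defaultdict(deque)  # src -> deque of (timestamp, dst)

    def observe(self, src, dst, ts=None):
        """Record a connection attempt; return True if src now looks like a scanner."""
        ts = time.time() if ts is None else ts
        q = self.events[src]
        q.append((ts, dst))
        # Discard attempts older than the window so memory stays bounded.
        while q and q[0][0] < ts - self.window_s:
            q.popleft()
        distinct = {d for _, d in q}
        return len(distinct) >= self.threshold
```

A source probing one new host every ten minutes never looks busy over any short interval, but this detector flags it on its 50th probe, a little over eight hours in.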

The combination of Bro’s behavior-monitoring capabilities and its ability to be deployed on server clusters gives Breen confidence it will be able to scale as the University of Utah’s HPC environment continues to grow.

“We’re able to see things at a more granular level because of the cluster and because of the behavior-based function of Bro,” he said. “It should be able to scale up to 100Gbps easily and beyond. That’s what our expectations and hopes are.”

Rounding out the technologies Breen has recently added is a network performance monitor called perfSONAR, jointly developed by engineers from ESnet, Internet2, the National Research and Education Networks, and GÉANT2, a high-bandwidth academic Internet for European researchers and educators. The tool suite enables groups to actively measure bandwidth and latency within a local site or across national backbones. It also allows local users to troubleshoot hop by hop so they can isolate network bottlenecks.
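Hop-by-hop troubleshooting works because a path’s end-to-end throughput can be no better than its slowest segment. Given throughput tests between adjacent measurement points, the bottleneck is simply the minimum. The sketch below does not use perfSONAR’s own tools or APIs; the segment names and figures are hypothetical.

```python
def locate_bottleneck(measurements):
    """Return the (segment_name, throughput_bps) pair with the lowest throughput.

    measurements is an ordered list of per-segment test results along a path.
    """
    if not measurements:
        raise ValueError("need at least one segment measurement")
    return min(measurements, key=lambda m: m[1])


path = [
    ("campus-to-border", 9.4e9),
    ("border-to-regional", 1.1e9),
    ("regional-to-backbone", 9.1e9),
]
```

Here `locate_bottleneck(path)` points at the border-to-regional link, which is where an engineer would look first for a misconfigured device or a congested interface.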

Coming to a (non-HPC) network near you

HPC environments have unique demands and designs, but that doesn’t mean that the concept of Science DMZs won’t trickle down to smaller networks.

As enterprises expand to more locations around the world, the ability to move large data sets among them becomes critical, Breen said. An example is manufacturers’ embrace of what’s known as “digital” manufacturing, which requires huge 3D files that may be jointly developed by engineers on multiple continents in short periods of time. How will they move the increasingly large data sets from site to site?

“They, or their business partners, may need a section of their network defined for large transfers and out-of-line security tools that scale rapidly,” Breen said. “Scientists at the universities do this type of workflow today.”

More collaborations, both public and private, are moving into the cloud, he added. Already, the National Institutes of Health is working with Amazon to move more than 200 terabytes of genomic data into Amazon’s cloud for analysis by scientists, as part of the 1000 Genomes Project.

“When the resulting data sets have to move out of the cloud, they have to move to some local resource that must support fast and unfettered data transfers,” Breen said. “Entities will want to segment a section of their network for moving around big data in a scalable manner.”

The other takeaway for smaller network operators, Indiana University’s Davy said, is the importance of rethinking the approach of building a strong network perimeter at the expense of locking down other devices inside the network.

“There’s still this mentality of the crunchy outside and the soft middle—a lot of perimeter defense,” he explained. “We look much more inside and don’t treat the things inside the network as having this layer of protection somewhere else. All the components themselves need to be secure.”