
BLU Discuss list archive


[Discuss] Off-Topic [IP] BufferBloat: What's Wrong with the Internet? (fwd)

Perhaps of interest -s.r.

---------- Forwarded message ----------
Date: Fri, 9 Dec 2011 09:52:45 -0500
From: Dave Farber <dave at>
To: ip <ip at>
Subject: [IP] BufferBloat: What's Wrong with the Internet?


BufferBloat: What's Wrong with the Internet?
A discussion with Vint Cerf, Van Jacobson, Nick Weaver, and Jim Gettys

Internet delays are now as common as they are maddening. That 
means they end up affecting system engineers just like all the 
rest of us. And when system engineers get irritated, they often 
go looking for what's at the root of the problem. Take Jim 
Gettys, for example. His slow home network had repeatedly proved 
to be the source of considerable frustration, so he set out to 
determine what was wrong, and he even coined a term for what he 
found: bufferbloat.

Bufferbloat refers to excess buffering inside a network, 
resulting in high latency and reduced throughput. Some buffering 
is needed; it provides space to queue packets waiting for 
transmission, thus minimizing data loss. In the past, the high 
cost of memory kept buffers fairly small, so they filled quickly 
and packets began to drop shortly after the link became 
saturated, signaling to the communications protocol the presence 
of congestion and thus the need for compensating adjustments.
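The arithmetic behind that congestion signal is worth making concrete. A minimal sketch with made-up numbers (none of these figures come from the discussion): the worst-case queueing delay of a full buffer is simply its size divided by the rate at which the link drains it.

```python
# Hypothetical figures, for illustration only: worst-case added latency
# from a full buffer is buffer size divided by link drain rate.

def queue_delay_seconds(buffer_bytes: int, link_bits_per_sec: float) -> float:
    """Time to drain a completely full buffer over the given link."""
    return buffer_bytes * 8 / link_bits_per_sec

# A 256 KB modem buffer draining over a 2 Mbit/s uplink:
delay = queue_delay_seconds(256 * 1024, 2_000_000)
print(f"{delay:.2f} s")  # roughly a full second of added latency
```

A buffer that was cheap to add in memory terms can thus represent a second or more of standing delay once the link saturates.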

Because memory now is significantly cheaper than it used to be, 
buffering has been overdone in all manner of network devices, 
without consideration for the consequences. Manufacturers have 
reflexively acted to prevent any and all packet loss and, by 
doing so, have inadvertently defeated a critical TCP 
congestion-detection mechanism, with the result being worsened 
congestion and increased latency.

Now that the problem has been diagnosed, people are working 
feverishly to fix it. This case study considers the extent of the 
bufferbloat problem and its potential implications. Working to 
steer the discussion is Vint Cerf, popularly known as one of the 
"fathers of the Internet." As the co-designer of the TCP/IP 
protocols, Cerf did indeed play a key role in developing the 
Internet and related packet data and security technologies while 
at Stanford University from 1972-1976 and with DARPA (the U.S. 
Department of Defense's Advanced Research Projects Agency) from 
1976-1982. He currently serves as Google's chief Internet 
evangelist.

Van Jacobson, presently a research fellow at PARC where he leads 
the networking research program, is also central to this 
discussion. Considered one of the world's leading authorities on 
TCP, he helped develop the RED (random early detection) queue 
management algorithm that has been widely credited with allowing 
the Internet to grow and meet ever-increasing throughput demands 
over the years. Prior to joining PARC, Jacobson was a chief 
scientist at Cisco Systems and later at Packet Design Networks.

Also participating is Nick Weaver, a researcher at ICSI 
(International Computer Science Institute) in Berkeley, where he 
was part of the team that developed Netalyzr, a tool that 
analyzes network connections and has been instrumental in 
detecting bufferbloat and measuring its impact across the 
Internet.

Rounding out the discussion is Gettys, who edited the HTTP/1.1 
specification and was a co-designer of the X Window System. He 
now is a member of the technical staff at Alcatel-Lucent Bell 
Labs, where he focuses on systems design and engineering, 
protocol design, and free software development.

VINT CERF What caused you to do the analysis that led you to 
conclude you had problems with your home network related to 
buffers in intermediate devices?

JIM GETTYS I was running some bandwidth tests on an old IPsec 
(Internet Protocol Security)-like device that belongs to Bell 
Labs and observed latencies of as much as 1.2 seconds whenever 
the device was running as fast as it could. That didn't entirely 
surprise me, but then I happened to run the same test without the 
IPsec box in the way, and I ended up with the same result. With 
1.2-second latency accompanied by horrible jitter, my home 
network obviously needed some help. The rule of thumb for good 
telephony is 150-millisecond latency at most, and my network had 
nearly 10 times that much.

My first thought was that the problem might relate to a feature 
called PowerBoost that comes as part of my home service from 
Comcast. That led me to drop a note to Rich Woundy at Comcast 
since his name appears on the Internet draft for that feature. He 
lives in the next town over from me, so we arranged to get 
together for lunch. During that lunch, Rich provided me with 
several pieces to the puzzle. To begin with, he suggested my 
problem might have to do with the excessive buffering in a device 
in my path rather than with the PowerBoost feature. He also 
pointed out that ICSI has a great tool called Netalyzr that helps 
you figure out what your buffering is. Also, much to my surprise, 
he said a number of ISPs had told him they were running without 
any queue management whatsoever; that is, they weren't running RED 
on any of their routers or edge devices.

The very next day I managed to get a wonderful trace. I had been 
having trouble reproducing the problem I'd experienced earlier, 
but since I was using a more recent cable modem this time around, 
I had a trivial one-line command for reproducing the problem. The 
resulting SmokePing plot clearly showed the severity of the 
problem, and that motivated me to take a packet-capture so I 
could see just what in the world was going on. About a week 
later, I saw basically the same signature on a Verizon FiOS [a 
bundled home communications service operating over a fiber 
network], and that surprised me. Anyway, it became clear that 
what I'd been experiencing on my home network wasn't unique to 
cable modems.

VC I assume you weren't the only one making noises about these 
sorts of problems?

JG I'd been hearing similar complaints all along. In fact, Dave 
Reed [Internet network architect, now with SAP Labs] about a year 
earlier had reported problems in 3G networks that also appeared 
to be caused by excessive buffering. He was ultimately ignored 
when he publicized his concerns, but I've since been able to 
confirm that Dave was right. In his case, he would see daily high 
latency without much packet loss during the day, and then the 
latency would fall back down again at night as flow on the 
overall network dropped.

Dave Clark [Internet network architect, currently senior research 
scientist at MIT] had noticed that the DSLAM (Digital Subscriber 
Line Access Multiplexer) his micro-ISP runs had way too much 
buffering, leading to as much as six seconds of latency. And this 
is something he'd observed six years earlier, which is what had 
led him to warn Rich Woundy of the possible problem.

VC Perhaps there's an important life lesson here suggesting you 
may not want to simply throw away outliers on the grounds they're 
probably just flukes. When outliers show up, it might be a good 
idea to find out why.

NICK WEAVER But when testing for this particular problem, the 
outliers actually prove to be the good networks.

JG Without Netalyzr, I never would have known for sure whether 
what I'd been observing was anything more than just a couple of 
flukes. After seeing the Netalyzr data, however, I could see how 
widespread the problem really was. I can still remember the day 
when I first saw the data for the Internet as a whole plotted 
out. That was rather horrifying.

NW It's actually a pretty straightforward test that allowed us to 
capture all that data. In putting together Netalyzr at ICSI, we 
started out with a design philosophy that one anonymous commenter 
later captured very nicely: "This brings new meaning to the 
phrase, 'Bang it with a wrench.'" Basically, we just set out to 
hammer on everything, except we weren't interested in doing a 
bandwidth test, since there were plenty of good ones out there 
already.

I remembered, however, that Nick McKeown and others had ranted 
about how amazingly over-buffered home networks often proved to 
be, so buffering seemed like a natural thing to test for. It 
turns out that would also give us a bandwidth test as a side 
consequence. Thus we developed a pretty simple test. Over just a 
10-second period, it sends a packet and then waits for a packet 
to return. Then each time it receives a packet back, it sends two 
more. It either sends large packets and receives small ones in 
return, or it sends small packets and receives large ones. During 
the last five seconds of that 10-second period, it just measures 
the latency under load in comparison to the latency without load. 
It's essentially just a simple way to stress out the network.
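The doubling behavior Weaver describes can be sketched as a toy discrete-time simulation (my own simplification, not Netalyzr's actual code): every delivered packet triggers two more sends, so the sender quickly outruns a fixed-rate bottleneck, and the queue depth, a proxy for latency under load, climbs until the buffer caps it.

```python
# Toy model of the probe described above: each delivered packet spawns two
# more, so arrivals outpace a fixed-rate bottleneck and the queue grows.
# All parameters are invented for illustration.

def simulate(buffer_pkts: int, drain_per_tick: int, ticks: int) -> list[int]:
    """Return queue depth per tick; depth stands in for latency under load."""
    queue = 1                       # one probe packet in flight to start
    depths = []
    for _ in range(ticks):
        delivered = min(queue, drain_per_tick)
        queue -= delivered
        # two new packets sent for every one delivered, capped by the buffer
        queue = min(buffer_pkts, queue + 2 * delivered)
        depths.append(queue)
    return depths

depths = simulate(buffer_pkts=100, drain_per_tick=4, ticks=20)
print(depths[:5], depths[-1])  # queue ramps up, then grows steadily
```

Once the bottleneck saturates, the queue (and hence the measured latency) keeps growing linearly until the buffer is full, which is exactly the signature the tool looks for in the second half of its 10-second run.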

We didn't get around to analyzing all that data until a few 
months after releasing the tool. Then what we saw were these very 
pretty graphs that gave us reasonable confidence that a huge 
fraction of the networks we had just tested could not possibly 
exhibit good behavior under load. That was a very scary 
discovery.

JG Horrifying, I think.

NW It wasn't quite so horrifying for me because I'd already 
effectively taken steps to mitigate the problem on my own 
network; namely, I'd paid for a higher class of service on my home 
network specifically to get better behavior under load. You can 
do that because the buffers are all sized in bytes. So if you pay 
for the 4x bandwidth service, your buffer will be 4x smaller in 
terms of delay, and that ends up acting as a boundary on how bad 
things can get under load. And I've taken steps to reduce other 
potential problems, by installing multiple access points in my 
home, for example.
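Weaver's point follows directly from the buffer being sized in bytes rather than in time. With illustrative numbers of my own (not from the discussion): the same byte-sized buffer represents a quarter of the delay on a link four times as fast.

```python
# Hypothetical numbers illustrating why a byte-sized buffer means less
# *time* on a faster link: 4x the bandwidth gives 1/4 the worst-case delay.

def buffer_delay_ms(buffer_bytes: int, mbit_per_sec: float) -> float:
    """Worst-case queueing delay, in milliseconds, of a full buffer."""
    return buffer_bytes * 8 / (mbit_per_sec * 1_000_000) * 1000

BUF = 128 * 1024                    # the same 128 KB buffer in both cases
slow = buffer_delay_ms(BUF, 5)      # 5 Mbit/s service tier
fast = buffer_delay_ms(BUF, 20)     # 4x the bandwidth: 20 Mbit/s
print(f"{slow:.0f} ms vs {fast:.0f} ms")
```

The bloat is still there on the faster tier; paying for more bandwidth only bounds how bad it gets, which is why Weaver calls it mitigation rather than a fix.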

JG The problem is that the next generation of equipment will come 
out with even larger buffers. That's part of why I was having 
trouble initially reproducing this problem with DOCSIS (Data over 
Cable Service Interface Specification) 3.0 modems. That is, 
because I had even more extreme buffering than I'd had before, it 
took even longer to fill up the buffer and get it to start 
misbehaving.

VC What I think you've just outlined is a measure of goodness 
that later proved to be exactly the wrong thing to do. At first, 
the equipment manufacturers believed that adding more buffers 
would be a good thing, primarily to handle increased traffic 
volumes and provide for fair access to capacity. Of course, it 
has also become increasingly difficult to buy a chip that doesn't 
have a lot of memory in it.

NW Also, to the degree that people have been testing at all, 
they've been testing for latency or bandwidth. The problem we're 
discussing is one of latency under load, so if you test only 
quiescent latency, you won't notice it; and if you test only 
bandwidth, you'll never notice it. Unless you're testing 
specifically for behavior under load, you won't even be aware 
this is happening.

VAN JACOBSON I think there's a deeper problem. We know the cause 
of these big queues is data piling up wherever there's a 
fast-to-slow transition in the network. That generally happens 
either going from the Internet core out to a subscriber (as with 
YouTube videos) or from the subscriber back into the core, where 
a fast home network such as a 54-megabit wireless hits a slow 1- 
to 2-megabit Internet connection.
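As a back-of-the-envelope illustration of that transition (my own figures, not from the discussion): while the sender transmits flat out, the queue at the fast-to-slow boundary grows at the difference of the two rates, and once full it adds its entire drain time as latency.

```python
# Hypothetical figures matching the scenario above: a 54 Mbit/s wireless LAN
# feeding a 2 Mbit/s uplink through a 1 MB (8 Mbit) buffer.
fast_mbps, slow_mbps = 54, 2
growth_mbit_per_s = fast_mbps - slow_mbps      # queue grows at 52 Mbit/s
buffer_mbit = 8                                # 1 MB buffer, in megabits
fill_time_s = buffer_mbit / growth_mbit_per_s  # how fast the buffer fills
drain_delay_s = buffer_mbit / slow_mbps        # standing latency once full
print(f"fills in {fill_time_s:.2f} s, then adds {drain_delay_s:.1f} s of delay")
```

The buffer fills in a fraction of a second, after which every packet crossing the transition inherits several seconds of queueing delay, which is the standing latency Gettys measured at home.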


