[Discuss] Off-Topic [IP] BufferBloat: What's Wrong with the Internet? (fwd)
Stephen Ronan
sronan at panix.com
Mon Dec 12 16:13:37 EST 2011
Perhaps of interest -s.r.
---------- Forwarded message ----------
Date: Fri, 9 Dec 2011 09:52:45 -0500
From: Dave Farber <dave at farber.net>
To: ip <ip at listbox.com>
Subject: [IP] BufferBloat: What's Wrong with the Internet?
<http://queue.acm.org/detail.cfm?id=2076798>
BufferBloat: What's Wrong with the Internet?
A discussion with Vint Cerf, Van Jacobson, Nick Weaver, and Jim Gettys
Internet delays are now as common as they are maddening. That
means they end up affecting system engineers just like all the
rest of us. And when system engineers get irritated, they often
go looking for what's at the root of the problem. Take Jim
Gettys, for example. His slow home network had repeatedly proved
to be the source of considerable frustration, so he set out to
determine what was wrong, and he even coined a term for what he
found: bufferbloat.
Bufferbloat refers to excess buffering inside a network,
resulting in high latency and reduced throughput. Some buffering
is needed; it provides space to queue packets waiting for
transmission, thus minimizing data loss. In the past, the high
cost of memory kept buffers fairly small, so they filled quickly
and packets began to drop shortly after the link became
saturated, signaling to the communications protocol the presence
of congestion and thus the need for compensating adjustments.
Because memory now is significantly cheaper than it used to be,
buffering has been overdone in all manner of network devices,
without consideration for the consequences. Manufacturers have
reflexively acted to prevent any and all packet loss and, by
doing so, have inadvertently defeated a critical TCP
congestion-detection mechanism, with the result being worsened
congestion and increased latency.
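The mechanism being defeated is TCP's loss-driven congestion control. A toy sketch in Python (illustrative numbers only, not real TCP) shows the idea: timely drops keep the sender's window bounded, while a bloated buffer that never drops leaves the window, and the standing queue behind the bottleneck, growing unchecked.

    # Toy AIMD sketch (illustrative only, not real TCP): the sender grows its
    # window each round and halves it whenever a loss is signaled. If an
    # oversized buffer absorbs the overload instead of dropping, the loss
    # signal never arrives and the queue keeps growing.
    def aimd(rounds, loss_rounds):
        cwnd, history = 1.0, []
        for r in range(rounds):
            if r in loss_rounds:
                cwnd = max(1.0, cwnd / 2.0)   # multiplicative decrease on loss
            else:
                cwnd += 1.0                   # additive increase per round trip
            history.append(cwnd)
        return history

    print(aimd(20, loss_rounds={8, 15}))      # timely drops keep the window bounded
    print(aimd(20, loss_rounds=set()))        # no loss signal: unbounded growth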
Now that the problem has been diagnosed, people are working
feverishly to fix it. This case study considers the extent of the
bufferbloat problem and its potential implications. Working to
steer the discussion is Vint Cerf, popularly known as one of the
"fathers of the Internet." As the co-designer of the TCP/IP
protocols, Cerf did indeed play a key role in developing the
Internet and related packet data and security technologies while
at Stanford University from 1972-1976 and with DARPA (the U.S.
Department of Defense's Advanced Research Projects Agency) from
1976-1982. He currently serves as Google's chief Internet
evangelist.
Van Jacobson, presently a research fellow at PARC where he leads
the networking research program, is also central to this
discussion. Considered one of the world's leading authorities on
TCP, he helped develop the RED (random early detection) queue
management algorithm that has been widely credited with allowing
the Internet to grow and meet ever-increasing throughput demands
over the years. Prior to joining PARC, Jacobson was a chief
scientist at Cisco Systems and later at Packet Design Networks.
Also participating is Nick Weaver, a researcher at ICSI
(International Computer Science Institute) in Berkeley, where he
was part of the team that developed Netalyzr, a tool that
analyzes network connections and has been instrumental in
detecting bufferbloat and measuring its impact across the
Internet.
Rounding out the discussion is Gettys, who edited the HTTP/1.1
specification and was a co-designer of the X Window System. He
now is a member of the technical staff at Alcatel-Lucent Bell
Labs, where he focuses on systems design and engineering,
protocol design, and free software development.
VINT CERF What caused you to do the analysis that led you to
conclude you had problems with your home network related to
buffers in intermediate devices?
JIM GETTYS I was running some bandwidth tests on an old IPsec
(Internet Protocol Security)-like device that belongs to Bell
Labs and observed latencies of as much as 1.2 seconds whenever
the device was running as fast as it could. That didn't entirely
surprise me, but then I happened to run the same test without the
IPsec box in the way, and I ended up with the same result. With
1.2-second latency accompanied by horrible jitter, my home
network obviously needed some help. The rule of thumb for good
telephony is 150-millisecond latency at most, and my network had
nearly 10 times that much.
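The arithmetic behind such numbers is straightforward: a full buffer adds delay equal to its size divided by the rate of the link draining it. A rough sketch with assumed figures (a 256 KB buffer on a 2-megabit uplink; neither number comes from the interview):

    # Back-of-the-envelope queueing delay (assumed figures, for illustration)
    buffer_bytes = 256 * 1024            # hypothetical device buffer
    uplink_bytes_per_s = 2_000_000 / 8   # hypothetical 2 Mbit/s uplink
    delay_s = buffer_bytes / uplink_bytes_per_s
    print(f"{delay_s:.2f} s of added delay")   # ~1.05 s, vs. a ~0.15 s telephony budget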
My first thought was that the problem might relate to a feature
called PowerBoost that comes as part of my home service from
Comcast. That led me to drop a note to Rich Woundy at Comcast
since his name appears on the Internet draft for that feature. He
lives in the next town over from me, so we arranged to get
together for lunch. During that lunch, Rich provided me with
several pieces to the puzzle. To begin with, he suggested my
problem might have to do with the excessive buffering in a device
in my path rather than with the PowerBoost feature. He also
pointed out that ICSI has a great tool called Netalyzr that helps
you figure out what your buffering is. Also, much to my surprise,
he said a number of ISPs had told him they were running without
any queue management whatsoever; that is, they weren't running RED
on any of their routers or edge devices.
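For readers unfamiliar with it, RED drops or marks a small fraction of packets before the queue is completely full so that TCP senders back off early. A heavily simplified sketch of the core decision, leaving out refinements such as the count-based spacing of drops:

    # Simplified RED sketch (omits several refinements of the real algorithm)
    import random

    MIN_TH, MAX_TH, MAX_P, W_Q = 5.0, 15.0, 0.1, 0.002   # illustrative parameters

    avg_queue = 0.0   # exponentially weighted average of the queue length

    def red_should_drop(queue_len):
        global avg_queue
        avg_queue = (1 - W_Q) * avg_queue + W_Q * queue_len  # low-pass filter
        if avg_queue < MIN_TH:
            return False                                     # short queue: never drop
        if avg_queue >= MAX_TH:
            return True                                      # long queue: always drop
        # in between: drop probability rises linearly toward MAX_P
        p = MAX_P * (avg_queue - MIN_TH) / (MAX_TH - MIN_TH)
        return random.random() < p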
The very next day I managed to get a wonderful trace. I had been
having trouble reproducing the problem I'd experienced earlier,
but since I was using a more recent cable modem this time around,
I had a trivial one-line command for reproducing the problem. The
resulting SmokePing plot clearly showed the severity of the
problem, and that motivated me to take a packet-capture so I
could see just what in the world was going on. About a week
later, I saw basically the same signature on a Verizon FiOS [a
bundled home communications service operating over a fiber
network], and that surprised me. Anyway, it became clear that
what I'd been experiencing on my home network wasn't unique to
cable modems.
VC I assume you weren't the only one making noises about these
sorts of problems?
JG I'd been hearing similar complaints all along. In fact, Dave
Reed [Internet network architect, now with SAP Labs] about a year
earlier had reported problems in 3G networks that also appeared
to be caused by excessive buffering. He was ultimately ignored
when he publicized his concerns, but I've since been able to
confirm that Dave was right. In his case, he would see daily high
latency without much packet loss during the day, and then the
latency would fall back down again at night as flow on the
overall network dropped.
Dave Clark [Internet network architect, currently senior research
scientist at MIT] had noticed that the DSLAM (Digital Subscriber
Line Access Multiplexer) his micro-ISP runs had way too much
buffering, leading to as much as six seconds of latency. And this
is something he'd observed six years earlier, which is what had
led him to warn Rich Woundy of the possible problem.
VC Perhaps there's an important life lesson here suggesting you
may not want to simply throw away outliers on the grounds they're
probably just flukes. When outliers show up, it might be a good
idea to find out why.
NICK WEAVER But when testing for this particular problem, the
outliers actually prove to be the good networks.
JG Without Netalyzr, I never would have known for sure whether
what I'd been observing was anything more than just a couple of
flukes. After seeing the Netalyzr data, however, I could see how
widespread the problem really was. I can still remember the day
when I first saw the data for the Internet as a whole plotted
out. That was rather horrifying.
NW It's actually a pretty straightforward test that allowed us to
capture all that data. In putting together Netalyzr at ICSI, we
started out with a design philosophy that one anonymous commenter
later captured very nicely: "This brings new meaning to the
phrase, 'Bang it with a wrench.'" Basically, we just set out to
hammer on everything, except we weren't interested in doing a
bandwidth test since there were plenty of good ones out there
already.
I remembered, however, that Nick McKeown and others had ranted
about how amazingly over-buffered home networks often proved to
be, so buffering seemed like a natural thing to test for. It
turns out that would also give us a bandwidth test as a side
consequence. Thus we developed a pretty simple test. Over just a
10-second period, it sends a packet and then waits for a packet
to return. Then each time it receives a packet back, it sends two
more. It either sends large packets and receives small ones in
return, or it sends small packets and receives large ones. During
the last five seconds of that 10-second period, it just measures
the latency under load in comparison to the latency without load.
It's essentially just a simple way to stress out the network.
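A rough sketch of that style of test follows (this is not the Netalyzr code; it assumes a hypothetical UDP echo service at echo.example.net:9999 and glosses over the asymmetric packet sizes):

    # Hedged sketch of a latency-under-load probe, not the Netalyzr implementation
    import socket, struct, time

    SERVER = ("echo.example.net", 9999)  # hypothetical UDP echo endpoint (assumption)
    DURATION = 10.0                      # total test length, as described above
    PAD = b"x" * 1392                    # large packets out, to stress the uplink

    def latency_under_load():
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.settimeout(0.25)
        start = time.time()
        loaded_rtts = []

        def send_probe():
            # embed the send time so the echoed copy lets us compute an RTT
            sock.sendto(struct.pack("!d", time.time()) + PAD, SERVER)

        send_probe()
        while time.time() - start < DURATION:
            try:
                data, _ = sock.recvfrom(2048)
            except socket.timeout:
                send_probe()             # keep probing through packet loss
                continue
            rtt = time.time() - struct.unpack("!d", data[:8])[0]
            if time.time() - start > DURATION / 2:
                loaded_rtts.append(rtt)  # last five seconds: latency under load
            send_probe()                 # for each packet received, send two more
            send_probe()
        return loaded_rtts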
We didn't get around to analyzing all that data until a few
months after releasing the tool. Then what we saw were these very
pretty graphs that gave us reasonable confidence that a huge
fraction of the networks we had just tested could not possibly
exhibit good behavior under load. That was a very scary
discovery.
JG Horrifying, I think.
NW It wasn't quite so horrifying for me because I'd already
effectively taken steps to mitigate the problem on my own
network; namely, I'd paid for a higher class of service on my home
network specifically to get better behavior under load. You can
do that because the buffers are all sized in bytes. So if you pay
for the 4x bandwidth service, your buffer will be 4x smaller in
terms of delay, and that ends up acting as a boundary on how bad
things can get under load. And I've taken steps to reduce other
potential problems by installing multiple access points in my
home, for example.
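The reason a faster tier helps is the same size-over-rate arithmetic: the buffer holds a fixed number of bytes, so a faster link drains it proportionally sooner. With the same assumed 256 KB buffer:

    # Worst-case queueing delay for two hypothetical service tiers
    buffer_bytes = 256 * 1024
    for mbps in (2, 8):                         # base tier vs. a 4x tier
        delay_s = buffer_bytes / (mbps * 1_000_000 / 8)
        print(f"{mbps} Mbit/s uplink -> {delay_s:.2f} s worst-case added delay")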
JG The problem is that the next generation of equipment will come
out with even larger buffers. That's part of why I was having
trouble initially reproducing this problem with DOCSIS (Data over
Cable Service Interface Specification) 3.0 modems. That is,
because I had even more extreme buffering than I'd had before, it
took even longer to fill up the buffer and get it to start
misbehaving.
VC What I think you've just outlined is a measure of goodness
that later proved to be exactly the wrong thing to do. At first,
the equipment manufacturers believed that adding more buffers
would be a good thing, primarily to handle increased traffic
volumes and provide for fair access to capacity. Of course, it
has also become increasingly difficult to buy a chip that doesn't
have a lot of memory in it.
NW Also, to the degree that people have been testing at all,
they've been testing for latency or bandwidth. The problem we're
discussing is one of latency under load, so if you test only
quiescent latency, you won't notice it; and if you test only
bandwidth, you'll never notice it. Unless you're testing
specifically for behavior under load, you won't even be aware
this is happening.
VAN JACOBSON I think there's a deeper problem. We know the cause
of these big queues is data piling up wherever there's a
fast-to-slow transition in the network. That generally happens
either going from the Internet core out to a subscriber (as with
YouTube videos) or from the subscriber back into the core, where
a fast home network such as a 54-megabit wireless hits a slow 1-
to 2-megabit Internet connection.
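A back-of-the-envelope sketch of how quickly such a queue builds, using assumed figures (1 MB of buffering at the bottleneck):

    # Fill and drain times at a fast-to-slow transition (assumed figures)
    lan_bytes_per_s = 54_000_000 / 8   # 54 Mbit/s wireless feeding the modem
    wan_bytes_per_s = 2_000_000 / 8    # 2 Mbit/s uplink draining it
    buffer_bytes = 1024 * 1024         # hypothetical 1 MB bottleneck buffer
    fill_s = buffer_bytes / (lan_bytes_per_s - wan_bytes_per_s)
    drain_s = buffer_bytes / wan_bytes_per_s
    print(f"fills in ~{fill_s:.2f} s, then adds ~{drain_s:.1f} s of standing delay")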
[snip]