Boston Linux & UNIX was originally founded in 1994 as part of The Boston Computer Society. We meet on the third Wednesday of each month at the Massachusetts Institute of Technology, in Building E51.

BLU Discuss list archive



[Discuss] Linux on Lenovo P70 -- data corruption



On Mon, 4 Sep 2017 15:56:06 -0400, Kent Borg wrote:
> On 09/04/2017 02:10 PM, Robert Krawitz wrote:
>> On Mon, 4 Sep 2017 13:59:48 -0400, Frank DiPrete wrote:
>>> What is the NIC in the laptop?
>>> I've had this problem before using the open source driver for a network
>>> adapter.
>>> (trying to remember which one)
>> But this appears to also happen over the loopback interface.
>
> If you can reliably get it to mess up, see what reliably never messes up, squeeze between the two.

I reproduced the problem under Windows, with exactly the same
symptoms.  It was considerably harder to trigger, though, perhaps
because ssh/scp under Cygwin is a lot slower than it is on Linux, and
the rate of data transfer did appear to affect how often it failed.

I did it by copying a ~36 GB file repeatedly from a known good Linux
host to the laptop, while running a load in the background (on Linux,
multiple copies of glxgears with the option to turn off sync, on
Windows, by running prime95 along with an OpenGL demo in the
background).  It took me about 5 tries on Windows to finally get a
failure, but I did, with exactly the same pattern as under Linux.
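
For anyone who wants to repeat this kind of check, here is a minimal
Python sketch of the verification step only: hash the received file in
chunks and compare it against a digest taken on the known good host.
The function names, the chunk size, and the number of rounds are
illustrative assumptions, not what was actually run for this test.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks
    so that a multi-gigabyte file never has to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def check_repeatedly(path, expected_hex, rounds=5):
    """Hash the same file `rounds` times and return a list of
    (round_number, digest) pairs for every round that did not match
    the expected digest. An empty list means every round matched."""
    failures = []
    for i in range(rounds):
        digest = file_sha256(path)
        if digest != expected_hex:
            failures.append((i, digest))
    return failures
```

Note that re-reading a file that was just written may be served from
the page cache, so a file much larger than RAM gives a more honest
result.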

I had essentially already tried the suggestions you make below.

> Two hunches:
>
> #1 This has nothing to do with networking, it is a local problem that kernel caching hides when doing local operations; with network operations the kernel can't make the same assumptions.
>
> #2 This is a driver bug. What unique-ish hardware does this machine have? Can you run tests that exclude that hardware?
>
> I would try more experiments along these lines:
>
>   - Remove RAM, so the kernel can't do as much caching.
>
>   - Turn off swap.
>
>   - Create some files on a different (working) machine that have known contents: bigger-than-your-RAM patterned data, and bigger-than-your-RAM data files from urandom (SHA-256 is a way to check random data files)...
>
>   - Transfer them into this machine in different ways: wired network, wireless, builtin interfaces, USB network interfaces, USB flash drive, USB spinning disk drive, USB 3.0, USB <3.0...
>
>   - Transfer them to different places: RAM disk, each SSD, USB flash drive, USB spinning drive, loopback volume, different kinds of formatted filesystems on these, right back out the network without ever storing locally, into a RAM buffer of a trivial C program you write for the purpose.
>
>   - Try an obsolete kernel, in a graphical UI, in a text console...
>
>   - Try repeated SHA hashes of stored files (stored in different places) see whether they are consistent.
>
> Keep careful notes, look for patterns, hope it becomes clear before you give up.
>
> Report back!
>
> -kb, the Kent whose old Lenovo liked to freeze up when doing lots of video blitting near the edge of the display, until a new version of Debian fixed it.
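
The "patterned data" idea above could be sketched roughly as follows in
Python.  Each block carries its own block number, so a damaged copy can
be checked without a reference file and the damaged block numbers fall
out directly.  The 4096-byte block size, the layout, and all of the
names are assumptions chosen for illustration; nothing here is taken
from the tests described in this thread.

```python
import struct

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def make_block(index):
    """Build one block: the 8-byte big-endian block index, repeated
    until the block is full."""
    return struct.pack(">Q", index) * (BLOCK_SIZE // 8)


def write_pattern_file(path, num_blocks):
    """Write num_blocks self-describing blocks to path."""
    with open(path, "wb") as f:
        for i in range(num_blocks):
            f.write(make_block(i))


def find_bad_blocks(path):
    """Return the indices of blocks whose contents differ from the
    expected pattern. A short (truncated) final block is also
    reported as bad."""
    bad = []
    with open(path, "rb") as f:
        index = 0
        while True:
            block = f.read(BLOCK_SIZE)
            if not block:
                break
            if block != make_block(index):
                bad.append(index)
            index += 1
    return bad
```

Multiplying a bad block index by BLOCK_SIZE gives the byte offset of
the damage, which makes it easier to see whether failures line up with
some buffer or page boundary.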

-- 
Robert Krawitz                                     <rlk at alum.mit.edu>

***  MIT Engineers   A Proud Tradition   http://mitathletics.com  ***
Member of the League for Programming Freedom  --  http://ProgFree.org
Project lead for Gutenprint   --    http://gimp-print.sourceforge.net

"Linux doesn't dictate how I work, I dictate how Linux works."
--Eric Crampton



BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.




Boston Linux & Unix / webmaster@blu.org