[Discuss] FPU differences based on different chips.

Jerry Feldman gaf at blu.org
Sat Sep 1 07:59:35 EDT 2012


I don't know the exact details, since the guy who ran the tests is in
Toronto. We have a tool that is a simulation generator. And yes, it does
use a random number generator, but I am assuming it runs with a known
seed so that the results are reproducible.
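
As a minimal sketch of the known-seed idea (Python, with a made-up
pi estimate standing in for the real simulation, which is not shown
here; `run_simulation`, `fingerprint`, and the seed value are all
hypothetical names chosen for illustration):

```python
import random


def run_simulation(seed, n_trials=100_000):
    """Hypothetical stand-in for one Monte Carlo scenario: estimate pi."""
    rng = random.Random(seed)  # private generator with a fixed, known seed
    hits = 0
    for _ in range(n_trials):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return 4.0 * hits / n_trials


def fingerprint(value):
    """Exact hex form of a float, so two results can be compared bit for bit."""
    return float(value).hex()


if __name__ == "__main__":
    first = run_simulation(seed=12345)
    second = run_simulation(seed=12345)
    print(fingerprint(first), fingerprint(second))
    print("reproducible:", fingerprint(first) == fingerprint(second))
```

Running the same seed twice on one machine should give identical
fingerprints; comparing the fingerprints from two machines then shows
whether the hosts themselves disagree.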

One of my suggestions to him is to run it on 3 different VMs in Boston.
The initial test was on a separate cluster; the other 2 VMs I suggested
are on a cluster I have access to. By running on 3 different VMs in
Boston, hopefully he will get consistent results. I'll follow up with him
on Tuesday. The system he is running is a simulation engine that we use
to generate scenarios. I'm pretty sure that the run on the physical
machine in Toronto used the same binary as the run here in Boston. I'm
sure he was running this in test mode so he can get predictable results.
This is not the only simulation he ran. I'm not sure how many he ran,
but only 1 simulation had different results.
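
To put a number on "different results" between two runs, a comparison
along these lines may help. This is only a sketch: `compare_results`
and the two sample values are made up for illustration (the values
merely mimic a difference around the 5th or 6th decimal place) and are
not output from the actual simulation.

```python
import math


def compare_results(a, b, rel_tol=1e-9):
    """Summarize how far apart two floating-point results are."""
    a, b = float(a), float(b)
    abs_diff = abs(a - b)
    scale = max(abs(a), abs(b))
    rel_diff = abs_diff / scale if scale else 0.0
    return {
        "bit_identical": a.hex() == b.hex(),  # exact same representation
        "abs_diff": abs_diff,
        "rel_diff": rel_diff,
        "within_tolerance": math.isclose(a, b, rel_tol=rel_tol),
    }


if __name__ == "__main__":
    # Hypothetical results from two hosts, not real measurements.
    host_a = 1.2345678
    host_b = 1.2345612
    for key, value in compare_results(host_a, host_b).items():
        print(key, value)
```

The tolerance is an assumption as well; what counts as "the same" is a
decision for whoever owns the simulation.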

Most of our products are fully self-contained, with all the libraries
included in the package, so the local installation should not affect
the product. But even with packaged libraries, such as glibc, the
system calls are handled by the OS, and some instructions are handled by
VMware. Since the Toronto data center is also going to move, they are
getting some large IBM VMware servers, so I might want him to run the
simulation on a VM in Toronto as well. Several years ago I had a similar
issue when porting to an IA64 machine. This was when our product was
32-bit, so a lot of other issues were involved.

On 09/01/2012 01:08 AM, Bill Bogstad wrote:
>
>
> On Fri, Aug 31, 2012 at 4:18 PM, Jerry Feldman <gaf at blu.org> wrote:
>
>     We've got a situation where, on a single Monte Carlo simulation,
>     there is a difference between a result using a Xeon E5530 2.4G and
>     an E5570 2.93G. In gleaning out some details, the E5530 is a VMware
>     ESX host, whereas I think the E5570 is a physical machine.  I've
>     suggested that the guy run another test on a system on another
>     VMware server (with an E7-8870).  In this product the engineer is
>     testing, there are many different simulations, but only one is
>     showing a difference (at about 5 or 6 decimal places). Additionally,
>     there is a random number generator involved. We've also got a few
>     different versions of RHEL (5.2, 5.4, and 5.8).
>
>
> It's not clear from your email if the same simulation has been run
> more than once on the same machine and if you
> saw differences in that case.
>
> And the fact that a random number generator is involved is an
> incredibly huge red flag.   Is there a way to run the simulations with
> a pseudo-random number generator with a known initial seed?   If only
> for testing purposes (like this), it is often helpful when something
> like this comes up.
>


-- 
Jerry Feldman <gaf at blu.org>
Boston Linux and Unix
PGP key id:3BC1EB90 
PGP Key fingerprint: 49E2 C52A FC5A A31F 8D66  C0AF 7CEA 30FC 3BC1 EB90



