
BLU Discuss list archive



[Discuss] VMWare Player question



On 10/22/2011 01:03 AM, Bill Bogstad wrote:
> On Thu, Oct 20, 2011 at 5:23 PM, Richard Pieri <richard.pieri at gmail.com> wrote:
>> On Oct 20, 2011, at 12:52 PM, Bill Bogstad wrote:
>>> Instructions that accessed hardware and other specialized instructions
>>> were either trapped or replaced in line by calls to vmware code.
>> Look up trap-and-emulate, and then look at why the x86 can't do it.  I'll wait.
>>
>> Now read this, specifically section 3.2 "Simple binary translation" and 3.3 "Adaptive binary translation".  The paper is about performance between different virtualization techniques on the same hardware but it covers the relevant point.  I'll wait.
>> http://www.vmware.com/pdf/asplos235_adams.pdf
> Thanks for the correction.   I was aware that they didn't have to do
> full emulation, but didn't ever look at the full details on how things
> worked.  I found this line from the end of section 3.2 interesting:
>
> "By switching guest execution between BT mode and direct execution as
> the guest switches between kernel- and user-mode, we can limit BT
> overheads to kernel code and permit application code to run at native
> speed."
>
> This implies to me that VMware runs user program instructions for some
> guest OSes without emulation/binary translation or any other
> modification for that matter.  I wonder for which OSes VMware actually
> works that way (if any).
>
>>  VMware does nothing especially different.  It simply has the benefit of emulating x86 on x86 so most instructions are just passed through to the real CPU -- the "high-performance" mentioned in the literature.  It's still slower than running natively because every instruction needs to be examined first.
> I don't see in the paper how they handle the possibility of
> self-modifying code.   I wonder if they handle it the same way they
> seem to handle page table manipulation (apparently by protecting
> sensitive memory, running load/store instructions natively, and fixing
> things if a memory trap occurs).
>
>> VMware could have incorporated the 64-bit instructions and addressing into the binary translator.  This would have permitted 64-bit guests on 32-bit hosts.  This would have been horribly slow, as I previously mentioned.
> i.e., even slower than the "check every instruction and then emulate
> those with issues" as every 64-bit instruction would have to have been
> emulated not just checked.
>
>>> Full emulation of a baroque architecture like the x86 would have been
>>> incredibly slow.  This web page
>> Look at Connectix VirtualPC (prior to Microsoft's acquisition).
> Oh, I know it can be done.  I still have a license somewhere for a
> product that did the reverse (Executor by ARDI).  It emulated a 68K as
> well as enough of MacOS 6/7 to run a few Mac apps on PCs of that era.
> I think I ran a student lab full of PowerPC Macs which had Connectix
> on them as well.   This was back when the PowerPC was just coming out
> and Apple had their own emulator/translator for 68K instructions so
> you could run 68K apps on the new PowerPC Macs.   I seem to recall
> that one of the classes that taught assembly language used a Windows
> based 68K emulator on those machines.  It almost made my head explode
> to think about the various levels of emulation/dynamic
> translation/etc. going on there.  The rumors I heard at the time were
> that parts of MacOS on the PowerPC were still written in 68K code and
> were being run via the emulator.  I never knew if that was true or
> not.
I read this paper. Years ago I ran IBM's VM370 with OS/VS1 as the guest
OS. While I don't want to get into specifics, on some machines IBM had
microcode assists for VM370. In some cases the guest OS performed better
under VM370 than native. At a SHARE meeting, AMOCO talked about this:
they ran DOS (mainframe DOS) and split it into 2 guests, (1)
interactive using IBM's CICS and (2) batch. They reported much better
throughput on both. I personally was able to get better performance. We
ran OS/VS1 native testing payroll. The next night we reran payroll under
VM370 and got better throughput. Under VM370 you had several memory
options. You could let the guest OS page, or you could give the guest a
full container and let VM370 page. VM370 had a 4K page where OS/VS1 had
a 2K page. Another thing was spooling. We found that allowing VM370 to
own the printers improved performance even though you were double
spooling. Unpatched, though, you had less control, so it was hard to
line up check numbers for payroll, so we patched VM370 for this
purpose. We had a few more tricks, but the bottom line is we generally
had as good throughput under VM370 as native, and we were able to
support online users (programmers).
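
To put rough numbers on the 4K vs. 2K page difference, here is a small
sketch of the arithmetic. The 1 MiB region size and the pages_needed
helper are made up for illustration; this only counts pages and says
nothing about how either system actually performed.

```python
# Count how many pages cover a memory region at a given page size.
# The region size below is a hypothetical example, not a figure from
# any real VM370 or OS/VS1 configuration.

KIB = 1024


def pages_needed(region_bytes: int, page_bytes: int) -> int:
    """Return the number of pages needed to cover region_bytes (rounded up)."""
    return -(-region_bytes // page_bytes)  # ceiling division


region = 1024 * KIB                           # hypothetical 1 MiB guest region
pages_4k = pages_needed(region, 4 * KIB)      # 4K pages, as in VM370
pages_2k = pages_needed(region, 2 * KIB)      # 2K pages, as in OS/VS1

print(f"4K pages: {pages_4k}, 2K pages: {pages_2k}")
```

The same region takes half as many 4K pages as 2K pages, so there are
fewer pages to track and move when the larger page size does the paging.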

Running VMware ESX on an HP DL380 G7 gives us a number of
well-performing 64-bit Linux guests.
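
For what it's worth, the line Bill quoted from section 3.2 of the paper
(BT for guest kernel code, direct execution for guest user code) can be
sketched as a toy model. This is not VMware's code; the names GuestMode,
ExecPath, choose_path and run, and the instruction counts, are all
invented for illustration.

```python
# Toy model of switching between binary translation (BT) and direct
# execution as a guest moves between kernel mode and user mode.
# All names and numbers here are hypothetical.
from enum import Enum


class GuestMode(Enum):
    USER = "user"
    KERNEL = "kernel"


class ExecPath(Enum):
    DIRECT = "direct execution"
    BT = "binary translation"


def choose_path(mode: GuestMode) -> ExecPath:
    """Guest kernel code goes through BT; guest user code runs directly."""
    return ExecPath.BT if mode is GuestMode.KERNEL else ExecPath.DIRECT


def run(trace):
    """Total up instruction counts per execution path.

    trace is a list of (GuestMode, instruction_count) pairs.
    """
    totals = {ExecPath.DIRECT: 0, ExecPath.BT: 0}
    for mode, count in trace:
        totals[choose_path(mode)] += count
    return totals


# Hypothetical workload: mostly user code with one short kernel burst.
trace = [(GuestMode.USER, 900), (GuestMode.KERNEL, 100), (GuestMode.USER, 500)]
totals = run(trace)
for path, count in totals.items():
    print(f"{path.value}: {count} instructions")
```

With a workload like this, only the kernel burst pays the BT overhead,
which is the point of the quoted sentence.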

-- 
Jerry Feldman <gaf at blu.org>
Boston Linux and Unix
PGP key id:3BC1EB90 
PGP Key fingerprint: 49E2 C52A FC5A A31F 8D66  C0AF 7CEA 30FC 3BC1 EB90




BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.