On 2000-11-26 at 13:49 -0500, Derek D. Martin wrote:

> Interesting... I wouldn't have guessed this strategy. Which I suppose
> is at least partly why I'm not a kernel hacker! :)

Copy-on-write is somewhat notorious in that it has been patented a number of times by a number of different parties. See, by way of example, IBM's US Patent 4,742,450, which covers a technique that was probably in common use for 15 years or so before the patented method was supposedly invented. Precisely because copy-on-write is such a clever idea, even though it was independently discovered by hundreds if not thousands of competent hardware and software designers at least a decade before anyone first thought to patent it, those who rediscover it tend to be so impressed with their own brilliance that a few dozen of them have trotted right off to their patent lawyers.

> But I was thinking about your comments as pertains to the original
> poster's question about whether or not the OS keeps copies of dead
> applications' pages in memory to speed up future loads...
>
> From a user's perspective, most processes are started by their login
> shell (or a subshell under various circumstances, such as when
> starting X). The shell forks, creates a "copy" of itself which shares
> the same text and data pages, but then execs a different program. At
> that point, I would expect that the data pages of that process become
> independent of the original shell's data pages (except perhaps if it's
> another shell), since the two programs have nothing to do with
> each other. Is that correct?

Parent and child processes have a relationship which exists at a much higher level than memory paging. From the point of view of the application running in userland, its segments reside at some sort of address. On some architectures these are arbitrary conventions, such as the text segment always being numbered 1, the data segment always being numbered 2, the stack segment always being numbered 3, and so on. Generally, these segment numbers correspond to segment registers maintained by the CPU hardware, and a large part of a task's context consists of mapping this set of standardized numbers onto some sort of global list of segments maintained from the perspective of kernel mode.

On an Intel system, for example, the CPU has segment registers which have been named CS, DS, ES, and SS (for "Code Segment," "Data Segment," "Extra Segment," and "Stack Segment," respectively) since the 8086 processor generation. When protected mode was introduced with the 80286, the segment registers took on a slightly different meaning, and the numbers contained in them became "selectors." The 80386 generation added the FS and GS registers, which can be used for optimizations, but otherwise the architecture is much the same. Modifying the contents of a segment register is an operation subject to privilege checks, since a selector is really an index into a "descriptor table" which the operating system maintains in order to map these selectors onto some globally significant view of memory. The segment described by each entry in the descriptor table can be either read-only or read-write for the process. For reasons of efficiency, the hardware itself (on an Intel system) is responsible for doing all of these translations and for checking whether an access is allowed.
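As an aside, the bit layout of a selector is simple enough to poke at from user space: bits 3-15 are the index into the descriptor table, bit 2 says whether that table is the global one (GDT) or a per-process local one (LDT), and bits 0-1 carry the requested privilege level. Here is a minimal sketch, assuming GCC on an x86 Linux box; reading a segment register is not privileged, only loading one is, and on a 64-bit kernel DS will often simply read back as zero because flat addressing no longer uses it:

#include <stdio.h>

int main(void)
{
    unsigned short cs, ds, ss;

    /* Copy the current selector values out of the segment registers. */
    __asm__("mov %%cs, %0" : "=r"(cs));
    __asm__("mov %%ds, %0" : "=r"(ds));
    __asm__("mov %%ss, %0" : "=r"(ss));

    unsigned short sel[3] = { cs, ds, ss };
    const char *name[3]  = { "CS", "DS", "SS" };

    for (int i = 0; i < 3; i++)
        printf("%s = 0x%04x  index=%u  table=%s  RPL=%u\n",
               name[i], sel[i],
               sel[i] >> 3,                   /* bits 3-15: descriptor table index */
               (sel[i] & 4) ? "LDT" : "GDT",  /* bit 2: which table is indexed */
               sel[i] & 3);                   /* bits 0-1: requested privilege level */
    return 0;
}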
In order to do this, the CPU has a set of descriptor table registers which are loaded with the global address of the appropriate descriptor table in memory in order to manage context changes between processes. The end result of this translation into a globally significant address is, however, still only a virtual address. The operating system is free to map this virtual address anywhere in physical memory it can find space, or even to swap the data out to disk and mark the descriptor as "not present." When physical memory is shared by multiple instances of the same program, as a result of a fork or otherwise, the sharing happens at this lower level, so that different descriptor entries (and possibly even different local descriptor tables) are created immediately, even if they simply provide different virtual addresses for the same physical memory. To add some more complexity, small parts of the segment mapped from a descriptor, each known as a "page," may be independently swappable on some architectures (including Intel).

The end result of all of this is that processes running in userland which share the same physical memory do not see the same addresses into it, and they are actually unable to know that they are sharing the same physical memory, because they cannot look into the internal tables of the operating system kernel. It is up to the kernel, as it maintains the various tables for the hardware, to keep track of what is being shared and in what manner, and to handle copy-on-write and any other tasks necessary to preserve the completeness of the illusion.

> So then, does this new process share memory pages with an existing
> copy of the same program? Here, I would think it would share text
> pages, but not data pages, since the state of one is totally different
> from the state of the other... I would intuitively think that it
> wouldn't be worth the effort of mapping the old data pages to the new
> process, since in many cases they will need to be changed immediately.

In fact, you are conflating a lot of very narrow and specific terms. If my data segment consists of a megabyte or so of data, and I change one byte, then I may be able to get away with copying only the page which contains that one byte and updating the page table accordingly. On an Intel CPU (under Linux, anyway), a page is 4 KB, so this can save a great deal of memory. Some optimizing linkers are actually smart enough to group related chunks of data together in such a way as to minimize page faults; we had such a tool for OS/2 which worked in concert with the LX-format executable. I do not know whether anyone has ever developed such an optimizing linker for Linux ELF.

> Really, I think there are 3 cases of the fork scenario you describe:
>
> - process forks a copy of itself
> - process forks a new program, which is not already running
> - process forks a new program, which IS already running
>
> I'd be curious what happens in each case. If you care to comment,
> please do!

You are confusing fork and exec; only a fork really has these kinds of optimizations available. In the first case, where a process actually does fork, both the text and data segments are simply mapped without being copied until a write occurs, and then, depending upon the architecture, either the whole data segment or a single page is copied. In the second case, where the process execs a program that is not already running, new memory is allocated. In the third case, where a program execs (not forks) a program that is already running, the text segment will be shared with the existing instance but the data segment will not.
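To make the fork/exec distinction concrete, here is a minimal sketch, POSIX assumed and error handling abbreviated. The child's write after fork() is satisfied by copying just the one affected page, so the parent's copy is untouched; the exec that follows then throws the child's text and data away entirely and maps in the new program's image:

#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

static char shared_data[4096] = "original";   /* lives in the data segment */

int main(void)
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return 1;
    }

    if (pid == 0) {
        /* Child: this write forces copy-on-write of a single page. */
        strcpy(shared_data, "scribbled on by the child");
        printf("child  sees: %s\n", shared_data);

        /* Replace the process image entirely; nothing below ever runs. */
        execl("/bin/echo", "echo", "child has exec'd /bin/echo", (char *)NULL);
        perror("execl");                      /* reached only if exec fails */
        _exit(127);
    }

    waitpid(pid, NULL, 0);
    /* Parent still sees its own, never-copied page. */
    printf("parent sees: %s\n", shared_data);
    return 0;
}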
> P.S. as partial answer to the original poster's question, I believe
> what Linux does to aid in the start-up of recently terminated programs
> is keep the disk blocks containing the program in buffer cache, so long as
> there is enough free memory to do that.

This is generally true (a quick way to observe the effect is sketched below), although there is also a more sophisticated option. For a library, which might be loaded into its own text and data segments separate from the process which is calling into it, a daemon can be left running which keeps the segments valid and resident, awaiting the startup of a new instance. This is rarely done on Linux, partly because it is stupid, but also because Linux has fairly low startup latency for most reasonable binaries. It is, however, the technique very widely used on Windows, and the main reason things run in the "System Tray" there is to try to gain the upper hand against other programs. Of course, this quickly reaches a point of diminishing returns, where a few dozen "systray applets" installed by various aggressive software vendors are all consuming available memory themselves. Eventually, the only result is that the swap file grows and the system slows to a crawl.

-- Mike
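A rough way to watch the buffer cache doing this is to time the same read twice; while the blocks are still cached, the second pass is close to instantaneous. A small sketch, POSIX assumed (the default file name is only an example, and any large file will do):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <unistd.h>

/* Read the whole file and return the elapsed wall-clock time in seconds. */
static double timed_read(const char *path)
{
    char buf[65536];
    struct timeval start, end;
    int fd = open(path, O_RDONLY);

    if (fd < 0) {
        perror(path);
        exit(1);
    }

    gettimeofday(&start, NULL);
    while (read(fd, buf, sizeof buf) > 0)
        ;                                     /* just pull the blocks in */
    gettimeofday(&end, NULL);
    close(fd);

    return (end.tv_sec - start.tv_sec) + (end.tv_usec - start.tv_usec) / 1e6;
}

int main(int argc, char **argv)
{
    const char *path = (argc > 1) ? argv[1] : "/bin/bash";   /* example only */

    printf("first  read: %.4f s (may have to come from disk)\n", timed_read(path));
    printf("second read: %.4f s (served from the cache)\n", timed_read(path));
    return 0;
}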