[nflug] AMD64 Debian 'Etch' Stability

Ken Smith kensmith at cse.Buffalo.EDU
Tue Jun 10 21:50:16 EDT 2008


On Tue, 2008-06-10 at 19:31 -0400, Cyber Source wrote:
> Not to steal this thread (although this one could go all over) but this 
> talk is making me wonder how swap plays a role in all this. It's
> generally been a rule of thumb to double the size of the ram when 
> setting up a swap partition. And I've read all over the theories that a 
> swap partition larger than 2GB is just a waste. So, I've been making my 
> swap partitions double the ram or a max of 2GB, whatever comes first. I 
> have noticed that on some systems with the same OS and relatively same 
> processes running, that some seem to eat up all the ram and start 
> dipping into swap (or nearly) and some that don't use near as much 
> system ram. That's always been a bit of a mystery to me. So, how does 
> one deal with swap in systems with such huge amounts of ram?

Hee hee hee hee...  You just asked about the mechanics of what students
in Operating Systems courses fear the most.  Two words:  Virtual Memory.

[ Readers' Digest Version ]

You must be a young sys-admin; when I learned it the "rule of thumb" was
to size swap space at three times the amount of physical memory.  :-)
But that rule of thumb is only for when you don't *really* have a good
idea of what the machine will be used for, or if it's so general-purpose
a machine that what it's doing varies widely through time.  The bottom
line is the system needs to have enough "memory storage space" (where
"memory storage space" is the sum of your physical memory and swap
space) to accommodate the full address space of every process that runs
on it.  So if you've got a machine with 4GB of memory and you're a
fairly typical user on a fairly typical workstation type machine you are
probably running processes that consume on the order of 2GB total.  So
you would actually be perfectly fine with no swap space at all.  On the
other hand, if you're setting up a large compute server that you expect
scientists to be running very large matrix simulations on, where each
single process is on the order of 2GB and you want six or seven of these
scientists to be able to run simulations at the same time, you're gonna
need a lot more swap space.

So, the answer to "how much swap space do I need" has always been, and
continues to be, "Depends on what you're going to be doing with the
machine."

If you really don't know, and you need a general rule of thumb, the one
you're using isn't horrible; it would only bite you in the butt if
someone came along and started running lots of really huge processes on
your machine.  These days it's not totally unreasonable to try to
provide enough physical memory (2GB, 4GB) that the machine never
actually needs to use its swap space for an average person using an
average workstation.  It used to be the case that trying to do that was
totally unreasonable (memory cost way too much), hence the previous rule
of thumb.

This raises the question "Why stick more than 2GB of memory in a
machine, won't it just be wasted?"  What you mention above - some of
your systems consistently seeming to use nearly all of their physical
memory while never quite dipping into swap - actually factors into that
question.  The answer is "Well, if you've got the money it won't hurt
to stick more than 2GB of memory in the machine.  The OS will find a
use for it, and you may notice the machine run a little bit faster as a
result."

[ If hardcore techie stuff gives you a headache stop reading now. ]

It used to be the case that the non-kernel memory (that memory on the
machine not being "sat on" by the kernel) was strictly for user-level
processes' address space (their executable code, data storage, stack
space).  The kernel had inside itself a buffer of memory that was
generally known as "the buffer cache", and that's where file I/O was
buffered to help reduce the number of physical disk reads and writes.  But
virtually all systems have now shifted over to a "unified memory/buffer
cache" system where instead of using a dedicated chunk of memory inside
the kernel the file I/O buffers are now part of the overall non-system
memory, in some sense "competing with" the user-level processes' address
space.  The reason I say "in some sense" is because a portion of a
process's address space needed to come from the files (the machine code)
to begin with.  This setup lets the OS programmers play all sorts of
games that do have visible effects on what you see for memory usage.
For example, suppose the kernel keeps track of which portions of a
process's address space came from a file on disk.  When the kernel
needs to page one of those portions out (it's running low on physical
memory, and it noticed this piece of the process hasn't been used
lately, so it's hoping it's not really needed any more) it doesn't
actually need to write that page of memory to the swap space - instead
it just throws the piece away (meaning the kernel notes that this
portion of that process is no longer in physical memory, so if the
process touches that chunk of memory again it needs to be paged back
in), and if you do wind up needing it again you just read it back in
from the original file instead of expecting it to be in the swap space.

That also explains, to some degree, the behavior you observed.  If you
watch memory use in something like 'top' it will seem like you're using
up nearly all your physical memory if you just look at the "Free"
statistic.  But a large chunk of that will be stuff read in off disk
(buffered I/O) that's "just sitting there" for no other reason than
that we *might* try to read the same stuff from the same file, in which
case we won't need to do the physical I/O to the drive again - it's
just sitting there in memory.  If you were to start up a bunch of
processes, "accommodating process address space" trumps "keeping
buffered file I/O around for no particular reason", so the kernel will
hand some of that buffered file I/O space over to your processes (and
resort to doing physical disk I/O if it turns out later on we actually
do read the same stuff from the same file).
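
If you want to see that in numbers rather than eyeballing 'top', here's
a quick Python sketch (again mine, and only a rough cut - it assumes
the standard Linux /proc/meminfo fields MemFree, Buffers and Cached,
and lumps all of "Cached" in as reclaimable, which isn't strictly true):

#!/usr/bin/env python
# Rough breakdown of why "free" looks tiny: most of the "used" memory
# is really the kernel's file cache, which it will happily give back
# to processes that need it.  /proc/meminfo values are in kB.

fields = {}
with open("/proc/meminfo") as f:
    for line in f:
        key, value = line.split(":", 1)
        fields[key.strip()] = int(value.strip().split()[0])  # kB

free = fields["MemFree"]
cache = fields["Buffers"] + fields["Cached"]

print("Truly idle memory        : %6d MB" % (free // 1024))
print("File cache (reclaimable) : %6d MB" % (cache // 1024))
print("Roughly available        : %6d MB" % ((free + cache) // 1024))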

If your version of "top" has statistics for "Active" and "Inactive"
that's useful to watch.  "Active" is physical memory currently in use by
all the running processes (note that is NOT the sum total of all the
address space of all the processes - large portions of the processes
may not be in physical memory).  "Inactive" is physical memory where
the kernel still remembers what had been in it, but whatever it had
been used for is no longer actively being used that way.  Picture this
scenario.
You start up Firefox, run it for a while, and then exit.  While Firefox
had been running the system needed to read in stuff from the Firefox
executable file - some of it was the machine code and some of it was
"initialized data".  Now lets say you start up Firefox again.  If the
kernel remebered that a certain portion of the physical memory contained
the machine code it had read in from the Firefox executable file during
the previous running of Firefox it can save itself from needing to do
the disk I/O again by simply re-using that physical memory.  Note
however it can't do that with the physical memory that had been used
for the initialized data, because the previous running of Firefox may
have changed (and probably did change) the values of the data while it
was running (not to mention that handing it to the new instance would
be an "information leak" :-).  And,
taking it one step further, note that "self-modifying code" has been a
big no-no for quite a while.  So the executable machine code portion of
Firefox can actually be shared by all instances of Firefox running on a
machine.  So, in the above example, even if we hadn't exited the first
instance of Firefox before starting the second one, the second instance
could still wind up starting faster, because the kernel can set its
address space up to use the same physical memory as the first instance
for the *machine code* (again, NOT the data portion).  Some of
you *may* have tried updating an executable file on a Unix system and
gotten an error message saying "text file busy".  This is the cause of
that (executable machine code is generally known as the "text portion"
of the process's address space).
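
If you've never bumped into "text file busy" and want to see it, here's
a tiny Python demonstration (mine; the path is a made-up placeholder -
point it at an executable you built yourself and currently have
running, not at some system binary):

#!/usr/bin/env python
# On Linux, opening a binary for writing while some process is
# executing it fails with ETXTBSY ("text file busy"), because the
# kernel is sharing those text pages among the running instances.

import errno

RUNNING_BINARY = "/home/me/bin/myprog"   # hypothetical path

try:
    with open(RUNNING_BINARY, "r+b"):
        print("Opened for writing - nothing is executing it right now.")
except OSError as e:
    if e.errno == errno.ETXTBSY:
        print("text file busy - those text pages are in use")
    else:
        raise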

Some people can notice this on Unix systems but it's a LOT easier to
notice on a Windows machine or a Mac (both of which do exactly the same
thing).  Boot the machine from scratch.  Log in.  Run Firefox.  Exit
Firefox.  Run Firefox again.  The second time it will start up a LOT(!)
faster.  The above explains why.
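
You can fake the same experiment without launching a browser.  This
Python sketch (mine, and deliberately crude) reads one big file twice
and times each pass; the second pass is normally served out of the page
cache and finishes much faster.  BIG_FILE is a placeholder - point it
at any large file you have lying around, and note the first pass will
only be "cold" if the file isn't already cached:

#!/usr/bin/env python
# Read the same file twice; the second read usually comes from the
# kernel's page cache instead of the disk.

import time

BIG_FILE = "/tmp/bigfile"   # hypothetical path, substitute your own

def read_all(path, bufsize=1024 * 1024):
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(bufsize)
            if not chunk:
                break
            total += len(chunk)
    return total

for label in ("first pass", "second pass"):
    start = time.time()
    nbytes = read_all(BIG_FILE)
    print("%s: %d bytes in %.2f seconds"
          % (label, nbytes, time.time() - start))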

-- 
                                                Ken Smith
- From there to here, from here to      |       kensmith at cse.buffalo.edu
  there, funny things are everywhere.   |
                      - Theodore Geisel |


