|
return to content page/back to computer science
COSC243 - lecture10
Stallings "Computer Organisation and Architecture" Ch4.3 pg117-136
Fast but expensive temporary memory.
back to top
- Well, it needs to be small enough not to affect the overall cost of a system too much
(generally aim to be about the same price as the main memory)...
- ...but large enough so that access times are close to that of the cache alone (perhaps aim for 110%).
- Optimum size is VERY dependant on workload - effective size is 1K-512K (depending on use of system)
back to top
Close to the CPU - between the main memory and the CPU. The CPU will first retrieve
data from the cache.. If it is not in the cache, the data is first loaded from the main memory
into the cache, and then taken by the CPU from the cache. The chance of the CPU finding a
particular piece of data in the cache is called it's "hit rate".

Blocks of main memory map into lines of cache memory.
Each block/line consists of a number of words/cells - usually quite small
(optimally between 2 and 8). At any one time the cache will contain a subset of the main memory.

back to top
- Direct Mapping
Each memory block in the main memory can only be sent
to one line. As there are alot more blocks of main memory than lines of cache,
many blocks of main memory will map to a single line of cache. One of the disadvantages
of this type of mapping is that it can lead to "thrashing": replacing a block of memory
that is again immediately needed.
The main memory address is split into a tag, line number, and word number.

Let us take an example. Say we have 64bytes of main memory, and each word/cell is only one byte.
Then we need lg64=6 bits to address each cell. Now let us say that each block is composed of only
four cells (thus we have a total of 64cells/(4cells/block)=16blocks). Now let us pretend
that our Cache memory is only 16bytes. That gives us 16bytes/(4cells/line)=4lines (the number of
cells per line will always be the same as the number of lines). We can now think of
our cache memory as a grid of line numbers and cell numbers.

- The number of bits required to cover all the cell numbers is lg(blocksize)=lg4=2
- The number of bits required to cover all the line numbers is lg(num.lines)=lg4=2
- And the tag is just what's left (number of bits in main memory address less the bits above eg. 6-2-2=2)- it distinquishes between all the possible main
memory blocks that could be in that particular cache line, and has to be checked
before calculations.
back to top

- Associative Mapping
Unlike direct mapping, any block from main memory can appear on any cache line.
Thus we don't bother indexing the line (just the cell with the least significant bits from the
main memory address). The remainder of the address is the tag, and to find out which
line of cache we are dealing with, the tag on each line has to be compared to that part of the
address we want untill we reach the correct line. This is done simultaneously (else would take
quite awhile as you could imagine) - but the circuitry required is complex and expensive.


back to top
- Set Associative Mapping
The lines of cache are grouped into indexed sets. Each set is only 2 or 4 lines. In this way
we have direct mapping to the set, but associative mapping to the line in the set. It is a good
comprimise between the two.


back to top
Of course, this is not an issue for Direct Mapping as there is only one possible line in the
cache where any particular block of memory can go. The following mostly applies to Associative,
but also Set Associative (although the choice here will only be one of 2 or 4 lines):
- Least Recently Used (LRU)
Not good if executing a loop that is slightly larger than the cache. Works easily for two-way
set associative - use an extra bit (per line) that is set if most recently used, so replace the line with this bit set to 0.
- First In First Out (FIFO)
Easy to Implement, but again not good if program loop slightly larger than cache
- Least Frequently Used (LFU)
The problem of calculating it.
- Random
Actually quite effective - don't spend time deciding which one to throw out. Works
quite well in a mix/variety of programmes. Burroughs (now Unisys) came up with the idea.
back to top
This is easy if we are replacing a line that hasn't been modified - but this isn't always easy
to determine, especially as many I/O devices may have access to the cache...and several processors may
have their own cache. So, we need to maintain the validity between main memory and all caches.
- Write Through
All write operations to cache are also written to main memory...
...but then all other processor caches must monitor traffic to maintain consistancy....
this generates substrantial memory traffic and may cause a bottleneck.
- Write Back
Updates are only made to cache. Each line of cache has an associated ubdate ("dirty") bit so
that when wanting to overwrite a line, the former one in written to main memory only if this
bit is set.
...this means that parts of the main memory may no longer be valid if their value
is changed in cache, so all modules must first access the cache first to see if their
memory cell is there first...this again leads to complex circuitry and possible bottlenecks.
back to top
- Bus Watching With Write Through.
Each processor has a bus master that watches when other processors write through to the
main memory - if detected for one of the addresses it has in it's own cache, renders it's own
cache data invalid
- Hardware Transparency.
Additional hardware updates all caches when a write through occurs.
- Non-cachable Memory.
Any memory which more than one processor is trying to access (identified with
chip-select logic or high address bits) is deemed noncachable and must be accessed
directly from the main memory
back to top
- Higher logic densities enable on-chip caches - very fast and no bus use...
- ...but still need an ordinary cache to reduce accesses to DRAM and ROM through the bus.
- Thus have two levels of cache: L1 on the actual chip, and L2 is our normal one
associated with main memory.
back to top
- Unified Cache
- Data and instructions together.
- Higher hit rate than split. Simpler and cheaper to design and maintain.
- Not good for parallel instruction execution and pre-fetching of predicted instructions.
- Split Cache
- Instruction cache + data cache.
- Efficient pipelining and locality of reference.
back to top
|