#### CS162 Operating Systems and Systems Programming Lecture 15

# Memory 2: Paging (Con't), Caching and TLBs

March 7<sup>th</sup>, 2024 Prof. John Kubiatowicz http://cs162.eecs.Berkeley.edu

### **Recall: General Address translation**



- Consequently, two views of memory:
  - View from the CPU (what program sees, virtual memory)
  - View from memory (physical memory)
  - Translation box (Memory Management Unit or MMU) converts between the two views

#### • Translation ⇒ much easier to implement protection!

- If task A cannot even gain access to task B's data, no way for A to adversely affect B
- Extra benefit: every program can be linked/loaded into same region of user address space

```
3/7/2024
```

```
Kubiatowicz CS162 © UCB Spring 2024
```

```
Lec 15.2
```

## Recall: Multi-Segment Model



Recall: Problems with Segmentation



- Must fit variable-sized chunks into physical memory – Complicated allocation algorithms in kernel
- May move processes multiple times to fit everything

   Lots of wasted time copying
  - Lois of wasted time copying
- Limited options for swapping to disk
  - All or nothing: Can't have part of a segment
- Fragmentation: wasted space
  - External: free gaps between allocated chunks
  - Internal: don't need all memory within allocated chunks









### Where is page sharing used?

- The "kernel region" of every process has the same page table entries
  - The process cannot access it at user level
  - But on U->K switch, kernel code can access it AS WELL AS the region for THIS user
    - » What does the kernel need to do to access other user processes?
- Different processes running same binary!
  - Execute-only, but do not need to duplicate code segments
- User-level system libraries (execute only)
- Shared-memory segments between different processes
  - Can actually share objects directly between processes » Must map page into same place in address space!
  - This is a limited form of the sharing that threads have within a single process



### Some simple security measures

- Address Space Randomization
  - Position-Independent Code  $\Rightarrow$  can place user code anywhere in address space
    - » Random start address makes much harder for attacker to cause jump to code that it seeks to take over
  - Stack & Heap can start anywhere, so randomize placement
- Kernel address space isolation
  - Don't map whole kernel space into each process, switch to kernel page table
  - Meltdown⇒map none of kernel into user mode!





Lec 15.11

3/7/2024





#### Summary: Paging

3/7/2024

Lec 15.12

# Recall: Memory Layout for Linux 32-bit (Pre-Meltdown patch!)

Random stack offset

Random mmap offset

program break

Random brk offset

art bri

nd data

start data

RLIMIT\_STACK (e.g., 8MB)

Kernel space Jser code CANNOT read from nor write to these addresses doing so results in a Segmentation Fault

Stack (grows down

Heap

BSS segment Uninitialized static variables, filled with zeros.

Example: static char \*u Data segment Static variables initialized by the programmer. xample: static char \*gonzo = "God's own prototype";

mple: /lib/libc.

Text segment (ELF) end\_code

Memory Mapping Segment File mappings (including dynamic libraries) and

nappings. Exa

1GB

3GE





3/7/2024

#### Fix for sparse address space: The two-level page table



Example: x86 classic 32-bit address translation





3/7/2024

#### Sharing with multilevel page tables



3/7/2024

### Summary: Two-Level Paging





Summary: Two-Level Paging

#### Multi-level Translation: Segments + Pages

- Lowest level page table  $\Rightarrow$  memory still allocated with bitmap
- Could have any number of levels. Example (top segment):



## **Multi-level Translation Analysis**

- Only need to allocate as many page table entries as we need for

» In other wards, sparse address spaces are easy

» Share at segment or page level (need additional reference counting)

– One pointer per page (typically 4K – 16K pages today)

- Page tables need to be contiguous
  - » However, the 10b-10b-12b configuration keeps tables to exactly one
- Two (or more, if >2 levels) lookups per reference





## Larger page sizes supported as well



- · Larger page sizes (2MB, 1GB) make sense since memory is now cheap
  - Great for kernel, large libraries, etc
  - Use limited primarily by internal fragmentation...

Kubiatowicz CS162 © UCB Spring 2024

Lec 15.35

3/7/2024

## IA64: 64bit addresses: Six-level page table?!?



## Alternative: Inverted Page Table



- The MMU must translate virtual address to physical address on:
  - Every instruction fetch
  - Every load
  - Every store
- · What does the MMU need to do to translate an address?
  - 1-level Page Table
    - » Read PTE from memory, check valid, merge address
    - » Set "accessed" bit in PTE, Set "dirty bit" on write
  - 2-level Page Table
    - » Read and check first level
    - » Read, check, and update PTE
  - N-level Page Table ...

#### MMU does page table Tree Traversal to translate each address

3/7/2024

Kubiatowicz CS162 © UCB Spring 2024

Lec 15.39



Address Translation Comparison

- The processor requests READ Virtual-Address to memory system - Through the MMU to the cache (to the memory)
- Some time later, the memory system responds with the data stored at the physical address (resulting from virtual → physical) translation
- Fast on a cache hit, slow on a miss So what is the MMU doing?
- On every reference (I-fetch, Load, Store) read (multiple levels of) page table entries to get physical frame or FAULT
  - Through the caches to the memory
  - Then read/write the physical location





- -Or: perhaps I/O if page table partially on disk!
- Even worse: What if we are using caching to make memory access faster than DRAM access?
- Solution? Cache translations!

3/7/2024



3/7/2024



- Temporal Locality (Locality in Time): - Keep recently accessed data items closer to processor
- Spatial Locality (Locality in Space):
  - Move contiguous blocks to the upper levels



### **Recall: Memory Hierarchy**

- Caching: Take advantage of the principle of locality to:
  - Present the illusion of having as much memory as in the cheapest technology
  - Provide average speed similar to that offered by the fastest technology



# How do we make Address Translation Fast?

- Cache results of recent translations !
  - Different from a traditional cache
  - Cache Page Table Entries using Virtual Page # as the key



## **Translation Look-Aside Buffer**

- Record recent Virtual Page # to Physical Frame # translation
- If present, have the physical address without reading any of the page tables !!!
  - Even if the translation involved multiple levels
  - Caches the end-to-end result
- Was invented by Sir Maurice Wilkes prior to caches
  - When you come up with a new concept, you get to name it!
  - People realized "if it's good for page tables, why not the rest of the data in memory?"
- On a *TLB miss*, the page tables may be cached, so only go to memory when both miss

# Caching Applied to Address Translation



Kubiatowicz CS162 © UCB Spring 2024

3/7/2024

Lec 15.47

#### Conclusion

- Page Tables
  - Memory divided into fixed-sized chunks of memory
  - Virtual page number from virtual address mapped through page table to physical page number
  - Offset of virtual address same as physical address
  - Large page tables can be placed into virtual memory
- Multi-Level Tables
  - Virtual address mapped to series of tables
  - Permit sparse population of address space
- Inverted Page Table
  - Use of hash-table to hold translation entries
  - Size of page table ~ size of physical memory rather than size of virtual memory
- The Principle of Locality:
  - Program likely to access a relatively small portion of the address space at any instant of time.
    - » Temporal Locality: Locality in Time
    - » Spatial Locality: Locality in Space

```
3/7/2024
```

Kubiatowicz CS162 © UCB Spring 2024

Lec 15.49