Copy on Write
Last semester in my operating systems class, we learned about a cool technique called copy on write (COW). Since then, I've seen the idea come up in a couple other places, and it is also my namesake, so I thought I'd write a blog post about it.
What is COW?
COW is a technique for creating a copy of a piece of data. The naive approach for copying data would be to just create the copy - but we can do better! COW is an optimization based on the observation that as long as the copy remains the same as the original (i.e. no writes have occurred), we do not need to create the copy, but instead can handle all read requests using the original, since they are identical. We only need to create the copy when someone tries to modify it or the original, in order to keep track of both versions. This is what the "on write" part of the term "copy on write" refers to.
As an example, imagine there are two users Alice and Bob. Initially, Alice has a piece of data \(X_A = 1\). Bob decides to create his own copy of the data \(X_B\). As long as Alice doesn't change \(X_A\) and Bob doesn't change \(X_B\), we do not need to allocate memory for \(X_B\). Whenever Bob reads from \(X_B\), we can just provide the value of \(X_A\). However, if Alice decides to write \(X_A = 2\), then at this point we have to allocate \(X_B = 1\). In this simple example, the cost of creating the copy is negligible, but you could imagine that the cost would be substantial if \(X_A\) was a very large data structure, like a file.
COW is a lazy approach because it procrastinates on doing work (i.e. creating the copy) until it absolutely has to. In general, one good thing about procrastinating is that you avoid doing useless work. In the case where the original and copy never diverge during the lifetime of the data, creating the copy naively will be a waste of CPU cycles and memory.
The opposite of lazy is eager. Sometimes, being eager can be better than being lazy, just like real life. For workloads where the copy and original always diverge, the eager approach is better because lazily creating the copy adds latency to the first write request.
However, there are a lot of workloads in the real world for which COW is beneficial. Let's take a look at COW in the wild!
The Fork Syscall
fork() is a system call provided by most operating systems that allows a process to spawn an identical process. For those unfamiliar with operating systems, you can think of a system call as an API that the operating system provides to programs to talk to the hardware and each other, and a process as an instance of a program. Each process has its own private memory address space.
When a process calls
fork(), a child process is spawned with the same initial memory contents as the parent. However, future writes to memory by each process will not be visible to the other.
A naive implementation of
fork() might copy all of the memory contents of the parent into the child's address space. However, this can be very expensive if most of the memory contents are untouched by the parent or child after the fork. Aha! This seems like a ripe opportunity for COW.
Indeed, many operating systems implement
fork() using COW. One clever way to do so is to use page table magic. When the parent calls
fork(), the operating system marks all page table entries (PTEs) of the parent as read-only. It also sets a "COW" bit to indicate that the PTE has been "copied". Finally, it lets the child process use the same page table as the parent! This way, reads performed by the parent and child will access the same physical memory. If the parent or child attempts to write to a location in memory, it will trigger a page fault. The page fault handler in the operating system sees that the "COW" bit is set so it copies the page. Therefore, unmodified pages are never copied, which saves a lot of memory.
This optimization is especially helpful for the common situation where the child process immediately executes another binary program using
exec() after it is spawned. Since
exec() replaces the memory of the calling process with the new one, it would be very wasteful if we had spent all that time copying memory from the parent to the child just to have it immediately thrown away. Luckily, COW makes the
fork()+exec() pattern practically free.
Git is a popular version control system for file directories. It allows one to store snapshots of a file directory at different points in time, and navigate between them. Naively, a snapshot would have to contain a copy of every file at that point in time. However, since it is often the case that only a small fraction of the files change between snapshots, Git employs COW by using pointers to the latest copies of unmodified files instead of creating identical duplicates.
More broadly, many snapshot-taking systems (such as versioned databases) could use COW.
Persistent Data Structures
Persistent Data Structures are a more theoretical application of COW. Their wikipedia page has a great explanation of them so I won't repeat it here, but just want to point out that it is basically the same as the idea of maintaining multiple snapshots in Git. In fact, Git could be viewed as an implementation of a persistent data structure.
It's good to be lazy like COW!