summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorScott Olsen <scg.olsen@gmail.com>2022-12-28 01:17:48 -0600
committerGitHub <noreply@github.com>2022-12-28 08:17:48 +0100
commit339722325ec607091f6035866ebedea2b69080fe (patch)
treea61b32d266321da43feec818e8d1f88e2caeb107
parent25f50c92a57cc91b6cb4ec48df658439f936b641 (diff)
Document the memory management system further (#1442)
-rw-r--r--docs/Memory.md205
1 files changed, 204 insertions, 1 deletions
diff --git a/docs/Memory.md b/docs/Memory.md
index 8674e3b9..4ff9dca2 100644
--- a/docs/Memory.md
+++ b/docs/Memory.md
@@ -413,7 +413,210 @@ Or `foreach` (works like a foreach loop construct in a lot of other programming
```
-## Under the hood
+## Under the Hood: The Implementation of Carp's Memory Management System
+
+This section explores the implementation of Carp's memory management system in
+greater technical detail. Most users won't need to read this, but if you'd like
+to have a deeper understanding of how the system works, you'll find an
+explanation in this section.
+
+### AST Info, Identifiers, and Deleters
+
+Like other portions the Carp compiler, the memory management system operates on
+the abstract syntax tree (AST) representation of the forms in your program. When
+the compiler compiles your code, it assigns addition *information objects*,
+called `Info`, to each form in your program; these objects are particularly
+important to the memory management system. Among other things, these `Info`
+objects contain unique identifiers for each form in your program. The memory
+management system uses these identifiers to keep track of memory as it moves
+across different parts of your code.
+
+In addition to identifiers, form information objects also contain `Deleters`.
+These are a special data structure used to hold information about the `delete`
+functions needed for each linear value in your program. One of the memory
+management system's main responsibilities is to assign and keep track of these
+deleters for each form in your program that makes use of a linear value.
+
+Essentially, as the memory management system examines your code, if it finds a
+form that uses a linear value that should be deleted at a certain point, it adds
+an appropriate deleter to the info object for the form. If the linear value is
+*moved* to some other part of your code, the memory management system will
+remove the corresponding deleter, which will be added to the form it's moved
+into later.
+
+The key point to understand is that the memory management system primarily
+models the movements of linear values using the presence or absence of these
+deleter objects. When the compiler's code emission component encounters a form,
+if the form has an associated deleter, the emitter will produce a call to the
+deletion routine in the corresponding output C code.
+
+As we'll see in a moment, there are some further complications, but this is the
+basic approach taken by the memory management system.
+
+### Lifetimes
+
+The basic operation of the memory management system entails moving deleters
+across different Info objects for the forms in your program. As the system
+performs this task, it also has to account for the way *references* are used
+throughout your code, and how they relate to linear values. In order to track
+this, the memory management system uses *lifetimes* which determine whether or
+not a reference is valid in a given form.
+
+The following function provides an example of this reference validity tracking
+in action:
+
+```clojure
+(defn invalid-ref []
+ &[1 2 3])
+```
+
+In the prior example, our `invalid-ref` function returns a reference to the
+literal linear array value `[1 2 3]`. This code is problematic because the
+linear array value will be deleted at the end of the function, so the returned
+reference will point to nothing! The memory management system catches this for
+us and let's us know about the problem.
+
+Contrarily, the following code is perfectly fine:
+
+```clojure
+(def an-array [1 2 3])
+
+(defn valid-ref []
+ &an-array)
+```
+
+The `valid-ref` function also returns a reference, but this reference is valid
+since it points to a linear array value (`an-array`) that won't be deleted (it
+will still be "alive") by the time the function returns the reference.
+
+The system will also catch cases when we attempt to reference a linear value
+that's already been *moved* into a different location/binding:
+
+```clojure
+(defn unowned-ref []
+ (let [a [1 2 3]
+ b a
+ c &a]
+ ()))
+```
+
+In this example, we move the linear array from `a` into `b`, but then try to set
+`c` to a reference to `a`, which, after the move, no longer points to anything.
+
+Internally, the memory management system uses *lifetimes* to model the
+relationships between references and linear values and track the validity of
+reference across your code.
+
+#### Lifetimes in Detail
+
+Carp's lifetimes are made up of two pieces of information. Only references have
+lifetimes, and every reference has *exactly one* lifetime assigned to it:
+
+- A unique type variable that identifies the lifetime.
+- A lifetime mode, that indicates if the linear value tied to the reference has
+ a lexical scope that extends beyond the reference's lexical scope or if it's
+ limited to the reference's lexical scope.
+
+In general, a reference is valid only when the value it points to has either an
+equivalent or greater lexical scope. This property is encoded in its lifetime.
+
+Let's look at some examples to help illustrate this:
+
+```clojure
+(def an-array [1 2 3])
+
+(defn valid-ref []
+ (let [array-ref &an-arry]) ())
+```
+
+In this example, the anonymous reference `&an-array` has a unique lifetime that
+*extends beyond the lexical scope* of the reference itself. The lexical scope of
+the reference value `[1 2 3]` is greater than or equal to the lexical scope of
+the reference, which only extends across the let form, so, this reference is
+valid.
+
+Contrarily, the following reference is not valid:
+
+```clojure
+(defn invalid-ref []
+ &[1 2 3])
+```
+
+Here, the reference has a greater lexical scope than the linear value it points
+to. The anonymous linear value `[1 2 3]` will be deleted at the end of the
+function scope, but the reference will be returned from the function, so its
+lifetime is potentially greater than that of the value it points to.
+
+The memory management system performs two key checks around ref usage:
+
+1. Check that a newly created reference doesn't point to a linear value binding
+ that has already transferred away ownership.
+2. Check that a reference is alive at a certain point in the program.
+
+Both of these are implemented as separate checks, but they may be viewed as
+specializations of a general operation that checks if every reference form in
+your program is "alive" at the point of use.
+
+Currently, liveness analysis revolves around checking if the value the reference
+points to belongs to the same lexical scope as the reference, and, if so, that
+the value has a deleter in that scope, which indicates the scope properly owns
+the value. If no such deleter exists, it means the reference outlives the value
+it points to, and is invalid.
+
+### Type Dependencies
+
+The final key piece of information the memory system manages are the *type
+dependencies* of the deletion functions for linear values.
+
+Since Carp supports generic programming and polymorphic functions, it's possible
+that some deleter is needed in a polymorphic context. In particular, generic
+functions that "take ownership" of generic values need to be able to find the
+correct deletion routines for the value. For example, in the generic function:
+
+```clojure
+(sig my-generic-force-delete (Fn [a] Unit))
+
+(defn my-generic-force-delete [a]
+ ())
+```
+
+This `my-generic-force-delete` function takes ownership of whatever argument it
+receives and does nothing. Since it takes ownership, however, the value passed
+to `a`, if it's linear, needs to be deleted at the end of the function scope.
+
+Since the function is generic, the memory management system can't know for
+certain what value is being passed. In some cases it might be a linear value, in
+some cases it might not be. Sometimes it might be a `String`, sometimes an
+`Int`, or sometimes an `Array`. Each of these types has a different `delete`
+implementation.
+
+Rather than having the memory management system figure out what function to use,
+the system instead just keeps track of the types of all the values for the forms
+it analyzes. Later, the component already dedicated to resolving generic
+functions handles finding the right deletion routine for the values passed to
+the generic function. In order to accomplish this, it uses the type information
+captured by the memory system as it analyzes each form.
+
+### Memory State
+
+As we've explored, the memory management system needs to keep track of three key
+pieces of information as it analyzes the forms in your program:
+
+1. The deleters assigned to each AST node to track ownership of linear values
+ and delete them at the right time.
+2. The Lifetimes assigned to each reference to check reference validity.
+3. The types of each form it analyzes to resolve generic deletion functions.
+
+Each of these units of information is bundled into a single data structure,
+called the *memory state* or `MemState` of your program.
+
+As the memory management system analyzes each of the AST nodes in your program
+source, it updates the memory state accordingly. Deleters are added and removed
+from the state at different points as ownership transfers of linear values
+occur. When the system finishes analyzing a node, it update's the node's `Info`
+object, attaching the deleters associated with the current memory state. At any
+point, if the memory management system encounters a problem with the way memory
+is being transferred across your program's AST nodes, it reports an error.
A simple piece of code: