
Understanding Rust

Just what Scheme was to previous LISP dialects

The x86 CPU and Memory models

Modern CPUs are hardware implementations of abstract machines, defined by standard formal specifications.

It is crucial to realize that modern hardware merely creates an illusion that conforms to a simplified standard specification of a CPU: its particular instruction set, its set of registers, its stack, RAM, interrupts and I/O ports.

Everything is a well-defined abstraction nowadays, which is good and the right thing to do.

The ISO standard for the C++ language is also defined as a model. It tries to define the semantics of the language on top of hardware abstractions. This is also the right thing to do.

To understand both Rust and C/C++, we have to understand the most basic “hardware abstractions”.

Instruction Set

A CPU is just a hardware implementation of an Interpreter (of its instruction set). Yes, an Interpreter, which is a particular kind of Abstract Machine, is one of the most fundamental abstractions in all of computing, if not the most fundamental.
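
To make the "CPU as an Interpreter" idea concrete, here is a minimal sketch of a fetch-decode-execute loop in Rust. The instruction set (`Load`, `Add`, `Mul`, `Halt`) and the `run` function are invented for illustration and do not correspond to any real CPU.

```rust
/// A hypothetical four-instruction machine (names invented for this sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Instr {
    /// Put an immediate value into the accumulator.
    Load(i64),
    /// Add an immediate value to the accumulator.
    Add(i64),
    /// Multiply the accumulator by an immediate value.
    Mul(i64),
    /// Stop and yield the accumulator.
    Halt,
}

/// Interpret `program`. Returns `None` if execution runs off the end
/// of the program without reaching `Halt`.
pub fn run(program: &[Instr]) -> Option<i64> {
    let mut acc: i64 = 0; // the only data register
    let mut pc: usize = 0; // the program counter
    loop {
        // Fetch the next instruction and advance the program counter.
        let instr = *program.get(pc)?;
        pc += 1;
        // Decode and execute it.
        match instr {
            Instr::Load(n) => acc = n,
            Instr::Add(n) => acc = acc.wrapping_add(n),
            Instr::Mul(n) => acc = acc.wrapping_mul(n),
            Instr::Halt => return Some(acc),
        }
    }
}
```

Calling `run` on the program `Load(2), Add(3), Mul(4), Halt` computes (2 + 3) * 4. A real CPU does the same kind of thing in silicon, with a far larger instruction set and register file.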

Stack

This is a “hardware Abstract Data Type”, defined within a CPU. Seriously. There is a particular set of instructions to access the Stack (an ADT) and a standardized set of “rules” (calling conventions) for how the Stack is used when a procedure is called or an interrupt occurs.

However, the Stack is just superimposed on some region of memory (RAM). A CPU has special registers to support it (as an abstraction), which maintain a Stack Segment for a program.

Heap

Data Segment

A process (resulting from running an executable) traditionally has

  • Code Segment
  • Stack Segment
  • Data Segment

which are abstract regions of memory each process has.

These segments (memory regions) are completely separated from those of any other process and have a particular addressing scheme within each process.

In fact, the CPU, the rest of the hardware and the OS together maintain the illusion of an isolated process and its memory layout, as if each process had a “full computer” in its possession.
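
The regions described above can be observed from inside a running process. The following is a rough sketch, not a portable guarantee about layout: the `region_addresses` function and the `COUNTER` static are hypothetical names, and the values it returns are virtual addresses that differ between runs and platforms.

```rust
use std::sync::atomic::AtomicU32;

/// A mutable static: it lives in the executable's data for the whole run.
static COUNTER: AtomicU32 = AtomicU32::new(7);

/// Returns the addresses of one object from each region, in this order:
/// code, static data, stack, heap.
pub fn region_addresses() -> [usize; 4] {
    let local: u32 = 1; // a local variable lives in the current stack frame
    let boxed: Box<u32> = Box::new(2); // `Box` allocates on the heap
    [
        region_addresses as *const () as usize, // machine code of this function
        &COUNTER as *const AtomicU32 as usize,
        &local as *const u32 as usize,
        &*boxed as *const u32 as usize,
    ]
}
```

Printing the four values with `{:#x}` usually shows them falling into clearly different address ranges. Each process sees its own such layout, which is the illusion of a “full computer” that the OS and the hardware maintain.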

Read-only “section”

Each executable has to be fully (or partially) loaded into the computer’s memory in order to be executed (interpreted) by a CPU. A binary image has its own read-only data section(s), which contain “static (immutable) data” of particular types – notably, numbers, structs and strings.

The data can be accessed by the code as if it were just ordinary data in memory, except that it is in fact read-only. Any attempt to write to an address in a read-only section will cause a hardware fault, which is caught by the OS and reported as a segmentation fault.
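
In Rust, such data shows up as values with the `'static` lifetime. The sketch below uses invented names (`greeting`, `PRIMES`, `sum_primes`). Where exactly the compiler and linker place the bytes is an implementation detail, though string literals and immutable statics typically end up in a read-only section.

```rust
/// A string literal is part of the binary image, so a reference to it
/// stays valid for the whole program: that is the `'static` lifetime.
pub fn greeting() -> &'static str {
    "hello from the binary image"
}

/// An immutable `static` also has one fixed address for the whole run.
pub static PRIMES: [u32; 4] = [2, 3, 5, 7];

/// Reading static data looks exactly like reading any other data.
pub fn sum_primes() -> u32 {
    PRIMES.iter().sum()
}

// Safe Rust offers no way to write to `PRIMES` or to the literal.
// Forcing a write through a raw pointer in `unsafe` code would be
// undefined behavior and, on typical systems, a segmentation fault.
```

The type system mirrors the hardware protection: the compiler rejects writes to this data, and the memory protection catches whatever gets past the compiler.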

High-Level Functional languages

High-level functional languages are designed to abstract away everything hardware-related.

Languages like Miranda or Haskell can be operationally defined in their entirety as evaluation (reduction) of pure expressions by an interpretation process, formalized as Graph Reduction (performed by a G-Machine).

In fact, this is operationally the same as evaluation using pen and paper (and a person’s memory and mind). This is absolutely remarkable and has lots of unique consequences.
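
The pen-and-paper view can be sketched as step-by-step rewriting of an expression tree. This is not a G-Machine and does no graph sharing or lazy evaluation. It is only a toy reducer over arithmetic expressions, and the names `Expr`, `step` and `normalize` are invented for this sketch.

```rust
/// A pure arithmetic expression.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Num(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// Perform exactly one rewrite, or return `None` when the expression
/// is already a number (a normal form).
pub fn step(e: &Expr) -> Option<Expr> {
    use Expr::*;
    match e {
        Num(_) => None,
        Add(l, r) => match (l.as_ref(), r.as_ref()) {
            // Both operands are numbers: rewrite to their sum.
            (Num(a), Num(b)) => Some(Num(a + b)),
            // Left is done, so reduce inside the right operand.
            (Num(_), _) => Some(Add(l.clone(), Box::new(step(r)?))),
            // Otherwise reduce inside the left operand first.
            _ => Some(Add(Box::new(step(l)?), r.clone())),
        },
        Mul(l, r) => match (l.as_ref(), r.as_ref()) {
            (Num(a), Num(b)) => Some(Num(a * b)),
            (Num(_), _) => Some(Mul(l.clone(), Box::new(step(r)?))),
            _ => Some(Mul(Box::new(step(l)?), r.clone())),
        },
    }
}

/// Keep rewriting until no rule applies.
pub fn normalize(mut e: Expr) -> Expr {
    while let Some(next) = step(&e) {
        e = next;
    }
    e
}
```

Normalizing (1 + 2) * (3 + 4) goes through 3 * (3 + 4) and 3 * 7 before reaching 21, exactly as one would write it out by hand. No registers, stack or addresses appear anywhere in the definition.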

Low-level Imperative languages

Imperative low-level languages such as C, C++, or Rust rely on standardized “hardware abstractions”, such as Registers, Stack and Memory Layout (visible from within a process).

This is the fundamental difference. Low-level languages are tied to the underlying hardware abstractions, while pure-functional languages are just “mathematical” and “logical” expressions on “paper”.

Rust

Rust superimposes its own set of abstractions, rules and conventions upon standardized “hardware abstractions”. This set is supposed to be “just right” (well-defined, enforced by the type-checker and the lifetime-checker) and “good-enough” for systems programming (unlike C++, which is a kitchen sink).

This is the main innovation behind Rust - the formalization of the usage of references (a restricted variant of pointers) together with an explicit lifetime-checker for references (what they call the borrow-checker), which guarantees type-safety and partial memory-safety (and thus soundness) of the code at compile time.
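
A small sketch of what the lifetime-checker enforces. The functions `longest` and `demo` are illustrative names. The annotation `'a` says that the returned reference may not outlive either argument.

```rust
/// Returns whichever string slice is longer. The result borrows from
/// the inputs, so it is only valid while both of them are alive.
pub fn longest<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() >= b.len() { a } else { b }
}

pub fn demo() -> usize {
    let outer = String::from("outer string");
    let result;
    {
        let inner = String::from("in");
        // Fine: the borrowed result is consumed (via `len`) while
        // both `outer` and `inner` are still alive.
        result = longest(&outer, &inner).len();

        // Rejected at compile time if written instead:
        //     escaped = longest(&outer, &inner);
        // followed by a use of `escaped` after this block, because
        // `inner` is dropped at the closing brace.
    }
    result
}
```

The check costs nothing at run time: a would-be dangling reference is a compile error rather than a crash.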

Yes, it restricts the imperative language (its semantics) to make it more sound, and imposes a strict “discipline” on references (by making them behave as they do in SML), similar to a pure-functional subset of Scheme or Scala. The analogy with Scheme is the principled way to arrive at the right understanding of Rust.

Instead of “arbitrary” memory access using “raw” pointers (as in C or C++), Rust enforces a strict “discipline”, guaranteed by the compiler (its type-checker and lifetime-checker), similar to the refs (which are just an ADT) of a functional language.
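
The contrast can be sketched side by side. Rust still has raw pointers, but dereferencing one is only allowed inside `unsafe`, where the programmer takes over the obligations the compiler otherwise checks. The function names below are invented for illustration.

```rust
/// Disciplined access: the compiler guarantees that `x` is valid and
/// that nothing else can touch the value during this call.
pub fn bump_ref(x: &mut i32) {
    *x += 1;
}

/// C-style access: nothing is checked, so the function itself must be
/// marked `unsafe` and the caller carries the burden of proof.
pub unsafe fn bump_raw(p: *mut i32) {
    // SAFETY: the caller promises that `p` is non-null, aligned,
    // points to a live `i32`, and is not aliased during this call.
    unsafe { *p += 1 };
}

pub fn demo_pointers() -> i32 {
    let mut value = 40;
    bump_ref(&mut value); // checked by the compiler

    let p: *mut i32 = &mut value; // creating a raw pointer is safe
    // SAFETY: `p` was just derived from a live, exclusive borrow.
    unsafe { bump_raw(p) }; // dereferencing it is not

    value
}
```

Both calls do the same thing to memory. The difference is in who guarantees that the access is valid: the compiler or the programmer.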

Functional languages (or their pure subsets) do not have any lifetime issues by virtue of having immutable bindings (and data) and the resulting referential transparency property. Everything “lives forever”.

In Rust, refs are a fundamental part of the language and are deeply integrated into it. This strict ref-discipline, together with the “static and strong” type-discipline, is what makes Rust special (and constitutes its major innovation).

It also uses the “zero cost abstraction” meme, introduced by C++, claiming that there is zero runtime overhead in using its data-types. This is a long story.
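
As a sketch of what the claim refers to, here is the same computation written twice, once with explicit indexing and once with iterator adaptors. The function names are invented. The example only shows that the two forms are equivalent in meaning. Whether they compile to equally fast code depends on the optimizer and is not measured here.

```rust
/// Hand-written loop with explicit indexing.
pub fn sum_of_even_squares_loop(xs: &[u64]) -> u64 {
    let mut total = 0;
    for i in 0..xs.len() {
        if xs[i] % 2 == 0 {
            total += xs[i] * xs[i];
        }
    }
    total
}

/// The same computation as a chain of iterator adaptors. The closures
/// and adaptor structs are ordinary values that the optimizer is
/// expected to inline away.
pub fn sum_of_even_squares_iter(xs: &[u64]) -> u64 {
    xs.iter()
        .filter(|&&x| x % 2 == 0)
        .map(|&x| x * x)
        .sum()
}
```

The “zero cost” claim is that the second version should not be slower than the first in an optimized build.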

Author: <schiptsov@gmail.com>

Email: lngnmn2@yahoo.com

Created: 2023-08-08 Tue 18:38
