How to start reading protocol Alpha
===================================

Protocol Alpha is the name of the economic protocol under development in the ``master`` branch. Alpha is a placeholder
name and is not related to the old Alphanet :ref:`test network <test-networks>`.

Before reading this document, you may want to:

-  read the whitepaper,
-  read about :doc:`the protocol environment <../shell/protocol_environment>`.

Like all protocols, Alpha is made of a series of OCaml interface and
implementation files, accompanied by a ``TEZOS_PROTOCOL`` file.

The ``TEZOS_PROTOCOL`` structure
--------------------------------

If you look at the ``TEZOS_PROTOCOL`` file in the repository, you will see
that it contains the hash of the protocol sources and the list of its
modules, in linking order.

Protocol Alpha is structured as a tower of abstraction layers, a coding
discipline that we designed to have OCaml check as many invariants as
possible at typing time. You will also see empty lines in
``TEZOS_PROTOCOL`` that denote these layers of abstraction.

These layers follow the linking order: the first modules are the tower’s
foundation, which talks to the raw key-value store, and moving forward in
the module list means climbing up the abstraction tower.

The big abstraction barrier: ``Alpha_context``
----------------------------------------------

The proof-of-stake algorithm, as described in the white paper, relies on
an abstract state of the ledger, which is read and transformed during the
validation of a block.

Due to the polymorphic nature of Tezos, the ledger’s state (which we call
the **context** in the code) cannot be specific to protocol Alpha’s needs.
The proof-of-stake is thus implemented over a generic key-value store,
whose binary keys and associated data are used to implement the abstract
structure.

The ``Alpha_context`` module enforces the separation of concerns
between, on one hand, mapping the abstract state of the ledger to the
concrete structure of the key-value store, and, on the other hand,
implementing the proof-of-stake algorithm over this state.

In more practical terms, ``Alpha_context`` defines a type ``t`` that
represents a state of the ledger. This state is an abstracted-out view of
the key-value store that can only be manipulated through the few selected
operations reexported by ``Alpha_context``, which always preserve the
typing and internal consistency invariants of the state.
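
To make the idea concrete, here is a minimal, self-contained sketch of
that pattern, with made-up names and much less machinery than the real
``Alpha_context``: the state is abstract, and the only way to change it is
through a few exported functions that preserve its invariants.

.. code-block:: ocaml

   (* A deliberately simplified sketch, not the actual Alpha_context
      interface: the state is abstract, and it can only be changed through
      a few exported functions that preserve its invariants. *)
   module Ledger_state : sig
     type t
     val empty : t
     val balance : t -> account:string -> int64 option
     val credit : t -> account:string -> amount:int64 -> t option
   end = struct
     type t = (string * int64) list
     let empty = []
     let balance state ~account = List.assoc_opt account state
     let credit state ~account ~amount =
       if amount < 0L then None (* reject updates that would break invariants *)
       else
         let previous = Option.value ~default:0L (balance state ~account) in
         Some ((account, Int64.add previous amount)
               :: List.remove_assoc account state)
   end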

When validating a block, the low-level state that results from the
predecessor block is read from disk, then abstracted into an
``Alpha_context.t``, which is only updated by high-level operations that
preserve consistency; finally, the low-level state is extracted to be
committed to disk.
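
The shape of that lifecycle can be sketched as an OCaml signature; the
names below are hypothetical and only illustrate how the low-level state
flows through the abstraction barrier.

.. code-block:: ocaml

   (* Hypothetical names; only the shape of the lifecycle matters:
      raw state in, abstract state manipulated, raw state out. *)
   module type VALIDATION_LIFECYCLE = sig
     type raw_context       (* the on-disk key-value store *)
     type abstract_context  (* plays the role of Alpha_context.t *)

     (* read and abstract the state left by the predecessor block *)
     val prepare : raw_context -> abstract_context

     (* update it only through consistency-preserving operations *)
     val apply_block : abstract_context -> block:bytes -> abstract_context

     (* extract the low-level state so it can be committed to disk *)
     val finalize : abstract_context -> raw_context
   end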

This way, we have two well-separated parts in the code. The code below
``Alpha_context`` implements the ledger’s state storage, while the code
on top of it is the proof-of-stake algorithm. Thanks to this barrier,
the latter can remain nice, readable OCaml that only manipulates plain
OCaml values.

Below the ``Alpha_context``
---------------------------

For this part, on a first discovery of the source code, you can rely
mostly on this coarse-grained description, with a little bit of
cherry-picking when you’re curious about how a specific invariant is
enforced.

Hashes and storage description layer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This layer is implemented by the module ``Storage_description`` (in
``storage_description.ml``) and by modules named ``*_hash``.
It mainly contains Blake2b hash implementations, specialized
for the various basic types of the protocol.
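
As a hedged, toy illustration of the pattern (not one of the actual
``*_hash`` modules), each kind of object gets its own dedicated hash type,
so that, say, an operation hash can never be mistaken for a block hash:

.. code-block:: ocaml

   (* Toy illustration: the real modules derive this from a Blake2b functor
      provided by the protocol environment. *)
   module Operation_hash_example : sig
     type t
     val of_bytes : bytes -> t
     val equal : t -> t -> bool
   end = struct
     type t = bytes
     let of_bytes b = b (* the real code computes a Blake2b digest here *)
     let equal = Bytes.equal
   end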

The representation layer
~~~~~~~~~~~~~~~~~~~~~~~~

This layer is implemented by modules named ``*_repr``.
These modules abstract the values of the raw key-value context by using
:doc:`Data_encoding <data_encoding>`.

These modules define the data types used by the protocol that need to be
serialized (amounts, contract handles, script expressions, etc.). For
each type, they also define its serialization format using
:doc:`Data_encoding <data_encoding>`.
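
Here is a hedged sketch of what such a ``*_repr`` module looks like; the
module and field names are made up, and the exact ``Data_encoding``
combinator signatures may differ slightly between versions:

.. code-block:: ocaml

   (* Hypothetical representation module: an OCaml type plus the description
      of its serialization format. *)
   module Example_balance_repr = struct
     type t = { account : string; amount : int64 }

     let encoding =
       let open Data_encoding in
       conv
         (fun { account; amount } -> (account, amount))
         (fun (account, amount) -> { account; amount })
         (obj2 (req "account" string) (req "amount" int64))
   end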

Above this layer, the code should never see the byte sequences stored in
the database, those of transmitted blocks and operations, or the raw JSON
of data transmitted via RPCs. It only manipulates OCaml values.

The ``Storage`` module and storage functors
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Even with the concrete formats of values in the context abstracted out,
type (or consistency) errors can still occur if the code accesses a
value with a wrong key, or a key bound to another value. The next
abstraction barrier is a remedy to that.

The ``Storage`` module is the single place in the protocol where key
literals are defined. Hence, it is the only module that needs to be
audited to ensure that the keys are not colliding.

It also abstracts the keys, so that each kind of key gets its own
accessors. For instance, the module ``Storage.Contract.Balance`` contains
accessors specific to contracts’ balances.

Moreover, the keys bear the type of the values they point to. For
instance, only values of type ``Tez_repr.t`` can be stored under key
``Storage.Contract.Balance``. And when a key is not a global key but a
parametric one, it is parameterized by an OCaml value, not by the raw
key fragment.

So, in the end, the only way to access a contract balance is
``Storage.Contract.Balance.get``, which takes a ``Contract_repr.t``
and returns a ``Tez_repr.t``.

All these well-typed operations are generated by a set of functors that
come just before ``Storage`` in ``TEZOS_PROTOCOL``.
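
The following self-contained sketch mimics what these functors provide; it
is not the actual interface of the storage functors, which operate on the
protocol context and return Lwt promises and error monads, but it shows
how a key prefix plus a value type yields typed accessors:

.. code-block:: ocaml

   (* Simplified illustration of the pattern, using a hashtable instead of
      the protocol's raw context. *)
   module type VALUE = sig
     type t
     val of_bytes : bytes -> t option
     val to_bytes : t -> bytes
   end

   module Make_indexed_store (V : VALUE) (P : sig val prefix : string end) =
   struct
     let table : (string, bytes) Hashtbl.t = Hashtbl.create 16
     let key index = P.prefix ^ "/" ^ index
     let get index = Option.bind (Hashtbl.find_opt table (key index)) V.of_bytes
     let set index value = Hashtbl.replace table (key index) (V.to_bytes value)
   end

   (* e.g., a balance table whose values can only be read and written as
      amounts:
      module Balance =
        Make_indexed_store (Tez_value)
          (struct let prefix = "contracts/balance" end) *)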

The ``*_storage`` modules
~~~~~~~~~~~~~~~~~~~~~~~~~

The two previous steps ensure that the ledger’s state is always accessed
and updated in a well-typed way.

However, they do not enforce that, for instance, when a contract is
deleted, all of the keys that store its state in the context are indeed
deleted.

This last series of modules, named ``*_storage``, is there to enforce
just that kind of invariant: ensuring the internal consistency of the
context structure.

These checks do not go as far as verifying that, for instance, when the
destination of a transaction is credited, the source is also debited,
since in some cases this might legitimately not happen.
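
Here is a deliberately simplified sketch of the kind of invariant such a
module enforces, with hypothetical names and a toy stand-in for the
context: a single entry point removes every key that belongs to a
contract, so the rest of the protocol cannot forget one of them.

.. code-block:: ocaml

   (* Toy stand-in for the context: a list of (key path, value) bindings. *)
   type context = (string list * bytes) list

   let remove_key key ctxt = List.filter (fun (k, _) -> k <> key) ctxt

   (* Deleting a contract removes all of its keys, keeping the context
      internally consistent. *)
   let delete_contract (contract : string) (ctxt : context) : context =
     ctxt
     |> remove_key ["contracts"; contract; "balance"]
     |> remove_key ["contracts"; contract; "counter"]
     |> remove_key ["contracts"; contract; "code"]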

Above the ``Alpha_context``
---------------------------

The next three sections describe the main entrypoints to the protocol:
validation of blocks by the shell (which we often also call application),
smart contracts, and RPC services.

The ``Main`` module is the entrypoint used by the shell. It implements
the module type that all protocols must follow; as a result, its code is
mostly plumbing.
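
As a rough idea of what the shell expects, here is a much-reduced,
hypothetical sketch of the shape of that module type; the real one is
defined by the protocol environment and is considerably richer.

.. code-block:: ocaml

   (* Illustrative only: names and types do not match the actual signature. *)
   module type PROTOCOL_SKETCH = sig
     type validation_state

     (* start validating a block on top of the predecessor's context *)
     val begin_application : predecessor_context:bytes -> validation_state

     (* apply each operation of the block in turn *)
     val apply_operation :
       validation_state -> operation:bytes -> validation_state

     (* produce the resulting context to be committed by the shell *)
     val finalize_block : validation_state -> bytes
   end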

Starting from ``Apply``
~~~~~~~~~~~~~~~~~~~~~~~

This is where you want to start on your first read. Even if some plumbing
code is woven in, such as error case declarations and registrations, most
of the proof-of-stake code has been written in a verbose style, to be
understood with minimal OCaml knowledge.

You want to start from the shell entry points (validation of the block
header, validation of an operation, finalization of a block validation),
and follow the control flow until you hit the ``Alpha_context``
abstraction barrier. This will lead you to reading modules ``Baking``
and ``Amendment``.

Smart contracts
~~~~~~~~~~~~~~~

From ``Apply``, you will also end up in the modules ``Script_ir_translator``
and ``Script_interpreter``. The former is the typechecker of Michelson,
which is called when creating a new smart contract, and the latter is the
interpreter, which is called when transferring tokens to an existing smart
contract.

Protocol RPC API
~~~~~~~~~~~~~~~~

Finally, the RPCs specific to Alpha are also defined above the
``Alpha_context`` barrier.

Services are defined in a few modules, divided by theme. Each module
defines the RPC API: URL schemes with the types of parameters, and
input and output JSON schemas. This interface serves three
purposes. As it is thoroughly typed, it makes sure that the handlers
(that are registered in the same file) have the right input and output
types. It is also used by the client to perform RPC calls, to make
sure that the URL schemes and JSON formats are consistent between the
two parties. These two features are extremely useful when refactoring,
as the OCaml typechecker will help us track the effects of an RPC API
change on the whole codebase. The third purpose is of course, to make
automatic documentation generation possible (as in ``octez-client rpc
list/format``). Each service is also accompanied by a caller function,
that can be used from the client to perform the calls, and by the
tests to simulate calls in a fake in-memory context.
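
To see why a single typed description can serve all these purposes, here
is a self-contained sketch of the pattern; it is not the actual RPC
machinery of the protocol environment, but it shows how one value, shared
by the node-side handler and the client-side caller, keeps both in sync.

.. code-block:: ocaml

   (* Simplified pattern: in the real code, encodings play the role of the
      conversion functions, and paths are typed rather than plain strings. *)
   type ('input, 'output) service = {
     path : string;
     of_input : string -> 'input;   (* stands in for the input encoding *)
     to_output : 'output -> string; (* stands in for the output encoding *)
   }

   (* Hypothetical service: both the handler registration and the client
      call are built from this single description, so they cannot drift
      apart. *)
   let balance_service : (string, int64) service =
     { path = "/context/contracts/<contract_id>/balance";
       of_input = (fun s -> s);
       to_output = Int64.to_string }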

Reading the OCaml definition of the service hierarchy directly, instead
of the automatically generated JSON hierarchy, can be useful if you are a
third-party developer.