Content - 0488fe0ef3cf1884d629a3f0bb354e55ec680fd0 - c10e10c/README.org

visit type:
Tip revision: f6a9666a47c742ef88c41ba6a6b7d25239970f11 authored by Arvid Jakobsson on 19 January 2021, 14:54:50 UTC
talk_18_01_2021_cpp_lightning_talk
Tip revision: f6a9666
README.org
#+Title: Mi-Cho-Coq

This repository is a formalisation of the [[https://www.michelson-lang.com/][Michelson]] language using the
[[https://coq.inria.fr/][Coq]] interactive theorem prover.

* Installation instructions

The simplest way to install the project is to use the OPAM package manager:

#+BEGIN_SRC bash

# Add the required opam repositories for Mi-Cho-Coq and its dependencies
opam repo add coq-released https://coq.inria.fr/opam/released
opam repo add coq-extra-dev https://coq.inria.fr/opam/extra-dev

# Install mi-cho-coq and its dependencies
opam install coq-mi-cho-coq
#+END_SRC

* Introduction

The following parts of the language
are formalized:

- the type system
- the syntax
- the semantics
- the lexing
- the parsing

This formalisation is entirely based on [[https://tezos.gitlab.io/betanet/whitedoc/michelson.html][the specification of
Michelson]].

* Syntax

** Type system

In Michelson, an important distinction is made between *comparable*
types and non-comparable ones. Comparable types can be used to index
maps and only sets of elements in comparable types are allowed.

Here is the list of comparable types defined in [[./src/michocoq/syntax_type.v]]:

#+BEGIN_SRC coq
Inductive simple_comparable_type : Set :=
| string
| nat
| int
| bytes
| bool
| mutez
| address
| key_hash
| timestamp.

Inductive comparable_type : Set :=
| Comparable_type_simple : simple_comparable_type -> comparable_type
| Cpair : simple_comparable_type -> comparable_type -> comparable_type.
#+END_SRC

Each comparable type is trivially a Michelson type, here is the full
list of Michelson types as defined in [[./src/michocoq/syntax_type.v]]:

#+BEGIN_SRC coq
Inductive type : Set :=
| Comparable_type (_ : simple_comparable_type)
| key
| unit
| signature
| option (a : type)
| list (a : type)
| set (a : comparable_type)
| contract (a : type)
| operation
| pair (a : type) (b : type)
| or (a : type) (_ : annot_o) (b : type) (_ : annot_o)
| lambda (a b : type)
| map (k : comparable_type) (v : type)
| big_map (k : comparable_type) (v : type)
| chain_id.
#+END_SRC

Coq lets us see the type =simple_comparable_type= as a subtype of
=comparable_type= and =comparable_type= as a subtype of the type
=type= using implicit coercions.

** Instructions

The syntax and typing of Michelson instructions is defined in file
[[./src/michocoq/syntax.v]]. We use a dependent inductive type to rule out
ill-typed instructions.

#+BEGIN_SRC coq
Inductive instruction :
  forall (self_type : Datatypes.option type) (tail_fail_flag : Datatypes.bool) (A B : Datatypes.list type), Set :=
| NOOP {A} : instruction A A
| FAILWITH {A B a} : instruction (a ::: A) B
| SEQ {A B C} : instruction A B -> instruction B C -> instruction A C
| IF_ {A B} : instruction A B -> instruction A B -> instruction (bool ::: A) B
| LOOP {A} : instruction A (bool ::: A) -> instruction (bool ::: A) A
...
#+END_SRC

The =self_type= index represents the type of the parameter of the
whole contract, this index is needed for typing =SELF=,
=CREATE_CONTRACT= and =LAMBDA= correctly.

The =tail_fail_flag= index is a flag indicating whether or not the
instruction output type can be inferred from its input type. Most
instructions have this flag set to =false= meaning that the type of
their input determines the type of their output. =FAILWITH=, sequences
ending in =FAILWITH=, and branching instructions (such as =IF=) with
all branches ending in =FAILWITH= have their =tail_fail_flag= set to
=true=. Instruction sequencing ={ A ; B }= (also written =SEQ A B=)
requires that =A= has its =tail_fail_flag= set to =false=. This
restriction is also present in the reference Michelson typechecker; it
permits efficient typechecking in a single pass through the contract
code.

Finally the =A= and =B= indexes are the types of respectively the
input and output stacks of the instruction.

* Semantics

The semantics of types is defined by interpreting them
by predefined Coq types using the following table (functions =comparable_data= in [[./src/michocoq/comparable.v]]  =data= in file [[./src/michocoq/semantics.v]]:

  | Michelson     | Coq                                    |
  |---------------+----------------------------------------|
  | =int=         | =Z=                                    |
  | =nat=         | =N=                                    |
  | =string=      | =string=                               |
  | =bytes=       | =string=                               |
  | =timestamp=   | =Z=                                    |
  | =mutez=       | =int63=                                |
  | =bool=        | =bool=                                 |
  | =unit=        | =unit=                                 |
  | =pair a b=    | =a * b=                                |
  | =option a=    | =option a=                             |
  | =or a b=      | =sum a b=                              |
  | =list a=      | =list a=                               |
  | =set a=       | =set a (lt a)=                         |
  | =map a b=     | =map a b (lt a)=                       |
  | =bigmap a b=  | idem                                   |
  | =lambda a b=  | ={tff & instruction None tff [a] [b]}= |
  | anything else | axiomatized                            |

with
#+BEGIN_SRC coq
 Definition int63 :=
   {t : int64.int64 | int64.sign t = false}
 Definition set a lt :=
   {l : list A | Sorted.StronglySorted lt l}
 Definition map a b lt :=
   set (a * b) (fun x y => lt (fst x) (fst y))
#+END_SRC



We formalize the semantics of Michelson by defining an evaluator in
file [[./src/michocoq/semantics.v]]. Since evaluating a Michelson instruction might
fail (which Coq functions cannot), we wrap the return type of this
evaluator in the exception monad defined in file
[[./src/michocoq/error.v]]. Moreover, Coq forbids non-terminating function so we
use a common Coq trick to define the evalator on diverging
instructions such as =LOOP=: we make the evaluator structurally
recursive on an extra argument of type =Datatypes.nat= called the
=fuel= of the evaluator.

#+BEGIN_SRC coq
  Fixpoint eval {self_type} {tff} {env} {A : stack_type} {B : stack_type}
           (i : instruction self_type tff A B) (fuel : Datatypes.nat) {struct fuel} :
    stack A -> M (stack B) :=
    match fuel with
    | O => fun SA => Failed _ Out_of_fuel
    | S n =>
      match i, SA, env with
      ...
#+END_SRC

The =env= parameter to =eval= is a record providing the semantics of
the following instructions:
- forging operations: =CREATE_CONTRACT=, =SET_DELEGATE=, and
  =TRANSFER_TOKENS=: these cannot be defined directly because the
  =operation= type is axiomatized.
- requesting information about the context: =AMOUNT=, =BALANCE=,
  =CHAIN_ID=, =NOW=, =SELF=, =SENDER=, and =SOURCE=
- cryptography: =HASH_KEY=, =BLAKE2B=, =SHA256=, =SHA512=, and
  =CHECK_SIGNATURE= (not yet formalized)
- serialization: =PACK= and =UNPACK= (not yet formalized)
- manipulation of addresses: =ADDRESS=, =CONTRACT=, and
  =IMPLICIT_ACCOUNT= (not yet formalized)

* Overloading

A Michelson instruction is called overloaded when it can be assigned
several types. For example, the =NEG= instruction which replaces a
number at the top of the stack by its opposite can have either the
type =int : 'S -> int : 'S= or the type =nat : 'S -> int : 'S=.

To handle this source of ambiguity in our typed AST =instruction=, we
use *canonical structures*, a Coq feature that let the Coq refiner
solve the ambiguity by inferring the missing piece of information from
the context.

Both versions of the =NEG= instruction are defined in [[./src/michocoq/syntax.v]]
as follows:

#+BEGIN_SRC coq
(* NEG takes either a nat or an int as argument *)
Inductive neg_variant : type -> Set :=
| Neg_variant_nat : neg_variant nat
| Neg_variant_int : neg_variant int.
Structure neg_struct (a : type) := Mk_neg { neg_variant_field : neg_variant a }.
Canonical Structure neg_nat : neg_struct nat :=
  {| neg_variant_field := Neg_variant_nat |}.
Canonical Structure neg_int : neg_struct int :=
  {| neg_variant_field := Neg_variant_int |}.
#+END_SRC

The structure =neg_struct= is then used in the =instruction= datatype as follows:

#+BEGIN_SRC coq
Inductive instruction : ... :=
...
| NEG {n} {s : neg_struct n} {S} : instruction (n ::: S) (int ::: S)
...
#+END_SRC

so the =NEG= instruction receives a =neg_struct= as an implicit argument
which is going to be provided by the canonical structure mechanism.

Finally, the evaluator in file [[./src/michocoq/semantics.v]] uses this implicit
argument to call the correct function depending on the chosen type for
the =NEG= instruction:

#+BEGIN_SRC coq

  Definition neg a (v : neg_variant a) : data a -> data int :=
    match v with
    | Neg_variant_nat => fun x => (- Z.of_N x)%Z
    | Neg_variant_int => fun x => (- x)%Z
    end.

  ...

  Fixpoint eval ... :=
    match fuel with
    | O => fun SA => Failed _ Out_of_fuel
    | S n =>
      match i, SA, env with
      ...
      | @NEG _ _ s, (x, SA), _ => Return (neg _ (neg_variant_field _ s) x, SA)
      ...
#+END_SRC

* Typechecking

An untyped version of the =instruction= inductive type is provided in
file [[./src/michocoq/untyped_syntax.v]]. Type checking and type inference
functions are defined in file [[./src/michocoq/typer.v]]. They are proved
correct in file [[./src/michocoq/untyper.v]] by proving they are inverses
of the trivial type-erasure function called =untype_instruction=.

* Micheline lexing, parsing, and pretty-printing

A Micheline parser is defined using Menhir in file
[[./src/michocoq/micheline_parser.vy]]; its input is streams of Micheline
tokens that can be produced from a mere string using the lexer in
[[./src/michocoq/micheline_lexer.v]]. Micheline annotations are currently
not supported.

To go in the other direction, pretty-printing functions are defined in
file [[./src/michocoq/micheline_pp.v]].

Finally, the bridge between parsed Micheline nodes and untyped
instructions is defined in files [[./src/michocoq/micheline2michelson.v]]
(that also unfolds macros) and [[./src/michocoq/michelson2micheline.v]].

* Functional verification of smart contracts

The [[./src/contracts_coq/]] directory contains some examples of verified
Michelson smart contracts.

- [[./src/contracts_coq/boomerang.v]] is a simple example of a contract
  that sends back whatever amount is receives to the address that
  called it. It can be used for example to test a wallet's ability to
  send funds without taking the risk to lose them.

- [[./src/contracts_coq/deposit.v]] is an example of address-based
  authentication. The contract stores the address of its manager (that
  can either be an implicit account or a smart contract with a
  parameter of type =unit=). This contract has two entrypoints: the
  one on the left does nothing and can be called by anybody, it can be
  used to send tokens to the deposit contract; the one on the right
  can only be called by the manager, it takes an amount in parameter
  and sends to the manager the requested amount.

- [[./src/contracts_coq/manager.v]] is the default smart contract that was
  originated in place of all scriptless KT1 accounts during the
  Babylon migration. Because of this, it is an extremely common smart
  contract on the Tezos blockchain. It is quite similar to the deposit
  contract; it stores the address of its manager (that has to be an
  implicit account) and has two entrypoints, the default entrypoint
  does nothing and can be used to send tokens to the contract from
  anywhere; the other entrypoint, called =%do= can only be called by
  the manager. There are two differences with the deposit contract in
  the behaviour of the =%do= entrypoint: first, it checks that no
  tokens was sent (the default entrypoint should be used for that),
  second its parameter is not simply an amount to send to the manager
  but a piece of code (of type =lambda unit (list operation)=) to
  execute and emit as a result. This provides more genericity as it
  permits for example to change the delegate of the contract, to call
  other smart contract expecting a parameter, and to perform several
  transfers atomically.

- [[./src/contracts_coq/multisig.v]] is the multisig contract supported by
  =tezos-client=. A multisig contract is a smart contract that can be
  used to share ownership over a bag of tokens. It is an example of
  signature-based authentication. The contract stores a list of public
  keys that represent the owners of the contract, a number called the
  threshold and an anti-replay counter. It features three possible
  actions:
  + spending the contract tokens (by sending them to a =contract unit=)
  + updating the delegate (including stopping delegation)
  + updating both the threshold and the keys of the owners

  To run any of these actions, the following data has to be sent to
  the contract:
  + the current value of the anti-replay counter
  + the description of the action
  + a list of optional signatures

  The signatures must be given in the same order as the stored keys,
  all provided signature must be valid and the number of provided
  signatures must be greater or equal to the threshold. The signed
  data contains the =chain_id= of the chain running the contract, the
  address of the contract, the anti-replay counter, and the
  description of the action.

- [[./src/contracts_coq/generic_multisig.v]] is to the multisig contract
  what the manager contract is to the deposit contract: it features a
  universally-callable default entrypoint that does nothing (to
  receive tokens), it checks that no token is sent on the main
  entrypoint, and it generalizes the spending and delegation actions
  using a piece of code of type =lambda unit (list operation)=.

- [[./src/contracts_coq/vote.v]] is an example of a contract manipulating
  a map and implementing a paywall. It is a vote contract that stores
  a map from the available vote options (represented as strings) to
  their tallies. To vote for one of the available options, a transfer
  of 5tz (or more) to the contract is needed.

* Ott semantics

See [[./src/michocott/README.txt]].
Browse the archive

https://gitlab.com/nomadic-labs/mi-cho-coq