https://gitlab.com/nomadic-labs/mi-cho-coq
Revision 8f57f12fb062010e7f5bfd582958efe81ca62a17 authored by Arvid Jakobsson on 10 May 2021, 07:35:53 UTC, committed by Arvid Jakobsson on 10 May 2021, 07:35:53 UTC
Draft: [ci] add cache opam directory

See merge request nomadic-labs/mi-cho-coq!119
2 parent s 6ba8c47 + 9749d15
Raw File
Tip revision: 8f57f12fb062010e7f5bfd582958efe81ca62a17 authored by Arvid Jakobsson on 10 May 2021, 07:35:53 UTC
Merge branch 'arvid@ci-build-refactor-cache' into 'dev'
Tip revision: 8f57f12
README.org
#+Title: Mi-Cho-Coq

This repository is a formalisation of the [[https://www.michelson-lang.com/][Michelson]] language using the
[[https://coq.inria.fr/][Coq]] interactive theorem prover.

* Installation instructions

The simplest way to install the project is to use the OPAM package manager:

#+BEGIN_SRC bash

# Add the coq-released opam repository because it contains the coq-ott dependency
opam repo add coq-released https://coq.inria.fr/opam/released

# Install mi-cho-coq and its dependencies (coq, ott, and coq-ott)
opam pin add -k path coq-mi-cho-coq ./

#+END_SRC

* Introduction

The following parts of the language
are formalized:

- the type system
- the syntax
- the semantics

This formalisation is entirely based on [[https://tezos.gitlab.io/betanet/whitedoc/michelson.html][the specification of
Michelson]]; more precisely, it is currently based on the following
version of this specification:
[[https://gitlab.com/tezos/tezos/blob/16bd2b036439d76a3e89dff8fdfbd36cf1d131fe/docs/whitedoc/michelson.rst]]

* Syntax

** Type system

In Michelson, an important distinction is made between *comparable*
types and non-comparable ones. Comparable types can be used to index
maps and only sets of elements in comparable types are allowed.

Here is the list of comparable types defined in [[./src/michocoq/syntax.v]]:

#+BEGIN_SRC coq
Inductive comparable_type : Set :=
| string
| nat
| int
| bytes
| bool
| mutez
| key_hash
| timestamp.
#+END_SRC

Each comparable type is trivially a Michelson type, here is the full
list of Michelson types as defined in [[./src/michocoq/syntax.v]]:

#+BEGIN_SRC coq
Inductive type : Set :=
| Comparable_type : comparable_type -> type
| key : type
| unit : type
| signature : type
| option_ : type -> type
| list_ : type -> type
| set_ : comparable_type -> type
| contract : type -> type
| address : type
| operation : type
| pair : type -> type -> type
| or : type -> type -> type
| lambda : type -> type -> type
| map : comparable_type -> type -> type
| big_map : comparable_type -> type -> type.
#+END_SRC

Coq let us see the type =comparable_type= as a subtype of the type
=type= by declaring the =Comparable_type= injection as an implicit
coercion.

#+BEGIN_SRC coq
Coercion Comparable_type : comparable_type >-> type.
#+END_SRC

** Instructions

The syntax and typing of Michelson instructions is defined in file
[[./src/michocoq/syntax.v]]. We use a dependent inductive type to rule out
ill-typed instructions.

#+BEGIN_SRC coq
Inductive instruction : list type -> list type -> Set :=
| NOOP {A} : instruction A A
| FAILWITH {A B a} : instruction (a ::: A) B
| SEQ {A B C} : instruction A B -> instruction B C -> instruction A C
| IF_ {A B} : instruction A B -> instruction A B -> instruction (bool ::: A) B
| LOOP {A} : instruction A (bool ::: A) -> instruction (bool ::: A) A
...
#+END_SRC

* Semantics

The semantics of types is defined by interpreting them
by predefined Coq types using the following table (function =data= in file [[./src/michocoq/semantics.v]]:

  | Michelson     | Coq                                   |
  |---------------+---------------------------------------|
  | =int=         | =Z=                                   |
  | =nat=         | =N=                                   |
  | =string=      | =string=                              |
  | =bytes=       | =string=                              |
  | =timestamp=   | =Z=                                   |
  | =mutez=       | =int63=                               |
  | =bool=        | =bool=                                |
  | =unit=        | =unit=                                |
  | =pair a b=    | =a * b=                               |
  | =option a=    | =option a=                            |
  | =or a b=      | =sum a b=                             |
  | =list a=      | =list a=                              |
  | =set a=       | =set a (lt a)=                        |
  | =map a b=     | =map a b (lt a)=                      |
  | =bigmap a b=  | idem                                  |
  | =lambda a b=  | =instruction (a ::: nil) (b ::: nil)= |
  | anything else | axiomatized                           |

with
#+BEGIN_SRC coq
 Definition int63 :=
   {t : int64.int64 | int64.sign t = false}
 Definition set a lt :=
   {l : list A | Sorted.StronglySorted lt l}
 Definition map a b lt :=
   set (a * b) (fun x y => lt (fst x) (fst y))
#+END_SRC



We formalize the semantics of Michelson by defining an evaluator in
file [[./src/michocoq/semantics.v]]. Since evaluating a Michelson instruction might
fail (which Coq functions cannot), we wrap the return type of this
evaluator in the exception monad defined in file
[[./src/michocoq/error.v]]. Moreover, Coq forbids non-terminating function so we
use a common Coq trick to define the evalator on diverging
instructions such as =LOOP=: we make the evaluator structurally
recursive on an extra argument of type =Datatypes.nat= called the
=fuel= of the evaluator.

#+BEGIN_SRC coq
  Fixpoint eval {A : stack_type} {B : stack_type}
           (i : instruction A B) (fuel : Datatypes.nat) {struct fuel} :
    stack A -> M (stack B) :=
    match fuel with
    | O => fun SA => Failed _ Out_of_fuel
    | S n =>
      match i in instruction A B return stack A -> M (stack B) with
      ...
#+END_SRC

* Overloading

A Michelson instruction is called overloaded when it can be assigned
several types. For example, the =NEG= instruction which replaces a
number at the top of the stack by its opposite can have either the
type =int : 'S -> int : 'S= or the type =nat : 'S -> int : 'S=.

To handle this source of ambiguity in our typed AST =instruction=, we
use *canonical structures*, a Coq feature that let the Coq refiner
solve the ambiguity by inferring the missing piece of information from
the context.

Both versions of the =NEG= instruction are defined in [[./src/michocoq/syntax.v]]
as follows:

#+BEGIN_SRC coq
(* NEG takes either a nat or an int as argument *)
Inductive neg_variant : type -> Set :=
| Neg_variant_nat : neg_variant nat
| Neg_variant_int : neg_variant int.
Structure neg_struct (a : type) := Mk_neg { neg_variant_field : neg_variant a }.
Canonical Structure neg_nat : neg_struct nat :=
  {| neg_variant_field := Neg_variant_nat |}.
Canonical Structure neg_int : neg_struct int :=
  {| neg_variant_field := Neg_variant_int |}.
#+END_SRC

The structure =neg_struct= is then used in the =instruction= datatype as follows:

#+BEGIN_SRC coq
Inductive instruction : list type -> list type -> Set :=
...
| NEG {n} {s : neg_struct n} {S} : instruction (n ::: S) (int ::: S)
...
#+END_SRC

so the =NEG= instruction receives a =neg_struct= as an implicit argument
which is going to be provided by the canonical structure mechanism.

Finally, the evaluator in file [[./src/michocoq/semantics.v]] uses this implicit
argument to call the correct function depending on the chosen type for
the =NEG= instruction:

#+BEGIN_SRC coq

  Definition neg a (v : neg_variant a) : data a -> data int :=
    match v with
    | Neg_variant_nat => fun x => (- Z.of_N x)%Z
    | Neg_variant_int => fun x => (- x)%Z
    end.

  ...

  Fixpoint eval {A : stack_type} {B : stack_type}
           (i : instruction A B) (fuel : Datatypes.nat) {struct fuel} :
    stack A -> M (stack B) :=
    match fuel with
    | O => fun SA => Failed _ Out_of_fuel
    | S n =>
      match i in instruction A B return stack A -> M (stack B) with
      ...
      | @NEG _ _ _ s =>
        fun '(x, SA) => Return _ (neg _ (neg_variant_field _ s) x, SA)
      ...
#+END_SRC

* Regression tests

Regressions of the type checker are tested using the .tz files under
[[./src/contracts/testsuite][src/contracts/testsuite]]. To run the tests, call =make test=. To
regenerate the regression traces (the .tz.expected files), call =make
RESET_REGRESSION=true test=.

* TODO

- Play with code extraction
- Define an extension of Michelson where loops are annotated with invariants
- check OCaml implem against the documentation
- replace section variables by functors

- Put notations in their own notation scope

- Formalize gas (and remove fuel)

  eval should return (stack SA -> M (stack SB * {remaining_gas | remaining_gas < gas }))

- Handle :-, %-, and @-annotations

* Possible mistakes in Michelson specification

- Full Grammar:
  + =FAILWITH <data>= should be =FAILWITH=
  + the =INT= instruction is undocumented

- NOOP should be documented
- The type of the IF_... instruction is erroneous
- Typo : "these are macros are simply"
- The IF_SOME macro is semantically defined instead of syntactically

- The type address should be comparable
back to top