#+Title: Michelson in Coq

This repository is a formalisation of the [[https://www.michelson-lang.com/][Michelson]] language using the
[[https://coq.inria.fr/][Coq]] interactive theorem prover. The following parts of the language
are formalized:

- the type system
- the syntax
- the semantics

This formalisation is entirely based on [[https://tezos.gitlab.io/betanet/whitedoc/michelson.html][the specification of
Michelson]]; more precisely, it is currently based on the following
version of this specification:
[[https://gitlab.com/tezos/tezos/blob/16bd2b036439d76a3e89dff8fdfbd36cf1d131fe/docs/whitedoc/michelson.rst]]

* Syntax

** Type system

In Michelson, an important distinction is made between *comparable*
types and non-comparable ones. Comparable types can be used to index
maps and only sets of elements in comparable types are allowed.

Here is the list of comparable types defined in [[./src/syntax.v]]:

#+BEGIN_SRC coq
Inductive comparable_type : Set :=
| string
| nat
| int
| bytes
| bool
| mutez
| key_hash
| timestamp.
#+END_SRC

Each comparable type is trivially a Michelson type, here is the full
list of Michelson types as defined in [[./src/syntax.v]]:

#+BEGIN_SRC coq
Inductive type : Set :=
| Comparable_type : comparable_type -> type
| key : type
| unit : type
| signature : type
| option_ : type -> type
| list_ : type -> type
| set_ : comparable_type -> type
| contract : type -> type
| address : type
| operation : type
| pair : type -> type -> type
| or : type -> type -> type
| lambda : type -> type -> type
| map : comparable_type -> type -> type
| big_map : comparable_type -> type -> type.
#+END_SRC

Coq let us see the type =comparable_type= as a subtype of the type
=type= by declaring the =Comparable_type= injection as an implicit
coercion.

#+BEGIN_SRC coq
Coercion Comparable_type : comparable_type >-> type.
#+END_SRC

** Instructions

The syntax and typing of Michelson instructions is defined in file
[[./src/syntax.v]]. We use a dependent inductive type to rule out
ill-typed instructions.

#+BEGIN_SRC coq
Inductive instruction : list type -> list type -> Set :=
| NOOP {A} : instruction A A
| FAILWITH {A B a} : instruction (a ::: A) B
| SEQ {A B C} : instruction A B -> instruction B C -> instruction A C
| IF_ {A B} : instruction A B -> instruction A B -> instruction (bool ::: A) B
| LOOP {A} : instruction A (bool ::: A) -> instruction (bool ::: A) A
...
#+END_SRC

* Semantics

The semantics of types is defined by interpreting them
by predefined Coq types using the following table (function =data= in file [[./src/semantics.v]]:

  | Michelson     | Coq                                   |
  |---------------+---------------------------------------|
  | =int=         | =Z=                                   |
  | =nat=         | =N=                                   |
  | =string=      | =string=                              |
  | =bytes=       | =string=                              |
  | =timestamp=   | =Z=                                   |
  | =mutez=       | =int63=                               |
  | =bool=        | =bool=                                |
  | =unit=        | =unit=                                |
  | =pair a b=    | =a * b=                               |
  | =option a=    | =option a=                            |
  | =or a b=      | =sum a b=                             |
  | =list a=      | =list a=                              |
  | =set a=       | =set a (lt a)=                        |
  | =map a b=     | =map a b (lt a)=                      |
  | =bigmap a b=  | idem                                  |
  | =lambda a b=  | =instruction (a ::: nil) (b ::: nil)= |
  | anything else | axiomatized                           |

with
#+BEGIN_SRC coq
 Definition int63 :=
   {t : int64.int64 | int64.sign t = false}
 Definition set a lt :=
   {l : list A | Sorted.StronglySorted lt l}
 Definition map a b lt :=
   set (a * b) (fun x y => lt (fst x) (fst y))
#+END_SRC


We formalize the semantics of Michelson by defining an evaluator in
file [[./src/semantics.v]]. Since evaluating a Michelson instruction might
fail (which Coq functions cannot), we wrap the return type of this
evaluator in the exception monad defined in file
[[./src/error.v]]. Moreover, Coq forbids non-terminating function so we
use a common Coq trick to define the evalator on diverging
instructions such as =LOOP=: we make the evaluator structurally
recursive on an extra argument of type =Datatypes.nat= called the
=fuel= of the evaluator.

#+BEGIN_SRC coq
  Fixpoint eval {A : stack_type} {B : stack_type}
           (i : instruction A B) (fuel : Datatypes.nat) {struct fuel} :
    stack A -> M (stack B) :=
    match fuel with
    | O => fun SA => Failed _ Out_of_fuel
    | S n =>
      match i in instruction A B return stack A -> M (stack B) with
      ...
#+END_SRC

* Overloading

A Michelson instruction is called overloaded when it can be assigned
several types. For example, the =NEG= instruction which replaces a
number at the top of the stack by its opposite can have either the
type =int : 'S -> int : 'S= or the type =nat : 'S -> int : 'S=.

To handle this source of ambiguity in our typed AST =instruction=, we
use *canonical structures*, a Coq feature that let the Coq refiner
solve the ambiguity by inferring the missing piece of information from
the context.

Both versions of the =NEG= instruction are defined in [[./src/syntax.v]]
as follows:

#+BEGIN_SRC coq
(* NEG takes either a nat or an int as argument *)
Inductive neg_variant : type -> Set :=
| Neg_variant_nat : neg_variant nat
| Neg_variant_int : neg_variant int.
Structure neg_struct (a : type) := Mk_neg { neg_variant_field : neg_variant a }.
Canonical Structure neg_nat : neg_struct nat :=
  {| neg_variant_field := Neg_variant_nat |}.
Canonical Structure neg_int : neg_struct int :=
  {| neg_variant_field := Neg_variant_int |}.
#+END_SRC

The structure =neg_struct= is then used in the =instruction= datatype as follows:

#+BEGIN_SRC coq
Inductive instruction : list type -> list type -> Set :=
...
| NEG {n} {s : neg_struct n} {S} : instruction (n ::: S) (int ::: S)
...
#+END_SRC

so the =NEG= instruction receives a =neg_struct= as an implicit argument
which is going to be provided by the canonical structure mechanism.

Finally, the evaluator in file [[./src/semantics.v]] uses this implicit
argument to call the correct function depending on the chosen type for
the =NEG= instruction:

#+BEGIN_SRC coq

  Definition neg a (v : neg_variant a) : data a -> data int :=
    match v with
    | Neg_variant_nat => fun x => (- Z.of_N x)%Z
    | Neg_variant_int => fun x => (- x)%Z
    end.

  ...

  Fixpoint eval {A : stack_type} {B : stack_type}
           (i : instruction A B) (fuel : Datatypes.nat) {struct fuel} :
    stack A -> M (stack B) :=
    match fuel with
    | O => fun SA => Failed _ Out_of_fuel
    | S n =>
      match i in instruction A B return stack A -> M (stack B) with
      ...
      | @NEG _ _ _ s =>
        fun '(x, SA) => Return _ (neg _ (neg_variant_field _ s) x, SA)
      ...
#+END_SRC

* TODO

- Play with code extraction
- Define an extension of Michelson where loops are annotated with invariants
- check OCaml implem against the documentation
- turn the type of operations into a Coq inductive

  To do this we need to implement the syntactic restriction over the
  types allowed for the storage to break the mutual dependency between
  this new inductive and the "semantics.data" function because the
  Create_contract operation requires an argument of type data
  storage_ty.

- replace section variables by functors

- Put notations in their own notation scope

- Formalize gas (and remove fuel)

  eval should return (stack SA -> M (stack SB * {remaining_gas | remaining_gas < gas }))

- Handle :-, %-, and @-annotations

* Possible mistakes in Michelson specification

- Full Grammar:
  + =FAILWITH <data>= should be =FAILWITH=
  + the =INT= instruction is undocumented

- NOOP should be documented
- The type of the IF_... instruction is erroneous
- Typo : "these are macros are simply"
- The IF_SOME macro is semantically defined instead of syntactically