#+Title: Michelson in Coq This repository is a formalisation of the [[https://www.michelson-lang.com/][Michelson]] language using the [[https://coq.inria.fr/][Coq]] interactive theorem prover. The following parts of the language are formalized: - the type system - the syntax - the semantics This formalisation is entirely based on [[https://tezos.gitlab.io/betanet/whitedoc/michelson.html][the specification of Michelson]]; more precisely, it is currently based on the following version of this specification: [[https://gitlab.com/tezos/tezos/blob/16bd2b036439d76a3e89dff8fdfbd36cf1d131fe/docs/whitedoc/michelson.rst]] * Syntax ** Type system In Michelson, an important distinction is made between *comparable* types and non-comparable ones. Comparable types can be used to index maps and only sets of elements in comparable types are allowed. Here is the list of comparable types defined in [[./src/syntax.v]]: #+BEGIN_SRC coq Inductive comparable_type : Set := | string | nat | int | bytes | bool | mutez | key_hash | timestamp. #+END_SRC Each comparable type is trivially a Michelson type, here is the full list of Michelson types as defined in [[./src/syntax.v]]: #+BEGIN_SRC coq Inductive type : Set := | Comparable_type : comparable_type -> type | key : type | unit : type | signature : type | option_ : type -> type | list_ : type -> type | set_ : comparable_type -> type | contract : type -> type | address : type | operation : type | pair : type -> type -> type | or : type -> type -> type | lambda : type -> type -> type | map : comparable_type -> type -> type | big_map : comparable_type -> type -> type. #+END_SRC Coq let us see the type =comparable_type= as a subtype of the type =type= by declaring the =Comparable_type= injection as an implicit coercion. #+BEGIN_SRC coq Coercion Comparable_type : comparable_type >-> type. #+END_SRC ** Instructions The syntax and typing of Michelson instructions is defined in file [[./src/syntax.v]]. We use a dependent inductive type to rule out ill-typed instructions. #+BEGIN_SRC coq Inductive instruction : list type -> list type -> Set := | NOOP {A} : instruction A A | FAILWITH {A B a} : instruction (a ::: A) B | SEQ {A B C} : instruction A B -> instruction B C -> instruction A C | IF_ {A B} : instruction A B -> instruction A B -> instruction (bool ::: A) B | LOOP {A} : instruction A (bool ::: A) -> instruction (bool ::: A) A ... #+END_SRC * Semantics The semantics of types is defined by interpreting them by predefined Coq types using the following table (function =data= in file [[./src/semantics.v]]: | Michelson | Coq | |---------------+---------------------------------------| | =int= | =Z= | | =nat= | =N= | | =string= | =string= | | =bytes= | =string= | | =timestamp= | =Z= | | =mutez= | =int63= | | =bool= | =bool= | | =unit= | =unit= | | =pair a b= | =a * b= | | =option a= | =option a= | | =or a b= | =sum a b= | | =list a= | =list a= | | =set a= | =set a (lt a)= | | =map a b= | =map a b (lt a)= | | =bigmap a b= | idem | | =lambda a b= | =instruction (a ::: nil) (b ::: nil)= | | anything else | axiomatized | with #+BEGIN_SRC coq Definition int63 := {t : int64.int64 | int64.sign t = false} Definition set a lt := {l : list A | Sorted.StronglySorted lt l} Definition map a b lt := set (a * b) (fun x y => lt (fst x) (fst y)) #+END_SRC We formalize the semantics of Michelson by defining an evaluator in file [[./src/semantics.v]]. Since evaluating a Michelson instruction might fail (which Coq functions cannot), we wrap the return type of this evaluator in the exception monad defined in file [[./src/error.v]]. Moreover, Coq forbids non-terminating function so we use a common Coq trick to define the evalator on diverging instructions such as =LOOP=: we make the evaluator structurally recursive on an extra argument of type =Datatypes.nat= called the =fuel= of the evaluator. #+BEGIN_SRC coq Fixpoint eval {A : stack_type} {B : stack_type} (i : instruction A B) (fuel : Datatypes.nat) {struct fuel} : stack A -> M (stack B) := match fuel with | O => fun SA => Failed _ Out_of_fuel | S n => match i in instruction A B return stack A -> M (stack B) with ... #+END_SRC * Overloading A Michelson instruction is called overloaded when it can be assigned several types. For example, the =NEG= instruction which replaces a number at the top of the stack by its opposite can have either the type =int : 'S -> int : 'S= or the type =nat : 'S -> int : 'S=. To handle this source of ambiguity in our typed AST =instruction=, we use *canonical structures*, a Coq feature that let the Coq refiner solve the ambiguity by inferring the missing piece of information from the context. Both versions of the =NEG= instruction are defined in [[./src/syntax.v]] as follows: #+BEGIN_SRC coq (* NEG takes either a nat or an int as argument *) Inductive neg_variant : type -> Set := | Neg_variant_nat : neg_variant nat | Neg_variant_int : neg_variant int. Structure neg_struct (a : type) := Mk_neg { neg_variant_field : neg_variant a }. Canonical Structure neg_nat : neg_struct nat := {| neg_variant_field := Neg_variant_nat |}. Canonical Structure neg_int : neg_struct int := {| neg_variant_field := Neg_variant_int |}. #+END_SRC The structure =neg_struct= is then used in the =instruction= datatype as follows: #+BEGIN_SRC coq Inductive instruction : list type -> list type -> Set := ... | NEG {n} {s : neg_struct n} {S} : instruction (n ::: S) (int ::: S) ... #+END_SRC so the =NEG= instruction receives a =neg_struct= as an implicit argument which is going to be provided by the canonical structure mechanism. Finally, the evaluator in file [[./src/semantics.v]] uses this implicit argument to call the correct function depending on the chosen type for the =NEG= instruction: #+BEGIN_SRC coq Definition neg a (v : neg_variant a) : data a -> data int := match v with | Neg_variant_nat => fun x => (- Z.of_N x)%Z | Neg_variant_int => fun x => (- x)%Z end. ... Fixpoint eval {A : stack_type} {B : stack_type} (i : instruction A B) (fuel : Datatypes.nat) {struct fuel} : stack A -> M (stack B) := match fuel with | O => fun SA => Failed _ Out_of_fuel | S n => match i in instruction A B return stack A -> M (stack B) with ... | @NEG _ _ _ s => fun '(x, SA) => Return _ (neg _ (neg_variant_field _ s) x, SA) ... #+END_SRC * TODO - Play with code extraction - Define an extension of Michelson where loops are annotated with invariants - check OCaml implem against the documentation - turn the type of operations into a Coq inductive To do this we need to implement the syntactic restriction over the types allowed for the storage to break the mutual dependency between this new inductive and the "semantics.data" function because the Create_contract operation requires an argument of type data storage_ty. - replace section variables by functors - Put notations in their own notation scope - Formalize gas (and remove fuel) eval should return (stack SA -> M (stack SB * {remaining_gas | remaining_gas < gas })) - Handle :-, %-, and @-annotations * Possible mistakes in Michelson specification - Full Grammar: + =FAILWITH = should be =FAILWITH= + the =INT= instruction is undocumented - NOOP should be documented - The type of the IF_... instruction is erroneous - Typo : "these are macros are simply" - The IF_SOME macro is semantically defined instead of syntactically