Duo

About

Duo is a statically-typed, functional research programming language exploring ideas of duality and subtyping. The general programming style is similar to programming languages in the ML tradition (SML, Miranda, Haskell, OCaml). This means that the language provides a rich language for defining new types, and these types can be inferred using a powerful type inference mechanism. Programs use familiar pattern matching constructs to work with algebraic datatypes, but also offer less familiar constructs like codata types and copattern matching. The language de-emphasizes mutation, but we explore efficiency-oriented "on-the-metal" implementation issues.

Guiding Ideas

Duo is a language focused on two central ideas; Duality and Subtyping.

  • Duality: Duality is a well-known principle in logic, type theory and programming language theory. For example:

    • Monads are dual to Comonads.
    • Call-by-value evaluation is dual to call-by-name evaluation.
    • Data types are dual to codata types.
    • Producers/proofs are dual to consumers/refutations.
    • Sending is dual to receiving.

    Duo takes such dualities and explores how they can be used as a guiding principle in programming language design. The first-class treatment of consumers, in particular, allows for new ways to express programs which could otherwise only be expressed using continuation-passing style.

  • Subtyping: Subtyping in programming languages is mostly associated with object-oriented programming and inheritance. Part of the reason for this association is that statically-typed functional programming languages heavily rely on type-inference, and it was widely thought that good type inference is incompatible with subtyping. We use the recently developed algebraic-subtyping approach to infer types. This approach allows to infer principal types with minimal need for user-provided type annotations, even in the presence of subtyping and parametric polymorphism.

About Us

Duo is developed on GitHub at the chair for programming languages at the University of Tübingen.

Installation

Duo is implemented in Haskell and can be compiled from the sources using the stack build tool.

Obtaining the stack build tool

There are two recommended ways to install stack on your system; only installing stack or using ghcup to manage your stack installation.

  • In order to just install the stack tool, follow the instructions on their website here.
  • You can use ghcup to manage multiple different versions of the ghc Haskell compiler and other tools from the Haskell ecosystem, including stack. To install stack using ghcup, follow the instructions on the ghcup website here.

Verify that your installation was succesfull and that the binary is on your path:

> stack --version
Version 2.7.3, Git revision 7927a3...

Building Duo from the sources

Clone the duo repository to your system:

git clone git@github.com:duo-lang/duo-lang.git

Then, change into the duo-lang directory and use stack to build the binary

> stack build

During the first compilation this can take a while, since a lot of the dependencies have to be compiled as well. Subsequent compilations will be much faster.

In order to use the editor plugin and the LSP server, the duo binary has to be on the path. We use stack install for this.

> stack install

It is possible that you have to add the directory that the program was installed to to your $PATH environment variable. Verify that the duo binary has been correctly installed by querying it for its version:

> duo --version
Duo Version: 0.1.0.0
Git Commit: 81ec55800707cfdcd43a685d2e6fecd44a0429b8
Git Branch: main

Editor Integration

Editor integration is currently available for VSCode and vim. The editor integration uses the language server protocol, so it should be possible to support other editors in the future.

Installing the VSCode extension

We provide a prebuilt .vsix extension which can be installed as an extension to VSCode. The prebuilt extension is available as a release on our GitHub page . Once you have downloaded the extension, use the following command to install the extension as a plugin in VSCode.

code --install-extension duo-lang-client-0.0.1.vsix

The extension itself provides syntax highlighting and indentation support. The advanced features are implemented using a language server, and require the duo binary to be available on the path.

Installing the vim extension

For installing the vim extension, please follow the instructions on the corresponding GitHub page.

Toplevel Declarations

The following declarations are available at the toplevel of a Duo module.

Import statements

Other Duo modules can be imported using an import declaration. The following snippet imports the two modules Function and I64, which have to be declared in corresponding files Function.duo and I64.duo. These files have to be available in the library search path.

import Function;
import I64;

Circular module imports are not allowed and will generate an error. It is currently not possible to only import some of the declarations of a module, or to import a module qualified.

Type synonyms

Type synonyms can be defined using the type keyword. In the following snippet, a type synonym SBool is defined as an alias for the structural type < True, False >. For the typechecker the two types SBool and < True, False > are indistinguishable.

constructor True;                                                   
constructor False;                                                  
                                                                    
type SBool := < True, False >;  

Parameterized type synonyms are currently not yet implemented.

Type operators

Duo supports binary type operators with user-specified associativity and precedence. For example, the function type and the -> operator are specified in the following manner:

-- | The function type
codata Fun : (-a : CBV, +b : CBV) -> CBN {
    Ap(a,return b)
};

type operator -> rightassoc at 0 := Fun;

This first defines the codata type Fun and then introduces the arrow symbol as a custom notation for the Fun type with right associativity and precedence level 0. Every data and codata type has to be introduced with a lexical identifier, Fun in this example, before a type operator can be introduced for the type. The right-hand side of a type operator declaration expects the name of a typeconstructor with arity 2. Nullary and unary type operators are currently not implemented.

Data and codata declarations

Constructor and destructor declarations for structural types

Structural types, i.e. polymorphic variants and structural records, require their respective constructors and destructors to be declared at the toplevel before they can be used. This is different to, for example, OCaml, where a backtick in front of the name of the constructor is used. This design was chosen for the following reasons:

  • Docstrings can be attached to the constructors at the declaration site, and these docstrings can be displayed on hover at the respective use sites.
  • Typos in the names of constructors of polymorphic variants can be detected early.
  • Since the evaluation of expressions in Duo is kind-directed, we need a place to specify how constructors and destructors of structural types are to be evaluated.

Constructor declarations

In order to define a structural variant of booleans, the following two declarations can be used.

constructor True;
constructor False;

In this case, no evaluation order has been specified, and in the case of constructors this defaults to CBV. I.e. the above declaration is a shorthand for the more explicit declaration:

constructor True : CBV;
constructor False : CBV;

For constructors which take arguments the user has to specify the evaluation order used for the respective argument position. Structural peano natural numbers where the argument to the successor constructor is evaluated eagerly are declared as follows:

constructor Z;
constructor S(CBV);

which, again, is shorthand for

constructor Z : CBV;
constructor S(CBV) : CBV;

These constructors then occur in structural data types like < True, False> or < S(< Z >)>.

Destructor declarations

Destructor declarations are similar to constructor declarations, with one difference. If you don't specify a evaluation order for the destructor, it defaults to CBN evaluation. Hence, the declaration

destructor Name(CBV);

is shorthand for the declaration

destructor Name(CBV) : CBN;

These destructors then occur in structural codata types like { Name(String) }.

Producer declarations

A producer is declared using the def prd syntax. An optional type signature can be specified.

def prd two := S(S(Z));

def prd three: Nat := S(S(S(Z)));

def prd id: forall a. a -> a := \x => x;

By default, producers defined using the def prd keywords are not recursive. For recursive definitions, the keyword rec has to be added.

Consumer declarations

A consumer is declared using the def cns syntax. An optional type signature can be specified.

def cns exitOnBool := case {
    True => #ExitSuccess,
    False => #ExitFailure
};

def cns exitOnBool : Bool := case {
    True => #ExitSuccess,
    False => #ExitFailure
};

By default, consumers defined using the def cns keywords are not recursive. For recursive definitions, the keyword rec has to be added.

Command declarations

A command is declared using the def cmd syntax. Commands have no type, and it is therefore not possible to write a type signature.

def cmd exit := #ExitFailure;

def cmd main := #ExitSuccess;

The command called "main" is treated specially, since execution starts with the execution of the "main" command.

Type Class declarations

A type class is declared using the class keyword. Currently only single parameter type classes are supported, thus a single type variable together with its kind and variance have to be declared.

class Show(+a : CBV){
    Show(a, return String)
};

Inside the body of the class declaration an arbitrary number of type class methods can be defined. The signature has to respect the variance of the type variable above. Method and type class name live in different name spaces, so it is possible to overload them.

Instance declarations

An instance of a type class is declared using the instance keyword. Unlike in languages such as Haskell, instances have to be named.

instance showBool : Show Bool {
    Show(b, k) => case b of {
        True => MkString("True"),
        False => MkString("False")
    } >> k
};

Type class methods are implemented as commands and can be used as such, e.g.:

def cmd main := Show(True, mu x. #Print(x, #ExitSuccess));

Kinds

Most languages choose one evaluation strategy and use that strategy globally. Strict languages like OCaml, SML or F# choose a call-by-value (CBV) evaluation strategy, whereas non-strict languages choose either call-by-name (CBN) or call-by-need. Instead of comitting to a global evaluation strategy, the evaluation strategy in Duo is specified for every type individually, and this choice is then reflected in the kind of that type. It is possible, for example, to have both strict booleans and lazy booleans within the same language. The difference between strict booleans and lazy booleans is then recorded on the level of kinds. For strict booleans we would have StrictBool : CBV, and for lazy booleans LazyBool : CBN, where CBV and CBN replace the single kind * of inhabitated types that other languages have.

Evaluation Orders

Duo currently supports two different evaluation strategies. The two evaluation strategies are call-by-value and call-by-name.

Evaluation OrderExplanation
CBVCall-by-value
CBNCall-by-name

Inhabitated Kinds

The different evaluation strategies are reflected in the kinds of types. Instead of having only one inhabitated kind, usually written * in the type theory literature, there are several different kinds which classify inhabitated types. These inhabitated kinds are called "MonoKind". Currently there are six different MonoKinds, one for each of boxed values of CBV and CBN types, and one for each of the primitive unboxed types.

MonoKindExplanation
CBVThe kind of boxed CBV types
CBNThe kind of boxed CBN types
I64RepThe kind of the unboxed #I64 type
F64RepThe kind of the unboxed #F64 type
CharRepThe kind of the unboxed Char type
StringRepThe kind of the unboxed String type

Higher Kinds

Higher kinds are the kinds given to type constructors. Since the only available type constructors in Duo construct CBV or CBN types, they follow the following restricted grammar. Each argument in a PolyKind has a variance, either + for covariant arguments or - for contravariant arguments, a type variable a, and a MonoKind of type type variable, mk.

PolyKindExplanation
(+-a : mk)* -> eoPolykind
CBVSyntactic sugar for () -> CBV
CBNSyntactic sugar for () -> CBN

Types

Duo currently has the following general classes of types:

  • Primitive types which are hardwired in the compiler and cannot be defined by the user.
  • Nominal types include the familiar algebraic data types from languages like Haskell, Scala, ML or Rust, but also their dual: codata types.
  • Structural types
  • Refinement types
  • Lattice types are the types which stem from the availability of subtyping in Duo. Lattice types include unions, intersections, top and bottom types.
  • Equi-recursive types are types rec a. tau which are indistinguishable from their unfolding tau [rec a.tau / a].

Primitive Types

There are four builtin types which correspond to numeric types supported by most modern architectures, as well as types for characters and strings. The builtin types are given in the following table.

TypeKindExplanation
#I64I64RepUnboxed signed 64-Bit integers
#F64F64RepUnboxed 64-Bit precision floating point numbers
#CharCharRepUnboxed character
#StringStringRepUnboxed string

Each Builtin type has its own kind, which reflects the fact that these types are unboxed. Since they are unboxed, it is in general not possible to apply a polymorphic function to an argument of a builtin type. The standard library provides wrappers in the std/Prim/I64.duo, std/Prim/F64.duo and std/Prim/Strings.duo modules which provide the boxed variants of the builtin unboxed types.

Nominal Types

Duo has two different variants of nominal types; nominal data types and nominal codata types.

Nominal Data Types

Nominal data types are declared using the data keyword. For example, the following two declarations declare Booleans and Peano natural numbers, respectively.

data Bool : CBV { True, False };

data Nat : CBV { Zero, Succ(Nat) };

Note that we have to indicate the evaluation order we want to use for that type. We can also use nominal data types to declare the unit and void type.

data Unit : CBV { MkUnit };
data Void : CBV {};

When we declare parameterized data types, we have to declare both the kind of the type parameter, and its variance. For example, we can declare the type of lists of CBV types.

data List : (+a : CBV) -> CBV {
    Nil,
    Cons(a, List(a))
};

In order for the algebraic subtyping approach to work, all type parameters have to be either covariant or contravariant. It currently isn't possible to use invariant type parameters, but such a feature might be added as syntactic sugar at a future point.

Nominal Codata Types

Codata types are introduced using the codata keyword.

One of the standard examples of a codata type is the example of an infinite stream of natural numbers, called NatStream. A stream has two possible observations, Head and Tail, which we call the destructors of the codata type. The destructor Head returns the first element of the stream, and the destructor Tail returns the remainder of the stream.

codata NatStream : CBN {
    Head(return Nat), 
    Tail(return NatStream)
};

Codata types can also be parameterized, similar to the List example of the previous section.

codata Stream : (+a : CBV) -> CBN {
    Head(return a),
    Tail(return Stream(a))
};

Codata types make it possible to define the function type, instead of having a builtin function type like most functional programming languages. The declaration of the function type looks like this.

codata Fun : (-a : CBV, +b CBV) -> CBN {
    Ap(a, return b)
}
Restriction on lattice types in arguments

It is not possible to use lattice types in the arguments of constructors or destructors of data and codata types. For example, the following type is not allowed:

data Foo :  CBN {
    Bar(Top)
}; 

Without this restriction, lattice types will occur in positions in which they are not allowed. For the example above, Top occurs in a negative position in the typing rule for the constructor:

Gamma |- e :prd tau     tau <: Top
------------------------------------T-Ctor
Gamma |- Bar(e) :prd Foo

but in a positive position in the typing rule for the destructor:

Gamma, x : Top |- c :cmd
--------------------------------------- T-Match
Gamma |- case { Bar(x) => c } :cns Foo

Structural Types

Nominal data and codata types define a new type together with all of its constructors (resp. destructors). But sometimes it is more useful to have a more structural type, where we can introduce new constructors and destructors at arbitrary places in the program. These types are often called extensible variants and records, and Duo supports them in the form of structural data and codata types.

Structural data types are similar to polymorphic variants in, for example, OCaml. Structural codata types are similar to extensible records in, for example, Purescript. In distinction to those systems, constructors and destructors of structural data and codata types have to be declared in Duo before they can be used. This approach has the following benefits:

  • It becomes possible to identify typos when using structural data and codata.
  • The structural constructors and destructors can be documented using a docstring, which can be shown on hover.

Structural Data Types

A constructor of a structural data type is introduced with a constructor declaration.

constructor True : CBV;

When occuring in types, structural data types are written using angle brackets. The structural encoding of booleans, for example, looks like this:

< True, False >

whereas the structural encoding of Peano natural numbers looks like this

mu a. < Z , S(a)>

Structural Codata Types

A destructor of a structural codata type is introduced with a destructor declaration.

destructor Ap(CBV, return CBN) : CBN;

Structural codata types are written using braces. The structural encoding of NatStreams, for example, are written as follows:

rec a. { Head(return Nat), Tail(return a) }

Structural Refinement Types

Structural refinement types are a novel idea in Duo which combine features from both nominal and structural types. They are declared similarly to nominal types, but use the additional keyword refinement in front of the declaration

refinement data Bool : CBV { True, False };
refinement codata NatStream : CBN {
    Head(return Nat),
    Tail(return NatStream)
};

The syntax for refinement types is very similar to the syntax for structural data and codata types, but the refined type is indicated by a vertical pipe in front of the constructors/destructors. The four possible refinements of the boolean type, for example, are written like this:

< Bool | True, False >
< Bool | True >
< Bool | False >
< Bool | >

And a refinement of the codata type of NatStream is written like this:

{ NatStream | Head[Nat] }

Lattice Types

Duo uses subtyping, and some of the operations of the subtyping lattice are available in the surface syntax. In particular, unions and intersections, as well as the top and bottom elements of the subtyping lattice can be used in types.

TypeExplanation
TopThe top element of the subtyping lattice
BotThe bottom element of the subtyping lattice
S \/ TThe union of S and T
S /\ TThe intersection of S and T

Equirecursive types

Duo accepts and infers equi-recursive types of the form rec a. tau which are, for the typechecker, indistinguishable from their unfolding tau [ rec a. tau / a ].

Terms

Duo is based on a term language for the sequent calculus. In distinction to lambda calculus, the term language of natural deduction, we have a first-class treatment of both producers and consumers of each type. In addition, there is a third syntactic category, commands. Commands are executable pieces of code. For example, it is possible to define producers, consumers and commands at the toplevel of the program:

def prd someProducer := ...;
def cns someConsumer := ...;
def cmd  someCommand := ...;

Toplevel declarations of consumers and producers can additionally be annotated with a type scheme.

def prd someProducer : ... := ...;
def cns someConsumer : ... := ...;

For example, the polymorphic identity function is defined like so:

def prd id : forall (a : CBV). a -> a :=
  \x => x;

If a function is used recursively in its own body, the rec keyword is necessary:

def rec prd add := \x y => case x of {
    Z => y,
    S(x) => S(add x y)
}

There are five categories of terms in Duo:

  • Core constructs, which are independent of any concrete type.
  • Producer introduction rules
  • Consumer introduction rules
  • Producer elimination rules
  • Consumer elimination rules

Core constructs

Duo supports control operators in the form of the \(\mu\) and \(\tilde\mu\) constructs of the \(\lambda\mu\tilde\mu\) calculus. Both constructs use identical syntax.

def prd ex1 := mu x. #ExitSuccess;
def cns ex2 := mu k. #ExitSuccess;

Producer Introduction Rules

The rules for data and codata types are given separately

Data Types

Introduction rules for producers of codata types are just uses of the corresponding constructors:

data Nat : CBV { Z, S(Nat) };
def prd ex1 := S(Z);

Codata Types

Introduction rules for producers of codata types consists of copattern matching.

codata Pair : (+a : CBV, +b : CBV) -> CBV {
    PiLeft(return a),
    PiRight(return b)
};

def prd ex1 : Pair(Nat,Bool) := cocase
  { PiLeft(*) => 5,
    PiRight(*) => True
  };

def prd ex2 : Pair(Nat,Bool) := cocase
  { PiLeft(k) => 5 >> k,
    PiRight(k) => True >> k
  };

Consumer Introduction Rules

The rules for data and codata types are given separately

Data Types

A consumer of a data type is introduced using pattern matching

def cns ex1 := case {
    Z => #ExitSuccess,
    S(x) => #ExitSuccess
};

Codata Types

Introduction rules for consumers of codata types use one of the available destructors.

def cns ex1 : NatStream := Head(mu x. #ExitSuccess);

Producer Elimination Rules

The rules for data and codata types are given separately

Data Types

TODO

Codata Types

TODO

Consumer Elimination Rules

The rules for data and codata types are given separately

Data Types

TODO

Codata Types

TODO

Commands

Commands are the syntactic category that actually gets executed when you run a program. The entry point to a program, the main function, is also a command.

The Exit Commands

The simplest commands are called #ExitSuccess and #ExitFailure, which are used to terminate the program.

Example 1:

We can write a simple command which just exits directly after it is called.

def cmd exitAtOnce := #ExitSuccess;

The Apply Command

The most important command is called "Apply". In the underlying logical calculus, the sequent calculus, it corresponds to the Cut rule. The user, on the other hand, doesn't need to know about this. Apply intuitively just combines a producer with a consumer of the same type.

We use the >> symbol to express a cut, and write the producer on the left side of it, and the consumer on the right hand side.

Example 1:

The following program cuts the producer True against a pattern match on booleans, and then exits the program.

def cmd exit :=  True >> case { True => #ExitSuccess; False => #ExitSuccess };

IO Actions

There currently are only two IO actions provided; #Print and #Read.

The following program uses both to read in two numbers from the console, and to print the output back to the console.

import Prelude;

def rec prd addA := \x => \y => case x of { Z => y
                                      , S(z) => addA z S(y)
					        };

def cmd main := Read[ mu x. #Read(mu y. Print(addA x y, #ExitSuccess));

LSP Features

Further Reading

Duo is based on the work of many other researchers. The general programming style and the syntax are inspired by languages from the ML family (SML, Miranda, Haskell, OCaml, Agda, Idris, Coq). Below, we include some further reading on specific aspects of the Duo language.

Algebraic Subtyping

Type inference uses the algebraic subtyping approach. More specifically, we use constraint-based type inference with subtyping constraints. We use a bi-unification algorithm to solve these constraints and type automata to simplify the resulting types. We relied heavily on the following resources describing this approach.

Codata Types

Codata types have been known for a long time, but they haven't caught on in a popular programming language yet. We provide some standard references on codata types below.

  • "Codatatypes in ML" by Tatsuyo Hagino is the canonical early reference to codata types.
  • "Codata in Action" by Paul Downen, Zachary Sullivan, Zena Ariola and Simon Peyton Jones provides a good motivation for, and introduction to, codata types.
  • "Copatterns: programming infinite structures by observations" by Andreas Abel, Brigitte Pientka, David Thibodeau and Anton Setzer introduced copattern matching as a convenient and principled syntactic construct to work with codata types, similar to how pattern matching works for data types.
  • "Automatic Refunctionalization to a Language with Copattern Matching: With Applications to the Expression Problem" by Tillmann Rendel, Julia Trieflinger and Klaus Ostermann show how data and codata types correspond to the two different axes of extensibility in the expression problem, and how defunctionalization and refunctionalization can be used to automatically transform data intro codata types, and vice versa. Two follow-up papers, "Decomposition Diversity with Symmetric Data and Codata", and "Dualizing Generalized Algebraic Data Types by Matrix Transposition" extend this approach to nested and polymorphic programs, respectively. We plan to provide these transformations as editor actions for the Duo language.

Producer/Consumer Duality

The duality between producers and consumers of a type corresponds, via the Curry-Howard isomorphism, to the duality between proofs and refutations in logic. A proof system which exploits this duality of proofs and refutations is the Sequent Calculus, developed by Gentzen at the same time as his more popular relative, natural deduction. At its core, Duo is based on a term assignment system for the sequent calculus, the \(\lambda \mu \tilde\mu\) calculus by Pierre-Louis Curien and Hugo Herbelin. In this term assignment system, consumers correspond to a first-class notion of continuations. We only cite some of the more important references of the extensive literature on term-assignment systems for the sequent calculus.

  • TODO