Duo
About
Duo is a statically-typed, functional research programming language exploring ideas of duality and subtyping. The general programming style is similar to programming languages in the ML tradition (SML, Miranda, Haskell, OCaml). This means that the language provides a rich language for defining new types, and these types can be inferred using a powerful type inference mechanism. Programs use familiar pattern matching constructs to work with algebraic datatypes, but also offer less familiar constructs like codata types and copattern matching. The language de-emphasizes mutation, but we explore efficiency-oriented "on-the-metal" implementation issues.
Guiding Ideas
Duo is a language focused on two central ideas; Duality and Subtyping.
-
Duality: Duality is a well-known principle in logic, type theory and programming language theory. For example:
- Monads are dual to Comonads.
- Call-by-value evaluation is dual to call-by-name evaluation.
- Data types are dual to codata types.
- Producers/proofs are dual to consumers/refutations.
- Sending is dual to receiving.
Duo takes such dualities and explores how they can be used as a guiding principle in programming language design. The first-class treatment of consumers, in particular, allows for new ways to express programs which could otherwise only be expressed using continuation-passing style.
-
Subtyping: Subtyping in programming languages is mostly associated with object-oriented programming and inheritance. Part of the reason for this association is that statically-typed functional programming languages heavily rely on type-inference, and it was widely thought that good type inference is incompatible with subtyping. We use the recently developed algebraic-subtyping approach to infer types. This approach allows to infer principal types with minimal need for user-provided type annotations, even in the presence of subtyping and parametric polymorphism.
About Us
Duo is developed on GitHub at the chair for programming languages at the University of Tübingen.
Installation
Duo is implemented in Haskell and can be compiled from the sources using the stack
build tool.
Obtaining the stack build tool
There are two recommended ways to install stack
on your system; only installing stack
or using ghcup
to manage your stack installation.
- In order to just install the stack tool, follow the instructions on their website here.
- You can use
ghcup
to manage multiple different versions of the ghc Haskell compiler and other tools from the Haskell ecosystem, including stack. To install stack usingghcup
, follow the instructions on the ghcup website here.
Verify that your installation was succesfull and that the binary is on your path:
> stack --version
Version 2.7.3, Git revision 7927a3...
Building Duo from the sources
Clone the duo repository to your system:
git clone git@github.com:duo-lang/duo-lang.git
Then, change into the duo-lang directory and use stack to build the binary
> stack build
During the first compilation this can take a while, since a lot of the dependencies have to be compiled as well. Subsequent compilations will be much faster.
In order to use the editor plugin and the LSP server, the duo
binary has to be on the path.
We use stack install
for this.
> stack install
It is possible that you have to add the directory that the program was installed to to your $PATH
environment variable.
Verify that the duo binary has been correctly installed by querying it for its version:
> duo --version
Duo Version: 0.1.0.0
Git Commit: 81ec55800707cfdcd43a685d2e6fecd44a0429b8
Git Branch: main
Editor Integration
Editor integration is currently available for VSCode and vim. The editor integration uses the language server protocol, so it should be possible to support other editors in the future.
Installing the VSCode extension
We provide a prebuilt .vsix
extension which can be installed as an extension to VSCode.
The prebuilt extension is available as a release on our GitHub page .
Once you have downloaded the extension, use the following command to install the extension as a plugin in VSCode.
code --install-extension duo-lang-client-0.0.1.vsix
The extension itself provides syntax highlighting and indentation support.
The advanced features are implemented using a language server, and require the duo
binary to be available on the path.
Installing the vim extension
For installing the vim extension, please follow the instructions on the corresponding GitHub page.
Toplevel Declarations
The following declarations are available at the toplevel of a Duo module.
Import statements
Other Duo modules can be imported using an import
declaration.
The following snippet imports the two modules Function
and I64
, which have to be declared in corresponding files Function.duo
and I64.duo
.
These files have to be available in the library search path.
import Function;
import I64;
Circular module imports are not allowed and will generate an error. It is currently not possible to only import some of the declarations of a module, or to import a module qualified.
Type synonyms
Type synonyms can be defined using the type
keyword.
In the following snippet, a type synonym SBool
is defined as an alias for the structural type < True, False >
. For the typechecker the two types SBool
and < True, False >
are indistinguishable.
constructor True;
constructor False;
type SBool := < True, False >;
Parameterized type synonyms are currently not yet implemented.
Type operators
Duo supports binary type operators with user-specified associativity and precedence.
For example, the function type and the ->
operator are specified in the following manner:
-- | The function type
codata Fun : (-a : CBV, +b : CBV) -> CBN {
Ap(a,return b)
};
type operator -> rightassoc at 0 := Fun;
This first defines the codata type Fun
and then introduces the arrow symbol as a custom notation for the Fun
type with right associativity and precedence level 0.
Every data and codata type has to be introduced with a lexical identifier, Fun
in this example, before a type operator can be introduced for the type.
The right-hand side of a type operator declaration expects the name of a typeconstructor with arity 2.
Nullary and unary type operators are currently not implemented.
Data and codata declarations
Constructor and destructor declarations for structural types
Structural types, i.e. polymorphic variants and structural records, require their respective constructors and destructors to be declared at the toplevel before they can be used. This is different to, for example, OCaml, where a backtick in front of the name of the constructor is used. This design was chosen for the following reasons:
- Docstrings can be attached to the constructors at the declaration site, and these docstrings can be displayed on hover at the respective use sites.
- Typos in the names of constructors of polymorphic variants can be detected early.
- Since the evaluation of expressions in Duo is kind-directed, we need a place to specify how constructors and destructors of structural types are to be evaluated.
Constructor declarations
In order to define a structural variant of booleans, the following two declarations can be used.
constructor True;
constructor False;
In this case, no evaluation order has been specified, and in the case of constructors this defaults to CBV
.
I.e. the above declaration is a shorthand for the more explicit declaration:
constructor True : CBV;
constructor False : CBV;
For constructors which take arguments the user has to specify the evaluation order used for the respective argument position. Structural peano natural numbers where the argument to the successor constructor is evaluated eagerly are declared as follows:
constructor Z;
constructor S(CBV);
which, again, is shorthand for
constructor Z : CBV;
constructor S(CBV) : CBV;
These constructors then occur in structural data types like < True, False>
or < S(< Z >)>
.
Destructor declarations
Destructor declarations are similar to constructor declarations, with one difference. If you don't specify a evaluation order for the destructor, it defaults to CBN evaluation. Hence, the declaration
destructor Name(CBV);
is shorthand for the declaration
destructor Name(CBV) : CBN;
These destructors then occur in structural codata types like { Name(String) }
.
Producer declarations
A producer is declared using the def prd
syntax.
An optional type signature can be specified.
def prd two := S(S(Z));
def prd three: Nat := S(S(S(Z)));
def prd id: forall a. a -> a := \x => x;
By default, producers defined using the def prd
keywords are not recursive.
For recursive definitions, the keyword rec
has to be added.
Consumer declarations
A consumer is declared using the def cns
syntax.
An optional type signature can be specified.
def cns exitOnBool := case {
True => #ExitSuccess,
False => #ExitFailure
};
def cns exitOnBool : Bool := case {
True => #ExitSuccess,
False => #ExitFailure
};
By default, consumers defined using the def cns
keywords are not recursive.
For recursive definitions, the keyword rec
has to be added.
Command declarations
A command is declared using the def cmd
syntax.
Commands have no type, and it is therefore not possible to write a type signature.
def cmd exit := #ExitFailure;
def cmd main := #ExitSuccess;
The command called "main" is treated specially, since execution starts with the execution of the "main" command.
Type Class declarations
A type class is declared using the class
keyword.
Currently only single parameter type classes are supported, thus a single type variable together with its kind and variance have to be declared.
class Show(+a : CBV){
Show(a, return String)
};
Inside the body of the class declaration an arbitrary number of type class methods can be defined. The signature has to respect the variance of the type variable above. Method and type class name live in different name spaces, so it is possible to overload them.
Instance declarations
An instance of a type class is declared using the instance
keyword.
Unlike in languages such as Haskell, instances have to be named.
instance showBool : Show Bool {
Show(b, k) => case b of {
True => MkString("True"),
False => MkString("False")
} >> k
};
Type class methods are implemented as commands and can be used as such, e.g.:
def cmd main := Show(True, mu x. #Print(x, #ExitSuccess));
Kinds
Most languages choose one evaluation strategy and use that strategy globally.
Strict languages like OCaml, SML or F# choose a call-by-value (CBV) evaluation strategy, whereas non-strict languages choose either call-by-name (CBN) or call-by-need.
Instead of comitting to a global evaluation strategy, the evaluation strategy in Duo is specified for every type individually, and this choice is then reflected in the kind of that type.
It is possible, for example, to have both strict booleans and lazy booleans within the same language.
The difference between strict booleans and lazy booleans is then recorded on the level of kinds.
For strict booleans we would have StrictBool : CBV
, and for lazy booleans LazyBool : CBN
, where CBV
and CBN
replace the single kind *
of inhabitated types that other languages have.
Evaluation Orders
Duo currently supports two different evaluation strategies. The two evaluation strategies are call-by-value and call-by-name.
Evaluation Order | Explanation |
---|---|
CBV | Call-by-value |
CBN | Call-by-name |
Inhabitated Kinds
The different evaluation strategies are reflected in the kinds of types.
Instead of having only one inhabitated kind, usually written *
in the type theory literature, there are several different kinds which classify inhabitated types.
These inhabitated kinds are called "MonoKind". Currently there are six different MonoKinds, one for each of boxed values of CBV and CBN types, and one for each of the primitive unboxed types.
MonoKind | Explanation |
---|---|
CBV | The kind of boxed CBV types |
CBN | The kind of boxed CBN types |
I64Rep | The kind of the unboxed #I64 type |
F64Rep | The kind of the unboxed #F64 type |
CharRep | The kind of the unboxed Char type |
StringRep | The kind of the unboxed String type |
Higher Kinds
Higher kinds are the kinds given to type constructors.
Since the only available type constructors in Duo construct CBV or CBN types, they follow the following restricted grammar.
Each argument in a PolyKind has a variance, either +
for covariant arguments or -
for contravariant arguments, a type variable a
, and a
MonoKind of type type variable, mk
.
PolyKind | Explanation |
---|---|
(+-a : mk)* -> eo | Polykind |
CBV | Syntactic sugar for () -> CBV |
CBN | Syntactic sugar for () -> CBN |
Types
Duo currently has the following general classes of types:
- Primitive types which are hardwired in the compiler and cannot be defined by the user.
- Nominal types include the familiar algebraic data types from languages like Haskell, Scala, ML or Rust, but also their dual: codata types.
- Structural types
- Refinement types
- Lattice types are the types which stem from the availability of subtyping in Duo. Lattice types include unions, intersections, top and bottom types.
- Equi-recursive types are types
rec a. tau
which are indistinguishable from their unfoldingtau [rec a.tau / a]
.
Primitive Types
There are four builtin types which correspond to numeric types supported by most modern architectures, as well as types for characters and strings. The builtin types are given in the following table.
Type | Kind | Explanation |
---|---|---|
#I64 | I64Rep | Unboxed signed 64-Bit integers |
#F64 | F64Rep | Unboxed 64-Bit precision floating point numbers |
#Char | CharRep | Unboxed character |
#String | StringRep | Unboxed string |
Each Builtin type has its own kind, which reflects the fact that these types are unboxed.
Since they are unboxed, it is in general not possible to apply a polymorphic function to an argument of a builtin type.
The standard library provides wrappers in the std/Prim/I64.duo
, std/Prim/F64.duo
and std/Prim/Strings.duo
modules which provide the boxed variants of the builtin unboxed types.
Nominal Types
Duo has two different variants of nominal types; nominal data types and nominal codata types.
Nominal Data Types
Nominal data types are declared using the data
keyword.
For example, the following two declarations declare Booleans and Peano natural numbers, respectively.
data Bool : CBV { True, False };
data Nat : CBV { Zero, Succ(Nat) };
Note that we have to indicate the evaluation order we want to use for that type. We can also use nominal data types to declare the unit and void type.
data Unit : CBV { MkUnit };
data Void : CBV {};
When we declare parameterized data types, we have to declare both the kind of the type parameter, and its variance. For example, we can declare the type of lists of CBV types.
data List : (+a : CBV) -> CBV {
Nil,
Cons(a, List(a))
};
In order for the algebraic subtyping approach to work, all type parameters have to be either covariant or contravariant. It currently isn't possible to use invariant type parameters, but such a feature might be added as syntactic sugar at a future point.
Nominal Codata Types
Codata types are introduced using the codata
keyword.
One of the standard examples of a codata type is the example of an infinite stream of natural numbers, called NatStream
.
A stream has two possible observations, Head
and Tail
, which we call the destructors of the codata type.
The destructor Head
returns the first element of the stream, and the destructor Tail
returns the remainder of the stream.
codata NatStream : CBN {
Head(return Nat),
Tail(return NatStream)
};
Codata types can also be parameterized, similar to the List example of the previous section.
codata Stream : (+a : CBV) -> CBN {
Head(return a),
Tail(return Stream(a))
};
Codata types make it possible to define the function type, instead of having a builtin function type like most functional programming languages. The declaration of the function type looks like this.
codata Fun : (-a : CBV, +b CBV) -> CBN {
Ap(a, return b)
}
Restriction on lattice types in arguments
It is not possible to use lattice types in the arguments of constructors or destructors of data and codata types. For example, the following type is not allowed:
data Foo : CBN {
Bar(Top)
};
Without this restriction, lattice types will occur in positions in which they are not allowed.
For the example above, Top
occurs in a negative position in the typing rule for the constructor:
Gamma |- e :prd tau tau <: Top
------------------------------------T-Ctor
Gamma |- Bar(e) :prd Foo
but in a positive position in the typing rule for the destructor:
Gamma, x : Top |- c :cmd
--------------------------------------- T-Match
Gamma |- case { Bar(x) => c } :cns Foo
Structural Types
Nominal data and codata types define a new type together with all of its constructors (resp. destructors). But sometimes it is more useful to have a more structural type, where we can introduce new constructors and destructors at arbitrary places in the program. These types are often called extensible variants and records, and Duo supports them in the form of structural data and codata types.
Structural data types are similar to polymorphic variants in, for example, OCaml. Structural codata types are similar to extensible records in, for example, Purescript. In distinction to those systems, constructors and destructors of structural data and codata types have to be declared in Duo before they can be used. This approach has the following benefits:
- It becomes possible to identify typos when using structural data and codata.
- The structural constructors and destructors can be documented using a docstring, which can be shown on hover.
Structural Data Types
A constructor of a structural data type is introduced with a constructor
declaration.
constructor True : CBV;
When occuring in types, structural data types are written using angle brackets. The structural encoding of booleans, for example, looks like this:
< True, False >
whereas the structural encoding of Peano natural numbers looks like this
mu a. < Z , S(a)>
Structural Codata Types
A destructor of a structural codata type is introduced with a destructor
declaration.
destructor Ap(CBV, return CBN) : CBN;
Structural codata types are written using braces. The structural encoding of NatStreams, for example, are written as follows:
rec a. { Head(return Nat), Tail(return a) }
Structural Refinement Types
Structural refinement types are a novel idea in Duo which combine features from both nominal and structural types.
They are declared similarly to nominal types, but use the additional keyword refinement
in front of the declaration
refinement data Bool : CBV { True, False };
refinement codata NatStream : CBN {
Head(return Nat),
Tail(return NatStream)
};
The syntax for refinement types is very similar to the syntax for structural data and codata types, but the refined type is indicated by a vertical pipe in front of the constructors/destructors. The four possible refinements of the boolean type, for example, are written like this:
< Bool | True, False >
< Bool | True >
< Bool | False >
< Bool | >
And a refinement of the codata type of NatStream is written like this:
{ NatStream | Head[Nat] }
Lattice Types
Duo uses subtyping, and some of the operations of the subtyping lattice are available in the surface syntax. In particular, unions and intersections, as well as the top and bottom elements of the subtyping lattice can be used in types.
Type | Explanation |
---|---|
Top | The top element of the subtyping lattice |
Bot | The bottom element of the subtyping lattice |
S \/ T | The union of S and T |
S /\ T | The intersection of S and T |
Equirecursive types
Duo accepts and infers equi-recursive types of the form rec a. tau
which are, for the typechecker, indistinguishable from their unfolding tau [ rec a. tau / a ]
.
Terms
Duo is based on a term language for the sequent calculus. In distinction to lambda calculus, the term language of natural deduction, we have a first-class treatment of both producers and consumers of each type. In addition, there is a third syntactic category, commands. Commands are executable pieces of code. For example, it is possible to define producers, consumers and commands at the toplevel of the program:
def prd someProducer := ...;
def cns someConsumer := ...;
def cmd someCommand := ...;
Toplevel declarations of consumers and producers can additionally be annotated with a type scheme.
def prd someProducer : ... := ...;
def cns someConsumer : ... := ...;
For example, the polymorphic identity function is defined like so:
def prd id : forall (a : CBV). a -> a :=
\x => x;
If a function is used recursively in its own body, the rec
keyword is necessary:
def rec prd add := \x y => case x of {
Z => y,
S(x) => S(add x y)
}
There are five categories of terms in Duo:
- Core constructs, which are independent of any concrete type.
- Producer introduction rules
- Consumer introduction rules
- Producer elimination rules
- Consumer elimination rules
Core constructs
Duo supports control operators in the form of the \(\mu\) and \(\tilde\mu\) constructs of the \(\lambda\mu\tilde\mu\) calculus. Both constructs use identical syntax.
def prd ex1 := mu x. #ExitSuccess;
def cns ex2 := mu k. #ExitSuccess;
Producer Introduction Rules
The rules for data and codata types are given separately
Data Types
Introduction rules for producers of codata types are just uses of the corresponding constructors:
data Nat : CBV { Z, S(Nat) };
def prd ex1 := S(Z);
Codata Types
Introduction rules for producers of codata types consists of copattern matching.
codata Pair : (+a : CBV, +b : CBV) -> CBV {
PiLeft(return a),
PiRight(return b)
};
def prd ex1 : Pair(Nat,Bool) := cocase
{ PiLeft(*) => 5,
PiRight(*) => True
};
def prd ex2 : Pair(Nat,Bool) := cocase
{ PiLeft(k) => 5 >> k,
PiRight(k) => True >> k
};
Consumer Introduction Rules
The rules for data and codata types are given separately
Data Types
A consumer of a data type is introduced using pattern matching
def cns ex1 := case {
Z => #ExitSuccess,
S(x) => #ExitSuccess
};
Codata Types
Introduction rules for consumers of codata types use one of the available destructors.
def cns ex1 : NatStream := Head(mu x. #ExitSuccess);
Producer Elimination Rules
The rules for data and codata types are given separately
Data Types
TODO
Codata Types
TODO
Consumer Elimination Rules
The rules for data and codata types are given separately
Data Types
TODO
Codata Types
TODO
Commands
Commands are the syntactic category that actually gets executed when you run a program. The entry point to a program, the main function, is also a command.
The Exit Commands
The simplest commands are called #ExitSuccess
and #ExitFailure
, which are used to terminate the program.
Example 1:
We can write a simple command which just exits directly after it is called.
def cmd exitAtOnce := #ExitSuccess;
The Apply Command
The most important command is called "Apply". In the underlying logical calculus, the sequent calculus, it corresponds to the Cut rule. The user, on the other hand, doesn't need to know about this. Apply intuitively just combines a producer with a consumer of the same type.
We use the >>
symbol to express a cut, and write the producer on the left side of it, and the consumer on the right hand side.
Example 1:
The following program cuts the producer True
against a pattern match on booleans, and then exits the program.
def cmd exit := True >> case { True => #ExitSuccess; False => #ExitSuccess };
IO Actions
There currently are only two IO actions provided; #Print
and #Read
.
The following program uses both to read in two numbers from the console, and to print the output back to the console.
import Prelude;
def rec prd addA := \x => \y => case x of { Z => y
, S(z) => addA z S(y)
};
def cmd main := Read[ mu x. #Read(mu y. Print(addA x y, #ExitSuccess));
LSP Features
Further Reading
Duo is based on the work of many other researchers. The general programming style and the syntax are inspired by languages from the ML family (SML, Miranda, Haskell, OCaml, Agda, Idris, Coq). Below, we include some further reading on specific aspects of the Duo language.
Algebraic Subtyping
Type inference uses the algebraic subtyping approach. More specifically, we use constraint-based type inference with subtyping constraints. We use a bi-unification algorithm to solve these constraints and type automata to simplify the resulting types. We relied heavily on the following resources describing this approach.
- "Algebraic Subtyping", the PhD thesis of Stephen Dolan.
- "The Simple Essence of Algebraic Subtyping" by Lionel Parreaux.
- "Polymorphism, Subtyping and Type Inference in MLsub" by Stephen Dolan and Alan Mycroft.
Codata Types
Codata types have been known for a long time, but they haven't caught on in a popular programming language yet. We provide some standard references on codata types below.
- "Codatatypes in ML" by Tatsuyo Hagino is the canonical early reference to codata types.
- "Codata in Action" by Paul Downen, Zachary Sullivan, Zena Ariola and Simon Peyton Jones provides a good motivation for, and introduction to, codata types.
- "Copatterns: programming infinite structures by observations" by Andreas Abel, Brigitte Pientka, David Thibodeau and Anton Setzer introduced copattern matching as a convenient and principled syntactic construct to work with codata types, similar to how pattern matching works for data types.
- "Automatic Refunctionalization to a Language with Copattern Matching: With Applications to the Expression Problem" by Tillmann Rendel, Julia Trieflinger and Klaus Ostermann show how data and codata types correspond to the two different axes of extensibility in the expression problem, and how defunctionalization and refunctionalization can be used to automatically transform data intro codata types, and vice versa. Two follow-up papers, "Decomposition Diversity with Symmetric Data and Codata", and "Dualizing Generalized Algebraic Data Types by Matrix Transposition" extend this approach to nested and polymorphic programs, respectively. We plan to provide these transformations as editor actions for the Duo language.
Producer/Consumer Duality
The duality between producers and consumers of a type corresponds, via the Curry-Howard isomorphism, to the duality between proofs and refutations in logic. A proof system which exploits this duality of proofs and refutations is the Sequent Calculus, developed by Gentzen at the same time as his more popular relative, natural deduction. At its core, Duo is based on a term assignment system for the sequent calculus, the \(\lambda \mu \tilde\mu\) calculus by Pierre-Louis Curien and Hugo Herbelin. In this term assignment system, consumers correspond to a first-class notion of continuations. We only cite some of the more important references of the extensive literature on term-assignment systems for the sequent calculus.
- TODO