Saturday, March 22, 2008
Sunday, March 16, 2008
Types at reddit
I have created a subreddit for types. (Static analysis is also welcome.)
Posted by Jim Apple at 3/16/2008 09:31:00 PM 1 comments
Friday, February 22, 2008
Types, named (was: "Name that type!"), plus two more questions
Here are the answers to the previous quiz.
The first datatype holds any factorial number of values. Here it is again, with one small change and a few de-obfuscations:
> data FacHelp (f :: * -> *) (y :: *) = > Stop y > | More (FacHelp (OneMore f) (f y)) > newtype OneMore (f :: * -> *) (y :: *) = > OneMore (y, f y) > newtype Identity y = Identity y > type Factorial y = FacHelp Identity y
Here's that datatype again, renamed:
> data LongList x = Two x x > | AndAnother x (LongList x) > data Composite x = Composite (LongList (LongList x))Update:
Chung-chieh Shan pointed out to me that I got that datatype wrong. (The other text above is correct, I think.) Anyway, here's his counterexample:
> test :: Composite Char > test = Composite (Two (Two 'a' 'b') > (AndAnother 'c' (Two 'd' 'e')))This is a correct (I think!) implementation of a container that can't be of a prime size:
> data Product f g a = Times (f (g a)) > | LeftIncr (Product (OneMore f) g a) > | RightIncr (Product f (OneMore g) a) > newtype Pair a = Pair (a,a) > type Comp = Product Pair PairEnd update
The third datatype uses the type H
, which encodes containers with a square number of elements. H
is based on the idea that the sequence of partial sums of the sequence of odd natural numbers is the same as the sequence of squares. That is, {0, 1, 1+3, 1+3+5, …} = {02, 12, 22, 32, …}
We can now use Lagrange's four-square theorem to see that question 3(b) is, strangely, a container that can hold any number of elements. An extension of this by Legendre showed [cite] that every number not of the form 4k(8m + 7) is representable by the sum of three squares, and this answers question 3(a).
Here are both parts again, renamed and cleaned up a bit:
> data SqHelp x y = End > | Extra x (SqHelp (y,y,x) y) > type Square a = SqHelp a a > data Legendre x = Legendre (Square x) (Square x) (Square x) > data Lagrange x = Lagrange (Square x) (Legendre x)Here are two more questions:
4. Can we write something analogous to 3(a) in a more direct way?
5.
> data M a b = N b > | O (M (a,b) a) > newtype P a = P (M a a)
Posted by Jim Apple at 2/22/2008 09:56:00 PM 4 comments
Labels: GADTs, nested types
Name that type! (nested types edition)
If we want to write a type for binary trees in Haskell, it is easy to do so:
> data BinaryTree a = Leaf > | InternalNode (BinaryTree a) a (BinaryTree a)The structure of the type guarantees certain things about its inhabitants. For instance, every node has exactly two children. If we want to guarantee stronger invariants, we can change the shape of the type. We can write a a type for complete binary tree (that is, binary trees in which every leaf is at the same depth), as
> data CompleteBinaryTree a = Leaves > | NonLeaves a (CompleteBinaryTree (a,a))The argument in the recursive part of this datatype is not the same as it is in the declaration. Datatypes with this property are called nested or non-regular or heterogeneous or non-uniform. We can express all sorts of invariants with these datatypes. Here's another example:
> data AList a b = Nil > | Cons a (AList b a)If we want to manipulate lists that alternate between two types
a
and b
, we could use the
type [Either a b]
, but this doesn't ensure that we don't
have two a
s or two b
s in a row. We could,
instead, use the type [(a,b)]
, but this would only allow
us to create lists of even length. By switching the position of the
parameters in the recursion in AList
, we can represent
alternating lists of any length.
Here are some other examples of data structure invariants we can ensure with nested types:
- Cyclic data structures (mirror)
- Many different types of lists with
fast
cons
,snoc
, concatenation, and indexing, including - Red-black trees (a paper, code for that paper), AVL trees (mirror, mirror 2)
- Square matrices, triangular matrices (in various works by Ralph Matthes), Toeplitz matrices, rectangular matrices (see "Manufacturing Datatypes", above)
- Efficient-merge heaps (see Okasaki's thesis & book, above), Braun trees, left-complete trees (see "Manufacturing Datatypes", above)
- Binomial queues (mirror, mirror 2)
- Tries (mirror, mirror 2, code, code mirror, old version, old version mirror) (See also Okasaki's thesis & book)
- Well-formed terms in the locally-nameless lambda calculus (mirror)
> data A x y = B (x y) > | C (A (D x) (x y)) > newtype D x y = D (y,x y) > newtype K x = K x > type QuestionOne = A K > > data E x = F x x > | G x (E x) > data QuestionTwo x = Q2 (E (E x)) > > data H x y = I > | J y x x (H (y,x) y) > type L = H () > data QuestionThreePartA x = Q3a (L x) (L x) (L x) > data QuestionThreePartB x = Q3b (L x) (QuestionThreePartA x)
Posted by Jim Apple at 2/22/2008 03:25:00 AM 7 comments
Labels: GADTs, Leibniz equality, nested types
Sunday, February 17, 2008
Leibniz and Liskov
Leibniz equality of two types a
and b
is the proposition that in any context, one can be substituted for the other:
f a -> f bA value of this type brings to the value level the proof that two types are equal. The only value of this type is the identity function.
The Liskov substitution principle is very similar to Leibniz equality, but only in the contravariant position: If a function takes an argument of type b
and a
is a subtype of b
, then the function must accept an argument of type a
. If f
is a type constructor that uses its argument only negatively, then a
is a subtype of b
if there is a value of type
f b -> f aFor example,
λx . x -> b
uses x
negatively, and any b -> b
is also trivially an a -> b
when a
is a subtype of b
. The only value of type f b -> f a
is the cast
function.
Let's write the kind for type constructors that use their argument negatively as * -(-)-> *
rather than just * -> *
, and the kind for type constructors that use their argument positively as * -(+)-> *
. Then, the following terms should validate our understanding that supertyping is dual to subtyping:
dual : ∀ a:* . ∀ b:* . ∀ g: * -(+)-> * . (∀ f: * -(-)-> * . f b -> f a) -> g a -> g b dual |a |b |g less bot = less [λx:* . g x -> g b] (λy:g b . y:g b) bot dual' : ∀ a:* . ∀ b:* . ∀ f: * -(-)-> * . (∀ g: * -(+)-> * . g a -> g b) -> f b -> f a dual' |a |b |f more top = more [λx:* . f x -> f a] (λy:f a . y:f a) topSo subtyping, like type equality, can be brought to the value domain. I now have two questions:
Posted by Jim Apple at 2/17/2008 03:02:00 PM 0 comments
Labels: Leibniz equality, Liskov substitutability
Wednesday, January 9, 2008
Extra type safety using polymorphic types as first-level refinements
This post is literate Haskell.
I will demonstrate a new technique for using polytypes as first-level refinement types. (mirror). The goal, as usual, is for types to better express program invariants and ensure programs are safe.
I'm going to demonstrate using the risers
function, as presented in Dana N. Xu's ESC/Haskell (mirror), which references Neil Mitchell's Catch.
> {-# OPTIONS -fglasgow-exts #-} > -- The LANGUAGE pragma is usually a pain for exploratory programming.
Below are the risers
functions as presented by Xu and
Mitchell. They are the same function, though slightly syntacticly
different. risers
returns the sorted sublists of a
list.
Risers has two properties that we are going to discuss:
- None of the lists in the returned value are empty
- If the argument is non-empty, the return value is also non-empty.
> risersXu :: (Ord t) => [t] -> [[t]] > risersXu [] = [] > risersXu [x] = [[x]] > risersXu (x:y:etc) = > let ss = risersXu (y : etc) > in case x <= y of > True -> (x : (head ss)) : (tail ss) > False -> ([x]) : ss > > risersMitchell :: Ord a => [a] -> [[a]] > risersMitchell [] = [] > risersMitchell [x] = [[x]] > risersMitchell (x:y:etc) = if x <= y > then (x:s):ss > else [x]:(s:ss) > where (s:ss) = risersMitchell (y:etc)
Neither one of these functions is obviously safe. Xu uses head and
tail and ESC/Haskell to create a proof of their safety. The unsafe part
of Mitchell's code is the where
clause, and Mitchell also presents a
tool to prove this safe.
Our goal will be to write this function in a safe way, with a type signature that ensures the two properties we expect from the function. We also aim to do this without having to change the shape of the code, only the list implementation we are using.
The present unsafe operations in risersXu and risersMitchell depend on
the second property of the function: non-null inputs generate non-null
outputs. We could write a type for this functions using a trusted
library with phantom types for branding (paper mirror). This technique (called lightweight static capabilities) can do this
and much else as well, but since clients lose all ability to pattern match
(even in case), risers becomes much more verbose. We could also write
a type signature guaranteeing this by using GADTs. Without using one
of these, incomplete pattern matching or calling unsafe head
and tail
on the result of the recursive call seems inevitable.
Here's another way to write the function which does away with the need for the second property on the recursive call, substituting instead the need for the first property that no lists in the return value are empty:
> risersAlt :: (Ord t) => [t] -> [[t]] > risersAlt [] = [] > risersAlt (x:xs) = > case risersAlt xs of > [] -> [[x]] > w@((y:ys):z) -> > if x <= y > then (x:y:ys):z > else ([x]):w > ([]:_) -> error "risersAlt"
The error is never reached.
Though ensuring the second property with our usual types seems tricky, ensuring the first is not too tough:
> type And1 a = (a,[a]) > > risersAlt' :: Ord a => [a] -> [And1 a] > risersAlt' [] = [] > risersAlt' (x:xs) = > case risersAlt' xs of > [] -> [(x,[])] > w@(yys:z) -> > if x <= fst yys > then (x,fst yys : snd yys):z > else (x,[]):w
It is now much easier to see that risers is safe: There is one pattern match and one case, and each is simple. No unsafe functions like head or tail are called. It does have three disadvantages, however.
First, the second property is still true, but the function type does not enforce it. This means that any other callers of risers may have to use incomplete pattern matching or unsafe functions, since they may not be so easy to transform. It is my intuition that it is not frequently the case that these functions are tricky to transform, but perhaps Neil Mitchell disagrees.
We could fix this by writing another risers function with
type And1 a -> And1 (And1 a)
, but this brings us to
the second problem: And1 a
is not a subtype
of [a]
. This means that callers of our hypothetical other
risers
function (as well as consumers of the output
from risersAlt'
) must explicitly coerce the results back
to lists.
Finally, if we are wrong about the first property, and risers does
return an empty list for some non-empty input i
, then for any x
,
risers (x:i)
is _|_
, while risersAlt'
(x:i)
is [(x,[])]
. Thus, the equivalence of these
two function definitions depends on the truth of the second property
on the first function, which is something we were trying to get out of
proving in the first place! Of course, if we're interested in the correctness of
risersAlt'
, rather than its equivalence with risersXu
or
risersMitchell
, then it is not to difficult to reason
about. But part of the point of this was to get they compiler to do some
of this reasoning for us, without having to change the shape of the
code.
Let's write more expressive signatures using something we may have noticed when using GHCi. The only value of type forall a. [a]
(excluding lists of _|_s) is []
. Every value of type forall a . Maybe a
is Nothing
, and every forall a . Either a Int
has an Int
in Right
. Andrew Koenig noticed something similar when learning ML (archived), and the ST monad operates on a similar principle. (paper with details about ST, mirror)
> data List a n = Nil n > | forall r . a :| (List a r)
The only values of type forall a . List a n
use the Nil
constructor, and the only values of type forall n . List a n
use the :|
constructor.
> infixr :| > > box x = x :| Nil () > > type NonEmpty a = forall n . List a n > > onebox :: NonEmpty Int > onebox = box 1 > > onebox' :: List Int Dummy > onebox' = onebox > > data Dummy > > -- This doesn't compile > -- empty :: NonEmpty a > -- empty = Nil ()
Unfortunately, we'll be forced to
Here is a transformed version of Mitchell's risers:
Since we can't put the recursive call in a where clause, we must use a case with some dead code. The type annotations are commented out here to show they are not needed, but uncommenting them shows that the recursive call really does return a non-empty lists, and so the
This type signature ensures both of the properties listed when introducing
Here we see that the type annotation isn't necessary to infer that
This is the first version of
With first-class existentials, this would look just like Xu's
For contrast, here is a GADT implementation of risers:
This is safe and has its safety checked by GHC. It also does not require existentials, though when using this encoding, many other list functions (such as filter) will.
Now here is a lightweight static capabilities version of risers:
There is a good bit of code that must go in the protected module, including two different functions for case dispatch. These functions are used instead of pattern matching, and make the definition of risers much more verbose, though I may not have written it with all the oleg-magic possible to make it more usable.
Out of the three, I think GADTs are still the winning approach, but it's fun to explore this new idea, especially since Haskell doesn't have traditional subtyping.NonEmpty a
is a subtype of List a x
for all types
x
.
> data Some fa = forall n . Some (fa n)
>
> safeHead :: NonEmpty a -> a
> safeHead x = unsafeHead x where
> unsafeHead (x :| _) = x
>
> safeTail :: NonEmpty a -> Some (List a)
> safeTail x = unsafeTail x where
> unsafeTail (_ :| xs) = Some xs
Some
and un-Some
some values, since Haskell does not have first-class existentials, and it takes some thinking to see that safeHead
and safeTail
are actually safe.
> risersMitchell' :: Ord a => List a n -> List (NonEmpty a) n
> risersMitchell' (Nil x) = (Nil x)
> risersMitchell' (x :| Nil _) = box (box x)
> risersMitchell' ((x {- :: a -}) :| y :| etc) =
> case risersMitchell' (y :| etc) {- :: NonEmpty (NonEmpty a) -} of
> Nil _ -> error "risersMitchell'"
> s :| ss -> if x <= y
> then (x :| s) :| ss
> else (box x) :| s :| ss
Nil
case really is dead code.
risers
. The key to the non-empty-arguments-produce-non-empty-results property is that the variable n
in the signature is used twice. That means applying risersMitchell'
to a list with a fixed (or existential) type as its second parameter can't produce a NonEmpty
list.
> risersXu' :: Ord a => List a r -> List (NonEmpty a) r
> risersXu' (Nil x) = Nil x
> risersXu' (x :| Nil _) = box (box x)
> risersXu' (((x :: a) :| y :| etc) :: List a r) =
> let ss = risersXu' (y :| etc)
> in case x <= y of
> True -> case safeTail ss of
> Some v -> (x :| (safeHead ss)) :| v
> False -> (box x) :| ss
risers
applied to a non-empty list returns a non-empty list. The value ss
isn't given a type signature, but we can apply safeHead
and safeTail
. The case matching on safeTail
is the pain of boxing up existentials.
risers
with a type signature that gives us the original invariant Xu and Mitchell can infer, as well as calling no unsafe functions and containing no incomplete case or let matching. It also returns a list of lists, just like the original function, and has a definition in the same shape.
risers
(modulo built-in syntax for lists). With let binding for polymorphic values, risersMitchell'
would look just like Mitchell's original risers, but be safe by construction. Let binding for polymorphic values would also allow non-trusted implementations of safeTail
and
safeHead
to be actually safe.
> data GList a n where
> GNil :: GList a IsNil
> (:||) :: a -> GList a m -> GList a IsCons
>
> infixr :||
>
> data IsNil
> data IsCons
>
> gbox x = x :|| GNil
>
> risersMitchellG :: Ord a => GList a n -> GList (GList a IsCons) n
> risersMitchellG GNil = GNil
> risersMitchellG (c :|| GNil) = gbox $ gbox c
> risersMitchellG (x :|| y :|| etc) =
> case risersMitchellG (y :|| etc) of
> -- GHC complains, "Inaccessible case alternative: Can't match types `IsCons' and `IsNil'
> -- In the pattern: GNil"
> -- GNil -> error "risers"
> s :|| ss ->
> if x <= y
> then (x :|| s) :|| ss
> else (gbox x) :|| s :|| ss
> -- module Protected where
>
> -- Export type constructor, do not export value constructor
> newtype LWSC b a = LWSC_do_not_export_me [a]
>
> data Full
> type FullList = LWSC Full
>
> data Any
>
> lnil :: LWSC Any a
> lnil = LWSC_do_not_export_me []
>
> lcons :: a -> LWSC b a -> FullList a
> lcons x (LWSC_do_not_export_me xs) = LWSC_do_not_export_me (x:xs)
>
> lwhead :: FullList a -> a
> lwhead (LWSC_do_not_export_me x) = head x
>
> data Some' f n = forall a . Some' (f a n)
>
> lwtail :: FullList a -> Some' LWSC a
> lwtail (LWSC_do_not_export_me a) = Some' (LWSC_do_not_export_me (tail a))
>
> deal :: LWSC b a -> LWSC Any c -> (FullList a -> FullList c) -> LWSC b c
> deal (LWSC_do_not_export_me []) _ _ = LWSC_do_not_export_me []
> deal (LWSC_do_not_export_me x) _ f =
> case f (LWSC_do_not_export_me x) of
> LWSC_do_not_export_me z -> LWSC_do_not_export_me z
>
> nullcase :: LWSC b a -> c -> (FullList a -> c) -> c
> nullcase (LWSC_do_not_export_me []) z _ = z
> nullcase (LWSC_do_not_export_me x) _ f = f (LWSC_do_not_export_me x)
>
> -- module Risers where
>
> lbox x = lcons x lnil
>
> risersXuLW :: Ord a => LWSC b a -> LWSC b (FullList a)
> risersXuLW x =
> deal x
> lnil
> (\z -> let x = lwhead z
> in case lwtail z of
> Some' rest ->
> nullcase rest
> (lbox (lbox x))
> (\yetc ->
> let y = lwhead yetc
> etc = lwtail yetc
> ss = risersXuLW yetc
> in if x <= y
> then case lwtail ss of
> Some' v -> lcons (lcons x $ lwhead ss) v
> else lcons (lbox x) ss))
Posted by Jim Apple at 1/09/2008 11:02:00 PM 1 comments
Labels: GADTs, lightweight static capabilities, polymorphism, refinement types
Sunday, January 6, 2008
POPL 2008 and affiliated events
This week is POPL and affiliated events. I'll be at the first and third days of VMCAI, the Coq tutorial, and POPL itself. Unfortunately, I'll have to miss PEPM.
Posted by Jim Apple at 1/06/2008 05:36:00 PM 0 comments
Labels: conferences