4. Types
4.1. Type variables
Types in Haskell may be parameterized over another type, which is not known at the time of defining the former type. This system is very similar to generics in many languages, but much more powerful as the type information is fully preserved.
The naming rules for type variables are the same as for Bindings. [1]
The whole type is then written as first the type name followed by a space and then followed by the parameters, also space separated. This is also called juxtaposition.
An example of a parameterized type is the Either a b type.
The name of the type is Either and it is parameterized by a type variable a and a type variable b.
Note that there is no special significance to the names of the type variables themselves.
It would be semantically equivalent to call the type Either one the_other.
Only if we were to name both variables the same would we change the meaning, because Either a a would mean that both types Either is parameterized over are the same type.
We have now seen the type in its generic form.
By instantiating the type variables we can create a concrete form, for instance Either Int String or Either Bool Char.
Note that Either a b does not mean that a and b have to be distinct, but they are allowed to be.
Either Int Int is also a perfectly valid concrete form of Either a b.
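To make the generic and the concrete form more tangible, here is a minimal sketch using the Either type that the Prelude already exports. The names intOrString, sameOnBothSides and collapse are made up for this illustration, and the case expression used in collapse is only explained later in this chapter.

```haskell
-- Either is exported by the Prelude, so no definition or import is needed.

-- A concrete form where the two parameters differ
intOrString :: Either Int String
intOrString = Left 42

-- A concrete form where both parameters happen to be the same type
sameOnBothSides :: Either Int Int
sameOnBothSides = Right 7

-- Using the same variable twice demands that both sides agree,
-- which is what allows returning the content whichever side it is on
collapse :: Either a a -> a
collapse e =
  case e of
    Left x  -> x
    Right x -> x
```

With these definitions collapse sameOnBothSides is accepted, whereas collapse intOrString should be rejected by the compiler, because Int and String cannot both be the single type a.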
At compile time all of the type parameters must be known, i.e. only concrete forms of types are allowed. Where you do not write them down yourself, the compiler will infer the concrete values of the type variables for you.
Note that if you wish to annotate a value whose type uses type variables you will have to fill in the concrete types for those variables unless they are unused. An example:
As you can see from the definition of Either, each type variable is used in one of the constructors.
If you now create one of these values and wish to annotate it with a type you have to fill in the respective type variable.
However you do not have to fill in the other variable.
For instance if you create a Left value, let's say containing a String, it does not matter what type b is in the resulting Either, because the Left constructor only uses the a variable. The compiler will therefore allow you to write anything for b, including a type variable (which means it can be anything).
If however you have an expression like the if below, which may return either Left or Right, you have to fill in both types properly.
data Either a b = Left a | Right b
x :: Either String b
x = Left "A String"
y :: Either a Int
y = Right 1
x_and_y :: Either String Int
x_and_y = if someBool then x else y
We could also have annotated x and y with concrete types for the respective other variable, however in that case we must make it the type the if expression expects it to be or we get a type error.
Therefore it is usually advisable to leave the type unspecified unless necessary.
data Either a b = Left a | Right b
-- these definitions are ok
-- because the type lines up with the if expression
x :: Either String Int
x = Left "A String"
y :: Either String Int
y = Right 1
-- these definitions are problematic
-- they would cause a type error
x :: Either String Bool
x = Left "A String"
y :: Either (Either String String) Int
y = Right 1
x_and_y :: Either String Int
x_and_y = if someBool then x else y
If you don’t know the type of an expression but wish to annotate it, or you don’t know the value of one of the type variables, you can use a so called “type hole” (GHC’s documentation calls this a wildcard in a partial type signature) to have the compiler figure it out for you.
If you annotate an expression with _ the compiler will throw an error and tell you what it infers the type for _ to be.
You can use multiple _ at the same time, each of which will cause a compile error with information about the inferred type.
This can be used for full type signatures or even just parts of them, including type variables.
GHC generally tries to infer the most general type for you.
-- infer a full type signature
x :: _
x = Left "A String"
-- Infer a variable
y :: Either a _
y = Right 1
4.2. User defined types
Defining types in Haskell takes three forms.
4.2.1. Aliases
The type keyword allows us to define a new name for an existing type.
This can have two different purposes:
- It allows us to define shorter names for long types.
For instance
type MakerM a = StateT (ALongStateName String Bool (HashMap Text Int)) (LoggingT IO) a
- We can abstract our API from the concrete type.
If our program uses a Map like structure for instance, but we are not sure yet that we want to stick with a concrete Map type we might write the following:
type MyMap key value = HashMap key value
-- or (omitting the `value` variable)
type MyMap key = HashMap key
-- or (omitting both the `value` and `key` variable)
type MyMap = HashMap
We can then later replace it with a different map type if we like and we do not need to change all of our type signatures.
type MyMap = Map
As you can see from these examples, type aliases support polymorphism via type variables just like function signatures do, and a type that takes parameters can be partially applied, much like a function.
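The following sketch shows both properties of aliases using only types from the Prelude, so that it does not depend on any map library. The aliases Assoc and Result, the function findAge and the sample data are hypothetical and chosen purely for illustration.

```haskell
-- Hypothetical aliases for illustration; none of them come from a library.

-- A polymorphic alias with two type variables
type Assoc key value = [(key, value)]

-- A partially applied alias: the second parameter of Either is left open
type Result = Either String

findAge :: Assoc String Int -> String -> Result Int
findAge table person =
  case lookup person table of
    Just years -> Right years
    Nothing    -> Left ("no entry for " ++ person)

-- Annotated with the underlying type rather than the alias.
-- It is still accepted wherever an Assoc String Int is expected,
-- because an alias is only another name for the same type.
ages :: [(String, Int)]
ages = [("Marco", 31), ("Janine", 28)]
```

Because Assoc is just a name, swapping the underlying representation later would only require changing the alias, as long as the functions used on it (here lookup) exist for the new type as well.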
4.2.2. Algebraic datatypes
Algebraic datatypes are the “normal” user defined datatypes in Haskell. They are richer than datatypes from other languages such as Java classes or C structs in that each type can have more (or less) than one representation. Some modern languages such as Rust and Swift also support those types of data. They call them Enums.
A type is defined using the data keyword, followed by the name of the type, which must begin with an upper case letter (see also here), followed by an equal sign.
This is followed by any number of | separated constructor definitions.
data Coordinates = LongAndLat Int Int
data File = TextFile String | Binary Bytes
A constructor definition takes the form of first the constructor itself, followed by any number of type arguments, which are the types of the fields in the constructor. The naming constraints for the constructor are the same as for Types.[#type-operators]
Constructors serve two purposes.
- They are used, through normal function application, to construct a value of their type.
You can think of any constructor (like LongAndLat) as a function, which takes arguments according to the number and type of its fields and produces a value of its type.
LongAndLat :: Int -> Int -> Coordinates
These constructors can be used just like any other function, which includes partial application and being arguments to higher order functions.
LongAndLat 8 :: Int -> Coordinates
map (LongAndLat 9) [0,9,15] == [LongAndLat 9 0, LongAndLat 9 9, LongAndLat 9 15]
- They are used in a pattern match to deconstruct a value of their type and gain access to its fields. (See next section)
It is very important to know the difference between a type (name) and a constructor in Haskell. Also note that it is allowed for a type and a constructor with the same name to be in scope, as the distinction between the two can be made from the context in which they are used. Type names only ever occur in a place where a type can occur, such as in the definition of another type and in type signatures, whereas a constructor can occur in any expression.
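As a small sketch of both points, the hypothetical type below shares its name with its only constructor, and the constructor is then used like any other function. The deriving (Eq, Show) clause is not covered in this chapter; it is only there so that the values can be compared and printed.

```haskell
-- Point is both the type name and the constructor name;
-- the context decides which one is meant.
data Point = Point Int Int
  deriving (Eq, Show)

-- In a signature, Point is the type ...
origin :: Point
-- ... and in an expression, Point is the constructor
origin = Point 0 0

-- The constructor is an ordinary two-argument function
diagonal :: [Point]
diagonal = zipWith Point [1, 2, 3] [1, 2, 3]

-- Partial application fixes the first field
onLineFive :: [Point]
onLineFive = map (Point 5) [0, 1]
```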
4.2.3. Newtypes
Newtypes are basically a stricter version of the type alias.
To be more concrete, a newtype is a wrapper for another type which completely hides the wrapped type.
The syntax is very similar to a data definition, with two important restrictions.
- The newtype must have exactly one constructor.
- The constructor must have exactly one field.
What is so special about the newtype is that even though it may look like a data definition, the wrapper does not exist at runtime and thus has no runtime overhead.
It is typically used to impose some restrictions on the creation of a type.
Whereas aliases created with type may be used in just the same way as the type they alias, a newtype creates a completely new type, and the functions which work on the inner type do not work on the new type.
In the following example for instance we force the user to go through the createEmail function to construct an Email value.
If we used a type alias the user could simply pass a String to the sendEmail function, because it is just an alias, but types created with newtype are distinct from the type they wrap and thus this would cause a type error.
newtype Email = Email String
createEmail :: String -> Either String Email
createEmail str =
  if conformsToEmailStandard str
    then Right (Email str)
    else Left "This is not a valid email"
sendEmail :: Email -> String -> IO ()
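The example above leaves conformsToEmailStandard and the body of sendEmail open. A self-contained variant might look like the sketch below. The validity check is a deliberately naive stand-in (it only looks for an '@' character) and sendEmail merely prints, so both are assumptions made for illustration and not a real implementation.

```haskell
newtype Email = Email String
  deriving (Eq, Show)

-- Hypothetical stand-in for a real validity check:
-- it only looks for an '@' character.
conformsToEmailStandard :: String -> Bool
conformsToEmailStandard str = '@' `elem` str

createEmail :: String -> Either String Email
createEmail str =
  if conformsToEmailStandard str
    then Right (Email str)
    else Left "This is not a valid email"

-- Placeholder body: prints the message instead of sending anything
sendEmail :: Email -> String -> IO ()
sendEmail (Email address) body =
  putStrLn ("To " ++ address ++ ": " ++ body)

-- sendEmail "someone@example.com" "hi" would be rejected by the compiler,
-- because a String is not an Email.
```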
4.2.4. Using type variables
To use a type variable in a type you are defining yourself there is a very simple rule. You may use as many type variables as you like. Any type variable you use on the right side of the equal sign must also occur on the left side. Basically on the left you declare which variables the type is abstracted over and on the right you may use it as a type for your fields. [3]
Some examples:
data Maybe a = Just a | Nothing
data Either a b = Left a | Right b
newtype SetWrapper a = SetWrapper (Set a)
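The rule can also be tried out on types of your own. The sketch below defines two hypothetical parameterized types, Pair and Tree, together with one function each. The function definitions use pattern matching on the arguments, which is explained in the section on the case construct, and the deriving clause is only there to allow comparing and printing values.

```haskell
-- Both variables are declared on the left and used on the right
data Pair a b = Pair a b
  deriving (Eq, Show)

-- A type may refer to itself; the variable a is passed along unchanged
data Tree a = Leaf | Node (Tree a) a (Tree a)
  deriving (Eq, Show)

-- The variables in a signature are instantiated per use, as with Either
swapPair :: Pair a b -> Pair b a
swapPair (Pair x y) = Pair y x

-- Counts the elements stored in a tree, whatever their type is
size :: Tree a -> Int
size Leaf = 0
size (Node left _ right) = size left + 1 + size right

-- data Broken a = Broken b   -- rejected: b is not declared on the left
```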
Aside
There are more ways to control type variables in Haskell using a generalised concept of algebraic datatypes.
4.3. The case construct
The case construct together with function application basically comprises everything which you can do in Haskell.
The case construct is used to deconstruct a value and gain access to the data contained within.
This is easiest to see with a user defined type:
data MyType = Constr1 Int
aValue = Constr1 5 :: MyType
theIntWithin =
  case aValue of
    Constr1 i -> i
theIntWithin == 5
Any Haskell expression is allowed in the case <expr> of head of the construct.
The body of the case expression is a number of matchclause -> expr pairs.
Each match clause is a combination of constructors and bindings for values. The expression to the right of the arrow may then use the values bound by these bindings.
A very simple case match (which does absolutely nothing) would be
case expr of
  x -> doSomething x
which is the same as doSomething expr.
We simply bind the expression to x.
However this is often used to create a default clause for a case match.
data MyType = Constr1 Int | Constr2 String
aValue = Constr1 5 :: MyType
theIntWithin =
  case aValue of
    Constr1 i -> i
    x -> 0
Match clauses are always matched in sequence, from top to bottom, until a matching clause is found.
A clause like x, which does not contain a constructor, will always match.
Therefore it is usually found as the last clause, often serving as a kind of default clause.
If the default clause does not need the value we often use _ as the binding to indicate that we do not use the value.
The case construct is an immensely powerful control structure, as all other control structures can be defined in terms of case and function application.
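For instance we can define our own version of if using case. It is named if' here because if itself is a reserved word and cannot be used as a function name.
if' cond a b =
  case cond of
    True -> a
    False -> b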
You can also pattern match on all primitive, built-in types such as Char, [], String, Int, Float and so on. Anything you can write as a literal you may use in a case pattern.
isC char = case char of
  'c' -> True
  _ -> False
isC 'l' == False
isC 'c' == True
is4 n = case n of
  4 -> True
  _ -> False
is4 4 == True
is4 0 == False
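The same works for the list and String types mentioned above. The following sketch uses two hypothetical functions, describe and isGreeting, to show list patterns and string literal patterns.

```haskell
-- List patterns: [] is the empty list, [x] a one-element list,
-- and (x : rest) any non-empty list split into head and tail
describe :: [Int] -> String
describe xs =
  case xs of
    []         -> "empty"
    [x]        -> "just " ++ show x
    (x : rest) -> show x ++ " and " ++ show (length rest) ++ " more"

-- String literals work as patterns too
isGreeting :: String -> Bool
isGreeting s =
  case s of
    "hello" -> True
    "hi"    -> True
    _       -> False
```

Note that the order of the clauses matters here: [x] has to come before (x : rest), because the latter also matches a one-element list.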
4.3.1. Different ways to write a case expression
Case expressions can either be written using indentation, or using semicolons and braces, in the same way we can with let.
Thereby we can use ; to omit newlines and {} to omit the indentation.
The following definitions are equivalent:
case expr of
  -- note the indent of the match clauses
  Constr1 field1 field2 -> resultExpr
  Constr2 f -> resultExpr2
case expr of
  Constr1 field1 field2 ->
    -- note the deeper indent for the result expression
    resultExpr
  Constr2 f ->
    resultExpr2
-- indent is replaced with semicolons and braces
case expr of { Constr1 field1 field2 -> resultExpr; Constr2 f -> resultExpr2 }
4.3.2. Case match in function definition
A very common pattern in Haskell is to have a function and then directly perform a case match on one or more of the arguments.
There is some syntactic sugar to make this more convenient.
If you define your function with the syntax where the arguments come before the = you can directly perform a pattern match on them there.
Multiple case options are achieved by defining the function once for each option.
Note that in this pattern match constructors with more than zero fields need to be parenthesized (otherwise how can the compiler distinguish between field bindings and the next argument?).
data MyType = Constr1 Int | Constr2 String
-- before
getTheInt :: MyType -> Int
getTheInt t =
  case t of
    Constr1 i -> i
    Constr2 _ -> 0
-- after
getTheInt2 :: MyType -> Int
getTheInt2 (Constr1 i) = i
getTheInt2 (Constr2 _) = 0
-- or, alternatively with a "_" default case
-- (given its own name so that both versions can live in the same file)
getTheInt3 :: MyType -> Int
getTheInt3 (Constr1 i) = i
getTheInt3 _ = 0
You can also match on multiple arguments at the same time. (I have aligned the arguments so you can better see the different patterns, this is only for readability and not necessary.)
addTheInts :: MyType -> MyType -> Int
addTheInts (Constr1 i1) (Constr1 i2) = i1 + i2
addTheInts (Constr1 i)  _            = i
addTheInts _            (Constr1 i)  = i
addTheInts _            _            = 0
4.4. Special types
There are some notable exceptions to the type naming rule.
Those are the list type, which is [] or [a], which means “a list containing elements of type a”, and the tuple type (a,b) for “a 2-tuple containing a value of type a and a value of type b”.
There are also larger tuples (a,b,c), (a,b,c,d) etc. [2]
These tuples are simply grouped data and very common in mathematics for instance.
Should you not be familiar with the mathematical notion of tuples it may help to think of a tuple as an unnamed struct where the fields are accessed by “index”.
And the last special type is the function type a -> b, which reads “a function taking as input a value of type a and producing a value of type b”.
Some examples for concrete instances of special types:
myIntBoolTriple :: (Int, Int, Bool)
myIntBoolTriple = (5, 9, False)
aWordList = ["Hello", "Foo", "bar"] :: [String] -- Note: A different way to annotate the type
-- Note: we can also nest these types
listOfTuples :: [(Int, String)]
listOfTuples =
  [ (1, "Marco")
  , (9, "Janine")
  ]
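The examples above cover tuples and lists. The function type can be annotated and nested in the same way, as the following sketch tries to show. All names in it (isSmall, applyToBoth, swap, checks) are hypothetical and chosen for illustration.

```haskell
-- A function type: takes an Int, produces a Bool
isSmall :: Int -> Bool
isSmall n = n < 10

-- Function types nest like the other special types:
-- the first argument is itself a function, the second a tuple
applyToBoth :: (a -> b) -> (a, a) -> (b, b)
applyToBoth f (x, y) = (f x, f y)

-- Tuples are taken apart by pattern matching on their positions
swap :: (a, b) -> (b, a)
swap pair =
  case pair of
    (x, y) -> (y, x)

-- A list whose elements are functions
checks :: [Int -> Bool]
checks = [isSmall, even]
```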
4.5. Record syntax
For convenience reasons there is some extra syntax for defining data types which also automatically creates some field accessor functions.
We can write the following:
data MyType =
  Constructor { field1 :: Int
              , field2 :: String
              }
This defines the type the same way as the other data construct, meaning we can pattern match as usual on the constructor.
theData = Constructor 9 "hello" :: MyType
theInt = case theData of
  Constructor i _ -> i
theInt == 9
But additionally it also defines two functions field1 and field2 for accessing the fields.
That is, it generates code similar to the following:
data MyType = Constructor Int String
field1 :: MyType -> Int
field1 (Constructor i _) = i
field2 :: MyType -> String
field2 (Constructor _ s) = s
Also the two field names field1 and field2 may be used in a special record update syntax to create a new record from an old one with altered field contents.
Additionally the record may be created with a special record creation syntax.
data MyType =
  Constructor { field1 :: Int
              , field2 :: String
              }
  deriving (Eq) -- needed for the == and /= comparisons below
v1 = Constructor 9 "Hello" :: MyType
-- record creation syntax
v2 = Constructor { field2 = "World", field1 = 4 } :: MyType
-- update syntax
v3 = v2 { field1 = 9 }
-- updating multiple fields at once
v4 = v2 { field1 = 9, field2 = "Hello" }
v1 == v4
-- old records are unchanged
v2 /= v3 && v3 /= v4
And finally it also enables a special record pattern match using the fields.
theData = Constructor 9 "hello" :: MyType
theInt = case theData of
  Constructor{ field1 = i } -> i
theInt == 9
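Putting the pieces of this section together, the sketch below uses a hypothetical Person record to show creation, accessor functions, update and record patterns in one place. The deriving clause is only there so that values can be compared and printed.

```haskell
-- Hypothetical record type for illustration
data Person = Person
  { name :: String
  , age  :: Int
  } deriving (Eq, Show)

people :: [Person]
people =
  [ Person { name = "Marco", age = 31 }
  , Person "Janine" 28            -- positional construction still works
  ]

-- Accessors are plain functions, so they can be passed to map
names :: [String]
names = map name people

-- Record update returns a new value and leaves the original untouched
birthday :: Person -> Person
birthday p = p { age = age p + 1 }

-- Record pattern: only the mentioned field is bound
isAdult :: Person -> Bool
isAdult (Person { age = a }) = a >= 18
```

One design consequence worth keeping in mind is that the accessors name and age live in the same namespace as all other functions, so two record types in one module cannot both declare a field called name without further language extensions.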
footnotes
[1] The naming convention in Haskell is camel case. Meaning in each identifier (type variable, type or binding) all words composing the name are chained directly, with each new word starting with an upper case letter, except for the first word, whose case is determined by the syntax constraints (upper case for types, lower case for type variables and bindings).
[2] The source file for tuples in GHC defines tuples with up to 62 elements. Below the last declaration is a large block of perhaps 20 more declarations which is commented out, with a note above saying “Manuel says: Including one more declaration gives a segmentation fault.”
[3] It is possible to declare type variables on the left and then not use them on the right. This is often used to tag types with other types, but this is a topic for later. There are also language extensions which let you use type variables which only occur on the right side, however this is a very advanced topic. For now we may simply assume that this is never necessary.