Talk:Algebraic data type

WikiProject Computer science (Rated Start-class, Low-importance)
This article is within the scope of WikiProject Computer science, a collaborative effort to improve the coverage of Computer science related articles on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Start  This article has been rated as Start-Class on the project's quality scale.
Low  This article has been rated as Low-importance on the project's importance scale.
 To-do list for Algebraic data type: ADT = product + sum + recursion + parametric (last one seems to be left out of the lede)

Algebraic?

Can someone take a step back and add more "primer" to this? In particular, why is this called algebraic? What makes it that way?

What are some non-algebraic types that may provide the same features, or be used in competition with these?

What makes a type algebraic? When should I use the word "algebraic" in programming dialog?

Frankly, I don't care about (your favorite programming language). I only code in (some language you don't like). Please tell me how this applies to me.

-Austin Hastings 68.37.47.32 (talk) 03:20, 24 September 2010 (UTC)

Exactly, the article should explain why an algebraic data type is called algebraic. There are two good answers on Stackoverflow that also contain some useful references one could use in the article: [1] [2]
(An algebra is usually defined as a set and one or more operations on it. With algebraic data types there are two operations, sum (union) and product (struct) - the set is the set of data constructor symbols. Defined like this you can reuse results from algebra/category theory.)
--Gms (talk) 10:12, 27 October 2011 (UTC)

Tagged union

I believe this page and tagged union refer to more or less the same concept, but they could be merged usefully. Does anyone disagree with this?

Derrick Coetzee 16:37, 4 Apr 2004 (UTC)

I think that would be a bad idea. A tagged union is different from an algebraic data type, but they can be used to implement algebraic data types, though this article probably needs more generalization. Dysprosia 22:05, 4 Apr 2004 (UTC)

Yes, I agree that these deserve different articles. I plan to clean this one up. Brighterorange 04:40, 13 May 2005 (UTC)

Languages

Which languages support algebraic data types? Using What links here, I only see Haskell, ML, Nemerle, Hope, Pizza.

So Haskell uses : for list constructor operator and ML uses :: and they use the other for type annotations. I suppose it's ok to give both alternatives then. But do we need to give Nil and Cons without capital letters as that makes them look like functions? I know Lisp has nil and cons but then again Lisp doesn't have algebraic data types and constructors. If it's meant to be mathematical notation, I must say I'm not a fan of it either in programming-related articles.

My proposal would be to revert Nil and Cons to have capital letters. --TuukkaH 15:51, 15 February 2006 (UTC)

I just removed the following discussion from the article. See below for why.

Languages with type systems based upon System-F, such as OCaml or SML, may not use constructors directly in substitution of normal functions within a given expression. Languages with more expressive type systems based upon something similar to System-Fω (i.e., systems that use kinds, where a kind serves as a "type of a type"), such as Haskell, may however use constructors directly in substitution of a normal function. The latter case also implies then that constructors in these more expressive languages may be partially applied (i.e., curried), whereas the same is not true for languages in the former case.
For example, the following is completely acceptable in Haskell (System-Fω-based):
map :: (a -> b) -> [a] -> [b]
map Leaf [0 .. 9]

But in OCaml (System-F-based), this approach yields:
List.map : ('a -> 'b) -> 'a list -> 'b list
List.map Leaf [0;1;2;3;4;5;6;7;8;9];;
The constructor Leaf expects 1 argument(s),
but is here applied to 0 argument(s)

And so, in OCaml (System-F-based), one must wrap the Leaf constructor in a lambda abstraction, since it cannot be substituted for a normal function, such as List.map expects. To make it acceptable then, it must be modified like so:
List.map (fun x -> Leaf x) [0;1;2;3;4;5;6;7;8;9]


The problem is that map Leaf [0,1,2,3,4] is legal SML, even if it is not legal OCaml (I wouldn't know about the latter). It follows that the reason for its being illegal in OCaml has nothing to do with predicativity or the restriction to prenex polymorphism, which is what the above seems to mean by being "System-F-based" -- that in itself, of course, is somewhere between misleading and false. Is there, in fact, a "reason" why OCaml cannot treat datatype constructors as values of function type, or is that just the way the language happened to be designed? -- Cjoev 17:22, 28 March 2006 (UTC)

Here, Empty, Leaf and Node are the constructors. They are used much like functions in that they can be (partially) applied to arguments of the appropriate type. For instance, Leaf has the functional type Int -> Tree meaning that giving an integer as an argument to Leaf produces a value of the type Tree. As Node takes two arguments of the type Tree itself, the datatype is recursive.
A constructor application cannot be reduced (evaluated) like a function application though since it is already in normal form (there is no function definition for it). Operations on algebraic data types can be defined by using pattern matching to retrieve the arguments, for example, consider a function to find the depth of a Tree:
At least it was clear even if it turns out it wasn't factually completely accurate. --TuukkaH 18:26, 28 March 2006 (UTC)
I tried to pick a simple example, but it appears the example I picked was too simple to illustrate the issue I was raising.
I suspect that what is happening in the case of SML is that the compiler is simply wrapping the constructor in a lambda abstraction in much the same way I did manually for the O'Caml example. But this still does not imply that constructors are treated the same in SML and O'Caml as are functions, and this is directly due to issues with the type system.
What do you mean, "treated the same"? In one sense, of course they are not, because constructors can appear in patterns but functions can't (duh). In every other sense, SML "treats" a constructor exactly as if it were a function of the appropriate type. The question of what function types can be inhabited by constructors is not the same as how such a value is "treated" once it exists. Cjoev 19:27, 31 March 2006 (UTC)
Here is a much more complex example that should not be possible in SML without many additional lambda abstractions, because of this issue:
data Triple a b c
= MkTriple a b c
deriving Show

zipWith :: (a -> b -> c) -> [a] -> [b] -> [c]

foo = zipWith (\f x -> f x) (zipWith MkTriple [0..4] "abcd") ["foo","bar","baz","qux"]

... =>[MkTriple 0 'a' "foo",MkTriple 1 'b' "bar",MkTriple 2 'c' "baz",MkTriple 3 'd' "qux"]

bar = let f    =     MkTriple
f'   = f   0
f''  = f'  'a'
f''' = f'' "foo"
in f'''

... => MkTriple 0 'a' "foo"

baz = let f    =     MkTriple
f'   = f   f
f''  = f'  f
f''' = f'' f
in case f''' of
(MkTriple g h i) -> let g' = g 0 'a' "foo"
h' = h 1 'b' "bar"
i' = i 2 'c' "baz"
in f g' h' i'

... => MkTriple (MkTriple 0 'a' "foo") (MkTriple 1 'b' "bar") (MkTriple 2 'c' "baz")


Haskell can do this because it treats constructors the same as functions -- (i.e., constructors can be curried, and they can have an arity > 1, whereas in OCaml and SML, arity is always only 0 or 1, and multiple parameters must be expressed via tuples, which means constructors here are not as general as true functions). Again, this is related to the type system.
Well, yes, this is related to the type system in the sense that it is a feature of the language. I disagree that it is the result of some deep difference (e.g. ${\displaystyle F}$  versus ${\displaystyle F\omega }$ ) between the type systems as the section I removed from the article contended. Cjoev 19:27, 31 March 2006 (UTC)
The selection of text I replaced was wrong because it implied that constructors can be partially applied, but this is not universally true -- it is not true in the case of OCaml (which was given as one example of using ADT's). Secondly, the bit about constructors not reducing to functions, because of having no function definitions, etc., is also incorrect -- it again depends upon the type system, which is what I tried to clarify.
OK, so constructors in OCaml must be fully applied (taking your word for it -- I don't use OCaml myself). But the part about constructors "reducing" to functions or not makes no sense either to affirm or deny. In SML, a constructor is a function (except when it appears in a pattern), and it's a value so it doesn't have to reduce. In OCaml something slightly different is the case. I don't see how either the original example or your new code clarifies this. Cjoev 19:27, 31 March 2006 (UTC)
So, in short, I do not think my addition was wrong, but the example I chose was too simple to fully illustrate the issue, and in the case of SML, it appeared misleading because of the fact that the compiler most likely handles these very simple cases automatically.70.36.30.117 17:36, 31 March 2006 (UTC)
The following is legal SML code approximating the third part of your new example:
datatype ('a,'b,'c) triple = Triple of 'a * 'b * 'c

fun mktriple a b c = Triple(a,b,c)

val baz =
let
val f    =     mktriple
val f'   = f   f
val f''  = f'  f
val f''' = f'' f
in case f''' of
(Triple (g, h, i)) => let val g' = g 0 #"a" "foo"
val h' = h 1 #"b" "bar"
val i' = i 2 #"c" "baz"
in f g' h' i'
end
end

Yes, it's true that constructors in SML cannot be curried, but all you need to do is write the curried function yourself and you can use it wherever you like. The only obstacle is the value restriction, which is legitimately different between ML and Haskell but has nothing to do with constructors (does it?).
If I understand you correctly, the issues you raise are: (1) partial application of constructors is only possible if they are curried, which does not occur in either variant of ML; and (2) the parenthetical "there is no function definition for it" from the earlier version is meaningless and certainly does not explain "why" a constructor application is considered a normal form. Both of these are true, but I contend they have nothing to do with the presence or absence of kinds, with F-versus-F-omega, or with restrictions to predicative prenex polymorphism. Cjoev 19:27, 31 March 2006 (UTC)
What I am saying is that, due to the fact that Haskell uses System-Fω, constructors are treated quite differently than in SML and OCaml. The way that they are treated in Haskell makes them appear much more like ordinary functions, without the restrictions that are placed upon them in SML and OCaml. The original text made some claims that were incorrect with regards to the relationship between constructors and functions, and I attempted to clarify it. But the example I chose was admittedly misleading.
In something like System-Fω, where one has kinds, a "type" is not actually a type unless it is of kind *. Example:
data Triple a b c = MkTriple a b c  -- not a type, but kind: * -> * -> * -> *
type F            =     Triple      -- ...             kind: * -> * -> * -> *
type F'           = F   Int         -- ...             kind: * -> * -> *
type F''          = F'  Char        -- ...             kind: * -> *
type F'''         = F'' String      -- now a type,     kind: *

f :: Int -> Char -> String -> F'''
f = MkTriple

When one adds kinds, they modify the term language itself to allow for abstraction over constructors with *any* kind, not just kind *, or "type". This allows one to use constructors in essentially the same way as functions (incl. currying). This is because, via kinds, there is now a notion of what it actually means for a constructor to be partially applied (i.e., its "type" is still of some kind * -> ... -> *, rather than *). Can one "fake" treating constructors as functions by wrapping them in lambda abstractions? Sort of, yes. But it's not the same thing at a fundamental level. Furthermore, without something like System-Fω, you don't get application at the type level, as I showed above. 70.36.30.117 23:22, 31 March 2006 (UTC)
Ah, now I see that part of our failure to communicate is the overloading of the word "constructor". There are type constructors, or what SML/NJ's error messages call tycons, such as Triple, and then there are value constructors such as MkTriple. The former has kind * -> * -> * -> *; the latter is a term-level construct that (in Haskell) has type a -> b -> c -> Triple a b c (which has kind *). This type is universally quantified, but all the variables have kind * and so I maintain that it is not significantly different from the SML type 'a -> 'b -> 'c -> ('a,'b,'c) triple, which is the type of mktriple in my translation of your earlier example.
Partial application of type constructors is possible in SML, but you have to write it in a stupid way that looks like you've η-expanded everything (and combined with ML's postfix application for tycons ends up being mighty confusing):
type ('b,'c) F' = (int,'b,'c) triple
type 'c F'' = (char,'c) F'
type F''' = string F''
val f : int -> char -> string -> F''' = mktriple

As you point out, the real difference between ML and Haskell is the ability to write terms that quantify over kinds other than * (I'm taking your word that Haskell can do this), which appears to be lacking in ML (but see my next paragraph). However, I claim that (1) you still have not produced an example that does this (but you don't have to because I readily concede such examples exist), and (2) more importantly, since it's an issue of type constructors rather than value constructors it has nothing to do with algebraic datatypes per se and does not belong in this article. Maybe it belongs in Polymorphism (computer science).
Finally, I've been putting off saying this, but if you count the module system, then SML is Fω-based. (Think about the fact that a functor argument can have a parameterized type in it.) This is made very explicit in a number of papers by Bob Harper and others (I can provide references if you like) and will likely become even more evident in future versions of the language if proposals for extensions to SML with higher-order and/or recursive and/or first-class modules are ever adopted. The distinction between SML and Haskell is not F versus Fω, it's artificially-restricted-Fω versus full Fω. The restrictions are there for historical reasons to do with type inference; Haskell has since done away with them, but ML has not. Cjoev 01:09, 1 April 2006 (UTC)

ML-specific

This article appears to be talking about a concept specific to the ML family of languages. This should be mentioned somewhere near the top. --macrakis (talk) 16:57, 26 December 2008 (UTC)

technical

whoa. This is way beyond an average reader.

I'd like to make this understandable to guys like me, who live in the world of more applied languages like C++, Ruby, JavaScript, PHP. You say Pattern Matching and I say RegEx. Never used any of the languages you mention here, although I've heard of Haskell, and a lot of the concepts seem familiar. Don't roll your eyes and say it can't be done; it can.

Much of this seems like some of the symbolic algebra I did years ago. You give it "(x+1)^2", and instead of looking up the value of x, it would construct a tree like

• ^
• +
• variable x
• integer 1
• integer 2

Each node in the tree is like a constructor for a hypothetical calculation process, where the node 'integer 2' constructs the float number 2.000 into the calculation, etc. But the point of this pre-construction tree is to come up with other such trees, like x^2+2*x+1. Each node represents the potential for an object, but, by being unconstructed, you can do things not possible by calculation, like arrive at conclusions for all x, or prove things without roundoff error. More serious users work with linear algebra and functional values rather than just numbers, and these can only exist in pre-constructed form.

It would be nice to make an example, say in JavaScript, where you can make a char string of a JS expression, and then calculate it with eval(). All the while saying, "It's like if you could do this in JS, but without character strings". OsamaBinLogin (talk) 00:15, 24 March 2009 (UTC)

It may be a little late to comment on this, but:
• Yes, it is beyond the typical programmer. Haskell, ML, etc. are not less "applied" than C++ etc., but more mathematical. The languages you mention (even the object-oriented ones) are all imperative programming languages, meaning they say what to do rather than how mathematical functions are defined. The languages that have algebraic datatypes are called functional programming languages because they deal with mathematical functions — not the same as what C, C++, etc. call "functions". If you want to understand this article, you'll need to learn something about functional programming.
• The kind of "pattern matching" being talked about here is not the same as that done by regular expressions. RegExes are about matching text, i.e. parsing, but we are talking about matching forms of algebraic datatypes (in very specific ways, for very specific reasons). It's part of the way these languages work, not something they can be used to do.
• Having "heard of" Haskell is not enough to understand this article. You need to actually use it (or some other FP language). I would recommend reading the Haskell Wikibook, especially the "Beginner's Track".
• The "algebra" being talked about here is not elementary algebra, e.g. (x+1)^2 = x^2 + 2x + 1, but one of the more advanced kinds called abstract algebra. Where they study groups, rings, etc. The idea here is that we can think of ways of combining datatypes as algebraic operations, e.g. tuples can be thought of as Cartesian products.
• As far as I know, short of implementing a whole FP language from scratch, it is not possible to do this stuff in JavaScript.
--BlueGuy213 (talk) 00:16, 27 February 2012 (UTC)

Not technical enough

The article could benefit from some pointers to more detailed discussion.

I came here looking for some information about the formal underpinnings of the type systems in languages like Haskell, ML and Typed Racket, but the bulk of the article proceeds by examples (which aren't particularly helpful unless someone already groks Haskell or ML syntax, in which case they probably don't have much of a problem), save a brief lurch into formality in the 'Theory' section.

The article does cross-refer to Type theory, but that's a very high level introduction to types in the context of formal logic. The formal parts (ie the algebra) in the 'Theory' section are what I'm after, but they're not really explained in the cross-referred article. A pointer to a review article or textbook-level introduction would be perfect. I don't have any suggestions for that (that's what I came here looking for).

Overall, the article is indeed uneven. Type theory is technical, and of interest only to people who are already at least vaguely familiar with algebraically typed languages and want to dig more. It would therefore be useful to have, at the top, a remark this is not the article you're looking for, addressed to people who want to know the difference between dynamic and static types, say, and pointing to Type system. NormanGray (talk) 10:56, 17 August 2011 (UTC)

Explanation section

The explanation section mainly deals with the specifics of pattern matching but I think it's reasonable to assume the reader already knows what this is and/or how the Haskell code should be read. If they don't, they can read the article pattern matching (which is where the content under explanation could be moved to). The article is about algebraic data types after all; if you're not familiar with pattern matching, you should study that first before reading this article. Simeon (talk) 15:46, 5 April 2009 (UTC)

What makes it useful?

I'd like to see one or two more sentences, in the first paragraph (or perhaps a new second paragraph), explaining what makes the Algebraic data type useful. Why should I like it?

The current opening paragraph is an excellent descriptive "what" statement. It nicely summarizes what the Algebraic data type is.

The second paragraph gets right into technical details, and does so very well.

But there's something missing in between -- after the "what" statement, before the technical details, I'd like to see a sentence or two about how this is useful, and who might find it of interest and why.

I'm a programmer myself, this is exactly the kind of thing that catches my attention, but I'm not familiar with the subject so I'll leave the editing to someone else.

Karl gregory jones (talk) 03:35, 9 October 2010 (UTC)

Section on Implementation

This article could really use a section on how algebraic data types are implemented - that is, how they are represented in memory. I would start it, but I know almost nothing about the subject.

Anecdotally, I agree with the people that say this article is, if anything, not technical enough. It certainly isn't too technical. It reads too much like a tutorial or a textbook, as well. For a somewhat abstruse topic, though, it's a good start. Exercisephys (talk) 01:38, 18 July 2014 (UTC)

Is Minus unary or binary?

In the 'Examples' section, for the 'Expression' data type Minus is defined as

  Minus Expression Expression


yet it is used later in an expression as

  (Minus (Number 1))


It should probably defined as unary in type definition, to allow for negation. — Preceding unsigned comment added by 2620:0:1047:0:97A:A78A:439B:DC43 (talk) 10:22, 26 September 2016 (UTC)