I am not sure if I can use the words binomial and binary and boolean as synonyms to describe a data attribute of a data set which has two values (yes or no). Are there any differences in the meaning on a deeper level?

Moreover, if I have an attribute with three possible values (yes, no, unknown), this would be an attribute of type polynominal. What other names exist for this type of attribute? Is it also called "symbolic"?

I am interested in the relation between the following attribute types: binary, boolean, binominal, polynominal (and alternative descriptions) and nominal.

SmallChess
user3352632

2 Answers

@SmallChess's answer is a good start, but there are some additional parts to the question.

Binary variables or binary data consist of data with the values 0 or 1, and no other values. We usually don't talk about "binary distributions", because only data, variables, or outcomes can be binary. A distribution might produce binary data, but is not itself binary because its parameters typically take on real values.

A binomial distribution is a distribution that produces binary data. In particular, with a single trial (the Bernoulli case), it is a random process that produces the value 1 with probability $p$, and the value 0 with probability $1-p$. Notice that although it produces binary data, it is not itself a kind of data, and is in fact characterized by a non-binary number ($p$).
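As a rough illustration of the difference between the distribution (described by a real number) and the binary data it produces, here is a minimal sketch using only the Python standard library. The function name `bernoulli_sample`, the value `p = 0.3`, and the seed are assumptions made for this example.

```python
import random

def bernoulli_sample(p, n_samples, seed=0):
    """Draw n_samples values in {0, 1}, each equal to 1 with probability p."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [1 if rng.random() < p else 0 for _ in range(n_samples)]

p = 0.3                                    # real-valued parameter (illustrative)
samples = bernoulli_sample(p, 10_000)      # the data: only 0s and 1s
observed_values = set(samples)             # distinct values that occur in the data
estimated_p = sum(samples) / len(samples)  # fraction of 1s, roughly p
```

The data contain only the two values 0 and 1, while the parameter recovered from them is a real number between 0 and 1.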

Boolean data takes on the values true or false. Often, but not always, these are stored as 0's and 1's. The distinction is that Boolean data may not be stored numerically. There might also be different expectations about how Boolean data should be processed (for instance, $\text{true} + \text{true} = \text{true}$, but $1 + 1 = 2$).
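The difference in processing expectations can be sketched in Python, taking "+" on Boolean values to mean logical OR as in Boolean algebra. The variable names and the yes/no answers are assumptions for this example. Note that Python itself treats `bool` as a subclass of `int`, so its own `+` operator behaves arithmetically even on Boolean values.

```python
# Boolean "addition" in the logical sense is OR: true + true = true.
logical_or = True or True

# Ordinary integer arithmetic: 1 + 1 = 2.
arithmetic_sum = 1 + 1

# Python's bool is a subclass of int, so "+" on bools is arithmetic, not logical.
python_bool_sum = True + True

# Boolean data need not be stored numerically, for example as yes/no strings.
answers = ["yes", "no", "yes"]               # illustrative raw values
as_bool = [a == "yes" for a in answers]      # explicit conversion to Boolean
```

Which behavior is appropriate depends on whether the attribute is meant as a truth value or as a number that happens to be 0 or 1.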

I am not aware of the term polynomial being applied to data. However, multinomial distributions (with a single trial) are probability distributions that produce 0 with probability $p_0$, 1 with probability $p_1$, 2 with probability $p_2$, and so on, producing $n$ with probability $p_n = 1 - \sum_{i=0}^{n-1} p_i$, for $n+1$ different values. Like binomial distributions, multinomial distributions are characterized by a set of real-valued numbers, and are distinct from the kind of data they generate.
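A minimal sketch of drawing single-trial samples from such a distribution, using `random.choices` from the Python standard library. The probabilities, sample size, and seed are assumptions for illustration; the last probability is derived as one minus the sum of the others.

```python
import random
from collections import Counter

first_probs = [0.5, 0.3]                  # p_0 and p_1 (illustrative values)
last_prob = 1 - sum(first_probs)          # p_2 is whatever probability remains
probs = first_probs + [last_prob]

rng = random.Random(0)                    # fixed seed so the sketch is repeatable
values = range(len(probs))                # the possible outcomes 0, 1, 2
draws = rng.choices(values, weights=probs, k=10_000)

counts = Counter(draws)
frequencies = [counts[v] / len(draws) for v in values]  # roughly equal to probs
```

The parameters are real numbers that sum to 1, while every draw is one of the integers 0, 1, or 2.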

Categorical data takes on values from a set of categories. The example you give (yes, no, unknown) is not strictly multinomial data, but could be generated from a multinomial distribution by mapping the values 0, 1 and 2 onto yes, no and unknown. Note again that categorical data might be non-numeric. Operations like addition might be nonsensical.
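The mapping from numeric outcomes to categories can be sketched as follows, using the three labels from the question. The weights, sample size, and seed are assumptions for this example.

```python
import random

# Map the numeric outcomes onto the category labels from the question.
labels = {0: "yes", 1: "no", 2: "unknown"}

rng = random.Random(1)                                   # fixed seed for repeatability
codes = rng.choices([0, 1, 2], weights=[0.6, 0.3, 0.1], k=8)
answers = [labels[c] for c in codes]                     # categorical, non-numeric data

# "Adding" two categories has no statistical meaning; in Python, "+" on
# strings is plain concatenation.
concatenated = "yes" + "no"
```

Once the data are stored as labels, arithmetic no longer applies: the `+` above only glues two strings together and says nothing about the underlying categories.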

Ordinal data isn't something you asked about, but arises when the values can be meaningfully ordered. For example, playing card ranks are easily mapped to the numbers 1-13, and comparisons keep a reasonable meaning when represented this way (e.g. A < 2 < 3 corresponds to 1 < 2 < 3).

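Nominal data, in the usual statistical sense, consists of named categories with no inherent order, so it is essentially another term for unordered categorical data. Literal numbers that mean exactly what they purport to mean, such as the number of cans of beer a customer purchased, are numeric (count) data rather than nominal data.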

John Doucette

Binomial is a distribution characterised by $p$, the probability of success in each independent trial, together with the number of trials. With a single trial, each sample you get from the distribution is a binary variable, 0 or 1.

SmallChess