# What is the difference between categorical and continuous datatypes?

Insights from "Deep Learning for Coders with fastai & PyTorch" and from around the world

When it comes to both your inputs and targets, knowing whether they are categorial or continuous guides how you represent them, the loss function you use, and the metrics you choose in measuring performance.

## What is a categorical datatype?

**Categorical** data "contains values that are one of a discrete set of choices," such as gender, occupation, day of week, etc.^{1}

### What if our **target** is categorical?

If your target/labels are categorical, then you have either a **multi-classification** problem (e.g., you are trying to predict a single class) or a **multi-label classification** problem (e.g., you are trying to predict whether your example belongs to zero, one, or more classes).

#### Multi-classification tasks

For multi-classification tasks, a sensible loss function would be cross-entropy loss (`nn.CrossEntropyLoss`), and useful metrics are likely to include error rate, accuracy, F1, recall, and/or precision, depending on your business objectives and the makeup of your dataset. For example, if you're dealing with a highly imbalanced dataset, choosing accuracy would lead to an inflated sense of model performance, since the model may be learning to just predict the most common class.

**Note:** What if you need to predict "None"? This is more real-world and covered nicely in Zach Mueller's Recognizing Unknown Images (or the Unknown Label problem).
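As a minimal sketch of the above (the logits and targets are made-up values for illustration), note that `nn.CrossEntropyLoss` expects raw logits and one class index per example:

```python
import torch
import torch.nn as nn

# Toy batch: 4 examples, 3 classes. Logits are raw (unnormalized) scores;
# nn.CrossEntropyLoss applies log-softmax internally, so don't softmax first.
logits = torch.tensor([[2.0, 0.5, 0.1],
                       [0.2, 1.5, 0.3],
                       [0.1, 0.2, 3.0],
                       [1.2, 0.1, 0.4]])
targets = torch.tensor([0, 1, 2, 0])  # one class index per example

loss = nn.CrossEntropyLoss()(logits, targets)

# A simple metric: accuracy (remember it can mislead on imbalanced data)
preds = logits.argmax(dim=1)
accuracy = (preds == targets).float().mean()
```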

#### Multi-label tasks

For multi-label tasks, a sensible loss function would be binary cross-entropy loss (BCE) (`nn.BCEWithLogitsLoss`), and useful metrics are likely to include F1, recall, and/or precision, depending on your business objectives and the makeup of your dataset. Notice that I didn't include error rate, or its opposite, accuracy, as multi-label datasets are generally highly imbalanced.
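A quick sketch of the multi-label setup (again with made-up values): targets become multi-hot float vectors rather than class indices, and each label gets an independent yes/no decision:

```python
import torch
import torch.nn as nn

# Multi-label: each example can belong to zero or more of 3 labels.
# Targets are multi-hot floats; nn.BCEWithLogitsLoss applies a sigmoid
# to each logit internally, treating every label independently.
logits = torch.tensor([[ 3.0, -2.0,  0.5],
                       [-1.0,  2.0, -3.0]])
targets = torch.tensor([[1.0, 0.0, 1.0],
                        [0.0, 1.0, 0.0]])

loss = nn.BCEWithLogitsLoss()(logits, targets)

# Threshold sigmoid(logit) at 0.5 (equivalently, logit > 0) to get predictions
preds = (logits > 0).float()
```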

### What if our **input** is categorical?

Categorical inputs are generally represented by an **embedding** (e.g., a vector of numbers). **Why?** Mostly because it gives your model the ability to provide a more complex representation of your category than a single number would.

For example, imagine that one of your inputs is day of week (e.g., Sunday, Monday, etc.) ... what does that mean? When combined with other inputs, it's likely that the meaning of it is going to be much more nuanced than a single number can represent, and so we'd like to use multiple learned numbers. This is what an embedding is.
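In PyTorch this is `nn.Embedding`: a lookup table mapping each category index to a learned vector. A minimal sketch for the day-of-week example (the embedding size of 4 here is an arbitrary choice; fastai picks one heuristically):

```python
import torch
import torch.nn as nn

# 7 categories (days of week), each mapped to a learned 4-dim vector.
day_emb = nn.Embedding(num_embeddings=7, embedding_dim=4)

days = torch.tensor([0, 6])  # e.g., 0=Sunday, 6=Saturday
vectors = day_emb(days)      # shape (2, 4): one learned vector per day
```

The vectors start out random and are updated by gradient descent along with the rest of the model, so "Sunday" can end up near "Saturday" if the data says weekends behave alike.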

## What is a continuous datatype?

**Continuous** data is numerical data that represents a quantity, such as age, salary, prices, etc.

### What if our **target** is continuous?

If your target/labels are continuous, then you have a regression problem. The most likely loss function you would choose is mean-squared-error loss (MSE) (`nn.MSELoss`), and your metric would likely be MSE as well.
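A short sketch with made-up predictions and targets, showing the loss and the square-root step that turns it into the more interpretable RMSE:

```python
import torch
import torch.nn as nn

preds   = torch.tensor([2.5, 0.0, 2.0, 8.0])
targets = torch.tensor([3.0, -0.5, 2.0, 7.0])

mse = nn.MSELoss()(preds, targets)  # mean of squared differences
rmse = mse.sqrt()                   # same units as the target, easier to read
```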

"... MSE is already a a useful metric for this task (although its' probably more interpretable after we take the square root)" ... the

RMSE(% fn 3 %}Note:For tasks that predict a continuous number, consider using

`y_range`

to constrain the network to predicting a value in the known range of valid values.^{2}
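Under the hood, `y_range` works by squashing the final activation through a sigmoid and rescaling it into the valid range (fastai calls this `sigmoid_range`). A minimal sketch of the idea:

```python
import torch

# Constrain raw activations to (lo, hi), e.g. a y_range of (0, 5).
def sigmoid_range(x, lo, hi):
    # sigmoid maps any real number into (0, 1); scale/shift into (lo, hi)
    return torch.sigmoid(x) * (hi - lo) + lo

raw = torch.tensor([-10.0, 0.0, 10.0])
constrained = sigmoid_range(raw, 0.0, 5.0)  # every value now lies in (0, 5)
```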

### What if our **input** is continuous?

In many cases there isn't anything special you need to do; in others, it makes sense to scale these numbers so they are in the same range (usually 0 to 1) as the rest of your continuous inputs. This process is called **normalization**.^{4} The reason you would want to do this is so that continuous values with a bigger range of values (say 0 to 1000) don't drown out those with a smaller range (say 0 to 5) during model training.

#### Normalization

**Note:**"When training a model, if helps if your input data is

*normalizaed*- that is, has a mean of 0 and a standard deviation of 1.

See How To Calculate the Mean and Standard Deviation — Normalizing Datasets in Pytorch

```
import torch
print('Example 1')
nums = torch.tensor([0, 50, 100], dtype=float)
print(f'Some raw values: {nums}')
# 1. calculate their mean and standard deviation
m = nums.mean()
std = nums.std()
print(f'Their mean is {m} and their standard deviation is {std}')
# 2. normalize their values
normalized = (nums - m) / std
print(f'Here are their values after normalization: {normalized}')
print('')
print('Example 2')
nums = torch.tensor([0, 5000, 10000], dtype=float)
print(f'Some raw values: {nums}')
# 1. calculate their mean and standard deviation
m = nums.mean()
std = nums.std()
print(f'Their mean is {m} and their standard deviation is {std}')
# 2. normalize their values
normalized = (nums - m) / std
print(f'Here are their values after normalization: {normalized}')
print('')
```

fastai supplies a `Normalize` transform you can use to do this ... "it acts on a whole mini-batch at once, so you can add it to the `batch_tfms` section of your data block" ... you need to pass to this transform the mean and standard deviation that you want to use; if you don't, fastai "will automatically calculate them from a single batch of your data" (p.241).

**Note:**"This means that when you distribute a model, you need to also distribute the statistics used for normalization." (p.242)

**Important:** "... if you're using a model that someone else has trained, make sure you find out what normalization statistics they used and match them" (p.242)

1. "Chaper 1: Your Deep Learning Journey". In *The Fastbook* p.46↩

3. Ibid. p.236. A good examle of how RMSE provides a reasonable metric for regression tasks is included on this page in reference to KeyPoint detection (e.g., detecting a point/coordinate, an x and y)↩

2. Ibid., p.47↩

4. Ibid., pp.241-42, 320 includes an extended discussion of the why, how, and where "normalization" is needed. ↩