# Labeling Recipes with Logistic Regression

*Part 1 - Introduction and Mathematical Theory*

**Python code in this project: code.txt**

In recent years, online recipe repositories have exploded in both size and popularity. This gave me a very nice opportunity to combine two of my interests. In this project, I will use logistic regression to assign breakfast, lunch, or dinner labels to a set of recipes compiled from www.epicurious.com.

There are two main uses for this. First, by looking at the most important features, we can learn more about consumer food preferences. Second, of the 15,710 recipes in the dataset, only 3,337 are labelled as breakfast, lunch, or dinner; our classifier could help label the rest.

## Logistic Regression

Suppose we want to predict a label \( \hat{y}_i \) based on input features \( \{ x_1,...,x_n \} \). If we use linear regression, the model that we are fitting to the data is,

\[ \hat{y}_i = \beta_0 + \beta_1 x_1 + ... + \beta_n x_n. \]

The problem is that the right-hand side can take any real value, while a label (or the probability of a label) must stay between \( 0 \) and \( 1 \).

To get around this problem, we could instead try to fit the logistic regression model,

\[ \hat{p}_i = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + ... + \beta_n x_n)}}, \]

where \( \hat{p}_i \) is the fitted probability that the label \( \hat{y}_i \) equals \( 1 \).
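To see the difference concretely, here is a minimal sketch (with hypothetical coefficients \( \beta_0 = -0.5 \), \( \beta_1 = 1.2 \) for a single feature) comparing the two predictions. The linear prediction escapes the \( (0,1) \) interval for extreme inputs, while the logistic prediction never does:

```python
import math

# Hypothetical fitted coefficients (beta0, beta1) for a single feature
beta0, beta1 = -0.5, 1.2

def linear_prediction(x):
    """Linear regression prediction: unbounded."""
    return beta0 + beta1 * x

def logistic_prediction(x):
    """Logistic regression prediction: always inside (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))

for x in (-5.0, 0.0, 5.0):
    print(x, linear_prediction(x), logistic_prediction(x))
```

At \( x = 5 \) the linear prediction is \( 5.5 \), which cannot be a probability, while the logistic prediction remains a valid probability.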

In my view, this is a great example of the saying "all models are wrong, but some are useful".

## Deriving the Logit Model

The main motivation for the logit model was the desire to apply linear regression methods to probabilities. The following train of thought might approximate what the creators of the model had in mind.

- We want the fitted probability \( \hat{p}_i \) to be dependent on input \( \{ x_1,...,x_n \} \), in a way that is somehow linear.
- We want \( \hat{p}_i \) to be within the interval \( (0,1) \).
- We want to incorporate diminishing returns, so that it takes a bigger change to make a large \( \hat{p}_i \) larger, or a small \( \hat{p}_i \) smaller.
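The logistic function satisfies all three requirements. A quick numerical check (a sketch, using only the standard library) shows the second and third properties directly: outputs never leave \( (0,1) \), and the same step in the linear predictor moves the probability far less when it is already near \( 0 \) or \( 1 \):

```python
import math

def sigmoid(z):
    """Logistic function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Bounded output: even extreme inputs stay strictly inside (0, 1)
print(sigmoid(-20), sigmoid(20))

# Diminishing returns: a unit step in z near the middle moves p by ~0.23,
# but the same unit step far from the middle moves p by only ~0.01
print(sigmoid(1) - sigmoid(0))
print(sigmoid(5) - sigmoid(4))
```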

The "best" set of parameters \( \{ \beta_0, \beta_1, ..., \beta_n \} \) is defined to be the one that maximizes the probability of generating our data. This is known as maximum likelihood estimation. However, there is no closed-form solution for these parameters; numerical methods such as Newton-Raphson have to be used.
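For the curious, Newton-Raphson applied to the logistic log-likelihood takes a compact form, because the gradient and Hessian have simple closed expressions. The following is a sketch, not production code: it assumes `X` already includes a column of ones for the intercept and that the problem is well-conditioned (in practice one would use a library such as scikit-learn or statsmodels):

```python
import numpy as np

def fit_logistic_newton(X, y, n_iter=25):
    """Fit logistic regression coefficients by Newton-Raphson.

    X: (m, n+1) design matrix with a leading column of ones.
    y: (m,) array of 0/1 labels.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # current fitted probabilities
        w = p * (1.0 - p)                     # weights from the Hessian
        gradient = X.T @ (y - p)              # score vector of the log-likelihood
        hessian = X.T @ (X * w[:, None])      # observed information matrix
        beta += np.linalg.solve(hessian, gradient)
    return beta
```

Each iteration solves one weighted least-squares problem, which is why this procedure is also known as iteratively reweighted least squares.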

Why is logistic regression considered a linear model? Because the predicted label \( \hat{y}_i \) is set to \( 1 \) if and only if \( \hat{p}_i > 0.5 \), and \( \hat{p}_i > 0.5 \) if and only if \( \beta_0 + \beta_1 x_1 + ... + \beta_n x_n > 0 \). The decision boundary \( \beta_0 + \beta_1 x_1 + ... + \beta_n x_n = 0 \) is a linear equation in \( \{ x_1,...,x_n \} \), which is why the model is considered linear.

## Comparison with Linear Regression

Why not just use linear regression? Well, I gave three reasons in the previous section. Another possible reason could be the desire to obtain a closer fit to the data, as shown by the figure below.

With the introduction and mathematical foundations out of the way, we can start working with the data in part 2 of this article.