Predicting out-of-sample tasks of known individuals - Mixed logit on panel data #38

Open
armoutihansen opened this issue Aug 10, 2022 · 3 comments

Comments

@armoutihansen

Hi @jhelvy,

First of all, thanks a lot for this contribution. I usually use Python, but logitr managed to solve some convergence issues I was having with xlogit on panel data.

My current question/issue revolves around the following: I have estimated a mixed logit model on a panel of individuals over a set of tasks/problems. Now suppose I have a separate panel data set containing the same individuals, for which I would like to make predictions. Using the (unconditional) estimated distribution over the parameters to make these predictions is then not optimal, since we already have additional information on these individuals from their prior choices. To be specific, let $g(\beta|\theta)$ be the population distribution of the parameters $\beta$, and let $L(i,t|\beta)=\frac{e^{\beta'X_{it}}}{\sum_j e^{\beta'X_{jt}}}$ be the probability of choosing $i$ in task $t$ conditional on $\beta$. Then, by Bayes' rule, the distribution over parameters conditional on having observed a sequence of choices $y$ is given by:

$$h(\beta|y,\theta)=\frac{P(y|\beta)g(\beta|\theta)}{P(y|\theta)}$$

where $P(y|\beta)= L(y_1,1|\beta)\times\dots\times L(y_T,T|\beta)$ is the probability of the individual's sequence of choices conditional on $\beta$, and $P(y|\theta)=\int P(y|\beta)g(\beta|\theta)d\beta$ is the unconditional probability. Based on this, and using draws $\beta^r$ from $g(\beta|\theta)$ to simulate the integrals, an individual's estimated probability of choosing $i$ in out-of-sample task $T+1$ is given by:

$$\tilde{P}(i, T+1|y,\theta)=\frac{\sum_{r}L(i, T+1|\beta^r)P(y|\beta^r)}{\sum_{r}P(y|\beta^r)}$$
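
The step from the conditional density to the simulated formula above can be spelled out as follows, assuming $\beta^r$, $r=1,\dots,R$, are draws from $g(\beta|\theta)$. The exact conditional probability integrates the logit formula over $h$:

$$P(i,T+1|y,\theta)=\int L(i,T+1|\beta)\,h(\beta|y,\theta)\,d\beta=\frac{\int L(i,T+1|\beta)\,P(y|\beta)\,g(\beta|\theta)\,d\beta}{\int P(y|\beta)\,g(\beta|\theta)\,d\beta}$$

Replacing each integral with an average over the $R$ draws, the $1/R$ factors in the numerator and denominator cancel, and the ratio reduces to $\tilde{P}(i,T+1|y,\theta)$ as written above.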

I should note that the above notation is from Revelt & Train (2000): "Customer-Specific Taste Parameters and Mixed Logit: Households' Choice of Electricity Supplier."
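
To make the formula concrete, here is a minimal sketch in plain Python (standard library only). It is not logitr's API and not an implementation from Revelt & Train. The function names, the single price coefficient, the normal distribution with `mu` and `sigma`, and all data values are made-up assumptions for illustration. In practice the draws would come from the estimated population distribution.

```python
import math
import random

def logit_probs(beta, task):
    """L(.,t|beta): logit probabilities for one task.
    `task` is a list of alternatives, each a list of attribute values."""
    utils = [sum(b * x for b, x in zip(beta, alt)) for alt in task]
    top = max(utils)  # subtract the max for numerical stability
    expu = [math.exp(u - top) for u in utils]
    total = sum(expu)
    return [e / total for e in expu]

def sequence_likelihood(beta, tasks, choices):
    """P(y|beta): product of the probabilities of the chosen alternatives.
    For long panels this product can underflow; summing logs would be safer."""
    p = 1.0
    for task, y in zip(tasks, choices):
        p *= logit_probs(beta, task)[y]
    return p

def conditional_weights(draws, tasks, choices):
    """Weights P(y|beta^r) / sum_s P(y|beta^s), one per draw."""
    lik = [sequence_likelihood(b, tasks, choices) for b in draws]
    total = sum(lik)
    return [l / total for l in lik]

def conditional_prediction(draws, tasks, choices, new_task):
    """Simulated probability of each alternative in the new task,
    conditional on the individual's observed choices."""
    weights = conditional_weights(draws, tasks, choices)
    probs = [0.0] * len(new_task)
    for w, b in zip(weights, draws):
        for j, p in enumerate(logit_probs(b, new_task)):
            probs[j] += w * p
    return probs

# Hypothetical example: one random price coefficient, beta ~ N(mu, sigma^2).
random.seed(1)
mu, sigma = -1.0, 0.5  # made-up "estimates", not from any real model
draws = [[random.gauss(mu, sigma)] for _ in range(2000)]

# Three observed tasks with two alternatives each (attribute: price).
tasks = [[[1.0], [2.0]], [[3.0], [1.5]], [[2.0], [2.5]]]
choices = [0, 1, 0]  # this individual always picked the cheaper option

new_task = [[1.0], [1.8]]  # out-of-sample task T+1
cond = conditional_prediction(draws, tasks, choices, new_task)

# Unconditional prediction: plain average over the population draws.
uncond = [
    sum(logit_probs(b, new_task)[j] for b in draws) / len(draws)
    for j in range(len(new_task))
]
print("conditional:  ", cond)
print("unconditional:", uncond)
```

Because this individual always chose the cheaper alternative, the weights shift toward more negative price coefficients, so the conditional probability of the cheaper option in the new task is higher than the unconditional one. The same weights could also be used to compute an individual-level conditional mean of $\beta$.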

From my (limited) understanding of R, your predict method uses the population distribution over parameters to make predictions and does not allow for a panelID option to use the conditional distribution. Is that correct? If so, do you know of any way I could use logitr to (1) derive the conditional distribution for each individual, and (2) make predictions based on this conditional distribution?

On an unrelated note, I think I have spotted two bugs:

  1. If I estimate a multinomial logit using a single parameter, I get the following error when executing the summary method if I specify a clusterID:

image

Note that it works for two or more parameters. Furthermore, the summary method also works for a single parameter if I leave out clusterID.

  2. If I estimate a mixed logit using a single parameter, I get the following error in the estimation if I specify clusterID:

image

Note that the estimation works for two or more parameters. The estimation also works for a single parameter if I leave out clusterID.

Many thanks in advance for your time.

@jhelvy
Owner

jhelvy commented Aug 10, 2022

You are correct that the logitr predict method uses the population distribution over parameters to make predictions. I believe what you described would be an appropriate way to make predictions using the panelID. Implementing this in the package would take some time to work out, and right now I cannot make it a priority. If you come up with a way to implement it even for a specific example, that would help me have something to work with and I could perhaps integrate it into the package.

As for the bugs, can you try installing the latest development version (0.7.2)?

# install.packages("remotes")
remotes::install_github("jhelvy/logitr")

I fixed several bugs regarding the clusterID recently, but I haven't gotten those fixes on CRAN yet.

@armoutihansen
Author

Thanks a lot for the quick response!
I'll try to come up with something and get back to you.

I just installed the latest version (0.7.2), and the bugs I mentioned persist when using only one parameter. For instance, they occur with the yogurt data when price is the only parameter.

@jhelvy
Owner

jhelvy commented Aug 11, 2022

Ah okay, so that's probably another bug. I may have actually introduced it when I fixed the other ones. Would you mind opening a separate issue about this, to keep it apart from the prediction feature?
