Incorrect intercept in getfe(ef = zm2) with x:f interactions #28

Open
lcurrier opened this issue Apr 28, 2020 · 0 comments

Hi,

When using zm2, my understanding is that the intercept should be the sum of the means of the coefficients within the connected component. However, when the model includes an x:f interaction, getfe instead returns the mean of the coefficients on the interaction term as the intercept. Here's a reproducible example:

library(tidyverse)
library(lfe)

# generate data 
set.seed(2)
x1 <- rnorm(1000,sd=1)
x2 <- rnorm(1000,sd=2)
f1 <- sample(1:10,1000,replace =TRUE)
f2 <- sample(1:10,1000,replace =TRUE)
y <- x1 + 3*f1 - 2*f2 + rnorm(1000, sd=0.5)
dat <- tibble(y,x1,x2,f1,f2)

lfe:::rankDefic(list(factor(f1),factor(f2)))
# (there is 1 connected component)

# regression with and without x:f interaction 
fit1 <- felm(y ~ x1 | f1 + f2, data = dat)
fit2 <- felm(y ~ x1 | x2:as.factor(f1) + f1 + f2, data = dat)

# get fixed effects with different estimable functions
fe1_ref <- getfe(fit1, ef = "ref", se = FALSE)
fe1_zm2 <- getfe(fit1, ef = "zm2", se = FALSE)
fe2_ref <- getfe(fit2, ef = "ref", se = FALSE)
fe2_zm2 <- getfe(fit2, ef = "zm2", se = FALSE)

# ref gives a column called "fe", but zm2 doesn't, so adding one for easy filtering
fe1_zm2 <- fe1_zm2 %>% 
  mutate(fe = str_extract(rownames(.),".*(?=\\.)"))
fe2_zm2 <- fe2_zm2 %>% 
  mutate(fe = str_extract(rownames(.),".*(?=\\.)"))
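
The rankDefic call above is there to confirm that f1 and f2 form a single connected component. As a rough cross-check that does not touch lfe internals, the components of the bipartite graph linking levels of f1 to levels of f2 can be counted directly. This is only a minimal sketch; the helper name count_components is my own and is not part of lfe.

```r
# Count connected components of the bipartite graph whose nodes are the levels
# of f1 and f2 and whose edges are the observed (f1, f2) pairs.
# count_components is a hypothetical helper, not an lfe function.
count_components <- function(f1, f2) {
  a <- as.integer(factor(f1))
  b <- as.integer(factor(f2)) + max(a)  # offset so node ids do not collide
  parent <- seq_len(max(b))             # union-find forest, one node per level

  # Follow parent links until reaching the root of node i.
  find <- function(i) {
    while (parent[i] != i) i <- parent[i]
    i
  }

  # Each observation joins its f1 level and its f2 level.
  for (k in seq_along(a)) {
    ra <- find(a[k])
    rb <- find(b[k])
    if (ra != rb) parent[rb] <- ra
  }

  length(unique(vapply(seq_along(parent), find, integer(1))))
}

# Fully crossed levels form one component.
n_crossed <- count_components(c(1, 1, 2, 2), c(1, 2, 1, 2))  # 1
# Two disjoint blocks form two components.
n_blocks <- count_components(c(1, 1, 2, 2), c(1, 1, 2, 2))   # 2
```

Calling count_components(f1, f2) on the simulated data should agree with the rankDefic result reported above.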

For the first regression, zm2 gives the intercept I expect, i.e. the sum of the means of coefficients of the first two factors from ref:

fe1_zm2$effect[fe1_zm2$fe=='icpt'] 
# 5.513931

mean(fe1_ref$effect[fe1_ref$fe=="f1"]) + mean(fe1_ref$effect[fe1_ref$fe=="f2"])  
#  -10.52456 + 16.03849 = 5.513931
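
To spell out why I expect the intercept to be the sum of the means: centring each factor's coefficients and moving the means into an intercept leaves every fitted contribution unchanged. The sketch below uses made-up coefficients (not values from the regressions above) and illustrates the normalisation as I understand it, not lfe's implementation.

```r
# Toy "ref"-style coefficients for two factors (made-up numbers).
a <- c(l1 = 0, l2 = 1.5, l3 = 4.0)  # factor A, one level pinned at 0
b <- c(m1 = 10, m2 = 12)            # factor B

# Zero-mean normalisation: centre each factor, move the means into an intercept.
icpt <- mean(a) + mean(b)
a_zm <- a - mean(a)
b_zm <- b - mean(b)

# Every (level of A, level of B) cell gets the same total either way.
cells  <- expand.grid(i = names(a), j = names(b), stringsAsFactors = FALSE)
before <- a[cells$i] + b[cells$j]
after  <- icpt + a_zm[cells$i] + b_zm[cells$j]
same_fit <- isTRUE(all.equal(unname(before), unname(after)))
```

Here same_fit is TRUE because the shift is a constant per factor, which an intercept can absorb exactly.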

However, this is not the case when an interaction term is added: zm2 gives an intercept that is not equal to the sum of the means of the coefficients of the factors from ref:

fe2_zm2$effect[fe2_zm2$fe=='icpt']
# -0.008670201

# when I expected the intercept to be:
mean(fe2_ref$effect[fe2_ref$fe=="f1"]) + mean(fe2_ref$effect[fe2_ref$fe=="f2"]) 
# -10.52231 + 16.03643 =  5.514116

Further, the intercept from zm2 is the mean of the coefficients on the interaction term from ref:

mean(fe2_ref$effect[fe2_ref$fe=="x2:as.factor(f1)"]) 
# -0.008670201

And the coefficients on the interaction term from zm2 have zero mean:

mean(fe2_zm2$effect[fe2_zm2$fe=="x2:as.factor(f1)"]) 
# 0

However, I don't think there is a reason to demean the interaction coefficients: because x2 is continuous, they are slopes on x2 rather than level effects, so their mean cannot be absorbed by an intercept.
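
A small illustration of that point, using made-up slopes rather than the fitted values: centring group-specific slopes and adding their mean as an intercept changes the term's contribution by an amount that depends on the continuous variable. The names g, x and slope are placeholders of mine.

```r
set.seed(1)
g     <- sample(1:3, 20, replace = TRUE)  # group of each observation
x     <- rnorm(20)                        # continuous covariate
slope <- c(0.5, -1.0, 2.0)                # made-up group-specific slopes on x

m <- mean(slope)
contrib_orig  <- slope[g] * x             # contribution of the x:g term
contrib_shift <- m + (slope[g] - m) * x   # centred slopes plus their mean as intercept

# The two differ by m * (1 - x), which varies across observations.
gap <- contrib_shift - contrib_orig
gap_matches_formula <- isTRUE(all.equal(gap, m * (1 - x)))
fit_changed <- any(abs(gap) > 1e-8)
```

Unlike the pure-factor case, the gap is not constant, so the centred coefficients plus intercept no longer describe the same fit.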

The intercept from zm is as expected, i.e. the mean of the ref coefficients on f2:

fe2_zm <- getfe(fit2, ef = "zm", se = FALSE)
fe2_zm <- fe2_zm %>% 
  mutate(fe = str_extract(rownames(.),".*(?=\\.)"))
fe2_zm$effect[fe2_zm$fe=='icpt']
#  16.03643
mean(fe2_ref$effect[fe2_ref$fe=="f2"]) 
#  16.03643

What I think is happening in the code:

In line 100 of efactory.R, the number of components ncomp is correctly identified as 1. However, in line 200, the line comp <- allcomp redefines the components to include a 0 for the interaction terms. This means that there are 2 components (0,1), and means are calculated and subtracted for both. Since ncomp = 1, only the first mean is kept as the intercept, and that first mean is the mean of the interaction coefficients.

Suggested fix:

At line 277, add zfact[as.integer(comp) == 0] <- NA
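
To make the suspected mechanism concrete, here is a deliberately simplified toy reconstruction. It is not the code in efactory.R: the vector contents, the use of tapply, and the way the intercept is picked are all my assumptions, and only the names comp, ncomp and zfact are borrowed from the description above.

```r
# Toy stand-ins (hypothetical values, not what efactory.R actually holds).
effect <- c(0.2, -0.4, 0.5,  # three interaction (slope) coefficients
            1, 2, 3)         # three ordinary factor coefficients
comp   <- c(0, 0, 0, 1, 1, 1) # 0 marks interaction terms, 1 the single component
ncomp  <- 1

# Grouping by comp gives two groups, "0" and "1". Keeping only the first
# ncomp means picks up group "0", i.e. the mean of the interaction terms.
means_buggy <- tapply(effect, factor(comp), mean)
icpt_buggy  <- unname(means_buggy[seq_len(ncomp)])

# With the proposed exclusion, entries with comp == 0 get an NA group label
# and drop out of the grouping, so the first mean belongs to component 1.
zfact <- factor(comp)
zfact[as.integer(comp) == 0] <- NA
means_fixed <- tapply(effect, droplevels(zfact), mean)
icpt_fixed  <- unname(means_fixed[seq_len(ncomp)])
```

In this toy, icpt_buggy is the mean of the three interaction coefficients while icpt_fixed is the mean of the ordinary factor coefficients, which mirrors the pattern I see in the getfe output.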
