
About the dimension discrepancy of the magnitude vector #11

Open

Swiminn opened this issue Jun 2, 2024 · 4 comments

Comments

Swiminn commented Jun 2, 2024

Hello,

Thank you for sharing your great research results. I enjoyed reading your paper and have been trying to run your code.

However, while reviewing the code, I encountered an issue regarding the dimension of the magnitude vector.

In the paper, it is mentioned that the magnitude vector is in $\mathbb{R}^{1 \times k}$. However, in the actual code, the magnitude is calculated as follows:

magnitude = (torch.linalg.norm(new_module.weight.detach(), dim=1)).unsqueeze(1).detach()

This results in magnitude.shape being $(d \times 1)$ when new_module.weight.shape is $(d \times k)$.
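
For reference, the resulting shape can be checked with a minimal standalone sketch (the shapes below are arbitrary and chosen only to illustrate):

```python
import torch

# Arbitrary illustrative shapes: weight of shape (d, k) = (out_features, in_features)
d, k = 4, 3
weight = torch.randn(d, k)

# Same computation as the line quoted above:
# norm over dim=1 reduces the in_features axis, giving one norm per output row
magnitude = torch.linalg.norm(weight.detach(), dim=1).unsqueeze(1).detach()

print(magnitude.shape)  # torch.Size([4, 1]), i.e. (d, 1) rather than (1, k)
```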

Therefore, it seems that there is a discrepancy between the description of the magnitude shape in the paper and the actual shape in the code. Could you please explain this?

Thank you.

dt-3t commented Jun 7, 2024

> Hello,
>
> Thank you for sharing your great research results. I enjoyed reading your paper and have been trying to run your code.
>
> However, while reviewing the code, I encountered an issue regarding the dimension of the magnitude vector.
>
> In the paper, it is mentioned that the magnitude vector is in $\mathbb{R}^{1 \times k}$. However, in the actual code, the magnitude is calculated as follows:
>
> magnitude = (torch.linalg.norm(new_module.weight.detach(), dim=1)).unsqueeze(1).detach()
>
> This results in magnitude.shape being $(d \times 1)$ when new_module.weight.shape is $(d \times k)$.
>
> Therefore, it seems that there is a discrepancy between the description of the magnitude shape in the paper and the actual shape in the code. Could you please explain this?
>
> Thank you.

I've noticed a similar issue regarding the calculation of the norm for a matrix of size $d \times k$ (where d is the output dimension and k is the input dimension, if I'm not mistaken). In the code, the norm is calculated with dim=1, which actually computes the norm of each row vector. However, the paper states that the calculation should be the "vector-wise norm of a matrix across each column". So according to the paper, we should get a tensor of length k (the number of columns), not of length d as currently implemented.

Additionally, I observed that the Hugging Face peft library also specifies dim=1 in their implementation:

weight_norm = torch.linalg.norm(weight, dim=1)
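
To make the dim choice concrete, here is a small illustrative sketch (shapes arbitrary, not taken from either codebase) comparing dim=1 and dim=0:

```python
import torch

# Arbitrary illustrative shapes: weight of shape (d, k) = (out_features, in_features)
d, k = 4, 3
weight = torch.randn(d, k)

row_norms = torch.linalg.norm(weight, dim=1)  # one norm per row    -> shape (d,)
col_norms = torch.linalg.norm(weight, dim=0)  # one norm per column -> shape (k,)

print(row_norms.shape, col_norms.shape)  # torch.Size([4]) torch.Size([3])
```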

nbasyl (Collaborator) commented Jun 11, 2024

Hi, the normalization is always applied along the out_features dimension. We did not lay out W in the first figure of the paper the same way as nn.Linear.weight: W in that figure has shape (in_features, out_features), while nn.Linear.weight has shape (out_features, in_features), which might be causing your confusion.
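
For context, a quick check of the layout convention used by nn.Linear (an illustrative sketch, not code from the repo):

```python
import torch
import torch.nn as nn

in_features, out_features = 3, 5
layer = nn.Linear(in_features, out_features, bias=False)

# nn.Linear stores its weight as (out_features, in_features)
print(layer.weight.shape)  # torch.Size([5, 3])

# norm over dim=1 therefore reduces the in_features axis,
# yielding one magnitude per output feature
print(torch.linalg.norm(layer.weight, dim=1).shape)  # torch.Size([5])
```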

Swiminn (Author) commented Jun 11, 2024

> Hi, the normalization is always applied along the out_features dimension. We did not lay out W in the first figure of the paper the same way as nn.Linear.weight: W in that figure has shape (in_features, out_features), while nn.Linear.weight has shape (out_features, in_features), which might be causing your confusion.

Thank you for your response, but I am still confused.
In equation (1), it is formulated as

$W' = W_0 + \Delta W = W_0 + BA$
where $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times k}$

Then isn't the shape of $W$ $(d \times k)$, i.e. (out_features, in_features)?

nbasyl (Collaborator) commented Jun 12, 2024

In our paper, W is (in_features, out_features), but in our implementation W is (out_features, in_features), following the convention of nn.Linear.
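
Put differently, if the paper's W is the transpose of the stored nn.Linear weight, the paper's column-wise norm and the code's dim=1 norm select the same values. A minimal sketch of that correspondence (shapes are illustrative):

```python
import torch

# Illustrative shapes only
in_features, out_features = 3, 5

W_paper = torch.randn(in_features, out_features)  # layout described in the paper
W_code = W_paper.T                                # layout used by nn.Linear.weight

# column-wise norm of the paper's W  ==  dim=1 norm of the code's weight
print(torch.allclose(torch.linalg.norm(W_paper, dim=0),
                     torch.linalg.norm(W_code, dim=1)))  # True
```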
