Adds a new method to shuffle/swap values #167

Open

Ar57m wants to merge 10 commits into main
Conversation

@Ar57m commented Feb 15, 2024

The method shuffles/swaps values in each tensor. I'm bad at describing what it does, so here's the exact function as code, to serve as an example/demo:

import torch

# Two same-shaped demo tensors: x holds the even numbers, base the odd ones.
x = torch.arange(2, 2 * 6 * 4 + 2, 2).view(6, 4)
base = torch.arange(1, 2 * 6 * 4, 2).view(6, 4)

print("Tensor x:")
print(x)

print("\nTensor base:")
print(base)

def swap_values(shape, n, base, x):
    if x.dim() == 2:
        rows, cols = shape
        rows_range = torch.arange(rows).view(-1, 1)
        cols_range = torch.arange(cols).view(1, -1)
        # True wherever (row + col) is a multiple of n; for n=2 this is
        # a checkerboard pattern.
        mask = (rows_range + cols_range) % n == 0
        x = torch.where(mask, x, base)
        print("\nMask:\n", mask)
    else:
        # 1-D case: keep x at every n-th position, base everywhere else.
        rows_range = torch.arange(shape[0])
        mask = rows_range % n == 0
        x = torch.where(mask, x, base)
    return x

print("\nTensor Mask and Swapped(n=2):")
swapped = swap_values(x.shape,2,base, x)
print(swapped)


print("\nTensor Mask and Swapped with inverted offset(n=2):")
offset = swap_values(x.shape,2,x, base)
print(offset)



print("\nTensor Mask and Swapped(n=3):")
swapped = swap_values(x.shape,3,base, x)
print(swapped)


print("\nTensor Mask and Swapped with inverted offset(n=3):")
offset = swap_values(x.shape,3,x, base)
print(offset)
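
For reference, here's a small self-check (my own snippet, not part of the PR): for n=2 the mask is exactly a checkerboard, so the swap interleaves the two tensors like the black and white squares of a chess board.

import torch

rows, cols = 6, 4
i = torch.arange(rows).view(-1, 1)
j = torch.arange(cols).view(1, -1)
mask = (i + j) % 2 == 0  # checkerboard: True where row + col is even

x = torch.arange(2, 2 * rows * cols + 2, 2).view(rows, cols)
base = torch.arange(1, 2 * rows * cols, 2).view(rows, cols)
print(torch.where(mask, x, base))
# tensor([[ 2,  3,  6,  7],
#         [ 9, 12, 13, 16],
#         [18, 19, 22, 23],
#         [25, 28, 29, 32],
#         [34, 35, 38, 39],
#         [41, 44, 45, 48]])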

I didn't touch legacy.py (I don't know what to do there, or whether I need to do anything there at all). Feel free to do anything you want with this; I went as far as I could to improve and optimize my shenanigans 🙂

You can use task_swapping, task_swapping_ties or task_swapping_dare_ties.

Here's a yaml example:

merge_method: task_swapping
base_model: NousResearch/Yarn-Mistral-7b-128k
models:
  - model: senseable/WestLake-7B-v2
    parameters:
      weight: 0.666
      diagonal_offset: 2     # 2 basically gives a chess-board mask
      invert_offset: False   # defaults to False even without this parameter

# Note: I think if you add another model with the same parameters here,
# it will just use one of them, so I recommend changing the parameters
# of the other models, but I'm not sure.

dtype: bfloat16
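
Assuming the PR branch is installed, a config like this runs through mergekit's usual CLI entry point (standard mergekit usage, nothing specific to this PR):

mergekit-yaml config.yaml ./output-model-directory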

@cg123 (Collaborator) commented Feb 21, 2024

This is really interesting! I definitely have to spend some time to get a better feel for how this is working.

From my first read through it looks like this might be effectively negating the delta values in a checkerboard pattern. Does that jive with your thoughts?

Thanks again for the PR, I'm looking forward to playing with it. :)

@Ar57m (Author) commented Feb 21, 2024

> From my first read through it looks like this might be effectively negating the delta values in a checkerboard pattern. Does that jive with your thoughts?

Yep, I think that's correct. Thanks for replying 🤗
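
To make that concrete, here is a hedged sketch of the delta view (my own reading of the demo code above, not verified against the actual merge path in this PR): if the swap runs before the deltas are taken, the positions filled from base contribute zero delta, so the checkerboard acts on half of the task vector.

import torch

rows, cols = 6, 4
i = torch.arange(rows).view(-1, 1)
j = torch.arange(cols).view(1, -1)
mask = (i + j) % 2 == 0

x = torch.arange(2, 2 * rows * cols + 2, 2).view(rows, cols)
base = torch.arange(1, 2 * rows * cols, 2).view(rows, cols)
delta = torch.where(mask, x, base) - base
print(delta)  # x - base == 1 everywhere here, so this prints a checkerboard of ones and zeros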

@Ar57m (Author) commented Mar 1, 2024

I fixed some things that were inverted wrongly, and I added an optional random mask to merge random parts of the tensor. Here's what it does:

import torch

x = torch.arange(2, 2 * 10 * 5 + 2, 2).view(10, 5)
base = torch.arange(1, 2 * 10 * 5, 2).view(10, 5)

print(x, '\n', base)

def rand_mask(base, x, percent, seed=None):
    # Save the RNG state so seeding here doesn't disturb the caller's RNG.
    old_state = torch.get_rng_state()
    if seed is not None:
        torch.manual_seed(seed)

    # True at roughly `percent` of the positions, chosen uniformly at random.
    mask = torch.rand(base.shape) <= percent
    print('\n', mask, '\n', 100.0 * torch.sum(mask).item() / mask.numel(),
          '% of the base swapped\n')

    torch.set_rng_state(old_state)
    # Take x where the mask is True, base everywhere else.
    return torch.where(mask, x, base)

result = rand_mask(base, x, 0.2, seed=1337)
print(result)
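
A quick aside on the mask fraction (my own check, not from the PR): the torch.rand draws are i.i.d. uniform, so the realized swap fraction only fluctuates around percent, and fixing the seed is what makes a merge reproducible.

import torch

torch.manual_seed(98557)
mask = torch.rand(1024, 1024) <= 0.3333
print(mask.float().mean().item())  # close to 0.3333 for a tensor this large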

And here's a yaml example of its use:

merge_method: task_swapping
base_model: NeuralNovel/Senzu-7B-v0.1-DPO
models:
  - model: senseable/WestLake-7B-v2
    parameters:
      weight: 0.75
      diagonal_offset: 2      # does nothing when random_mask is used
      random_mask: 0.3333     # if 0.0 or absent, the normal behavior happens (chessboard-like mask)
      random_mask_seed: 98557 # if absent, the mask will be completely random every time (not reproducible)
dtype: bfloat16

Here's the model merged with the config above ⬆️

I hope I didn't mess up this time 😅

@linux-leo commented

Hey @Ar57m, this method is really cool! Do you know how it compares to slerp interpolation or standard task arithmetic?

@Ar57m (Author) commented Mar 7, 2024

> Hey @Ar57m, this method is really cool! Do you know how it compares to slerp interpolation or standard task arithmetic?

Thanks @linux-leo. I don't have enough background in calculus to understand exactly what slerp does, but afaik my method is much simpler and different (not related to what slerp does). I built it on top of generalized_task_arithmetic.py, so after the swapping, task_arithmetic is applied normally.

Imagine you have two sorted decks of cards (equal in number of cards), base and X, where each card has a number from 0 to n, and imagine that each card is an element (or value) in a tensor. In the normal behavior, you take the base deck and replace base's even cards with X's even cards. Using the random mask, it instead selects random cards from base to be swapped with the corresponding cards of X, as in the sketch below.
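
A tiny sketch of the card analogy in tensor terms (my illustration; the decks and values here are made up): "even cards" corresponds to the regular mask controlled by diagonal_offset, "random cards" to the random_mask path.

import torch

base = torch.arange(8)       # sorted deck "base": 0..7
x = torch.arange(8) + 100    # sorted deck "X": 100..107

even_mask = torch.arange(8) % 2 == 0   # swap the even-positioned cards
random_mask = torch.rand(8) <= 0.5     # or swap randomly chosen cards

print(torch.where(even_mask, x, base))   # tensor([100, 1, 102, 3, 104, 5, 106, 7])
print(torch.where(random_mask, x, base))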

@Ar57m reopened this May 7, 2024