Model support: GPT #159

Open
bonham79 opened this issue Feb 6, 2024 · 6 comments
Labels: enhancement, new architecture

Comments

@bonham79
Collaborator

bonham79 commented Feb 6, 2024

Might as well set up an autoregressive decoder since T5 is on the docket. This shouldn't be too much of a hassle since the Transformer model works, but I'm leaving this open as an issue so we can do validation testing on it.

@bonham79 bonham79 self-assigned this Feb 6, 2024
@kylebgorman
Contributor

Dumb question, but how is this different from the type of decoder-only LM we were talking about?

@kylebgorman kylebgorman added the enhancement New feature or request label Feb 6, 2024
@bonham79
Collaborator Author

bonham79 commented Feb 6, 2024

It's exactly that. It's just running the transformer with --encoder_layers=0. That's why I'm saying it shouldn't be much of a hassle (technically it's done already; it just needs some benchmarking).

@kylebgorman
Contributor

kylebgorman commented Feb 6, 2024 via email

@Adamits
Collaborator

Adamits commented Feb 6, 2024

Yes, though what I have is specifically a prefix-LM. It can be used like GPT if you just ensure the prefix length is always 0. I have some currently dirty code that takes a source and a target, concatenates them, and always treats the source as the prefix during training.

I can work on a PR in the next few weeks.
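For readers unfamiliar with the distinction, here is a minimal sketch of a prefix-LM attention mask. It is not the code referred to above; the function name `prefix_lm_mask` and the boolean-mask convention are assumptions made for illustration. Prefix positions are visible to every position, while the remaining positions are attended causally, so a prefix length of 0 reduces to the ordinary GPT-style causal mask.

```python
from typing import List


def prefix_lm_mask(prefix_len: int, total_len: int) -> List[List[bool]]:
    """Builds a prefix-LM attention mask (hypothetical helper).

    mask[i][j] is True if position i may attend to position j.
    Every position sees the whole prefix; positions after the prefix
    additionally see themselves and earlier positions (causal).
    """
    if not 0 <= prefix_len <= total_len:
        raise ValueError("prefix_len must be between 0 and total_len")
    return [
        [j < prefix_len or j <= i for j in range(total_len)]
        for i in range(total_len)
    ]


# With an empty prefix this is a plain lower-triangular causal mask.
causal = prefix_lm_mask(0, 3)
# With a prefix of length 2, the first two positions attend to each other
# bidirectionally, and the last position attends to everything before it.
prefixed = prefix_lm_mask(2, 3)
```

In a real model the same pattern would typically be built as a tensor and passed to the attention layers; the nested-list form here is only meant to make the pattern easy to inspect.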

@bonham79
Collaborator Author

bonham79 commented Feb 6, 2024

Perfect, give me a ping when it's ready and I'll do some benchmarking at home.

Any issue with adding features to the prefix concat? That should allow an easy hack for task-specific or multitask training (just treat the target task as a feature in training).

@Adamits
Collaborator

Adamits commented Feb 6, 2024

Sorry, yes, the features are in the prefix too by default.
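As a rough illustration of the idea discussed above (features and source concatenated into the prefix, target following it), the sketch below builds one training example. Everything named here is an assumption for illustration: the helper `build_prefix_lm_example`, the `<sep>` separator symbol, the feature-then-source ordering, and the `[task=inflect]` task feature are hypothetical and not taken from the project's code.

```python
from typing import List, Sequence, Tuple


def build_prefix_lm_example(
    features: Sequence[str],
    source: Sequence[str],
    target: Sequence[str],
    sep: str = "<sep>",  # hypothetical separator symbol
) -> Tuple[List[str], int, List[bool]]:
    """Concatenates features + source (the prefix) with the target.

    Returns the full token sequence, the prefix length, and a loss mask
    that is True only on target positions, so that the prefix is
    conditioned on but not predicted.
    """
    prefix = list(features) + list(source) + [sep]
    tokens = prefix + list(target)
    loss_mask = [False] * len(prefix) + [True] * len(target)
    return tokens, len(prefix), loss_mask


# A task label passed as an ordinary feature (hypothetical example data).
tokens, prefix_len, loss_mask = build_prefix_lm_example(
    features=["[task=inflect]"],
    source=list("walk"),
    target=list("walked"),
)
```

Treating the task as just another feature keeps the model and data format unchanged across tasks; only the prefix contents differ.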
