wikipedia dataset loader needs to be updated to calculate stats for MEMIT and ROME #457

Open
KeremZaman opened this issue Dec 23, 2024 · 2 comments
Labels
question Further information is requested

Comments

@KeremZaman

https://github.com/zjunlp/EasyEdit/blob/2bbf0e1e878b355e77279e76fe1f167991a6f19e/easyeditor/models/rome/layer_stats.py#L102C1-L105C10

        raw_ds = load_dataset(
            ds_name,
            dict(wikitext="wikitext-103-raw-v1", wikipedia="20200501.en")[ds_name]
        )

The 20200501.en config is no longer available through the datasets library, so this call needs to be updated to the current usage (see https://huggingface.co/datasets/wikimedia/wikipedia):

from datasets import load_dataset
ds = load_dataset("wikimedia/wikipedia", "20231101.en")
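A minimal sketch of how the lookup in layer_stats.py could be updated, assuming "wikipedia" should now map to the wikimedia/wikipedia dump with the 20231101.en config. Because the new dump lives under a different repository id than the short name "wikipedia", the lookup has to return the repository as well as the config, which the original dict(...)[ds_name] expression does not do (it reuses ds_name as the path). The names DATASET_SOURCES, resolve_stats_dataset, and load_stats_dataset are hypothetical.

```python
# Hypothetical replacement for the inline dict in layer_stats.py. Each entry
# stores both the Hub repository and the config passed to load_dataset,
# because the new Wikipedia dump no longer lives under the short name.
DATASET_SOURCES = {
    "wikitext": ("wikitext", "wikitext-103-raw-v1"),
    # Assumption: "wikipedia" should now resolve to the maintained wikimedia dump.
    "wikipedia": ("wikimedia/wikipedia", "20231101.en"),
}


def resolve_stats_dataset(ds_name):
    """Return the (path, config) pair for a short dataset name."""
    try:
        return DATASET_SOURCES[ds_name]
    except KeyError:
        raise ValueError(
            f"Unknown stats dataset {ds_name!r}; "
            f"expected one of {sorted(DATASET_SOURCES)}"
        ) from None


def load_stats_dataset(ds_name):
    """Load the corpus used to compute layer stats."""
    # Imported here so resolve_stats_dataset works without the datasets package.
    from datasets import load_dataset

    path, config = resolve_stats_dataset(ds_name)
    return load_dataset(path, config)
```

With this helper, the call in layer_stats.py would reduce to raw_ds = load_stats_dataset(ds_name).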

Since this change is likely to affect results, it would be nice to also keep a way to use the old Wikipedia dataset.

zxlzr added the question (Further information is requested) label on Dec 24, 2024
@JizhanFang
Collaborator

Hello, we noticed that 20200501.en is no longer available at the path you referenced. However, we found the same dataset at another location: https://huggingface.co/datasets/SamuelYang/wikipedia_20200501.en. We also have precomputed layer stats weight files for the following models: gpt-j-6B, llama2-7b, llama2-7b-chat, and mistral-7b. I will upload the weights to a cloud drive and write a README specifying the download links and the corresponding model files by the day after tomorrow.
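When precomputed stats files match the model, layer, and settings being edited, they make downloading either Wikipedia snapshot unnecessary, so a loader can check for a cached file first and only call load_dataset when none is found. The sketch below assumes a cache layout of stats_dir/model/ds_name_stats/layer_precision_stats_suffix.npz, loosely modeled on the ROME stats cache; the function names precomputed_stats_path and find_precomputed_stats and the exact filename pattern are assumptions, so check them against layer_stats.py and the promised README before relying on them.

```python
from pathlib import Path
from typing import Optional


def precomputed_stats_path(stats_dir, model_name, ds_name, layer_name,
                           precision="float32", to_collect=("mom2",),
                           sample_size=None):
    """Expected location of a cached stats file (assumed layout, hypothetical)."""
    # Hub ids such as "meta-llama/Llama-2-7b-hf" contain "/", so flatten them
    # into a single directory name.
    model_dir = model_name.replace("/", "_")
    size_suffix = "" if sample_size is None else f"_{sample_size}"
    stats = "-".join(sorted(to_collect))
    filename = f"{layer_name}_{precision}_{stats}{size_suffix}.npz"
    return Path(stats_dir) / model_dir / f"{ds_name}_stats" / filename


def find_precomputed_stats(stats_dir, model_name, ds_name, layer_name,
                           **kwargs) -> Optional[Path]:
    """Return the cached file if it exists, otherwise None (caller recomputes)."""
    path = precomputed_stats_path(stats_dir, model_name, ds_name, layer_name, **kwargs)
    return path if path.is_file() else None
```

For example, find_precomputed_stats("data/stats", "gpt-j-6B", "wikipedia", "transformer.h.5.mlp.fc_out", sample_size=100000) returns the file's Path if it has been downloaded into place, and None otherwise.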

@KeremZaman
Author

@JizhanFang Thanks for your reply. In that case, it might also be good to modify that line so users can select SamuelYang/wikipedia_20200501.en as the dataset in the hparams file.
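One way to make the source selectable is to expand a short alias from the hparams file into the repository and config passed to load_dataset. This sketch assumes the hparams expose the corpus name under mom2_dataset (the key ROME-style hparams appear to use for the second-moment statistics corpus; verify in your version) and that the mirror loads with its default config. The alias wikipedia_20200501 and the names DATASET_ALIASES and dataset_from_hparams are hypothetical.

```python
# Hypothetical aliases: "wikipedia" follows the current wikimedia dump, while
# "wikipedia_20200501" points at the mirror of the old snapshot.
DATASET_ALIASES = {
    "wikitext": ("wikitext", "wikitext-103-raw-v1"),
    "wikipedia": ("wikimedia/wikipedia", "20231101.en"),
    "wikipedia_20200501": ("SamuelYang/wikipedia_20200501.en", None),
}


def dataset_from_hparams(hparams):
    """Return (path, config) for load_dataset from an hparams mapping.

    Known aliases are expanded; any other value containing "/" is treated as
    a Hugging Face Hub repository id and loaded with its default config.
    """
    name = hparams.get("mom2_dataset", "wikipedia")
    if name in DATASET_ALIASES:
        return DATASET_ALIASES[name]
    if "/" in name:
        # A config of None makes load_dataset fall back to the default config.
        return (name, None)
    raise ValueError(f"Unsupported mom2_dataset: {name!r}")
```

Accepting a raw Hub id as a fallback lets users point at another mirror without a code change, while the aliases keep short names for the common cases. Keeping the old snapshot under its own alias also helps keep its cached statistics separate from those computed on the new dump, assuming the stats cache is keyed on the dataset name.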
