[FEATURE] basic use of pipeline to generate SFT dataset from documents #1076
base: develop
for more information, see https://pre-commit.ci
Documentation for this PR has been built. You can view it at: https://distilabel.argilla.io/pr-1076/
CodSpeed Performance Report: Merging #1076 will not alter performance.
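@burtenshaw can we get rid of the

```python
from datasets import Dataset
import wikipedia

from distilabel.pipeline import InstructionResponsePipeline

# Template pipeline; num_instructions sets how many instructions to generate per input
pipeline = InstructionResponsePipeline(num_instructions=5)

# Run on a single document (the full text of the "Transfer learning" Wikipedia page),
# with use_cache=False so results are regenerated instead of loaded from cache
distiset = pipeline.pipeline.run(
    use_cache=False,
    dataset=Dataset.from_list(
        [
            {
                "input": wikipedia.page(title="Transfer_learning").content,
            }
        ]
    ),
)
```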
…github.com/argilla-io/distilabel into feat/dataset-instruction-response-pipeline
This is a continuation of #1059.
It implements a pipeline template that runs the `SelfInstruct` step followed by a text-generation step over a dataset of documents. This should help basic users bootstrap SFT datasets.
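The two-stage flow described above (documents → instructions → responses) can be sketched in plain Python. This is a minimal sketch, not the PR's implementation: `build_sft_rows`, `SFTRow`, `fake_instructions`, and `fake_response` are hypothetical names, and the two fake callables stand in for the LLM-backed `SelfInstruct` and text-generation steps so the data flow runs without a model.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-ins for the two LLM-backed steps. In the PR these roles are
# played by distilabel's SelfInstruct step and a text-generation step; here they
# are plain callables so the data flow can run without a model.
InstructionGenerator = Callable[[str, int], List[str]]
ResponseGenerator = Callable[[str], str]


@dataclass
class SFTRow:
    document: str
    instruction: str
    response: str


def build_sft_rows(
    documents: List[str],
    generate_instructions: InstructionGenerator,
    generate_response: ResponseGenerator,
    num_instructions: int = 5,
) -> List[SFTRow]:
    rows: List[SFTRow] = []
    for doc in documents:
        # Stage 1 (SelfInstruct-like): derive several instructions from each document.
        instructions = generate_instructions(doc, num_instructions)
        # Fan out: one row per instruction, keeping the source document for traceability.
        for instruction in instructions:
            # Stage 2 (text generation): answer each instruction.
            rows.append(SFTRow(doc, instruction, generate_response(instruction)))
    return rows


def fake_instructions(doc: str, n: int) -> List[str]:
    # Hypothetical generator: uses the first sentence of the document as the topic.
    topic = doc.split(".")[0]
    return [f"Question {i + 1} about: {topic}" for i in range(n)]


def fake_response(instruction: str) -> str:
    # Hypothetical responder: echoes the instruction instead of calling an LLM.
    return f"Answer to '{instruction}'"


if __name__ == "__main__":
    docs = ["Transfer learning reuses a trained model. It saves compute."]
    for row in build_sft_rows(docs, fake_instructions, fake_response, num_instructions=2):
        print(row.instruction, "->", row.response)
```

Each document fans out into `num_instructions` rows, so N documents yield N × `num_instructions` (instruction, response) pairs, assuming the instruction generator returns exactly that many.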