Using transform() to get train and test data #529

Answered by kvnkho
ArijitSinghEDA asked this question in Q&A

Hi @ArijitSinghEDA,

Sorry for the late response. Just got back from the holidays.

Your understanding of transform() is right. The function you pass to it is meant to run on a single worker of the cluster, so it only sees the data that lands on that worker. A train-test split, on the other hand, needs to see the whole dataset. There are a few ways to handle this. I think it helps to think of Fugue as a mindset: all of the solutions presented below work on both small and big data.

ML with Big Data

If you are doing a train-test split, you are presumably doing machine learning afterwards. One option is to do the train-test split on each partition of the data and run the machine learning for each …
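As a rough illustration of the per-partition idea, here is a minimal sketch rather than the exact code from the answer. The column names (`group`, `x`, `y`), the 80/20 split, the sample data, and the `split_and_fit` function are all hypothetical, and a least-squares line from NumPy stands in for a real model. It assumes Fugue's `transform()` accepts `schema` and `partition` arguments as used here. If Fugue is not installed, the same function is applied with a plain pandas `groupby`.

```python
import numpy as np
import pandas as pd


def split_and_fit(df: pd.DataFrame) -> pd.DataFrame:
    """Train-test split and model fit on ONE partition of the data."""
    # Shuffle the partition, then hold out the last 20% of rows as the test set.
    shuffled = df.sample(frac=1.0, random_state=0)
    n_train = int(len(shuffled) * 0.8)
    train, test = shuffled.iloc[:n_train], shuffled.iloc[n_train:]

    # Stand-in model (an assumption): a straight line fitted by least squares.
    slope, intercept = np.polyfit(train["x"], train["y"], deg=1)
    pred = slope * test["x"] + intercept
    mse = float(((pred - test["y"]) ** 2).mean())

    # One summary row per partition.
    return pd.DataFrame(
        {
            "group": [df["group"].iloc[0]],
            "n_train": [len(train)],
            "n_test": [len(test)],
            "test_mse": [mse],
        }
    )


# Hypothetical data: two groups with different linear relationships.
data = pd.DataFrame(
    {
        "group": ["a"] * 10 + ["b"] * 10,
        "x": list(range(10)) * 2,
        "y": [2.0 * i + 1.0 for i in range(10)]
        + [-1.0 * i + 5.0 for i in range(10)],
    }
)

try:
    from fugue import transform

    # Each "group" partition is handed to split_and_fit separately.
    result = transform(
        data,
        split_and_fit,
        schema="group:str,n_train:long,n_test:long,test_mse:double",
        partition={"by": "group"},
    )
except ImportError:
    # Plain-pandas fallback: apply the same function to each group.
    result = pd.concat(
        [split_and_fit(part) for _, part in data.groupby("group")],
        ignore_index=True,
    )

print(result)
```

Because `split_and_fit` only uses pandas and NumPy, it can be tested locally on a small DataFrame before it is handed to a distributed engine.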

Answer selected by kvnkho