Running time #60
Disclaimer: I am a regular fastLink user, not a fastLink developer. It depends; details matter. Please show the fastLink code that you used. Do you use blocking?
Hi @MAranzazuRU89, as @aalexandersson mentions, a bit more context would help here. If your data allows for blocking (creating subsets of observations that are similar in at least one dimension), then I have no doubt the task you have in mind can be scaled and perhaps finished in less than 12 hours. If blocking is not an option, then more computing power could be a solution. Keep us posted! Ted
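For readers wondering what blocking looks like in practice, here is a minimal sketch using fastLink's `blockData()` helper. The data frame names (`dfA`, `dfB`) and column names (`state`, `firstname`, `lastname`) are hypothetical placeholders, not from this thread:

```r
# Illustrative sketch only: assumes dfA and dfB each contain a categorical
# column "state" suitable for exact blocking (hypothetical variable names).
library(fastLink)

# Split the comparison space into blocks of records that agree on "state"
blocks <- blockData(dfA, dfB, varnames = "state")

# Run fastLink within each block and collect the per-block results
results <- lapply(blocks, function(b) {
  fastLink(
    dfA = dfA[b$dfA.inds, ],
    dfB = dfB[b$dfB.inds, ],
    varnames = c("firstname", "lastname"),
    stringdist.match = c("firstname", "lastname")
  )
})
```

Because each block is compared only against itself, the number of pairwise comparisons drops from roughly |dfA| × |dfB| to the sum of the much smaller per-block products, which is where the run-time savings come from.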
Hi! I have a question directly related to run-time reduction. I am trying to run fastLink on a cluster computer (matching a few million firms), and was wondering if I need to specify the number of nodes available (and perhaps structure the code differently)? I didn't see a mention of how to do this in the documentation, but perhaps I missed it. Thanks in advance!
Hi @ishanaratan, if you are using a cluster computer, I would keep the following in mind:
fastLink runs in parallel within a node, but not across nodes. If the node has multiple threads, fastLink will make use of all of them when the size of the data is significant. If the data are small, it will use the minimum number of threads needed. Please let us know if anything comes up. All my best, Ted
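Since fastLink parallelizes only within a node, a common pattern is to match the `n.cores` argument to the cores your scheduler allocates to the job. A minimal sketch, with hypothetical data frame and column names:

```r
# Sketch: cap the number of cores fastLink uses on a single node.
# dfA, dfB, and the column names below are hypothetical placeholders.
library(fastLink)

out <- fastLink(
  dfA = dfA,
  dfB = dfB,
  varnames = c("firstname", "lastname"),
  stringdist.match = c("firstname", "lastname"),
  n.cores = 16  # set this to the cores allocated to your job on the node
)
```

To use several nodes, you would split the work yourself (e.g., one block per node via blocking) and submit independent jobs, since fastLink does not distribute a single matching run across nodes.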
I had a question about the expected running time and the computing capacity I should plan for with fastLink. I am trying to run it on a database of 1.7M observations, matching on only two variables. However, the code has now been running for 12 hours and has not progressed past the first task of calculating matches for each variable. I was wondering whether this is to be expected and I should move to a cluster, or whether this sounds odd and I am doing something wrong.
Thank you!