
Measure distance to nearest group #57

Open
shamahutoto opened this issue Oct 7, 2021 · 2 comments

Comments

@shamahutoto

Hi there,

I want to find items that aren't matched but were just under the threshold for matching with a group. Is there a way to do this?

@aalexandersson

Disclaimer: I am a regular fastLink user, not a developer.

Please give an example to make the issue easier to understand.

For example, this copy-pasted code subsets the matches to those with a matching probability of 0.85 and above:

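library(fastLink)
# matches.out is the object returned by fastLink(); keep only pairs
# with a posterior matching probability of at least 0.85
matched_dfs <- getMatches(
  dfA = dfA, dfB = dfB,
  fl.out = matches.out, threshold.match = 0.85
)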

I suspect you would need to subset with blocking, which is doable but more complicated. The developers are working on improving the blocking functionality.

@tedenamorado
Collaborator

Hi @shamahutoto,

As @aalexandersson mentions, one idea here would be to lower the matching threshold. By default, fastLink only returns pairs of records with a matching probability above 0.85. However, you can lower that value to, e.g., 0.001 and recover all pairs with a matching probability above that value, which will be a larger set than the one produced by the default. That said, I would not recommend going too low: you will get pairs of records whose matching probability is essentially 0, and if the datasets you are matching are large, the fastLink objects will become very large.
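To get from a lower-threshold run back to the original question ("items that aren't matched but were just under the threshold"), one option is to keep only the pairs that fall in a band just below the match threshold, or, for each record in dfA with no match, report its closest candidate and how far below the threshold it sits. The sketch below is plain base R and is not part of fastLink. It assumes a hypothetical data frame `pairs` with one row per candidate pair and columns `inds.a`, `inds.b`, and `posterior`; how you assemble that table from a fastLink object may depend on your version, so check the object's structure first. The helper names and the 0.5 lower bound are illustrative choices only.

```r
# Hypothetical input: one row per candidate pair, e.g. assembled from a
# fastLink run with a low threshold.match. Column names are assumptions.
pairs <- data.frame(
  inds.a    = c(1, 1, 2, 3, 3, 4),
  inds.b    = c(10, 11, 12, 13, 14, 15),
  posterior = c(0.97, 0.40, 0.80, 0.55, 0.60, 0.02)
)

# Hypothetical helper: pairs whose posterior lies in [lower, threshold),
# i.e. near misses, sorted from closest to farthest below the threshold.
near_misses <- function(pairs, threshold = 0.85, lower = 0.5) {
  keep <- pairs$posterior >= lower & pairs$posterior < threshold
  out <- pairs[keep, , drop = FALSE]
  out$gap <- threshold - out$posterior  # distance below the match threshold
  out <- out[order(out$gap), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Hypothetical helper: for each dfA record with no pair at or above the
# threshold, return its highest-posterior candidate and the gap to the threshold.
best_unmatched <- function(pairs, threshold = 0.85) {
  matched_a <- unique(pairs$inds.a[pairs$posterior >= threshold])
  cand <- pairs[!(pairs$inds.a %in% matched_a), , drop = FALSE]
  # Sort by record, then by descending posterior, so the first row per record is its best
  cand <- cand[order(cand$inds.a, -cand$posterior), , drop = FALSE]
  best <- cand[!duplicated(cand$inds.a), , drop = FALSE]
  best$gap <- threshold - best$posterior
  rownames(best) <- NULL
  best
}

near_misses(pairs)
best_unmatched(pairs)
```

On this toy data, `near_misses()` keeps the pairs (2, 12), (3, 14) and (3, 13), and `best_unmatched()` reports records 2, 3 and 4 with their best candidates 12, 14 and 15. Working from a band rather than simply lowering the threshold keeps the review set small, in line with the warning above about very low thresholds.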

If you have any other questions, let us know.

All my best,

Ted
