Create table with Language info based on paper titles #52

Open
chrished opened this issue Sep 27, 2024 · 0 comments

https://fasttext.cc/

Following an example from Flavio:

import re
import fasttext


try:
    from importlib.resources import files
except ImportError:  # Python < 3.9 has no importlib.resources.files; use the backport. https://setuptools.pypa.io/en/latest/userguide/datafiles.html
    from importlib_resources import files


fasttext_model_path = str(files('ivory_data_forge.data').joinpath("lid.176.ftz"))  # path to the model file bundled as package data


print(fasttext_model_path)
model = fasttext.load_model(fasttext_model_path)

def detect_language(s):
    """Detect whether a string is in English, Dutch or another language.

    When used on short strings (such as job titles), the confidence can be quite low (0.2)
    even though reading the string makes clear that the prediction is correct.

    Args:
        s (str): Text to infer language from.

    Returns:
        str: "nl", "en", or "other".
    """
    # fastText's predict() raises ValueError on strings containing a newline.
    predictions = model.predict(s.replace("\n", " "))
    language = re.sub("__label__", "", predictions[0][0])
    # confidence = predictions[1][0]
    if language in ["nl", "en"]:
        return language
    return "other"
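
To get from the function above to the table this issue asks for, something along these lines might work. It is only a sketch: the input shape (`(paper_id, title)` pairs), the column names, and the `build_language_table` and `fake_predict` helpers are assumptions for illustration, not existing code in this repository. The predictor is passed in as an argument so the example runs without `lid.176.ftz`; in practice `model.predict` would be passed instead. Storing the confidence next to the label is a suggestion, since the docstring notes that short strings can score low even when the label is right.

```python
def build_language_table(papers, predict, keep=("nl", "en")):
    """Return one row (dict) per paper with a language label and confidence.

    papers:  iterable of (paper_id, title) pairs (assumed input shape).
    predict: callable returning (labels, probabilities) for one string,
             i.e. the same shape as fastText's model.predict.
    """
    rows = []
    for paper_id, title in papers:
        # Collapse all whitespace, including newlines, into single spaces.
        clean = " ".join(title.split())
        if not clean:
            # Assumption: empty titles get no label rather than "other".
            rows.append({"paper_id": paper_id, "title": title,
                         "language": None, "confidence": None})
            continue
        labels, probs = predict(clean)
        language = labels[0].replace("__label__", "", 1)
        rows.append({
            "paper_id": paper_id,
            "title": title,
            "language": language if language in keep else "other",
            "confidence": float(probs[0]),
        })
    return rows


def fake_predict(text):
    """Hypothetical stand-in for model.predict, for demonstration only."""
    if " de " in f" {text.lower()} ":
        return ("__label__nl",), [0.9]
    return ("__label__en",), [0.8]


if __name__ == "__main__":
    sample = [(1, "De invloed van\nbeleid"), (2, "A study of trade")]
    for row in build_language_table(sample, fake_predict):
        print(row)
```

With the real model this would be called as `build_language_table(papers, model.predict)`. The rows can then be written out with `csv.DictWriter` or loaded into a DataFrame, depending on where the table should live.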

chrished self-assigned this Sep 27, 2024