Commit

fix: adding missing abbreviations files for SentenceSplitter (#8660)
* adding missing abbreviations files for SentenceSplitter

* fixing tests path
davidsbatista authored Dec 19, 2024
1 parent 91619a7 commit c306bee
Showing 5 changed files with 2,075 additions and 2 deletions.
.pre-commit-config.yaml (1 addition, 0 deletions)
@@ -26,6 +26,7 @@ repos:
 rev: v2.3.0
 hooks:
 - id: codespell
+exclude: "haystack/data/abbreviations"
 args: ["--toml", "pyproject.toml"]
 additional_dependencies:
 - tomli
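The added `exclude` value stops codespell from flagging the new abbreviation lists, which are full of intentionally truncated words. pre-commit treats `exclude` as a Python regular expression tested against each candidate file path with search (not full-match) semantics. The sketch below mimics that check; it is an illustration, not pre-commit's code, and the example file name `en.txt` is hypothetical.

```python
import re

# Sketch of how pre-commit applies `exclude`: compile it as a Python regex
# and skip any path where re.search finds a match anywhere in the string.
EXCLUDE = re.compile("haystack/data/abbreviations")


def is_excluded(path: str) -> bool:
    """Return True if codespell would be skipped for this path."""
    return EXCLUDE.search(path) is not None


# Hypothetical example paths.
for p in [
    "haystack/data/abbreviations/en.txt",
    "haystack/components/preprocessors/sentence_tokenizer.py",
]:
    print(p, "->", "skipped" if is_excluded(p) else "checked")
```

Because the pattern is unanchored, any path that merely contains this substring is excluded. Prefixing it with `^` would restrict the match to paths that start at the repository root.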
haystack/components/preprocessors/sentence_tokenizer.py (1 addition, 1 deletion)
@@ -228,7 +228,7 @@ def _read_abbreviations(lang: Language) -> List[str]:
 :param lang: The language to read the abbreviations for.
 :returns: List of abbreviations.
 """
-abbreviations_file = Path(__file__).parent.parent / f"data/abbreviations/{lang}.txt"
+abbreviations_file = Path(__file__).parent.parent.parent / f"data/abbreviations/{lang}.txt"
 if not abbreviations_file.exists():
 logger.warning("No abbreviations file found for {language}. Using default abbreviations.", language=lang)
 return []
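Why the extra `.parent` is needed: `sentence_tokenizer.py` lives in `haystack/components/preprocessors/`. Two `.parent` steps therefore only reach `haystack/components/`, while three reach `haystack/`, which is where `data/abbreviations` sits (the same directory the codespell `exclude` names). The sketch below walks that chain with `PurePosixPath` and mimics the fallback shown in the diff, returning an empty list when the file is missing. The helper `read_abbreviations_sketch`, the temporary directory layout, the language code `en`, and the one-abbreviation-per-line file format are all assumptions for illustration, not Haystack's implementation.

```python
import tempfile
from pathlib import Path, PurePosixPath
from typing import List

# Location of the module inside the repository (from the diff header).
module = PurePosixPath("haystack/components/preprocessors/sentence_tokenizer.py")

old_dir = module.parent.parent          # haystack/components
new_dir = module.parent.parent.parent   # haystack
print(old_dir / "data/abbreviations/en.txt")  # haystack/components/data/abbreviations/en.txt
print(new_dir / "data/abbreviations/en.txt")  # haystack/data/abbreviations/en.txt


def read_abbreviations_sketch(package_root: Path, lang: str) -> List[str]:
    """Hypothetical reader: one abbreviation per line, [] if the file is missing."""
    abbreviations_file = package_root / f"data/abbreviations/{lang}.txt"
    if not abbreviations_file.exists():
        return []  # mirrors the fallback shown in the diff
    lines = abbreviations_file.read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


# Build a throwaway package root with one abbreviations file to exercise both branches.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "data/abbreviations").mkdir(parents=True)
    (root / "data/abbreviations/en.txt").write_text("etc\ne.g\n", encoding="utf-8")
    print(read_abbreviations_sketch(root, "en"))  # ['etc', 'e.g']
    print(read_abbreviations_sketch(root, "xx"))  # []
```

With the old two-step path, the lookup pointed at a directory that does not exist, so the `exists()` check failed and the splitter silently fell back to an empty abbreviation list. This is consistent with the commit both adding the data files and fixing the path.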