Ljubešić and Pandžić stemmer bug correction
vukbatanovic committed Oct 16, 2018
1 parent 6b448c0 commit 0fb4ccc
Showing 3 changed files with 6 additions and 6 deletions.
6 changes: 3 additions & 3 deletions Description.props
@@ -4,10 +4,10 @@
PackageName=SCStemmers

# Version (required)
-Version=1.1.0
+Version=1.1.1

# Date
-Date=2018-02-24
+Date=2018-10-16

# Title (required)
Title=A collection of stemmers for Serbian and Croatian.
@@ -27,7 +27,7 @@ License=GPL 3.0
Description=This package contains Java implementations of three previously published stemmers for Serbian - two of them by Keselj and Sipka, one by Milosevic - and one for Croatian by Ljubesic and Pandzic. All stemmers require the input text to be in UTF-8. The stemmers accept text in both the Cyrillic and Latin scripts as input, and give the output in the Latin script. Performance comparisons between the stemmers (on the task of sentiment analysis) can be found in the paper "Reliable Baselines for Sentiment Analysis in Resource-Limited Languages: The Serbian Movie Review Dataset," Vuk Batanovic, Bosko Nikolic, Milan Milosavljevic, in Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), pp. 2688-2696, Portoroz, Slovenia (2016). See the webpage for the list of reference papers and more information.

# Package URL for obtaining the package archive (required)
-PackageURL=https://github.com/vukbatanovic/SCStemmers/releases/download/v1.1.0/SCStemmers_1.1.0.zip
+PackageURL=https://github.com/vukbatanovic/SCStemmers/releases/download/v1.1.1/SCStemmers_1.1.1.zip

# URL for further information
URL=https://github.com/vukbatanovic/SCStemmers/
4 changes: 2 additions & 2 deletions README.md
@@ -36,7 +36,7 @@ public void stemFile (String fileInput, String fileOutput)
```

### Command-line interface
-The supplied [SCStemmers.jar](https://github.com/vukbatanovic/SCStemmers/releases/download/v1.1.0/SCStemmers.jar) file makes it possible to stem the contents of textual files using the command line. Stemmers from the SCStemmers package can be invoked by the following command:
+The supplied [SCStemmers.jar](https://github.com/vukbatanovic/SCStemmers/releases/download/v1.1.1/SCStemmers.jar) file makes it possible to stem the contents of textual files using the command line. Stemmers from the SCStemmers package can be invoked by the following command:
```
java -jar SCStemmers.jar StemmerID InputFile OutputFile
```
@@ -50,7 +50,7 @@ where *StemmerID* is a number identifying the stemming algorithm:

### Weka
Alternatively, the stemmers can be utilized as an unofficial plug-in module within Weka (Waikato Environment for Knowledge Analysis).
-To do so, download the [SCStemmers Weka package](https://github.com/vukbatanovic/SCStemmers/releases/download/v1.1.0/SCStemmers_1.1.0.zip).
+To do so, download the [SCStemmers Weka package](https://github.com/vukbatanovic/SCStemmers/releases/download/v1.1.1/SCStemmers_1.1.1.zip).
Open the Weka package manager (available in Weka >= 3.7) and use the "Unofficial - File/URL" option to select and install SCStemmers.
After restarting Weka, the list of available stemmers (within the StringToWordVector filter) will also contain the four stemmers from this package.

2 changes: 1 addition & 1 deletion src/weka/core/stemmers/LjubesicPandzicStemmer.java
@@ -128,7 +128,7 @@ public String stemLine(String line) {
    private String transform (String word) {
        for (String key: transformations.keySet())
            if (word.endsWith(key))
-               return word.replace(key, transformations.get(key));
+               return word.substring(0, word.length()-key.length()) + transformations.get(key);
        return word;
    }
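
The one-line change matters because `String.replace` substitutes every occurrence of `key` in the word, not only the trailing one that `endsWith` matched. The following is a minimal sketch of the difference, assuming a made-up one-entry table (`ab` to `x`) and a made-up input word; neither comes from the stemmer's actual rule set, and the class and method names are illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SuffixTransformDemo {
    // Hypothetical table for illustration; the real stemmer has its own transformations.
    static final Map<String, String> TRANSFORMATIONS = new LinkedHashMap<>();
    static {
        TRANSFORMATIONS.put("ab", "x");
    }

    // Behaviour before the commit: replace() rewrites every occurrence of the key.
    static String transformOld(String word) {
        for (String key : TRANSFORMATIONS.keySet())
            if (word.endsWith(key))
                return word.replace(key, TRANSFORMATIONS.get(key));
        return word;
    }

    // Behaviour after the commit: only the matched suffix is swapped out.
    static String transformNew(String word) {
        for (String key : TRANSFORMATIONS.keySet())
            if (word.endsWith(key))
                return word.substring(0, word.length() - key.length())
                        + TRANSFORMATIONS.get(key);
        return word;
    }

    public static void main(String[] args) {
        String word = "abcab"; // the key "ab" occurs at the start and at the end
        System.out.println(transformOld(word)); // prints "xcx"
        System.out.println(transformNew(word)); // prints "abcx"
    }
}
```

Words in which the matched key appears only once, at the end, give the same result under both versions, so the old code would have misbehaved only on inputs where the key also occurs earlier in the word.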
