
LSI is broken af. #69

Open
henrebotha opened this issue May 20, 2016 · 21 comments

@henrebotha

henrebotha commented May 20, 2016

I'm on Ruby 2.2.4 and trying to use LSI. Nothing works, and the error messages SUCK. I've tried both the latest release (i.e. the gem version) and the latest commit from GitHub.

require 'classifier-reborn'
lsi = ClassifierReborn::LSI.new
training_data = ["Bcom", "Corporate Administration", "Forensic Auditing"]
category = :accounting
training_data.each do |d|
  begin
    lsi.add_item(d, category)
  rescue StandardError => e
    puts "#{d} misbehaving: #{e.message}"
  end
end

#=> Forensic Auditing misbehaving: comparison of Float with NaN failed

Better yet, if I swap the order of the training data, I get this:

require 'classifier-reborn'
lsi = ClassifierReborn::LSI.new
training_data = ["Corporate Administration", "Forensic Auditing", "Bcom"]
category = :accounting
training_data.each do |d|
  begin
    lsi.add_item(d, category)
  rescue StandardError => e
    puts "#{d} misbehaving: #{e.message}"
  end
end

#=> Forensic Auditing misbehaving: comparison of Float with NaN failed
#=> Bcom misbehaving: comparison of Float with NaN failed
@Ch4s3
Member

Ch4s3 commented May 20, 2016

There are some known issues with LSI. Are you using GNU GSL or the native Ruby version? The native Ruby version relies on a buggy Ruby implementation of a matrix transform (discussed in #30) and throws this type of error for some inputs. If that's your setup, using GNU GSL will fix this. If you're already using GNU GSL, this will require some digging.
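The message "comparison of Float with NaN failed" means a NaN reached a Float comparison: Float#<=> returns nil when either side is NaN, so sort, min and max raise ArgumentError instead of ordering the values. The sketch below shows one way a NaN can appear silently (normalizing an all-zero vector divides 0.0 by 0.0) and a guard against it. naive_normalize and safe_normalize are hypothetical helpers for illustration only; they are not classifier-reborn code, and this is not a trace of where the native Ruby SVD actually produces its NaN.

```ruby
# Hypothetical helper: normalising an all-zero vector divides 0.0 by 0.0,
# which yields NaN in Ruby rather than raising an error.
def naive_normalize(vec)
  norm = Math.sqrt(vec.inject(0.0) { |acc, x| acc + x * x })
  vec.map { |x| x / norm }
end

# Hypothetical guarded variant: leave a zero vector as zeros instead of NaN.
def safe_normalize(vec)
  norm = Math.sqrt(vec.inject(0.0) { |acc, x| acc + x * x })
  return vec.map { 0.0 } if norm.zero?

  vec.map { |x| x / norm }
end

scores = [0.5, naive_normalize([0.0, 0.0]).first]
begin
  scores.sort # any ordering that has to compare against NaN raises
rescue ArgumentError => e
  puts "#{e.class}: #{e.message}"
end
```

Checking intermediate vectors with Float#nan? before ranking them turns the late, confusing comparison error into an early, explicit one.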

@henrebotha
Author

You're fast! I am using the native Ruby version. I'll hit up GNU GSL and see what happens.

If I were you, I'd mention this in the README.

@Ch4s3
Member

Ch4s3 commented May 20, 2016

I happened to be on the issues. Yeah, let me know how GNU GSL works out. I need to rewrite the SVD, but I'm not a great C programmer, so the process has been slow, to say the least. If you're trying to train with small inputs, especially ones that use abbreviations, the matrix transform is highly likely to break in the Ruby-only version.

@henrebotha
Author

While I have you, I'm getting this:

GSL::ERROR::EUNIMPL: Ruby/GSL error code 24, svd of MxN matrix, M<N, is not implemented (file svd.c, line 61), the requested feature is not (yet) implemented
from /Users/leaply/.rbenv/versions/2.2.4/lib/ruby/gems/2.2.0/bundler/gems/classifier-reborn-4e3bb14d6388/lib/classifier-reborn/lsi.rb:292:in `SV_decomp'
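GSL's SV_decomp only handles matrices with at least as many rows as columns, which is what "svd of MxN matrix, M<N, is not implemented" refers to. A standard workaround is to decompose the transpose: if A^T = U S V^T, then A = V S U^T, so the two orthogonal factors simply swap places. The pure-Ruby sketch below (stdlib matrix only) shows that trick. jacobi_svd is a hypothetical stand-in for SV_decomp, a small one-sided Jacobi SVD that deliberately refuses M < N the way GSL does; it is not GSL's algorithm or classifier-reborn's code, and it returns unsorted singular values, so an LSI cutoff would still need to sort them.

```ruby
require 'matrix'

# Hypothetical stand-in for GSL's SV_decomp: a one-sided (Hestenes) Jacobi SVD.
# Returns [u, v, s] with a == u * Matrix.diagonal(*s) * v.transpose, in the same
# order as `u, v, s = a.SV_decomp`. Singular values are NOT sorted.
def jacobi_svd(a, tol: 1e-12, max_sweeps: 60)
  m, n = a.row_count, a.column_count
  # Mirror GSL's restriction so the M < N case fails the same way.
  raise ArgumentError, 'svd of MxN matrix, M<N, is not implemented' if m < n

  u = a.to_a.map { |row| row.map(&:to_f) } # working copy; columns rotated in place
  v = Array.new(n) { |i| Array.new(n) { |j| i == j ? 1.0 : 0.0 } }

  max_sweeps.times do
    rotated = false
    (0...n - 1).each do |p|
      (p + 1...n).each do |q|
        # Squared norms and dot product of working columns p and q.
        alpha = beta = gamma = 0.0
        m.times do |i|
          alpha += u[i][p]**2
          beta += u[i][q]**2
          gamma += u[i][p] * u[i][q]
        end
        next if gamma.abs <= tol * Math.sqrt(alpha * beta) # already orthogonal

        rotated = true
        # Jacobi rotation that makes columns p and q orthogonal.
        zeta = (beta - alpha) / (2.0 * gamma)
        t = (zeta >= 0 ? 1.0 : -1.0) / (zeta.abs + Math.sqrt(1.0 + zeta**2))
        c = 1.0 / Math.sqrt(1.0 + t**2)
        sn = c * t
        [[u, m], [v, n]].each do |mat, rows|
          rows.times do |i|
            xp, xq = mat[i][p], mat[i][q]
            mat[i][p] = c * xp - sn * xq
            mat[i][q] = sn * xp + c * xq
          end
        end
      end
    end
    break unless rotated
  end

  # Column norms are the singular values. Only non-zero columns are normalised,
  # because dividing a zero column by its zero norm would produce NaN.
  s = (0...n).map { |j| Math.sqrt((0...m).inject(0.0) { |acc, i| acc + u[i][j]**2 }) }
  n.times do |j|
    next if s[j].zero?

    m.times { |i| u[i][j] /= s[j] }
  end
  [Matrix.rows(u), Matrix.rows(v), s]
end

# Workaround for M < N: if a.transpose == U S V^T, then a == V S U^T.
def svd_any_shape(a)
  return jacobi_svd(a) if a.row_count >= a.column_count

  u_t, v_t, s = jacobi_svd(a.transpose)
  [v_t, u_t, s]
end

# Hypothetical 2x3 matrix: two rows, three columns (M < N).
tdm = Matrix[[1.0, 1.0, 1.0], [0.0, 1.0, 1.0]]
u, v, s = svd_any_shape(tdm)
rebuilt = u * Matrix.diagonal(*s) * v.transpose # approximately equal to tdm
```

Transposing before the decomposition costs nothing extra, since the factors are only relabelled afterwards; the real work is still a single SVD.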

@Ch4s3
Member

Ch4s3 commented May 20, 2016

Hum, could be related to this SciRuby/rb-gsl#21. I'm investigating.

@Ch4s3
Member

Ch4s3 commented May 20, 2016

Which version of GSL did you pull down?

@henrebotha
Author

1.16 via Homebrew

@Ch4s3
Member

Ch4s3 commented May 20, 2016

1.16 might work, let me try to pull down fresh versions later and try locally.

@Ch4s3
Member

Ch4s3 commented May 28, 2016

I haven't gotten anywhere with this; can anyone else reproduce it?

@Ch4s3
Member

Ch4s3 commented Nov 29, 2016

@henrebotha can you try with the latest master to see if #77 raises an error on your input?

@henrebotha
Author

That's gonna take some doing. I'll try when I have access to a Mac.

@Ch4s3
Member

Ch4s3 commented Dec 30, 2016

@henrebotha have you tried this yet?

@Ch4s3
Member

Ch4s3 commented Jan 6, 2017

I intend to close this if there's no more action in the next few days.

@timcraft

@Ch4s3 @henrebotha I'm seeing the same issue with my data and can reproduce it with this script:

require 'classifier-reborn'

lsi = ClassifierReborn::LSI.new

# Without gsl this raises NoMethodError
# /classifier-reborn-2.0.4/lib/classifier-reborn/lsi.rb:143:
# in `block in build_index': undefined method `normalize' for nil:NilClass

# With gsl this raises GSL::ERROR::EUNIMPL
# /classifier-reborn-2.0.4/lib/classifier-reborn/lsi.rb:292:in `SV_decomp':
# Ruby/GSL error code 24, svd of MxN matrix, M<N, is not implemented (file svd.c, line 60),
# the requested feature is not (yet) implemented

lsi.add_item 'England', 'xx'
lsi.add_item 'England & Wales', 'xx'
lsi.add_item 'England And Wales', 'xx'

I'm using GNU GSL; upgrading it from 2.2.1 to 2.3 didn't fix it.

Related to this TODO in lsi.rb?
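Assuming the index is built as a terms x documents matrix (one row per distinct term, one column per document, the usual LSI layout), the script above would give fewer rows than columns once "&" and "And" are discarded: three documents but only two distinct terms, which is the M<N shape the GSL error rejects. The sketch below estimates that shape with a deliberately simplified, hypothetical tokenizer (lowercase words minus a tiny stop-word list); classifier-reborn's real tokenization and stemming differ, so treat the counts as an approximation.

```ruby
# Hypothetical, simplified tokenizer: lowercase alphabetic words minus a tiny
# stop-word list. classifier-reborn's own tokenizer and stop words differ.
STOP_WORDS = %w[and the of].freeze

def terms(doc)
  doc.downcase.scan(/[a-z]+/) - STOP_WORDS
end

# [M, N] = [distinct terms (rows), documents (columns)]
def term_document_shape(docs)
  vocabulary = docs.flat_map { |d| terms(d) }.uniq
  [vocabulary.size, docs.size]
end

docs = ['England', 'England & Wales', 'England And Wales']
m, n = term_document_shape(docs)
warn "M=#{m} < N=#{n}: an M>=N-only SVD such as GSL's SV_decomp would refuse this" if m < n
```

Under the same simplified tokenizer, henrebotha's three training strings give 5 terms for 3 documents, so M<N would not by itself explain the earlier NaN error on the native Ruby path.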

@mepatterson

mepatterson commented Mar 9, 2017

Any ideas on this? I'm seeing the Ruby/GSL-derived exception in SV_decomp whenever I try to build an index on more than around 2,000 sentences. I have 4,007 sentences I'd like to index. For those 2000 the classifier works great for my purpose, so I'm really eager to find a way to get this working properly, if possible...

(To be fair, it probably has nothing to do with how many sentences I have and more to do with some sentence beyond the first 2,000 that causes a problem like the ones seen in the comments above...)
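If a single sentence somewhere past the first 2,000 is the trigger, a binary search over prefixes of the corpus can locate it with far fewer index builds than adding documents one by one. The sketch below assumes failure is monotone (once a prefix fails, every longer prefix fails too), which the order-dependence in henrebotha's report suggests may not always hold. shortest_failing_prefix is a hypothetical helper; the block it receives should build an index from the given prefix and raise on failure.

```ruby
# Hypothetical helper: binary-search for the shortest prefix of `docs` whose
# index build fails. The block receives a prefix and should raise on failure.
# Assumes failure is monotone: once a prefix fails, every longer prefix fails.
def shortest_failing_prefix(docs, &build)
  succeeds = lambda do |n|
    begin
      build.call(docs.first(n))
      true
    rescue StandardError
      false
    end
  end
  return nil if succeeds.call(docs.size) # the whole corpus builds fine

  good, bad = 0, docs.size # empty prefix assumed to build; full corpus fails
  while bad - good > 1
    mid = (good + bad) / 2
    if succeeds.call(mid)
      good = mid
    else
      bad = mid
    end
  end
  bad # docs[bad - 1] is the document whose addition first breaks the build
end
```

With classifier-reborn, the block would create a fresh ClassifierReborn::LSI and call add_item for each document in the prefix. Each probe rebuilds from scratch, but for 4,007 sentences that is only about a dozen builds.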

@Ch4s3
Member

Ch4s3 commented Mar 10, 2017

@mepatterson I'd guess you have some malformed input. Can you wrap your training in a begin/rescue and see which doc/line blows it up?

@timcraft I know this sounds stupid, but have you double checked that you're actually using GNU GSL? It may not have loaded correctly.
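One quick way to check is to try loading the bindings directly in the same Ruby process, reusing the GSL::VERSION and GSL::RUBY_GSL_VERSION constants from timcraft's check further down. gsl_status is a hypothetical helper; a successful require only shows that the rb-gsl gem is loadable, not that classifier-reborn actually took its GSL code path.

```ruby
# Hypothetical helper: report whether the rb-gsl bindings load in this process.
# Success does not prove classifier-reborn chose its GSL code path.
def gsl_status
  require 'gsl'
  "GSL/#{GSL::VERSION} RubyGSL/#{GSL::RUBY_GSL_VERSION}"
rescue LoadError => e
  "GSL not loadable: #{e.message}"
end

puts gsl_status
```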

@mepatterson

mepatterson commented Mar 10, 2017 via email

@Ch4s3
Member

Ch4s3 commented Mar 10, 2017

Ok, I'll try to dig in this weekend.

@timcraft

timcraft commented Mar 11, 2017

@Ch4s3 Yep, it appears to be loaded OK. I added this at the top of the script (the matrix code is from gsl-2.1.0.2/examples/linalg/SV.rb, which uses SV_decomp):

puts "Using GSL/#{GSL::VERSION} RubyGSL/#{GSL::RUBY_GSL_VERSION}"
a = GSL::Matrix[[3, 5, 2], [6, 2, 1], [4, 7, 3]]
u, v, s = a.SV_decomp
p u*GSL::Matrix.diagonal(s)*v.trans

The output is "Using GSL/2.3 RubyGSL/2.1.0.2" followed by the correct matrix.

Ch4s3 added the bug label Jul 31, 2017
@elisaado

elisaado commented Nov 28, 2017

Same here.

I have GSL installed, but it's not even loaded.

@Ch4s3
Member

Ch4s3 commented Nov 28, 2017

@elisaado can you post any details?

Projects: None yet
Development: No branches or pull requests
5 participants