Vector Space Model implementation in Go.
This package provides document search based on the algebraic Vector Space Model, using TF-IDF as the weighting scheme.
import "github.com/quan-to/go-vsm/vsm"
Construct a VSM object and use its methods for training:
vs := vsm.New(nil)
docs := []vsm.Document{
	{
		Sentence: "Shipment of gold damaged in a fire.",
		Class:    "d1",
	},
	{
		Sentence: "Delivery of silver arrived in a silver truck.",
		Class:    "d2",
	},
	{
		Sentence: "Shipment of gold arrived in a truck.",
		Class:    "d3",
	},
	...,
}
// Static training
for _, doc := range docs {
	if err := vs.StaticTraining(doc); err != nil {
		// An error occurred during training.
	}
}
Static training runs once, and for most cases that is enough:
docs := []vsm.Document{
	{
		Sentence: "Shipment of gold damaged in a fire.",
		Class:    "d1",
	},
	{
		Sentence: "Shipment of gold arrived in a truck.",
		Class:    "d3",
	},
}
vs := vsm.New(nil)
for _, doc := range docs {
	err := vs.StaticTraining(doc)
	fmt.Println(err)
}
If you have a stream of data and need the training process to be more reactive, dynamic training might be the better choice:
docCh := make(chan vsm.Document)
go func() {
	defer close(docCh)
	// Load documents from some source dynamically
	// and send them to the training channel.
	docCh <- vsm.Document{
		Sentence: "Delivery of silver arrived in a silver truck.",
		Class:    "d2",
	}
}()
trainCh := vs.DynamicTraining(context.Background(), docCh)
// Check each result for a training error. The loop ends when
// trainCh is closed, that is, when all training data was consumed.
for res := range trainCh {
	if res.Err != nil {
		// Handle the error.
	}
}
Search applies the Vector Space Model, comparing the angle between each document vector and the query vector:
doc, err := vs.Search("gold silver truck.")
fmt.Println(doc.Class, err)
Go to the vsm folder and run:
go test -v -cover
The package can also run its tests against data loaded from a file:
go test -v -fromfile
The -fromfile flag tells the tests to run over the testdata/training.json file.
If you want to specify another testing file:
go test -v -fromfile -filename="training-2.json"
The -filename flag should point to a file inside the testdata folder. See the training.json file for details on its format.
See LICENSE.