vasileknik76/dummysearch

dummysearch

DummySearch is a full-text search and text comparison engine. It is based on the TF-IDF metric. All operations on data are performed via a REST API. Documents in an index may carry extra data (meta), but it is not used in search.

The engine relies on the snowball stemmer (https://github.com/kljensen/snowball), so the list of supported languages is restricted to:

  • English,
  • Spanish (español),
  • French (le français),
  • Russian (ру́сский язы́к),
  • Swedish (svenska),
  • Norwegian (norsk)

DummySearch recalculates TF-IDF automatically in the background once every UpdatePeriod.
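To make the metric concrete, here is a minimal Python sketch of TF-IDF weighting. It assumes the textbook formula tf * log(N / df) and plain whitespace tokenization without stemming; the exact formula and tokenizer used inside DummySearch are not documented here and may differ. The function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumption: weight = (term count / document length) * log(N / df).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = ["london is the capital", "paris is the capital", "some other text"]
weights = tf_idf(docs)
```

A term that occurs in only one document ("london") gets a higher weight than a term shared by several documents ("capital"), which is what lets rare query words dominate the ranking.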

Build and run:

Build native:

$ go build -o build/dummysearch cmd/dummysearch/main.go

Build docker:

$ docker build -t dummysearch .

Run native:

$ ./build/dummysearch

Run in docker:

$ docker run -p 6745:6745 -it -d dummysearch

Operations:

Creating new index:

Creates an index with the specified config. See: Index config.

$ curl --location --request POST 'http://localhost:6745/' \
--header 'Content-Type: application/json' \
--data-raw '{
    "name": "lol",
    "config": {
        "language": "english",
        "updatePeriod": "120s",
        "autoUpdate": true,
        "customIds": false
    }
}'

Response:

{
  "status": true,
  "payload": {
    "Message": "OK"
  }
}
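The same request can be prepared from Python with only the standard library. This is a sketch, not an official client: the helper name `create_index_request` and the `BASE_URL` constant are hypothetical, and the port is taken from the docker example above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:6745"  # assumption: same host and port as the curl examples


def create_index_request(name, language="english", update_period="120s",
                         auto_update=True, custom_ids=False):
    """Build (but do not send) the POST request that creates an index."""
    body = {
        "name": name,
        "config": {
            "language": language,
            "updatePeriod": update_period,
            "autoUpdate": auto_update,
            "customIds": custom_ids,
        },
    }
    return urllib.request.Request(
        BASE_URL + "/",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = create_index_request("lol")
# To send it against a running server:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```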

Remove index:

curl --location --request DELETE 'http://localhost:6745/lol/'

Response:

{
  "status": true,
  "payload": {
    "Message": "OK"
  }
}

Add document to index:

curl --location --request POST 'http://localhost:6745/lol/' \
--header 'Content-Type: application/json' \
--data-raw '{
    "content": "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.",
    "meta": {
        "someField": "any value",
        "otherField": 1
    },
    "id": "1"
}'

Response:

{
  "status": true,
  "payload": {
    "Message": "OK",
    "DocumentId": "1"
  }
}

Bulk add documents to index:

curl --location --request POST 'http://localhost:6745/lol/batch' \
--header 'Content-Type: application/json' \
--data-raw '[
    {
        "content": "some text!",
        "meta": {
            "foo": "bar"
        },
        "id": "1"
    },
    {
        "content": "london is the capital of great britain",
        "meta": {
            "bar": "baz"
        },
        "id": "2"
    },
    {
        "content": "any other, text.",
        "meta": {
            "foo": "bar2"
        },
        "id": "3"
    }
]'

Response:

{
  "status": true,
  "payload": {
    "Message": "OK",
    "DocumentIds": [
      "1",
      "2",
      "3"
    ]
  }
}
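Hand-written JSON arrays are easy to break (for example with a trailing comma), so it can be safer to generate the batch body. A minimal Python sketch follows; the helper name `build_batch` is hypothetical, and sending ids as strings simply mirrors the example above.

```python
import json


def build_batch(docs):
    """Serialize (doc_id, content, meta) tuples into a batch request body."""
    payload = [
        {"content": content, "meta": meta, "id": str(doc_id)}
        for doc_id, content, meta in docs
    ]
    # json.dumps always emits valid JSON, so no stray commas can slip in.
    return json.dumps(payload, ensure_ascii=False)


body = build_batch([
    (1, "some text!", {"foo": "bar"}),
    (2, "london is the capital of great britain", {"bar": "baz"}),
    (3, "any other, text.", {"foo": "bar2"}),
])
```

The resulting string can be passed to curl via `--data-raw` or sent with any HTTP client.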

Calculate TF-IDF:

curl --location --request GET 'http://localhost:6745/lol/update'

Response:

{
  "status": true,
  "payload": {
    "Message": "Index updating"
  }
}

Get document by id:

The source text is not stored, so only the document meta can be retrieved.

curl --location --request GET 'http://localhost:6745/lol/0'

Response:

{
  "status": true,
  "payload": {
    "Doc": {
      "Meta": {
        "otherField": 1,
        "someField": "any value"
      }
    }
  }
}

Delete document by id:

curl --location --request DELETE 'http://localhost:6745/lol/0'

Response:

{
  "status": true,
  "payload": {
    "Message": "OK"
  }
}

Search documents by query:

curl --location --request GET 'http://localhost:6745/lol/search?query=lorem%20london'

Response:

{
  "status": true,
  "payload": [
    {
      "DocId": "2",
      "Meta": {
        "bar": "baz"
      },
      "Score": 0.26726124191242445
    },
    {
      "DocId": "0",
      "Meta": {
        "otherField": 1,
        "someField": "any value"
      },
      "Score": 0.07669649888473704
    }
  ]
}
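A client usually needs to URL-encode the query and then pick hits out of the response. The Python sketch below does both with the standard library. The helper names `search_url` and `hits_above` are hypothetical, and treating a false `status` as a failure is an assumption, since the error format is not described here.

```python
import json
from urllib.parse import quote, urlencode


def search_url(index, query, base_url="http://localhost:6745"):
    """Build the search URL, encoding spaces as %20 like the curl example."""
    params = urlencode({"query": query}, quote_via=quote)
    return f"{base_url}/{quote(index)}/search?{params}"


def hits_above(response_text, min_score):
    """Return (DocId, Score) pairs whose score is at least min_score."""
    data = json.loads(response_text)
    if not data.get("status"):
        # Assumption: a false status marks a failed request.
        raise RuntimeError("search request failed")
    return [
        (hit["DocId"], hit["Score"])
        for hit in data["payload"]
        if hit["Score"] >= min_score
    ]


url = search_url("lol", "lorem london")
sample = (
    '{"status": true, "payload": ['
    '{"DocId": "2", "Meta": {"bar": "baz"}, "Score": 0.26726124191242445},'
    '{"DocId": "0", "Meta": {"otherField": 1}, "Score": 0.07669649888473704}]}'
)
strong_hits = hits_above(sample, 0.1)
```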

Compare two documents:

curl --location --request GET 'http://localhost:6745/lol/compare?doc1=1&doc2=3'

Response:

{
  "status": true,
  "payload": {
    "score": 0.14907119849998599
  }
}
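How the comparison score is computed is not stated here. A common way to compare two documents represented as TF-IDF vectors is cosine similarity, so the following Python sketch shows that technique as an illustration only; it is an assumption, not a description of DummySearch internals, and the function name is hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors given as {term: weight} dicts."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        # An empty vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity({"london": 1.0}, {"london": 2.0})
disjoint = cosine_similarity({"london": 1.0}, {"text": 1.0})
partial = cosine_similarity({"london": 1.0, "text": 1.0}, {"london": 1.0})
```

With non-negative weights the result lies between 0 (no shared terms) and 1 (same direction), which matches the range of the scores shown in the examples.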

Index config

  • Language - the language of the index. An index has exactly one language. Words from other languages that appear in a document are simply not stemmed.
  • UpdatePeriod - the interval between TF-IDF recalculations. For example, if UpdatePeriod is "60s" and AutoUpdate is enabled, a recalculation is triggered every 60 seconds, but the process first checks whether the index has changed.
  • AutoUpdate - enables or disables automatic recalculation. If AutoUpdate is disabled, you must call the Calculate TF-IDF endpoint yourself.
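A config can be checked on the client before it is sent. The Python sketch below is hypothetical: the function `validate_config` is not part of DummySearch, and the lowercase language identifiers other than "english" are assumed by analogy with the create-index example.

```python
# Assumption: identifiers are the lowercase English names of the supported languages.
SUPPORTED_LANGUAGES = {
    "english", "spanish", "french", "russian", "swedish", "norwegian",
}


def validate_config(config):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    language = config.get("language")
    if language not in SUPPORTED_LANGUAGES:
        problems.append(f"unsupported language: {language!r}")
    if config.get("autoUpdate") and not config.get("updatePeriod"):
        problems.append("autoUpdate is enabled but updatePeriod is missing")
    return problems


ok = validate_config(
    {"language": "english", "updatePeriod": "120s", "autoUpdate": True}
)
bad = validate_config({"language": "german", "autoUpdate": True})
```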
