
araobp/learning-langchain


Learning LangChain

(I am closing this project because I now prefer OpenAI's APIs and a lightweight vector DB such as sqlite-vec over LangChain.)

OpenAI API Usage

https://platform.openai.com/organization/usage

My bible

Marketing 5.0: Technology for Humanity

As "Marketing 5.0" points out, NLP (together with LLMs) is a core part of Data Driven Marketing.

Background and Motivation

Over the past six months, I have learned NLP with spaCy and SQLite, and I now use these NLP skills in my marketing work.

Project Goal

I am a fan of SQLite, so in this project I study how SQLite can be used as part of a RAG pipeline.
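The following is a minimal sketch of what "SQLite as part of RAG" could look like, written in Python with only the standard `sqlite3` module. It is not the author's implementation. It makes these assumptions:

- Each text chunk is stored with its embedding serialized as JSON text.
- Retrieval is a brute-force cosine-similarity scan done in Python.
- The table name `chunks`, the helper functions, and the toy 3-dimensional vectors are all hypothetical. The vectors stand in for real model embeddings, for example from OpenAI or Sentence Transformers.

```python
import json
import math
import sqlite3

# Hypothetical toy "embeddings"; real ones would come from an embedding model.
DOCS = {
    "SQLite is a lightweight embedded database.": [0.9, 0.1, 0.0],
    "spaCy is a library for NLP.": [0.1, 0.9, 0.1],
    "Marketing teams analyze customer data.": [0.0, 0.2, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_store(conn):
    """Create a hypothetical `chunks` table; embeddings are stored as JSON text."""
    conn.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, text TEXT, embedding TEXT)")
    conn.executemany(
        "INSERT INTO chunks (text, embedding) VALUES (?, ?)",
        [(text, json.dumps(vec)) for text, vec in DOCS.items()],
    )


def retrieve(conn, query_vec, k=2):
    """Brute-force top-k retrieval: score every stored chunk in Python."""
    rows = conn.execute("SELECT text, embedding FROM chunks").fetchall()
    scored = [(cosine(query_vec, json.loads(emb)), text) for text, emb in rows]
    scored.sort(reverse=True)
    return [text for _, text in scored[:k]]


def build_prompt(question, context):
    """Augment the question with the retrieved chunks before sending it to an LLM."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_store(conn)
    # Hypothetical query embedding that lies close to the SQLite document.
    hits = retrieve(conn, [0.8, 0.2, 0.1], k=1)
    print(build_prompt("What is SQLite?", hits))
```

A brute-force scan is simple and adequate for small collections. A vector extension such as sqlite-vec is meant to move the similarity search into SQL itself.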

My final goal is to build a Data Driven Marketing framework with NLP and LLMs. The framework itself will not be included in this project.

Code

I want to be able to use LangChain even in environments that provide their own LLM API service, for example when using an LLM API provided in-house.

The introductory book I bought does not match the latest LangChain API specification, so I updated its code to follow the latest specification.

Test LangChain's RAG capabilities with OpenAI.

I conclude that the document sources matter much more than RAG itself.

Test spaCy's built-in embedding capabilities.

I conclude that the built-in embedding capabilities are not useful in my work.

Use ChromaDB for keyphrase similarity search with textacy. This code uses neither LangChain nor OpenAI.

I conclude that Sentence Transformers are useful in my work.

Since the APIs change frequently, I started learning LangChain on this site in Aug 2024.

OpenAI API

https://platform.openai.com

Embeddings

I tested OpenAI, spaCy and Sentence Transformers for generating embeddings.

Note: spaCy's "en-core-web-lg" and "ja-core-news-lg" seem to output 300-dimensional embeddings. On the other hand, "en-core-web-trf" does not seem to support embeddings because of an interoperability problem with other packages.

VectorDB

Chroma

I use Chroma and my original GraphDB to achieve my goal for Data Driven Marketing.

I am also interested in sqlite-vec, which is more suitable for my goal. However, sqlite-vec is still in alpha, so I use Chroma for the time being.

My original GraphDB (private project)

In the real world, an SQL DB needs to coexist with a GraphDB and a VectorDB to meet the various demands of marketing teams.

I have already developed a GraphDB with SQLite and NetworkX on my own:

  • My original schema to store graph entities (nodes).
  • My original SQL to dynamically generate triplets under certain conditions (i.e., edges between nodes with dependencies).
  • Apply graph theory to the generated network to extract a subgraph.
     Network Graph A    Network Graph C
               |           |   <- - - Connect networks whose similarity distance is smaller than the threshold
              Network Graph B
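The three GraphDB ingredients listed above can be sketched with the Python standard library alone. This is not the author's private GraphDB. It makes these assumptions, all hypothetical:

- A `nodes`/`dependencies` schema.
- A `kind` filter that plays the role of the "certain condition".
- Made-up sample data.
- A plain breadth-first search standing in for NetworkX. In NetworkX, `networkx.node_connected_component` would play the same role on an undirected graph.

```python
import sqlite3
from collections import defaultdict, deque

# Hypothetical schema: graph entities (nodes) plus a table of dependencies.
SCHEMA = """
CREATE TABLE nodes (id TEXT PRIMARY KEY, kind TEXT NOT NULL);
CREATE TABLE dependencies (src TEXT REFERENCES nodes(id), dst TEXT REFERENCES nodes(id));
"""

# Hypothetical SQL that generates (subject, predicate, object) triplets on the fly,
# keeping only edges whose source node is of the requested kind.
TRIPLETS_SQL = """
SELECT s.id, 'depends_on', d.id
FROM dependencies AS dep
JOIN nodes AS s ON s.id = dep.src
JOIN nodes AS d ON d.id = dep.dst
WHERE s.kind = ?
"""


def load_sample(conn):
    """Create the schema and insert made-up sample data."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO nodes VALUES (?, ?)", [
        ("campaign", "project"), ("survey", "project"),
        ("customers", "dataset"), ("sales", "dataset"), ("weather", "dataset"),
    ])
    conn.executemany("INSERT INTO dependencies VALUES (?, ?)", [
        ("campaign", "customers"), ("campaign", "sales"),
        ("survey", "customers"), ("weather", "sales"),
    ])


def triplets(conn, kind):
    """Dynamically generate triplets for source nodes of the given kind."""
    return conn.execute(TRIPLETS_SQL, (kind,)).fetchall()


def subgraph(triples, start):
    """Nodes connected to `start` (edges treated as undirected), found by BFS."""
    adj = defaultdict(set)
    for s, _, o in triples:
        adj[s].add(o)
        adj[o].add(s)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return seen


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_sample(conn)
    edges = triplets(conn, "project")
    print(edges)
    print(sorted(subgraph(edges, "sales")))
```

Because the condition is applied in SQL, the "weather" dependency never enters the generated network. As a result, the subgraph around "sales" contains only project-related nodes.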

          Database stack

[            NetworkX            ]  ==> Graph theory for knowledge graph
[           Shim Layer           ]  ==> Dynamic knowledge graph generation
[SQLite database][Chroma database]  ==> SQL and Semantic Search
[            SQLite3             ]  ==> Base
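The step in the network diagram, "connect networks whose similarity distance is smaller than the threshold", could plausibly live in the Shim Layer. Below is a minimal Python sketch of that step. It makes these assumptions, all hypothetical and not taken from the author's GraphDB:

- Each network is summarized by the mean (centroid) of its node embeddings.
- Cosine distance is the distance measure.
- The 2-dimensional vectors and the threshold value are made up.

```python
import math
from itertools import combinations

# Hypothetical node embeddings for the three networks in the diagram.
NETWORKS = {
    "Network Graph A": [[1.0, 0.0], [0.9, 0.1]],
    "Network Graph B": [[0.7, 0.7], [0.6, 0.8]],
    "Network Graph C": [[0.0, 1.0], [0.1, 0.9]],
}


def centroid(vectors):
    """Summarize a network as the element-wise mean of its node embeddings."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means same direction, larger means less similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def connect_networks(networks, threshold):
    """Return (name, name) pairs whose centroid distance is below the threshold."""
    cents = {name: centroid(vecs) for name, vecs in networks.items()}
    links = []
    for a, b in combinations(sorted(cents), 2):
        if cosine_distance(cents[a], cents[b]) < threshold:
            links.append((a, b))
    return links


if __name__ == "__main__":
    # Threshold 0.4 is an arbitrary illustrative value.
    for a, b in connect_networks(NETWORKS, threshold=0.4):
        print(f"{a} <-> {b}")
```

With this sample data and a threshold of 0.4, A and C each link to B but not to each other, which mirrors the diagram. A lower threshold such as 0.1 leaves all three networks disconnected.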

The GraphDB is not included in this project.

Reference

Reference (Japanese)

About

Learning LLM with LangChain, OpenAI and spaCy
