(I am finishing this project because I now prefer OpenAI's APIs and a lightweight vector DB such as sqlite-vec to LangChain.)
https://platform.openai.com/organization/usage
Marketing 5.0: Technology for Humanity
As "Marketing 5.0" says, NLP (with LLMs) is a core part of data-driven marketing.
Over the past half year, I have learned NLP with spaCy and SQLite, and I now use these NLP skills in my marketing work.
I am a fan of SQLite, so in this project I study how SQLite can be used as a part of RAG.
My final goal is to build a data-driven marketing framework with NLP and LLMs. The framework itself will not be included in this project.
How to create a custom chat model class (Work in progress)
I want to be able to use LangChain even in environments that provide their own LLM API service, for example when using an LLM API offered inside a company.
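Update the code from "LangChain完全入門" (Work in progress)
The introductory book I purchased does not match the latest LangChain API specification, so I am updating its code to match the latest API.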
Test LangChain's RAG capabilities with OpenAI.
I conclude that the document sources are much more important than RAG.
Test spaCy's built-in embedding capabilities.
I conclude that the built-in embedding capabilities are not useful in my work.
Use ChromaDB for keyphrase similarity search with textacy. This code uses neither LangChain nor OpenAI.
I conclude that Sentence Transformers are useful in my work.
Since the APIs change frequently, I started learning LangChain on this site in Aug 2024.
I test OpenAI, spaCy and Sentence Transformers to generate embeddings.
Note: spaCy's "en-core-web-lg" and "ja-core-news-lg" seem to output 300-dimensional embeddings. On the other hand, "en-core-web-trf" does not seem to support embeddings because of an interoperability problem with other packages.
I use Chroma and my original GraphDB to achieve my goal for data-driven marketing.
I am also interested in sqlite-vec, which is more suitable for my goal. sqlite-vec is still in an alpha version, so I use Chroma for the time being.
In the real world, a SQL DB needs to coexist with a GraphDB and a VectorDB to meet the various demands from marketing teams.
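To illustrate what "SQLite as a vector store" means at its simplest, here is a minimal sketch that uses only Python's standard library. It is not sqlite-vec and not the code in this project: the `keyphrase` table, the `pack`/`unpack` helpers, the `cosine_distance` SQL function, and the toy 3-dimensional vectors are all assumptions made for illustration. It stores each embedding as a float32 BLOB and ranks rows by brute-force cosine distance.

```python
import math
import sqlite3
import struct


def pack(vec):
    """Serialize a list of floats as little-endian float32 bytes."""
    return struct.pack(f"<{len(vec)}f", *vec)


def unpack(blob):
    """Inverse of pack(): bytes back to a tuple of floats."""
    return struct.unpack(f"<{len(blob) // 4}f", blob)


def cosine_distance(blob_a, blob_b):
    """Cosine distance (1 - cosine similarity) between two packed vectors.

    Assumes neither vector is all zeros.
    """
    a, b = unpack(blob_a), unpack(blob_b)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


conn = sqlite3.connect(":memory:")
# Register the Python function so that SQL statements can call it.
conn.create_function("cosine_distance", 2, cosine_distance)
conn.execute(
    "CREATE TABLE keyphrase (id INTEGER PRIMARY KEY, text TEXT, embedding BLOB)"
)

# Hypothetical toy vectors; real embeddings would come from OpenAI,
# spaCy, or Sentence Transformers.
rows = [
    ("email campaign", [0.9, 0.1, 0.0]),
    ("newsletter", [0.8, 0.2, 0.1]),
    ("graph database", [0.0, 0.1, 0.9]),
]
conn.executemany(
    "INSERT INTO keyphrase (text, embedding) VALUES (?, ?)",
    [(text, pack(vec)) for text, vec in rows],
)


def search(query_vec, k=2):
    """Return the k rows closest to query_vec as (text, distance) pairs."""
    return conn.execute(
        "SELECT text, cosine_distance(embedding, ?) AS distance "
        "FROM keyphrase ORDER BY distance LIMIT ?",
        (pack(query_vec), k),
    ).fetchall()


results = search([1.0, 0.0, 0.0])
for text, distance in results:
    print(f"{text}: {distance:.4f}")
```

A full scan like this is only reasonable for small tables. The appeal of keeping vectors inside SQLite is that the semantic search result can be joined with ordinary SQL tables in the same query.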
I have already developed GraphDB with SQLite and networkx on my own:
- My original schema to store graph entities (nodes).
- My original SQL to dynamically generate triplets on a certain condition (i.e., edges between nodes with dependencies).
- Run Graph Theory on the generated network to generate a sub graph.
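The three steps above can be sketched roughly as follows. This is not my actual GraphDB (which is not included in this project): the `node` table, the `co_occurs_with` predicate, and the "same document" condition are hypothetical stand-ins, and a plain breadth-first search replaces networkx so that the example runs with the standard library only.

```python
import sqlite3
from collections import defaultdict, deque

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (
    id    INTEGER PRIMARY KEY,
    label TEXT NOT NULL,   -- entity name, e.g. a keyphrase
    doc   TEXT NOT NULL    -- document the entity was extracted from
);
INSERT INTO node (label, doc) VALUES
    ('SQLite', 'doc1'), ('RAG', 'doc1'),
    ('RAG', 'doc2'), ('Chroma', 'doc2'),
    ('spaCy', 'doc3');
""")

# Edges are not stored. This query derives triplets on demand: two
# entities are linked when they appear in the same document.
TRIPLETS_SQL = """
SELECT DISTINCT a.label, 'co_occurs_with', b.label
FROM node AS a
JOIN node AS b ON a.doc = b.doc AND a.label < b.label
"""
triplets = conn.execute(TRIPLETS_SQL).fetchall()


def connected_subgraph(triplets, start):
    """Return the set of labels reachable from start (undirected BFS)."""
    adjacency = defaultdict(set)
    for subject, _predicate, obj in triplets:
        adjacency[subject].add(obj)
        adjacency[obj].add(subject)
    seen, queue = {start}, deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency[current] - seen:
            seen.add(neighbour)
            queue.append(neighbour)
    return seen


subgraph = connected_subgraph(triplets, "SQLite")
print(sorted(subgraph))
```

With networkx, the same sub graph could be obtained by adding the triplets as edges to a `networkx.Graph` and calling `networkx.node_connected_component`. Changing the `JOIN` condition changes which network is generated, without touching the stored nodes.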
Network Graph A     Network Graph C
       |                 |    <- - - Connect networks where similarity distance is smaller than the threshold
          Network Graph B
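The idea in the diagram can be sketched as below. Everything here is an assumption for illustration: the `networks` dictionary, the toy 3-dimensional embeddings, the `bridge_edges` helper, and the threshold of 0.1. In practice the distances would come from a vector DB query (Chroma in my case) rather than from a pure-Python loop.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


# Each network maps a node label to a hypothetical embedding.
networks = {
    "A": {"email campaign": [0.9, 0.1, 0.0]},
    "B": {"newsletter": [0.8, 0.2, 0.1], "vector search": [0.1, 0.2, 0.9]},
    "C": {"semantic search": [0.0, 0.3, 0.9]},
}


def bridge_edges(networks, threshold):
    """Return edges between nodes of different networks that lie
    closer to each other than the threshold."""
    edges = []
    for (name_a, nodes_a), (name_b, nodes_b) in combinations(networks.items(), 2):
        for label_a, vec_a in nodes_a.items():
            for label_b, vec_b in nodes_b.items():
                if cosine_distance(vec_a, vec_b) < threshold:
                    edges.append(((name_a, label_a), (name_b, label_b)))
    return edges


edges = bridge_edges(networks, threshold=0.1)
for left, right in edges:
    print(left, "<->", right)
```

With these toy vectors, A connects to B and B connects to C, while A and C stay unconnected, which matches the shape of the diagram. The threshold controls how aggressively separate networks are merged.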
Database stack
[ NetworkX ] ==> Graph theory for knowledge graph
[ Shim Layer ] ==> Dynamic knowledge graph generation
[SQLite database][Chroma database] ==> SQL and Semantic Search
[ SQLite3 ] ==> Base
The GraphDB is not included in this project.
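- LangChain完全入門 ==> If you have NLP skills, you can grasp RAG quickly. A good book in that sense.
- AIビジネスチャンス 技術動向と事例に学ぶ新たな価値を生成する攻めの戦略(できるビジネス) ==> I bought it at a bookstore because it looked useful for organizing my thoughts.
- 【考察】RAGはマニュアル人間で、ファインチューニングは新卒育成? ==> VERY GOOD! It gives hints on how a marketing department should prepare documents for RAG, and on what RAG is suited for.
- 「コンテンツの構造化が大変!?」顧客接点へのAI導入の課題はRAG技術で解決 ==> This is exactly why I am overworked every day. Structuring content is extraordinarily hard!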
- ローカル で Llama 2 + LangChain の RetrievalQA を試す
- 生成 AI(LLM)のビジネス適用の潮流 ==> AGREE! In my work too, SQL is at the core and LLMs are used where they fit best. Do not become a marketer who talks about LLMs without knowing SQL.