langchain_en
LangChain is a framework for developing LLM-driven applications, designed to help developers build end-to-end applications with LLMs.
With the components and interfaces provided by LangChain, developers can easily design and build various LLM-powered applications such as question-answering systems, summarization tools, chatbots, code comprehension tools, information extraction systems, and more.
The following sections provide two examples of using Chinese-Alpaca in LangChain:
- Retrieval QA
- Summarization
The hyperparameters and prompt templates in the examples are not optimal and are only meant for demonstration. For more detailed instructions on using LangChain, please refer to its official documentation.
pip install langchain==0.0.351
pip install sentence_transformers==2.2.2
pip install pydantic==1.10.13
pip install faiss-gpu==1.7.2
Download the full model weights, or follow the Manual Conversion guide to merge the LoRA weights with the original Llama-2 weights to obtain the complete model, and save it locally.
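For readers wondering what "merging" does: in the standard LoRA formulation, each adapted weight matrix W is replaced by W + (alpha / r) * B @ A, where A and B are the low-rank LoRA factors, r is the rank and alpha is the scaling hyperparameter. The pure-Python sketch below shows only that arithmetic on tiny matrices. It assumes the standard formulation and is not the project's conversion script, which operates on full checkpoints; `merge_lora` is a hypothetical helper.

```python
# Minimal sketch of merging a LoRA update into one base weight matrix.
# Assumes the standard LoRA formula W' = W + (alpha / r) * (B @ A);
# the project's real merge script works on whole checkpoints.

def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merge_lora(w, a, b, alpha, r):
    """Return W + (alpha / r) * (B @ A).

    w: d_out x d_in base weight
    a: r x d_in LoRA down-projection
    b: d_out x r LoRA up-projection
    """
    scale = alpha / r
    delta = matmul(b, a)  # d_out x d_in low-rank update
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


if __name__ == "__main__":
    w = [[1.0, 0.0], [0.0, 1.0]]  # 2x2 base weight
    a = [[1.0, 2.0]]              # rank-1 A (1x2)
    b = [[0.5], [1.0]]            # rank-1 B (2x1)
    print(merge_lora(w, a, b, alpha=2, r=1))  # [[2.0, 2.0], [2.0, 5.0]]
```

Because the update is folded into W once, inference afterwards needs no extra adapter computation, which is why the merged model can be loaded like any ordinary checkpoint.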
In Retrieval QA, LangChain selects the parts of a document most relevant to the query as context, by measuring the similarity between the query and the document content. This context is then combined with the question to form the input to the LLM. An embedding model is therefore needed to vectorize the text and the question for this matching step. We use GanymedeNil/text2vec-large-chinese as an example (in practice, you can choose another embedding model that suits your needs).
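The matching step can be illustrated with a self-contained sketch: embed the query and each text chunk, score them by cosine similarity, and keep the top-k chunks as context. The sketch assumes a toy bag-of-words "embedding" (token counts) purely as a stand-in for a neural model such as text2vec-large-chinese, and a linear scan in place of a vector index (the faiss-gpu dependency suggests the real pipeline uses FAISS). `embed`, `cosine` and `top_k` are hypothetical helpers, not LangChain APIs.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (token counts).

    Stand-in for a neural embedding model such as text2vec-large-chinese.
    """
    return Counter(text.lower().split())


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k(query: str, chunks: List[str], k: int = 2) -> List[str]:
    """Return the k chunks most similar to the query (linear scan)."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


if __name__ == "__main__":
    chunks = [
        "li bai was a poet of the tang dynasty",
        "du fu is called the poet sage",
        "the weather is nice today",
    ]
    print(top_k("which dynasty was li bai from", chunks, k=1))
```

A real embedding model captures meaning rather than exact token overlap, but the ranking logic (score every chunk against the query, keep the best few) is the same idea.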
This task uses an LLM to answer questions about a specific document automatically. The process consists of reading the text, splitting it into chunks, vectorizing the chunks and the question, matching the question against the chunks, combining the matched text (as context) with the question into a prompt for the LLM, and generating the answer.
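Two of these steps, text segmentation and prompt construction, are easy to show in isolation. The sketch below splits text into fixed-size character chunks with overlap (so a sentence cut at a boundary still appears whole in one chunk) and formats the retrieved chunks plus the question into a single prompt. The chunk size, overlap and the English `PROMPT_TEMPLATE` are illustrative assumptions; the script's actual splitter settings and prompt template are not reproduced here.

```python
from typing import List

# Hypothetical prompt template; the script's real template may differ.
PROMPT_TEMPLATE = (
    "Answer the question based on the following context.\n"
    "Context:\n{context}\n"
    "Question: {question}\n"
    "Answer:"
)


def split_text(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Split text into character chunks of chunk_size that overlap by `overlap` chars."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("need 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # last chunk reached the end
            break
    return chunks


def build_prompt(context_chunks: List[str], question: str) -> str:
    """Join the matched chunks as context and fill in the prompt template."""
    return PROMPT_TEMPLATE.format(context="\n".join(context_chunks), question=question)


if __name__ == "__main__":
    print(split_text("abcdefghij", chunk_size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
    print(build_prompt(["chunk one", "chunk two"], "What is it about?"))
```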
cd scripts/langchain
python langchain_qa.py \
--embedding_path text2vec-large-chinese \
--model_path chinese-alpaca-2-7b \
--file_path doc.txt \
--chain_type refine
Parameter description:
- --embedding_path: Directory containing the embedding model, or its model_id on the Hugging Face Hub.
- --model_path: Directory containing the merged Chinese-Alpaca model.
- --file_path: Document to use for retrieval QA.
- --chain_type: refine (default) or stuff, which select different chain types (see the LangChain documentation for details). In short, stuff is suited to shorter documents, while refine is suited to longer ones.
- --gpu_id: GPU ID(s) to use (default: 0). Currently only single-GPU inference is supported.
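The difference between the two chain types can be sketched with a stub LLM that just counts calls. "stuff" concatenates all retrieved documents into one prompt and calls the LLM once, so everything must fit in the context window; "refine" answers from the first document and then calls the LLM once per remaining document to refine the running answer, which handles long inputs at the cost of more calls. The prompt wording, `stuff_chain`, `refine_chain` and `make_counting_llm` are hypothetical illustrations, not LangChain's implementation.

```python
from typing import Callable, List, Tuple

LLM = Callable[[str], str]  # any function mapping a prompt to a completion


def stuff_chain(llm: LLM, docs: List[str], question: str) -> str:
    """'stuff': put every document into a single prompt, one LLM call."""
    context = "\n".join(docs)
    return llm(f"Context:\n{context}\nQuestion: {question}\nAnswer:")


def refine_chain(llm: LLM, docs: List[str], question: str) -> str:
    """'refine': answer from the first doc, then refine with each later doc."""
    if not docs:
        raise ValueError("refine_chain needs at least one document")
    answer = llm(f"Context:\n{docs[0]}\nQuestion: {question}\nAnswer:")
    for doc in docs[1:]:
        answer = llm(
            f"Existing answer: {answer}\n"
            f"New context:\n{doc}\n"
            f"Refine the existing answer to '{question}' if the new context helps.\n"
            "Answer:"
        )
    return answer


def make_counting_llm() -> Tuple[LLM, List[str]]:
    """Stub LLM that records each prompt and returns 'answer#<call number>'."""
    calls: List[str] = []

    def llm(prompt: str) -> str:
        calls.append(prompt)
        return f"answer#{len(calls)}"

    return llm, calls


if __name__ == "__main__":
    docs = ["part 1", "part 2", "part 3"]
    llm, calls = make_counting_llm()
    print(stuff_chain(llm, docs, "q"), len(calls))   # answer#1 1
    llm, calls = make_counting_llm()
    print(refine_chain(llm, docs, "q"), len(calls))  # answer#3 3
```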
Running example:
> python langchain_qa.py --embedding_path text2vec-large-chinese --model_path chinese-alpaca-2-7b --file_path doc.txt --chain_type refine
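# intermediate output omitted
> 请输入问题:李白的诗是什么风格?
> 李白的诗歌风格是浪漫主义。
(Translation: the script prompts "Please enter a question:", the question asked is "What is the style of Li Bai's poetry?", and the model answers "Li Bai's poetry is Romantic in style.")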
This task uses an LLM to summarize a given document, helping to extract its core information.
cd scripts/langchain
python langchain_sum.py \
--model_path chinese-alpaca-2-7b \
--file_path doc.txt \
--chain_type refine
Parameter description:
- --model_path: Directory containing the merged Chinese-Alpaca model.
- --file_path: Document to be summarized.
- --chain_type: refine (default) or stuff, which select different chain types (see the LangChain documentation for details). In short, stuff is suited to shorter documents, while refine is suited to longer ones.
- --gpu_id: GPU ID(s) to use (default: 0). Currently only single-GPU inference is supported.
Running example:
> python langchain_sum.py --model_path chinese-alpaca-2-7b --file_path doc.txt
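# intermediate output omitted
> 李白(701年5月19日-762年11月30日),字太白,号青莲居士,唐代著名诗人。他在少年时代就展现出了非凡的才华,但由于缺乏正规教育,他放弃了学业并开始漫游生涯,以写作诗歌为主要职业。尽管经历了许多困难和挫折,他始终坚持自己的理想,努力追求卓越。在盛唐时期,他活跃于文学界,成为了当时最杰出的浪漫主义诗人之一。他的诗歌充满着想象力和创造力,经常使用夸张和比喻来表达深刻的思想感情。他的作品至今仍是中国古典文学的重要组成部分。
(Translation: "Li Bai (May 19, 701 to November 30, 762), courtesy name Taibai, art name Qinglian Jushi, was a famous poet of the Tang dynasty. He showed extraordinary talent in his youth, but owing to a lack of formal education he abandoned his studies and began a life of wandering, making the writing of poetry his main occupation. Despite many hardships and setbacks, he always held to his ideals and strove for excellence. During the High Tang period he was active in literary circles and became one of the most outstanding Romantic poets of his time. His poems are full of imagination and creativity and often use exaggeration and metaphor to express profound thoughts and feelings. His works remain an important part of classical Chinese literature today.")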