Sample code: git clone https://github.com/rkuo2000/GenAI
- download flux1-dev-fp8.safetensors
- download t5xxl_fp8_e4m3fn.safetensors
- download clip_l.safetensors
- download ae.safetensors
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
mv ~/Downloads/flux1-dev-fp8.safetensors ~/ComfyUI/models/unet/
mv ~/Downloads/t5xxl_fp8_e4m3fn.safetensors ~/ComfyUI/models/clip/
mv ~/Downloads/clip_l.safetensors ~/ComfyUI/models/clip/
mv ~/Downloads/ae.safetensors ~/ComfyUI/models/vae/
python main.py
- open browser at http://127.0.0.1:8188
- drag flux_dev_fp8_example.png into the browser window to load the workflow graph
- edit the text in CLIP Text Encode (Positive Prompt); an example image-generation prompt:
pretty Asian woman holding flowers in her hands, Korean model, real photo style, full body shot.
- click Queue Prompt to generate the image
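Instead of clicking in the browser, the workflow can also be queued over ComfyUI's HTTP API. A minimal sketch, assuming the workflow was exported with "Save (API Format)" to workflow_api.json (the filename and node id are hypothetical):

```python
# Queue a Flux workflow against a running ComfyUI server (started with python main.py).
import json
import urllib.request

with open("workflow_api.json") as f:          # workflow exported in API format
    workflow = json.load(f)

# Optionally rewrite the positive prompt; node id "6" is hypothetical --
# check the exported JSON for the CLIP Text Encode node's actual id.
# workflow["6"]["inputs"]["text"] = "pretty Asian woman holding flowers, real photo style"

req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(urllib.request.urlopen(req).read().decode())  # returns a prompt_id on success
```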
gTranslate + SDXL-Lightning + TripoSR + Blender
- https://www.kaggle.com/code/rkuo2000/zero123plus
- https://www.kaggle.com/code/rkuo2000/zero123-controlnet
Kaggle: https://www.kaggle.com/code/rkuo2000/triposr
Code: https://github.com/apple/ml-depth-pro
Kaggle: https://www.kaggle.com/code/rkuo2000/depth-pro
SV4D
SV4D was trained to generate 40 frames (5 video frames x 8 camera views) at 576x576 resolution
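A small shape sketch of how those 40 frames decompose into the (video frame, camera view) grid:

```python
# SV4D output layout: 40 frames = 5 video frames x 8 camera views, 576x576 RGB.
import numpy as np

n_times, n_views, res = 5, 8, 576
frames = np.zeros((n_times * n_views, res, res, 3))   # the 40 generated frames
grid = frames.reshape(n_times, n_views, res, res, 3)  # index a frame as grid[t, v]
print(grid.shape)  # (5, 8, 576, 576, 3)
```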
Demo video: musk_musetalk.mp4
python gTTS.py "How are you" en : generates gTTS.mp3
python gT2T.py "How are you" fr : uses deep-translator
python gSpeak.py "How are you" fr : uses deep-translator, gTTS & mpg123
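These three scripts chain the same building blocks. A minimal sketch of what gSpeak.py plausibly does (the repo's actual script may differ):

```python
# Translate with deep-translator, synthesize with gTTS, play with mpg123.
# Usage: python gSpeak.py "How are you" fr
import sys
import subprocess
from deep_translator import GoogleTranslator
from gtts import gTTS

text, lang = sys.argv[1], sys.argv[2]
translated = GoogleTranslator(source="auto", target=lang).translate(text)
gTTS(translated, lang=lang).save("gTTS.mp3")   # write speech to gTTS.mp3
subprocess.run(["mpg123", "gTTS.mp3"])         # play it through the speakers
```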
- Parler TTS:
python parler.py
- Bark TTS:
python bark_en.py
python bark_cn.py
- Coqui TTS (see the sketch after this list):
python coqui_en.py
python coqui_zh.py
- text-to-speech:
python text_to_speech.py
- gTTS:
python gTTS.py "你好?" zh
- gTranslate:
python gTranslate.py
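For the Coqui TTS scripts, a minimal usage sketch (the model name is one published example; coqui_en.py may load another):

```python
# Synthesize speech to a wav file with Coqui TTS.
from TTS.api import TTS

tts = TTS("tts_models/en/ljspeech/tacotron2-DDC")   # downloads the model on first run
tts.tts_to_file(text="How are you?", file_path="coqui_out.wav")
```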
Blog: Speech Recognition API (語音辨識API)
- run server on local PC (with GPU):
python whisper_llm_server.py
- Generate audio file:
python ../gTTS.py "Hello, how are you?" en
- Post Audio to Server:
python post_audio.py
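A minimal sketch of the post_audio.py client side; the endpoint path and form field name are assumptions, so match them to whisper_llm_server.py:

```python
# Upload the generated mp3 to the Whisper+LLM server and print its reply.
import requests

with open("gTTS.mp3", "rb") as f:
    r = requests.post("http://127.0.0.1:5000/audio",  # hypothetical endpoint
                      files={"file": f})
print(r.text)
```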
Large Language Models teaching materials
Prompt Engineering teaching materials
git clone https://github.com/rkuo2000/GenAI
cd GenAI/Text-to-Text
python gpt4free.py (gpt-3.5-turbo)
python gpt4all_prompting.py
python LLM_prompting.py
- colab_LLM_prompting.ipynb (on Colab T4)
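A minimal local-prompting sketch with the transformers pipeline API (the model checkpoint is an example, not necessarily what LLM_prompting.py loads):

```python
# Prompt a small instruction-tuned LLM locally via transformers.
from transformers import pipeline

generator = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")
out = generator("Explain retrieval-augmented generation in one sentence.",
                max_new_tokens=64)
print(out[0]["generated_text"])
```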
python llm_server.py (on GPU)
python post_text.py (on PC)
- colab_pyNgrok_LLM_server (on Colab T4)
- post-text client (on PC)
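A minimal sketch of the post-text client; the URL and JSON keys are assumptions to be matched against llm_server.py (or the ngrok URL printed by the Colab server notebook):

```python
# Send a prompt to the LLM server as JSON and print the response.
import requests

r = requests.post("http://127.0.0.1:5000/text",     # hypothetical endpoint
                  json={"text": "How are you?"})
print(r.json())
```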
- Open Colab to run pyngrok_Whisper_LLM_Server.ipynb on Colab T4
- Generate audio file:
python ../gTTS.py "Hello, how are you?" en
- Post Audio to Server:
python post_audio.py
ollama list
ollama run llama3.2
python ollama_chat.py
python ollama_stream.py (prints text in streaming mode)
python ollama_curl.py
python ollama_speak.py (ollama generates text, gTTS to speech, mpg123 to speak)
python ollama_speak_t2t.py (ollama generates text, gTTS to speech, deep-translator to zh-TW, mpg123 to speak)
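A minimal sketch of the chat and streaming calls with the ollama Python package (assumes `ollama run llama3.2` has already pulled the model):

```python
# Chat with a local ollama model, one-shot and streaming.
import ollama

# One-shot chat
resp = ollama.chat(model="llama3.2",
                   messages=[{"role": "user", "content": "How are you?"}])
print(resp["message"]["content"])

# Streaming variant: print tokens as they arrive
for chunk in ollama.chat(model="llama3.2",
                         messages=[{"role": "user", "content": "How are you?"}],
                         stream=True):
    print(chunk["message"]["content"], end="", flush=True)
```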
- Gemini_Talk.aia: MIT App Inventor 2 example for using Google Gemini
fine-tune-gemma-7b-it-for-sentiment-analysis
fine-tune-llama-3-for-sentiment-analysis
fine-tune-gemma-models-in-keras-using-lora
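All three notebooks fine-tune with LoRA adapters; a minimal PEFT sketch of the pattern (model name and hyperparameters are illustrative, not the notebooks' exact settings):

```python
# Attach LoRA adapters so only a small set of weights is trained.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("google/gemma-7b-it")  # gated: needs HF login
config = LoraConfig(r=8, lora_alpha=16,
                    target_modules=["q_proj", "v_proj"],
                    task_type="CAUSAL_LM")
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```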
- python llava-1.5-7b-hf.py (sketched after this list)
- python llava-1.6-7b-hf.py
- python phi-3.5-vision.py
- python pixtral.py
- python llama-3.2-vision.py
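A minimal sketch of the llava-1.5 script using the transformers image-to-text pipeline, following the llava-hf model card's prompt template (the course script may be written differently):

```python
# Ask LLaVA-1.5 a question about a local image.
from transformers import pipeline
from PIL import Image

pipe = pipeline("image-to-text", model="llava-hf/llava-1.5-7b-hf")
image = Image.open("images/barefeet1.jpg")
prompt = "USER: <image>\nDescribe this picture.\nASSISTANT:"
out = pipe(image, prompt=prompt, generate_kwargs={"max_new_tokens": 64})
print(out[0]["generated_text"])
```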
To run the server, use one of the following:
python llava_server.py
python llava_next_server.py
python phi3-vision_server.py
To run the client (posts image & text to the VLM server):
python post_imgtxt.py images/barefeet1.jpg
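A minimal sketch of the post_imgtxt.py client; the endpoint and field names are assumptions to match against whichever server script is running:

```python
# Post an image plus a question to the VLM server and print the answer.
import sys
import requests

path = sys.argv[1]  # e.g. images/barefeet1.jpg
with open(path, "rb") as f:
    r = requests.post("http://127.0.0.1:5000/imgtxt",   # hypothetical endpoint
                      files={"image": f},
                      data={"text": "What is in this picture?"})
print(r.text)
```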
python whisper_llava_server.py
python ../gTTS.py "這是什麼有名的台南美食?" zh (TTS; the prompt asks "What famous Tainan food is this?")
python post_imgau.py (client)
python gemini_image.py
python gemini_jpg2csv.py
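A minimal sketch of gemini_image.py with the google-generativeai package (model name and image file are examples; requires GOOGLE_API_KEY in the environment):

```python
# Ask Gemini a question about a local image.
import os
from PIL import Image
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")
img = Image.open("tainan_food.jpg")  # hypothetical example image
resp = model.generate_content([img, "What famous Tainan dish is this?"])
print(resp.text)
```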
- https://www.kaggle.com/code/rkuo2000/langchain-rag-chromadb
- https://www.kaggle.com/code/rkuo2000/llm-llamaindex (LlamaIndex-RAG-pdf)
- Langchain-RAG-text.ipynb
- Langchain-ReAct.ipynb
- LlamaIndex-RAG-pdf.ipynb
- LlamaIndex-RAG-pdf-community.ipynb
- LlamaIndex-RAG-pdf-community.py
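A minimal LlamaIndex RAG sketch of the pattern in these notebooks (defaults to OpenAI models, so OPENAI_API_KEY must be set; the notebooks may configure local models instead):

```python
# Index a folder of documents and answer a question over them.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

docs = SimpleDirectoryReader("data").load_data()   # put your PDFs in ./data
index = VectorStoreIndex.from_documents(docs)      # embed and store the chunks
answer = index.as_query_engine().query("Summarize the document.")
print(answer)
```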
cd ~/GenAI
git clone https://github.com/openai/swarm
pip install git+https://github.com/openai/swarm.git
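A minimal Swarm example following the repo README (requires OPENAI_API_KEY):

```python
# Run a single-agent conversation with openai/swarm.
from swarm import Swarm, Agent

client = Swarm()  # uses the OpenAI API under the hood
agent = Agent(name="Agent", instructions="You are a helpful agent.")

response = client.run(agent=agent,
                      messages=[{"role": "user", "content": "Hi!"}])
print(response.messages[-1]["content"])
```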