Generative AI

AIGC course materials
GenAI-projects course materials

Sample code: git clone https://github.com/rkuo2000/GenAI


1. Text-to-Image

Image Creators


ComfyUI / WebUI



Flux1-dev-fp8 model files

git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
mv ~/Downloads/flux1-dev-fp8.safetensors ~/ComfyUI/models/unet/
mv ~/Downloads/t5xxl_fp8_e4m3fn.safetensors ~/ComfyUI/models/clip/
mv ~/Downloads/clip_l.safetensors ~/ComfyUI/models/clip/
mv ~/Downloads/ae.safetensors ~/ComfyUI/models/vae/
python main.py
  1. open a browser at http://127.0.0.1:8188

  2. drag flux_dev_fp8_example.png into the browser window to load the workflow graph

  3. edit the text in CLIP Text Encode (Positive Prompt)
    Example prompt for generating an attractive image:
pretty Asian woman was holding the flowers in her hands, Korean Model, real photo style, full body shot.
  4. click Queue Prompt to generate the image
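
The prompt-editing and queueing steps above can also be scripted against the running ComfyUI server instead of the browser. The following is a minimal sketch, not the repository's code. It assumes the workflow has been exported from ComfyUI in API format, that the server listens on the default 127.0.0.1:8188, and that you know the node id of the positive-prompt node in your export. The helper names `set_prompt_text` and `queue_prompt` are illustrative.

```python
import json
import urllib.request

COMFY_URL = "http://127.0.0.1:8188/prompt"  # default local ComfyUI address (assumption)


def set_prompt_text(workflow: dict, node_id: str, text: str) -> dict:
    """Return a copy of an API-format workflow with one CLIPTextEncode node's text replaced."""
    updated = json.loads(json.dumps(workflow))  # deep copy via JSON round-trip
    node = updated[node_id]
    if node.get("class_type") != "CLIPTextEncode":
        raise ValueError(f"node {node_id} is not a CLIPTextEncode node")
    node["inputs"]["text"] = text
    return updated


def queue_prompt(workflow: dict, url: str = COMFY_URL) -> dict:
    """POST the workflow to a running ComfyUI server and return its JSON reply."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

To use it, load the exported JSON with `json.load`, call `set_prompt_text(workflow, "<node id>", "your prompt")`, and pass the result to `queue_prompt`. The node id differs between workflows, so look it up in your own export rather than hard-coding one.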

AI painting (Stable Diffusion) with WebUI Forge and ComfyUI


Krita

FLUX.1 [dev] model fully integrated into Krita


2. Text-to-3D

gTranslate + SDXL-Lightning + TripoSR + Blender


Image-to-3D


Kaggle: https://www.kaggle.com/code/rkuo2000/triposr


Depth Pro

Code: https://github.com/apple/ml-depth-pro Kaggle: https://www.kaggle.com/code/rkuo2000/depth-pro


3. Text-to-Video/Motion





SV4D
SV4D was trained to generate 40 frames (5 video frames x 8 camera views) at 576x576 resolution


Gen-3 Alpha Prompting Guide



SORA



4. Text-to-Avatar

GAN course materials

sample

Tutorial

Tutorial

ComfyUI-MuseTalk

musk_musetalk.mp4


Character Builder


5. Text-to-Song


SunoAI + RVC WebUI + ChatGPT

RVC WebUI


  • python gTTS.py "How are you" en : generates gTTS.mp3
  • python gT2T.py "How are you" fr : translates the text with deep-translator
  • python gSpeak.py "How are you" fr : uses deep-translator, gTTS, and mpg123
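
The three commands above combine translation and speech synthesis. As a rough illustration of how such a script might be put together (this is a sketch, not the repository's gTTS.py, gT2T.py, or gSpeak.py), the helpers below parse the same `"text" lang` arguments, translate with deep-translator's `GoogleTranslator`, and save speech with gTTS. The function names are hypothetical, and both libraries need network access.

```python
def parse_cli(argv: list[str]) -> tuple[str, str]:
    """Split `script.py "text" [lang]` into (text, lang); lang defaults to "en"."""
    if len(argv) < 2:
        raise SystemExit('usage: python script.py "text" [lang]')
    text = argv[1]
    lang = argv[2] if len(argv) > 2 else "en"
    return text, lang


def translate(text: str, target: str) -> str:
    """Translate text into the target language using deep-translator."""
    from deep_translator import GoogleTranslator  # pip install deep-translator

    return GoogleTranslator(source="auto", target=target).translate(text)


def speak_to_file(text: str, lang: str, path: str = "gTTS.mp3") -> str:
    """Synthesize speech with gTTS and save it as an MP3 file."""
    from gtts import gTTS  # pip install gTTS

    gTTS(text=text, lang=lang).save(path)
    return path
```

A caller could chain them as `speak_to_file(translate(text, lang), lang)` and then play the result with `mpg123 gTTS.mp3`. The third-party imports are deferred into the functions so the argument parsing can be tried without either package installed.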

6. Text-to-Speech

  • Parler TTS: python parler.py
  • Bark TTA: python bark_en.py, python bark_cn.py
  • Coqui TTS: python coqui_en.py, python coqui_zh.py
  • text-to-speech: python text_to_speech.py
  • gTTS: python gTTS.py "你好?" zh
  • gTranslate: python gTranslate.py

7. Audio-to-Text (ASR)

webkitSpeechRecognition

Blog: Speech Recognition API

asr.html

Google Speech Demo


Whisper



local ASR+LLM Server running on GPU

  1. run server on local PC (with GPU): python whisper_llm_server.py
  2. Generate audio file: python ../gTTS.py "Hello, how are you?" en
  3. Post Audio to Server: python post_audio.py

8. Text-to-Text (LLMs)

Large Language Models course materials
Prompt Engineering course materials

git clone https://github.com/rkuo2000/GenAI
cd GenAI/Text-to-Text

  • python gpt4free.py (gpt-3.5-turbo)
  • python gpt4all_prompting.py
  • python LLM_prompting.py
  • colab_LLM_prompting.ipynb (on Colab T4)

local LLM Server & Client

  • python llm_server.py (on GPU)
  • python post_text.py (on PC)

Colab running LLM Server


Colab running ASR+LLM Server

  1. Open Colab and run pyngrok_Whisper_LLM_Server.ipynb on a T4 GPU
  2. Generate audio file: python ../gTTS.py "Hello, how are you?" en
  3. Post Audio to Server: python post_audio.py

ollama list
ollama run llama3.2

ollama chat/generate

  • python ollama_chat.py
  • python ollama_stream.py (print text in streaming mode)
  • python ollama_curl.py
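
For readers who want to see what a streaming call looks like without a client library, here is a minimal sketch that talks to Ollama's local REST endpoint using only the standard library. It is not the repository's ollama_stream.py. It assumes Ollama is running on its default port 11434 and that the llama3.2 model has been pulled. The helper names are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(prompt: str, model: str = "llama3.2", stream: bool = True) -> bytes:
    """Encode the JSON body for the /api/generate endpoint."""
    return json.dumps({"model": model, "prompt": prompt, "stream": stream}).encode("utf-8")


def iter_tokens(lines):
    """Yield text fragments from a streamed reply (one JSON object per line)."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        chunk = json.loads(raw)
        yield chunk.get("response", "")
        if chunk.get("done"):
            break  # the final object carries done=true


def generate(prompt: str, model: str = "llama3.2") -> str:
    """Stream a completion from a running Ollama server, printing text as it arrives."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=build_request(prompt, model),
        headers={"Content-Type": "application/json"},
    )
    pieces = []
    with urllib.request.urlopen(request) as response:
        for token in iter_tokens(response):
            print(token, end="", flush=True)
            pieces.append(token)
    return "".join(pieces)
```

Calling `generate("Why is the sky blue?")` prints the reply incrementally and returns the full text. Keeping the line parsing in `iter_tokens` separate from the network call makes the streaming logic easy to check against canned input.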

ollama speak

  • python ollama_speak.py (Ollama generates the text, gTTS converts it to speech, and mpg123 plays it)
  • python ollama_speak_t2t.py (Ollama generates the text; uses gTTS for speech, deep-translator for translation to zh-TW, and mpg123 for playback)


  • gemini.html

  • Gemini_Talk.aia : MIT App Inventor 2 example for using Google Gemini


9. LLM Fine-Tuning

LLM Fine-Tuning course materials

PEFT

fine-tune-gemma-7b-it-for-sentiment-analysis
fine-tune-llama-3-for-sentiment-analysis

LoRA

fine-tune-gemma-models-in-keras-using-lora


10. Image-to-Text (VLM)


examples


VLM servers

To run a server (use one of the following):

  1. python llava_server.py
  2. python llava_next_server.py
  3. python phi3-vision_server.py

To run the client (posts an image and text to the VLM server):
python post_imgtxt.py images/barefeet1.jpg
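
To make the client side concrete, the sketch below shows one common way to send an image and a question to an HTTP server: base64-encode the image and post it with the text as JSON. It is an illustration only, not the repository's post_imgtxt.py. The endpoint URL, the JSON field names `image` and `text`, and the function names are all hypothetical, so match them to whatever the server you run actually expects.

```python
import base64
import json
import urllib.request

SERVER_URL = "http://127.0.0.1:5000/imgtxt"  # hypothetical endpoint, not from the repository


def build_payload(image_bytes: bytes, text: str) -> bytes:
    """Pack raw image bytes and a text prompt into a JSON request body."""
    body = {
        "image": base64.b64encode(image_bytes).decode("ascii"),  # hypothetical field name
        "text": text,  # hypothetical field name
    }
    return json.dumps(body).encode("utf-8")


def post_image_and_text(image_path: str, text: str, url: str = SERVER_URL) -> str:
    """Read an image file, post it with the text, and return the server's reply."""
    with open(image_path, "rb") as f:
        payload = build_payload(f.read(), text)
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Base64 inside JSON keeps the request a single text document at the cost of roughly one third more bytes than the raw image. A multipart form upload is the usual alternative when image size matters.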


ASR + VLM servers

  1. python whisper_llava_server.py
  2. python ../gTTS.py "這是什麼有名的台南美食?" zh (TTS)
  3. python post_imgau.py (client)

  • python gemini_image.py
  • python gemini_jpg2csv.py

11. RAG

RAG course materials

Sample Code



12. Agent

Agent course materials

cd ~/GenAI
git clone https://github.com/openai/swarm
pip install git+https://github.com/openai/swarm.git
