whisper-melo-llama3 not receiving my voice #307

Open
acastry opened this issue Jul 3, 2024 · 0 comments
acastry commented Jul 3, 2024

Hi,
I am trying to deploy whisper-melo-llama3.

I created an agent using my ngrok address (generated from my ngrok token):
curl --location 'https://XXXXXXXXXXXX.ngrok-free.app/agent' \
  --header 'Content-Type: application/json' \
  --data '{
    "agent_config": {
      "agent_name": "Alfred",
      "agent_type": "other",
      "tasks": [
        {
          "task_type": "conversation",
          "tools_config": {
            "llm_agent": {
              "model": "deepinfra/meta-llama/Meta-Llama-3-70B-Instruct",
              "max_tokens": 123,
              "agent_flow_type": "streaming",
              "use_fallback": true,
              "family": "llama",
              "temperature": 0.1,
              "request_json": true,
              "provider": "deepinfra"
            },
            "synthesizer": {
              "provider": "melotts",
              "provider_config": {
                "voice": "Casey",
                "sample_rate": 8000,
                "sdp_ratio": 0.2,
                "noise_scale": 0.6,
                "noise_scale_w": 0.8,
                "speed": 1.0
              },
              "stream": true,
              "buffer_size": 123,
              "audio_format": "wav"
            },
            "transcriber": {
              "encoding": "linear16",
              "language": "en",
              "model": "whisper",
              "stream": true,
              "task": "transcribe"
            },
            "input": {
              "provider": "twilio",
              "format": "wav"
            },
            "output": {
              "provider": "twilio",
              "format": "wav"
            }
          },
          "toolchain": {
            "execution": "parallel",
            "pipelines": [
              [ "transcriber", "llm", "synthesizer" ]
            ]
          }
        }
      ]
    },
    "agent_prompts": {
      "task_1": {
        "system_prompt": "What is the Ultimate Question of Life, the Universe, and Everything?"
      }
    }
  }'

It returns
"{"agent_id":"*************-3409-4f09-a1a7-582b12232444","state":"created"}"

Then I try to start a call through the same ngrok address:

curl --location 'https://XXXXXXXXXXXX.ngrok-free.app/call' \
  --header 'Content-Type: application/json' \
  --data '{
    "agent_id": "*************-3409-4f09-a1a7-582b12232444",
    "recipient_phone_number": "+590690320620"
  }'

but it returns:
{"detail":"Not Found"}

So instead I send the same request directly to the local server:

curl --location 'http://0.0.0.0:8001/call' \
  --header 'Content-Type: application/json' \
  --data '{
    "agent_id": "*************-3409-4f09-a1a7-582b12232444",
    "recipient_phone_number": "+590690320620"
  }'

This gets it working, though I don't know why the ngrok URL fails.
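The two /call attempts differ only in the base URL, so it may help to make that an explicit parameter when comparing them. Below is a minimal sketch using only Python's standard library. The helper name `build_call_request` is hypothetical, the agent id and phone number are placeholders, and which local port the ngrok tunnel forwards to is not shown in this post.

```python
import json
import urllib.request


def build_call_request(base_url: str, agent_id: str, phone: str) -> urllib.request.Request:
    """Build (but do not send) the same POST /call request as the curl commands."""
    body = json.dumps({"agent_id": agent_id, "recipient_phone_number": phone})
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/call",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values; substitute the real agent id and phone number.
    for base in ("https://XXXXXXXXXXXX.ngrok-free.app", "http://0.0.0.0:8001"):
        req = build_call_request(base, "<agent_id>", "<phone_number>")
        # Nothing is sent here; this only prints the method and target URL.
        print(req.get_method(), req.full_url)
```

Passing a request to `urllib.request.urlopen` would actually send it; a 404 such as the `{"detail":"Not Found"}` response surfaces there as `urllib.error.HTTPError`.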

It calls me, but the system doesn't hear my voice. Do I have to configure an endpoint in Twilio? Please help me, @prateeksachan.

2024-07-03 01:23:25 2024-07-03 05:23:25.753 INFO {telephony} [handle] Sending Message None and MZbcf6c8ebc7391c74914520cf4cfa7639 and {'io': 'twilio', 'message_category': 'agent_welcome_message', 'stream_sid': 'MZbcf6c8ebc7391c74914520cf4cfa7639', 'request_id': 'a3fb47c6-7301-4d27-bcf2-eb8fd5fbcfcc', 'cached': False, 'sequence_id': -1, 'format': 'linear16', 'text': 'This call is being recorded for quality assurance and training. Please speak now.', 'is_md5_hash': False, 'llm_generated': False, 'type': 'audio', 'synthesizer_start_time': 1719984197.0962937, 'is_first_chunk': True, 'synthesizer_latency': 5.429271936416626, 'synthesizer_first_chunk_latency': 5.429289817810059, 'chunk_id': 15}
2024-07-03 01:23:25 2024-07-03 05:23:25.753 INFO {telephony} [handle] Sending message 4096 linear16
2024-07-03 01:23:25 2024-07-03 05:23:25.753 INFO {twilio} [form_media_message] Converting to mulaw
2024-07-03 01:23:25 2024-07-03 05:23:25.753 INFO {task_manager} [__process_output_loop] Duration of the byte 0.256
2024-07-03 01:23:25 2024-07-03 05:23:25.753 INFO {task_manager} [__process_output_loop] ##### Sleeping for 0.256 to maintain quueue on our side 8000
2024-07-03 01:23:25 2024-07-03 05:23:25.988 INFO {task_manager} [__process_output_loop] ##### Updating Last transmitted timestamp to 1719984205.988593
2024-07-03 01:23:25 2024-07-03 05:23:25.989 INFO {task_manager} [__process_output_loop] Started transmitting at 1719984205.9892044
2024-07-03 01:23:25 2024-07-03 05:23:25.989 INFO {task_manager} [__process_output_loop] ##### Start response is True for 16 and hence starting to speak {'io': 'twilio', 'message_category': 'agent_welcome_message', 'stream_sid': 'MZbcf6c8ebc7391c74914520cf4cfa7639', 'request_id': 'a3fb47c6-7301-4d27-bcf2-eb8fd5fbcfcc', 'cached': False, 'sequence_id': -1, 'format': 'linear16', 'text': 'This call is being recorded for quality assurance and training. Please speak now.', 'is_md5_hash': False, 'llm_generated': False, 'type': 'audio', 'synthesizer_start_time': 1719984197.0962937, 'is_first_chunk': True, 'synthesizer_latency': 5.429271936416626, 'synthesizer_first_chunk_latency': 5.429289817810059, 'chunk_id': 16} Current sequence ids {-1}
2024-07-03 01:23:25 2024-07-03 05:23:25.989 INFO {telephony} [handle] Sending Message None and MZbcf6c8ebc7391c74914520cf4cfa7639 and {'io': 'twilio', 'message_category': 'agent_welcome_message', 'stream_sid': 'MZbcf6c8ebc7391c74914520cf4cfa7639', 'request_id': 'a3fb47c6-7301-4d27-bcf2-eb8fd5fbcfcc', 'cached': False, 'sequence_id': -1, 'format': 'linear16', 'text': 'This call is being recorded for quality assurance and training. Please speak now.', 'is_md5_hash': False, 'llm_generated': False, 'type': 'audio', 'synthesizer_start_time': 1719984197.0962937, 'is_first_chunk': True, 'synthesizer_latency': 5.429271936416626, 'synthesizer_first_chunk_latency': 5.429289817810059, 'chunk_id': 16}
2024-07-03 01:23:25 2024-07-03 05:23:25.989 INFO {telephony} [handle] Sending message 4096 linear16
2024-07-03 01:23:25 2024-07-03 05:23:25.989 INFO {twilio} [form_media_message] Converting to mulaw
2024-07-03 01:23:25 2024-07-03 05:23:25.990 INFO {task_manager} [__process_output_loop] Duration of the byte 0.256
2024-07-03 01:23:25 2024-07-03 05:23:25.990 INFO {task_manager} [__process_output_loop] ##### Sleeping for 0.256 to maintain quueue on our side 8000
2024-07-03 01:23:26 2024-07-03 05:23:26.219 INFO {task_manager} [__process_output_loop] ##### Updating Last transmitted timestamp to 1719984206.2192206
2024-07-03 01:23:26 2024-07-03 05:23:26.219 INFO {task_manager} [__process_output_loop] Started transmitting at 1719984206.219604
2024-07-03 01:23:26 2024-07-03 05:23:26.219 INFO {task_manager} [__process_output_loop] ##### Start response is True for 17 and hence starting to speak {'io': 'twilio', 'message_category': 'agent_welcome_message', 'stream_sid': 'MZbcf6c8ebc7391c74914520cf4cfa7639', 'request_id': 'a3fb47c6-7301-4d27-bcf2-eb8fd5fbcfcc', 'cached': False, 'sequence_id': -1, 'format': 'linear16', 'text': 'This call is being recorded for quality assurance and training. Please speak now.', 'is_md5_hash': False, 'llm_generated': False, 'type': 'audio', 'synthesizer_start_time': 1719984197.0962937, 'is_first_chunk': True, 'synthesizer_latency': 5.429271936416626, 'synthesizer_first_chunk_latency': 5.429289817810059, 'chunk_id': 17, 'is_final_chunk_of_entire_response': True} Current sequence ids {-1}
2024-07-03 01:23:26 2024-07-03 05:23:26.219 INFO {telephony} [handle] Sending Message None and MZbcf6c8ebc7391c74914520cf4cfa7639 and {'io': 'twilio', 'message_category': 'agent_welcome_message', 'stream_sid': 'MZbcf6c8ebc7391c74914520cf4cfa7639', 'request_id': 'a3fb47c6-7301-4d27-bcf2-eb8fd5fbcfcc', 'cached': False, 'sequence_id': -1, 'format': 'linear16', 'text': 'This call is being recorded for quality assurance and training. Please speak now.', 'is_md5_hash': False, 'llm_generated': False, 'type': 'audio', 'synthesizer_start_time': 1719984197.0962937, 'is_first_chunk': True, 'synthesizer_latency': 5.429271936416626, 'synthesizer_first_chunk_latency': 5.429289817810059, 'chunk_id': 17, 'is_final_chunk_of_entire_response': True}
2024-07-03 01:23:26 2024-07-03 05:23:26.219 INFO {telephony} [handle] Sending message 828 linear16
2024-07-03 01:23:26 2024-07-03 05:23:26.219 INFO {twilio} [form_media_message] Converting to mulaw
2024-07-03 01:23:26 2024-07-03 05:23:26.221 INFO {task_manager} [__process_output_loop] Duration of the byte 0.05175
2024-07-03 01:23:26 2024-07-03 05:23:26.221 INFO {task_manager} [__process_output_loop] ##### End of synthesizer stream and
2024-07-03 01:23:26 2024-07-03 05:23:26.221 INFO {task_manager} [__process_output_loop] Making first message passed as True
2024-07-03 01:23:26 2024-07-03 05:23:26.221 INFO {task_manager} [__process_output_loop] ##### Sleeping for 0.05175 to maintain quueue on our side 8000
2024-07-03 01:23:26 2024-07-03 05:23:26.244 INFO {task_manager} [__process_output_loop] ##### Updating Last transmitted timestamp to 1719984206.2447424
2024-07-03 01:23:26 2024-07-03 05:23:26.244 INFO {task_manager} [__process_output_loop] First interim result hasn't been gotten yet and hence sleeping
2024-07-03 01:23:26 2024-07-03 05:23:26.345 INFO {task_manager} [__process_output_loop] ##### Got to wait 300 ms before speaking and alreasy waited -1 since the first interim result
2024-07-03 01:23:26 2024-07-03 05:23:26.591 INFO {task_manager} [__check_for_completion] Only 0.34679651260375977 seconds since last spoken time stamp and hence not cutting the phone call
2024-07-03 01:23:28 2024-07-03 05:23:28.586 INFO {task_manager} [__handle_initial_silence] Checking for initial silence 15
2024-07-03 01:23:28 2024-07-03 05:23:28.594 INFO {task_manager} [__check_for_completion] Only 2.349334239959717 seconds since last spoken time stamp and hence not cutting the phone call
2024-07-03 01:23:30 2024-07-03 05:23:30.596 INFO {task_manager} [__check_for_completion] Only 4.351584434509277 seconds since last spoken time stamp and hence not cutting the phone call
2024-07-03 01:23:31 2024-07-03 05:23:31.587 INFO {task_manager} [__handle_initial_silence] Checking for initial silence 15
2024-07-03 01:23:32 2024-07-03 05:23:32.601 INFO {task_manager} [__check_for_completion] Asking if the user is still there
2024-07-03 01:23:32 2024-07-03 05:23:32.605 INFO {task_manager} [_synthesize] ##### sending text to melotts for generation: Hey, are you still there?
2024-07-03 01:23:32 2024-07-03 05:23:32.605 INFO {melo_synthesizer} [push] Pushed message to internal queue
2024-07-03 01:23:32 2024-07-03 05:23:32.606 INFO {twilio} [handle_interruption] interrupting because user spoke in between
2024-07-03 01:23:32 2024-07-03 05:23:32.607 INFO {utils} [write_request_logs] Message {'direction': 'request', 'data': 'Hey, are you still there?', 'leg_id': 'eadcdfac-26f4-458b-9773-88a260359249', 'time': '2024-07-03 05:23:32', 'component': 'synthesizer', 'sequence_id': -1, 'model': 'melotts', 'cached': False, 'latency': None, 'is_final': False, 'engine': 'default'}
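For reading the "Duration of the byte" entries: the logged values match what you get if each chunk is treated as 16-bit (linear16) mono PCM at the configured 8000 Hz sample rate. The sketch below reproduces that arithmetic. Mono audio and the helper name `pcm_duration_seconds` are assumptions, and this is not taken from the bolna source.

```python
def pcm_duration_seconds(num_bytes: int, sample_rate: int = 8000,
                         bytes_per_sample: int = 2) -> float:
    """Playback duration of raw PCM audio, assuming a single (mono) channel."""
    return num_bytes / (sample_rate * bytes_per_sample)


# Chunk sizes taken from the "Sending message <N> linear16" log entries.
print(pcm_duration_seconds(4096))  # 0.256
print(pcm_duration_seconds(828))   # 0.05175
```

So the outgoing (agent to caller) audio path looks consistent in these logs; the numbers say nothing about the incoming audio.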

Full logs attached

bolna-app.log
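A quick way to see which components are active in the attached log is to count entries per `{component}` tag. In the excerpt above only telephony, twilio, task_manager, melo_synthesizer and utils appear. The sketch below assumes every entry has the `INFO {component} [function]` shape seen in the excerpt; the helper name `count_components` is hypothetical, and the tag a transcriber would log under is not known from this post.

```python
import re
from collections import Counter

# Assumed entry shape, copied from the excerpt: "... INFO {component} [function] message"
ENTRY_TAG = re.compile(r"INFO \{(\w+)\} \[(\w+)\]")


def count_components(log_text: str) -> Counter:
    """Count log entries per component tag; works on flattened logs too."""
    return Counter(component for component, _func in ENTRY_TAG.findall(log_text))


sample = (
    "2024-07-03 05:23:25.753 INFO {telephony} [handle] Sending message 4096 linear16 "
    "2024-07-03 05:23:25.753 INFO {twilio} [form_media_message] Converting to mulaw "
    "2024-07-03 05:23:25.753 INFO {task_manager} [__process_output_loop] Duration of the byte 0.256 "
    "2024-07-03 05:23:25.989 INFO {telephony} [handle] Sending message 4096 linear16"
)
for component, n in count_components(sample).most_common():
    print(component, n)

# For the full file: count_components(open("bolna-app.log").read())
```

If no transcriber-related component ever shows up in the full log, that would suggest the caller's audio is not reaching the speech-to-text stage.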

Thank you!
