support facebook video url download via yt-dlp #469

devxpy · 2024-09-23T08:59:36Z

Q/A checklist

If you add new dependencies, did you update the lock file?

poetry lock --no-update

Run tests

ulimit -n unlimited && ./scripts/run-tests.sh

Do a self code review of the changes - Read the diff at least twice.
Carefully think about the stuff that might break because of this change - this sounds obvious but it's easy to forget to do "Go to references" on each function you're changing and see if it's used in a way you didn't expect.
The relevant pages still run when you press submit
The API for those pages still works (API tab)
The public API interface doesn't change if you didn't want it to (check API tab > docs page)
Do your UI changes (if applicable) look acceptable on mobile?
Ensure you have not regressed the import time unless you have a good reason to do so.
You can visualize this using tuna:

python3 -X importtime -c 'import server' 2> out.log && tuna out.log

To measure import time for a specific library:

$ time python -c 'import pandas'

________________________________________________________
Executed in    1.15 secs    fish           external
   usr time    2.22 secs   86.00 micros    2.22 secs
   sys time    0.72 secs  613.00 micros    0.72 secs

To reduce import times, import libraries that take a long time inside the functions that use them instead of at the top of the file:

def my_function():
    import pandas as pd
    ...

Legal Boilerplate

Look, I get it. The entity doing business as “Gooey.AI” and/or “Dara.network” was incorporated in the State of Delaware in 2020 as Dara Network Inc. and is gonna need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Dara Network Inc can use, modify, copy, and redistribute my contributions, under its choice of terms.

devxpy · 2024-09-23T09:00:16Z

Doesn't work yet because Facebook doesn't support --format bestaudio

daras_ai_v2/vector_search.py

+def is_yt_dlp_able_url(url: str) -> bool:
+    f = furl(url)
+    return (
+        "youtube.com" in f.origin


To fix the problem, we need to ensure that the URL's host is exactly "youtube.com" or a valid subdomain of "youtube.com". This can be achieved by parsing the URL and checking the hostname directly. We will use the urlparse function from the urllib.parse module to extract the hostname and then perform the necessary checks.

Import the urlparse function from the urllib.parse module.

Replace the substring checks with hostname checks using urlparse.
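A minimal sketch of the hostname check described above. The helper name `host_matches` and the sample URLs are hypothetical and not part of this PR; the sketch only illustrates "exactly the domain, or a subdomain of it" using `urlparse`.

```python
from urllib.parse import urlparse


def host_matches(url: str, domain: str) -> bool:
    """Return True if the URL's host is `domain` or a subdomain of it.

    `host_matches` is a hypothetical helper used for illustration only.
    """
    # urlparse().hostname is lowercased and is None when the URL has no host.
    host = urlparse(url).hostname
    if not host:
        return False
    # The leading dot prevents "notyoutube.com" from matching "youtube.com".
    return host == domain or host.endswith("." + domain)


# A real subdomain matches.
print(host_matches("https://www.youtube.com/watch?v=abc", "youtube.com"))
# The allowed name appears only in the path, so a substring check would be fooled.
print(host_matches("https://evil.example/youtube.com", "youtube.com"))
# The allowed name is a prefix of a different host.
print(host_matches("https://youtube.com.evil.example/", "youtube.com"))
```

Comparing against the parsed hostname, rather than searching the whole origin string, is what keeps the allowed name from being accepted when it shows up in the path or as part of a longer host.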

daras_ai_v2/vector_search.py

+        or "youtu.be" in f.origin
+        or "fb.watch" in f.origin
+        or (
+            ("facebook.com" in f.origin or "fb.com" in f.origin)


To fix the problem, we need to parse the URL and check the host value to ensure it matches the allowed domains correctly. This involves using the urlparse function from the urllib.parse module to extract the hostname and then performing the check. This approach ensures that the check is not bypassed by embedding the allowed host in an unexpected location within the URL.

daras_ai_v2/vector_search.py

+        or "youtu.be" in f.origin
+        or "fb.watch" in f.origin
+        or (
+            ("facebook.com" in f.origin or "fb.com" in f.origin)


To fix the problem, we need to ensure that the URL's hostname is properly checked against the allowed hosts. Instead of using a substring match, we should parse the URL and check the hostname directly. This can be done using the urlparse function from the urllib.parse module.

Parse the URL using urlparse.

Extract the hostname from the parsed URL.

Check if the hostname matches any of the allowed hosts.

support facebook video url download via yt-dlp

51407ec

github-advanced-security bot found potential problems Sep 23, 2024


devxpy assigned devxpy and unassigned devxpy Sep 23, 2024

@@ -20,2 +20,3 @@
                 from furl import furl
+                from urllib.parse import urlparse
                 from loguru import logger
@@ -743,13 +744,14 @@
                 def is_yt_dlp_able_url(url: str) -> bool:
-                    f = furl(url)
+                    parsed_url = urlparse(url)
+                    hostname = parsed_url.hostname
                     return (
-                        "youtube.com" in f.origin
-                        or "youtu.be" in f.origin
-                        or "fb.watch" in f.origin
+                        hostname == "youtube.com"
+                        or hostname == "youtu.be"
+                        or hostname == "fb.watch"
                         or (
-                            ("facebook.com" in f.origin or "fb.com" in f.origin)
+                            (hostname == "facebook.com" or hostname == "fb.com")
                             and (
-                                "videos" in f.path.segments
-                                or "/share/v/" in f.pathstr
-                                or "v" in f.query.params
+                                "videos" in parsed_url.path
+                                or "/share/v/" in parsed_url.path
+                                or "v" in parsed_url.query
                             )

@@ -743,13 +743,17 @@
                 def is_yt_dlp_able_url(url: str) -> bool:
-                    f = furl(url)
+                    from urllib.parse import urlparse
+                    parsed_url = urlparse(url)
+                    host = parsed_url.hostname
                     return (
-                        "youtube.com" in f.origin
-                        or "youtu.be" in f.origin
-                        or "fb.watch" in f.origin
-                        or (
-                            ("facebook.com" in f.origin or "fb.com" in f.origin)
-                            and (
-                                "videos" in f.path.segments
-                                or "/share/v/" in f.pathstr
-                                or "v" in f.query.params
+                        host and (
+                            host.endswith("youtube.com")
+                            or host == "youtu.be"
+                            or host == "fb.watch"
+                            or (
+                                (host.endswith("facebook.com") or host == "fb.com")
+                                and (
+                                    "videos" in parsed_url.path
+                                    or "/share/v/" in parsed_url.path
+                                    or "v" in parsed_url.query
+                                )
                             )

@@ -743,13 +743,13 @@
                 def is_yt_dlp_able_url(url: str) -> bool:
-                    f = furl(url)
+                    from urllib.parse import urlparse
+                    parsed_url = urlparse(url)
+                    hostname = parsed_url.hostname
                     return (
-                        "youtube.com" in f.origin
-                        or "youtu.be" in f.origin
-                        or "fb.watch" in f.origin
+                        hostname in ["youtube.com", "youtu.be", "fb.watch"]
                         or (
-                            ("facebook.com" in f.origin or "fb.com" in f.origin)
+                            hostname in ["facebook.com", "fb.com"]
                             and (
-                                "videos" in f.path.segments
-                                or "/share/v/" in f.pathstr
-                                or "v" in f.query.params
+                                "videos" in parsed_url.path
+                                or "/share/v/" in parsed_url.path
+                                or "v" in parsed_url.query
                             )
