-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support facebook video url download via yt-dlp #469
base: master
Are you sure you want to change the base?
Conversation
doesnt work yet because fb doesnt do |
def is_yt_dlp_able_url(url: str) -> bool: | ||
f = furl(url) | ||
return ( | ||
"youtube.com" in f.origin |
Check failure
Code scanning / CodeQL
Incomplete URL substring sanitization High
youtube.com
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix AI 7 days ago
To fix the problem, we need to ensure that the URL's host is exactly "youtube.com" or a valid subdomain of "youtube.com". This can be achieved by parsing the URL and checking the hostname directly. We will use the urlparse
function from the urllib.parse
module to extract the hostname and then perform the necessary checks.
- Import the
urlparse
function from theurllib.parse
module. - Replace the substring checks with hostname checks using
urlparse
.
-
Copy modified line R21 -
Copy modified lines R745-R746 -
Copy modified lines R748-R750 -
Copy modified line R752 -
Copy modified lines R754-R756
@@ -20,2 +20,3 @@ | ||
from furl import furl | ||
from urllib.parse import urlparse | ||
from loguru import logger | ||
@@ -743,13 +744,14 @@ | ||
def is_yt_dlp_able_url(url: str) -> bool: | ||
f = furl(url) | ||
parsed_url = urlparse(url) | ||
hostname = parsed_url.hostname | ||
return ( | ||
"youtube.com" in f.origin | ||
or "youtu.be" in f.origin | ||
or "fb.watch" in f.origin | ||
hostname == "youtube.com" | ||
or hostname == "youtu.be" | ||
or hostname == "fb.watch" | ||
or ( | ||
("facebook.com" in f.origin or "fb.com" in f.origin) | ||
(hostname == "facebook.com" or hostname == "fb.com") | ||
and ( | ||
"videos" in f.path.segments | ||
or "/share/v/" in f.pathstr | ||
or "v" in f.query.params | ||
"videos" in parsed_url.path | ||
or "/share/v/" in parsed_url.path | ||
or "v" in parsed_url.query | ||
) |
or "youtu.be" in f.origin | ||
or "fb.watch" in f.origin | ||
or ( | ||
("facebook.com" in f.origin or "fb.com" in f.origin) |
Check failure
Code scanning / CodeQL
Incomplete URL substring sanitization High
facebook.com
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix AI 7 days ago
To fix the problem, we need to parse the URL and check the host value to ensure it matches the allowed domains correctly. This involves using the urlparse
function from the urllib.parse
module to extract the hostname and then performing the check. This approach ensures that the check is not bypassed by embedding the allowed host in an unexpected location within the URL.
-
Copy modified lines R744-R746 -
Copy modified lines R748-R758
@@ -743,13 +743,17 @@ | ||
def is_yt_dlp_able_url(url: str) -> bool: | ||
f = furl(url) | ||
from urllib.parse import urlparse | ||
parsed_url = urlparse(url) | ||
host = parsed_url.hostname | ||
return ( | ||
"youtube.com" in f.origin | ||
or "youtu.be" in f.origin | ||
or "fb.watch" in f.origin | ||
or ( | ||
("facebook.com" in f.origin or "fb.com" in f.origin) | ||
and ( | ||
"videos" in f.path.segments | ||
or "/share/v/" in f.pathstr | ||
or "v" in f.query.params | ||
host and ( | ||
host.endswith("youtube.com") | ||
or host == "youtu.be" | ||
or host == "fb.watch" | ||
or ( | ||
(host.endswith("facebook.com") or host == "fb.com") | ||
and ( | ||
"videos" in parsed_url.path | ||
or "/share/v/" in parsed_url.path | ||
or "v" in parsed_url.query | ||
) | ||
) |
or "youtu.be" in f.origin | ||
or "fb.watch" in f.origin | ||
or ( | ||
("facebook.com" in f.origin or "fb.com" in f.origin) |
Check failure
Code scanning / CodeQL
Incomplete URL substring sanitization High
fb.com
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix AI 7 days ago
To fix the problem, we need to ensure that the URL's hostname is properly checked against the allowed hosts. Instead of using a substring match, we should parse the URL and check the hostname directly. This can be done using the urlparse
function from the urllib.parse
module.
- Parse the URL using
urlparse
. - Extract the hostname from the parsed URL.
- Check if the hostname matches any of the allowed hosts.
-
Copy modified lines R744-R746 -
Copy modified line R748 -
Copy modified line R750 -
Copy modified lines R752-R754
@@ -743,13 +743,13 @@ | ||
def is_yt_dlp_able_url(url: str) -> bool: | ||
f = furl(url) | ||
from urllib.parse import urlparse | ||
parsed_url = urlparse(url) | ||
hostname = parsed_url.hostname | ||
return ( | ||
"youtube.com" in f.origin | ||
or "youtu.be" in f.origin | ||
or "fb.watch" in f.origin | ||
hostname in ["youtube.com", "youtu.be", "fb.watch"] | ||
or ( | ||
("facebook.com" in f.origin or "fb.com" in f.origin) | ||
hostname in ["facebook.com", "fb.com"] | ||
and ( | ||
"videos" in f.path.segments | ||
or "/share/v/" in f.pathstr | ||
or "v" in f.query.params | ||
"videos" in parsed_url.path | ||
or "/share/v/" in parsed_url.path | ||
or "v" in parsed_url.query | ||
) |
Q/A checklist
You can visualize this using tuna:
To measure import time for a specific library:
To reduce import times, import libraries that take a long time inside the functions that use them instead of at the top of the file:
Legal Boilerplate
Look, I get it. The entity doing business as “Gooey.AI” and/or “Dara.network” was incorporated in the State of Delaware in 2020 as Dara Network Inc. and is gonna need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Dara Network Inc can use, modify, copy, and redistribute my contributions, under its choice of terms.