Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong error on emails in Gitlab-CI #136

Closed
fauust opened this issue Feb 9, 2021 · 18 comments
Closed

Wrong error on emails in Gitlab-CI #136

fauust opened this issue Feb 9, 2021 · 18 comments
Assignees
Labels
bug Something isn't working

Comments

@fauust
Copy link
Contributor

fauust commented Feb 9, 2021

Hi,
lychee in GitLab-CI wrongly reports error on email addresses, see:
https://salsa.debian.org/faust/go-team.pages.debian.net/-/jobs/1427125#L150-L151

screenshot_20210209_154154

Here is the yaml part:

check:
  stage: check
  script:
    - apt-get update -qq && apt-get install -y -qq wget
    - wget -qO- "https://github.com/lycheeverse/lychee/releases/download/v0.5.0/lychee-v0.5.0-x86_64-unknown-linux-gnu.tar.gz" | tar -xz
    - ./lychee --verbose --exclude="irc://irc.debian.org:6667" --exclude="https://anonscm.debian.org" build/*.html

When running it locally, no error is detected:

./lychee --verbose --exclude="irc://irc.debian.org:6667" --exclude="https://anonscm.debian.org" build/*.html | grep @
[email protected] [200 OK]
[email protected] [200 OK]

Am I missing something?

@mre
Copy link
Member

mre commented Feb 9, 2021

So I tested it locally by cloning your repo and running the same command and I can verify that it also works for me. Perhaps the CI server is preventing the check.

  • Are the necessary ports open? (Namely SMTP port 25)
  • Is an SMTP server configured in your CI pipeline?

Here are some tests that you could try: https://www.sparkpost.com/blog/how-to-check-an-smtp-connection-with-a-manual-telnet-session/
You should be able to run them as a throwaway shell script to get the answers or perhaps you can even log in and run a few tests.

This is lychees entire e-mail checking code:

lychee/src/client.rs

Lines 216 to 230 in 4d5b989

pub async fn valid_mail(&self, address: &str) -> bool {
let input = CheckEmailInput::new(vec![address.to_string()]);
let results = check_email(&input).await;
let result = results.get(0);
match result {
None => false,
Some(result) => {
// Accept everything that is not invalid
!matches!(
result.is_reachable,
check_if_email_exists::Reachable::Invalid
)
}
}
}

So all of the above works but lychee does not, that would probably be a case for https://github.com/reacherhq/check-if-email-exists, which is the library we are using under the hood.

@fauust
Copy link
Contributor Author

fauust commented Feb 10, 2021

Ok, so I guess you pointed to the exact pb, I am pretty sure that the CI server does not allow SMTP output (and it does not surprise me, wouldn't that be a very nice spam vector).
I didn't know that lychee was doing such a deep check for email addresses, I though it would maybe just verify that email syntax looks good and also that there is a MX for the domain. But very nice to have this BTW!

In a CI context I guess that this check will fail most of the time? I did not had time to check if it works on https://github.com/lycheeverse/lychee-action but I will in the next days. Do you have any idea on how to fully disable email check then for CI context?

So I will close this as it's clearly not a bug!

@mre
Copy link
Member

mre commented Feb 10, 2021

Do you have any idea on how to fully disable email check then for CI context?

I've added support for skipping e-mails entirely in #137.
Now you can add a --exclude-mail to your lychee call.

So I will close this as it's clearly not a bug!

That's fine, though. 😊 Your issue inspired a nice discussion and now we have support for skipping e-mails. 👍

@fauust
Copy link
Contributor Author

fauust commented Feb 10, 2021

wow, that was fast! Thanks a lot!

@fauust fauust closed this as completed Feb 10, 2021
@fauust
Copy link
Contributor Author

fauust commented Feb 10, 2021

and I have just checked, the email check works fine on GitHub actions:
https://github.com/fauust/docker-systemd/runs/1871010280?check_suite_focus=true#step:4:22

@fauust
Copy link
Contributor Author

fauust commented Mar 19, 2021

Hi @mre, I am trying to implement the latest alpha version in my workflow, it seems that the option to exclude email --exclude-mail is not working:
https://salsa.debian.org/faust/go-team.pages.debian.net/-/jobs/1520656

using:

...
  - wget -qO- "https://github.com/lycheeverse/lychee/releases/download/v0.5.1-alpha/lychee-v0.5.1-alpha-x86_64-unknown-linux-gnu.tar.gz" | tar -xz
  - ./lychee --verbose --exclude-mail --exclude="irc://irc.debian.org:6667" --exclude="https://anonscm.debian.org" build/**/*.html

@mre
Copy link
Member

mre commented Mar 22, 2021

The exclusion is working, but the parsing of email addresses happened before exclusion; and the parsing failed. @pawroman fixed that in #177 and the change got merged to master. The release of 0.6 is blocked on #59 (comment) at the moment.

@mre mre reopened this Mar 22, 2021
@mre mre added the bug Something isn't working label Mar 29, 2021
@mre mre self-assigned this Mar 29, 2021
@mre
Copy link
Member

mre commented Mar 29, 2021

@fauust, can you give it another shot? Just released v0.6.0-alpha2 with the fix:
https://github.com/lycheeverse/lychee/releases/tag/v0.6.0-alpha2

This was made possible by #196 thanks to @lebensterben.

@fauust
Copy link
Contributor Author

fauust commented Mar 29, 2021

@mre
Copy link
Member

mre commented Mar 29, 2021

Oh my. 🤦 Forgot to pass the option to the builder. #200 (nice, even number!) should fix this. Added a unit- and and integration test to make sure it never breaks again.

@fauust
Copy link
Contributor Author

fauust commented Mar 30, 2021

I will test again later today!

@fauust
Copy link
Contributor Author

fauust commented Mar 30, 2021

https://salsa.debian.org/faust/go-team.pages.debian.net/-/jobs/1543957 all good!
There is still 1 error but not related to emails.

@mre
Copy link
Member

mre commented Mar 30, 2021

Alright! Sorry that you had to hang in there for a while and thanks for the feedback. The remaining error seems to be a true 404 link, correct?
We should probably not show the github token message, as it isn't really a github link. It's because our regex is too permissive:

let re = Regex::new(r"github\.com/([^/]*)/([^/]*)")?;
.

It should only match URLs starting with a scheme and www:

http://github.com
https://github.com
https://www.github.com
www.github.com
github.com

and perhaps a few more.
That's unrelated as you said, so I guess we can close this and open a new issue for the remaining work.

@mre
Copy link
Member

mre commented Mar 30, 2021

Gave it a shot here: #201
Let's move the discussion there.

@fauust
Copy link
Contributor Author

fauust commented Mar 30, 2021

The remaining error seems to be a true 404 link, correct?

Yes, sorry my message was not very clear!

@fauust
Copy link
Contributor Author

fauust commented Mar 30, 2021

BTW, I see a diff between https://salsa.debian.org/faust/go-team.pages.debian.net/-/jobs/1543957 and https://salsa.debian.org/faust/go-team.pages.debian.net/-/jobs/1520656, is this something wanted (verbose is less verbose now)?

@mre
Copy link
Member

mre commented Mar 30, 2021

Nope, that wasn't on purpose. Don't know exactly what's going on there. Tried to reproduce it locally with your settings. Everything works as expected (lots of links printed in verbose mode). Here is what I tried:

  • ✅ Using v0.6.0-alpha4 on macOS,
  • ✅ Compiled from scratch on macOS, everything works as expected (lots of links printed in verbose mode)
  • ✅ Using v0.6.0-alpha4 on Debian Linux.

I also tried running it with Docker, and I saw that if I don't set --tty (Allocate a pseudo-TTY) as a parameter, I only get the status output. Wild guess: maybe that something changed on the CI system?
I noticed that the two CI runs used a different gitlab runner:

  • Running with gitlab-runner 13.10.0 (HEAD)
  • Running with gitlab-runner 13.9.0 (HEAD)

Could be that something changed there.
Perhaps you can try to run the job and the standalone binary locally on your machine and compare the results?

@fauust
Copy link
Contributor Author

fauust commented Mar 31, 2021

Your guesses are correct! Salsa gitlab instance was upgrated very recently. I will make deeper tests in the next days and keep you informed if I find something interesting. Closing this as the skip email option seems OK now.

@fauust fauust closed this as completed Mar 31, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants