Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

email matcher fails when email address is in a question #201

Open
michaelkantor opened this issue Oct 7, 2021 · 3 comments
Open

email matcher fails when email address is in a question #201

michaelkantor opened this issue Oct 7, 2021 · 3 comments

Comments

@michaelkantor
Copy link
Contributor

Email matcher handles email addresses that end in question marks as follows:

INPUT: "Can you please send an email to [email protected]?"

OUTPUT: "can you email {{{url0}}}[email protected]?{{{/url0}}}"

PROBLEM: that question mark is doing something to the regex, which matches correctly without it.

Note: I git cloned the repo and tried to take a closer look, but was unable to get any unit tests running. The rest of this message details problems related to dev setup rather than email matcher:

npm test -- packages/autolink/tests/EmailMatcher.test.ts

returns

> [email protected] pretest /Users/michaelkantor/Documents/interweave
> yarn run type

yarn run v1.22.4
$ beemo typescript --reference-workspaces --build
[1/2] CONFIG Creating config files
[2/2] DRIVER Running TypeScript v4.4.3 driver (failed)

packages/ssr/src/index.ts(138,10): error TS7017: Element implicitly has an 'any' type because type 'typeof globalThis' has no index signature.

Failed to execute pipeline. The following errors have occurred:

Command failed with exit code 1: tsc --build


    at makeError (/Users/michaelkantor/Documents/interweave/node_modules/@boost/core/node_modules/execa/lib/error.js:56:11)
    at handlePromise (/Users/michaelkantor/Documents/interweave/node_modules/@boost/core/node_modules/execa/index.js:114:26)
    at processTicksAndRejections (internal/process/task_queues.js:97:5)
@michaelkantor
Copy link
Contributor Author

Some further investigations:

Fails to get flagged as invalid

Input: "can you email [email protected]?"

Url Matcher returns: '[email protected]?' rather than '[email protected]'

if (response != null && response.match.match(EMAIL_DISTINCT_PATTERN)) {
      response.valid = false;
}

URL Matcher vs Email Matcher

So... why can't I replicate this in unit tests? Well... it turns out that if we have EmailMatcher BEFORE URL Matcher, everything just works. Unit tests were setup that way, and resulted in successful test after test, causing much confusion.

Potential Fixes:

  1. Update the regex so that "[email protected]?" is never returned by the URL Matcher's doMatch call. I think that any URL that ends with ? probably does not expect ? to be part of the url, its probably just a question. The rare case where we are wrong... is really on the web engineer who hosts that unusual site that requires a lone ? at the end.
  2. Always force Email Matcher to come first.
  3. Make response.match.match(EMAIL_DISTINCT_PATTERN) more forgiving of extra stuff captured by the URL matcher. For example, removing the $ from the end of the regular expression so that if response matches rather than is an email address, its treated as an invalid url match.

All that said: my own code can move EmailMatcher to be the first matcher, and that appears to resolve the issue for my usage.

@milesj
Copy link
Owner

milesj commented Mar 4, 2022

Unfortunately [email protected]? is a valid URL, as it's merely a shorthand for http://[email protected]?. The mkantor@ is the "auth" while ? is an empty query string.

Let me think about this a bit more.

@daria-github
Copy link

Any update on this issue? Running into this as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants