Add ASCII output support #568

Open
rzhw opened this issue Jan 11, 2019 · 53 comments

Comments

@rzhw

rzhw commented Jan 11, 2019

I've been working on a product using libsass/sassc and have migrated it to Dart Sass. Presently, this product doesn't support UTF-8 characters in stylesheets.

It looks like Dart Sass only supports outputting as UTF-8, with dart-lang/sdk#11744 being the blocker given in the README for why there's no support for more encodings (UTF-16, etc). Dart does however appear to have an AsciiEncoder.

For the time being, we've added an extra step to CSS-escape non-ASCII characters in generated stylesheets. (On a related note, we also remove the @charset 'UTF-8'; at-rule, both because it would be technically incorrect for an ASCII-encoded stylesheet, and because of #567.)

This isn't trivial because of sourcemaps, so we're doing this step with a PostCSS plugin.
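
The extra step described above can be sketched roughly as follows. This is only a minimal illustration, not the PostCSS plugin mentioned in this comment: `toAsciiStylesheet` is a hypothetical name, it operates on a plain string, and it ignores sourcemaps entirely (escaping changes column offsets, which is the hard part noted above).

```javascript
// Hypothetical helper; not part of Dart Sass or PostCSS.
function toAsciiStylesheet(css) {
  return css
    // Drop a leading BOM and a leading @charset rule; neither is needed
    // once the output contains only ASCII characters.
    .replace(/^\uFEFF/, "")
    .replace(/^@charset\s+(["'])[^"']*\1;\s*/i, "")
    // Escape every non-ASCII code point as a CSS hex escape followed by a
    // terminating space, so the next character is never read as part of it.
    .replace(/[^\0-\x7f]/gu, (ch) => `\\${ch.codePointAt(0).toString(16)} `);
}

console.log(toAsciiStylesheet('@charset "UTF-8";\n.a::before{content:"\u00A0/\u00A0"}'));
// .a::before{content:"\a0 /\a0 "}
```

Always emitting the trailing space costs a few bytes but avoids having to inspect the following character.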

Would adding ASCII-encoded output support be in scope of Dart Sass? I'd imagine when Dart adds other encoders, having this ready would let other encodings be sibling output options alongside UTF-8 and ASCII.

@nex3
Contributor

nex3 commented Jan 17, 2019

This is something I could see adding as a command-line flag (--ascii-only or something like that) to serialize Unicode characters as ASCII escapes.

@bit-wise

@nex3 will this command-line flag be ready when Ruby sass is deprecated?

@nex3
Contributor

nex3 commented Feb 18, 2019

Ruby Sass has been deprecated for almost a year now. And no, there's no plan to tie this issue to its release cycle. It's marked as "help wanted", which means it's not a priority for the Sass team, but if an external user wanted to contribute a fix we'd help them land it.

@BuptStEve

// @source - [@Stephn-R](https://github.com/sass/sass/issues/1395#issuecomment-57483844)
// @description converts 1 or more characters into a unicode
// @markup {scss}
// unicode("e655"); // "\e655"
@function unicode($str){
    @return unquote("\"")+unquote(str-insert($str, "\\", 1))+unquote("\"")
}

@clshortfuse

clshortfuse commented Jul 28, 2020

So, I'm trying to get a zero-width unicode character to work with SASS. It won't appear without a hex editor because of SASS reinterpreting that. On normal CSS, it's:

div::before {
  content: "\200B";
}

But SASS will rewrite it as:

div::before {
  content: "";
}
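
Without a hex editor, one way to confirm that the rewritten output still contains the invisible character is to list every non-ASCII code point in the compiled CSS. The sketch below assumes the CSS is available as a JavaScript string; `listNonAscii` is a hypothetical debugging helper, not something Sass provides.

```javascript
// Hypothetical debugging helper: lists each non-ASCII code point in a string.
function listNonAscii(css) {
  const found = [];
  for (const match of css.matchAll(/[^\0-\x7f]/gu)) {
    const codePoint = match[0].codePointAt(0);
    found.push({
      index: match.index, // UTF-16 offset into the string
      codePoint: `U+${codePoint.toString(16).toUpperCase().padStart(4, "0")}`,
    });
  }
  return found;
}

// The compiled rule from above, with the zero-width space written as \u200B.
console.log(listNonAscii('div::before {\n  content: "\u200B";\n}'));
```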

It's a little frustrating trying to debug with an invisible character since SASS wants to rewrite it. A flag would unfortunately be global to everything when it's somewhat of an edge case where you want raw/literal characters to be outputted as a string. I just have these few instances where it's better to not convert to Unicode. I can imagine there's a lot of other characters, both printable and non-printable, that would greatly benefit from not being rewritten as Unicode, such as:

@watershed

watershed commented Jan 29, 2021

Finding this thread today because of an issue similar to @clshortfuse above.

I learn that Dart Sass is converting my authored:

content: '\00A0/\00A0';

to:

content: ' / ';

…where the spaces written do seem to be non-breaking spaces, but they are prone to rendering in the browser as:

A

No such problem with Libsass, but then Libsass fails at some other stuff.

@watershed

Here’s another example that erratically fails due to the following authored Sass:

content: '\0231F';
transform: rotate(45deg);

…ending up as:

content:"⌟";transform:rotate(45deg);

[Image: what-you-can-do-generated-content-fail]

See also this Twitter thread.

@nex3
Contributor

nex3 commented Jan 29, 2021

@watershed As mentioned earlier in this thread, Sass emits a @charset declaration or a BOM whenever it emits non-ASCII output, which will force browsers to interpret the stylesheet as UTF-8 even if it's served with non-UTF-8 headers. If that's not working, chances are you're doing some sort of post-processing that's incorrectly stripping that extra information.
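
To check whether post-processing has stripped that information, one can inspect the bytes that are actually served. The following is a rough sketch under the assumption that the stylesheet is available as a `Uint8Array`; `describeEncodingMarkers` is a hypothetical name, and the check ignores HTTP headers, which can also declare the encoding.

```javascript
// Hypothetical check on the raw bytes of a served stylesheet.
function describeEncodingMarkers(bytes) {
  // UTF-8 byte-order mark: EF BB BF.
  const hasBom = bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf;

  // @charset is only recognized as the very first bytes of the file,
  // written exactly as `@charset "`.
  const prefix = '@charset "';
  let hasCharset = true;
  for (let i = 0; i < prefix.length; i++) {
    if (bytes[i] !== prefix.charCodeAt(i)) {
      hasCharset = false;
      break;
    }
  }

  const hasNonAscii = bytes.some((b) => b > 0x7f);
  // Non-ASCII content with neither marker relies on headers or guessing.
  return { hasBom, hasCharset, hasNonAscii, atRisk: hasNonAscii && !hasBom && !hasCharset };
}

const encoder = new TextEncoder();
console.log(describeEncodingMarkers(encoder.encode('.a{content:"\u00A0"}')).atRisk); // true
console.log(describeEncodingMarkers(encoder.encode('@charset "UTF-8";.a{content:"\u00A0"}')).atRisk); // false
```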

@cbush06

cbush06 commented Mar 1, 2021

@nex3 -- I have not touched any settings of Angular CLI that (to my knowledge) would affect the inclusion or omission of @charset. In fact, I haven't configured any of the compilation process (it's using defaults). However, I'm encountering this issue.

@nex3
Contributor

nex3 commented Mar 4, 2021

@cbush06 Are you seeing a case where your CSS is being served with @charset (or a UTF-8 BOM) and is still being interpreted as the wrong character set? Otherwise, I'm not sure how Sass can address your issue.

@cbush06

cbush06 commented Mar 4, 2021

@nex3 -- After reading your earlier posts, I went and checked. What I discovered is that the CSS generated for each Angular component does have the @charset. Other SCSS files (e.g. from my assets folder) do not have it included.

@nmoresco

I think the assumption that all CSS output by dart-sass will be loaded directly by a browser is not a given. For example, I was running into this problem because my compiled CSS files are loaded by GWT, which doesn't know about the @charset annotation. Thus, you get this while compiling and it swallows the css block.

[WARN] Line 13 column 12: encountered """. Was expecting one of: "}" "+" "-" "," ";" "/" <STRING> <IDENT> <NUMBER> <URL> <PERCENTAGE> <PT> <MM> <CM> <PC> <IN> <PX> <EMS> <EXS> <DEG> <RAD> <GRAD> <MS> <SECOND> <HZ> <KHZ> <DIMEN> <HASH> <IMPORTANT_SYM> <UNICODERANGE> <FUNCTION>

I think the newer versions of GWT that use GSS might not have this problem, but I can't move to that easily. Regardless of my specific situation, my point is that Sass output is used in many kinds of toolchains that aren't the browser.

@nex3
Contributor

nex3 commented Aug 11, 2022

Sass targets the CSS specification. We'll make exceptions for browser behavior that's contrary to the spec only because browsers are the overwhelming majority of CSS consumers. Any other tool should follow the specification when consuming CSS, and if it doesn't it's pretty clearly a bug in that tool and not in Sass.

@jerryephicacy

@nex3 ,

  1. When running the application in dev mode using webpack, we have charset utf8 present at the top of the compiled application.css file. But, it is removed in prod mode and the charset utf8 is not there.
  2. But in both dev mode and prod mode, the meta charset utf8 is present in the head tag.
  3. I tried bumping up css-loader, sass-loader, postcss-loader, etc., and still not successful.

Hence, following what @jpcamara mentioned in the comment above, I modified fa-content and removed the backslash ( \ ) from the variables. The resulting CSS output is fa-font-awesome-flag:before{content:"\f425"}, which renders correctly every time in the browser.

@kdagnan

kdagnan commented Aug 15, 2022

Would like to add to this discussion:
We are currently moving to dart-sass from sass (node). Our build step results in the correct unicode character. IE:

icon-flag:before {
   content: "\E95E"
}

result of compilation:
icon-flag:before{content:""}
(This character: https://utf8-icons.com/utf-8-character-59742)

However, randomly the browser will not accept the encoding and will display the strange characters. I've added @charset "utf-8" to my SCSS, and to my index.html. It seems to happen maybe 1 in 50 reloads. The escaping function mentioned above seems to fix it but it seems hacky.

@nex3
Contributor

nex3 commented Aug 15, 2022

@jerryephicacy

  1. When running the application in dev mode using webpack, we have charset utf8 present at the top of the compiled application.css file. But, it is removed in prod mode and the charset utf8 is not there.

Emphasis mine. Something in your stack is removing the @charset declaration, which seems like the actual bug here. You can't just delete parts of a file and expect it to work the same way.

@kdagnan I'll ask you the same thing I've asked everyone else in this thread: provide a working reproduction of this bug, including the specific browser version in which you're seeing the error.

@Yegorich555

For webpack it can be fixed with css-unicode-loader

@jerryephicacy

@Yegorich555 , thanks for the suggestion.
Are you sure that we can go with a 2-year-old package? Do you have any information on its maintenance, or on whether any top projects are using it?

If not, any alternative?

@Yegorich555

@jerryephicacy yes, I am.
It works fine with webpack 4 and webpack 5. I've been using it in both of my production projects. Although the loader doesn't look like it has been touched in 2 years, it works without bugs in my case ;)

@jerryephicacy

@Yegorich555 , thanks so much for the idea. It works well!

@nex3 , I have solved this with the css-unicode-loader package using webpack.

@robinp

robinp commented Sep 16, 2022

Just to increase crosslink density... Based on searching, I suspect this has to do with Chrome (Chromium) caching when there is no explicit charset or BOM. Does anyone see this issue in other browsers?

References:

In all cases, the suggestion seems to be to add the explicit encoding back, either with charset directive, response header or BOM.

@nsunga

nsunga commented May 11, 2023

This is something I could see adding as a command-line flag (--ascii-only or something like that) to serialize Unicode characters as ASCII escapes.

Hello @nex3 !

sorry for bringing up such an old thread.

i just wanted to confirm: the --ascii-only flag isn't supported yet, right?

and if not, does dart-sass have no intention of doing this?

@nex3
Contributor

nex3 commented May 16, 2023

This issue is open, which indicates that it is not supported but we would like to do it.

@nlozovan

I can confirm that this is still happening, and icons will intermittently render as "" instead of the normal format.

  • in dev mode we have ASCII
  • in prod builds we have unicode
  • very rarely the render is broken, browser Chrome Version 116.0.5845.187 (Official Build) (arm64) but this is reproducible on other versions as well. It's rare but still a bug to be addressed.
  • just reproduced it, and it seems the TTF file was loaded from the browser cache, so that might be related to Chromium as well.
    I still think we should have the chance to ignore ASCII conversion when needed, especially since in our case it's a CSS file generated in icomoon and you can't wrap declarations in SCSS files and such..

@nex3
Contributor

nex3 commented Sep 21, 2023

@nlozovan Does the CSS file you're serving to Chrome retain the @charset rule and/or the UTF-8 byte-order-mark, or have those been stripped out by some other processing?

@nlozovan

@nlozovan Does the CSS file you're serving to Chrome retain the @charset rule and/or the UTF-8 byte-order-mark, or have those been stripped out by some other processing?

Yes, it's stripped out for some reason. I ended up forcefully adding a charset declaration after the build. For now, I can't reproduce the error anymore, fonts are rendered correctly. I have a theory that, while Chrome is loading the CSS file from the cache, it will (sometimes) apply a wrong encoding. Yeah, it seems the planets aligned here in a wrong way somehow ;)
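
A post-build step of the kind described here might look roughly like the sketch below. It is an assumption about how such a step could be written, not the code actually used in this build: `ensureCharset` is a hypothetical name, and the result must still be written to disk as UTF-8 for the declaration to be truthful.

```javascript
// Hypothetical post-build step: re-add @charset when it has been stripped.
function ensureCharset(css) {
  const isAscii = /^[\0-\x7f]*$/.test(css);
  const alreadyMarked = css.startsWith('@charset "') || css.startsWith("\uFEFF");
  // ASCII-only output needs no marker; marked output is left untouched.
  if (isAscii || alreadyMarked) return css;
  // @charset must be the very first thing in the file.
  return `@charset "UTF-8";${css}`;
}

console.log(ensureCharset('.icon-flag:before{content:"\uE95E"}').startsWith('@charset "UTF-8";')); // true
console.log(ensureCharset(".a{color:red}")); // .a{color:red}
```

The function is idempotent, so running it on output that already carries the declaration is harmless.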

@nex3
Contributor

nex3 commented Sep 21, 2023

If the @charset/BOM was stripped out, that's almost certainly the culprit. CSS requires those to correctly identify the encoding under all circumstances.

aarongable pushed a commit to chromium/chromium that referenced this issue Sep 22, 2023
This is to allow testing of the desired behavior, and it's a possible
cause of an issue being discussed in Sass:
sass/dart-sass#568

There are two call sites for CSSParserContext::Charset(), one for the
regular parsing path, and one for parsing registered property values.
Rather than touching both, just throw away the charset override when the
feature is enabled.

Bug: 1485525

Change-Id: I3da69b4156d6a84bcc8e0517c954a79b522a9ec9
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/4846986
Commit-Queue: Philip Jägenstedt <[email protected]>
Reviewed-by: Rune Lillesveen <[email protected]>
Cr-Commit-Position: refs/heads/main@{#1200109}
@foolip

foolip commented Sep 26, 2023

I have landed a change in Chrome to make URL parsing in CSS spec compliant, by ignoring the encoding/charset of the stylesheet. This change is available in Chrome Canary 119.0.6025.0 and later and can be enabled by passing --enable-features=CSSParserIgnoreCharsetForURLs as a command line argument to Chrome.

@nlozovan would you be able to test with Chrome Canary with this command line argument to see if it has an effect on the problem you're experiencing?

@nlozovan

@foolip here are some tests that I've made. Scenario: a CSS file will have @font-face declarations with some custom fonts, and the icon content is in Unicode.

  • Without a charset declaration:

    • Calling Chrome Canary as an app - can't reproduce the error anymore;
    • Using Terminal command /Applications/Google\ Chrome\ Canary.app/Contents/MacOS/Google\ Chrome\ Canary --enable-features=CSSParserIgnoreCharsetForURLs will reproduce the error;
    • Using Terminal command without the flag /Applications/Google\ Chrome\ Canary.app/Contents/MacOS/Google\ Chrome\ Canary will reproduce the error;
  • With the charset declaration at the top:

    • all 3 scenarios work fine, I cannot reproduce the problem anymore;

I've used Canary Version 119.0.6030.0 (Official Build) canary (arm64). Also, I cannot explain why Chrome Canary which was opened as an app works differently than the terminal Chromium.
Hope this is helpful and yes, seems Canary has an update on this.

@foolip

foolip commented Sep 26, 2023

@nlozovan thank you for testing! It sounds like you get the error when opening Chrome from the command line both with and without --enable-features=CSSParserIgnoreCharsetForURLs, right? That would suggest it has no bearing on the problem you're seeing, but then I don't understand why the problem doesn't reproduce when opening Chrome Canary as an app. Are you sure that Chrome Canary was fully closed between each test, so that it didn't open a new tab in an already open browser? I ask only because that's the only thing that comes to mind as an explanation for what you're seeing.

@nlozovan

@foolip Yes, I was making sure the session is over, the cache is enabled in both cases. Tested now one more time and I have the same results. The terminal is not throwing any errors, it's opening a brand-new session. That is really interesting.
How I reproduce the error quite quickly is by opening and closing the Inspector Tools via the shortcut, multiple times, on page load. I do this 2-3 times and I can see the font icon error. Not on the Canary app though.

@foolip

foolip commented Oct 10, 2023

@nlozovan thanks for double-checking that. I also don't understand why you'd see a difference between starting Chrome Canary by clicking an icon and from the command line.

To help me understand if the change I made behind a flag affects your case, can you share the relevant part of the stylesheet? I'm looking for non-ASCII characters in URLs, which is what my change should affect.

@makbeta

makbeta commented Aug 8, 2024

For webpack it can be fixed with css-unicode-loader

In case anyone else is running into this issue with Gulp, I've created a gulp plugin based on the webpack version. Hope that helps.

@ntkme
Contributor

ntkme commented Aug 8, 2024

For webpack it can be fixed with css-unicode-loader

In case anyone else is running into this issue with Gulp, I've created a gulp plugin based on the webpack version. Hope that helps.

These implementations only patch CSS content rules, and both of them implement CSS escaping incorrectly:

  • They will break when a character that needs escaping is immediately followed by any of [0-9a-fA-F \t]. For example, if the original CSS is content: "ὠ1"; (U+1F60 followed by ASCII "1"), it gets encoded as content: "\1f601";, which becomes content: "😁"; (U+1F601). The correct escape would be content: "\1f60 1";.
  • They will also incorrectly encode surrogate pairs, e.g. any emoji characters. For example, content: "😁"; should be encoded as content: "\1f601";, but it is incorrectly encoded as content: "\d83d\de01";.

See: https://www.w3.org/International/questions/qa-escapes#cssescapes

In general, I recommend not copy-pasting code without actually understanding what it is doing.
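
The two failure modes above can be reproduced with a short sketch. `naiveEscape` is a hypothetical stand-in written only to exhibit the described behavior; it is not the actual code of either plugin. `safeEscape` shows one straightforward alternative that works on whole code points and always terminates the escape with a space.

```javascript
// Buggy on purpose: escapes each UTF-16 code unit and adds no terminator.
function naiveEscape(css) {
  return css.replace(/[^\0-\x7f]/g, (unit) => `\\${unit.charCodeAt(0).toString(16)}`);
}

// Escapes whole code points (note the `u` flag) and always appends a space.
function safeEscape(css) {
  return css.replace(/[^\0-\x7f]/gu, (ch) => `\\${ch.codePointAt(0).toString(16)} `);
}

const followedByDigit = 'content: "' + "\u1F60" + '1";'; // U+1F60 then ASCII "1"
const emoji = 'content: "\u{1F601}";';                   // U+1F601

console.log(naiveEscape(followedByDigit)); // content: "\1f601";  (now reads as U+1F601)
console.log(safeEscape(followedByDigit));  // content: "\1f60 1";
console.log(naiveEscape(emoji));           // content: "\d83d\de01";
console.log(safeEscape(emoji));            // content: "\1f601 ";
```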

@makbeta

makbeta commented Aug 8, 2024

For webpack it can be fixed with css-unicode-loader

In case anyone else is running into this issue with Gulp, I've created a gulp plugin based on the webpack version. Hope that helps.

These implementations only patch CSS content rules, and both of them implement CSS escaping incorrectly. They will break when a character that needs escaping is immediately followed by any of [0-9a-fA-F \t]. For example, if the original CSS is content: "ὠ1"; (U+1F60 followed by ASCII "1"), it gets encoded as content: "\1f601";, which becomes content: "😁"; (U+1F601). The correct escape would be content: "\1f60 1";.

See: https://www.w3.org/International/questions/qa-escapes#cssescapes

In general, I recommend not copy-pasting code without actually understanding what it is doing.

@ntkme Thank you 🙏🏼 for taking the time to point out the issue in the code and providing an example where it fails. I'll incorporate your feedback into the next version of my plugin.

@ntkme
Contributor

ntkme commented Aug 8, 2024

If you don't really care about generating compressed output, you can probably just do:

css.replace(/[^\0-\x7f]/gu, (match) => `\\${match.codePointAt(0).toString(16)} `)
// This would always add the space character to escape sequence, even when it's optional.

If you really care about the output size and want to generate the smallest output possible:

css.replace(/[^\0-\x7f][0-9a-fA-F \t\r\n\f]?/gu, (match) => {
    const characters = [...match];
    const codePoint = characters[0].codePointAt(0);
    const escaped = `\\${codePoint.toString(16)}`;
    return characters.length > 1
        ? codePoint > 0xfffff
          ? `${escaped}${characters[1]}` // escaped character is 6 letters in hex, space is not required
          : `${escaped} ${characters[1]}` // otherwise add a space
        : escaped; // next letter is safe and will not be parsed as escape sequence
})
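
One way to gain confidence in the compact variant is a round-trip check: escape a string, decode the escapes again, and compare with the original. In the sketch below, `escapeCompact` wraps the snippet above in a function, and `decodeCssEscapes` is a hypothetical, simplified decoder that only handles hexadecimal escapes (it does not handle escapes such as `\"` or `\\`), so it is suitable for testing only.

```javascript
// The compact escaper from the snippet above, wrapped in a function.
function escapeCompact(css) {
  return css.replace(/[^\0-\x7f][0-9a-fA-F \t\r\n\f]?/gu, (match) => {
    const characters = [...match];
    const codePoint = characters[0].codePointAt(0);
    const escaped = `\\${codePoint.toString(16)}`;
    return characters.length > 1
      ? codePoint > 0xfffff
        ? `${escaped}${characters[1]}` // six hex digits: no space needed
        : `${escaped} ${characters[1]}` // otherwise terminate with a space
      : escaped;
  });
}

// Simplified decoder (an assumption for testing): a backslash, 1-6 hex
// digits, then one optional whitespace character that is consumed.
function decodeCssEscapes(css) {
  return css.replace(/\\([0-9a-fA-F]{1,6})(?:\r\n|[ \t\r\n\f])?/g, (_, hex) => {
    const codePoint = parseInt(hex, 16);
    const invalid =
      codePoint === 0 || codePoint > 0x10ffff || (codePoint >= 0xd800 && codePoint <= 0xdfff);
    return invalid ? "\uFFFD" : String.fromCodePoint(codePoint);
  });
}

const samples = [
  'content: "' + "\u1F60" + '1";', // escape followed by a hex digit
  'content: "\u{1F601}a";',        // emoji followed by a hex letter
  'content: "\u00A0/\u00A0";',     // escapes followed by non-hex characters
];
for (const sample of samples) {
  const escaped = escapeCompact(sample);
  console.log(escaped, decodeCssEscapes(escaped) === sample);
}
```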
