String and character literals V2 #71

Open
LPeter1997 opened this issue Jul 20, 2022 · 4 comments
Labels
Design document This one came out from an idea but considers many cases and tries to prove the usability Syntax This issue is about syntax

Comments

@LPeter1997
Member

LPeter1997 commented Jul 20, 2022

Introduction

This issue proposes a complete redesign of string literals, inspired by Swift's string literals. The reason is that the Swift design does essentially everything the new C# literals do, but in a cleaner and less complicated form. I'd also like to take this opportunity to change character literals slightly, to free up the single-quote character.

Escape sequences

The escape sequences would stay identical to what has already been specified.

Single-line string literals

Single-line string literals would start and end with double quotes and cannot span multiple lines. Example:

val x = "Hello, World!";

They can also contain the usual escape sequences:

val x = "Hello,\nEarth! \u{1F47D}";

In this latter example, the value of x would be

Hello,
Earth! 👽

Multi-line string literals

Multi-line string literals would start and end with three double quotes. The string content would start on the line after the opening quotes and end before the line containing the closing quotes. Example:

val x = """
Hello, World!
""";

Note that this string has no newlines in it. It is equivalent to the string "Hello, World!". If you want a leading or trailing newline, you can write:

val x = """

Hello, World!

""";

The placement of the ending quotes determines the amount of whitespace cut off from each line. Example:

val x = """
    Lorem ipsum dolor sit amet,
    consectetur adipiscing elit,
    sed do eiusmod tempor incididunt
    ut labore et dolore magna aliqua.
""";

Here, nothing is cut off; the string is exactly

    Lorem ipsum dolor sit amet,
    consectetur adipiscing elit,
    sed do eiusmod tempor incididunt
    ut labore et dolore magna aliqua.

But if we indent the ending quotes, we can cut off the leading whitespace:

val x = """
    Lorem ipsum dolor sit amet,
    consectetur adipiscing elit,
    sed do eiusmod tempor incididunt
    ut labore et dolore magna aliqua.
    """;

Now the string is

Lorem ipsum dolor sit amet,
consectetur adipiscing elit,
sed do eiusmod tempor incididunt
ut labore et dolore magna aliqua.
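
The trimming rule above can be sketched in a few lines. This is a rough illustration in Python rather than Draco, and it fills in details the proposal does not specify: the function name `strip_multiline` is hypothetical, a content line indented less than the closing quotes is treated as an error (as Swift does), and whitespace-only lines are let through.

```python
def strip_multiline(body_lines, closing_indent):
    """Strip the closing-quote indentation from each line of a multi-line literal.

    body_lines: the lines between the opening and closing quotes.
    closing_indent: the whitespace that precedes the closing quotes.
    """
    result = []
    for line in body_lines:
        if line.startswith(closing_indent):
            # Cut off exactly the indentation of the closing quotes.
            result.append(line[len(closing_indent):])
        elif line.strip() == "":
            # Assumption: a whitespace-only line need not carry the full indent.
            result.append("")
        else:
            # Assumption: under-indented content is an error, as in Swift.
            raise ValueError(f"line is indented less than the closing quotes: {line!r}")
    # Lines are joined with newlines; there is no leading or trailing newline
    # unless the literal contains explicit empty lines.
    return "\n".join(result)


lines = ["    Lorem ipsum dolor sit amet,", "    consectetur adipiscing elit,"]
print(strip_multiline(lines, ""))      # indentation kept
print(strip_multiline(lines, "    "))  # indentation removed
```

With the closing quotes at column zero nothing is removed, and with them indented by four spaces each line loses four spaces, matching the two examples above.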

Breaking long lines

Long lines in multi-line strings can be broken by placing a \ at the very end of a line. Example:

val x = """
Hello, \
World!
""";

This is equal to "Hello, World!". It looks similar to C-style line continuations, but it is only valid in multi-line string literals.
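
As a rough Python sketch of the joining step (not Draco, and not a proposed implementation): the function name `join_continuations` is hypothetical, and the `hashes` parameter covers the # delimiter variant discussed next. The sketch deliberately ignores escaped backslashes at the end of a line, which a real lexer would have to handle.

```python
def join_continuations(lines, hashes=0):
    """Join lines that end with the continuation marker for this literal.

    The marker is a backslash followed by `hashes` '#' characters.
    """
    marker = "\\" + "#" * hashes
    out = []
    current = ""
    pending = False  # True while the previous line ended with the marker
    for line in lines:
        if line.endswith(marker):
            # Drop the marker and keep accumulating on the same logical line.
            current += line[: -len(marker)]
            pending = True
        else:
            out.append(current + line)
            current = ""
            pending = False
    if pending:
        # Assumption: a trailing marker on the last line simply ends the string.
        out.append(current)
    return "\n".join(out)


print(join_continuations(["Hello, \\", "World!"]))  # Hello, World!
```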

Note that # delimiters change the continuation sequence here too (a later section explains the # delimiters):

#"""
Hello, \
World!
"""#

This is literally

Hello, \
World!

To get Hello, World!, you'd write

#"""
Hello, \#
World!
"""#

Interpolation

Interpolation introduces a new escape sequence, \(, which starts an interpolation expression that runs until the matching ). For example:

"1 + 2 = \(1 + 2)"

This would result in the string 1 + 2 = 3.

Alternatively, we could use \{ ... } or any other paired delimiter.
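
To make "until the matching )" concrete, here is a minimal Python sketch of how a lexer might split a literal's body into text and expression parts by counting parenthesis depth. The function name `split_interpolations` is hypothetical, and the sketch makes simplifying assumptions: it does not handle an escaped backslash before `(`, nor parentheses that appear inside nested string literals within the expression.

```python
def split_interpolations(source):
    """Split a string body into ("text", ...) and ("expr", ...) parts."""
    parts = []
    text = ""
    i = 0
    while i < len(source):
        if source.startswith("\\(", i):
            if text:
                parts.append(("text", text))
                text = ""
            # Scan forward, tracking nesting, until the opening paren is closed.
            depth = 1
            j = i + 2
            while j < len(source) and depth > 0:
                if source[j] == "(":
                    depth += 1
                elif source[j] == ")":
                    depth -= 1
                j += 1
            if depth != 0:
                raise ValueError("unterminated interpolation")
            # j is one past the matching ')', so the expression ends at j - 1.
            parts.append(("expr", source[i + 2 : j - 1]))
            i = j
        else:
            text += source[i]
            i += 1
    if text:
        parts.append(("text", text))
    return parts


print(split_interpolations("1 + 2 = \\(1 + 2)"))
# [('text', '1 + 2 = '), ('expr', '1 + 2')]
```

Depth counting is what lets an expression such as a call with its own parentheses sit inside the interpolation without ending it early.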

Extended string delimiter

The escape sequences and the opening and closing sequences of both single-line and multi-line string literals can be changed to make pasting literal text easier. This is done by adding the same number of # characters before the opening quotes and after the closing quotes.

For example, if we want to paste the literal string 1 + 2 = \n \(1 + 2), we could write it as: #"1 + 2 = \n \(1 + 2)"#.

Escape sequences can still be used by following the backslash with the same number of # characters as the string's delimiters. For example, ###"Hello,\###nWorld!"### becomes:

Hello,
World!

This works for both single-line and multi-line strings.

We could change the way escape sequences are specified or simply change the # character. The simplicity of this method seems quite elegant.

The simplest way to summarize the behavior is that the number of # characters modifies the escape sequence:

  • no # -> \ is the escape sequence
  • # -> \# is the escape sequence
  • ## -> \## is the escape sequence
  • ### -> \### is the escape sequence
  • ...

Another example:

#"""
a = 5 + '\r'
idontrememberpython = """
  heheh e\n \n \r \u093
"""
\#u{1F47D}
"""#

which becomes

a = 5 + '\r'
idontrememberpython = """
  heheh e\n \n \r \u093
"""
👽
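
The "number of #s modifies the escape sequence" rule can be sketched as a small decoder. This is an illustrative Python sketch, not Draco's lexer: the function name `decode_escapes` is hypothetical, and the set of supported escapes (`n`, `r`, `t`, backslash, double quote, and `u{...}`) is an assumed subset, since this issue defers the full list to the existing specification. Interpolation is not handled here.

```python
# Assumed subset of single-character escapes; the real list is specified elsewhere.
SIMPLE_ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "\\": "\\", '"': '"'}


def decode_escapes(body, hashes=0):
    """Decode escapes in a literal delimited with `hashes` '#' characters."""
    prefix = "\\" + "#" * hashes  # e.g. "\" for no '#', "\#" for one '#'
    out = []
    i = 0
    while i < len(body):
        if not body.startswith(prefix, i):
            # Anything not introduced by the full prefix is literal text,
            # including a bare backslash when hashes > 0.
            out.append(body[i])
            i += 1
            continue
        j = i + len(prefix)
        if j < len(body) and body[j] in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[body[j]])
            i = j + 1
        elif body.startswith("u{", j):
            end = body.index("}", j)  # raises ValueError if the brace is missing
            out.append(chr(int(body[j + 2 : end], 16)))
            i = end + 1
        else:
            raise ValueError(f"unknown escape at offset {i}")
    return "".join(out)


print(decode_escapes("Hello,\\###nWorld!", hashes=3))
print(decode_escapes("1 + 2 = \\n \\(1 + 2)", hashes=1))  # left untouched
```

The second call shows why pasting is easier: with one # delimiter, `\n` and `\(` no longer start an escape, so the pasted text comes through verbatim.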

Character literals

The single-quote character could be very valuable to us in other ways. Since character literals are not that significant, I'd like to suggest merging them in with string literals.

Since there has been discussion about prefixing the literal with the encoding used (u8 "Hello" or u16 "Bye", for example), we could do the same to turn a string literal into a character literal using the char prefix, as long as it actually represents a single character. For example, char "a" would be the character literal a. String interpolation would not be allowed, as that would require runtime checks.

@LPeter1997 LPeter1997 added Design document This one came out from an idea but considers many cases and tries to prove the usability Syntax This issue is about syntax labels Jul 20, 2022
@jl0pd

jl0pd commented Jul 20, 2022

This issue doesn't mention alignment and formatting.

Using # may lead to complexities in repl scenarios (ambiguity with preprocessor directive)

@LPeter1997
Member Author

> This issue doesn't mention alignment and formatting.

Indeed, because we haven't proposed anything for it. Alignment and formatting are a largely inelegant part of .NET (at least in C#) IMO, and I find it a tough problem to tackle.

> Using # may lead to complexities in repl scenarios (ambiguity with preprocessor directive)

This assumes the presence of a preprocessor which we might simply not need 😄 . But fair, we could think of some other character that's easier to type.

@WalkerCodeRanger

When defining multiline string literals, I recommend specifying fixed and predictable newline handling. I have experienced a bug where the behavior differed between machines, and it was tracked down to newlines in multiline string literals. You can easily configure git to change line endings locally (indeed, this is sometimes recommended for cross-platform development). As a result, the newlines in the source code were different on different developer machines and the build server. In C#, whatever newlines are in the source code are what go into the string.

Instead, I think multiline string literals should always generate \n for newlines OR always emit Environment.NewLine. Whatever the behavior, it should not be based on the newlines in the source code.

@333fred
Contributor

333fred commented Oct 4, 2024

On the contrary, I highly advise you do not do this. If you must alter the code the user has written in some fashion inside a literal, then make it obvious; put a marker or something in the code to make it configurable. Concerns of file contents that change per-system, like line endings, should be left to the tools that manage the source code, like git itself (for example, by using a .gitattributes file to specify the line endings of your Draco files). No matter what you do here, you're going to hurt someone:

  • If you standardize on LF everywhere, then users who are on Windows and are expecting what they wrote, CRLF (whether they set that in their .gitattributes or not), will be harmed.
  • If you standardize on Environment.NewLine everywhere, then users who are on Windows and did use a .gitattributes to standardize on LF will be harmed.
  • If you standardize on "the user gets what they wrote", then the user is harmed when they're working between both Windows and Linux/Unix, and didn't use a .gitattributes to standardize their files.

That last group is the only one of the 3 that can take an action to have consistent behavior everywhere without falling out of constant value territory; the first two groups have to do some kind of transformation (either from CRLF to LF, or vice versa). Source code is the ultimate source of truth here, and doing anything else is just going to result in pain.
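
For reference, the three policies compared above can be written down as a small Python sketch. It only illustrates what each policy does to a literal's content and takes no side; the function name `normalize_newlines` and the policy names `"lf"`, `"platform"`, and `"source"` are made up for this example, and `os.linesep` stands in for .NET's Environment.NewLine.

```python
import os
import re


def normalize_newlines(text, policy):
    """Apply one of three newline policies to a literal's content."""
    if policy == "source":
        # The user gets exactly what is in the source file; no transformation.
        return text
    if policy == "lf":
        target = "\n"
    elif policy == "platform":
        # Stand-in for Environment.NewLine: depends on the machine that compiles.
        target = os.linesep
    else:
        raise ValueError(f"unknown policy: {policy!r}")
    # Match CRLF first so it is replaced as a single newline, not as two.
    return re.sub(r"\r\n|\r|\n", lambda _: target, text)


mixed = "a\r\nb\nc"
print(repr(normalize_newlines(mixed, "lf")))
print(repr(normalize_newlines(mixed, "source")))
```

Only the `"source"` policy is the identity function, which is the sense in which the other two require a transformation of what the user wrote.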
