Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IoService #2362

Closed
wants to merge 24 commits into from
Closed

IoService #2362

wants to merge 24 commits into from

Conversation

jerch
Copy link
Member

@jerch jerch commented Aug 5, 2019

PR of an IoService, that deals with different encoding aspects and the input flowcontrol.

Idea:

  • all IO to the backend is byte based and can be directly read/written from/to the pty
  • setEncoding gets applied to input and output data and should reflect the OS pty encoding
  • encodings can be added as needed
  • a decoder decodes bytes to our internal UTF32, an encoder encodes string data to the foreign encoding as bytes
  • binary data from within the terminal gets directly sent, no output encoding will be applied

@jerch jerch added the work-in-progress Do not merge label Aug 5, 2019
@jerch
Copy link
Member Author

jerch commented Aug 6, 2019

@Tyriar Done with the basic layout of the IoService (only a few TODOs left in the service itself). Would be good to get a first conceptual review before integrating this further in the terminal itself.

About encodings:
The encoding setting now applies to both input and output (imho separately setting this makes not much sense) and should reflect the pty app encoding. Missing encodings can be added with addEncoding (prolly usable by addons later on).

About IO in general:

  • input: Input buffering has moved into the service, imho its best to deal with it in one place. The write method now deals with string raw bytes at once. This needs to be benchmarked as it might decrease performance.
    write now applies the encoding to raw bytes, strings and utf-8 data are handled as before, only difference here - we have no explicit writeUtf8 method anymore (as the name would be misleading).
  • output: Data output can be grabbed as before with an onData event, but now as raw bytes with encoding correctly applied. Additionally if one really wants to grab only string data sent from the terminal, there is now a onStringData. For raw data there is a onRawData counterpart. Usage would be straight forward - just apply onData directly on the pty:
    ios.onData(data => pty.write(data));
    Same goes for the attach addon, it only has to forward the bytes to the backend.

The next steps are to integrate this with Terminal.ts and do some benchmarks. I dont expect surprises here, still I am unsure about the string | Uint8Array merge for write.

@jerch jerch added breaking-change Breaks API and requires a major version bump type/enhancement Features or improvements to existing features labels Aug 6, 2019
@jerch
Copy link
Member Author

jerch commented Aug 6, 2019

Benchmark results with merged input types:

   Context "out-test/benchmark/test/benchmark/Terminal.benchmark.js"
      Context "Terminal: ls -lR /usr/lib"
         Context "write"
            Case "#1" : 10 runs - average throughput: 28.63 MB/s
         Context "writeUtf8"
            Case "#1" : 10 runs - average throughput: 33.37 MB/s
         Context "writeNew string"
            Case "#1" : 10 runs - average throughput: 29.84 MB/s
         Context "writeNew Utf8"
            Case "#1" : 10 runs - average throughput: 34.39 MB/s

The combined write in IoService is even slightly faster, no clue why. I think we are safe to go with this simpler input API.

@jerch jerch removed the work-in-progress Do not merge label Aug 7, 2019
@jerch
Copy link
Member Author

jerch commented Aug 7, 2019

This service is getting in shape, thus I removed the WIP flag.

We need to discuss my API proposal here, as it would change some basic interactions with xterm.js (for the better ofc 😸). Note that my proposals in #2326 are somewhat deprecated now as this here is the result of these thoughts and actual implementation. Also this PR contains parts of #2295 (the callback on the write method) making #2295 somewhat obsolete. Still this PR does not contain any flow control mechanism itself as it would have to take the backend side into account (which is not possible from within xterm.js alone, we can only recommend some ways in the docs imho). The attach addon can be fixed by a later PR.

Main topics that this PR touches and need to be discussed:

  • encodings:
    • Which should we support by default, which could go into an addon?
      I think we are save to ship with UTF-8 and ISO-8859-1 and move the others into an addon. The addon can be extended by more encodings if needed later on.
    • Does the encoding API work for our needs?
      The asymmetric encoding/decoding (encoding: bytes --> UTF32, decoding: string --> bytes) is abit weird, but suits our internal needs best imho. We should not try to make this symmetrical, strings are way to handy in the construction so UTF32 would not make any sense here. The other way around is a nogo as it would counter most of the buffer rework.
  • breakage with previous API due to byte IO:
    If integrators want to support receiving raw binary data from the terminal, it does not work without the switch from string to bytes. Otherwise the changes are small - write still accepts string type, only onData changed. If an integrator is not interested in the binary stuff he can simply switch from onData to onStringData, thus the binary way is still optional. I would not recommend that as it comes with possible encoding issues down to the pty (see the ASCII graphics in Allow arbitrary binary data for triggerDataEvent/onData #2326).
  • write callback as possible flow control hook:
    Not sure if this needs further discussion, imho its the most elegant way we have found after several failures. Needs at least some docs.

@jerch
Copy link
Member Author

jerch commented Aug 8, 2019

@Tyriar Basically done with this PR, up for review/discussion.

This was referenced Aug 17, 2019
@jerch jerch added this to the 4.0.0 milestone Aug 24, 2019
*/
write(data: string): void;
write(data: string | Uint8Array, callback?: () => void): void;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good, I'm really concerned about the API getting too complicated/intimidating with all this encoding stuff so anything we can do to simplify great. I expect the perf hit won't be so bad.

* string data will always be decoded as UTF-16.
* `callback` is an optional callback that gets called once the data
* chunk was processed by the parser. Use this to implement
* a flow control mechanism so the terminal can keep up with incoming
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably shouldn't mention flow control yet until we know what the evenutal solution will be.

Copy link
Member

@Tyriar Tyriar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok there's a lot going on in this PR and I think we need to split it up:

Things I'm unsure about:

  • Anything that complicates the API (encodings, onData changes)
  • trigger(String|Raw)DataEvent
  • It's not clear to me why we need all these encoding and where they would be used

Things we should get into v4:

  • The callback on write, I really want to start using this in both xterm.js tests and vscode tests. This could be a tiny PR merged very quickly.

Things we should do in v4.1:

  • Introduce IOService without the external API changes first

* Output encoding as given in `setEncoding` is not applied.
* Grab the data with the `onRawData` event.
*/
triggerRawDataEvent(data: string): void;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't being used yet?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope not yet, the first use case for it would be the mouse report which just just got merged into master.


/**
* Event listener for data from the terminal.
* This event is a union of `onStringData` and `onRawData` as raw bytes with correctly
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

onRawData doesn't give raw bytes but onData does?

Copy link
Member Author

@jerch jerch Sep 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - the idea behind this is to give integrators a chance to grab either unencoded or encoded versions of the data, while onData is kinda a funnel of encoded bytes:

  data to be sent from xterm.js to pty:

   real string data                  byte data
   (may contain unicode >255)        (as bytestring 0..255)

  +------------------+              +------------------+
  |                  |  unencoded   |                  |
  |   onStringData   |    events    |     onRawData    |
  |                  |              |                  |
  +------------------+              +------------------+
           |                                  |
           |                                  |
current    |            encoding step         | "binary"
(90% UTF8) |                                  |
           |                                  |
           |       +------------------+       |
           +------->                  <-------+
                   |      onData      |
                   |                  |
                   +------------------+
                            |
                            v
                       data to pty
                       (as bytes)

Internally we only need to deal with triggerStringDataEvent and triggerRawDataEvent, the service hides the nasty encoding needs.

Outside (integrators) should mostly care for onData which contains the bytestream for the pty. If an integrator wants to do some additional stuff on sent data, onData is unhandy in its byte form (lost the encoding and domain info whether it comes from genuine string data or raw data), thus they can hook in earlier into onStringData and onRawData which still shows that distinction.

onStringData is basically what was onData before. I had to rename this as the domain specific variants are just specializations with different encodings applied to the general onData. Not renaming it would have caused even more confusion imho (Ive learnt that from writeUtf8 lol).

* See `Terminal.encodings` for installed encodings. Change
* `ITerminalOptions.encoding` to set the active encoding.
*/
addEncoding(encoding: IEncoding): void;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be public API? How would VS Code use this and what benefits would doing so give?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this to the public API for a yet to come encoding addon, that could contain most common legacy encodings (see questions above) and give ppl a way to implement their "yet another ancient encoding" as well.

The more fundamental question is whether xterm.js should care for legacy encodings at all, why not go with UTF8 everywhere. Well thats debatable. The question leads to "Who should ensure, that xterm.js encoding matches the pty encoding needs?" My initial stance here was - oh thats the integrators' responsibility. I think vscode bridges that already somehow, still I remember several issues regarding that (weird chars popping up and such). Most/Any of the other integrations dont care at all, I think neither hyper, terminus or fluentTerminal evaluate the pty encoding and translate between UTF8 and "the other encoding". Thats a problem, and all of them have unresolved issue reports regarding this.
My conclusion - lets do it as all other emulators do it - offer a setting for active encoding on API level. This lifts some burden from integrators (transcoding stuff forth and back), and those that didnt care so far kinda get it for free lol.

Note that those encoding problems mostly happen on windows these days, as most POSIX systems default to UTF-8 now. Windows is a mixed case here, seems some shells/apps spit out UTF16, others still some oldish cpXXX. This can be bridged with this PR pretty easily now: An integrator just needs to show some dropdown setting thingy with supported encodings. Whenever a user starts something with a foreign encoding he can just swap the active encoding. No more ssh/powershell/wsl encoding failures 😄

* @param data The data to write to the terminal.
*/
writeUtf8(data: Uint8Array): void;
writeln(data: string, callback?: () => void): void;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

data: string | Uint8Array?

If so we could then have the impl just be:

this.write(data);
this.write('\r\n');

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it would be:

this.write(data);
this.write('\r\n', callback);

@@ -930,6 +971,51 @@ declare module 'xterm' {
readonly width: number;
}

/**
* xterm.js encoding interface.
* See common/input/Encodings.ts for implementation examples.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We haven't referenced internal implementations like this before, the problem here is that this gets translated to the website API docs and it won't be clear where that is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Is a pointer to some addon code safe here (yet to come)? If not lets just remove it (the API docs should contain enough hints to get this done).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say remove it for now

this.writeBufferUtf8.push(data);
this._innerWriteUtf8();
writeSync(data: Uint8Array | string): void {
(this as any)._ioService.write(data);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can make _ioService protected to avoid this cast


const isNode = (typeof navigator === 'undefined') ? true : false;
export const isNode = (typeof navigator === 'undefined' && typeof process !== 'undefined') ? true : false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just navigator wasn't sufficient?

Copy link
Member Author

@jerch jerch Sep 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems trying to distinct nodejs env from some browser engine context is very hard. I searched for it quite some time getting very different "this works for me" results. The pitfalls always go along these lines - never trust a simple property check on the global object, as some engines allow complete overwrites.

Newer browser engines make attempts to overwrite globals a NOOP now (thus requesting the property still gives the real deal), but nodejs doesnt. Thus I added a sentinel that most nodejs envs hopefully would not change as many modules rely on it - process.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw thats how emscripten tries to detect things:

// *** Environment setup code ***
var ENVIRONMENT_IS_NODE = typeof process === 'object' && typeof require === 'function';
var ENVIRONMENT_IS_WEB = typeof window === 'object';
var ENVIRONMENT_IS_WORKER = typeof importScripts === 'function';
var ENVIRONMENT_IS_SHELL = !ENVIRONMENT_IS_WEB && !ENVIRONMENT_IS_NODE && !ENVIRONMENT_IS_WORKER;

@jerch
Copy link
Member Author

jerch commented Sep 7, 2019

Ok there's a lot going on in this PR and I think we need to split it up:

Things I'm unsure about:

* Anything that complicates the API (encodings, onData changes)

* trigger(String|Raw)DataEvent

* It's not clear to me why we need all these encoding and where they would be used

I have added an ASCII chart above to illustrate the usage of the trigger methods and the events further.

Practically we dont need other encodings in the codebase beside UTF-8, as >90% of all usecases are UTF-8 now. The other <10% are either legacy systems ppl might ssh into, or windows related. Those fail atm. Since those encodings are not that common anymore, they are candidates for an addon.

To not make the API overly more complicated - maybe put the encoding stuff into some subsection?

Things we should get into v4:

* The callback on write, I really want to start using this in both xterm.js tests and vscode tests. This could be a tiny PR merged very quickly.

Things we should do in v4.1:

* Introduce IOService without the external API changes first

Imho this separation does not work, as it does not address the problem with raw byte data vs. string data. To get support for this (the initial idea behind this PR), we definitely gonna need an API change.

@Tyriar Tyriar modified the milestones: 4.0.0, 4.1.0 Sep 10, 2019
@jerch
Copy link
Member Author

jerch commented Sep 11, 2019

@Tyriar Now that v4 is out, we cannot apply the API changes anymore this PR would create. I still think that the encoding issues are a real problem, if you look through the integration projects like hyper, terminus and fluentTerminal - they all have open issues regarding this, most are unanswered though. Not sure why, not even sure if ppl know whats going on in this field (text encodings are always a nightmare to get done right).

Examples of issues prolly related to wrong encodings:

Proposal to get this fixed (without breaking API change):

  • rename onStringData in this PR back to onData as before
  • rename onData in this PR to onBytes to reflect the byte nature of the data
  • keep onRawData as bytestring event
  • move all encodings into an addon, only provide UTF8 by default
  • put encoding settings behind an API subsection
  • fix attach addon accordingly

@Tyriar
Copy link
Member

Tyriar commented Sep 11, 2019

TL;DR of our chat about encoding portion: Instead of supporting all encoding in xterm.js, support utf-8 and provide docs on how to configure it properly ($LANG, luit, etc.)

@jerch
Copy link
Member Author

jerch commented Sep 11, 2019

Gonna close this issue as it mostly obsolete now. The remaining parts (changes to write + onBinaryData event) will be covered by 2 following PRs.

@jerch jerch closed this Sep 11, 2019
@Tyriar Tyriar removed this from the 4.1.0 milestone Oct 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking-change Breaks API and requires a major version bump type/enhancement Features or improvements to existing features
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants