IoService #2362

jerch · 2019-08-05T13:56:28Z

PR of an IoService, that deals with different encoding aspects and the input flowcontrol.

Idea:

all IO to the backend is byte based and can be directly read/written from/to the pty
setEncoding gets applied to input and output data and should reflect the OS pty encoding
encodings can be added as needed
a decoder decodes bytes to our internal UTF32, an encoder encodes string data to the foreign encoding as bytes
binary data from within the terminal gets directly sent, no output encoding will be applied

jerch · 2019-08-06T14:04:47Z

@Tyriar Done with the basic layout of the IoService (only a few TODOs left in the service itself). Would be good to get a first conceptual review before integrating this further in the terminal itself.

About encodings:
The encoding setting now applies to both input and output (imho separately setting this makes not much sense) and should reflect the pty app encoding. Missing encodings can be added with addEncoding (prolly usable by addons later on).

About IO in general:

input: Input buffering has moved into the service, imho its best to deal with it in one place. The write method now deals with string raw bytes at once. This needs to be benchmarked as it might decrease performance.
write now applies the encoding to raw bytes, strings and utf-8 data are handled as before, only difference here - we have no explicit writeUtf8 method anymore (as the name would be misleading).
output: Data output can be grabbed as before with an onData event, but now as raw bytes with encoding correctly applied. Additionally if one really wants to grab only string data sent from the terminal, there is now a onStringData. For raw data there is a onRawData counterpart. Usage would be straight forward - just apply onData directly on the pty:
```
ios.onData(data => pty.write(data));
```
Same goes for the attach addon, it only has to forward the bytes to the backend.

The next steps are to integrate this with Terminal.ts and do some benchmarks. I dont expect surprises here, still I am unsure about the string | Uint8Array merge for write.

jerch · 2019-08-06T21:56:33Z

Benchmark results with merged input types:

   Context "out-test/benchmark/test/benchmark/Terminal.benchmark.js"
      Context "Terminal: ls -lR /usr/lib"
         Context "write"
            Case "#1" : 10 runs - average throughput: 28.63 MB/s
         Context "writeUtf8"
            Case "#1" : 10 runs - average throughput: 33.37 MB/s
         Context "writeNew string"
            Case "#1" : 10 runs - average throughput: 29.84 MB/s
         Context "writeNew Utf8"
            Case "#1" : 10 runs - average throughput: 34.39 MB/s

The combined write in IoService is even slightly faster, no clue why. I think we are safe to go with this simpler input API.

jerch · 2019-08-07T19:32:13Z

This service is getting in shape, thus I removed the WIP flag.

We need to discuss my API proposal here, as it would change some basic interactions with xterm.js (for the better ofc 😸). Note that my proposals in #2326 are somewhat deprecated now as this here is the result of these thoughts and actual implementation. Also this PR contains parts of #2295 (the callback on the write method) making #2295 somewhat obsolete. Still this PR does not contain any flow control mechanism itself as it would have to take the backend side into account (which is not possible from within xterm.js alone, we can only recommend some ways in the docs imho). The attach addon can be fixed by a later PR.

Main topics that this PR touches and need to be discussed:

encodings:
- Which should we support by default, which could go into an addon?
  I think we are save to ship with UTF-8 and ISO-8859-1 and move the others into an addon. The addon can be extended by more encodings if needed later on.
- Does the encoding API work for our needs?
  The asymmetric encoding/decoding (encoding: bytes --> UTF32, decoding: string --> bytes) is abit weird, but suits our internal needs best imho. We should not try to make this symmetrical, strings are way to handy in the construction so UTF32 would not make any sense here. The other way around is a nogo as it would counter most of the buffer rework.
breakage with previous API due to byte IO:
If integrators want to support receiving raw binary data from the terminal, it does not work without the switch from string to bytes. Otherwise the changes are small - write still accepts string type, only onData changed. If an integrator is not interested in the binary stuff he can simply switch from onData to onStringData, thus the binary way is still optional. I would not recommend that as it comes with possible encoding issues down to the pty (see the ASCII graphics in Allow arbitrary binary data for triggerDataEvent/onData #2326).
write callback as possible flow control hook:
Not sure if this needs further discussion, imho its the most elegant way we have found after several failures. Needs at least some docs.

jerch · 2019-08-08T16:07:36Z

@Tyriar Basically done with this PR, up for review/discussion.

Tyriar · 2019-09-06T17:45:34Z

typings/xterm.d.ts

     */
-    write(data: string): void;
+    write(data: string | Uint8Array, callback?: () => void): void;


I think this is good, I'm really concerned about the API getting too complicated/intimidating with all this encoding stuff so anything we can do to simplify great. I expect the perf hit won't be so bad.

Tyriar · 2019-09-06T17:46:02Z

typings/xterm.d.ts

+     * string data will always be decoded as UTF-16.
+     * `callback` is an optional callback that gets called once the data
+     * chunk was processed by the parser. Use this to implement
+     * a flow control mechanism so the terminal can keep up with incoming


We probably shouldn't mention flow control yet until we know what the evenutal solution will be.

Tyriar

Ok there's a lot going on in this PR and I think we need to split it up:

Things I'm unsure about:

Anything that complicates the API (encodings, onData changes)
trigger(String|Raw)DataEvent
It's not clear to me why we need all these encoding and where they would be used

Things we should get into v4:

The callback on write, I really want to start using this in both xterm.js tests and vscode tests. This could be a tiny PR merged very quickly.

Things we should do in v4.1:

Introduce IOService without the external API changes first

Tyriar · 2019-09-06T17:50:54Z

src/common/services/Services.ts

+   * Output encoding as given in `setEncoding` is not applied.
+   * Grab the data with the `onRawData` event.
+   */
+  triggerRawDataEvent(data: string): void;


This isn't being used yet?

Nope not yet, the first use case for it would be the mouse report which just just got merged into master.

Tyriar · 2019-09-06T17:52:48Z

typings/xterm.d.ts

+
+    /**
+     * Event listener for data from the terminal.
+     * This event is a union of `onStringData` and `onRawData` as raw bytes with correctly


onRawData doesn't give raw bytes but onData does?

Yes - the idea behind this is to give integrators a chance to grab either unencoded or encoded versions of the data, while onData is kinda a funnel of encoded bytes:

data to be sent from xterm.js to pty: real string data byte data (may contain unicode >255) (as bytestring 0..255) +------------------+ +------------------+ | | unencoded | | | onStringData | events | onRawData | | | | | +------------------+ +------------------+ | | | | current | encoding step | "binary" (90% UTF8) | | | | | +------------------+ | +-------> <-------+ | onData | | | +------------------+ | v data to pty (as bytes)

Internally we only need to deal with triggerStringDataEvent and triggerRawDataEvent, the service hides the nasty encoding needs.

Outside (integrators) should mostly care for onData which contains the bytestream for the pty. If an integrator wants to do some additional stuff on sent data, onData is unhandy in its byte form (lost the encoding and domain info whether it comes from genuine string data or raw data), thus they can hook in earlier into onStringData and onRawData which still shows that distinction.

onStringData is basically what was onData before. I had to rename this as the domain specific variants are just specializations with different encodings applied to the general onData. Not renaming it would have caused even more confusion imho (Ive learnt that from writeUtf8 lol).

Tyriar · 2019-09-06T17:53:24Z

typings/xterm.d.ts

+     * See `Terminal.encodings` for installed encodings. Change
+     * `ITerminalOptions.encoding` to set the active encoding.
+     */
+    addEncoding(encoding: IEncoding): void;


Does this need to be public API? How would VS Code use this and what benefits would doing so give?

I added this to the public API for a yet to come encoding addon, that could contain most common legacy encodings (see questions above) and give ppl a way to implement their "yet another ancient encoding" as well.

The more fundamental question is whether xterm.js should care for legacy encodings at all, why not go with UTF8 everywhere. Well thats debatable. The question leads to "Who should ensure, that xterm.js encoding matches the pty encoding needs?" My initial stance here was - oh thats the integrators' responsibility. I think vscode bridges that already somehow, still I remember several issues regarding that (weird chars popping up and such). Most/Any of the other integrations dont care at all, I think neither hyper, terminus or fluentTerminal evaluate the pty encoding and translate between UTF8 and "the other encoding". Thats a problem, and all of them have unresolved issue reports regarding this.
My conclusion - lets do it as all other emulators do it - offer a setting for active encoding on API level. This lifts some burden from integrators (transcoding stuff forth and back), and those that didnt care so far kinda get it for free lol.

Note that those encoding problems mostly happen on windows these days, as most POSIX systems default to UTF-8 now. Windows is a mixed case here, seems some shells/apps spit out UTF16, others still some oldish cpXXX. This can be bridged with this PR pretty easily now: An integrator just needs to show some dropdown setting thingy with supported encodings. Whenever a user starts something with a foreign encoding he can just swap the active encoding. No more ssh/powershell/wsl encoding failures 😄

Tyriar · 2019-09-06T17:54:31Z

typings/xterm.d.ts

-     * @param data The data to write to the terminal.
-     */
-    writeUtf8(data: Uint8Array): void;
+    writeln(data: string, callback?: () => void): void;


data: string | Uint8Array?

If so we could then have the impl just be:

this.write(data); this.write('\r\n');

Actually it would be:

this.write(data); this.write('\r\n', callback);

Tyriar · 2019-09-06T17:56:03Z

typings/xterm.d.ts

@@ -930,6 +971,51 @@ declare module 'xterm' {
    readonly width: number;
  }

+  /**
+   * xterm.js encoding interface.
+   * See common/input/Encodings.ts for implementation examples.


We haven't referenced internal implementations like this before, the problem here is that this gets translated to the website API docs and it won't be clear where that is.

Right. Is a pointer to some addon code safe here (yet to come)? If not lets just remove it (the API docs should contain enough hints to get this done).

I'd say remove it for now

Tyriar · 2019-09-06T17:57:51Z

test/benchmark/Terminal.benchmark.ts

-    this.writeBufferUtf8.push(data);
-    this._innerWriteUtf8();
+  writeSync(data: Uint8Array | string): void {
+    (this as any)._ioService.write(data);


You can make _ioService protected to avoid this cast

Tyriar · 2019-09-06T18:03:39Z

src/common/Platform.ts


-const isNode = (typeof navigator === 'undefined') ? true : false;
+export const isNode = (typeof navigator === 'undefined' && typeof process !== 'undefined') ? true : false;


Just navigator wasn't sufficient?

Seems trying to distinct nodejs env from some browser engine context is very hard. I searched for it quite some time getting very different "this works for me" results. The pitfalls always go along these lines - never trust a simple property check on the global object, as some engines allow complete overwrites.

Newer browser engines make attempts to overwrite globals a NOOP now (thus requesting the property still gives the real deal), but nodejs doesnt. Thus I added a sentinel that most nodejs envs hopefully would not change as many modules rely on it - process.

Btw thats how emscripten tries to detect things:

// *** Environment setup code *** var ENVIRONMENT_IS_NODE = typeof process === 'object' && typeof require === 'function'; var ENVIRONMENT_IS_WEB = typeof window === 'object'; var ENVIRONMENT_IS_WORKER = typeof importScripts === 'function'; var ENVIRONMENT_IS_SHELL = !ENVIRONMENT_IS_WEB && !ENVIRONMENT_IS_NODE && !ENVIRONMENT_IS_WORKER;

jerch · 2019-09-07T11:07:27Z

Ok there's a lot going on in this PR and I think we need to split it up:

Things I'm unsure about:
* Anything that complicates the API (encodings, onData changes)

* trigger(String|Raw)DataEvent

* It's not clear to me why we need all these encoding and where they would be used

I have added an ASCII chart above to illustrate the usage of the trigger methods and the events further.

Practically we dont need other encodings in the codebase beside UTF-8, as >90% of all usecases are UTF-8 now. The other <10% are either legacy systems ppl might ssh into, or windows related. Those fail atm. Since those encodings are not that common anymore, they are candidates for an addon.

To not make the API overly more complicated - maybe put the encoding stuff into some subsection?

Things we should get into v4:

* The callback on write, I really want to start using this in both xterm.js tests and vscode tests. This could be a tiny PR merged very quickly.

Things we should do in v4.1:

* Introduce IOService without the external API changes first

Imho this separation does not work, as it does not address the problem with raw byte data vs. string data. To get support for this (the initial idea behind this PR), we definitely gonna need an API change.

jerch · 2019-09-11T09:59:35Z

@Tyriar Now that v4 is out, we cannot apply the API changes anymore this PR would create. I still think that the encoding issues are a real problem, if you look through the integration projects like hyper, terminus and fluentTerminal - they all have open issues regarding this, most are unanswered though. Not sure why, not even sure if ppl know whats going on in this field (text encodings are always a nightmare to get done right).

Examples of issues prolly related to wrong encodings:

hyper: Printing ASCII characters causes hyper to display weirdly vercel/hyper#2033, UTF-8 encoding & zh-Hans bug? vercel/hyper#2356, Windows encoding issue vercel/hyper#1364, Encoding of æ,ø,å vercel/hyper#3541, ANSI characters not showing up correctly vercel/hyper#2946, The accents are badly encoded. vercel/hyper#1671 (some might be not xterm.js related)
terminus: Encoding changed? Eugeny/tabby#399, Chinese characters can not be displayed Eugeny/tabby#492, ssh could not display gkb encoding words Eugeny/tabby#1301
fluentTerminal: Customizing registry HKCU/Console/ScreenColors causes FluentTerminal display everything as dots on 1809. felixse/FluentTerminal#108, Rustup output encoding issue felixse/FluentTerminal#344, Encoding issue felixse/FluentTerminal#331

Proposal to get this fixed (without breaking API change):

rename onStringData in this PR back to onData as before
rename onData in this PR to onBytes to reflect the byte nature of the data
keep onRawData as bytestring event
move all encodings into an addon, only provide UTF8 by default
put encoding settings behind an API subsection
fix attach addon accordingly

Tyriar · 2019-09-11T18:37:12Z

TL;DR of our chat about encoding portion: Instead of supporting all encoding in xterm.js, support utf-8 and provide docs on how to configure it properly ($LANG, luit, etc.)

jerch · 2019-09-11T19:08:35Z

Gonna close this issue as it mostly obsolete now. The remaining parts (changes to write + onBinaryData event) will be covered by 2 following PRs.

early version of an IoService dealing with encodings

48a6313

jerch added the work-in-progress Do not merge label Aug 5, 2019

jerch added 2 commits August 5, 2019 16:04

make linter happy

e3423b4

integrate encodings with code base

66809ab

jerch added breaking-change Breaks API and requires a major version bump type/enhancement Features or improvements to existing features labels Aug 6, 2019

benchmark input chain

137c7d7

jerch added 5 commits August 7, 2019 01:20

integrate input part

308ac71

API integration

9ada2d3

fix integration tests

8fdafd6

expose encodings

16ebd17

use Object.create for encoding map

9bd2082

Tyriar assigned jerch Aug 7, 2019

jerch added 2 commits August 7, 2019 20:35

default encodings with tests

1b581c3

cleanup

6e84d12

jerch removed the work-in-progress Do not merge label Aug 7, 2019

jerch mentioned this pull request Aug 7, 2019

flow control and discard watermark #2295

Closed

jerch added 6 commits August 8, 2019 15:44

better ingetration with CoreService, IoService tests

0f12c75

Merge branch 'master' into io_service

5b4c6e6

remove obsolete parse methods from inputhandler

b7a4a2d

rename to .parse

25ae6bc

polish API docs

aa3d524

max size for input buffer

5886ab3

jerch mentioned this pull request Aug 8, 2019

Can xterm attach use socket. io? #1972

Closed

jerch added 2 commits August 9, 2019 18:25

handle needed globals

0135653

bind browser functions to global

d9bc6bc

This was referenced Aug 17, 2019

CoreMouseService #2323

Merged

UTF-8 felixse/FluentTerminal#430

Open

jerch added 2 commits August 23, 2019 21:51

Merge branch 'master' into io_service

6d2031c

fix paste test

d651858

jerch added this to the 4.0.0 milestone Aug 24, 2019

jerch added 3 commits September 5, 2019 21:43

Merge branch 'master' into io_service

90db4c7

change triggerDataEvent to triggerStringDataEvent

4eff29d

Merge branch 'master' into io_service

8a61dde

Tyriar requested changes Sep 6, 2019

View reviewed changes

Tyriar modified the milestones: 4.0.0, 4.1.0 Sep 10, 2019

Tyriar mentioned this pull request Sep 10, 2019

Unable to get value synchronously #2408

Closed

jerch closed this Sep 11, 2019

Tyriar mentioned this pull request Sep 11, 2019

Document recommendation on dealing with encoding issues #2420

Closed

Tyriar removed this from the 4.1.0 milestone Oct 7, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

IoService #2362

IoService #2362

jerch commented Aug 5, 2019 •

edited

Loading

jerch commented Aug 6, 2019 •

edited

Loading

jerch commented Aug 6, 2019

jerch commented Aug 7, 2019 •

edited

Loading

jerch commented Aug 8, 2019

Tyriar Sep 6, 2019

Tyriar Sep 6, 2019

Tyriar left a comment

Tyriar Sep 6, 2019

jerch Sep 7, 2019

Tyriar Sep 6, 2019

jerch Sep 7, 2019 •

edited

Loading

Tyriar Sep 6, 2019

jerch Sep 7, 2019

Tyriar Sep 6, 2019

jerch Sep 7, 2019

Tyriar Sep 10, 2019

Tyriar Sep 6, 2019

jerch Sep 7, 2019

Tyriar Sep 7, 2019

Tyriar Sep 6, 2019

Tyriar Sep 6, 2019

jerch Sep 7, 2019 •

edited

Loading

jerch Sep 11, 2019

jerch commented Sep 7, 2019 •

edited

Loading

jerch commented Sep 11, 2019 •

edited

Loading

Tyriar commented Sep 11, 2019

jerch commented Sep 11, 2019 •

edited

Loading


		const isNode = (typeof navigator === 'undefined') ? true : false;
		export const isNode = (typeof navigator === 'undefined' && typeof process !== 'undefined') ? true : false;

IoService #2362

IoService #2362

Conversation

jerch commented Aug 5, 2019 • edited Loading

jerch commented Aug 6, 2019 • edited Loading

jerch commented Aug 6, 2019

jerch commented Aug 7, 2019 • edited Loading

jerch commented Aug 8, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Tyriar left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jerch Sep 7, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jerch Sep 7, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jerch commented Sep 7, 2019 • edited Loading

jerch commented Sep 11, 2019 • edited Loading

Tyriar commented Sep 11, 2019

jerch commented Sep 11, 2019 • edited Loading

jerch commented Aug 5, 2019 •

edited

Loading

jerch commented Aug 6, 2019 •

edited

Loading

jerch commented Aug 7, 2019 •

edited

Loading

jerch Sep 7, 2019 •

edited

Loading

jerch Sep 7, 2019 •

edited

Loading

jerch commented Sep 7, 2019 •

edited

Loading

jerch commented Sep 11, 2019 •

edited

Loading

jerch commented Sep 11, 2019 •

edited

Loading