IoService #2362
@Tyriar Done with the basic layout of the IoService (only a few TODOs left in the service itself). Would be good to get a first conceptual review before integrating this further in the terminal itself. About encodings: About IO in general:
The next steps are to integrate this with |
Benchmark results with merged input types:
The combined write in IoService is even slightly faster, no clue why. I think we are safe to go with this simpler input API.
This service is getting in shape, thus I removed the WIP flag. We need to discuss my API proposal here, as it would change some basic interactions with xterm.js (for the better ofc 😸). Note that my proposals in #2326 are somewhat deprecated now as this here is the result of these thoughts and actual implementation. Also this PR contains parts of #2295 (the callback on the write method) making #2295 somewhat obsolete. Still this PR does not contain any flow control mechanism itself as it would have to take the backend side into account (which is not possible from within xterm.js alone, we can only recommend some ways in the docs imho). The attach addon can be fixed by a later PR. Main topics that this PR touches and need to be discussed:
@Tyriar Basically done with this PR, up for review/discussion.
*/
write(data: string): void;
write(data: string | Uint8Array, callback?: () => void): void;
I think this is good, I'm really concerned about the API getting too complicated/intimidating with all this encoding stuff, so anything we can do to simplify is great. I expect the perf hit won't be so bad.
* string data will always be decoded as UTF-16.
* `callback` is an optional callback that gets called once the data
* chunk was processed by the parser. Use this to implement
* a flow control mechanism so the terminal can keep up with incoming
We probably shouldn't mention flow control yet until we know what the eventual solution will be.
Ok there's a lot going on in this PR and I think we need to split it up:
Things I'm unsure about:
- Anything that complicates the API (encodings, onData changes)
- trigger(String|Raw)DataEvent
- It's not clear to me why we need all these encoding and where they would be used
Things we should get into v4:
- The callback on write, I really want to start using this in both xterm.js tests and vscode tests. This could be a tiny PR merged very quickly.
Things we should do in v4.1:
- Introduce IOService without the external API changes first
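For the tests use case, a minimal sketch of how the callback on write could be awaited. Only the `write(data, callback?)` signature is taken from the diff above; `WritableTerminal`, `writeAsync` and `FakeTerminal` are hypothetical names made up for illustration, and the fake terminal only mimics "callback fires after the chunk was processed".

```typescript
// Minimal shape of the write signature from the diff above.
interface WritableTerminal {
  write(data: string | Uint8Array, callback?: () => void): void;
}

// Hypothetical helper: wraps the callback form in a Promise so a test
// can await the point where the chunk has been processed.
function writeAsync(term: WritableTerminal, data: string | Uint8Array): Promise<void> {
  return new Promise<void>(resolve => term.write(data, () => resolve()));
}

// Hypothetical stand-in for a terminal (assumption, not xterm.js code):
// processes each chunk asynchronously, then invokes the callback.
class FakeTerminal implements WritableTerminal {
  public parsed: string[] = [];

  write(data: string | Uint8Array, callback?: () => void): void {
    const text = typeof data === 'string'
      ? data
      : Array.from(data, b => String.fromCharCode(b)).join('');
    Promise.resolve().then(() => {
      this.parsed.push(text);
      if (callback) {
        callback();
      }
    });
  }
}

// Each await resumes only after the fake terminal has processed the chunk.
async function example(): Promise<string[]> {
  const term = new FakeTerminal();
  await writeAsync(term, 'hello');
  await writeAsync(term, new Uint8Array([0x21])); // "!"
  return term.parsed;
}
```

A test would then assert on terminal state right after each `await`, instead of guessing a timeout.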
* Output encoding as given in `setEncoding` is not applied.
* Grab the data with the `onRawData` event.
*/
triggerRawDataEvent(data: string): void;
This isn't being used yet?
Nope not yet, the first use case for it would be the mouse report which just got merged into master.

/**
* Event listener for data from the terminal.
* This event is a union of `onStringData` and `onRawData` as raw bytes with correctly
onRawData doesn't give raw bytes but onData does?
Yes - the idea behind this is to give integrators a chance to grab either unencoded or encoded versions of the data, while onData is kinda a funnel of encoded bytes:
data to be sent from xterm.js to pty:

     real string data                          byte data
 (may contain unicode >255)              (as bytestring 0..255)
   +------------------+                   +------------------+
   |                  |     unencoded     |                  |
   |   onStringData   |      events       |    onRawData     |
   |                  |                   |                  |
   +------------------+                   +------------------+
            |                                      |
            |                                      |
    current |            encoding step             | "binary"
 (90% UTF8) |                                      |
            |                                      |
            |         +------------------+         |
            +-------> |                  | <-------+
                      |      onData      |
                      |                  |
                      +------------------+
                               |
                               v
                          data to pty
                           (as bytes)
Internally we only need to deal with `triggerStringDataEvent` and `triggerRawDataEvent`, the service hides the nasty encoding needs.
Outside (integrators) should mostly care for `onData`, which contains the bytestream for the pty. If an integrator wants to do some additional stuff on sent data, `onData` is unhandy in its byte form (it has lost the encoding and the domain info whether it comes from genuine string data or raw data), thus they can hook in earlier into `onStringData` and `onRawData`, which still show that distinction.
`onStringData` is basically what was `onData` before. I had to rename this as the domain specific variants are just specializations with different encodings applied to the general `onData`. Not renaming it would have caused even more confusion imho (I've learnt that from `writeUtf8` lol).
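To make the funnel in the chart above concrete, here is a rough sketch under stated assumptions. The event and trigger names (`onStringData`, `onRawData`, `onData`, `triggerStringDataEvent`, `triggerRawDataEvent`) are the ones from this PR; `Emitter`, `IoFunnel`, `encodeUtf8` and `bytestringToBytes` are hypothetical and not the actual IoService code. Only UTF-8 is sketched as the active encoding.

```typescript
type Listener<T> = (data: T) => void;

// Tiny event helper for this sketch (assumption, not the xterm.js emitter).
class Emitter<T> {
  private listeners: Listener<T>[] = [];
  on(listener: Listener<T>): void {
    this.listeners.push(listener);
  }
  fire(data: T): void {
    for (const listener of this.listeners) {
      listener(data);
    }
  }
}

// Encoding step for real string data: here always UTF-8.
function encodeUtf8(s: string): Uint8Array {
  return new TextEncoder().encode(s);
}

// "binary" step for raw data: each char code 0..255 becomes exactly one byte.
function bytestringToBytes(s: string): Uint8Array {
  const out = new Uint8Array(s.length);
  for (let i = 0; i < s.length; i++) {
    out[i] = s.charCodeAt(i) & 0xff;
  }
  return out;
}

// Hypothetical funnel: both unencoded events end up as bytes in onData.
class IoFunnel {
  readonly onStringData = new Emitter<string>();
  readonly onRawData = new Emitter<string>();
  readonly onData = new Emitter<Uint8Array>();

  triggerStringDataEvent(data: string): void {
    this.onStringData.fire(data);        // unencoded, still a real string
    this.onData.fire(encodeUtf8(data));  // encoded with the active encoding
  }

  triggerRawDataEvent(data: string): void {
    this.onRawData.fire(data);                 // unencoded bytestring
    this.onData.fire(bytestringToBytes(data)); // passed through as-is
  }
}
```

In this sketch `triggerStringDataEvent('\u00e9')` reaches `onData` as the two UTF-8 bytes 0xC3 0xA9, while `triggerRawDataEvent('\u00e9')` reaches it as the single byte 0xE9, which is the distinction the two unencoded events preserve.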
* See `Terminal.encodings` for installed encodings. Change
* `ITerminalOptions.encoding` to set the active encoding.
*/
addEncoding(encoding: IEncoding): void;
Does this need to be public API? How would VS Code use this and what benefits would doing so give?
I added this to the public API for a yet to come encoding addon, that could contain most common legacy encodings (see questions above) and give ppl a way to implement their "yet another ancient encoding" as well.
The more fundamental question is whether xterm.js should care about legacy encodings at all; why not go with UTF-8 everywhere? Well, that's debatable. The question leads to "Who should ensure that the xterm.js encoding matches the pty encoding needs?" My initial stance here was: oh, that's the integrators' responsibility. I think vscode bridges that already somehow, still I remember several issues regarding that (weird chars popping up and such). Most of the other integrations don't care at all; I think neither hyper, terminus nor fluentTerminal evaluates the pty encoding and translates between UTF-8 and "the other encoding". That's a problem, and all of them have unresolved issue reports regarding this.
My conclusion: let's do it as all other emulators do it and offer a setting for the active encoding on API level. This lifts some burden from integrators (transcoding stuff back and forth), and those that didn't care so far kinda get it for free lol.
Note that those encoding problems mostly happen on Windows these days, as most POSIX systems default to UTF-8 now. Windows is a mixed case here: it seems some shells/apps spit out UTF-16, others still some oldish cpXXX. This can be bridged with this PR pretty easily now: an integrator just needs to show some dropdown setting with the supported encodings. Whenever a user starts something with a foreign encoding, they can just swap the active encoding. No more ssh/powershell/wsl encoding failures 😄
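To illustrate the kind of transcoding an integrator has to do on its own today, here is a hedged sketch of a switchable decoder on the pty-to-terminal path. Everything in it is an assumption for illustration: `ChunkDecoder`, `createDecoder` and `PtyBridge` are hypothetical names, this is not the `IEncoding` interface of this PR, and only UTF-8 and ISO-8859-1 are covered.

```typescript
// Hypothetical decoder shape for this sketch (not the PR's IEncoding).
interface ChunkDecoder {
  decode(chunk: Uint8Array): string;
}

type EncodingName = 'utf-8' | 'iso-8859-1';

function createDecoder(name: EncodingName): ChunkDecoder {
  if (name === 'utf-8') {
    const decoder = new TextDecoder('utf-8');
    // stream: true keeps an incomplete multi-byte sequence for the next chunk
    return { decode: chunk => decoder.decode(chunk, { stream: true }) };
  }
  // ISO-8859-1: every byte value equals its Unicode code point
  return {
    decode: chunk => {
      let s = '';
      for (let i = 0; i < chunk.length; i++) {
        s += String.fromCharCode(chunk[i]);
      }
      return s;
    }
  };
}

// Hypothetical bridge between pty bytes and a string sink such as term.write.
class PtyBridge {
  private decoder: ChunkDecoder = createDecoder('utf-8');

  constructor(private readonly sink: (text: string) => void) {}

  // What a "dropdown setting" would call when the user swaps the encoding.
  setEncoding(name: EncodingName): void {
    this.decoder = createDecoder(name);
  }

  onPtyData(chunk: Uint8Array): void {
    this.sink(this.decoder.decode(chunk));
  }
}
```

The point of the proposal above is that this bookkeeping would move behind an active-encoding setting instead of being reimplemented by every integrator.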
* @param data The data to write to the terminal.
*/
writeUtf8(data: Uint8Array): void;
writeln(data: string, callback?: () => void): void;
`data: string | Uint8Array`?
If so we could then have the impl just be:
this.write(data);
this.write('\r\n');
👍
Actually it would be:
this.write(data);
this.write('\r\n', callback);
@@ -930,6 +971,51 @@ declare module 'xterm' {
readonly width: number;
}

/**
* xterm.js encoding interface.
* See common/input/Encodings.ts for implementation examples.
We haven't referenced internal implementations like this before, the problem here is that this gets translated to the website API docs and it won't be clear where that is.
Right. Is a pointer to some addon code safe here (yet to come)? If not lets just remove it (the API docs should contain enough hints to get this done).
I'd say remove it for now
this.writeBufferUtf8.push(data);
this._innerWriteUtf8();
writeSync(data: Uint8Array | string): void {
(this as any)._ioService.write(data);
You can make `_ioService` protected to avoid this cast

const isNode = (typeof navigator === 'undefined') ? true : false;
export const isNode = (typeof navigator === 'undefined' && typeof process !== 'undefined') ? true : false;
Just navigator wasn't sufficient?
Seems that trying to distinguish a nodejs env from some browser engine context is very hard. I searched for it for quite some time, getting very different "this works for me" results. The pitfalls always go along these lines: never trust a simple property check on the global object, as some engines allow complete overwrites.
Newer browser engines make attempts to overwrite globals a NOOP now (thus requesting the property still gives the real deal), but nodejs doesn't. Thus I added a sentinel that most nodejs envs hopefully would not change, as many modules rely on it: `process`.
Btw that's how emscripten tries to detect things:
// *** Environment setup code ***
var ENVIRONMENT_IS_NODE = typeof process === 'object' && typeof require === 'function';
var ENVIRONMENT_IS_WEB = typeof window === 'object';
var ENVIRONMENT_IS_WORKER = typeof importScripts === 'function';
var ENVIRONMENT_IS_SHELL = !ENVIRONMENT_IS_WEB && !ENVIRONMENT_IS_NODE && !ENVIRONMENT_IS_WORKER;
I have added an ASCII chart above to illustrate the usage of the trigger methods and the events further. Practically we dont need other encodings in the codebase beside UTF-8, as >90% of all usecases are UTF-8 now. The other <10% are either legacy systems ppl might ssh into, or windows related. Those fail atm. Since those encodings are not that common anymore, they are candidates for an addon. To not make the API overly more complicated - maybe put the encoding stuff into some subsection?
Imho this separation does not work, as it does not address the problem with raw byte data vs. string data. To get support for this (the initial idea behind this PR), we definitely gonna need an API change.
@Tyriar Now that v4 is out, we cannot apply the API changes anymore this PR would create. I still think that the encoding issues are a real problem, if you look through the integration projects like hyper, terminus and fluentTerminal - they all have open issues regarding this, most are unanswered though. Not sure why, not even sure if ppl know whats going on in this field (text encodings are always a nightmare to get done right). Examples of issues prolly related to wrong encodings:
Proposal to get this fixed (without breaking API change):
TL;DR of our chat about the encoding portion: Instead of supporting all encodings in xterm.js, support UTF-8 and provide docs on how to configure it properly ($LANG, luit, etc.)
Gonna close this issue as it is mostly obsolete now. The remaining parts (changes to write + onBinaryData event) will be covered by 2 following PRs.
PR of an IoService, that deals with different encoding aspects and the input flowcontrol.
Idea: