Rmk32 eol convention for input defaults to ANY, extend OPENSTREAM so that EOL can be specified as an "external format" #1785
As per technical meeting on 7/15/2024
… EOL" This reverts commit 6a7e8c3.
(* ; "Edited 6-Jul-2022 00:00 by rmk")
(* ; "Edited 19-Dec-2021 09:30 by rmk")
(* ; "Edited 14-Dec-2021 16:10 by rmk")
(* ; "Edited 13-Dec-2021 15:20 by rmk")
(* ; "Edited 29-Jun-2021 17:07 by rmk:")
(* ; "Edited 5-Oct-92 13:45 by jds")

(* ;; "RMK: July 2024: Default EOL to ANY on input streams, allow EXTERNAL FORMAT to be a (FORMAT EOL) list so CL:OPEN can get the EOL")
Following the principle of "be liberal in what you accept, and conservative in what you generate", would it make sense for the EXTERNALFORMAT to be in proplist (:key val ...) format? That would allow the items to be in either order, and would establish the pattern for extending this in the future, if necessary.
Should that generalization be added to the implementation of CL:OPEN before it calls IL:OPENSTREAM? There it could also ensure the EOL symbols are in the IL: package, and put the values in the correct order.
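A minimal sketch of the proplist idea, written in Python purely as a model and not as Medley code. The key names `:FORMAT` and `:EOL` and the function `normalize_plist` are hypothetical; the comment above only proposes the general (:key val ...) shape.

```python
# Hypothetical model of a (:key val ...) external-format spec.
# The key names :FORMAT and :EOL are assumptions, not part of the PR.
EOL_NAMES = {"CR", "LF", "CRLF", "ANY"}


def normalize_plist(spec):
    """Return a (format, eol) pair from a flat key/value sequence.

    Keys may appear in either order; a missing key comes back as None.
    """
    if len(spec) % 2 != 0:
        raise ValueError("property list needs an even number of items")
    props = {}
    for key, val in zip(spec[0::2], spec[1::2]):
        key = key.upper()
        if key not in (":FORMAT", ":EOL"):
            raise ValueError(f"unknown key: {key}")
        props[key] = val
    eol = props.get(":EOL")
    if eol is not None:
        eol = eol.upper()  # accept any case, generate one canonical form
        if eol not in EOL_NAMES:
            raise ValueError(f"unknown EOL convention: {eol}")
    return (props.get(":FORMAT"), eol)
```

Under these assumptions, `normalize_plist([":EOL", "lf", ":FORMAT", ":XCCS"])` and `normalize_plist([":FORMAT", ":XCCS", ":EOL", "LF"])` yield the same pair, which is the order-independence the comment asks for.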
Consider the possibility that this change has too much flexibility, and that the flexibility means more error cases. What are the uses of EXTERNAL-FORMAT? When you are copying from one place to another, can you copy bytes instead of characters? (The ELEMENT-TYPE of Common Lisp streams can be BYTE or CHARACTER.)
A simpler-to-implement and more backward-compatible approach would be to get rid of EOL as a separate parameter and "bake" it into the EXTERNAL-FORMAT keyword:
We currently have :UTF-8 and :XCCS as the two frequent cases.
Declare that UTF-8 implies EOL=LF and add (if you need it) :UTF-8-CR or :UTF-8-CRLF.
Declare that XCCS implies EOL=CR on output and ANY on input.
Then you don't have to edit every place where a program assumes EQ can be used to answer whether two streams have the same EXTERNAL-FORMAT, which could happen anywhere.
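The "baked-in" alternative can be modelled as a single lookup table keyed by the format keyword. This is only an illustrative Python sketch: the table name `BAKED_FORMATS` and the function `eol_for` are hypothetical, and the entries simply restate the proposal in this comment rather than any existing Medley behavior.

```python
# Each keyword carries its own (input EOL, output EOL) pair, so the
# keyword alone identifies the stream's behavior.
# Entries restate the proposal above; they are not existing definitions.
BAKED_FORMATS = {
    ":UTF-8": ("LF", "LF"),
    ":UTF-8-CR": ("CR", "CR"),
    ":UTF-8-CRLF": ("CRLF", "CRLF"),
    ":XCCS": ("ANY", "CR"),  # ANY on input, CR on output
}


def eol_for(fmt, direction):
    """Look up the EOL convention implied by a format keyword."""
    input_eol, output_eol = BAKED_FORMATS[fmt]
    return input_eol if direction == "input" else output_eol
```

Because the whole specification is one atom, two streams can still be compared with a single identity test on the keyword, which is the property the comment wants to preserve.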
If you want to copy bytes, use COPYBYTES. If you want to copy characters, use COPYCHARS (which will convert the bytes from one format to another). Does commonlisp specify a function that branches on the element-type? It should choose which of the subfunctions to call.
Each external format already has its own default EOL convention. This extension is for the case where for whatever reason the user wants to override that. For OPENSTREAM the override can be passed as a separate parameter, but CL:OPEN doesn't allow for that kind of additional specification. This is all about sneaking that in without doing more serious damage.
This doesn't affect what is returned as the external-format STREAMPROP of the stream, it's always an EQ-able atom. It's just that if the EOL convention had been changed from its default, the property in the external format wouldn't be accurate.
In Interlisp the function STREAMPROP can be used to change the format and the EOL separately, after the open. Does commonlisp support that kind of operation? (Another use case for STREAMPROP: the ENDOFSTREAMOP as a stream property rather than something that has to be specified on each input operation. Does commonlisp support that?)
I probably don't yet have the correct logic for the EOL convention of external formats, as we transition to ANY as the default for input streams. At open the ANY should be installed for input streams even if the format specifies one of the specific conventions. The format's convention should apply by default only to output streams. If the user really wants a specific format on input, then an override should be applied (at open or by STREAMPROP).
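The intended defaulting rule in the paragraph above can be written down as a small decision function. This is a Python model under stated assumptions, not the FILEIO logic: `effective_eol` is a hypothetical name, and the format's own default is passed in as an argument rather than looked up.

```python
def effective_eol(format_default, direction, override=None):
    """Pick the EOL convention to install when a stream is opened.

    format_default: the convention the external format declares (e.g. "CR")
    direction:      "input" or "output"
    override:       an explicit user choice given at open time, or None
    """
    if override is not None:
        return override        # an explicit override always wins
    if direction == "input":
        return "ANY"           # input ignores the format's convention
    return format_default      # the format's default applies to output only
```

A later change of the convention (the STREAMPROP case mentioned above) would correspond to calling the function again with an `override`.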
(BTW, in the original, inherited implementation of external formats there was a flag EOLVALID. I don't understand the use case for that, and it isn't fetched anywhere in our core directories. But I left it in.)
Last week we decided to investigate some different options for ANY -- find the first EOL and use that interpretation throughout.
Two things to think about: First EOL with input = ANY means you can do COPYBYTES.
Second, use EXTERNALFORMAT for EOL convention.
:FIRST-USE? Copychars vs copybytes. We're moving this to Draft.
Has any additional work been done on this?
Nothing more has been done. I believe that the next step is to add another 2-bit field to the STREAM datatype (beyond the part that Maiko knows about) to hold the actual EOL convention that is detected when the file is read as ANY. This is so that COPYCHARS can preserve the original EOL convention of the characters, and even be consistent if the EOL convention changes across the file.
BTW, there is a long related discussion at issue #345
and recompiled calls to macro \CHECKEOLC.
I added a 2-bit field DETECTEDEOLCONVENTION to the stream record, with initial value ANY.EOLC. If a file is read with ANY.EOLC, then that initial value is replaced with the code (LF.EOLC, CR.EOLC, CRLF.EOLC) that it first detects. Some other callers of the macro \CHECKEOLC were recompiled. FASLOAD should also be recompiled, but I got a number of error messages about illegal RETURNS when I tried to do that. I think all the examples were calls to CL:DOTIMES. Somebody else should look at that. I have not yet made the changes to COPYCHARS that I think are the ultimate end of this. I think the idea is that if the input's detected EOLC is the same as the EOL convention of the output's external format, then do COPYBYTES? Anything else? But it would be good to get this merged in its current state, to avoid incremental confusion.
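The first-detection rule described above can be illustrated with a small Python model. It is a sketch, not the \CHECKEOLC macro: `detect_first_eol` is a hypothetical name, and it assumes an encoding in which CR and LF are the single bytes 13 and 10.

```python
def detect_first_eol(data: bytes) -> str:
    """Return the first EOL convention seen in data, or "ANY" if none.

    Models a stream field that starts as ANY and is replaced by
    LF, CR, or CRLF the first time a line ending is read.
    """
    for i, byte in enumerate(data):
        if byte == 0x0A:                       # bare LF
            return "LF"
        if byte == 0x0D:                       # CR, possibly followed by LF
            if i + 1 < len(data) and data[i + 1] == 0x0A:
                return "CRLF"
            return "CR"
    return "ANY"                               # no line ending seen yet
```

Note that only the first line ending decides the result, so a file that mixes conventions is reported by whichever one appears first.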
(BTW on FASLOAD: Recompiling actually doesn't matter for this, but it should be recompilable)
Oops, I broke something in the build, hold off until I get it resolved
(plus a new (unchanged) version of IOCHAR needed to get the cleanup to work for the recompile)
UFS doesn't check file devices identity, doesn't give type-change message. Recompiled for create stream
OK, I unscrambled my loadup problem, now including more files that needed to be recompiled. When we figure out the logic, other files that have create-stream probably would also need to be recompiled to spread it around. I included ADIR here to get the TRUEDEVICE stub, since this FILEIO (with the new stream declaration) also includes the cleanup to \RENAMEFILE that hides the pseudo file and host names from the lower device rename methods.
On further reflection, I think that what I did to enable on-the-fly detection is going off in the wrong direction. I think that COPYCHARS vs COPYBYTES is a red herring. If the file is being copied (explicitly via COPYFILE or explicitly via RENAMEFILE), then it should be copied by COPYBYTES, always. The fix to RENAMEFILE is correct. It turns out that COPYFILE currently has the same default bug: it tries to infer whether to use COPYBYTES or COPYCHARS, and if they are not the same the destination file will not end up with the same bytes as the source. (COPYFILE currently has some IRM-undocumented arguments for source and destination parameters that try to manage this coercion; better would be to have a separate function COPYCHARFILE that makes clear that it might muck around.) So I propose to back out of the EOL detection stuff that I just put into character reading, along with the extra STREAM field. We should write a separate function that the user can call to do EOL or format detection, maybe by scanning far into the file to see what's really there. And I propose to eliminate the complexity of the \COPYOPENFILE subfunction and just have COPYFILE call COPYBYTES, always.
We agreed at the meeting today to remove all of the type and format inferences from the basic COPYFILE and RENAMEFILE functions. They both will always do COPYBYTES. (This is a difference from the COPYFILE documentation, which says that it will try to infer the type and do COPYCHARS for text. But then it also says that "text" means 7-bit ASCII, so this must have been a hold-over from the days before 16-bit character encodings).
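The reason for always copying bytes can be seen in a small Python model. It is only a sketch: `copy_bytes` and `copy_chars` are stand-ins named after COPYBYTES and COPYCHARS, not their implementations, and the character copy here converts nothing but line endings.

```python
import io


def copy_bytes(source: bytes) -> bytes:
    """Byte-for-byte copy: the destination is identical to the source."""
    src, dst = io.BytesIO(source), io.BytesIO()
    while True:
        block = src.read(4096)
        if not block:
            break
        dst.write(block)
    return dst.getvalue()


def copy_chars(source: bytes, out_eol: bytes) -> bytes:
    """Character-level copy: read with EOL = ANY, write with out_eol."""
    # Every CRLF, CR, or LF becomes one logical newline on input...
    text = source.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    # ...and is rewritten in the destination's convention on output.
    return text.replace(b"\n", out_eol)


source = b"one\r\ntwo\nthree\r"
same = copy_bytes(source)             # unchanged bytes
converted = copy_chars(source, b"\r")  # all line endings become CR
```

With mixed line endings in the source, only the byte copy reproduces the file exactly; the character copy silently normalizes it, which is the behavior the meeting decided to remove from COPYFILE and RENAMEFILE.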
As per the technical meeting on 7/15/2024.
This sets the default EOL convention for input files to be ANY.
It also extends the possibilities for the externalformat parameter to OPENSTREAM. It can be a known format atom (e.g. :UTF-8) as before. But it can also be an EOL convention (CR, LF, CRLF, ANY) or a (format eolconvention) pair (e.g. (:XCCS LF)).
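The three accepted shapes of the externalformat parameter can be modelled as one normalizing function. This is a hedged Python sketch of the rule stated above, not the OPENSTREAM code: `parse_external_format` is a hypothetical name, and symbols are represented as plain strings.

```python
EOL_CONVENTIONS = {"CR", "LF", "CRLF", "ANY"}


def parse_external_format(spec):
    """Normalize an externalformat argument to a (format, eol) pair.

    Accepts a format atom, an EOL convention, or a (format eol) pair.
    None in a slot means "not specified; use the default".
    """
    if isinstance(spec, (list, tuple)):
        if len(spec) != 2:
            raise ValueError("expected a (format eol) pair")
        fmt, eol = spec
        if eol not in EOL_CONVENTIONS:
            raise ValueError(f"unknown EOL convention: {eol}")
        return (fmt, eol)
    if spec in EOL_CONVENTIONS:
        return (None, spec)   # a bare EOL convention, default format
    return (spec, None)       # a bare format atom, default EOL
```

For example, `parse_external_format((":XCCS", "LF"))` gives `(":XCCS", "LF")`, while `parse_external_format("LF")` leaves the format unspecified.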
The motivation for this extension is to sneak in the EOL convention in the :EXTERNAL-FORMAT optional argument to CL:OPEN. The Common Lisp spec doesn't allow arbitrary opening parameters to be specified, so we trick it, at least for the EOL convention, by overloading the external format argument (essentially treating the EOL as a funky external format).