Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix COPY REPLACING and REPLACE #75

Conversation

lefessan
Copy link
Member

Current implementation does not conform to standard (COPY REPLACING and REPLACE are supposed to be in two successive passes instead of applied together), and has various bugs.

This version is probably less efficient, but better conforms to the standard.

Limitations:

  • This first version only modify the replacements during preprocessing, not the ones during the listing printing.

  • Since REPLACE are interpreted before the COPY REPLACING, they cannot be modified by them. A conforming implementation would interpret REPLACE strictly after COPY REPLACING.

@lefessan lefessan force-pushed the z-2023-01-15-fix-copy-replace branch from f954957 to fb3f529 Compare January 16, 2023 22:29
@GitMensch
Copy link
Collaborator

Note: I guess WPI is "WIP", depending on if you see it ready for GC yourself you may request a review; if you see it as a draft in general, please set it as draft (on the right side on the Conversation page of the PR).

In any case: please keep the ci definitions as extra PR :-)

@lefessan lefessan force-pushed the z-2023-01-15-fix-copy-replace branch from 9e04313 to 93d4799 Compare January 16, 2023 23:01
@lefessan lefessan changed the title [WPI] Fix COPY REPLACING and REPLACE Fix COPY REPLACING and REPLACE Jan 16, 2023
@lefessan lefessan changed the title Fix COPY REPLACING and REPLACE [WIP] Fix COPY REPLACING and REPLACE Jan 16, 2023
@lefessan
Copy link
Member Author

Yes, it was WIP. I don't know why there is no "draft" button on my interface.

I am playing with the CI at this moment, but yet, I will definitively put the git related stuff in another PR.

@lefessan lefessan force-pushed the z-2023-01-15-fix-copy-replace branch from cff40b8 to a53bdd0 Compare January 16, 2023 23:34
@lefessan
Copy link
Member Author

I used the CI to check that everything was running correctly. It passes all the tests and I removed the CI commit.

@lefessan lefessan changed the title [WIP] Fix COPY REPLACING and REPLACE Fix COPY REPLACING and REPLACE Jan 16, 2023
@lefessan lefessan changed the base branch from gnucobol-3.x to gcos4gnucobol-3.x January 23, 2023 23:07
@lefessan lefessan force-pushed the z-2023-01-15-fix-copy-replace branch from a53bdd0 to 169080d Compare January 23, 2023 23:07
@codecov-commenter
Copy link

codecov-commenter commented Jan 23, 2023

Codecov Report

Merging #75 (8bdfd9e) into gcos4gnucobol-3.x (a4abf11) will increase coverage by 0.17%.
The diff coverage is 94.44%.

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

@@                  Coverage Diff                  @@
##           gcos4gnucobol-3.x      #75      +/-   ##
=====================================================
+ Coverage              65.17%   65.35%   +0.17%     
=====================================================
  Files                     31       32       +1     
  Lines                  58390    58499     +109     
  Branches               15389    15394       +5     
=====================================================
+ Hits                   38058    38233     +175     
+ Misses                 14393    14327      -66     
  Partials                5939     5939              
Impacted Files Coverage Δ
cobc/pplex.l 73.29% <85.18%> (+0.72%) ⬆️
cobc/replace.c 95.26% <95.26%> (ø)
cobc/cobc.c 72.49% <100.00%> (+1.06%) ⬆️
cobc/ppparse.y 61.33% <100.00%> (ø)

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@GitMensch
Copy link
Collaborator

Just a short update: the testcase is fine (I'll add that as expected failure with the next commit) and passes with other compilers.

It would likely be most useful to have this also fixed in the listing code. Maybe that's a similar fix or the new replace.c could be used from both places?
To reach a "conforming" implementation: Would you consider adjusting pplex? Would this mean to iterate twice over the file, with skipping everything but COPY REPLACING to the second iteration?

Concerning the draft question: You should be able to setup it via the UI in the following place:
gh-ui-draft

@GitMensch
Copy link
Collaborator

When checking the root issue it presented itself as follows:

  • ppecho_replace is called when any "replace" is active during ppecho
  • this matches as necessary
  • in the case of the given example, where the REPLACE code can only be applied after the COPY REPLACING code was handled it does not work because:
    • main issue: a match is directly output by ppecho_direct; instead it has to "wait" until there is no outer (r->next) entry
    • even if it would the outer replace would possibly - as in this case - see only a part of the word COLON (because the original :TEST: leads to an extra token), while it needs the resulting token
    • as the content can be really long and can't be directly replaced anywhere this may in general only be solved by giving the replaced content back to the lexer - to do so a loop of unput, then leaving the functions (and the outer replace, so that it may be triggered in the next run) is to be done; but to catch the sample case here the unput would have to be done until last space (separators , and ; are already unput as space here) / line break

@lefessan
Copy link
Member Author

lefessan commented Jan 25, 2023

I tried to understand how to use this work with the listings, but my main problem is that I don't understand what is expected as the result of listing. Indeed, COPY-REPLACING and REPLACE can completely change the source, including removing complete parts of the code, so the current line by line approach, or even displaying the content of a copybook after replacing, do not make sense.

If the goal is to have an equivalent source to the original one, but with COPY/REPLACE applied, and then well-formatted, then I would push to use the generated <file.i> and format it back in a fixed format, while re-including comments from the original file when possible.

If the goal is different, then I don't really understand what it can be, and so how to reach it.

(note that I am not familiar with IBM/MF tooling, so maybe the goal is obvious for people used to them if they have a listing mode...)

@GitMensch
Copy link
Collaborator

The goal is to have a complete code as it is processed, in the "original style", with COPY and REPLACE applied and all comments, intermixed with diagnostics, applied "print formatting" (pages + headers) and possible suppression (+extra stuff like symbol listing, xref, diagnostic summary, ...).

The goal is to drop nearly all of the listing code in cobc.c, moving some parts (possibly to a new file, possibly to the existing ones) and to do everything during the "pre-processing" itself. Note that we do want to have that available there later also for display errors in their context (instead of "position only, add the context in the text").

... there's just not someone that took on the work to do so...

@GitMensch
Copy link
Collaborator

That PR is not in draft state, so (while I don't expect this to be available upstream soon) would you mind rebasing it to the current version?

@lefessan lefessan force-pushed the z-2023-01-15-fix-copy-replace branch from 17e40ca to 01db7f9 Compare April 12, 2023 09:33
@lefessan lefessan force-pushed the z-2023-01-15-fix-copy-replace branch from 01db7f9 to 145c685 Compare May 26, 2023 16:26
Copy link
Collaborator

@GitMensch GitMensch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently I'm finding it hard to follow the changes

Could you please split it into two commits?

First: moves the relevant code from pplex.l to replace.c without any logic change (so the code will also be changed only minimal) and include a new test to listings.at which has a result that will be changed by the second one;
Then have a second commit which does the "actual" change in the already existing files and listing test from the first commit.

cobc/cobc.h Outdated
@@ -506,6 +508,9 @@ extern void cobc_parse_free (void *);

extern void *cobc_plex_malloc (const size_t);
extern void *cobc_plex_strdup (const char *);
extern void *cobc_plex_strsub (const char *, const int);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

`Those and the on above should go to tree.h

cobc/cobc.c Outdated
@@ -985,8 +985,8 @@ cobc_strdup (const char *dupstr)
return p;
}

#if defined (_WIN32) || defined (__CYGWIN__)
static char *
#if defined (_WIN32) || defined (__CYGWIN__)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is currently just defined for WIN32 so should not be unconditionally declared in the header (not sure if we need that anywhere outside of this file...)

memcpy (p, s, len);
return p;
}

/* variant of strcpy which copies max 'max_size' bytes from 'src' to 'dest',
if the size of 'src' is too long only its last/last bytes are copied and an
eliding "..." is placed in front or at end depending on 'elide_at_end' */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we move out all those memory-only related functions to a separate file?

cobc/pplex.l Outdated
@@ -106,6 +107,9 @@ static int ppwrap (void) {

#define PLEX_COND_DEPTH 16

// replace yytext by some simplified text in ppecho()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please change that to c89 comment and explain why we do that - it replaces all texts below ?!?

cobc/pplex.l Outdated
return (save_ptr ? 1 : 0);
/* This new versoin does not use `alt_space`, so the output
will be different from the previous one. */
ppecho_copy( text );
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

has unused variable warning because now only text is used?

@GitMensch
Copy link
Collaborator

As noted above: whenever this is tackled it should be split into two commits (those may both be in this single PR).

@lefessan lefessan force-pushed the z-2023-01-15-fix-copy-replace branch from 145c685 to 15b943a Compare June 26, 2023 15:34
@lefessan
Copy link
Member Author

I modified this PR to have 3 commits:

  • First one provides a replacement for the alt_space argument, that is cleaner (former version would only use the alt_space if there was no possible replacement, whereas in this version, it is passed around, making the code more uniform)
  • Second commit moves the code to replace.h/replace.c
  • Third commit implements a completely different algorithm for copy-replacing/replace using two phases instead of one (and using the same code for both phases)

cobc/pplex.l Outdated Show resolved Hide resolved
Copy link
Collaborator

@GitMensch GitMensch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Changelog entries for the last two commit are missing. I guess this will also contain a reference to at least one upstream bug issue (Bug 868).

There were some minor change requests, but that's "far enough" to be finished soon and then integrated into RC3.

cobc/pplex.l Outdated Show resolved Hide resolved
cobc/pplex.l Outdated
fwrite (text, (size_t)textlen, (size_t)1, ppout);
if (cb_listing_file) {
check_listing (s, 0);
ppecho_direct(text_queue->text, text_queue->token);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replacing that looks correct; but shockingly we seem to have no single testcase for this - can you please duplicate the test below in syn_copy.at for "preprocess listing" (different expected result), so that we see it is not changed unexpected?

cobc/replace.c Outdated Show resolved Hide resolved
cobc/replace.c Outdated Show resolved Hide resolved
cobc/replace.c Outdated Show resolved Hide resolved
cobc/cobc.c Outdated Show resolved Hide resolved
cobc/cobc.c Show resolved Hide resolved
tests/testsuite.src/syn_copy.at Show resolved Hide resolved
tests/testsuite.src/syn_copy.at Show resolved Hide resolved
tests/testsuite.src/syn_copy.at Show resolved Hide resolved
@GitMensch
Copy link
Collaborator

Just to check: do you have an ETA?

@GitMensch GitMensch added bug Something isn't working High priority labels Jul 1, 2023
@GitMensch
Copy link
Collaborator

GitMensch commented Jul 3, 2023

Also, there is a variable requires_new_line that is never set in any part of the code.

This came in with the OC2.0 "commit" that added Roger's change (rev. 68); this already has this never set, so please drop all its uses (potentially with the listing commit, but that is fine with any commit)

Don't mind, I'll do that just now along with fixing the code that code was broken by @nberth's change in r4644 for "pplex.l (check_listing): do not output sequence number of short lines"

Done in r5101.

@lefessan lefessan force-pushed the z-2023-01-15-fix-copy-replace branch 2 times, most recently from a2ec0fb to 9b1ab30 Compare July 5, 2023 17:20
@lefessan
Copy link
Member Author

lefessan commented Jul 5, 2023

Is there anything more to do on this PR ? The -P option seems to be working correctly now.

Copy link
Collaborator

@GitMensch GitMensch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor things to consider, but overall LGTM

cobc/replace.h Outdated Show resolved Hide resolved
cobc/ChangeLog Outdated Show resolved Hide resolved
cobc/replace.c Outdated Show resolved Hide resolved
cobc/replace.c Outdated Show resolved Hide resolved
cobc/replace.c Outdated Show resolved Hide resolved
tests/testsuite.src/syn_copy.at Outdated Show resolved Hide resolved
cobc/ChangeLog Outdated
algorithm. The first phase performs COPY REPLACING on the stream
of tokens, while the second phase perform REPLACE on the resulting
stream of tokens. This rewriting is closer to the COBOL standard
and fixes bug #831 partially.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, why only partially? []

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#831 actually describes several bugs. One is in listing mode (and is addressed in PR #92 ) and another is in source mode (the one addressed in this PR). I was more concerned by the second part, as it prevents correct code from being compiled, while the first part is only a problem when using the listing mode (we advised our devs to look at the .i files instead of the listing files to exactly know the result of preprocessing).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would #92 solve the listing part already?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't found the time to check yet...

@lefessan lefessan force-pushed the z-2023-01-15-fix-copy-replace branch from fbcbf05 to 7b69eaf Compare July 5, 2023 22:22
cobc/ChangeLog Outdated Show resolved Hide resolved
@GitMensch
Copy link
Collaborator

Apart from the missing Changelog entry for -P (this may even be NEWS worthy) and a NEWS entry about "cobc now uses a two-pass algorithm..." that is ready for SVN.

@lefessan lefessan force-pushed the z-2023-01-15-fix-copy-replace branch from 7b69eaf to f1d0612 Compare July 6, 2023 07:40
Copy link
Collaborator

@GitMensch GitMensch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good to go, listing may also be fixed after RC3

texts, src,
replace_list);
} else {
if (!strcasecmp(src_text,text)){
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lefessan We do have the text len(yylen) in the scanner, wouldn't it make sense to pass this and include it in the struct?
Then we can compare the length itself and don't need to use strcasecmp for all content (which is done as soon as at least one REPLACE[ING] is active, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not obvious if the speedup would be worth the work. It's likely that, for most words, strcasecmp(...) immediately returns after checking only one or two chars. So, on one hand, testing the size would save some tests when the sizes are different, but on the other hand, if the sizes are equal, it adds one test that was not done before.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point!

@lefessan
Copy link
Member Author

lefessan commented Jul 6, 2023

Merged, thanks !

@lefessan lefessan closed this Jul 6, 2023
@lefessan lefessan deleted the z-2023-01-15-fix-copy-replace branch July 6, 2023 15:39
@GitMensch
Copy link
Collaborator

Would you mind fixing the vs builds? The new file has to be added to all files under build_windows that reference help.c.

@GitMensch
Copy link
Collaborator

GitMensch commented Jul 6, 2023

Does this also fix https://sourceforge.net/p/gnucobol/bugs/23? If yes then it is likely good to check and possibly reference it.
That was a completely separate part, now fixed upstream.

Only open: MSVC (and OrangeC) project definitions for cobc (new replace.h).

@GitMensch
Copy link
Collaborator

Would you mind fixing the vs builds? The new file has to be added to all files under build_windows that reference help.c.

Done upstream.

@GitMensch
Copy link
Collaborator

GitMensch commented Jul 19, 2023

Primarily a note: it seems that this PR has increased memory usage of cobc during preparing a lot (testing with sources that have ~500000 LOC and no REPLACE or REPLACING at all).

The following is about cobc -E some.cob -e some.i:

Flamegraph for the overview
flamegraph

hotspots
hotspots

The main cpu instructions and memory are to be seen in replace.c (which is logically, we also moved old memory allocations there).

Biggest hotspot:
add_text_to_replace

Question @lefessan: Could you recheck for possible optimizations, especially reducing memory when no REPLACE is active?

Note: cpu-time wise that's no problem, with this big source there is a cpu time of 1.2 seconds for the preparsing, which is ok. But memory-wise - I'm not sure.

@GitMensch
Copy link
Collaborator

Friendly ping @lefessan on the memory usage/performance, especially when there's no REPLACE active in the whole file.

@GitMensch
Copy link
Collaborator

@lefessan you may have missed that: this PR has increased memory usage of cobc during preparing a lot; can you please reinspectif you find a better way to handle that?

@GitMensch
Copy link
Collaborator

GitMensch commented Apr 19, 2024

Here some numbers from a huge (generated) COBOL file, from valgrinds recorded heap usage

total heap usage allocs frees bytes allocated
GC 3.1.2 5,526 5,526 20,443,319
GC 3.2 (after this PR) 21,677,268 21,677,268 893,378,306

Note that also this file has no REPLACE or REPLACING at all.

Of course, this makes anything that inspects the memory (for example valgrind or cobc if built with hardening options) take much longer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants