Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make sure the language for a Token/TokenList is always determined #5853

Merged
merged 1 commit into from
Feb 27, 2024

Conversation

firewave
Copy link
Collaborator

@firewave firewave commented Jan 6, 2024

No description provided.

@firewave
Copy link
Collaborator Author

firewave commented Jan 6, 2024

Requires #5848 to be merged first and is a prerequisite for #5724 and #5725.

Several of the test Clang AST dumps have no filename specified which causes the language not be be determined. I wonder if we actually encounter such dumps in real world examples or if these are just in the tests. If the latter is the case we need some heuristic to be able to detect the language.

I could default to C++ for the time being though.

@firewave
Copy link
Collaborator Author

firewave commented Jan 6, 2024

The ambiguity of headers is also exposed by this causing TestVarID::varid_header to fail. This relates to the strange logic that was uncovered previously by #4681. So that also needs to be addressed before this can be merged.

@firewave firewave force-pushed the toklist-2 branch 3 times, most recently from 390bdfa to c743df5 Compare January 7, 2024 10:33
@firewave firewave changed the title make sure the language for a TokenList is always determined make sure the language for a Token/TokenList is always determined Jan 7, 2024
@firewave firewave force-pushed the toklist-2 branch 6 times, most recently from 3a34536 to 4e809ca Compare January 13, 2024 23:16
@firewave
Copy link
Collaborator Author

It turns out there is actually a valid state where we have no language - when we process markup files (i.e. .qml).

@firewave
Copy link
Collaborator Author

Requires #5896 to be merged first.

@firewave firewave force-pushed the toklist-2 branch 3 times, most recently from cc2d9c9 to 2835d3b Compare January 20, 2024 14:51
@firewave
Copy link
Collaborator Author

I am not too happy about the multiple asserts but it is possible to create a TokenList in various ways. I tried to narrow it with some refactoring but that introduced some bigger problems.

@firewave firewave marked this pull request as ready for review January 20, 2024 15:09
@danmar
Copy link
Owner

danmar commented Jan 28, 2024

Several of the test Clang AST dumps have no filename specified which causes the language not be be determined.

For your information we have in the Cppcheck Premium backlog a task to look at improving clang AST import.. it's way in the future we will at least not work on this for the next year..

I have the feeling it might be a good idea to fork clang and update the dump output so it's more parser-friendly. I believe we just need to tweak the dumping code which should be relatively limited in scope. I envision it will be a big job to maintain it.. but it would probably be worth it.

@firewave
Copy link
Collaborator Author

firewave commented Feb 8, 2024

#5961 is another step in improving things.

@firewave
Copy link
Collaborator Author

firewave commented Feb 8, 2024

And yet another step with #5963.

@firewave firewave force-pushed the toklist-2 branch 2 times, most recently from 5f9bf44 to 60c27fe Compare February 12, 2024 15:34
@firewave
Copy link
Collaborator Author

With the adjustments the asserts are now limited to TokenList. I put them behind a define so they can be easily disabled came release. I also added some fallbacks so we do not have an undetermined language.

I hope to make further improvements that make the asserts unnecessary.

@firewave firewave marked this pull request as ready for review February 12, 2024 15:36
@@ -317,8 +323,37 @@ void TokenList::insertTokens(Token *dest, const Token *src, nonneg int n)

bool TokenList::createTokens(std::istream &code, const std::string& file0)
{
#ifdef ASSERT_LANG
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry it's not so nice to have such ifdefs over and over. Is the idea that I will undefine ASSERT_LANG in release builds somehow? How about this code:

#ifdef N_ASSERT_LANG
#define ASSERT_LANG(cond)   throw InternalError(nullptr, __FILE__ ": assertion error: " #cond);
#else
#include <cassert>
#define ASSERT_LANG(cond)  assert(cond)
#endif

.. and I build with -DN_ASSERT_LANG in release builds?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is temporary and the intention is to simply disable the define in source. I can already provide a PR so it will not be missed. In case I am able to make the appropriate improvements the asserts will one by one go away. In case I am not they will not be enabled at all. Not pretty but it quite local and does not spill into other code.

I do not want an I do not want an InternalError for this as this is nothing the user should ever see. We will also not be able to generate the appropriate information the user needs to provide so we can figure out the location in case the source cannot be provided. Also this would require modifications in daca so I get the desired information. That would cause a ripple effect I want to avoid.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at least I'd suggest that we avoid ifdefs over and over.. for instance:

#ifndef N_ASSERT_LANG
#include <cassert>
#define ASSERT_LANG(cond)  assert(cond)
#endif

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@firewave
Copy link
Collaborator Author

Any more feedback on this? Otherwise I am going to merge this tomorrow so I can hopefully finish up the task during this dev cycle.

@firewave
Copy link
Collaborator Author

Still needs a bit of work. Files without any extension have no language set. Those need to default to C++ by default as per the existing code.

But we treat headers as C by default - that is inconsistent. That should be aligned to be C by default. Extension-less files are most likely C++ anyways but those should be identified by the Emacs marker (see https://trac.cppcheck.net/ticket/10692). I will address this in a different PR.

@firewave
Copy link
Collaborator Author

Ugh. Turns out to get this done properly we need to move towards only using FileSettings.

@firewave firewave force-pushed the toklist-2 branch 3 times, most recently from 7533b82 to 92858b5 Compare February 20, 2024 21:49
@firewave
Copy link
Collaborator Author

I addressed this with another fallback for now.

@firewave firewave marked this pull request as ready for review February 20, 2024 21:50
Copy link
Owner

@danmar danmar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@firewave firewave merged commit 589ceed into danmar:main Feb 27, 2024
64 checks passed
@firewave firewave deleted the toklist-2 branch February 27, 2024 17:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants