
docs: frontend testing #7

Open: wants to merge 2 commits into main
Conversation

@kabaros (Collaborator) commented Jul 9, 2024

- ? Always add a test for bug fixes

- ...


I think the docs in general are good and cover the high-level aspects of testing.
One aspect I think is still missing is guidance on how to write tests: how we want to use tests and what a test should look like. Here's an example:

Suggested change
- ...
## How to write tests
### Tests as debugging tools
When refactoring or debugging, tests can help us identify what broke (during refactoring) or analyse how a bug happened (i.e. how it passed the tests and made it into production).
A common pitfall is that we approach tests similarly to how we approach writing production code:
> ## Test code is not like other code
>
> Production applications typically involve thousands to millions of lines of code. They’re too large for humans to conceptualize all at once. To manage the complexity, language designers have provided mechanisms like functions and class hierarchies that allow developers to think in abstractions.
>
> Good production code achieves encapsulation. It allows the reader to navigate large systems with ease, diving down into the details or rising to a higher level of abstraction, as needed.
>
> Test code is a different beast. **A good unit test is often small enough that a developer can conceptualize all the logic at once. Adding layers of abstraction to test code increases its complexity. Tests are a diagnostic tool, so they should be as simple and obvious as possible.**
>
> ([Why Good Developers Write Bad Unit Tests - Test code is not like other code](https://mtlynch.io/good-developers-bad-tests/#test-code-is-not-like-other-code); text highlights made by us)
This means that good tests violate many of the good practices we use and rely on for production code.
DRY test code makes it harder to simply read a test case from top to bottom and understand everything that's happening without having to jump around (either within the file or even across files):
> If a test breaks, the reader should be able to diagnose the problem by reading the test function in a straight line from top to bottom. If they have to jump out of the test to read ancillary code, the test has not done its job.
> [...]
> Before blindly applying DRY to your tests, think about what will make the problem obvious when a test fails. Refactoring may reduce duplication, but it also increases complexity and potentially obscures information when things break.
@TODO: We could quote more suggestions from the linked article ("Go crazy with long test names" & "Embrace magic numbers")
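To make the "simple and obvious" point concrete, here is a hedged sketch (a made-up `calculateDiscount` module and made-up numbers, assuming a plain Jest setup) of a test that embraces magic numbers and reads top to bottom:

```js
// Hypothetical example: calculateDiscount is an invented module, used only for illustration.
import { calculateDiscount } from './calculateDiscount'

test('applies a 10% discount for orders of 100 or more', () => {
    // Magic numbers on purpose: inputs and expected outputs are visible at a glance,
    // with no shared constants or setup helpers to jump to.
    expect(calculateDiscount(100)).toBe(90)
    expect(calculateDiscount(250)).toBe(225)
})

test('applies no discount below 100', () => {
    expect(calculateDiscount(99)).toBe(99)
})
```

A reader can diagnose a failure here without leaving the test function, which is the property the quote above asks for.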

:::warning[Discussion]

- What does the ideal scenario look like for you? Some other possible prompts: TDD, cucumber, contract testing, end to end, pair programming...
@Mohammer5 Jul 10, 2024


I think that the term "integration test" is quite ambiguous and has a different scope from developer to developer.
Although I agree that integration tests (the way I understand them) should end up being the majority of tests in a holistic test suite, stating that we should write mostly "integration-like" tests is not a guideline for what we actually want to test.

That's why I would refrain from using that term in the first place and change the perspective: Instead of testing abstractions, the code that uses abstractions should be tested.

The idea behind this approach is:

  • It's much less ambiguous which code to test
  • It allows us to change abstractions without having to change tests (the code that uses the abstractions should still work)
  • It allows us to change a single "callee" of an abstraction and adjust its test code without having to touch anything related to the abstraction. If we realize during this that the abstraction is actually wrong for this particular case, we have a lot of options:
    • We can change the abstraction, and all "callees" will tell us whether we changed it correctly
    • We can inline the abstraction, since we don't need an abstraction yet when the new implementation is only relevant to a single "callee"
    • We can easily create a new abstraction for the "callees" that used the wrong one before, if others still need the existing abstraction unchanged. (Otherwise we'd have to add conditions to the abstraction, which obfuscates the original intention and pollutes the implementation: we'd make the abstraction more complex by mixing two different "ideas" of it in one place.)

This presentation by Dan Abramov (creator of Redux and React core developer at Facebook) explains this in detail in an entertaining way.
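To make this concrete, here's a hedged sketch (the `buildReportTitle` module, its internal `formatPeriod` helper, and all strings are invented for illustration) of testing the code that uses an abstraction rather than the abstraction itself:

```js
// Hypothetical sketch: buildReportTitle() uses a shared formatPeriod() abstraction
// internally. We only test buildReportTitle; the abstraction is covered indirectly,
// so we can refactor or replace it without touching this test.
import { buildReportTitle } from './buildReportTitle'

test('builds the report title for the selected data set and period', () => {
    expect(buildReportTitle({ dataSet: 'Immunisation', period: '202407' })).toBe(
        'Immunisation - July 2024'
    )
})
```

If the abstraction changes internally but the title stays the same, this test keeps passing; if the title changes, the failure shows up in the caller, which is exactly where we'd want to look.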

@Mohammer5 Jul 10, 2024


For e2e tests: the guideline I've used recently is to cover critical user paths (from start to end) in detail.
Here's an example of what I mean.
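As a rough, hedged illustration (not the example linked above; the selectors, route, and data are invented), one such critical path in plain Cypress could look like this:

```js
// Hypothetical Cypress sketch: one critical user path, covered from start to end.
describe('data entry critical path', () => {
    it('logs in, enters a value, and sees it saved', () => {
        cy.visit('/')
        cy.get('[data-test="username"]').type('admin')
        cy.get('[data-test="password"]').type('district')
        cy.get('[data-test="login-button"]').click()

        cy.get('[data-test="data-set-selector"]').click()
        cy.contains('Child Health').click()
        cy.get('[data-test="value-cell"]').first().type('12{enter}')

        cy.contains('Saved').should('be.visible')
    })
})
```

The point is the scope: a complete path a real user walks, asserted on the outcome they care about, rather than detailed checks of a single component.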


- **Brittle tests**: Tests with too many mocked dependencies that are hard to maintain and don't test integration between components.
- **Snapshots** (discussion point): Snapshots are particularly brittle and close to implementation details.

So, to clarify what I meant with my remark on snapshots: snapshotting an HTML tree, for example, is easy to misuse, but inline snapshots have their uses. Like here.

So rather than proclaim snapshots an anti-pattern, maybe it would be more informative to explain why and when using snapshots can be an anti-pattern. For example, snapshots become a problem when they're used to lock down an implementation (i.e. the specific shape of the rendered HTML) instead of the intended behaviour or result (i.e. it renders a clickable button that allows the user to close a modal). As long as it performs that function, the exact shape of the HTML doesn't matter much, so in that case a snapshot test becomes annoying. In general, testing the implementation, i.e. how code does something rather than what it does, is the underlying problem.
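For example (a hedged sketch; `formatValidationError` is a made-up helper), an inline snapshot works well for a small, stable output, where the snapshot documents the result rather than locking down markup:

```js
// Hypothetical helper that formats a validation error for display.
import { formatValidationError } from './formatValidationError'

test('formats a validation error for display', () => {
    expect(
        formatValidationError({ field: 'period', reason: 'required' })
    ).toMatchInlineSnapshot(`"Period is required"`)
})
```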

Same for mocking. I don't think all mocking is bad, but it can be a signal that your code is too tightly coupled. In that case it might be preferable to pass the dependency explicitly, decoupling the code and making testing easier in the process. That said, even with the scheduler's dockerized backend there's some mocking; the alternative is manually creating the required state on the backend, which is cumbersome and a lot more brittle. So explaining the tradeoffs seems informative here as well.
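To illustrate the decoupling point (a hedged sketch, all names invented): instead of jest.mock()-ing a module that the code imports directly, the dependency can be passed in, and the test supplies a fake:

```js
// Hypothetical factory: the alert function is passed in rather than imported,
// so the test can hand in a plain jest.fn() without any module mocking.
export const createSaveNotifier = (showAlert) => (dataSetName) =>
    showAlert({ message: `${dataSetName} saved`, critical: false })
```

```js
// In the corresponding test file:
import { createSaveNotifier } from './createSaveNotifier'

test('shows a non-critical alert after saving', () => {
    const showAlert = jest.fn()
    const notify = createSaveNotifier(showAlert)

    notify('Child Health')

    expect(showAlert).toHaveBeenCalledWith({
        message: 'Child Health saved',
        critical: false,
    })
})
```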

So in general, maybe we could dig down to the underlying ideas and explain those clearly: where to make the tradeoffs, etc. I assume that we all have slightly different ideas there as well, and I think those would be worth discussing.


Also, besides just discussing our own ideas, there's of course plenty of reference material. I see we have some links, but maybe we can dig a little deeper there as well. Kent Dodds, Guillermo Rauch et al. are quite well known, but to me it still seems a little vague and anecdotal. I would love to base our approach on material that's a little more solid. Otherwise we risk just picking articles that confirm our biases.


Plus for the points that are not as objective but more subjective in nature (which is probably a lot of this), we should make sure that everyone is heard. Otherwise we risk basing the approach on whose voice is the loudest.

Collaborator Author

> I would love to base our approach on material that's a little more solid. Otherwise we risk just picking articles that confirm our biases.

Could you provide some of these articles please? I will add them to the resources.


I don't have anything at the ready. It feels like a lot of what comes to my mind are also those kinds of articles, rather than more substantial material. I'll take a proper look.

:::warning[Discussion]

- What should be a high coverage in our context?


Having 0% coverage is a very obvious sign that we need more tests.

But having a close-to-100% coverage comes with its own downsides:

  • We'd be testing things that don't need tests
  • Changing code can become very cumbersome because that means a lot of tests have to be addressed

So I'm not sure what a good number would be. But if we want to do mostly integration tests - and in an ideal case we'd have lots of simple, composable units and just a few compositional units - code coverage will probably be around 30%-ish? So I have a hard time coming up with a good value.

Maybe having this metric doesn't mean anything for us at the moment as long as it's not 0%.
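If we do want a guard against sliding back to 0%, here's a hedged sketch of a modest floor in jest.config.js (the numbers are placeholders, not a recommendation):

```js
// Hypothetical jest.config.js: enforce only a low floor so the metric can't silently
// drop to zero, without chasing a high percentage.
module.exports = {
    collectCoverage: true,
    coverageThreshold: {
        global: {
            statements: 30,
            branches: 30,
            functions: 30,
            lines: 30,
        },
    },
}
```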


- What should be a high coverage in our context?
- What is unit vs integration tests in the frontend?


See my comment above.
I think we shouldn't care about the definitions and should rather talk about which checkboxes code must tick before we add tests.


I agree that code examples of what we want to test and how can be a great way to start, rather than definitions. What do you think about postponing the definitions here and instead adding links to some good test examples in the codebases, which we think could be used as a general basis? If we start there, it would give the team a good place to start writing more of each kind of test. Then, as we write more tests and get a better understanding, we can think about the definitions and aim for continuous improvement until we are happy with them.

@Mohammer5 Jul 11, 2024

Sure can do!

ASIDE: There's something I'd like to talk about before going through an example: IMO, it's impossible to have a rule that tells you exactly which files to test. In the end, it's the developer who has to judge whether a file / functionality falls under a certain category or not. Another reason why it's impossible is that sometimes it's hard to test something specific and easier to test its internally used components (I'll give an example of that). So in the end, it's up to the developer to decide what's worth testing and with which tools.

I think the aggregate data entry app's source code is a very good example for several reasons:

  • Its architecture is consistent
  • Code is divided into "abstractions" (the src/shared folder) and implementation (all the other folders)
  • This app serves as a good example of having too many Cypress tests instead of Jest tests
  • This app uses index.js-files consistently and for re-exporting only

When you look at the modules inside the src/shared folder, you'll see:

  • They cover a specific concept
  • They work in isolation (they're not written in a way to cater to a specific other module, folder, or concept)

Then there are other modules that use those abstractions to construct a very specific/concrete component:

  • They're not meant for re-use
  • They use the abstractions + module-internal code

Because this "app uses index.js-files consistently and for re-exporting only", we can take a look at what's being exported from those modules. I think the src/context-selection folder is a good place to start looking at:

  • This is an example of "sometimes it's hard to test something specific and easier to test its internally used components"
  • It's still relatively easy to understand what should be tested.

The src/context-selection/index.js has two exports: ContextSelection and ContextualHelpSidebar. Testing ContextualHelpSidebar is fairly trivial. Testing the ContextSelection is not easy, though, simply because it's covering so many different topics.

This is where the developer has to make a choice, and the part which can't be covered by rules / automation:

  1. The component can be tested with Cypress (the choice I made; in hindsight, I think the 3rd option would've been better)
  2. Cover the ContextSelection with an integration test
  3. Cover the ContextSelection with a unit test (mocking internally used components and just testing the actual "implementation" code), and then add tests for the other components/hooks used internally

The reason why I think this shouldn't have been done with Cypress is that the tests I've added don't cover critical user paths, but instead test the component in detail.

So when we look at the implementation of the ContextSelection, we can see that it's composing different components. None of the components are abstractions / generalizations, they're concretions!
So in this case, it makes sense to add tests for the following components:

  • ContextualHelpSidebar (as it's exported by the index.js)
  • ContextSelection (with a unit test + mock what's necessary to test implementation details)
  • DataSetSelectorBarItem
  • OrgUnitSetSelectorBarItem
  • PeriodSelectorBarItem
  • AttributeOptionComboSelectorBarItem
  • SectionFilterSelectorBarItem

By having tests for these, we:

  • cover the behavior that we care about directly
  • cover the correctness of the shared generalizations indirectly
  • cover the module contracts (their interfaces / boundaries), as that's essentially what the index.js files are (the contract of how external modules use the module in question and what information is transferred during usage)

If the shared modules have bugs, that's irrelevant as long as it doesn't affect the actual business-logic implementations.
If there's a bug in shared that is relevant, we'll notice it in the tests for the components above.
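As a hedged sketch of that contract idea (only the two export names come from the app; the internal file paths are assumptions), the module's index.js just re-exports its public surface, and the tests target exactly those exports:

```js
// src/context-selection/index.js (sketch; the internal paths are assumptions)
export { ContextSelection } from './context-selection/ContextSelection.js'
export { ContextualHelpSidebar } from './contextual-help-sidebar/ContextualHelpSidebar.js'
```

Tests then import ContextSelection and ContextualHelpSidebar from the module's index.js, so they exercise the same surface that other modules use.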

- ? Rethink e2e tests and move to docker-based solution

- ? Always add a test for bug fixes

I think it's also good to consider why we're doing what we're currently doing. For example, the tasks we discussed that have a vague scope require a lot of work before it's clear what you should be doing. Similar with bug fixes. If the bug is vague then the core issue might be that our bug triage is not good. Maybe we don't make the process easy enough for bug submitters, etc. etc. So maybe the proper solution is not to always write a test for bug fixes, but to improve our triage. Which might then make it much easier to define the core issue and follow a TDD process.

So more broadly, it might make sense to first consider whether we see problems with the current state of affairs, and what those problems are exactly. Then we can dig into why we have those problems and what the potential solutions might be. Otherwise we might just make assumptions about what it is that we should fix, when we don't have a clear picture yet. Then we might end up with normative solutions like "we should always add a test for this and that", when in fact this might just mask the underlying issue of vague bug reports.

So my point is to watch out that we don't jump to action before we've done the research.
