Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make sure that years are not lost when reading in worldsteel #431

Merged
merged 4 commits into from
Oct 18, 2023

Conversation

fbenke-pik
Copy link
Contributor

@fbenke-pik fbenke-pik commented Oct 18, 2023

Currently input data generation fails because years are lost when reading in worldsteel. Expects column names as numbers, but they are read in as X2005, X2010 ...

~ Run readSource(type = "worldsteel", convert = FALSE)
~~ WARNING: NAs introduced by coercion
~~ WARNING: NAs introduced by coercion
~~ WARNING: NAs introduced by coercion
~~ WARNING: NAs introduced by coercion
~~ WARNING: NAs introduced by coercion
~~ WARNING: NAs introduced by coercion
~~ WARNING: NAs introduced by coercion
~~ ERROR: 
~~ error in evaluating the argument 'x' in selecting a method for function 'as.magpie': error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': NAs are not allowed in subscripted assignments
Error in `h()`:
! error in evaluating the argument 'x' in selecting a method for function 'as.magpie': error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': NAs are not allowed in subscripted assignments

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know since when this failed?

In the spirit of more functional programming, I would have gone with

pivot_longer(c(-'country', -'name'), names_to = 'year',
                         names_transform = list(year = function(x) {
                           as.integer(sub('^X', '', x)) }))

but you do you.

@@ -51,6 +51,8 @@ calcIEA_EVOutlook <- function() {
# set 0s in other CHA countries than China to approximate CHA as China
x[c("HKG", "MAC", "TWN"),,] <- 0

x <- add_dimension(x, dim = 3.1, add = "scenario", nm = "historical")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, sneaked in another minor fix.

@fbenke-pik
Copy link
Contributor Author

fbenke-pik commented Oct 18, 2023

I went with your solution, as it matches the overall coding style of this function and looks more elegant.

Do you know since when this failed?

I am not sure. My theory is that it was hidden behind caching of calcHistorical until yesterday, when I made changes in that calc function, causing recalculations that revealed this problem.

@codecov
Copy link

codecov bot commented Oct 18, 2023

Codecov Report

Attention: 4 lines in your changes are missing coverage. Please review.

Comparison is base (b2aa660) 0.00% compared to head (971c1fa) 0.00%.

❗ Current head 971c1fa differs from pull request most recent head 2b4b472. Consider uploading reports for the commit 2b4b472 to get more accurate results

Additional details and impacted files
@@          Coverage Diff           @@
##           master    #431   +/-   ##
======================================
  Coverage    0.00%   0.00%           
======================================
  Files         265     265           
  Lines       17282   17285    +3     
======================================
  Hits            1       1           
- Misses      17281   17284    +3     
Files Coverage Δ
R/calcIEA_EVOutlook.R 0.00% <0.00%> (ø)
R/readworldsteel.R 0.00% <0.00%> (ø)

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@fbenke-pik fbenke-pik merged commit ee639bb into pik-piam:master Oct 18, 2023
2 checks passed
@0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q
Copy link
Member

Do you know since when this failed?

I am not sure. My theory is that it was hidden behind caching of calcHistorical until yesterday, when I made changes in that calc function, causing recalculations that revealed this problem.

OK, that makes sense. I was under the erroneous impression that madrat also tracked code changes in upstream functions.

@fbenke-pik
Copy link
Contributor Author

OK, that makes sense. I was under the erroneous impression that madrat also tracked code changes in upstream functions.

In theory, it should. So my theory is actually that it should have noticed changes when it did not. But in order to confirm that suspicion, I'd have to do some more experiments and dig deeper.

@0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q
Copy link
Member

In theory, it should.

Is that a positive or a normative statement? :p

The code only checks for readSource/calcOutput and tool functions.

@fbenke-pik
Copy link
Contributor Author

What I have learnt from asking around over the years (but not validated by looking at the code): if you only changed the underlying file ./Steel_Statistical_Yearbook_combined.ods , but not readworldsteel itself, the caching would not pick it up. If you also changed readworldsteel code, it should be recalculated, even if the only function calling readworldsteel, calcHistorical, did not change.

But this does not explain why a change in calcHistorical caused recalculation of readworldsteel, revealing the bug.

@0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q
Copy link
Member

Are we talking about the same thing?

There was a change in the (upstream) readODS::read_ods() function, which madrat did not track (and does not claim to track). So it used the stale cache instead of running the code again, which would have resulted in an error. Steel_Statistical_Yearbook_combined.ods has not been changed since February 2022.

What I have learnt from asking around over the years (but not validated by looking at the code): if you only changed the underlying file ./Steel_Statistical_Yearbook_combined.ods , but not readworldsteel itself, the caching would not pick it up.

The madrat documentation says otherwise:

Function which creates a unique fingerprint for a madrat function based on the code of the function itself, other madrat functions which are called by this function and of all source folders involved in the process.

madrat:::fingerprint()

And it at least claims that changes to the source directory do affect the caching:

library(mrremind)

x <- madrat:::fingerprint('calcHistorical', TRUE)

grep('steel', names(attr(x, 'details')), ignore.case = TRUE, value = TRUE)
[1] "mrremind:::calcSteelStock"                      
[2] "mrremind:::readworldsteel"                      
[3] "/p/projects/rd3mod/inputdata/sources/worldsteel"

But the hashes are calculated based on file names and the first 300 bytes of the files, which might work on binaries, or might not.

But this does not explain why a change in calcHistorical caused recalculation of readworldsteel, revealing the bug.

Three theories:

  • Did you use an empty cache directory "falks-cache"?
  • There might have been changes to the worldsteel source directory, like a deleted file, which would have changed the fingerprint.
  • Different lunar phase.

@fbenke-pik
Copy link
Contributor Author

Thanks for pointing out that file changes do indeed affect the caching, wasn't aware.
I think I did not use an empty cache directory, but I wouldn't bet on it. But for me the possible explanations are good enough for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants