
bug while reading sas7bdat file #728

Open
vpprasanth opened this issue Jul 10, 2023 · 4 comments
Labels
bug an unexpected problem or unintended behavior

Comments

@vpprasanth

vpprasanth commented Jul 10, 2023

There is a bug when importing sas7bdat files into R. Suppose a sas7bdat file contains a date variable holding the value 07/07/7777. (This is valid data in clinical studies, where it denotes "not applicable"; similarly, we use 09/09/9999 to denote a missing value.) When the file is imported into R, that value is read as 7777-07-06. I understand the change in format, but I am baffled by the change in value.

It would be nice to have an option to read all variables as character, or "as is", rather than changing their class by default.

@gorcha gorcha added bug an unexpected problem or unintended behavior readstat labels Jul 11, 2023
@gorcha
Member

gorcha commented Jul 11, 2023

Hi @vpprasanth, thanks for the bug report.

Can you please provide a minimal reprex (reproducible example)? The goal of a reprex is to make it as easy as possible for me to recreate your problem so that I can fix it: please help me help you! If you've never heard of a reprex before, start by reading about the reprex package, including the advice further down the page. Please make sure your reprex is created with the reprex package as it gives nicely formatted output and avoids a number of common pitfalls.

Date variables in SAS are stored as numeric values, so unfortunately we can't preserve them "as is". There is a necessary conversion step from the numeric value (the number of days or seconds since the origin date) to the date representation. It looks to me like this is caused by SAS handling a leap day differently somewhere, but it will be easier to track down with a reprex.
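For illustration, here is a minimal Python sketch of that conversion step. It is not haven's code. It assumes the standard SAS origins, 1960-01-01 for dates and 1960-01-01 00:00:00 for datetimes, and decodes with Python's `datetime` module, which uses the proleptic Gregorian calendar (the Gregorian leap-year rule extended to all years). The function names are illustrative.

```python
from datetime import date, datetime, timedelta

# SAS origins: dates count days from 1960-01-01, datetimes count
# seconds from 1960-01-01 00:00:00.
SAS_DATE_ORIGIN = date(1960, 1, 1)
SAS_DATETIME_ORIGIN = datetime(1960, 1, 1)


def sas_date_to_python(days: int) -> date:
    """Decode a SAS date value with the proleptic Gregorian calendar."""
    return SAS_DATE_ORIGIN + timedelta(days=days)


def sas_datetime_to_python(seconds: float) -> datetime:
    """Decode a SAS datetime value (seconds since the SAS origin)."""
    return SAS_DATETIME_ORIGIN + timedelta(seconds=seconds)


print(sas_date_to_python(0))            # 1960-01-01
print(sas_date_to_python(-1))           # 1959-12-31
print(sas_datetime_to_python(86_400))   # 1960-01-02 00:00:00
```

Only the stored number crosses from SAS to R. If the two systems disagree about how many days lie between the origin and a given date, the decoded date shifts even though the number itself is unchanged.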

Thanks!

@gorcha gorcha added reprex needs a minimal reproducible example and removed readstat labels Jul 11, 2023
@vpprasanth
Author

vpprasanth commented Jul 11, 2023

bug.zip

Please find attached the zip file that contains the following:
a. txt_data
b. sas_data

Here is the SAS code used to generate the sas7bdat file (sas_data):

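data sas_data;
/* Read the raw text file; Date uses the DDMMYY10. informat */
informat Sub_ID 5. Date ddmmyy10. BMI 5.;
/* Display Date as dd/mm/yyyy (the stored value is a day count) */
format Date ddmmyy10.;
infile "/home/u63128400/txt_data.txt" missover;
input Sub_ID Date ddmmyy10. BMI ;
run;

/* Save a permanent copy as sas_data.sas7bdat */
libname out "/home/u63128400/";
data out.sas_data;
set sas_data;
run;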

Now, if you open sas_data (the sas7bdat file) in SAS, the date values display as 07/07/7777. However, if you open the same file in R using haven, they display as 7777-07-06. This is a mismatch.
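One way to test the leap-day explanation is to count the days from the SAS origin to the entered date under two leap-year rules. The Python sketch below treats one possibility as a hypothesis, not documented SAS behaviour: SAS counts years divisible by 4000 (4000, 8000, ...) as common years. All names are illustrative. Under that assumption, SAS would store 07/07/7777 as one day fewer than the Gregorian count, and adding that count to the origin in the Gregorian calendar, as R does, gives 7777-07-06. By the same logic, 09/09/9999 would come out two days early.

```python
from datetime import date
from typing import Callable

SAS_ORIGIN = date(1960, 1, 1)
MONTH_LENGTHS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def is_leap_gregorian(year: int) -> bool:
    """Proleptic Gregorian rule, as used by Python's datetime."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def is_leap_4000_rule(year: int) -> bool:
    """ASSUMPTION: Gregorian rule, except years divisible by 4000 are common."""
    return is_leap_gregorian(year) and year % 4000 != 0


def days_since_origin(d: date, is_leap: Callable[[int], bool]) -> int:
    """Count days from 1960-01-01 to d (d >= 1960-01-01) under a leap rule."""
    days = sum(366 if is_leap(y) else 365 for y in range(SAS_ORIGIN.year, d.year))
    days += sum(MONTH_LENGTHS[: d.month - 1])
    if d.month > 2 and is_leap(d.year):
        days += 1  # this year's 29 February has already passed
    return days + d.day - 1


for entered in (date(7777, 7, 7), date(9999, 9, 9)):
    # Day count SAS would store under the hypothesised 4000-year rule...
    serial = days_since_origin(entered, is_leap_4000_rule)
    # ...decoded by adding that many days to the origin (Gregorian calendar).
    decoded = date.fromordinal(SAS_ORIGIN.toordinal() + serial)
    print(entered, "->", decoded)
# 7777-07-07 -> 7777-07-06
# 9999-09-09 -> 9999-09-07
```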

@gorcha
Member

gorcha commented Jul 11, 2023

Thanks!

@vpprasanth
Author

By the way, SAS seems to handle leap years incorrectly, and this appears to be a known issue:
https://blogs.sas.com/content/sasdummy/2010/04/05/in-the-year-9999/
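If the leap-year difference turns out to be the cause, a user who needs the values as SAS displays them (for example, to match the 07/07/7777 and 09/09/9999 sentinels) could map the dates read in R back to SAS's calendar. The Python sketch below shows one possible workaround. It is not a haven feature. `sas_display_date` is a hypothetical helper, and it assumes, without confirmation, that SAS treats years divisible by 4000 as common years. Check that rule against SAS before relying on it.

```python
from datetime import date
from typing import Callable

SAS_ORIGIN = date(1960, 1, 1)


def is_leap_gregorian(year: int) -> bool:
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def is_leap_4000_rule(year: int) -> bool:
    # ASSUMPTION: SAS treats years divisible by 4000 as common years.
    return is_leap_gregorian(year) and year % 4000 != 0


def decode_with_rule(serial: int, is_leap: Callable[[int], bool]) -> date:
    """Turn a non-negative day count from 1960-01-01 into a date under a leap rule."""
    year, remaining = SAS_ORIGIN.year, serial
    while remaining >= (366 if is_leap(year) else 365):
        remaining -= 366 if is_leap(year) else 365
        year += 1
    month_lengths = [31, 29 if is_leap(year) else 28, 31, 30, 31, 30,
                     31, 31, 30, 31, 30, 31]
    month = 1
    for length in month_lengths:
        if remaining < length:
            break
        remaining -= length
        month += 1
    # Python's date stops at 9999-12-31, so results past that raise ValueError.
    return date(year, month, remaining + 1)


def sas_display_date(read_in_r: date) -> date:
    """Hypothetical helper: map a Gregorian-decoded date back to SAS's calendar."""
    serial = read_in_r.toordinal() - SAS_ORIGIN.toordinal()
    return decode_with_rule(serial, is_leap_4000_rule)


print(sas_display_date(date(7777, 7, 6)))   # 7777-07-07
print(sas_display_date(date(2023, 7, 11)))  # 2023-07-11 (before 4000: unchanged)
```

Dates before 4000-02-29 pass through unchanged because the two rules agree up to that point. Only values past that leap day are shifted.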

@hadley hadley removed the reprex needs a minimal reproducible example label Oct 31, 2023

3 participants