Dataset label truncated after write_xpt #746

Open
kaz462 opened this issue Dec 14, 2023 · 5 comments
Labels
bug an unexpected problem or unintended behavior documentation

Comments


kaz462 commented Dec 14, 2023

From write_xpt documentation:

Note that although SAS itself supports dataset labels up to 256 characters long, dataset labels in SAS transport files must be <= 40 characters.

The following dataset label in Chinese has 40 characters and was truncated after write_xpt.

(thanks @siye6 for the original example in atorus-research/xportr#194)

label <- "这是一段文字,用来测试在XPTversion5中作为数据集label是否会被截断"
nchar(label, type = "chars")
#> [1] 40
nchar(label, type = "bytes")
#> [1] 88

tmp <- tempfile(fileext = ".xpt")
haven::write_xpt(mtcars, tmp, label = label)
test <- haven::read_xpt(tmp)
attributes(test)$label
#> [1] "这是一段文字,用来测试在XPTv"

nchar(attributes(test)$label, type = "chars")
#> [1] 16
nchar(attributes(test)$label, type = "bytes")
#> [1] 40

Created on 2023-12-13 with reprex v2.0.2
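The 16-character result in the reprex can be reproduced from per-character byte counts. The sketch below is illustrative only and uses base R, not haven internals. It assumes the label is UTF-8 encoded and that its comma is the full-width ",", which takes 3 bytes and is what the reported 88-byte total implies. The variable names are made up for this example.

```r
# Assumption: this script is saved as UTF-8. Declaring the encoding keeps
# character counts correct even outside a UTF-8 locale.
label <- "这是一段文字,用来测试在XPTversion5中作为数据集label是否会被截断"
Encoding(label) <- "UTF-8"

# Split into individual characters and measure each one in bytes.
chars <- strsplit(label, "")[[1]]
bytes_per_char <- nchar(chars, type = "bytes")

# Running byte total after each character.
running_bytes <- cumsum(bytes_per_char)

# Number of whole characters that fit within a 40-byte budget.
n_fit <- sum(running_bytes <= 40)
fitted <- paste(chars[seq_len(n_fit)], collapse = "")

length(chars)                  # characters in the full label
sum(bytes_per_char)            # bytes in the full label
n_fit                          # characters that fit in 40 bytes
nchar(fitted, type = "bytes")  # bytes actually used by the fitted prefix
```

Each CJK character and the full-width comma take 3 bytes in UTF-8, while the ASCII letters take 1, so the 40-byte budget runs out after the "v" of "XPTversion5".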


ynsec37 commented Dec 18, 2023

Dear developer,

I found that the <= 40 label length limit applies only to XPT version 5; with version = 8, the label can be up to 256.

SAS XPT version 5

image

SAS XPT version 8

image


botsp commented Dec 28, 2023

Hi both, may I ask a question about data conversion?
I am trying to convert an *.rda file to *.sas7bdat, and it seems that write_xpt() doesn't work as expected. The created sas7bdat file cannot be opened; it always shows "file ... is not a SAS data set".

write_xpt(my_dataset, "P:/Kevin/CDISC_DataFromR/Pharmaverse/Adam/my_sasdsy.sas7bdat")

I saw some discussion about this issue but didn't find a good solution.

  1. What is the recommended method for converting the *.rda file to a *.sas7bdat file?

  2. It seems that "write_xpt" works well when converting to an *.xpt file. Should I first convert the file to an xpt format and then change it to a *.sas7bdat file using SAS? Are there any potential risks associated with this approach?

Looking forward to learning from your valuable experience. Many thanks!


ynsec37 commented Dec 29, 2023

Hi @botsp, it seems that write_xpt() only supports creating xpt files. From the documentation:

write_sas() creates sas7bdat files. Unfortunately the SAS file format is complex and undocumented, so write_sas() is unreliable and in most cases SAS will not read files that it produces.
write_xpt() writes files in the open SAS transport format, which has limitations but will be reliably read by SAS.

For sas7bdat, I use the same approach you mentioned: create the xpt file in R first, then convert it to sas7bdat in SAS.
After converting, I compared the results from write_xpt() with datasets created directly in SAS; there was no difference except the variable length.


botsp commented Dec 29, 2023

Thanks for your explanation; it gives me a way forward for the SAS data conversion. Thank you!

@gorcha gorcha added bug an unexpected problem or unintended behavior documentation labels Jan 31, 2024
Member

gorcha commented Jan 31, 2024

Hi @kaz462 and @ynsec37,

Thanks for the feedback! This is an issue with our dataset label validation code, and the documentation could be clearer - the dataset label for XPT files is a maximum of 40 bytes rather than characters. Our validation code is currently checking with the default type = "chars" and should be updated to type = "bytes".

@ynsec37 note that the XPT documentation shared above is referring to the variable label length. Although variable labels can be longer in version 8, the maximum dataset label length is still 40 bytes.
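As a rough illustration of the byte-based check described above, a label can be validated before calling write_xpt(). This is a minimal sketch, not haven's actual validation code: the function name check_dataset_label and its max_bytes argument are hypothetical, and it assumes the label is written to the file as UTF-8.

```r
# Hypothetical helper, not part of haven: reject dataset labels that exceed
# the byte limit, measuring in bytes rather than characters.
check_dataset_label <- function(label, max_bytes = 40L) {
  if (is.null(label)) {
    return(invisible(label))
  }
  stopifnot(is.character(label), length(label) == 1L)

  # Assumption: the label is stored as UTF-8, so measure its UTF-8 byte length.
  n_bytes <- nchar(enc2utf8(label), type = "bytes")
  if (n_bytes > max_bytes) {
    stop(
      sprintf("Dataset label is %d bytes; the limit is %d bytes.", n_bytes, max_bytes),
      call. = FALSE
    )
  }
  invisible(label)
}

# "\u8fd9" is a single CJK character that takes 3 bytes in UTF-8.
short_label <- strrep("\u8fd9", 13)  # 13 characters, 39 bytes
long_label <- strrep("\u8fd9", 14)   # 14 characters, 42 bytes

check_dataset_label(short_label)     # passes silently
result <- tryCatch(
  check_dataset_label(long_label),
  error = function(e) conditionMessage(e)
)
print(result)
```

A character-based check would accept both labels here, since each is well under 40 characters; only the byte count reveals that the second one does not fit.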


4 participants