Extract R code from R Markdown HTML file #1811
Comments
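IMO, using pandoc makes the code simpler and applicable to more formats (e.g., gfm).
```r
# purloc = purl + pandoc
library(magrittr)  # provides %>%

purloc = function(x, output = file.path(".", xfun::with_ext(x, "R")), ...) {
  # copy the input to a temp file; xfun::file_ext() returns the extension
  # without its dot, so add it back, otherwise pandoc cannot infer the format
  input = tempfile(fileext = paste0(".", xfun::file_ext(x)))
  file.copy(x, input)
  # convert to CommonMark; R source chunks become ``` r fenced blocks
  knitr::pandoc(input, 'commonmark', ext = 'md')
  intermediate_md = xfun::with_ext(input, 'md')
  # rewrite ``` r fences as ```{r} chunk headers so purl() recognises them
  intermediate_md %>%
    readr::read_lines() %>%
    stringr::str_replace_all("^``` r", '```{r}') %>%
    readr::write_lines(intermediate_md)
  knitr::purl(intermediate_md, output = output, ...)
}
```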
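If readr and stringr are unwanted dependencies, the fence-rewriting step can be done in base R. This is a minimal sketch: `fences_to_chunks()` is a hypothetical helper name, and the pattern also accepts "```r" without a space, on the assumption that pandoc versions may differ in how they write the fence.

```r
# Hypothetical helper: turn pandoc's "``` r" fences into knitr chunk headers
# ("```{r}") so that knitr::purl() treats those blocks as R chunks.
# Fences without the "r" class (e.g. rendered output) are left untouched.
fences_to_chunks <- function(lines) {
  sub("^```[[:space:]]*r[[:space:]]*$", "```{r}", lines)
}

md <- c("``` r", "x <- 1:3", "```", "```", "## [1] 1 2 3", "```")
fences_to_chunks(md)
```

Inside `purloc()` this would replace the pipeline with `writeLines(fences_to_chunks(readLines(intermediate_md)), intermediate_md)`.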
@atusy that is a great simplification and improvement on the DIY solution in the original. The
Some questions
Do you think
```r
# from inside the applied-ml root directory
library(magrittr)
dir() %>% grep("^Part_.*\\.html$", ., value = TRUE) %>% sapply(html_to_r) -> a
dir() %>% grep("^Part_.*\\.html$", ., value = TRUE) %>% sapply(html_to_r, inc_out = FALSE) -> b
a[[2]] %>% cat # with output
b[[2]] %>% cat # without output
```
For me, it's useful, but maybe not for everyone?
Also, do you agree that replacing character entities is useful? I think it is essential: otherwise pipes and some conditionals (`<`, `>`, `&&`) look right in the HTML but come out as entities such as `&gt;` in the R code.
```r
replace_character_entities <- function(char_entity) {
  # parse the entity as HTML and return its decoded text
  xml2::xml_text(xml2::read_html(paste0("<x>", char_entity, "</x>")))
}

# E.g.
replace_character_entities("&gt;")
# [1] ">"
```
Which makes a pipe appear as
I applied this conversion to some test examples, but I cannot be certain it will work under all circumstances (one exception that comes to mind is if R code contained some literal
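For the common case, entity decoding can also be sketched without xml2. This handles only the five XML entities (`&lt;`, `&gt;`, `&amp;`, `&quot;`, `&#39;`), not numeric references in general, and `decode_entities()` is a hypothetical name. `&amp;` is decoded last so that an escaped entity such as `&amp;gt;` becomes the literal text `&gt;` rather than `>`.

```r
# Hypothetical helper: decode the five XML character entities.
decode_entities <- function(x) {
  # &amp; must come last, otherwise "&amp;gt;" would be decoded twice into ">"
  entities <- c("&lt;" = "<", "&gt;" = ">", "&quot;" = "\"", "&#39;" = "'",
                "&amp;" = "&")
  for (e in names(entities)) {
    x <- gsub(e, entities[[e]], x, fixed = TRUE)
  }
  x
}

decode_entities("iris %&gt;% head()")
#> [1] "iris %>% head()"
```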
About special characters, we do not have to care, as pandoc takes care of them:
```sh
echo "<pre>%&gt;</pre>" | pandoc --from html --to gfm
# ```
# %>
# ```
```
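The same check can be run from R with `system2()`, whose `input` argument passes a string to the command's standard input. A sketch assuming a `pandoc` binary is on the `PATH` (RStudio's bundled copy may not be); `html_to_gfm()` is a hypothetical helper.

```r
# Hypothetical helper: convert an HTML string to GitHub-flavoured Markdown.
html_to_gfm <- function(html) {
  if (!nzchar(Sys.which("pandoc"))) stop("pandoc was not found on the PATH")
  # `input` is written to a temporary file used as pandoc's stdin;
  # stdout = TRUE returns pandoc's output as a character vector of lines
  system2("pandoc", c("--from", "html", "--to", "gfm"),
          input = html, stdout = TRUE)
}

if (nzchar(Sys.which("pandoc"))) {
  print(html_to_gfm("<pre>%&gt;</pre>"))
}
```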
There appears to be no fast and easy way to extract the R code from HTML files generated via R Markdown.
Example
Max and Davis's applied-ml workshop is a good example.
We can easily get the R code for `Part_1.html`, since we have access to the original `.Rmd` file, and can hence call
But we cannot so easily get the R code for parts 2 through 5, as the originating `.Rmd` is not available.
Possible solution
`html_to_r()` extracts the R code from R Markdown generated HTML files. I provide an implementation in a PR.
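To illustrate the idea only (this is not the PR's implementation), here is a regex-based sketch in base R. It assumes rmarkdown-style markup in which R source chunks are `<pre class="r">` or `<pre class="sourceCode r">` blocks and output chunks are plain `<pre>` blocks; `html_to_r_sketch()` is a hypothetical name. Regexes are brittle on HTML, which is one reason the pandoc-based approach suggested in the comments is more robust.

```r
# Hypothetical sketch: collect the R source chunks of an R Markdown HTML page.
html_to_r_sketch <- function(html) {
  html <- paste(html, collapse = "\n")
  # (?s) lets "." span newlines; ".*?" stops at the first closing </pre>
  pattern <- '(?s)<pre class="(?:sourceCode )?r"[^>]*>.*?</pre>'
  blocks <- regmatches(html, gregexpr(pattern, html, perl = TRUE))[[1]]
  # drop the remaining markup (<pre>, <code>, highlighting <span>s, anchors)
  code <- gsub("<[^>]+>", "", blocks)
  # decode the entities used for special characters; &amp; goes last
  entities <- c("&lt;" = "<", "&gt;" = ">", "&quot;" = "\"", "&#39;" = "'",
                "&amp;" = "&")
  for (e in names(entities)) code <- gsub(e, entities[[e]], code, fixed = TRUE)
  paste(code, collapse = "\n\n")
}

page <- c('<p>Intro</p>',
          '<pre class="r"><code>x &lt;- c(1, 2)',
          'x %&gt;% sum()</code></pre>',
          '<pre><code>## [1] 3</code></pre>',
          '<pre class="r"><code>plot(x)</code></pre>')
cat(html_to_r_sketch(page))
```

Applied to a file it would look like `writeLines(html_to_r_sketch(readLines("Part_2.html")), "Part_2.R")`, where the file names are only examples.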
Using in the applied-ml example
We can now easily retrieve the R code from the `.html` files, like so
This can be merged if relevant or disregarded if not relevant.