---
title: "Batch convert Word files to Rmd"
date: '`r format(Sys.time(), "%A %B %d %Y %X %Z")`'
output: md_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
```{r include=FALSE}
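# Cache the OAuth token in .secrets so the document can knit without prompting
options(
  gargle_oauth_cache = ".secrets",
  gargle_oauth_email = "[email protected]"
)
# Clear any token from the current session, then authenticate with read-only Drive access
googledrive::drive_deauth()
googledrive::drive_auth(scopes = "https://www.googleapis.com/auth/drive.readonly", email = "[email protected]")
# Google Drive folder that holds the Word files, as a URL and as a bare folder ID
url_googledrive <- "https://drive.google.com/drive/folders/11WnXxs56jORbLkD1mFTZxwSaShex3Sse"
id_googledrive <- "11WnXxs56jORbLkD1mFTZxwSaShex3Sse"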
```
# Word to Rmd
This Rmd downloads Word (`docx`) files from a Google Drive folder and converts them to Rmd files.
## Download all the Word files
This chunk downloads every Word file in the Drive folder and saves it to a local folder called `data`.
```{r message=FALSE}
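# List every .docx file in the Drive folder
a <- googledrive::drive_ls(path = url_googledrive, type = "docx")
# Make sure the target folder exists before downloading
dir.create("data", showWarnings = FALSE)
# seq_len() skips the loop entirely when the folder has no Word files
for (i in seq_len(nrow(a))) {
  googledrive::drive_download(a$id[i], path = file.path("data", a$name[i]), overwrite = TRUE)
}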
```
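Google Drive allows several files in one folder to share a name, and with `overwrite = TRUE` a later download would silently replace an earlier one. The sketch below shows one way to build the local paths up front and stop if names collide. The helper `local_paths()` and its arguments are hypothetical and not part of the original script.

```{r eval=FALSE}
# Hypothetical helper: map Drive file names to local paths, refusing duplicates.
local_paths <- function(file_names, dir = "data") {
  dup <- unique(file_names[duplicated(file_names)])
  if (length(dup) > 0) {
    stop("Duplicate file names in the Drive folder: ", paste(dup, collapse = ", "))
  }
  file.path(dir, file_names)
}

# Usage with the listing from the chunk above (assumes `a` exists):
# paths <- local_paths(a$name)
```

Checking before the loop keeps a name clash from going unnoticed until a converted file turns out to be missing.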
## Convert the Word files to Rmd
Converting Word to Rmd works well when the Word document is simple and all of its text uses the "Normal" style. To see which style is applied to a piece of text, click the Style pane on the Home tab in Word. Real-world Word files convert less cleanly, but you at least get the text. Tables convert particularly badly.
It is worth reading up on the [options in Pandoc](https://pandoc.org/MANUAL.html#options). For example, `--track-changes=accept|reject|all` tells Pandoc how to handle tracked changes in the document.
```{r}
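# seq_len() skips the loop entirely when no files were listed
for (i in seq_len(nrow(a))) {
  fil <- here::here("data", a$name[i])
  # Swap the file extension for .Rmd
  outfil <- here::here("data", paste0(tools::file_path_sans_ext(a$name[i]), ".Rmd"))
  # --wrap=none disables line wrapping; --extract-media=. writes embedded media (such as images) to files
  rmarkdown::pandoc_convert(fil, to = "markdown", output = outfil, options = c("--wrap=none", "--extract-media=."))
}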
```
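The note about tracked changes can be made concrete. Pandoc's docx reader accepts `--track-changes=accept`, `reject`, or `all`, with `accept` as the default. The sketch below builds an options vector that could be passed to `rmarkdown::pandoc_convert()`. The helper name `pandoc_docx_options()` and its `track_changes` argument are hypothetical and not part of the original script.

```{r eval=FALSE}
# Hypothetical helper: build the Pandoc options for a docx-to-markdown conversion.
pandoc_docx_options <- function(track_changes = c("accept", "reject", "all")) {
  # match.arg() picks the first choice ("accept") when nothing is supplied
  track_changes <- match.arg(track_changes)
  c("--wrap=none", "--extract-media=.", paste0("--track-changes=", track_changes))
}

# Usage inside the conversion loop (assumes fil and outfil as defined there):
# rmarkdown::pandoc_convert(fil, to = "markdown", output = outfil,
#                           options = pandoc_docx_options("reject"))
```

With `reject`, insertions are dropped and deletions are kept; `all` keeps insertions, deletions, and comments in the output, marked with their author and time.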