---
title: "Week 1"
output:
  html_document:
    toc: true
    include:
      after_body: footer.html
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Key Git/GitHub Skills
* [The lecture notes](week1-introtogit.html)
* [Lecture video](https://youtu.be/M1sOC4046PQ)
## Some more videos illustrating Git/GitHub skills
* Here are some shorter videos of the workflows that I showed during Week 1.
* [Playlist of videos](https://www.youtube.com/playlist?list=PLDqZV53PcnYzuhOWmBNJy6jJRcEeHPi-M)
* Connecting GitHub Desktop to your GitHub account. [Video](https://youtu.be/AiRrSwQte5k)
* Making a repo on GitHub and getting it into RStudio and GitHub Desktop. [Video](https://www.youtube.com/watch?v=jPya6iDik6M)
* Getting a repo on GitHub into RStudio Cloud. Watch the first part of the previous video if you need to see how to create a repo on GitHub. [Video](https://youtu.be/w6fivjMGZVo)
* Connecting Git and RStudio Desktop. You can use Git inside RStudio (I do sometimes), but for 25 to 50% of people, going this route involves much suffering, while the setup using GitHub Desktop is fast and easy. [Notes](https://rverse-tutorials.github.io/RWorkflow-NWFSC-2020/set-up.html#Set_up_RStudio_to_use_Git), [Tutorial](https://happygitwithr.com/install-intro.html), [Video overview of the steps](https://youtu.be/2QJQ6pNroVM).
* Real workflow. This is a quick video of me working on the workshop webpage via RStudio. [Video](https://youtu.be/8aHOrfoQICk)
* Connecting GitHub and GitKraken. I downloaded [GitKraken](https://www.gitkraken.com/) and this shows how to connect it to GitHub. [Video](https://youtu.be/_plekl0y8Rk)
* Clone a repo from GitHub using GitKraken. [Video](https://youtu.be/axXwBEc4g0U)
* Open a repo on your computer using GitKraken. [Video](https://youtu.be/sBxya8FjK7w)
## Git/GitHub FAQs
* Difference between forking and importing
* This [video](https://youtu.be/uuti1G48yhY) shows how to fork and some of the features you'll see on GitHub when you do that.
* This [video](https://youtu.be/0-5LiuxNnbM) shows how to import the same repository and how what you see differs from what you see when you fork.
* Discussion of alternatives to using branches as repository 'versions'. [Video](https://youtu.be/t2YqepzFSYc). Branches are not meant for versioning your repository. They are for breaking off a copy to work on something and then merging those changes back in (or deleting the branch). Branches might have a long life, e.g. a development branch, but they are not for 'versioning' (2020 data, 2021 data). It is tempting to use them that way, but there are better alternatives, e.g. releases, separate repositories, or separate folders. A rare exception might be a branch that, say, uses C++ instead of pure R but is otherwise the same code base.
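Releases on GitHub are attached to Git tags, so on the command line the branch-free way to mark a 'version' is to tag the commit. Below is a minimal sketch in a throwaway repository, assuming a Mac/Linux shell or Git Bash; the file `data.csv` and the tag name `v2020-data` are hypothetical. Adding release notes to the tag is then done on GitHub.

```shell
#!/bin/sh
# Sketch: mark a 'version' of a repository with an annotated tag instead of a branch.
# Runs in a throwaway repository; file and tag names are hypothetical.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"          # local identity so commits and tags work
git config user.email "demo@example.com"

echo "year,catch" > data.csv
echo "2020,10" >> data.csv
git add data.csv
git commit -q -m "Add 2020 data"

# Tag the current commit: a permanent, named pointer to this state of the repo.
git tag -a v2020-data -m "Analysis with 2020 data"

# Keep working on the same branch; the tagged state is not affected.
echo "2021,12" >> data.csv
git commit -q -am "Add 2021 data"

# List tags and show the file as it was at the tagged state.
git tag --list
git show v2020-data:data.csv
# With a GitHub remote named origin, share the tag with:
#   git push origin v2020-data
```

The working folder always holds the latest data, while every tagged state stays one command away, so there is no need to switch branches to see the 2020 version.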
## Branches
This [video](https://youtu.be/4TUZEQdaZGM) shows a basic workflow using branches.
In the lecture, I cautioned against using branches when you are beginning with Git. Even though I work on many Git repositories and develop public R packages, I rarely need to use branches. Normal scientific workflow does not involve branches, and IMO most scientists won't gain much by adding them to their workflow. The GitHub features I talked about so far are things that are already part of our workflow (taking notes on what we are doing, saving copies along the way, reviewing work with collaborators, sharing our work), but Git/GitHub lets us do them better and faster.
I do use branches regularly with certain R package projects, and when I use them, it is in very specific ways:
* Work on a file or set of files, finish the work, merge the branch into main, and delete the branch.
* Sandbox an idea. If I am really uncertain about an idea or change, or am about to make a major revamp, I'll make a branch while working on the idea. Once I decide yea or nay, I either merge the branch into main or abandon it, and then delete the branch.
* When working on a branch, I stay off other branches and my main branch as much as possible.
* I create a timeline (mentally) for a branch before I create it. The branch has a concrete purpose and end state (e.g., when the xyz plot functions are done or when the documentation for the xyz files is done). Vague branches like 'development' are too amorphous.
* When I am using GitHub Pages and need the `gh-pages` branch to serve the site.
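The first way of using branches in this list (work, merge into main, delete the branch) looks like this on the command line. This is a minimal sketch in a throwaway repository, assuming a Mac/Linux shell or Git Bash; the branch name `plot-functions` and the file `plots.R` are hypothetical.

```shell
#!/bin/sh
# Sketch of a short-lived branch: create it, commit on it, merge into main, delete it.
# Throwaway repository; branch and file names are hypothetical.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/main     # name the first branch 'main' (works on any Git version)
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "# My analysis" > README.md
git add README.md
git commit -q -m "Initial commit"

# 1. Break off a branch with a concrete purpose and end state.
git checkout -q -b plot-functions
echo 'plot_catch <- function(x) plot(x)' > plots.R
git add plots.R
git commit -q -m "Add plot functions"

# 2. The work is done: go back to main and merge the branch in.
git checkout -q main
git merge -q --no-edit plot-functions

# 3. Delete the branch; -d refuses to delete a branch with unmerged work.
git branch -d plot-functions

git log --oneline
```

For a sandboxed idea that you decide against, skip step 2 and delete the branch with `git branch -D plot-functions`; the capital `-D` deletes even though the work was never merged.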
But usually other features of Git/GitHub are sufficient and more appropriate:
* Using the revert feature to get rid of a change that I made.
* Using Releases to mark the stable versions and using the main branch for development.
* Releases also effectively create an 'archive' version of the repository at key states: draft 1, draft 2, etc.
* Using issues to create concrete small chunks of work.
* Using a fork of a repository and a pull request (if I am collaborating with someone).
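The revert feature in this list corresponds to `git revert` on the command line: it adds a new commit that undoes an earlier one, so the mistaken change and its undo both stay in the history. A minimal sketch in a throwaway repository, assuming a Mac/Linux shell or Git Bash; the file `settings.R` and its contents are hypothetical.

```shell
#!/bin/sh
# Sketch: undo a committed change with git revert instead of using a branch.
# Throwaway repository; file names and contents are hypothetical.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "alpha <- 0.05" > settings.R
git add settings.R
git commit -q -m "Set alpha"

echo "alpha <- 0.5" > settings.R          # a change we later regret
git commit -q -am "Change alpha"

# Make a new commit that undoes the most recent one; nothing is erased.
git revert --no-edit HEAD

cat settings.R                            # back to the original value
git log --oneline                         # three commits, the newest is the revert
```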
That said, branches are definitely helpful in certain situations. Before you start working with branches, think through how they affect your file system and how that will affect your current workflow and the workflow of any users of your repository.
* Switching branches changes your file system state. A branch is not a separate 'space'. When you switch branches, you tell Git to change your files to reflect the branch state. Do you have any code that 'sources' files from the folder/repository with branches? For example, do you have code in other folders with lines like `source("Documents/myrepowithbranches/plot.R")`? That kind of workflow will cause problems, because that code will get whatever version of the file belongs to the branch you last switched to. Your file system will remain in that branch state until you switch back to the main branch.
* File time stamps. [Video](https://youtu.be/_WXHp6uRmAI) Because switching branches on your computer causes Git to change your file system to that branch state, the time stamps of any files that differ between branches will update to the time that you switched branches. It is possible to make Git set the time stamps back to the time each file was last committed. This [video](https://youtu.be/WbP9B_jfxPU) shows how.
* Are you working in a team with users who are not Git-savvy but will need to access branches? If they only access branches on GitHub, then it is probably fine, but if they will switch branches using RStudio or GitHub Desktop on their computers, they are likely to get confused when they accidentally leave their file system in a branch state.
* Pulling and pushing work a bit differently when you have branches. When you pull from GitHub, Git downloads the changes on all branches but only merges them into the branch you are currently on. Pushing is branch specific. So if you have changes on the main branch and another branch, you need to push from main, then switch to the other branch(es) and push from them too. [Video](https://youtu.be/uJxk2l5PEKQ)
* Are you using a cloud backup system that syncs across devices/computers? Dropbox and iCloud are examples. Syncing repositories outside of Git (i.e., not using push/pull) can definitely cause problems, especially if you are using branches. If your backup system is one-way (one computer being backed up, with no syncing across devices) and you are not working offline, then you are probably OK.
1. Why are cloud backup/syncing systems so problematic? The problem happens when you work offline and use branches. Remember that changing branches changes your file system. Let's say you are on branch A on computer 1 (home) and working offline. Then on computer 2 (office), you create branch B and do a bunch of work. You leave that computer on branch B. Then you go back home, get on computer 1, and go online. That computer syncs to the cloud and either wipes out all the work from computer 2 (because computer 1 is on branch A) or creates a slew of 'Conflicted Copies'. These sorts of problems happen all the time when you sync automatically across computers.
2. Backups of repositories with branches can be confusing even if you don't sync across devices. Let's say you switch to branch B and delete most of your files. You stay on branch B, and that is what is backed up. The information needed to get back the files (on the main branch A) is still there, but in the hidden .git folder. If you look at the backup, you will just see branch B with all the files gone. You have to know that it is a Git repository and use Git to find out which branch the repository is on. But it might not be obvious to you that this folder is a Git repository.
* How do I clone just one branch? For this you will use Git from the command line. Open a terminal window and change to the directory where you keep repositories (e.g. `cd Documents/GitHub`). Then issue the command
```
git clone -b <branchname> --single-branch <remote-repo-url>
```
You can optionally add the name of the folder at the end. So the command I issued in the video was
```
git clone -b test --single-branch https://github.com/eeholmes/Week2 Week2-test
```
Here's a screen recording of this. [Video](https://youtu.be/CNZh9L2qwCc)
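The first consideration above (switching branches rewrites the files in the same folder) and the backup question of which branch a folder is on can both be seen in a throwaway repository. A minimal sketch, assuming a Mac/Linux shell or Git Bash; the branch `fancy` and the file `plot.R` are hypothetical.

```shell
#!/bin/sh
# Sketch: switching branches rewrites the files in the same folder.
# Throwaway repository; branch and file names are hypothetical.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/main     # name the first branch 'main'
git config user.name "Demo User"
git config user.email "demo@example.com"

echo 'plot(1:10)' > plot.R
git add plot.R
git commit -q -m "Simple plot"

git checkout -q -b fancy
echo 'plot(1:10, col = "red")' > plot.R
git commit -q -am "Red plot"

# Same path, different contents depending on the checked-out branch.
# Code elsewhere that runs source(".../plot.R") gets whichever version is checked out.
cat plot.R                        # red version (on fancy)
git checkout -q main
cat plot.R                        # plain version (on main)

# Which branch is this folder on? Useful when looking at a backup copy.
git rev-parse --abbrev-ref HEAD   # prints: main
git status --short --branch       # first line also names the branch
# With a remote named origin, 'git push --all origin' pushes every local branch.
```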
## Fix timestamps
* Look in the `.git` folder for the `hooks` folder. The `.git` folder is hidden, so you need to show hidden files first.
* Click on one of the `xxx.sample` files and duplicate it (click the checkbox next to the file, click More, then click 'Copy to'). *This is important:* duplicate the file this way so that the copy keeps the special file permissions. Git only runs a hook file that is executable.
* Save the copy as `post-checkout` with no file extension. Delete the code in the copy and paste in the code below.
Here is the post-checkout code used in this [video](https://youtu.be/WbP9B_jfxPU) to show you how to fix time stamps when you switch branches.
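```
#!/bin/sh -e
# post-checkout hook: after a checkout, set the modification time of each
# unmodified tracked file to the time of the last commit that touched it.
OS=${OS:-`uname`}
if [ "$OS" = 'Darwin' ]; then
  # macOS (BSD) date: -r takes seconds since the epoch
  get_touch_time() {
    date -r "${unixtime}" '+%Y%m%d%H%M.%S'
  }
else
  # GNU date (Linux, Git Bash on Windows): -d @seconds
  get_touch_time() {
    date -d "@${unixtime}" '+%Y%m%d%H%M.%S'
  }
fi
# all git files (sorted, because comm needs sorted input)
git ls-tree -r --name-only HEAD | LC_ALL=C sort > .git_ls-tree_r_name-only_HEAD
# modified git files (staged or unstaged changes relative to HEAD)
git diff --name-only HEAD | LC_ALL=C sort > .git_diff_name-only
# only restore files not modified; IFS= and -r keep file names intact
LC_ALL=C comm -2 -3 .git_ls-tree_r_name-only_HEAD .git_diff_name-only | while IFS= read -r filename; do
  # author time of the last commit that changed this file
  unixtime=$(git log -1 --format="%at" -- "${filename}")
  touchtime=$(get_touch_time)
  echo "${touchtime} ${filename}"
  touch -t "${touchtime}" "${filename}"
done
rm .git_ls-tree_r_name-only_HEAD .git_diff_name-only
```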
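If you are comfortable with a terminal, the three steps above (find `hooks`, duplicate a sample, rename) can be replaced by creating the hook file directly and marking it executable. A minimal sketch, assuming a Mac/Linux shell or Git Bash; it uses a throwaway repository and a one-line placeholder body, so in your own repository you would run the last part from the repository's top folder and paste in the full script.

```shell
#!/bin/sh
# Sketch: install a post-checkout hook from a terminal instead of duplicating
# a .sample file. Throwaway repository and placeholder hook body.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q

mkdir -p .git/hooks                       # normally created by git init
hook=.git/hooks/post-checkout
# Placeholder body; in your own repository, paste the full script above instead.
cat > "$hook" <<'EOF'
#!/bin/sh
echo "post-checkout hook ran"
EOF

# Git skips hooks that are not executable; this is the permission
# that duplicating a .sample file preserves.
chmod +x "$hook"
ls -l "$hook"
```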
## RStudio and Git
**I want to link RStudio to Git, but I am getting an error that RStudio cannot find Git**
Open RStudio and go to Tools > Global Options... > Git/SVN. Then paste the location of the git executable (`git.exe` on a PC) into the Git executable box.
On a Mac, the location is usually `/usr/bin/git`.
Finding that location on a PC can be onerous. First make sure you can see hidden folders: in a File Explorer window, click View and make sure the 'Hidden items' checkbox is checked. Then here are some ideas of where to find `git.exe`. Note that you might not see `.exe`; you might only see `git`. It depends on whether file name extensions are shown (the 'File name extensions' checkbox, also under View).
If you only installed GitHub Desktop, look here:
```
C:\Users\UserName\AppData\Local\GitHubDesktop\app-2.8.3\resources\app\git\cmd\git.exe
```
Change `app-2.8.3` to whatever version you have.
If you installed Git for Windows, look here:
```
C:/Program Files/Git/bin/git.exe
```
If you installed Git for Windows locally, look here:
```
C:\Users\UserName\AppData\Local\Programs\Git\bin\git.exe
```
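If you have a terminal open (Terminal on a Mac), asking the shell where `git` lives can be faster than browsing folders. In the Windows Command Prompt, the equivalent is `where git`, which prints a Windows-style path you can paste into RStudio. Both only find a Git that is on your PATH; the copy bundled with GitHub Desktop usually is not, so for that case use the folder above. A minimal sketch for a Mac/Linux shell:

```shell
#!/bin/sh
# Sketch: print the full path of the git your shell would run (Mac/Linux),
# then paste that path into RStudio's Git/SVN options.
if git_path=$(command -v git); then
  echo "git is at: ${git_path}"
  "${git_path}" --version                 # confirm that this git actually runs
else
  echo "git is not on your PATH" >&2
fi
```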