This content is from the fall 2016 version of this course. Please go here for the most recent version.
There are many benefits to using Git for version control. Among them is having a record of your work, especially your code, at different points in time. This is particularly important as your code grows more complex. It gives you the freedom to experiment and attempt to improve (and likely break) your code, knowing you have a working version you can return to if you get stuck.
Beyond the freedom to experiment, version control gives you a backup of your code that you can access from any computer. If you somehow delete, corrupt, or simply cannot find where you stored your code on the computer, you can quickly reclone the entire repository with a few clicks and a single command. If you are working on a different computer, you can access your code and easily share it with colleagues and other collaborators.
Storing your code online has other benefits, such as building a code portfolio. If you apply to tech-related jobs or fellowships, they will often ask for, or otherwise search for, your GitHub profile to see examples of your code.
GitHub also supports collaboration, letting teams work together both locally and remotely. Members of a team can each work on different parts of the code in parallel without fear of destroying the codebase, particularly if they use Git's “branching” feature.
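As a rough illustration of branching, the sketch below builds a throwaway repository, commits a working script, and then tries a change on a separate branch. The branch name experiment, the file name analysis.R, and the identity values are all placeholders chosen for this example, not part of the course material.

```shell
# Scratch repository so nothing real is touched (all names are placeholders)
scratch=$(mktemp -d)
git init --quiet "$scratch/demo"
cd "$scratch/demo"
git config user.name "Example Student"
git config user.email "student@example.com"

# First commit on the default branch
echo 'print("working")' > analysis.R
git add analysis.R
git commit --quiet -m "Add working analysis script"
main_branch=$(git rev-parse --abbrev-ref HEAD)   # name of the default branch

# Experiment on a separate branch
git checkout --quiet -b experiment
echo 'print("risky idea")' >> analysis.R
git commit --quiet -am "Try a risky change"

# Switch back: the default branch still holds the working version
git checkout --quiet "$main_branch"
cat analysis.R
```

The experimental commit stays on its own branch until someone decides to merge it, which is what lets teammates work in parallel.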
It should be noted that we have thus far only scratched the surface in using Git. There are two primary ways to interact with Git: using a GUI (Graphical User Interface) or using the Shell. One possible GUI approach is operating Git using the point-and-click options in RStudio. Alternatively, one can use the Shell (on Linux/Mac OSX the Unix Shell/Terminal; on a Windows PC PowerShell, Git Bash, or Cygwin). Below, I discuss some benefits and limitations of the approaches.
Some of these advanced steps are particularly important when collaborating on a team. Because collaborative work often involves more than one language, Git from the shell is even more advantageous.
git clone LINK_TO_REPOSITORY
git add NAME_OF_FILE
git commit -m "Comment describing what file does or what change you made"
git push
Before cloning the repository, navigate to your desired location on the computer. This is most likely the folder where you store your code and notes for the class.
This can be achieved with the command prompt (these commands work on Linux/Mac OSX). To use the same commands on Windows, Cygwin can be used.
#Display current working directory
pwd
#List files commands
ls #List files in folder
ls -a #List all files/folders including hidden ones
ls FOLDER_NAME/ #List files inside FOLDER_NAME
#Change Directory
cd NAME_OF_FOLDER_IN_PWD/FOLDER/FOLDER/ETC #Go to subfolder
cd .. #Go one folder up
cd ~/NAME_OF_FOLDER/FOLDER/ETC/ETC #Go back to home directory and go to folder from there
#Clear shell screen
clear
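To practise these commands safely, here is a small sketch that builds a scratch folder tree and walks through it. The folder and file names (class, week1, notes.txt, .hidden_file) are made up for the example, and mktemp, mkdir, and touch are standard utilities used only to set things up.

```shell
# Build a small scratch folder tree to practise on (names are placeholders)
scratch=$(mktemp -d)
mkdir -p "$scratch/class/week1"
touch "$scratch/class/notes.txt" "$scratch/class/.hidden_file"

cd "$scratch/class"   # go into the class folder
pwd                   # show where we are
ls                    # lists notes.txt and week1
ls -a                 # also lists .hidden_file (plus . and ..)

cd week1              # go to a subfolder
cd ..                 # go back up one level
pwd                   # back in the class folder
```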
The above commands are some of the most basic and should help you navigate to the desired folder. Once there, clone your repository.
#CLONE YOUR REPOSITORY
git clone LINK_TO_REPOSITORY
This command will make a clone of the repository in the current working directory. Specifically, it will make a folder (using the name of your repository). This folder will contain all code and documents as seen in the repo online.
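The folder-naming behaviour can be tried without a network connection. In this sketch a local bare repository stands in for the online repo; that stand-in and the name my-project are assumptions for illustration only.

```shell
# A local bare repository stands in for the online repo (assumption for this sketch)
scratch=$(mktemp -d)
git init --quiet --bare "$scratch/my-project.git"

cd "$scratch"
git clone --quiet "$scratch/my-project.git"   # same form as: git clone LINK_TO_REPOSITORY

ls                 # now includes a new folder called my-project
ls -a my-project   # the hidden .git folder is what makes it a repository
```

Git may warn that the cloned repository is empty; that is expected here because the stand-in has no commits yet.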
NOTE: If you clone the repository and then try to reclone it in the same directory, you will get an error. Why? A folder with that name already exists. The solution is to rename the folder or delete it.
mv NAME_OF_REPO NAME_OF_REPO__OLD
CAUTION: THE BELOW COMMAND PERMANENTLY ERASES THE FOLDER
(DO NOT ENTER THE WRONG FOLDER NAME, AS YOU WILL PERMANENTLY ERASE IT)
rm -rf NAME_OF_REPO
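The rename route can be sketched end to end as follows. As before, a local bare repository is an assumed stand-in for the online repo, and my-project is a hypothetical name.

```shell
# Set up a stand-in remote and a first clone (names are placeholders)
scratch=$(mktemp -d)
git init --quiet --bare "$scratch/my-project.git"
cd "$scratch"
git clone --quiet "$scratch/my-project.git"

# A second clone into the same place fails because the folder already exists
if ! git clone --quiet "$scratch/my-project.git"; then
    echo "Second clone refused: my-project already exists"
fi

# Rename the old copy out of the way, then clone again
mv my-project my-project_OLD
git clone --quiet "$scratch/my-project.git"
ls   # lists my-project, my-project.git and my-project_OLD
```

Renaming keeps the old copy around, which is the safer choice if you are not sure whether it holds unpushed work.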
Once you have cloned your repo, you have to cd into the repository folder to use Git. For example:
To get the current status of your repo, use the git status command. If you run this command before using cd to enter the folder, you will get this error: fatal: Not a git repository (or any of the parent directories): .git
Instead, do the following:
cd NAME_OF_REPO
git status
Which provides the output:
On branch master
Your branch is up-to-date with 'origin/master'.
nothing to commit, working directory clean
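A quick way to see the difference between running git status outside and inside a repository is sketched below. It assumes the scratch folder created by mktemp is not itself inside some other Git repository; my-project is a placeholder name, and git init is used here only as a stand-in for a cloned repo.

```shell
scratch=$(mktemp -d)
git init --quiet "$scratch/my-project"   # stand-in for a cloned repo

cd "$scratch"
# Outside the repository folder: git status exits with an error
if ! git status; then
    echo "Not inside a repository yet"
fi

cd my-project
# Inside the repository folder: git status works
git status
```

The exact wording of the messages depends on your Git version, so your output may differ slightly from the text shown above.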
Once inside the repository, you are free to make changes as you see fit. This might include making documentation, creating R or Rmd scripts, Python scripts, Shell scripts, making folders for results, or anything else imaginable.
Once you make changes, you will want to add the files or folders, commit them, and push the results.
NOTE: If you are working on a team, on another person’s code, or across multiple computers, first pull to make sure you have the most up-to-date version of the repo (which reduces the chance of merge conflicts).
git pull
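To see why pulling first matters, the sketch below simulates two computers sharing one repository. The local bare repository standing in for the online repo, the folder names laptop and desktop, and the identity values are all assumptions made for this example.

```shell
scratch=$(mktemp -d)
git init --quiet --bare "$scratch/remote.git"   # stand-in for the online repo

# First copy ("laptop") makes the initial commit and pushes it
git clone --quiet "$scratch/remote.git" "$scratch/laptop"
cd "$scratch/laptop"
git config user.name "Example Student"
git config user.email "student@example.com"
echo "version 1" > notes.txt
git add notes.txt
git commit --quiet -m "Add notes"
git push --quiet origin HEAD

# Second copy ("desktop") is cloned at this point
git clone --quiet "$scratch/remote.git" "$scratch/desktop"

# The laptop pushes another change
echo "version 2" > notes.txt
git commit --quiet -am "Update notes"
git push --quiet origin HEAD

# The desktop copy is behind until it pulls
cd "$scratch/desktop"
cat notes.txt        # still prints: version 1
git pull --quiet
cat notes.txt        # now prints: version 2
```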
Now, add any changes. There are several ways to do this depending on how many changes you make.
git add SINGLE_FILE_NAME #Adds a single file
git add SINGLE_FOLDER_NAME #Adds folder, all subfolders, and files
Note that you will want to check that you are adding the files you expect when you do this.
git status
This shows the names of all files staged to be committed. Review these closely. Do you really want to add 100 images from ggplot2 graphs in the output folder? Are you accidentally adding your API credentials? Double check your files. If you made a mistake, don’t commit. Use git reset HEAD to unstage the files, and then redo git add FILE to add only the correct files and folders.
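The check-and-unstage cycle might look like the following sketch. The file names and the fake key are placeholders. Note one assumption built into it: git reset HEAD needs at least one existing commit, so the sketch makes an initial commit first.

```shell
scratch=$(mktemp -d)
git init --quiet "$scratch/my-project"
cd "$scratch/my-project"
git config user.name "Example Student"
git config user.email "student@example.com"
echo "# My project" > README.md
git add README.md
git commit --quiet -m "Add README"   # git reset HEAD needs an existing commit

# New work: one script we want, one credentials file we do not
echo 'print("hello")' > script.R
echo 'API_KEY=not-a-real-key' > credentials.txt
git add script.R credentials.txt

git status --short      # both files are listed as staged (A)

# Unstage everything, then add back only the intended file
git reset --quiet HEAD
git add script.R
git status --short      # script.R is staged (A); credentials.txt is untracked (??)
```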
Once satisfied, commit the results:
git commit -m "Added initial script to do XYZ"
You may want to consider separate commits, each with its own message, when uploading multiple files rather than using the same commit for all files.
git add FILE1.R
git add FILE2.R
git commit -m "Added R scripts that call Twitter API and download tweets"
git add FILE3.R
git commit -m "Added R script to clean and graph tweet results"
git add FILE4.R
git commit -m "Built outline Shiny app of results"
git push
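One way to confirm that the separate commits were recorded as intended is to inspect the history afterwards. This sketch uses empty placeholder files in a scratch repository and skips the push, since no remote is set up here; the identity values are likewise placeholders.

```shell
scratch=$(mktemp -d)
git init --quiet "$scratch/my-project"
cd "$scratch/my-project"
git config user.name "Example Student"
git config user.email "student@example.com"

# Placeholder scripts standing in for FILE1.R ... FILE3.R
touch FILE1.R FILE2.R FILE3.R

git add FILE1.R FILE2.R
git commit --quiet -m "Added R scripts that call Twitter API and download tweets"
git add FILE3.R
git commit --quiet -m "Added R script to clean and graph tweet results"

git log --oneline                                   # one line per commit, newest first
git diff-tree --no-commit-id --name-only -r HEAD    # files in the latest commit only
```

Grouping related files per commit makes it easier to find, or undo, one specific change later.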
In most cases, however, you will want to not only commit, but also push the changes once you get something working. If you already have a working version of all these scripts and want to make minor dependent changes in each one, it may be appropriate to make all the changes and commits before pushing. That said, test your code before committing and pushing it. That way you can note in your message whether the code is working, e.g. git commit -m "Updating code, currently not working"
Once you have committed and pushed the changes, ensure you are up to date:
git status
Double check by visiting the URL for your repo. Is your latest change online?
If not, consider the following. Did you save your latest code? Was that code saved to your repo or elsewhere? If saved and in your git repo, make a minor change to the code (like a #comment). Make sure you see changes to the file when you run git status. Then repeat:
git add FILE
git commit -m "Trying again"
git push origin master #Assuming you are working on the master branch
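If you are unsure whether a commit actually reached the remote, you can count the commits your branch has that its upstream does not. The sketch below is one way to do that; it assumes the upstream was recorded with push -u, and it uses a local bare repository as a stand-in for the online repo. All names are placeholders.

```shell
scratch=$(mktemp -d)
git init --quiet --bare "$scratch/remote.git"   # stand-in for the online repo
git clone --quiet "$scratch/remote.git" "$scratch/my-project"
cd "$scratch/my-project"
git config user.name "Example Student"
git config user.email "student@example.com"

echo "first" > script.R
git add script.R
git commit --quiet -m "Add script"
git push --quiet -u origin HEAD        # -u records origin as the upstream

# Make a minor change and commit it, but do not push yet
echo "#comment" >> script.R
git commit --quiet -am "Trying again"

git status --short --branch            # the branch line reports "ahead 1"
unpushed_before=$(git rev-list --count '@{u}..HEAD')

git push --quiet
unpushed_after=$(git rev-list --count '@{u}..HEAD')
echo "Unpushed commits: $unpushed_before before the push, $unpushed_after after"
```

A count of zero after pushing suggests the remote has everything you committed locally.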
If you get an error, search Google for the error message. Someone has probably had it before and had it answered on Stack Overflow. A benefit of using Git in the Shell is that if you have a problem, dozens before you have had the same issue, asked about it, and received an answer from someone else.
If you would like to work on other types of code, such as Python, Bash, PostgreSQL, or even the .gitignore, you may want to use a text editor other than RStudio.
For text editing, most Mac or Linux machines will have the program Vim installed. Since Vim is often already installed, you may want to consider learning it. However, for ease of use, I would recommend installing the text editor Atom, which works on Mac, Linux, and PC.
Once installed, you can start a new script by simply typing atom name_of_your_new_script in the shell. You can edit an existing script by using atom name_of_script.
Changes to scripts can be committed as before using Git.
This work is licensed under the CC BY-NC 4.0 Creative Commons License.