The Overall Benefits of Using Git for Version Control

There are many benefits to using Git for version control, amoung them, having a record of your work, especially your code, at different points of time. This is particularly important as your code grows more complex. This allows the freedom to experiment and attempt to improve (and likely break) your code, while knowing you have a version of working code you can return to if you get stuck.

Beyond the freedom to experiment, version control has the added benefit of having a backup of your code from which you can access from any computer. If you somehow delete, corrupt, or simply cannot find where you stored your code on the computer, you can quickly reclone the entire repository with a few clicks and single command. If you are working on a different computer, you can access your code and easily share it with colleagues and other collaborators.

Storing your code online has other benefits, such as building a code portfolio. If you apply to tech related jobs or fellowships, they will often ask and otherwise search for your github profile to see examples of your code.

Github also has the benefits of collaboration and being able to work on teams both locally and remotely. Members of a team can each work on various parts of the code in tandem without fear of destroying the codebase, particularly if using the benefits of “branching” on Github.

Ways to Interact with Git: The GUI vs. Shell

It should be noted that we have thus far only scratched the service in using Git. There are two primary ways to interact with Git, using a GUI (Graphical User Interface) versus the Shell. One possible GUI approach is operating Git using the point and click options in R. Conversely, one can use the Shell (on Linux/Mac OSX the Unix Shell/Terminal, on Windows PC Powershell, GitBash or Cygwin ). Below, I discuss some benefits and limitations of the approaches.

The R-GUI for Git

Benefits of the GUI

Simple to use and operate
If using only RStudio, code and git can be easily written, run, and committed in the same environment

Limits of the GUI

Process does not translate well to multi-language (Python, Bash, SQL, HTML, CSS) repositories
Less help for resolving issues on a language-specific GUI for Git
Arguably more time consuming once you know Git using the Shell

Shell for Git

Benefits of the Shell

1. Same exact process for adding and updating files in any language
- e.g. R, Python, Bash, SQL, HTML, CSS, Java, C++, LaTex, Markdown, etc
- Adding/Removing, Committing Changes, Pushing/Pulling, Branching is the same
2. Commands are explicit and may be difficult to replicate in a GUI, particularly for advanced commands, e.g.
- Checking out a specific branch
- Pulling updates from another branch into a working branch
- Pushing changes of a particular brach

Some of these advanced steps are particularly important when collaborating on a team. Because collaborative work often involves one or more languages, Git from the shell is even more advantageous.

Limits of the Shell

Potentially higher learning curve
More challenging if you have not used the Shell before

Basics of Using Git with the Shell

Overview of Basic Git Steps

git clone LINK_TO_REPOSITORY
git add NAME_OF_FILE
git commit -m "Comment describing what file does or what change you made"
git push

In Detail with Other Helpful Commands

Before cloning the repository, you will want to navigate to your desired location on the computer. This is most likely whichever folder you store your code and notes for the class.

This can be achieved with the command prompt (these commands work on Linux/Mac OSX). To use the same commands on Windows, Cygwin can be used.

#Display current working directory
pwd

#List files commands
ls #List files in folder
ls -a #List all files/folders including hidden ones
ls FOLDER_NAME/ #List files inside FOLDER_NAME

#Change Directory
cd <NAME_OF_FOLDER_IN_PWD/FOLDER/FOLDER/ETC #Go to subfolder
cd .. #Go one folder up
cd ~/NAME_OF_FOLDER/FOLDER/ETC/ETC #Go back to home directory and go to folder from there

#Clear shell screen
clear

The above commands are some of the most basic and should help you navigate to the desired folder. Once there, clone your repository.

#CLONE YOUR REPOSITORY
git clone LINK_TO_REPOSITORY

This command will make a clone of the repository in the current working directory. Specifically, it will make a folder (using the name of your repository). This folder will contain all code and documents as seen in the repo online.

NOTE: If you clone the repository and try to reclone it in the same directory, you will get an error. Why? A folder already exists with that name. Solution, delete the folder or rename it.

To rename mv NAME_OF_REPO NAME_OF_REPO__OLD

CAUTION THE BELOW COMMAND PERMANTLY ERASES THE FOLDER

(DO NOT ENTER THE WRONG FOLDER NAME AS YOU WILL PERMANTLY ERASE IT)

To permantly remove: rm -rf NAME_OF_REPO

Once you have cloned your repo, you have to cd inside the repository folder to use Git. For example:

To get the current status of your repo, use the git status command. If you run this command before using cd to enter the folder, you will get this error: fatal: Not a git repository (or any of the parent directories): .git

Instead, do the following:

cd NAME_OF_REPO
git status

Which provides the output:

On branch master
Your branch is up-to-date with 'origin/master'.
nothing to commit, working directory clean

Once inside the repository, you are free to make changes as you see fit. This might include making documentation, creating R or Rmd scripts, Python scripts, Shell scripts, making folders for results, or anything else imaginable.

Once you make changes, you will want to add the files or folders, commit them, and push the results.

NOTE: If you are working on a team, another person’s code, or multiple computers, first attempt to pull to make sure you are on the most up to date version of the repo (and thereby avoid merge conflicts).

git pull

Now, add any changes. There are several ways to do this depending on how many changes you make.

git add SINGLE_FILE_NAME #Adds a single file

git add SINGLE_FOLDER_NAME #Adds folder, all subfolders, and files

Note, that you will want to check to see if you are adding the files you expect when you do this.

git status

Will show the names of all files staged to be commited. Review these closely. Do you really want to add 100 images from ggplot2 graphs in the output folder? Are you accidentally adding your API credentials? Double check your files. If you made a mistake, don’t commit. Use git reset HEAD and then redo the git add FILE to add the correct files and folders only.

Once satisfied, commit the results:

git commit -m "Added initial script to do XYZ"

You may want to consider seperate commit messages when uploading multiple files rather than use the same commit for all files.

git add FILE1.R
git add FILE2.R
git commit -m "Added R scripts that call Twitter API and download tweets"
git add FILE3.R
git commit -m "Added R script to clean and graph tweet results"
git add FILE4.R
git commit -m "Built outline Shiny app of results"
git push

In most cases, however, you will want to not only commit, but also push the changes once you get something working. If you already have a working version of all these scripts and want to make minor dependent changes in each one, it may be appropriate to make the changes and commits before pushing. This said, test your code before committing and pushing it. In this way you can note if the code is working or not in your message, i.e. git commit -m "Updating code, currently not working"

Once you have committed and pushed the changes, ensure you are up to date:

git status

Double check by visiting the URL for your repo. Is your latest change online?

If not, consider the following. Did you save your latest code? Was that code saved to your repo or elsewhere? If saved and in your git repo, make a minor change to the code (like a #comment). Make sure you see changes to the file when you do git status. Then repeat:

git add FILE
git commit -m "Trying again"
git push origin master #Assuming you are working on the master branch

If you get an error, Google search for the error. Someone probably has had it before and has answered it on StackOverflow. A benefit of using Git in Shell is that if you have a problem, dozens before you have had the same issue, asked about it, and received an answer from someone else.

Editing Code Outside of R

If you would like to work on other types of code, such as Python, Bash, PostgreSQL, or even the .gitignore, you may want to use a text editor other than RStudio.

For text editing, most Mac or Linux machines will have a program Vim installed. Since Vim is already often installed, you may want to consider learning it. However, for ease of use, I would recommend installing the text editor Atom which will work on Mac, Linux, and PC.

Once installed, you can start a new script by simply typing in bash atom name_of_your_new_script. You can edit an existing script by using atom name_of_script.

Changes to scripts can be commited as before using Git.

Introduction to Using Git in Bash

Joshua Gary Mausolf