What is Git and Version Control?

Today we have dev teams that can number in the hundreds, some working remotely from different countries, all working on a single code base and sometimes changing the same file. Could you imagine passing around a thumb drive with your changes? Or constantly having to copy in everyone else's new changes by hand? Or someone deleting a critical file? Or someone changing some very complex code and breaking it, with no earlier version to go back to, forcing the team to redo the same work all over again? What a nightmare.

Enter version control. Version control software lets many different people work on the same files easily. It lets us put our changes into the code base (pushing and merging), get the updates and changes that others have made (pulling or fetching), and make copies of the entire code base (branches) to work on instead of working on the "master" code.

As an aside, Git is not the only version control software, but it is the only one I've used, and it is the only one I am covering here.

Local Vs Remote

The first important thing to understand about Git is that there is a remote copy of the code in a remote repository and a local copy of the code in a local repository on your computer. The two are connected by a network link, which for popular hosting services like GitHub is just a URL. When you create a new repository on a Git service (GitHub, Bitbucket, etc.), you set up this link by running a few initializing commands in your project folder:

Initialize the Git repo, which creates a hidden .git directory in the current folder:

$ git init

Stage all changes (in a fresh repo, that means every file in the directory):

$ git add .

Set the remote repository URL so Git knows where to push to and pull from:

$ git remote add origin https://urlhere/projectName.git
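Those three commands set up the link, but nothing has reached the remote yet: the staged files still need a commit, and the commit still needs a push (both covered below). Here is a minimal end-to-end sketch of a first setup. Since there is no real GitHub URL to push to, it assumes a bare repository in a temporary folder stands in for the remote (a bare repo is a repository without working files, which is what hosting services keep on their side). The file `index.html`, the identity values, and the temp folders are all hypothetical.

```shell
# Stand-in "remote" (assumption): a bare repo in a temp folder instead of https://urlhere/projectName.git
remote=$(mktemp -d)
git init -q --bare "$remote"

# The project folder "on your computer"
project=$(mktemp -d)
cd "$project"
echo "<h1>Hello</h1>" > index.html          # hypothetical starter file

git init -q
git symbolic-ref HEAD refs/heads/master     # name the first branch master, as this article does
git config user.name "Demo User"            # hypothetical local identity so commits work on a fresh machine
git config user.email "demo@example.com"

git add .
git remote add origin "$remote"
git commit -q -m "Initial commit"
git push -q -u origin master                # first push: send master and start tracking origin/master
```

Newer Git versions and hosting services often name the first branch main instead of master; the `symbolic-ref` line just pins the name so the rest of the sketch matches the article.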

When you are making changes on your computer, you are working in the local repository. You want the remote repository to stay up to date, so update it often with your changes. We do this with the push command, which sends our committed changes through the network link to the remote repository. Always remember that the remote repository has no idea what you've done or changed until you push!

Branching

I have an application all written. The code is my master copy. I place that into a repository to store it. I could work directly on the master copy, but that is dangerous: if I break or delete something important, I no longer have a working version of the application. Also, my application is big, and I have a team who all work on it too. How do we collaborate?

What I need is a way for everyone on my team to make their own copy of the master and make their changes on that. Making a copy of the master to work on is called creating a branch, or branching:

$ git branch myBranchName && git checkout myBranchName

OR
$ git checkout -b myBranchNameHere

The above do the exact same thing; the second just uses the -b flag on checkout, which says "create this new branch and switch to it." It is important to note that a new branch starts from whatever branch you are currently on. Git has a "position" within the "git tree," much like the command line: you can move around to different directories in the CLI, and you can move around to different branches in Git. If my current position is the Master branch, making a new branch copies the master branch in its current committed state. (Under the hood Git doesn't duplicate any files; a branch is just a lightweight pointer to a commit, which is why creating one is instant.)

If I create a new branch called NewBranch1, make and commit some changes, and then create another new branch from there, that new branch will include the changes I made, because it is branching off of NewBranch1 rather than Master.
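A small sketch of that, run in a throwaway repository so it's safe to try. The file names, the identity values, and the second branch name NewBranch2 are made up for illustration. NewBranch2 is created while we're on NewBranch1, so it starts with NewBranch1's commit, while master never sees it.

```shell
repo=$(mktemp -d)                          # throwaway repo in a temp folder
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # match the article's branch name
git config user.name "Demo User"           # hypothetical identity
git config user.email "demo@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit on master"

git checkout -q -b NewBranch1              # branch off master
echo "new feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature on NewBranch1"

git checkout -q -b NewBranch2              # branch off NewBranch1, not master
ls                                         # lists app.txt and feature.txt (inherited from NewBranch1)

git checkout -q master
ls                                         # lists only app.txt
```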

This is extremely useful, because you'll often find yourself wanting to "try something" that could end up breaking or destroying things. In these cases, you just check out a new branch and do your mad scientist experiment. If it all goes to hell, you delete the branch and go back to pre-apocalypse. If it works, you merge that change into your original branch and keep it. But how do we do that?
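Here is a sketch of the "mad scientist" workflow, assuming a throwaway repository and a hypothetical branch called `experiment`. Committing on the experiment branch and then deleting it leaves master exactly as it was. One detail worth knowing: `git branch -d` refuses to delete a branch that hasn't been merged, so throwing away an unmerged experiment takes the capital `-D`.

```shell
repo=$(mktemp -d)                          # throwaway repo in a temp folder
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"           # hypothetical identity
git config user.email "demo@example.com"

echo "stable code" > app.txt
git add app.txt
git commit -q -m "Working version"

git checkout -q -b experiment              # try something risky on its own branch
echo "wild idea" > app.txt
git commit -q -am "Risky experiment"       # -a stages already-tracked files before committing

git checkout -q master                     # back to the pre-apocalypse version
git branch -D experiment                   # -D: delete even though it was never merged
cat app.txt                                # prints: stable code
```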

Adding/Staging and Committing

Remember my point above about deleting something you needed or breaking something? Git doesn't record your changes in the branch's history until you tell it to. When you edit a file, Git notices that it differs from the last commit on the branch. You change var myCar from 'honda' to 'subaru' and Git sees the difference, but the last committed version (with 'honda') is still safely stored. Once we decide we want those changes (or at any point, really) we can type git status and get a list of everything that has changed. In the CLI, red files are changes that haven't been staged (including brand-new files Git isn't tracking yet), while green files are changes that have been staged. I like to think of staging in the military sense, where a staging area is where troops get ready before moving out. It's the same with Git: we stage the changes we want to commit. How do we stage things? With the add command:

$ git add filenameGoesHere/with/path/etc

Then if we were to check status:

$ git status

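We see something like this (the first section shows up in green, the second in red):

Changes to be committed:
        modified:   filenameGoesHere/with/path/etc

Changes not staged for commit:
        modified:   otherfile/we/didntAdd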

The green text indicates that the file has been added and is ready to be committed (it has been "staged"), while the red text indicates that the file has been modified but has not been staged for the next commit.

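We can keep adding specific changed files this way, one path at a time, and nothing else gets staged. Paths are relative to the project folder, so there's no leading slash:

$ git add components/homepage/home.css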
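Colors don't survive copy and paste, so when you want the same information as plain text, `git status --short` (or the script-friendly `--porcelain`) prints two status columns per file: the left column is the staging area and the right column is the working directory. This sketch recreates the status example above in a throwaway repository, using the article's placeholder paths; the identity values are hypothetical.

```shell
repo=$(mktemp -d)                          # throwaway repo in a temp folder
cd "$repo"
git init -q
git config user.name "Demo User"           # hypothetical identity
git config user.email "demo@example.com"

mkdir -p filenameGoesHere/with/path otherfile/we
echo "v1" > filenameGoesHere/with/path/etc
echo "v1" > otherfile/we/didntAdd
git add .
git commit -q -m "Initial commit"

# Modify both files, but stage only one of them
echo "v2" > filenameGoesHere/with/path/etc
echo "v2" > otherfile/we/didntAdd
git add filenameGoesHere/with/path/etc

git status --short
# M  filenameGoesHere/with/path/etc   <- M in the left column: staged (green)
#  M otherfile/we/didntAdd            <- M in the right column: not staged (red)
```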

Once we've added the files we want, we can type git status again and see that the green files are staged while the red ones are not. Next we want to commit our changes. Committing is just like it sounds: we are committing the staged changes to the branch, basically saying "save these changes to this branch." Remember, this only saves the changes to the branch you're currently on. Every commit also needs a message to go along with it (if you leave off -m, Git opens a text editor for you to write one), so don't just write "fixed bug" or something equally useless! There are a lot of best practices for commit messages, so research them a bit to understand the best ways to write them. Remember: good devs write code that is easy to read and understand. That doesn't mean just the code; it means writing comments and commit messages that clearly explain things, and naming branches/variables/classes/functions descriptively.

What does a commit look like?

$ git commit -m "commit message goes inside double quotes"

The -m flag stands for "message." There are several other flags I won't go into, but you should check them out! (https://git-scm.com/docs/git-commit)
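One -m trick worth knowing: you can pass it more than once, and Git joins the messages as separate paragraphs, so the first becomes a short summary line and the second a longer explanation. A sketch in a throwaway repository; the file, identity, and message text are made up.

```shell
repo=$(mktemp -d)                          # throwaway repo in a temp folder
cd "$repo"
git init -q
git config user.name "Demo User"           # hypothetical identity
git config user.email "demo@example.com"

echo ".button { cursor: pointer; }" > home.css
git add home.css
git commit -q -m "Fix unclickable home page button" \
              -m "The button had no pointer cursor, so users didn't know it was clickable."

git log -1 --format=%s                     # subject: the first -m
git log -1 --format=%b                     # body: the second -m
```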

Merging

Say I have a branch off of master called fix-broken-button. The branch fix-broken-button is just a copy of the master code. I then go about making my changes to do what I set out to do, which is fix the broken button (just like everything else, we want descriptive names for our branches). While working on this button, I get an idea for a fancy new approach to the function the button controls, so I check out another branch called button-onclick-experiment. So now I'm on button-onclick-experiment, which is a copy of fix-broken-button (including any changes I've committed there), which in turn is a copy of master.

I make my changes to the function, decide I like them better, and want to keep them. To bring them into fix-broken-button, I check out the branch I want to add the changes to, then use the merge command with the name of the branch that has the changes I want. Step by step:

First I make changes to my project, then I add all of them:

$ git add .

I know I want to keep my changes, so I commit them:

$ git commit -m "made some great changes and found a new algorithm that will solve world hunger"

Next I check out the branch I want to move my changes into, and then use the merge command with the name of the branch I want to bring in:

$ git checkout fix-broken-button

$ git merge button-onclick-experiment

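Now, as long as there aren't any merge conflicts, that code will have been added to my fix-broken-button branch. Cool! An important thing to note about merge is that you don't need a separate commit afterward: if the target branch hasn't changed since you branched off it, Git simply "fast-forwards" it to include the new commits, and otherwise it creates a merge commit for you automatically. But remember: this is all still local and hasn't been added to my remote copy of the code (the remote repository). So let's cover pushing to a remote.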
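Here is the whole story above as one script, run in a throwaway repository so it's safe to try. The file `button.js`, its contents, and the identity values are hypothetical. Because nothing new landed on fix-broken-button after we branched off it, this merge is a fast-forward: Git just moves the branch pointer instead of creating a separate merge commit. If you'd rather always record a merge commit, `git merge --no-ff` forces one.

```shell
repo=$(mktemp -d)                              # throwaway repo in a temp folder
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"               # hypothetical identity
git config user.email "demo@example.com"

echo "function onClick() { /* broken */ }" > button.js
git add .
git commit -q -m "Initial commit"

git checkout -q -b fix-broken-button           # copy of master
echo "function onClick() { fixed(); }" > button.js
git commit -q -am "Fix broken button"

git checkout -q -b button-onclick-experiment   # copy of fix-broken-button
echo "function onClick() { fancyNewApproach(); }" > button.js
git add .
git commit -q -m "Try a new approach for the button's onclick handler"

git checkout -q fix-broken-button              # the branch that receives the changes
git merge -q button-onclick-experiment         # bring the experiment's commits in
cat button.js                                  # now shows the fancyNewApproach version
```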

Pushing, Pulling

From the above examples, here is where we are now:

local repository:

Master
  |---- fix-broken-button
          |---- button-onclick-experiment (merged into fix-broken-button)

remote repository:

Master
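You don't have to draw this tree by hand: `git log --oneline --graph --all --decorate` prints a text version of your own repository, with branch names next to the commits they point to. The sketch below rebuilds the local side of the diagram in a throwaway repository using empty commits (the messages and identity are made up). It merges with `--no-ff` only so the branch shows up as a fork in the graph; a fast-forward merge would draw a straight line.

```shell
repo=$(mktemp -d)                          # throwaway repo in a temp folder
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"           # hypothetical identity
git config user.email "demo@example.com"

git commit -q --allow-empty -m "Master: working app"
git checkout -q -b fix-broken-button
git commit -q --allow-empty -m "Fix broken button"
git checkout -q -b button-onclick-experiment
git commit -q --allow-empty -m "Try new onclick approach"
git checkout -q fix-broken-button
git merge -q --no-ff --no-edit -m "Merge button-onclick-experiment" button-onclick-experiment

# Text drawing of every branch; --decorate labels commits with branch names
git log --oneline --graph --all --decorate
```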

Since we haven't pushed anything to the remote, as far as it's concerned nothing has changed. Let's push our fix-broken-button branch to it. We already merged the button-onclick-experiment branch into fix-broken-button, so it is ready to be sent out. We do this with the push command. Think of it like you are pushing changes out to the web copy.

Since we created the branch locally, the remote has no idea what we are talking about if we just try to push it: there is no corresponding branch there, and our local branch doesn't have an "upstream" (a remote branch it tracks) yet, so a plain git push doesn't know where to send it. The first time you push a branch to your remote, you set its upstream so that the remote branch and your local branch are linked. This is all fancy talk for telling your remote "there is a new branch, it's called [insert branch name], and you'll want to accept new pushes going to it." That way we can send changes to that branch on our remote while keeping our Master separate. A push that sets the upstream looks like this:

$ git push --set-upstream origin newBranchNameGoesHere

Here origin is the name of our remote repository (its URL was saved in the project's .git/config file when we ran git remote add, or automatically when we ran git clone) and the branch name is... well, yeah, the branch name. We can shorten --set-upstream to the -u flag. So in our example above, it would be either of these:

$ git push --set-upstream origin fix-broken-button

$ git push -u origin fix-broken-button

The above two do the exact same thing. After we have done this for our branch, in the future we can just do $ git push and it will automatically push to our branch on the remote repository.
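A sketch of the first push of a new branch, again assuming a bare repository in a temp folder stands in for GitHub. After `-u`, `git rev-parse --abbrev-ref @{u}` shows which remote branch your local branch now tracks, and that link is what lets a plain `git push` work afterward. Commit messages and identity values are made up.

```shell
remote=$(mktemp -d)
git init -q --bare "$remote"                   # stand-in for the hosted repository (assumption)

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"               # hypothetical identity
git config user.email "demo@example.com"
git remote add origin "$remote"

git commit -q --allow-empty -m "Initial commit"
git push -q -u origin master

git checkout -q -b fix-broken-button
git commit -q --allow-empty -m "Fix broken button"
git push -q -u origin fix-broken-button        # first push: create the branch on the remote and track it

git rev-parse --abbrev-ref '@{u}'              # prints: origin/fix-broken-button

git commit -q --allow-empty -m "Polish the fix"
git push -q                                    # later pushes need no extra arguments
```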

Now say that our dev team wants to work on our branch. Now that it's on the remote repository, they can get it by running the pull command from their project directory:

$ git pull

This command basically says "update my local repository with what's on the remote repository": it fetches everything new from the remote, then merges in the updates for the branch you're currently on. If my dev team runs this, they'll get the Master branch in its current state, plus the new fix-broken-button branch we just pushed. They can then check out the branch in their local repository:

$ git checkout fix-broken-button

Switched to branch 'fix-broken-button'

Your branch is up-to-date with 'origin/fix-broken-button'.

We can see that the message tells us which branch we switched to and that we're up to date. If we weren't, it would tell us how many commits behind the remote our version (aahhhh, see, there's the word!) of the branch is.
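To see this from a teammate's side, the sketch below fakes two people with two local repositories that share one bare repository in a temp folder (an assumption standing in for GitHub; names and identities are made up). The teammate's `git checkout fix-broken-button` works even though they never created that branch: Git finds `origin/fix-broken-button` and makes a local branch that tracks it. `git status -sb` then reports how far ahead of or behind the remote the branch is.

```shell
remote=$(mktemp -d)
git init -q --bare "$remote"                   # stand-in for GitHub (assumption)
git --git-dir="$remote" symbolic-ref HEAD refs/heads/master

# You: start the project, then push a new branch
me=$(mktemp -d)
cd "$me"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Me"                      # hypothetical identity
git config user.email "me@example.com"
git remote add origin "$remote"
git commit -q --allow-empty -m "Initial commit"
git push -q -u origin master
git checkout -q -b fix-broken-button
git commit -q --allow-empty -m "Fix broken button"
git push -q -u origin fix-broken-button

# Teammate: get a copy of the remote, then check out your branch
teammate=$(mktemp -d)
git clone -q "$remote" "$teammate"
cd "$teammate"
git checkout fix-broken-button                 # creates a local branch tracking origin/fix-broken-button
git status -sb                                 # first line: ## fix-broken-button...origin/fix-broken-button

# You push one more commit; the teammate fetches and is now one commit behind
cd "$me"
git commit -q --allow-empty -m "Polish the fix"
git push -q
cd "$teammate"
git fetch -q
git status -sb                                 # first line now ends with [behind 1]
git pull -q                                    # catch up
```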

As you can probably see by now, it's called version control because you can have many, many different versions of the same thing all living in harmony, thanks to the controls in place to keep them separate. If you're anything like me, you are terrified of accidentally pushing to Master, wiping out things with your crappy changes, and ruining everything! Well, that's easy to do in projects you control, but it generally isn't possible in the shared projects you'll be working on. That's thanks to the pull request.

Pull Requests

For someone to get a change into your repository, they submit a pull request to you. Basically it's them saying "I have this code I changed in your repository; do you want to add it?" From there you can look it over and decide whether to add it. Pull requests often target an integration branch, like a "Development" branch, rather than Master directly.

To contribute to someone else's project that you aren't added to as a contributor (think of it like being a member of that project's dev team), you must fork their repository. For our purposes, forking is very similar to cloning, with one key difference. A clone is a copy of the project on your computer, and anyone can clone a public repository, but you can only push from your clone back to the original if the owner has given you write access (otherwise random people could change the project without the owner's consent). A fork is a copy of the repository under your own account, so you can push to it freely.

When we fork their repository, the copy lands in our GitHub/Bitbucket/GitLab/etc. account, and we can then clone it (because it's our copy now, not the original project) and push to it all we want. Once our changes are done and pushed to our fork, we submit a pull request to the original project. I have always done this part through the web interface of whatever hosting service I'm using (e.g. GitHub), though there are command-line tools for it too, such as GitHub's gh pr create. The owner will look at the changes in our pull request and decide whether they want them, don't want them, or want them with certain changes. This is the basis of how open source software works.
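The fork itself happens on the website, but everything after it is ordinary Git. In the sketch below, two bare repositories in temp folders stand in for the original project and your fork (an assumption; `git clone --bare` plays the part of the Fork button), and the branch name `fix-typo`, file contents, and identities are hypothetical. Adding the original as a second remote, conventionally named `upstream`, lets you pick up the owner's latest work, while your pushes go only to your fork.

```shell
# The original project (assumption: a bare repo in a temp folder)
original=$(mktemp -d)
git init -q --bare "$original"
git --git-dir="$original" symbolic-ref HEAD refs/heads/master
seed=$(mktemp -d)                              # the owner's own working copy
cd "$seed"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Owner"                   # hypothetical identity
git config user.email "owner@example.com"
echo "Helo world" > README
git add README
git commit -q -m "Initial commit"
git push -q "$original" master

# "Fork": a copy of the original under your own account
fork=$(mktemp -d)
git clone -q --bare "$original" "$fork"

# Clone YOUR fork, and remember the original as "upstream"
work=$(mktemp -d)
git clone -q "$fork" "$work"
cd "$work"
git config user.name "Contributor"             # hypothetical identity
git config user.email "contributor@example.com"
git remote add upstream "$original"

git checkout -q -b fix-typo
echo "Hello world" > README
git commit -q -am "Fix typo in README"
git push -q -u origin fix-typo                 # origin is your fork, not the original

git fetch -q upstream                          # later: pick up the owner's new work
```

From here, the pull request from your fork's fix-typo branch to the original project is opened on the hosting site.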

This article just scratches the surface of many of these topics, but my goal was to provide a solid base of understanding for Git and version control. I hope I have helped, and if you like how I write and want me to cover something, just leave a comment or send me an email.

Thanks for reading and stay tuned for more!

// Corey

Speak Plainly

Untangling the web using the simplest language possible.