I am a data scientist who programs in R, but am relatively new to using version control (yes, I know!) and I am still learning lots about it.

FYI: I use BitBucket and Sourcetree

I understand the general idea behind it - that you commit new updates / features / etc. - and that it acts as a safety net where you can revive features/code that would have otherwise been lost/overwritten, or even reverse your reversals if you decide you were in fact happy with the new code you'd written.

Today I realised that I will need to completely rewrite an R script of mine (I built it myself; there are no other collaborators) because the requirements have changed significantly. Rather than making many changes to the existing code (we're talking around 75%), it would probably be better and more efficient to rewrite it in its entirety.

As such, I would like to know the following: what is considered best practice in this case?

Should I create a new branch?

I have only ever used the main/master branch in git (I told you I was new to VCSs). I was thinking that I could create a new branch, delete the content of my existing script, and begin rewriting it in the same file, in order to preserve the same filename/history when I push the changes to my git repository.

My understanding is that my new changes are branched off of / away from the master branch and, if I am happy, I can/should merge them back into the master branch at the point where I am happy/confident that the changes are good to go - is that correct?

Or should I do something else?

If it is the former (creating a new branch), what might such a new branch typically be called?

I know that I'm not the only person to have ever needed to do a complete code rewrite/rebuild, so would appreciate some advice here.

MusTheDataGuy

1 Answer


One of the big advantages of branches in the business world is that they let you work on a new feature while still being able to go back to the master version and make updates there if necessary, without losing your progress. For example, suppose you are rewriting the home page of your website and a critical bug that prevents users from signing in is discovered. With a branch, you can do the following:

  1. Commit the current work you have on the new homepage to your branch.
  2. Switch back to the master branch.
  3. Fix the bug and commit the fix to master, to get the fix out to the users quickly.
  4. Switch back to your new homepage branch and resume working on the rewrite exactly where you left off.
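On the command line, the four steps above might look roughly like the sketch below. It runs in a throwaway repository so nothing real is touched. The file name `analysis.R`, the branch name `rewrite-analysis`, the commit messages, and the placeholder identity are all hypothetical and only there for illustration; Sourcetree offers the same operations through its interface.

```shell
#!/bin/sh
set -eu

# Throwaway repository so the sketch never touches real work.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # use the branch name from this thread
git config user.name "Example User"       # placeholder identity (assumption)
git config user.email "user@example.com"

# Starting point: one script committed on master (hypothetical file).
echo 'old analysis' > analysis.R
git add analysis.R
git commit -q -m "Original script"

# Begin the rewrite on its own branch (hypothetical branch name).
git checkout -q -b rewrite-analysis
echo 'new analysis (work in progress)' > analysis.R

# 1. Commit the work in progress to the branch.
git commit -q -am "WIP: rewrite analysis"

# 2. Switch back to master; the working copy shows the old script again.
git checkout -q master

# 3. Fix the bug and commit the fix to master.
echo 'old analysis, bug fixed' > analysis.R
git commit -q -am "Fix urgent bug"

# 4. Return to the rewrite branch; the unfinished rewrite is as it was left.
git checkout -q rewrite-analysis
cat analysis.R
```

Step 1 matters here: committing the unfinished work before switching keeps the half-written rewrite safely on its branch rather than carrying it along as uncommitted changes.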

"My understanding is that my new changes are branched off of / away from the master branch and, if I am happy, I can/should merge them back into the master branch at the point where I am happy/confident that the changes are good to go - is that correct?"

You are quite correct. This is especially useful if multiple people code on this project, as a rewrite may be disruptive to their work. Branches give you more control over when a feature gets added to the main code base. They also allow your coworkers to more easily test your branch/feature separately.
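As a rough sketch of that branch-then-merge flow, again in a throwaway repository: the file name `analysis.R`, the branch name `rewrite-analysis`, and the commit messages are hypothetical, and `--no-ff` is just one option (it records an explicit merge commit; a plain `git merge` would also work).

```shell
#!/bin/sh
set -eu

# Throwaway repository for the demonstration.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity (assumption)
git config user.email "user@example.com"

# Current version of the script on master (hypothetical file).
echo 'old analysis' > analysis.R
git add analysis.R
git commit -q -m "Original script"

# Do the rewrite on a separate branch.
git checkout -q -b rewrite-analysis
echo 'new analysis' > analysis.R
git commit -q -am "Rewrite analysis from scratch"

# master is unaffected until the merge happens.
git checkout -q master
before_merge=$(cat analysis.R)

# Once the rewrite is good to go, merge it back into master.
git merge -q --no-ff -m "Merge rewrite-analysis into master" rewrite-analysis
after_merge=$(cat analysis.R)

echo "before merge: $before_merge"
echo "after merge:  $after_merge"
```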

For a solo project where the rewrite affects only one file and has limited impact on the rest of the application, a branch may be overkill. Using a new branch causes no harm, but a standard commit to master still maintains the file history, and it requires less time and less knowledge of git.
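To illustrate the point about history, here is a minimal sketch in which the rewrite is committed straight to master and the old version is still recoverable afterwards. The file name `analysis.R` and the commit messages are hypothetical.

```shell
#!/bin/sh
set -eu

# Throwaway repository for the demonstration.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity (assumption)
git config user.email "user@example.com"

# Original script, then the full rewrite in the same file, both on master.
echo 'old analysis' > analysis.R
git add analysis.R
git commit -q -m "Original script"

echo 'new analysis' > analysis.R
git commit -q -am "Rewrite analysis from scratch"

# The file's history lists both commits...
git log --oneline -- analysis.R

# ...and the pre-rewrite content can still be read from the previous commit.
git show HEAD~1:analysis.R
```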

If your rewrite causes changes to many files, could interrupt the workflow of coworkers, or if you might need to maintain the current version of the script in the meantime, then you should consider using a branch.

"If it is the former (creating a new branch), what might such a new branch typically be called?"

Typically a new branch is named after the feature that it adds. So in the previous example the branch might be called "html5-home-page-rewrite" (git branch names cannot contain spaces, so the words are usually joined with hyphens).
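If you are unsure whether git will accept a particular name, `git check-ref-format` can check it before you create the branch. The sketch below tests two candidate names, chosen here purely for illustration; a name containing spaces is rejected, while the hyphenated form passes.

```shell
#!/bin/sh
set -eu

# Returns success if git would accept the given branch name.
is_valid_branch_name() {
    git check-ref-format "refs/heads/$1"
}

# Candidate names are illustrative only.
for name in "HTML5 Home Page Rewrite" "html5-home-page-rewrite"; do
    if is_valid_branch_name "$name"; then
        echo "valid:   $name"
    else
        echo "invalid: $name"
    fi
done
```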

Nathanael
  • Thank you, Nathanael. To clarify that I understand, you're saying that if I need to actively use the current version of the script (which I do), then a branch is a good idea because it preserves the current version without causing disruption? Also, your example of the web page rewrite is good, and I mostly understand it, but I struggle to understand the part where you say `...with a branch you can just commit your change to the branch, switch back to master, fix the bug and commit it, then resume working on the rewrite in your branch.` What is meant by this exactly? – MusTheDataGuy Jul 11 '18 at 16:15
  • @MusTheDataGuy: If the current version of the script is going to be used and therefore there is a chance you need to make updates to it before your rewrite is complete, a branch would certainly be useful by minimizing the disruption. It allows you to commit your rewrite changes freely without concern of affecting the original version. – Nathanael Jul 11 '18 at 16:16
  • Right, I think I understand. Going by what you say, does that mean that I can work on a version on each branch simultaneously and one won't affect the other unless merged? Also, using the advice you have given, I expect that a branch named `rewrite` might be appropriate here, correct? – MusTheDataGuy Jul 11 '18 at 16:18
  • @MusTheDataGuy: I edited my post to make the website example more clear. Yes, branches are totally independent until merged. I would recommend a bit more detail than "rewrite" for the name, such as "Rewrite for feature X" or just "Feature X". – Nathanael Jul 11 '18 at 16:23
  • Fantastic, I appreciate your time and thank you for your help with this Nathanael. – MusTheDataGuy Jul 11 '18 at 16:24