Just find the right flow

Branching models and workflows for SCM systems
22
Jan

Source control management (SCM) systems are part of every software developer’s standard equipment. Despite their daily use, many helpful techniques are still largely unknown. There are quite a lot of SCM tools [1]. This article refers to Git, a freely available, distributed SCM tool that is highly relevant due to its widespread use. With only a few adaptations, many of the practices presented here can also be applied to other solutions, for example Subversion.

While TortoiseGit [2] is quite intuitive, SmartGit [3] requires a bit of training time. SmartGit provides an extensive feature set and supports professional requirements for handling SCM systems with well-structured wizards. Another charming detail is that SmartGit is developed in Germany. From my personal experience I can confidently say that both applications complement each other perfectly. TortoiseGit is a very good choice for developers in corporate environments, while configuration and build managers quickly feel at home with SmartGit. We will leave it at that and turn to the other terms that play an important role.

 

Terms

Most developers will certainly be familiar with the following explanations. However, it is always useful to take a more detailed look at the necessary context. Let’s start with a short explanation of what a revision is. Each commit records a change in the repository. In order to distinguish between the different changes, one generally speaks of revisions. In Subversion, the individual revisions are identified by a simple incrementing revision number. Git uses a slightly different strategy: here, the revision identifier is a hash. This adds some flexibility for branching and other activities. A revision is not to be confused with a release. A release is a defined set of functionality that is to be implemented and delivered. This means that a release consists of at least one, but possibly any number of revisions.
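To make the idea of a hash as revision identifier more tangible, the following Python sketch reproduces how Git derives an object ID in repositories that use the default SHA-1 object format: a short header with the object type and the content length is prepended to the content, and the result is hashed. The helper name git_object_id is hypothetical and not part of Git. Commits are hashed in the same manner, with the type commit and the commit text as content.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to an object of the given type.

    Assumption: the repository uses the default SHA-1 object format.
    Repositories created with the SHA-256 object format hash differently.
    """
    # Header layout: "<type> <length in bytes>" followed by a NUL byte.
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # ID of an empty file (blob); it is identical in every SHA-1 repository.
    print(git_object_id("blob", b""))
```

Because the identifier depends only on the content, every developer can create revisions locally without asking a central server for the next free number.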

Another aspect concerns the internal organization of a repository. Over the years, terms from the Apache Subversion environment have become established here. Even if they cannot be transferred exactly to the distributed world of systems like Git or Mercurial, there is still some overlap. In Subversion, for example, the main development branch is called trunk. In Git, the details are a little more complicated. The main development branch in the local copy of the repository is referred to as master. The server from which the repository was cloned is registered as a remote named origin, and its copy of the main branch appears locally as the remote-tracking branch origin/master. These conventions are default values that can be changed, but doing so is not recommended.

Selected revisions, e.g. releases, are of particular interest. You can bookmark them in order to find them quickly when they are needed. These bookmarks are called tags. In this context, the lines of development that split off from the main development branch, referred to as branches, should also be mentioned. It is, of course, possible to create any number of further branches from a branch and to merge them again later on. This unification process is called merging, which brings us to the next point.

 

Divide and rule

Even though splitting off a branch is a rather trivial process, it can make it almost impossible to merge the parts at a later point in time if it is carried out thoughtlessly. The standard procedure is shown in figure 1. A basic assumption in this figure is a release process that takes semantic versioning into account. The suffix SNAPSHOT marks an artifact that is still under development. The main development branch starts with version 1.0 and continues with version 1.1 after the release. If bugfixes are required for version 1.0 after development of version 1.1 has started, a branch is created from the revision that represents release 1.0.
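The version arithmetic behind this procedure can be sketched in a few lines of Python. The sketch assumes a Maven-style suffix written as -SNAPSHOT and version strings of the form major.minor or major.minor.bugfix; the function names are hypothetical and only serve as illustration.

```python
SNAPSHOT_SUFFIX = "-SNAPSHOT"  # assumption: Maven-style development suffix


def release_version(dev_version: str) -> str:
    """Turn a development version into its release, e.g. 1.0-SNAPSHOT -> 1.0."""
    if not dev_version.endswith(SNAPSHOT_SUFFIX):
        raise ValueError(f"not a development version: {dev_version!r}")
    return dev_version[: -len(SNAPSHOT_SUFFIX)]


def next_development_version(released: str) -> str:
    """Version the trunk continues with after a release, e.g. 1.0 -> 1.1-SNAPSHOT."""
    major, minor = released.split(".")[:2]
    return f"{int(major)}.{int(minor) + 1}{SNAPSHOT_SUFFIX}"


def next_bugfix_version(released: str) -> str:
    """Version produced on the bugfix branch, e.g. 1.0 -> 1.0.1 -> 1.0.2."""
    parts = [int(p) for p in released.split(".")]
    while len(parts) < 3:
        parts.append(0)  # a missing bugfix number counts as 0
    parts[2] += 1
    return ".".join(str(p) for p in parts)


if __name__ == "__main__":
    released = release_version("1.0-SNAPSHOT")
    print(released, next_development_version(released), next_bugfix_version(released))
```

The trunk only ever moves to the next minor version, while the bugfix branch only increments the third number, so the two lines of development cannot collide on a version number.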

The corrections are made in this branch; no new functionality is introduced in version 1.0. After a successful correction, the result is merged from the branch back into the trunk. This procedure prevents new functionality of version 1.1 from being transferred to version 1.0 unintentionally. If one follows the strategies presented here, many activities can be automated to a large extent.

In build environments with CI servers, new branches are not created until they become necessary. This reduces the administrative effort for the CI infrastructure. However, this restriction does not apply to developers: the easy creation and discarding of branches is one of Git’s strengths. Let us explore this thought a little further with the branching models. The branch-by-release model described in figure 1, for example, has a shortcoming: it does not take the parallel development of features into account. So-called feature branches become necessary when a feature cannot be completed within one release cycle. Current revisions can be merged from the trunk into the corresponding feature branch at any time, as required. A merge from the feature branch into the trunk takes place at the earliest for the release for which the feature is intended. In the case of complex and risky features, one tries to delay the merge as long as possible in order to avoid a rollback in case of problems and delays. A test run of such a merge can also be carried out locally via an integration branch to see how well the individual parts interact.

In order to keep an overview of all branches, it is also necessary to agree on a naming convention for them. A tried and tested approach is to name feature branches after the feature itself, e.g. FEATURE_DocumentParser.

Every release is tagged with its release number, e.g. Release_1.0. Any bugfix branches needed for existing releases are named after the version number, using only the major and minor parts. The third part of the version number, the bugfix version, should not be included in the branch name. An example of a bugfix branch name is version_1.0. This way, the branch can be reused for any number of future bugfixes, which keeps the repository transparent.
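A naming convention is easiest to keep when the names are generated rather than typed by hand. The following Python sketch derives the three kinds of names described above; the function names are hypothetical, and the accepted version format (major.minor with an optional bugfix part) is an assumption.

```python
import re

# Assumption: versions look like "1.0" or "1.0.3".
_VERSION = re.compile(r"^(\d+)\.(\d+)(?:\.\d+)?$")


def feature_branch(feature_name: str) -> str:
    """Branch name for a feature, e.g. FEATURE_DocumentParser."""
    return f"FEATURE_{feature_name}"


def release_tag(version: str) -> str:
    """Tag name for a release, e.g. Release_1.0."""
    return f"Release_{version}"


def bugfix_branch(version: str) -> str:
    """Bugfix branch name with major and minor only, e.g. 1.0.3 -> version_1.0."""
    match = _VERSION.match(version)
    if match is None:
        raise ValueError(f"unexpected version format: {version!r}")
    major, minor = match.groups()
    return f"version_{major}.{minor}"


if __name__ == "__main__":
    print(feature_branch("DocumentParser"), release_tag("1.0"), bugfix_branch("1.0.3"))
```

Since the bugfix part is dropped, the releases 1.0.1, 1.0.2 and 1.0.3 all map to the same branch version_1.0.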

 

Fig. 1: Branch-by-release model

 

 

Switching the workspace

The topic of branches also includes a method for saving the state of one’s own workspace in order to switch to another branch and make corrections there if necessary. Many developers prefer a somewhat cumbersome way: they keep several workspaces for different branches on their machines and switch between them as needed. This is practicable, but takes up a lot of disk space. If the source repository has been kept compact, switching branches within a single workspace takes very little time.

Git’s built-in stash function saves all uncommitted changes of the workspace as a stash entry and resets the workspace to the last commit. Then you can use the checkout command to switch to any branch, perform the necessary work there, and, after committing the changes, return to the original branch and restore the previous state from the stash. Git allows you to create multiple stash entries, so it is advisable to give them unique names. A combination of the form STASH.<Branchname>.<User-Workspace> is recommended.
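The sequence of stashing, switching and restoring can be sketched as a list of Git commands. The Python helper below only assembles the commands and does not execute them; the function names and the example branch and workspace names are hypothetical. It assumes a Git version that supports git stash push with the -m option, and that no other stash entry is created in between, because git stash pop restores the most recent entry.

```python
import shlex
from typing import List


def stash_name(branch: str, workspace: str) -> str:
    """Build a stash message following STASH.<Branchname>.<User-Workspace>."""
    return f"STASH.{branch}.{workspace}"


def switch_plan(current: str, target: str, workspace: str) -> List[List[str]]:
    """Return the Git commands, in order, for a temporary branch switch."""
    return [
        # 1. Save uncommitted changes under a recognizable name.
        ["git", "stash", "push", "-m", stash_name(current, workspace)],
        # 2. Switch to the branch that needs the correction.
        ["git", "checkout", target],
        # (edit and commit on the target branch here)
        # 3. Return to the original branch.
        ["git", "checkout", current],
        # 4. Restore the saved changes and drop the stash entry.
        ["git", "stash", "pop"],
    ]


if __name__ == "__main__":
    for command in switch_plan("FEATURE_DocumentParser", "version_1.0", "alice-laptop"):
        print(shlex.join(command))
```

The command git stash list shows all saved entries together with their messages, which is where the naming pattern pays off.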

 

Workflow

In addition to the various options for creating branches, there are also different workflow concepts. Especially in open source projects it is necessary to keep a large number of committers in check, since quality, qualification and style diverge greatly. There are different practices for how developers can contribute their changes to the codebase while its quality is ensured. The commercial tool IBM Synergy, for example, offers various workflows and corresponding role models out of the box. The simplest workflow, typical of centralized systems such as SVN, is the push-and-pull workflow. Each developer transfers their changes directly to the repository without any additional control instance. This carries the risk that the codebase drifts into a state in which it can no longer be compiled, which leads to a failed build on the corresponding CI server.

To avoid such errors, there is the so-called dictator workflow. All developers transfer their changes to a temporary repository, and a build manager (the dictator) then decides whether the changes are accepted. If a developer’s commit is accepted, the build manager transfers it to a reference repository from which the entire team obtains the changes. The pull request process offered by GitHub follows the same pattern. In large teams, such a procedure can overload the build manager. In order to mitigate this effect, the lieutenant-dictatorship workflow was devised.

It establishes an intermediate layer of so-called lieutenants. Each lieutenant preselects the commits submitted by a small group of developers and rejects them in the event of quality defects. Especially in Git there are also simple ways for developers to exchange code states without affecting the main development branch. One possibility is to create a branch that is discarded later. Alternatively, there is the more complex option of a detour via an additional exchange repository.
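The two-stage review can be modelled as a simple filter chain. The Python sketch below is a deliberately simplified model and not how any particular tool implements the workflow: the Commit fields, the team names and the review functions are hypothetical stand-ins for real repositories and real reviews.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Commit:
    author: str
    team: str            # hypothetical: developer group the author belongs to
    passes_checks: bool  # hypothetical stand-in for the outcome of a review


Review = Callable[[Commit], bool]


def preselect(commits: Iterable[Commit], lieutenants: Dict[str, Review]) -> List[Commit]:
    """Each lieutenant filters only the commits of their own developer group."""
    return [c for c in commits if lieutenants[c.team](c)]


def integrate(commits: Iterable[Commit], lieutenants: Dict[str, Review],
              dictator: Review) -> List[Commit]:
    """Return the commits that reach the reference repository."""
    # The dictator only reviews what the lieutenants let through.
    return [c for c in preselect(commits, lieutenants) if dictator(c)]


if __name__ == "__main__":
    submitted = [
        Commit("alice", "parser", True),
        Commit("bob", "parser", False),
        Commit("carol", "ui", True),
    ]
    by_team = {"parser": lambda c: c.passes_checks, "ui": lambda c: c.passes_checks}
    for commit in integrate(submitted, by_team, dictator=lambda c: True):
        print(commit.author)
```

In this model the dictator never sees the rejected commit, which is exactly the relief the intermediate layer is meant to provide.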

 

Cherry picking and other useful things

A very practical function for dealing with branches is cherry picking. Here, individual revisions from another branch can be selected and applied to the main development branch. If the revisions to be transferred are distributed across different branches, a separate cherry pick with the selected revisions must be performed for each branch.

Git does not place empty directories under version control. A small workaround is to create an empty file in the directory, named for example placeholder or .gitkeep. This is especially useful when initializing a repository in order to provide the development team with the necessary project structure.
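The placeholder workaround can be automated when a project skeleton is set up. The following Python sketch walks a directory tree and puts an empty file into every empty directory. The function name add_gitkeep is hypothetical, and .gitkeep is merely a common convention for the file name, not something Git itself evaluates.

```python
from pathlib import Path
from typing import List


def add_gitkeep(root: Path, filename: str = ".gitkeep") -> List[Path]:
    """Create an empty placeholder file in every empty directory below root."""
    created = []
    # Collect the directories first so that new files do not affect the walk.
    directories = sorted(p for p in root.rglob("*") if p.is_dir())
    for directory in directories:
        if ".git" in directory.relative_to(root).parts:
            continue  # never touch Git's own metadata directory
        if not any(directory.iterdir()):
            placeholder = directory / filename
            placeholder.touch()
            created.append(placeholder)
    return created


if __name__ == "__main__":
    # Example: add placeholders below the current directory.
    for path in add_gitkeep(Path(".")):
        print(path)
```

A directory that only contains other directories is not considered empty here, because the placeholder in its subdirectory is enough for Git to track the whole path.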

The ability to undo a revision is also very useful. If a transfer turns out to be faulty, the corresponding revision can be reversed afterwards with a revert, and a corrected variant can be provided. Besides deleting branches, this is one of the operations that is easier to realize with hashes as revision identifiers than with an auto-incremented number.

 

Conclusion

We have seen that SCM systems, and especially Git, offer some advantages that, if known, have a very positive effect on productivity. The whole topic is certainly far more extensive than an article like this can cover. Nevertheless, the explanations given here have hopefully sparked your curiosity about the possibilities of source control management systems. These tools offer much more than pure data silos.

 


Sources

[1] Comparison Version Control: https://en.wikipedia.org/wiki/Comparison_of_version_control_software
[2] TortoiseGit: https://tortoisegit.org
[3] SmartGit: https://www.syntevo.com/smartgit/
