Unit tests, static code analysis, generating documentation, and providing versioned build artifacts are some of the subtasks of common continuous integration systems. The stages covered here are:
– Continuous Integration
  – compile source code
  – dynamic software tests (unit tests, TDD), incl. Lint
  – static software tests
– acceptance testing (Behaviour Driven Development, BDD)
– load testing
– explorative testing
Unit tests are one of the most important cornerstones of automated software testing. If dev teams follow the principles of Test-Driven Development (TDD), these tests are written iteratively as part of the development process. In TDD, tests are not written after the code has been implemented, as was usual in waterfall and V-model development; they are written first. A requirement is implemented only after its test exists, and work continues until the unit test for that requirement no longer fails.
Developing unit tests and units works like this: in TDD you work in iterations, similar to those you know from Scrum, only much shorter. Unit tests and the units under test are developed in parallel in small, repeated micro-iterations of preferably no more than a few minutes. Each iteration usually consists of three steps. First, a test for the desired, correct behaviour is written; naturally, it fails at the beginning. Second, the unit code is adapted with the least possible effort until the test no longer fails. Third, the code just implemented is refactored immediately, for example to eliminate duplicated code or to introduce abstractions where necessary. The risk in doing so is small, because you can check that the code still works by re-running the corresponding tests at any time.
These steps are repeated until the desired function of the unit is fully implemented. Developing code this way facilitates incremental, evolutionary design and improvement of software. Unit tests help ensure that the requirements they cover are still met after each improvement.
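The three-step cycle can be illustrated with a deliberately tiny example. The sketch below uses Python's built-in unittest module only to stay self-contained; in a PHP project the same tests would be written as a PHPUnit test case. The function slugify and its requirements are hypothetical and serve purely as an illustration.

```python
import re
import unittest


def slugify(title):
    """Hypothetical unit under test: turn a title into a URL slug.

    This is the state after step 3 (refactor). The first version that
    made the tests pass may have been cruder; the tests below are what
    made it safe to clean up.
    """
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


class SlugifyTest(unittest.TestCase):
    # Step 1: each test was written first and failed until the unit
    # was adapted in step 2.
    def test_lowercases_and_joins_words(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_drops_punctuation(self):
        self.assertEqual(slugify("CI, CD & DevOps!"), "ci-cd-devops")

    def test_empty_title_gives_empty_slug(self):
        self.assertEqual(slugify(""), "")
```

Saved as a module, the tests can be run with python -m unittest. Each new requirement would start with one more failing test method before slugify is touched again.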
To set up a Continuous Integration System, it is best to have devs and ops work together. As a basis for the following explanations we assume Jenkins was set up according to the necessary steps for Continuous Integration.
Continuous Integration for PHP
In PHP environments, the CI subtask of compiling source code mentioned above is omitted. Instead, run a lint check (php -l) to catch syntax errors. For dynamic software testing we use PHPUnit, in combination with a corresponding Jenkins plugin that reports code coverage and visualizes the results. Static software tests check coding style and measure complexity and coupling. Tools you can use for this include PHP_CodeSniffer, PHP_Depend, PHPMD and PHPCPD. Keep in mind that, as with PHPUnit, these tools only help if Jenkins is provided with their results, including history and trend; this is where the corresponding Jenkins plugins are useful. The plugins can also be configured with threshold values that the results must comply with, marking a build as failed if, for example, too many formatting errors are found. Writing documentation tends to be forgotten in many projects, so it is advisable to have docs generated automatically within the CI chain. phpDox is a nice solution for this, as it can integrate the results of static code analysis into the documentation.
Through a shared understanding of this CI process and joint work of the developer and operations teams, trust in the deployed software will increase not just among the developers but also among the operators, who are responsible for running the software in production. Part one of the CI/CD chain can be covered by Jenkins and is primarily run and used by developers.
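To make the idea of threshold values concrete, here is a minimal sketch of the kind of rule such a plugin applies. It is not the logic of any particular Jenkins plugin. It assumes the static analysis tool has written its findings in Checkstyle XML (a report format PHP_CodeSniffer can produce); the sample report, the file names and the limits are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample in Checkstyle XML; file names and messages are invented.
SAMPLE_REPORT = """<checkstyle>
  <file name="src/Cart.php">
    <error line="12" severity="error" message="Missing doc comment"/>
    <error line="40" severity="warning" message="Line exceeds 120 characters"/>
  </file>
  <file name="src/Order.php">
    <error line="7" severity="warning" message="Unused variable"/>
  </file>
</checkstyle>
"""


def count_by_severity(report_xml):
    """Count <error> entries per severity across all files."""
    counts = {}
    root = ET.fromstring(report_xml)
    for error in root.iter("error"):
        severity = error.get("severity", "error")
        counts[severity] = counts.get(severity, 0) + 1
    return counts


def build_passes(counts, max_errors=0, max_warnings=5):
    """Hypothetical threshold rule: the build fails once a limit is exceeded."""
    return (counts.get("error", 0) <= max_errors
            and counts.get("warning", 0) <= max_warnings)


counts = count_by_severity(SAMPLE_REPORT)
print(counts, "PASS" if build_passes(counts) else "FAIL")
```

Tracking the counts per build, rather than only the pass/fail verdict, is what provides the history and trend mentioned above.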
We already executed some automated tests in step one of the Continuous Delivery process, the step called Continuous Integration: unit tests, also called white box tests. In the next step of the Continuous Delivery process, we execute black box tests via GUI or API. The goal of this step is validation of our software by stakeholders. This is where acceptance tests differ fundamentally from unit tests, which are a tool for developers with the objectives described above. Unit tests remain invisible to stakeholders, who validate features on the basis of acceptance tests. If software requirements are formulated as text following the Behaviour-Driven Development (BDD) approach, they can be turned into automated acceptance tests. BDD is therefore one way to implement acceptance tests for stakeholders. However, acceptance tests are not synonymous with GUI-based tests; the BDD approach can be executed without a GUI. Codeception’s approach to implementing acceptance tests is geared towards BDD, so it makes test results readable for stakeholders. Behat/Mink, on the other hand, offers a pure BDD approach, including implementation.
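How textual requirements can become executable tests is easiest to see in a toy version. The following sketch is not how Behat or Codeception are implemented; it only shows the principle of matching Given/When/Then sentences against registered step functions. The cart scenario and all names in it are hypothetical.

```python
import re

# Registry of (compiled pattern, handler) pairs, filled by @step.
STEPS = []


def step(pattern):
    """Register a function as the implementation of a textual step."""
    def register(func):
        STEPS.append((re.compile(pattern), func))
        return func
    return register


@step(r"an empty cart")
def empty_cart(ctx):
    ctx["cart"] = []


@step(r'I add "(.+)" to the cart')
def add_item(ctx, name):
    ctx["cart"].append(name)


@step(r"the cart contains (\d+) items?")
def cart_size(ctx, count):
    assert len(ctx["cart"]) == int(count), ctx["cart"]


def run_scenario(text):
    """Execute each Given/When/Then line against the registered steps."""
    ctx = {}
    for line in text.strip().splitlines():
        # Drop the leading keyword; it is only there for readability.
        sentence = re.sub(r"^\s*(Given|When|Then|And)\s+", "", line).strip()
        for pattern, func in STEPS:
            match = pattern.fullmatch(sentence)
            if match:
                func(ctx, *match.groups())
                break
        else:
            raise LookupError("no step defined for: " + line.strip())
    return ctx


SCENARIO = """
Given an empty cart
When I add "book" to the cart
And I add "pen" to the cart
Then the cart contains 2 items
"""

result = run_scenario(SCENARIO)
```

The scenario text stays readable for stakeholders, while the step functions are where developers decide whether a sentence is checked through the GUI or through an API.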
In a DevOps setting, acceptance tests are usually executed by devs. There are two approaches to these kinds of tests: testing against a defined API and testing against the GUI. Which approach should be used, and to what extent, depends at least partially on your stakeholders’ area of expertise and level of understanding. While tests run against the GUI are generally easier to understand, they also tend to be substantially more fragile than those run against the API, because APIs generally stay more stable throughout development than a GUI, which might change quite often. Even marginal changes of identifiers in a GUI can make tests fail and require the tests to be adapted. The maintenance work created this way should not be underestimated.
Acceptance tests can be run from Jenkins using tools like Selenium and Codeception. Their setup is a bit more complex, at least compared to the first part of the CI chain. You need to run a Selenium Server that accepts commands from Codeception and forwards them to a pre-defined browser like Firefox. Firefox and any further browsers need to be installed, along with a virtual framebuffer. In an environment without a GUI, the browsers use the framebuffer as their output display. In contrast to dynamic software tests using PHPUnit, BDD tests include more parts of the system, or rather encompass the whole system interaction, because that is exactly what we need to test now. You can no longer use mocks like those commonly used in unit tests. However, there is no need for the full system to be physically distributed across servers; it can be provisioned with Docker. DevOps proves its worth in this setup: collaboration between the two departments helps to build a realistic replica of the final system in the test environment.
Collaboration of devs and ops is of crucial importance for our next step, load (capacity) testing. These tests determine how a new piece of software behaves in terms of performance and throughput, without testing it in production. There is one exception: if a blue/green approach (see below) is chosen for the rollout, there are two identical systems and the tests can be run close to production. This is a special case, though, and it further stresses the importance of DevOps in this step.
If the DevOps team succeeds in providing a test environment that is comparable to the production environment, the test results are meaningful. Ops can use the same tools as in production to evaluate the results of capacity tests. The ELK stack and monitoring with Nagios support these kinds of tests and help narrow down bottlenecks indicated by the tests. Load/capacity tests are best run automatically in Continuous Delivery chains; the DevOps team can use tools like JMeter or Gatling to do so. Gatling, for example, was designed to simulate real user behaviour, which makes it well suited to simulating load in capacity tests.
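The figures a capacity test produces can be sketched without any load testing tool. The example below is not a replacement for JMeter or Gatling; it fires a fixed number of calls from a pool of concurrent workers and summarises throughput and latency. The function fake_request is a stand-in for a real HTTP request against the test environment, and all numbers are assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request(_):
    """Stand-in for one HTTP request; returns its latency in seconds."""
    start = time.perf_counter()
    time.sleep(0.005)  # pretend the system under test needs about 5 ms
    return time.perf_counter() - start


def run_load(request, total_requests=40, concurrent_users=8):
    """Fire requests from a pool of workers and summarise the timings."""
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        latencies = sorted(pool.map(request, range(total_requests)))
    elapsed = time.perf_counter() - started
    # Rough 95th percentile: the value below which about 95% of samples fall.
    p95_index = max(0, int(len(latencies) * 0.95) - 1)
    return {
        "requests": len(latencies),
        "throughput_per_s": len(latencies) / elapsed,
        "median_s": latencies[len(latencies) // 2],
        "p95_s": latencies[p95_index],
    }


report = run_load(fake_request)
print(report)
```

Comparing such figures between builds, under the same simulated load, is what turns a one-off measurement into an automated stage of the delivery chain.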
Explorative testing seems to contradict the idea of automation in Continuous Delivery. This kind of testing has to be done manually, which means a lot of work and higher costs; nonetheless, it makes sense to execute these tests. Business requirements of a piece of software are best approved by stakeholders in this way. It is advisable to document how the stakeholders involved carry out these tests, that is, their workflow, for later use. Selenium IDE, the record-and-playback tool from the Selenium project mentioned above, is suitable for this purpose.
In the last step of our Continuous Delivery chain, collaboration within the DevOps team is especially important. A new piece of software that has successfully passed through all previous steps is now brought to production. Errors are likely to happen and need to be fixed quickly; the process itself needs to be monitored. Different strategies can be used for the rollout, minimizing the risk of downtime or at least making it easier to handle.
With a classic rollout to the production system, it is possible to do a rollback in case of errors. This in turn can lead to problems, especially when data in the DBMS has become inconsistent. If a rollback is no longer possible, the production environment has to be stabilized with a hotfix, a so-called roll forward. The risks carried by both rollback and roll forward can be minimized by using extra resources. With a blue/green rollout, there are two or more identical production systems. New software is rolled out to an inactive system, which is subsequently made the active one. This is a very cost-intensive rollout strategy, because at least one production system always has to be left unused. A similar approach without underused systems is called canary releasing: new software is first rolled out to a limited number of production servers and is tested and monitored there. If no errors occur, the same version is rolled out to the rest of the system; if there are errors, the affected systems are taken out of the production environment.
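The decision logic of canary releasing fits into a few lines. The sketch below is a simplified model, not a deployment tool: servers are a plain dictionary, the error rate comes from a hypothetical error_rate_of hook that in practice would be fed by monitoring, and on failure all canaries are taken out of the pool so that the remaining servers stay on one version. The share and the error limit are assumptions.

```python
def canary_rollout(servers, new_version, error_rate_of,
                   canary_share=0.25, max_error_rate=0.01):
    """Deploy to a canary subset first, then to the rest if it stays healthy.

    servers        dict mapping server name to deployed version (mutated)
    error_rate_of  callable(server name) -> observed error rate
                   (hypothetical hook, e.g. backed by monitoring data)
    """
    names = sorted(servers)
    canary_count = max(1, int(len(names) * canary_share))
    canaries, rest = names[:canary_count], names[canary_count:]

    # Phase 1: only the canaries receive the new version.
    for name in canaries:
        servers[name] = new_version

    unhealthy = [n for n in canaries if error_rate_of(n) > max_error_rate]
    if unhealthy:
        # Simplification: take every canary out of production.
        for name in canaries:
            del servers[name]
        return {"status": "aborted", "unhealthy": unhealthy}

    # Phase 2: the canaries stayed healthy, roll out to everyone else.
    for name in rest:
        servers[name] = new_version
    return {"status": "completed", "unhealthy": []}


pool = {"web1": "1.0", "web2": "1.0", "web3": "1.0", "web4": "1.0"}
outcome = canary_rollout(pool, "1.1", error_rate_of=lambda name: 0.0)
print(outcome, pool)
```

A blue/green rollout differs mainly in that the new version goes to a complete, inactive copy of the system, which is then switched to active in one step.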
Log files are an important part of monitoring software in production, and a classic domain of ops. Log data serves both ops, for analysing error states in the production system, and devs, for finding bugs in the code. Log files are usually spread across n instances and m software products (like Apache, NGINX, MySQL, …). To make them reasonably usable and provide the data needed for evaluation, you should collect and index the files in one place. The ELK stack is a software stack used for this: E stands for Elasticsearch, L for Logstash and K for Kibana. Logstash collects and parses log files, Elasticsearch persists and indexes them, and Kibana presents the reports in a convenient frontend. Elasticsearch is a full-text search engine based on Lucene that lets you search through almost all kinds of data efficiently; in the ELK stack, it is the log files that are analysed.
Besides evaluating log files, ops is responsible for continuous monitoring of the whole infrastructure and all processes running there. Monitoring tools like Nagios can be used for this. Nagios monitors both hardware and software and reports possible outages to predefined recipients. Collaboration of dev and ops is especially advisable for systematic monitoring in the production environment. Those parts of the code that stood out in the earlier capacity tests should be monitored more closely and equipped with rapid alerting.
By using Codeception and Jenkins, one part of the CD chain in PHP can be realized; Codeception offers a good compromise for executing acceptance tests close to BDD standards. A pure BDD approach in PHP is realized by Behat/Mink. Docker containers offer a way of splitting up tests; combined with Selenium Grid, this can be an approach to parallel execution. In keeping with the DevOps approach, Docker is also suitable for versioning a finalized build and rolling it out in a targeted manner.
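The division of labour between parsing and indexing can be shown in miniature. The sketch below does not use Logstash or Elasticsearch; it imitates their roles with a regular expression for the Apache "common" access log layout and a simple in-memory index. The log lines are invented samples.

```python
import re
from collections import defaultdict

# Pattern for the Apache "common" access log layout.
LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)

# Invented sample lines standing in for logs from several instances.
RAW_LOGS = [
    '10.0.0.1 - - [10/Oct/2016:13:55:36 +0200] "GET /cart HTTP/1.1" 200 512',
    '10.0.0.2 - - [10/Oct/2016:13:55:37 +0200] "POST /order HTTP/1.1" 500 87',
    '10.0.0.1 - - [10/Oct/2016:13:55:39 +0200] "GET /order HTTP/1.1" 200 934',
]


def parse(line):
    """Turn one raw line into a structured record (the Logstash role)."""
    match = LINE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    return record


def build_index(records, fields=("host", "path", "status")):
    """Map (field, value) to record positions (the Elasticsearch role)."""
    index = defaultdict(list)
    for position, record in enumerate(records):
        for field in fields:
            index[(field, record[field])].append(position)
    return index


records = [r for r in map(parse, RAW_LOGS) if r is not None]
index = build_index(records)
server_errors = [records[i] for i in index[("status", 500)]]
print(server_errors)
```

Once every line is a structured record in one place, questions such as "which paths returned a 500 in the last hour" become lookups instead of searches across many machines.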
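Nagios-style checks report their state through an exit code: 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN. The sketch below shows how a check for a code path that stood out in capacity tests might classify a measurement. The probe passed in as measure is hypothetical, and the thresholds are assumptions.

```python
# Exit codes that Nagios-style check plugins use to report their state.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def check_response_time(measure, warn_s=0.5, crit_s=2.0):
    """Classify a measured response time against two thresholds.

    measure is a callable returning seconds (hypothetical probe); any
    exception it raises is reported as UNKNOWN.
    """
    try:
        seconds = measure()
    except Exception as error:
        return UNKNOWN, "UNKNOWN - probe failed: %s" % error
    if seconds >= crit_s:
        state = CRITICAL
    elif seconds >= warn_s:
        state = WARNING
    else:
        state = OK
    # Text after '|' is performance data in label=value form.
    message = "%s - response time %.3fs | time=%.3fs" % (
        LABELS[state], seconds, seconds)
    return state, message


state, message = check_response_time(lambda: 0.8)
print(message)
```

A real plugin would print the message and end with sys.exit(state), so that the monitoring system can pick up the state and raise the alert.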