CI/CD has been gaining a lot of traction and is probably one of the most talked-about topics among newcomers to DevOps. With the number of CI/CD tools now available in the market, configuring and operating a CI/CD pipeline has become a lot easier than it was 5-6 years ago. Back then, there were no containers, and the only CI/CD tool that dominated the sphere was Jenkins. Jenkins provided you with a task runner, so you could define your jobs to run either sequentially or in parallel. Today, the scenario is different. We have numerous CI/CD tools available in the market that provide added features and functionality in comparison to Jenkins. One such renowned CI/CD tool is GitLab CI, and that is precisely what we will be covering in this article. We will configure a CI/CD pipeline with GitLab CI/CD and execute Selenium testing over it through LambdaTest.

Basics of CI/CD

CI/CD is a collection of best practices followed to ensure you are delivering product updates to your web application on a consistent and reliable basis. Your web application is bound to grow with every sprint that goes into a new release cycle. Initially, you may have a small team responsible for code changes in your web application. In such cases, you wouldn't mind doing everything directly: you build the code, you test it yourself, and you deploy it to production. However, as your team grows, there will be many more interaction points, and the probability of error increases as you try to migrate all of the code changes from one staging environment to another. This is where the CI/CD pipeline plays a pivotal role. Any successful business running online is highly dependent on how its CI/CD pipelines are configured. According to High Scalability:

Uber is now in 400 cities and 70 countries. They have over 6000 employees, 2000 of whom are engineers. Only a year and a half ago there were just 200 engineers. Those engineers have produced over 1000 microservices that are stored in over 8000 git repositories.

If you observe how fast a business can grow, you can imagine the challenges Uber would have faced coordinating ten times as many engineers in just a year and a half had they not incorporated a CI/CD pipeline. In today's world, it would be extremely hard to imagine a web application that is scalable in terms of speed and consistency without following CI/CD best practices. Now, what are CI and CD? CI refers to Continuous Integration and CD refers to Continuous Delivery; combining both, and automating the final step, gets you continuous deployment. Let us look at what they mean.

What Is Continuous Integration?

In traditional SDLC models, developers would migrate new features into an environment one by one in isolation. This created issues when multiple developers were working on multiple features. Continuous Integration is a practice that ensures developers are able to commit numerous changes to the main branch of your web application through a shared repository in a systematic manner. By leveraging the practice of Continuous Integration, your developers can integrate code around hotfixes, product enhancements, etc., into a shared repository multiple times a day. That way, your overall go-to-market launch can accelerate, allowing you to be agile. If you have given developers on your team edit access to the GitHub repository, you only need to ensure that they follow best practices and code styling, and, most importantly, that the test cases are not failing.
As long as these requirements are fulfilled, you shouldn't prevent anybody from checking in code. This will help your company scale continuously.

What Is Continuous Delivery?

Continuous Delivery only happens after CI is performed. As the name suggests, the practice of Continuous Delivery ensures you have an automated pipeline configured to deploy code changes from one staging environment to another. Continuous Delivery includes all the steps necessary to make your software deployable. This includes running comprehensive tests, quality assurance using testing tools, execution of builds, code signing, documentation, and deployment to pre-prod or user acceptance environments.

Don't Confuse Continuous Delivery With Continuous Deployment

Think of Continuous Delivery as everything except the deployment. You prepare the deployment, but you don't actually push it to the production servers; you leave that to human intervention, which decides when and where to deploy. Continuous Delivery is suitable for teams where Continuous Deployment is not required. Continuous Deployment, on the other hand, is a practice that can only be implemented if you have a well-defined migration system set up, which makes it infeasible for organizations with fewer engineers on board. Which brings us to our next question.

What Is Continuous Deployment?

Continuous Deployment is an extension of Continuous Delivery. It takes Continuous Delivery a step further, to a stage where the deployment of a new release version to production happens automatically. The only requirement for Continuous Deployment is that the process, checks, and tests you set up guarantee a crash-free experience. Since it's a completely automated system, it's imperative that you spend more time developing very strict test cases, because here you don't have any chance to manually review the migration. Once it's gone, it's gone. That is why Continuous Deployment isn't feasible for all companies. Continuous Deployment should have the strictest rules possible before deploying the code, because the process is fully automated, without any human intervention. Being the last step in the chain of the automated pipeline to production, the checks and tests at this level must be the strictest, and anything less than 100% should be rejected without any leeway. In spite of all the benefits that come with Continuous Deployment, a team should validate its requirements and only adopt Continuous Deployment if the development environment, production sensitivity, and test system allow seamless adoption. Keep in mind that if the systems in place are not mature enough, the deployment might prove to be catastrophic for any team. This is why most teams go with Continuous Delivery only, and there's no harm in that. It totally depends on what you are building and how critical it is. There's no hard and fast rule that you should use Continuous Deployment.

What Is GitLab CI/CD?

GitLab has an excellent CI/CD offering for projects hosted on GitLab and other git providers. Using GitLab CI/CD, you can incorporate all three stages we discussed:

Continuous Integration
Continuous Delivery
Continuous Deployment

What makes GitLab CI/CD powerful is that it allows you to host your Git repository with any other Git provider, such as GitHub, and still harness its CI/CD system. You don't even have to change your Git provider to use GitLab CI/CD. A minimal pipeline definition looks like the sketch below.
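Here is that minimal, illustrative sketch of a .gitlab-ci.yml (the stage names, job names, and commands are placeholders, not taken from the project used later in this article). It defines two stages with one job each; GitLab runs the jobs stage by stage on every push.

stages:
  - build
  - test

build-job:
  stage: build
  script:
    - echo "Compiling the application"

test-job:
  stage: test
  script:
    - echo "Running the test suite"

Committing a file like this to the root of the repository is enough for GitLab to start running pipelines for it.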
The only requirement to run CI/CD is the presence of a special GitLab CI YAML configuration file like this. The GitLab CI YAML file contains all the instructions and data required to run different CI/CD pipelines, and there are plenty of customization options available to shape the pipelines according to your needs.

Note: .gitlab-ci.yml is version-controlled and placed in the repository. This allows old versions of your repository to build successfully, making it easier for your team to adopt the CI practice. Because the GitLab CI YAML is placed in the repository itself, the logic of CI/CD lives alongside your code, freeing you from the worry that your CI/CD system might fail and you might lose its configuration. Wherever the code lives, your CI/CD definition is present there too, making it simpler to shift from one hosting environment to another as long as it uses the same pipeline. That way, your team can easily treat CI branches as separate pipelines and jobs, and you have a single source of truth for all CI/CD pipelines.

What Are GitLab CI/CD Environment Variables?

Environment variables are dynamically named values that can be used to make CI/CD pipelines completely dynamic and parameterized. In general, it's always best practice to keep removing hard-coded values and use environment variables to make jobs portable and provider agnostic. GitLab has a huge list of predefined variables that can help build robust and flexible CI/CD pipelines. The most commonly used and important variables include:

CI_COMMIT_REF_NAME
CI_COMMIT_BRANCH
CI_COMMIT_TAG
CI_EXTERNAL_PULL_REQUEST_IID

These variables allow you to shape pipelines according to different git branches and, in my opinion, this provides great flexibility in differentiating jobs based on the environments. It is always better to use as many environment variables as possible to make your jobs customizable and flexible.

What Are GitLab Cached Dependencies?

Every CI/CD job requires some kind of build phase where the target is built using third-party dependencies. Depending on the stack, these dependencies are fetched using package managers, module importers, etc. The common pain point in building with third-party modules across all languages is that it takes a lot of time to fetch the dependencies from third-party sources and compile them. Imagine doing this process over a hundred times a day for multiple projects and calculate the time and resource wastage it incurs. Not a pleasant picture, right? If there were a way to cache these built dependencies and reuse them across multiple pipelines, it would make CI builds much faster, reduce bandwidth wastage, and unclog the CI pipelines so the same infrastructure could serve many more builds. GitLab's cached dependencies allow you to do exactly this straight from the .gitlab-ci.yml file. It's as simple as setting a cache dictionary with a key attribute in the YAML file; just ensure you use the same key in all the jobs where the cached directory is required. A common practice for sharing the cache between pipelines of the same branch is to use git-based environment variables as the cache key. For example, using CI_COMMIT_BRANCH as the key lets a job reuse the cache whenever it runs for that branch. GitLab CI/CD also provides powerful primitives to invalidate the cache, either via the UI or by changing the cache key. As an extension, you can optionally fetch and build dependencies only when the package manifest file changes (for example, only fetching Node.js dependencies whenever package.json changes), which is superior to always relying on the cache. Both patterns are sketched below.
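Here is an illustrative sketch of both patterns, assuming a Node.js project built with yarn (the job names are placeholders): the first job caches node_modules keyed on the branch name and reinstalls dependencies only when package.json changes, while the second job only reads the cache.

install-dependencies:
  stage: build
  cache:
    key: $CI_COMMIT_BRANCH
    paths:
      - node_modules/
  script:
    - yarn install
  only:
    changes:
      - package.json

unit-tests:
  stage: test
  cache:
    key: $CI_COMMIT_BRANCH
    paths:
      - node_modules/
    policy: pull
  script:
    - yarn test

The pull policy on the test job means it downloads the cache but never uploads it, which keeps cache writes limited to the single job that actually installs dependencies.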
How To Trigger a CI/CD Pipeline

GitLab CI/CD allows you to trigger your pipeline in the following ways:

Git-based triggers
Webhooks/crons
Manual intervention

Git-Based Triggers

The easiest way to trigger CI/CD is to perform any git-based operation, such as pushing to a branch, merging a pull request, or creating a tag, for which handlers are defined in the .gitlab-ci.yml file. This is the most frequently used and most convenient method to trigger CI/CD.

Webhooks

Webhooks provide a convenient method to trigger CI/CD on demand by making an HTTP POST call to specialized URLs. This is very useful for event-based triggering, where the webhook can be called whenever a required event occurs. For example, you can set up a cron to run nightly builds on GitLab by hitting a curl request against the webhook URL at the desired interval. Any other event can be used, as long as the webhook can be hit in response to that event.

Manual Intervention

GitLab has a provision where manual intervention by authorized users can be required before the next steps of the job continue. In the .gitlab-ci.yml, you can mark a part of the pipeline to run only after somebody with access on the team resumes the job from the UI. This feature enables constructing the Continuous Delivery pipelines we have already discussed: everything except the deployment can be automated, and the deployment takes place only after the manual intervention.

Exclusive Parameters for GitLab CI/CD: Only and Except

Only and Except are two parameters that set a job policy to limit when jobs are created. These constructs are the nuts and bolts of the GitLab CI/CD pipeline, allowing customization and conditional execution of jobs to shape it according to your own needs:

Only specifies the names of branches and tags for which the job will trigger.
Except specifies the names of branches and tags for which the job will not trigger.

Only and Except allow the use of regular expressions, which makes almost any kind of customization possible by matching on strings. They also allow you to specify a repository path to filter jobs for forks. Some of the interesting values that Only and Except take are:

branches
tags
merge_requests

This follows from the precondition that GitLab CI/CD works on git-based repositories. If you use multiple keys under Only or Except, the keys are evaluated as a single conjoined expression. That is:

Only means "include this job if all of the conditions match."
Except means "exclude this job if any of the conditions match."

With Only, individual keys are logically joined by an AND. Except is implemented as a negation of the complete Only expression, which means its keys are treated as if joined by an OR. There is a huge list of attributes that can be used with Only and Except conditions, and I really recommend you check them out. A small sketch of these rules follows below.
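As an illustration (the job names and commands are placeholders), the following sketch runs one job only for merge requests and for a prod branch, and another only for release tags:

integration-test:
  stage: test
  script:
    - yarn test:integration
  only:
    - merge_requests
    - prod

e2e-test:
  stage: test
  script:
    - yarn run test:e2e
  only:
    refs:
      - tags
  except:
    - /^((?!release).)*$/

This mirrors the pattern used in the HourGlass pipeline later in this article, where end-to-end tests and deployments run only for tags whose names contain "release".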
Executing Selenium Testing Scripts for GitLab CI/CD

Let's apply what we have learned so far. The project we'll be using for this GitLab CI/CD tutorial is HourGlass 2018, a MERN (MongoDB, Express, React, and Node.js) stack application. It's a simple time-management application built using the best practices available at that time. Unfortunately, in the JS world, those best practices change every month. Some of them might have been superseded, but most are still relevant, and this is a full production-scale, production-style development repository.

Cloning the GitHub Repository

Make sure you clone the HourGlass GitHub repository to your GitLab CI instance. After cloning, switch to the master branch and check the GitLab CI YAML file:

image: node:10.19.0

stages:
  - install_dependencies
  - build
  - test
  - deploy

install_dependencies:
  stage: install_dependencies
  cache:
    key: $CI_COMMIT_REF_SLUG-$CI_PROJECT_DIR
    paths:
      - node_modules/
  script:
    - yarn install
  only:
    changes:
      - yarn.lock

# continuous integration
unit-test:
  stage: test
  cache:
    key: $CI_COMMIT_REF_SLUG-$CI_PROJECT_DIR
    paths:
      - node_modules/
    policy: pull
  script: yarn test

# Only runs in case of continuous delivery
integration-test:
  stage: test
  cache:
    key: $CI_COMMIT_REF_SLUG-$CI_PROJECT_DIR
    paths:
      - node_modules/
    policy: pull
  services:
    - mongo
  script:
    - echo $MONGO_URI_TESTS
    - yarn test:integration
  only:
    - merge_requests
    - prod

lint:
  stage: test
  cache:
    key: $CI_COMMIT_REF_SLUG-$CI_PROJECT_DIR
    paths:
      - node_modules/
    policy: pull
  script: yarn lint

e2e-test:
  stage: test
  services:
    - mongo
  cache:
    key: $CI_COMMIT_REF_SLUG-$CI_PROJECT_DIR
    paths:
      - build/
    policy: pull
  script:
    - node node_modules/node-static/bin/cli.js build --port 5000 --spa &
    - yarn start-prod-server
    - node node_modules/pm2/bin/pm2 logs &
    - sleep 3
    - yarn run test:e2e
  dependencies:
    - Build-client
  only:
    refs:
      - tags
  except:
    - /^((?!release).)*$/

Build-client:
  stage: build
  cache:
    key: $CI_COMMIT_REF_SLUG-$CI_PROJECT_DIR
    paths:
      - node_modules/
      - build/
    policy: pull-push
  script: yarn build-client
  artifacts:
    paths:
      - build

Build-docs:
  stage: build
  script: yarn docs
  cache:
    key: $CI_COMMIT_REF_SLUG-$CI_PROJECT_DIR
    paths:
      - node_modules/
    policy: pull
  only:
    - merge_requests
    - prod

Build-storybook:
  stage: build
  script: yarn build-storybook
  cache:
    key: $CI_COMMIT_REF_SLUG-$CI_PROJECT_DIR
    paths:
      - node_modules/
    policy: pull
  only:
    - merge_requests
    - prod

deploy-backend:
  stage: deploy
  image: ruby:latest
  cache:
    key: $CI_COMMIT_REF_SLUG-$CI_PROJECT_DIR
    paths:
      - node_modules/
    policy: pull
  script:
    - apt-get update -qy
    - apt-get install -y ruby-dev
    - gem install dpl
    - dpl --provider=heroku --app=$HEROKU_APP_NAME --api-key=$HEROKU_API_KEY
  dependencies:
    - e2e-test
  when: manual
  allow_failure: false
  only:
    refs:
      - tags
  except:
    - /^((?!release).)*$/

deploy-frontend:
  stage: deploy
  cache:
    key: $CI_COMMIT_REF_SLUG-$CI_PROJECT_DIR
    paths:
      - node_modules/
    policy: pull
  variables:
    REACT_APP_API_HOST: $REACT_APP_API_HOST_PROD
  script:
    - yarn build-client
    - node ./node_modules/.bin/surge -p build/ --domain $SURGE_DOMAIN
  when: manual
  allow_failure: false
  dependencies:
    - e2e-test
  only:
    refs:
      - tags
  except:
    - /^((?!release).)*$/

Configuring the CI/CD Pipeline in GitLab CI

To trigger our CI/CD pipeline, we will need to edit the README.md file. We can make this change directly through the Web IDE: add a sample comment and hit commit. Make sure to perform the change in the master branch. Once you commit the code, you can jump into the CI/CD section in GitLab and notice the job executed successfully. You will find the following jobs in a running state:

Build-client
Linting
Unit test

Now, raise a pull/merge request to the production branch. For the demo, we've kept the production branch (prod) as the main branch of the Git repository rather than master, so we need to merge from master to prod.
Note: By default, before you submit the merge request, you will find the checkbox "delete source branch when merge request is accepted" ticked. Deselect that checkbox.

GitLab CI/CD won't perform a merge unless the pipeline succeeds, which means it won't merge the changes until your test scripts have finished executing. That is a great feature to help you ship a stable release. You also don't have to wait for the test scripts to complete before performing the merge: just click the "Merge when pipeline succeeds" button, and GitLab CI will take care of merging once your test scripts pass. All jobs, by default, are executed in parallel unless you specify otherwise, which greatly reduces the overall time consumed by the CI process. You can see which builds pass and which jobs run after you raise the merge request. You may notice that an integration test runs in the detached pipeline, along with the previously discussed jobs (linting and unit testing) in the latest pipeline. Integration testing ensures your backend and frontend are in good sync and your APIs are responding well.

Creating a Tag for a Release

To create a release, we will need to generate a tag through GitLab CI/CD. Git tags are extremely useful if you want to bookmark important changes. Now, create a new release from the master branch. You can add a message there to help you remember what this release contains, or perhaps add some build release checkpoints as well. Then you can create the tag. As soon as the tag is created, a pipeline will begin to execute. Open CI/CD and notice the new pipeline at the top, created in response to the tag creation event.

In this pipeline, you will notice four stages. The first one makes sure all the dependencies are installed. This is crucial, as you are now creating the final build and you don't want any confusion, bugs, or warnings to be ignored. The next stage is where the client is built. Third, you will notice three kinds of tests running: unit tests, linting, and end-to-end tests. End-to-end testing is where you incorporate your Selenium testing scripts to perform cross browser testing; those test scripts are executed over an online Selenium Grid offered by LambdaTest. At last, once the Selenium testing scripts have passed, the pipeline moves on to the next stage, where the deployment of the backend and frontend takes place.

Note: These stages are manually triggered. You will find a play button once the pipeline reaches that stage. That's it! Once you hit the play button, you can deploy the changes to their respective environments. You can then validate your Selenium testing scripts over the LambdaTest Automation Dashboard. As you can see in the recorded video of your Selenium tests, this is the same HourGlass application we have been trying to deploy.

Note: The web application will be accessible over localhost at 127.0.0.1:5000. This is the step where you run a static server to host your frontend files and a separate backend server. Later, you can run an end-to-end test using LambdaTest Tunnel.

Continuous Testing Using LambdaTest Selenium Grid

As you noticed, we ran our script over an online Selenium Grid of LambdaTest. What was the need?
Well, doing so can help you quickly and automatically validate your code changes into the staging environment where they are being migrated through your CI/CD pipeline. That way, you are continuously integrating the code for new features, continuously deploying them from one staging environment to another, and now, you are able to continuously test those code changes too. Every time a code is committed to a branch, for which you have your Selenium testing scripts ready, that piece of code will be validated for browser compatibility testing. Allowing you to accelerate the test cycles with the help of continuous testing. Now, let’s ponder a little about how we ran Selenium testing over a locally hosted web application through LambdaTest. Here is the Selenium testing script used for the Jest framework to perform automated browser testing: const webdriver = require('selenium-webdriver'); const { until } = require('selenium-webdriver'); const { By } = require('selenium-webdriver'); const lambdaTunnel = require('@lambdatest/node-tunnel'); const username = process.env.LT_USERNAME || 'Your_LambdaTest_Username'; const accessKey = process.env.LT_ACCESS_KEY || 'Your_LambdaTest_Access_Key'; const capabilities = { build: 'jest-LambdaTest-Single', browserName: 'chrome', version: '72.0', platform: 'WIN10', video: true, network: true, console: true, visual: true, tunnel: true, }; const tunnelInstance = new lambdaTunnel(); const tunnelArguments = { user: process.env.LT_USERNAME || Your_LambdaTest_Username', key: process.env.LT_ACCESS_KEY || 'Your_LambdaTest_Access_Key', }; const getElementById = async (driver, id, timeout = 2000) => { const el = await driver.wait(until.elementLocated(By.id(id)), timeout); return await driver.wait(until.elementIsVisible(el), timeout); }; const getElementByClassName = async (driver, className, timeout = 2000) => { const el = await driver.wait( until.elementLocated(By.className(className)), timeout ); return await driver.wait(until.elementIsVisible(el), timeout); }; const getElementByName = async (driver, name, timeout = 2000) => { const el = await driver.wait(until.elementLocated(By.name(name)), timeout); return await driver.wait(until.elementIsVisible(el), timeout); }; const getElementByXpath = async (driver, xpath, timeout = 2000) => { const el = await driver.wait(until.elementLocated(By.xpath(xpath)), timeout); return await driver.wait(until.elementIsVisible(el), timeout); }; function timeout(ms) { return new Promise(resolve => setTimeout(resolve, ms)); } describe('webdriver', () => { let driver; beforeAll(async () => { const istunnelStarted = await tunnelInstance.start(tunnelArguments); driver = new webdriver.Builder() .usingServer( 'https://' + username + ':' + accessKey + '@hub.lambdatest.com/wd/hub' ) .withCapabilities(capabilities) .build(); // eslint-disable-next-line no-undef await driver.get(`http://127.0.0.1:5000/signup`); // https://hourglass.surge.sh/signup }, 20000); afterAll(async () => { await driver.quit(); await tunnelInstance.stop(); }, 15000); test( 'Signup test', async () => { const nameInput = await getElementById(driver, 'name'); await nameInput.clear(); await nameInput.sendKeys('Mayank'); const emailInput = await getElementById(driver, 'email'); await emailInput.clear(); await emailInput.sendKeys('mybach8@gmail.com'); const passwordInput = await getElementById(driver, 'password'); await passwordInput.clear(); await passwordInput.sendKeys('password'); const cnfPassInput = await getElementById(driver, 'confirmPassword'); await cnfPassInput.clear(); 
await cnfPassInput.sendKeys('password'); const prefWorkingHours = await getElementById( driver, 'preferredWorkingHours' ); await prefWorkingHours.clear(); await prefWorkingHours.sendKeys('10.0'); const btn = await getElementByClassName( driver, 'LoaderButton btn btn-lg btn-default btn-block' ); await btn.click(); await timeout(2000); const successText = await getElementByClassName( driver, 'registerSuccess' ); const successTextValue = await successText.getText(); console.log(successTextValue); return expect(successTextValue).toContain('Congratulations'); }, 20000 ); test( 'Login test', async () => { await driver.get(`http://127.0.0.1:5000/login`); // https://hourglass.surge.sh/signup // const lnk = await getElementByName(driver, 'li1'); // await lnk.click(); // const lnk1 = await getElementByName(driver, 'li2'); // await lnk1.click(); const emailInput = await getElementById(driver, 'email'); await emailInput.clear(); await emailInput.sendKeys('mybach8@gmail.com'); const passwordInput = await getElementById(driver, 'password'); await passwordInput.clear(); await passwordInput.sendKeys('password'); const btn = await getElementByClassName( driver, 'btn btn-lg btn-default btn-block' ); await btn.click(); await timeout(2000); const successText = await getElementByClassName( driver, 'btn btn-primary' ); const successTextValue = await successText.getText(); console.log(successTextValue); expect(successTextValue).toContain('Manage Time tracks'); }, 20000 ); }); Code Walkthrough There are some tunnel arguments we are using here to work up the LambdaTest Tunnel. There is an npm package that is released by LambdaTest to setup tunnel automatically: const lambdaTunnel = require('@lambdatest/node-tunnel'); What we have done here is, before every test, we are setting up a new WebDriver and this driver is aimed at the public URL of the LambdaTest Selenium Grid Hub. We are using the username and access key provided by the LambdaTest account: const tunnelArguments = { user: process.env.LT_USERNAME || Your_LambdaTest_Username', key: process.env.LT_ACCESS_KEY || 'Your_LambdaTest_Access_Key', }; Then, you provide all the capabilities, such as you want to use the LambdaTest Tunnel over a specific browser, browser version, operating system, with video recording etc: const capabilities = { build: 'jest-LambdaTest-Single', browserName: 'chrome', version: '72.0', platform: 'WIN10', video: true, network: true, console: true, visual: true, tunnel: true, }; We are constructing the URL using the environment variables, i.e., your LambdaTest username and access key: .usingServer( 'https://' + username + ':' + accessKey + '@hub.lambdatest.com/wd/hub' ) Once the remote WebDriver for online Selenium Grid is setup, we are awaiting for the signup page to load: await driver.get(`http://127.0.0.1:5000/signup`); // https://hourglass.surge.sh/signup }, 20000); The below piece of code will ensure tunnel instances will start first and only after that will your Selenium testing script will be executed: const istunnelStarted = await tunnelInstance.start(tunnelArguments); Best Practice After you are done with all the test cases, make sure you delete the tunnel instance. This will help save your concurrency limit available over the LambdaTest Selenium Grid. There is a limited amount of tunnel you can run per count, depending upon the LambdaTest pricing you opt for and if you leave a tunnel in running state, even after your tests are executed. 
It won't allow you to use that tunnel in other test cases, so it's a best practice to close the tunnel after your testing has been completed:

afterAll(async () => {
  await driver.quit();
  await tunnelInstance.stop();
}, 15000);

Monitoring Logs

LambdaTest provides you with an intuitive interface for analyzing the results of Selenium testing scripts. You get a variety of logs, such as network logs, command logs, raw Selenium logs, and metadata. You can also record a video of the entire script execution, along with command-by-command screenshots. You may notice the test has been successfully triggered over the online Selenium Grid of LambdaTest. You can visit the different tabs in the automation dashboard to figure out what went wrong while debugging your scripts. Here is how the logs are provided:

Selenium logs
Command logs
Console logs

Eventually, GitLab CI will deploy the backend on Heroku and the frontend on Surge. After opening the URL, you can see the frontend is deployed on Surge and the backend is deployed on Heroku. This is done automatically by the GitLab CI/CD pipeline. Now, let's quickly note down some of the best practices you need to keep in mind for CI/CD before we wrap up this GitLab CI/CD tutorial.

Best Practices for CI/CD

Now that you've had a fair share of knowledge around leveraging GitLab CI pipelines for Selenium testing, I suggest you make a note of these best practices for CI/CD to build better web applications, faster.

Build Fast and Test Faster

The success of CI/CD systems depends on execution speed. If CI cycles take a huge amount of time for each commit, developers will find alternate and faster bypasses to get their code integrated quickly. This often involves pathways that skip tests in favor of optimistic updates, which can wreak havoc on production. I don't even need to mention the consequences of integrating untested code.

CI/CD Environments Should Be Secured

Often ignored, but very crucial: protect your CI/CD environment. It is one of the most sensitive pieces of infrastructure, as it holds access to your codebase, highly sensitive data, and various environments. Furthermore, it is one of the most heavily used systems in a large, high-frequency development team. Any outage of CI/CD can cause a tremendous loss of productivity, and financial losses in the worst cases.

CI/CD Should Be the Only Way to Deploy to Production

CI/CD pipelines are only as successful as the last person using them. All of the effort in developing CI/CD fails if it is not adopted by the team. CI/CD should be strictly the only way to deploy to prod; in fact, rollbacks should be deployed via the same pipeline.

Always Keep Rollback Options in CI/CD Pipelines

The ability to roll back a change shouldn't involve complex procedures. It should be as simple as possible to roll back a change. It's always better to roll back changes at 3 AM than to debug them on production.

Fail Early

You should run your fastest tests early in the pipeline. The idea is to reject the build as soon as it fails any test. Rejecting early saves a lot of time and keeps the turnaround time really small. A sketch of this stage ordering follows below.
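A minimal sketch of this idea in GitLab CI terms (the stage and job names are illustrative): cheap checks run in an early stage, and the expensive end-to-end tests only start if those succeed, because jobs in a later stage run only after the previous stage passes.

stages:
  - quick-checks
  - e2e

lint:
  stage: quick-checks
  script: yarn lint

unit-test:
  stage: quick-checks
  script: yarn test

e2e-test:
  stage: e2e
  script: yarn run test:e2e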
Run Tests Locally Before Committing to the CI/CD Pipeline

CI starts on your local development system. All basic CI tests should first be run on your local system, as this is fast, saves time, and conserves CI/CD infrastructure for the more critical, later-stage pipelines.

Tests Should Run in Ephemeral Environments

To provide consistent results, it is important that CI/CD tests run from a fresh state every time. Ephemeral environments are a necessity for making testing idempotent, and containers are a suitable fit as they make it easy to provide a fresh environment.

Decouple Deployment and Release

As mentioned in the Continuous Delivery introduction, decoupling deployment from the release process makes the release purely a marketing and strategy decision. This has huge benefits in terms of flexibility and speed.

Wrapping Up

Kudos! You have now successfully executed Selenium tests to perform all kinds of checks before the release got deployed. GitLab CI prepared the release process for you and took away all the hassle of those last-minute checks by allowing you to integrate with an online Selenium Grid of 3000+ real browsers, ensuring a seamless UI for your web application. This is how you can build a robust CI/CD pipeline for your company or team. If you still have any questions, feel free to post them in the comment section below. Happy testing!
The pull request (PR) review process, if not set up well in your team, can create a lot of bottlenecks in getting your code merged into the main branch and into production. By automatically adding more context and information to your PRs, you save yourself and your team work. Take the scenario of fixing a typo in documentation: if there's a backlog of PRs that need attention, such a PR may take two days or longer just to be approved. Here, learn about continuous merge (CM) with gitStream. gitStream is a tool that allows you to add context and automation to your PRs, classifying PRs based on their complexity. This ensures that a review won't stay in the queue for long, as it can be quickly assigned to the right person, immediately approved, or have the appropriate action identified easily. This hands-on article demonstrates how to add gitStream CM to your repository. In this article, you'll learn:

How to configure your repository
How to create pull requests (PRs)
How to add the CM feature to your PRs

Quick gitStream Setup Guide

If you're keen to get all the benefits of gitStream and continuous merge right away, all you need to do is follow these simple steps. If you want to understand how gitStream works, how you can customize it, and more options, that explanation follows right after.

1. Choose Install for free on gitStream's GitHub marketplace page.
2. Add two files to your repo:
   a) .cm/gitstream.cm
   b) .github/workflows/gitstream.yml
3. Open a pull request.
4. Set gitStream as a required check.

A Comprehensive Guide to gitStream and Continuous Merge

Filter functions and context variables are used to drive automated actions, such as adding labels (add-label@v1), assigning reviewers (add-reviewers@v1), and approving requests (approve@v1), among others. Everything is included in a .cm configuration file named gitstream.cm. All instructions for gitStream CM are detailed in the docs found at docs.gitstream.cm. gitStream also uses GitHub Actions to do its work, so you'll need to add the gitstream.yml file to your GitHub Actions directory at .github/workflows/. The main components of gitStream's CM are:

The configuration files: gitstream.cm and gitstream.yml
The filter functions: code that checks and/or selects certain data from the input when a PR is created
The context variables: the inputs fed to the filter functions
The automation actions

Note: Some steps use Python for demonstration purposes only. It's not required knowledge.

Prerequisites

To follow this tutorial, ensure you have the following:

Hands-on knowledge of Git and GitHub workings: you must know activities such as creating a repository, PRs, commits, and pushes.
A GitHub account
Git installed in your working environment

You can find and review the final project code on gitStream's GitHub marketplace page linked earlier in this article.

Step 1: Set Up gitStream on Your Repo

Create an empty repo and give it a name, then install gitStream to it from the marketplace. After installation, you can either:

Clone the repository to your environment, or
Create a folder and point it to the repository.

This tutorial uses the second option. Create a folder called gitStreamDemo.
In this folder, create two directories, .github/workflows and .cm, using the following commands in a terminal window:

mkdir -p .github/workflows
mkdir .cm

In the .github/workflows folder, create a file called gitstream.yml and add the following YAML script:

name: gitStream workflow automation

on:
  workflow_dispatch:
    inputs:
      client_payload:
        description: The Client payload
        required: true
      full_repository:
        description: the repository name include the owner in `owner/repo_name` format
        required: true
      head_ref:
        description: the head sha
        required: true
      base_ref:
        description: the base ref
        required: true
      installation_id:
        description: the installation id
        required: false
      resolver_url:
        description: the resolver url to pass results to
        required: true
      resolver_token:
        description: Optional resolver token for resolver service
        required: false
        default: ''

jobs:
  gitStream:
    timeout-minutes: 5
    # uncomment this condition, if you dont want any automation on dependabot PRs
    # if: github.actor != 'dependabot[bot]'
    runs-on: ubuntu-latest
    name: gitStream workflow automation
    steps:
      - name: Evaluate Rules
        uses: linear-b/gitstream-github-action@v1
        id: rules-engine
        with:
          full_repository: ${{ github.event.inputs.full_repository }}
          head_ref: ${{ github.event.inputs.head_ref }}
          base_ref: ${{ github.event.inputs.base_ref }}
          client_payload: ${{ github.event.inputs.client_payload }}
          installation_id: ${{ github.event.inputs.installation_id }}
          resolver_url: ${{ github.event.inputs.resolver_url }}
          resolver_token: ${{ github.event.inputs.resolver_token }}

Next, create a file called gitstream.cm in the .cm folder and add the following code:

manifest:
  version: 1.0

automations:
  show_estimated_time_to_review:
    if:
      - true
    run:
      - action: add-label@v1
        args:
          label: "{{ calc.etr }} min review"
          color: {{ '00ff00' if (calc.etr >= 20) else ('7B3F00' if (calc.etr >= 5) else '0044ff') }}
  safe_changes:
    if:
      - {{ is.doc_formatting or is.doc_update }}
    run:
      - action: add-label@v1
        args:
          label: 'documentation changes: PR approved'
          color: {{ '71797e' }}
      - action: approve@v1
  domain_review:
    if:
      - {{ is.domain_change }}
    run:
      - action: add-reviewers@v1
        args:
          reviewers: [<listofreviewers>]
      - action: add-label@v1
        args:
          label: 'domain reviewer assigned'
          color: {{ '71797e' }}
  set_default_comment:
    if:
      - true
    run:
      - action: add-comment@v1
        args:
          comment: "Hello there. Thank you for creating a pull request with us. A reviewer will soon get in touch."

calc:
  etr: {{ branch | estimatedReviewTime }}

is:
  domain_change: {{ files | match(regex=r/domain\//) | some }}
  doc_formatting: {{ source.diff.files | isFormattingChange }}
  doc_update: {{ files | allDocs }}

In this file, you'll see the following four automation actions:

show_estimated_time_to_review: calculates the estimated time a review of the PR may take and adds it as a label.
safe_changes: marks changes to non-critical components, such as documentation, as safe; such PRs are automatically approved.
domain_review: runs when a change was made to the domain layer and assigns a domain reviewer.
set_default_comment: fires every time a PR is opened and posts an acknowledgment comment telling the user that the PR has been received.

At the end of the file, there's a section containing the filter functions used by the automation actions. The actions run once the conditions specified by these filter functions or keys are met.

Step 2: Calculating the Time To Review

The first automation checks the value of the etr variable and decides which label to assign to the PR. For more information on how ETR is calculated, check out this blog.
Create a file called main.py in the root of your folder. Then, create three folders using the command below:

mkdir views domain data

Add the following to the main.py file:

def show_message(name):
    print(f'Hello, {name}. Welcome to the gitStream world')


if __name__ == '__main__':
    print_hi('Mike')

Copy the main.py file as is and paste it into the other three folders, renaming each copy to match its folder's name (for example, domain.py in the domain folder). For the dummy documentation file, create a README.md file in the root of your folder and add the following markdown:

# gitStreamDemo
Demo Showing How To Set Up gitStream on Your First Repo

Now, run these commands to initialize the repository, stage the files for committing, and make a commit, in that order:

git init
git add .
git commit -am "initialization"

Next, point the folder to your repository using the command below:

git remote add origin https://github.com/<your-username>/<your-repo-name>

Finally, push it:

git push -u origin main

Step 3: Creating a Pull Request

As you may have noticed, there's a sample bug in the code. In any programming language, you must call a function using its exact name, but in this case, print_hi was called instead of show_message. As a team member or an open-source contributor, you can fix this by opening a PR. First, create a branch called fix-function-call and check out the branch using the commands below:

git branch fix-function-call
git checkout fix-function-call

Next, replace the name print_hi with show_message in all the .py files, then commit and push the changes:

git commit -am "changed function name"
git push --set-upstream origin fix-function-call

Now, open your repository in GitHub. You'll see a card prompting you to open a PR. Click on Compare & pull request. On the next page, click the Create pull request button. Once the gitStream automation has finished running, you'll see the domain reviewer assigned label, and an acknowledgment comment has been added. Next, add Dijkstra's Shortest Path Algorithm script just below the show_message function in each of the .py files; these scripts calculate the shortest path from a node in a graph. Commit the changes and then push the code:

git commit -am "updates"
git push

Creating a Safe Change

For the final automation, you'll add text to the README.md file created earlier. Create a new branch and check it out; you do so because you'll need a new PR to demonstrate this automation:

git checkout main
git branch update_docs
git checkout update_docs

Then, add this sentence to the README.md file: "Continuous Merging is very beneficial to the Open-Source Community." Commit and push:

git commit -am "updated the docs"
git push --set-upstream origin update_docs

When the checks are done, you'll see a different label and the PR already approved.

Help Developers Make the Most of Their Time

Reviewing and merging PRs is crucial to contributing to software development and enhancing team productivity. However, being unable to classify PRs by complexity can lead to long wait times or much back-and-forth in the review process. CM remedies this by classifying PRs based on their complexity and automating actions such as tagging the appropriate reviewers, assigning them PRs, and approving PRs, all to reduce the backlog. Use gitStream to add CM to your existing repos.
Since the term DevOps was first introduced, it seems that new "Ops"-related terms pop up as quickly as technology trends. For example:

AIOps: Enhance and automate various IT processes with AI.
MLOps: Develop, deploy, and manage machine learning.
FinOps: Optimize and manage cloud costs.
DevSecOps: Integrate security into the software development lifecycle (SDLC).
GitOps: Manage and deploy infrastructure and applications (code and configuration) using Git.

I bet the next Ops-related term will be ChatGPT-Ops ;-). One Ops term that has popped up in recent months is APIOps, but what does it mean, especially as APIs are not new and come in many different styles? APIOps is an approach that applies the principles of GitOps and DevOps to deploying APIs. Similar to DevOps, APIOps facilitates streamlined modification and automated deployment of API changes into production. Just like DevOps, automation is a key pillar of APIOps, but to be successful at APIOps, you must consider more than just your automation pipelines. You need to adopt the principles associated with CALMS (Culture, Automation, Lean, Measurement, Sharing) to be successful.

Culture

You should treat your API as a product. This means you need to move beyond a purely technical view: an API should not just be a Jira task, and it should not be the sole responsibility of software engineers. Your API should have a product manager assigned to help make adoption successful. Your product will have a roadmap, a lifecycle, and business success criteria.

Automation

DevOps teams use DORA (DevOps Research and Assessment) metrics to gauge their performance level and determine whether they fall into the category of "low performers" or "elite performers." Adopting DORA metrics will give you insight into the delivery performance of the CI/CD pipelines that get your API into production. The DORA metrics are:

Deployment Frequency: How often an organization successfully releases to production.
Lead Time for Changes: The amount of time it takes a commit to get into production.
Change Failure Rate: The percentage of deployments causing a failure in production.
Time to Restore Service: How long it takes an organization to recover from a failure in production.

Make sure that when you're looking at the DORA metrics, you include items from an API-centric perspective. For example, introducing a breaking change to your API contract in production should be counted in the Change Failure Rate metric, especially if the change is unmanaged.

Lean

Success means adopting a lean approach to eliminate waste and focusing on delivering value to customers quickly and continuously. If a tree falls in the forest, does it make a sound? Similarly, if an API is not used in production, does anyone care? Don't rush into implementing the service behind your API; first, make sure the success criteria are known. Implementation should wait until the API has been reviewed and approved by its potential consumers. For example, wait for early feedback from consumers showing that the API will address their use case or pain point.

Measure

Technical KPIs are table stakes for any API program; these include transactions per second, error rate, latency, and tracking of the SLA of the API you're providing to your consumers. In addition, you need to include more business-level goals to move to the next level of measuring what really matters.
Below are some examples of what can be tracked:

RoI (return on investment) KPIs: For example, is your API helping to drive direct or indirect revenue growth, or cost reductions if you hit the nirvana of API reuse?
Consumption KPIs: What is the monthly growth trend in your API traffic, or does your API help grow the ecosystem of partners onboarded to your organization?
Engagement KPIs: Track the NPS (Net Promoter Score) of your API, or, as your API is your product, track retention and churn.

Share

Regardless of whether your API is private (consumed within your organization), partner (consumed by partners of your organization), or public (consumed by anybody interested in the API), you must have a vehicle for sharing your APIs and for receiving feedback from your API consumers. This vehicle would be an internal API developer portal or a public marketplace where consumers can discover and onboard/register to your APIs in a self-service fashion. Just as importantly, for the evolution of your API, your consumers need to be able to provide feedback on the API so that it can evolve in the appropriate direction.

By applying the above DevOps principles to the API lifecycle, APIOps can help organizations improve collaboration, reduce time to market, deliver better customer experiences, and ultimately achieve better business outcomes. As a small illustration of the automation pillar, the sketch below shows the kind of API-centric check that can be wired into a pipeline.
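This is only an illustrative sketch, not part of the original article: it assumes a GitLab CI pipeline, an OpenAPI contract stored at openapi.yaml, and the open-source Spectral linter. The job fails the pipeline when the contract violates the ruleset, so malformed or unmanaged contract changes are caught before they ever reach consumers.

# Illustrative job: lint the API contract on every change (assumes openapi.yaml and Spectral)
api-contract-lint:
  stage: test
  image: node:18
  script:
    - npm install -g @stoplight/spectral-cli
    - spectral lint openapi.yaml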
As a software professional handling Infrastructure as Code (IaC), chances are you work a lot with Terraform. When helping new clients adopt IaC, it is common to keep things simple, but managing the Terraform state file is the first challenge you face. Terraform state contains sensitive information, so it shouldn't be stored in source control, but a purely local state also won't scale if you have multiple users working on the same Terraform state. The answer to that? Backends. It is worth noting that you could store the state file in an S3 bucket and use DynamoDB to manage state locking; however, this approach forces you to create additional resources, which makes it a more complicated option, especially if the client is already using GitLab. GitLab recently lowered the entry barrier to integrating Terraform by providing a way to store and manage Terraform state, as well as an easy way to set up CI around it. In this article, we will explain what a Terraform state file is, how to migrate it to GitLab, and how to set up a CI pipeline for it. You can visit our repository here.

Table of Contents

What Is Terraform State?
How To Get GitLab to Manage Terraform State
How To Get GitLab to Run Your IaC Through a CI Pipeline
Bonus Tip: Infracost
Conclusion

What Is Terraform State?

Terraform records information about the infrastructure defined in your code in a state file. Written in JSON, it essentially records a mapping from the Terraform code to the real resources created. Below is an example of what a terraform.tfstate looks like. Every time you run Terraform, it fetches the latest status of this EC2 instance and compares it with your Terraform configuration to determine what changes need to be applied:

{
  "version": 4,
  "terraform_version": "0.12.0",
  "serial": 1,
  "lineage": "1f2087f9-4b3c-1b66-65db-8b78faafc6fb",
  "outputs": {},
  "resources": [
    {
      "mode": "managed",
      "type": "aws_instance",
      "name": "example",
      "provider": "provider.aws",
      "instances": [
        {
          "schema_version": 1,
          "attributes": {
            "ami": "ami-0c55b159cbfafe1f0",
            "availability_zone": "us-west-2a",
            "id": "i-00a123a0accffffff",
            "instance_state": "running",
            "instance_type": "t2.micro",
            "(...)": "(truncated)"
          }
        }
      ]
    }
  ]
}

By default, this terraform.tfstate is stored locally, wherever you keep your Terraform files and run your plan and apply commands. For a personal project where you are just running some tests, that's fine, but it is not the recommended way. Here's why:

Store it in a shared location: If the state file lives on your local workstation and you have to work with another engineer, things get complicated. Both of you have to make sure you are using the latest version of the state, and you can run into race conditions if you run a Terraform plan or apply at the same time.
Protect sensitive information: A generated state file can contain encryption keys and infrastructure passwords. However, state files aren't encrypted by default, and storing sensitive information in plain text is a bad idea.
Locking: Most version control systems do not support any form of locking, which is needed to prevent two team members from running Terraform apply simultaneously against the same state file. This is another reason why a state file should not be managed by source control.

The classic remote-backend setup that addresses these points is the S3-plus-DynamoDB combination mentioned above, sketched below for comparison.
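For comparison, here is a minimal sketch of that conventional setup (the bucket and table names are placeholders); it requires you to create and maintain the S3 bucket and the DynamoDB lock table yourself, which is exactly the extra infrastructure the GitLab-managed backend lets you avoid.

terraform {
  backend "s3" {
    bucket         = "my-terraform-state-bucket"   # placeholder bucket name
    key            = "aws-buckets/terraform.tfstate"
    region         = "us-west-2"
    dynamodb_table = "terraform-state-lock"        # placeholder lock table
    encrypt        = true
  }
}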
How To Get GitLab to Manage Terraform State

With Terraform considered the standard in cloud infrastructure provisioning, GitLab began offering a way to store and manage your Terraform state about a year ago, and we recently started using it to manage our IaC, so we wanted to share the migration process with you. For this article, we presume that you are using a local state or have your state managed in an AWS S3 bucket or another backend solution.

Firstly, you will need to change your backend.tf to use HTTP:

terraform {
  backend "http" {}
}

Next, you will need to set up four variables in your terminal:

1. PROJECT_ID: You can find this easily by navigating to your repo's "Project Overview" page.
2. TF_USERNAME: The GitLab username that has access to the repo you're working on.
3. TF_PASSWORD: An access token generated from your GitLab user.
4. TF_ADDRESS: The URL of the remote state backend.

PROJECT_ID="28450092"
TF_USERNAME="florianpialoux"
TF_PASSWORD="123456789"
TF_ADDRESS="https://gitlab.com/api/v4/projects/${PROJECT_ID}/terraform/state/aws-buckets"

You may now run the migration command that will move your Terraform state from its previous location to GitLab:

terraform init \
  -migrate-state \
  -backend-config=address=${TF_ADDRESS} \
  -backend-config=lock_address=${TF_ADDRESS}/lock \
  -backend-config=unlock_address=${TF_ADDRESS}/lock \
  -backend-config=username=${TF_USERNAME} \
  -backend-config=password=${TF_PASSWORD} \
  -backend-config=lock_method=POST \
  -backend-config=unlock_method=DELETE \
  -backend-config=retry_wait_min=5

You will need to confirm with a "yes" so that GitLab can start managing your state file. This works both from a local state to GitLab and from S3 to GitLab. Now, you can navigate to Infrastructure > Terraform in the GitLab interface and see your state.

I noticed that some of the state files I migrated from S3 were blank even after running the migrate-state command above. In this case, you can pull the state:

terraform state pull > aws-buckets.json

Copy and paste the content from the S3 state and run a push:

terraform state push -lock=true aws-buckets.json

GitLab supports versioning of your Terraform state file, but viewing and restoring older versions through the web UI requires a GitLab Premium plan. If you don't have one, you will need to make a GraphQL API request instead.

How To Get GitLab to Run Your IaC Through a CI Pipeline

GitLab provides a Docker image that contains GitLab-Terraform, a thin wrapper script around the official Terraform binary. Alternatively, you could use the official Docker image by HashiCorp. You can find more information about the GitLab Terraform image here. Once the Terraform apply job runs, you will be able to see when the state was used and by which pipeline. You can learn more about what our gitlab-ci.yml looks like here. Below are the variables that need to be defined at the project level.

Bonus Tip: Infracost

As you might have noticed looking at our gitlab-ci.yml, we added Infracost, which gives us more control over our cloud billing by producing a cost estimate whenever you define a new resource in your IaC.

Conclusion

Having your Terraform state and CI running on GitLab is a great way to follow GitOps best practices. The two make a great combination for developing and deploying IaC. Since most of you might already be using GitLab for your repositories, it becomes much simpler to keep your IaC under one roof and let GitLab manage your Terraform state, with support for encryption in transit and at rest, as well as versioning, locking, and unlocking the state.
There was a time when we created Jenkins jobs using the UI alone. Later, the idea of pipeline as code was mooted to address the rising complexity of build and deployment jobs. In Jenkins 2.0, the Jenkins team introduced the Jenkinsfile to achieve pipeline as code. If you want to create an automated pull-request-based or branch-based Jenkins Continuous Integration and Continuous Delivery pipeline, the Jenkins multibranch pipeline is the way to go. As the Jenkins multibranch pipeline is a fully git-based pipeline as code, you can build your CI/CD workflows around your repository. Pipeline as Code (PaaC) makes it easy to bring the advantages of automation and cloud portability to your Selenium tests. You can use the multibranch pipeline model to quickly and reliably build, test, deploy, monitor, report on, and manage your Selenium tests, and much more. In this Jenkins tutorial, we take a look at how to create a Jenkins multibranch pipeline and the key concepts involved in configuring a Jenkins multibranch pipeline for Selenium automation testing. Let's get started.

What Is a Jenkins Multibranch Pipeline?

According to the official documentation, the multibranch pipeline job type lets you define a job where, from a single git repository, Jenkins will detect multiple branches and create nested jobs when it finds a Jenkinsfile. From this definition, we can understand that Jenkins can scan a Git repo for Jenkinsfiles and create jobs automatically; all it needs from us is the Git repo details. In this article, we are going to use a sample GitHub repository. Our sample GitHub repo contains a sample Spring Boot project that can be deployed to Tomcat. In the root directory of the project, we have the Jenkinsfile. We used Jenkins Declarative Pipeline syntax to create this Jenkinsfile. If you are new to the Jenkins Declarative Pipeline, please read our detailed article here.

Sample Jenkinsfile

pipeline {
    agent any
    stages {
        stage('Build Code') {
            steps {
                sh """
                echo "Building Artifact"
                """
            }
        }
        stage('Deploy Code') {
            steps {
                sh """
                echo "Deploying Code"
                """
            }
        }
    }
}

We created two stages, "Build Code" and "Deploy Code," in our Jenkinsfile, each configured to print an appropriate message. Now we have the Git repo with the Jenkinsfile ready. Let's create a Jenkins multibranch pipeline on the Jenkins server.

Jenkins Pipeline vs Multibranch Pipeline

Jenkins pipeline is the new hotness, but it's not for everyone, and multibranch pipelines are still awesome. In this section of the Jenkins multibranch pipeline tutorial, let's understand the ideal use cases for the Jenkins pipeline and the multibranch pipeline through a comparison of the two. A Jenkins pipeline is a job configuration system that allows you to configure a pipeline of jobs which will be executed automatically on your behalf. A Jenkins pipeline can have multiple stages, and each stage is executed by an agent, all running on a single machine or on multiple machines. A pipeline is normally created for a specific branch of source code. When you create a new job, you will see an option for selecting the source code repository and branch. You can also create a fresh pipeline for a new project or a new feature of an existing project. A Jenkins pipeline allows you to have a flexible Jenkinsfile with stages for your build, so you can have an initial stage where you run linting, tests, etc., and then separate stages for building artifacts or deploying them. This is very useful when you want to do multiple things in your pipeline. What if you only have one thing to do?
Or what if all the things you want to do differ depending on some configuration? Does it make sense to use a Jenkins pipeline here? A multibranch pipeline is an alternative approach that might be more suitable in these cases. A multibranch pipeline allows you to split tasks out into branches and merge them together later, very much the way Git branching works. A multibranch pipeline is a pipeline that has multiple branches; the main advantage of using one is being able to build and deploy multiple branches from a single repository. Having a multibranch pipeline also allows you to have different environments for different branches. However, it is not recommended to use a multibranch pipeline if you do not have a standard branching and CI/CD strategy. Now that you have seen the Jenkins pipeline vs. multibranch pipeline comparison, let's go through the steps to create a Jenkins multibranch pipeline.

Creating a Jenkins Multibranch Pipeline

Step 1: Open the Jenkins home page (http://localhost:8080 for a local installation) and click on "New Item" in the left-side menu.

Step 2: Enter the Jenkins job name, choose "Multibranch Pipeline" as the job type, and click "OK."

Step 3: On the "Configure" page, we need to configure only one thing: the Git repo source. Scroll down to the "Branch Sources" section and click on the "Add Source" dropdown. Choose "GitHub" as the source because our sample repo is hosted there.

Step 4: Enter the repository HTTPS URL as https://github.com/iamvickyav/spring-boot-h2-war-tomcat.git and click on "Validate." Since our GitHub repo is public, we don't need to configure credentials to access it; for enterprise/private repos, we may need credentials. The "Credentials ok" message indicates that the connection between the Jenkins server and the Git repo is successful.

Step 5: Leave the rest of the configuration sections unchanged for now and click on the "Save" button at the bottom.

On saving, Jenkins will perform the following steps automatically:

Scan Repository step: Scan the Git repo we configured, look for the list of branches available in it, and select the branches that have a Jenkinsfile.

Running Build step: Run a build for each of the branches found in the previous step, using the steps mentioned in its Jenkinsfile.

From the "Scan Repository Log" section, we can understand what happened during the scan repository step. Since we only have a master branch in our Git repo, the Scan Repository Log says "1 branches were processed." After the scan is complete, Jenkins will create and run a build job for each processed branch separately. In our case, we had only one branch, called master; hence, the build will run for our master branch alone. We can check this by clicking on "Status" in the left-side menu, where we can see a build job created for the master branch. Click on the branch name to see the build job log and status. "Stage View" gives a visual representation of how much time each stage took to execute and the status of the build job.

Access Build Job Run Logs

Step 1: Click on the build number under the "Build History" section.

Step 2: Next, choose "Console Output" from the left-side menu to see the logs.

What happens if we have more than one branch in our Git repo? Let's check that now. In the Git repo, a new branch called "develop" is created. To differentiate the develop branch build, we made small changes to the echo commands in its Jenkinsfile.
Jenkinsfile in the Master Branch

pipeline {
    agent any
    stages {
        stage('Build Code') {
            steps {
                sh """
                echo "Building Artifact"
                """
            }
        }
        stage('Deploy Code') {
            steps {
                sh """
                echo "Deploying Code"
                """
            }
        }
    }
}

Jenkinsfile in the Develop Branch

pipeline {
    agent any
    stages {
        stage('Build Code') {
            steps {
                sh """
                echo "Building Artifact from Develop Branch"
                """
            }
        }
        stage('Deploy Code') {
            steps {
                sh """
                echo "Deploying Code from Develop Branch"
                """
            }
        }
    }
}

Now we have two Jenkinsfiles in two different branches. Let's rerun the repository scan in Jenkins to see the behavior. We can see that the new branch (develop) got detected by Jenkins; hence, a new job was created separately for the develop branch. On clicking "develop," we can see the log for the develop branch's build job.

In the previous example, we kept different contents in the Jenkinsfile of the master and develop branches. But that's not how we do it in real-world applications. Instead, we leverage when blocks within the stage blocks to check the branch. Here is an example with combined steps for the master and develop branches; this same content is placed in both the master and develop branch Jenkinsfiles.

pipeline {
    agent any
    stages {
        stage('Master Branch Deploy Code') {
            when {
                branch 'master'
            }
            steps {
                sh """
                echo "Building Artifact from Master branch"
                """
                sh """
                echo "Deploying Code from Master branch"
                """
            }
        }
        stage('Develop Branch Deploy Code') {
            when {
                branch 'develop'
            }
            steps {
                sh """
                echo "Building Artifact from Develop branch"
                """
                sh """
                echo "Deploying Code from Develop branch"
                """
            }
        }
    }
}

Step 3: Click on "Scan Repository" in the left-side menu for Jenkins to detect the new changes in the Git repo.

By now, you may have noticed that we click "Scan Repository" every time we want Jenkins to detect changes in the repo. How about automating this step?

Periodic Trigger for the Jenkins Multibranch Pipeline Scan

Step 1: Click "Configure" in the left-side menu.

Step 2: Scroll down to the "Scan Repository Triggers" section, enable the "Periodically if not otherwise run" checkbox, and choose the time interval for the scan to run periodically (two minutes in our example).

Step 3: Click on the "Save" button.

From now on, Jenkins will scan the repo every two minutes. If a new commit is found in any branch, Jenkins will run a new build job for that particular branch using its Jenkinsfile. Below is the "Scan Repository Log," which clearly shows the scan triggered every two minutes.

Real-Time Use Cases for the Jenkins Multibranch Pipeline

Below are a few scenarios where the Jenkins multibranch pipeline can be handy:

Any new commit in the master branch has to be deployed to the server automatically.

If a developer raises a pull request (PR) to the develop branch, then: the code should build successfully without compilation errors, the code should have a minimum of 80% test coverage, and the code should pass the SONAR code quality test.

If developers push code to a branch other than master or develop, the code should compile successfully; if not, send an alert email.
Here is a sample Jenkinsfile covering a few of the above use cases:

pipeline {
    agent any
    tools {
        maven 'MAVEN_PATH'
        jdk 'jdk8'
    }
    stages {
        stage("Tools initialization") {
            steps {
                sh "mvn --version"
                sh "java -version"
            }
        }
        stage("Checkout Code") {
            steps {
                checkout scm
            }
        }
        stage("Check Code Health") {
            when {
                not {
                    anyOf {
                        branch 'master'; branch 'develop'
                    }
                }
            }
            steps {
                sh "mvn clean compile"
            }
        }
        stage("Run Test cases") {
            when {
                branch 'develop';
            }
            steps {
                sh "mvn clean test"
            }
        }
        stage("Check Code coverage") {
            when {
                branch 'develop'
            }
            steps {
                jacoco(
                    execPattern: '**/target/**.exec',
                    classPattern: '**/target/classes',
                    sourcePattern: '**/src',
                    inclusionPattern: 'com/iamvickyav/**',
                    changeBuildStatus: true,
                    minimumInstructionCoverage: '30',
                    maximumInstructionCoverage: '80')
            }
        }
        stage("Build and Deploy Code") {
            when {
                branch 'master'
            }
            steps {
                sh "mvn tomcat7:deploy"
            }
        }
    }
}

We committed this new Jenkinsfile to the master and develop branches so that it will be detected by the Jenkins multibranch pipeline in the next repository scan.

Selenium Automation Testing With the Jenkins Multibranch Pipeline

Let's say we are writing automation test cases for a website. Whenever a new test case is committed to a branch, we want to run it and make sure it executes as expected. Running automation test cases on every browser and operating system combination is a nightmare for any developer. That's where LambdaTest's powerful automation testing infrastructure can prove handy: using the LambdaTest Selenium grid, you can maximize your browser coverage. In this section, we will see how to leverage the testing infrastructure of LambdaTest with the Jenkins multibranch pipeline.

To demonstrate, we hosted a sample Todo app here—LambdaTest ToDo App. Automation test cases written with Cucumber are committed in the sample repo. From Jenkins, we want to run these test cases on the LambdaTest platform. Running test cases on LambdaTest requires a username and access token; register with the LambdaTest platform for free to get your credentials.

Setting Up Environment Variables

When the test cases run, they will look for the LambdaTest username (LT_USERNAME) and access key (LT_ACCESS_KEY) in environment variables, so we need to configure them beforehand. To avoid storing them with the source code, we configured them as secrets in Jenkins and loaded the environment variables from those secrets:

environment {
    LAMBDATEST_CRED = credentials('Lambda-Test-Credentials-For-multibranch')
    LT_USERNAME = "$LAMBDATEST_CRED_USR"
    LT_ACCESS_KEY = "$LAMBDATEST_CRED_PSW"
}

Here is our final Jenkinsfile:

pipeline {
    agent any
    tools {
        maven 'MAVEN_PATH'
        jdk 'jdk8'
    }
    stages {
        stage("Tools initialization") {
            steps {
                sh "mvn --version"
                sh "java -version"
            }
        }
        stage("Checkout Code") {
            steps {
                checkout scm
            }
        }
        stage("Check Code Health") {
            when {
                not {
                    anyOf {
                        branch 'master'; branch 'develop'
                    }
                }
            }
            steps {
                sh "mvn clean compile"
            }
        }
        stage("Run Test cases in LambdaTest") {
            when {
                branch 'develop';
            }
            environment {
                LAMBDATEST_CRED = credentials('Lambda-Test-Credentials-For-multibranch')
                LT_USERNAME = "$LAMBDATEST_CRED_USR"
                LT_ACCESS_KEY = "$LAMBDATEST_CRED_PSW"
            }
            steps {
                sh "mvn test"
            }
        }
    }
}

Now, we will create a new job in Jenkins as a multibranch pipeline by following the steps mentioned in the sections above, and point it to the sample repo. Once the build runs successfully, visit the LambdaTest automation dashboard to get the test logs.
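For reference, here is a minimal sketch of how a test in such a repo might construct its RemoteWebDriver against the LambdaTest grid using the environment variables configured above. This is not the exact code from the sample repo; the hub URL and capability names are assumptions that should be checked against LambdaTest's capabilities generator.

import java.net.URL;
import org.openqa.selenium.remote.DesiredCapabilities;
import org.openqa.selenium.remote.RemoteWebDriver;

public class LambdaTestDriverFactory {

    // Builds a RemoteWebDriver that runs the test on LambdaTest's cloud grid.
    public static RemoteWebDriver createDriver() throws Exception {
        String username = System.getenv("LT_USERNAME");   // injected by the Jenkins environment block
        String accessKey = System.getenv("LT_ACCESS_KEY");

        DesiredCapabilities caps = new DesiredCapabilities();
        caps.setCapability("browserName", "Chrome");       // assumed capability names
        caps.setCapability("platform", "Windows 10");
        caps.setCapability("build", "jenkins-multibranch-develop"); // label shown on the LambdaTest dashboard
        caps.setCapability("name", "todo-app-test");

        URL hubUrl = new URL("https://" + username + ":" + accessKey
                + "@hub.lambdatest.com/wd/hub");            // assumed hub endpoint
        return new RemoteWebDriver(hubUrl, caps);
    }
}

Because the credentials are read from environment variables, the same test code runs unchanged on a developer laptop and inside the Jenkins multibranch job.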
Conclusion

With this, we have learned how to create a Jenkins multibranch pipeline, configure a Git repo in it, define different build steps for different branches, set up periodic automatic repository scans, and leverage LambdaTest's automation testing infrastructure in our CI/CD builds. I hope you found this article useful. Please share your feedback in the comments section.
When delivering software to the market faster, there is a critical need to onboard automated tests into your continuous delivery pipeline to verify that the software adheres to the standards your customers expect. Your continuous delivery pipeline could also consist of many stages that should trigger these automated tests to verify defined quality gates before the software can move to the next stage and eventually into production (see Figure 1). Depending on the stage of your pipeline, your automated tests could range in complexity from unit and integration tests to end-to-end and performance tests. When considering the quantity and complexity of tests, along with the possibility of having multiple stages in your pipeline, there could be many challenges when onboarding, executing, and evaluating the quality of your software before it is released. This article will describe some of these challenges. I will also provide some best-practice guidelines on how your automated tests could follow a contract to help deliver your software to the market faster while maintaining quality. Following a contract helps to onboard your tests in a timely and more efficient manner. It also helps when others in your organization need to troubleshoot issues in the pipeline.

Strive to Run Any Test, Anywhere, Anytime, by Anyone!

Figure 1: Example Software Continuous Delivery Pipeline

Challenges

There could be several challenges when onboarding your tests into your continuous delivery pipeline that could delay your organization from delivering software to market in a reliable manner:

Quantity of Technologies

Automated tests can be developed in many technologies. Examples include pytest, JUnit, Selenium, Cucumber, and more. These technologies might have competing installation requirements such as operating system levels, browser versions, third-party libraries, and more. Also, the infrastructure that hosts your pipeline may not have enough dedicated resources or elasticity to support this variety of technologies. It would be efficient to execute tests in any environment without having to worry about competing requirements.

Test Runtime Dependencies

Tests can also depend on a variety of inputs during runtime, including text files, images, and/or database tables, to name a few. Accessing these inputs can be challenging, as they could be persisted in an external location that your test must retrieve from during execution. These external repositories may be offline during runtime and cause unanticipated test failures.

Different Input Parameters

When onboarding and sharing your tests in your organization's CI/CD process, it is common for your tests to accept input parameters so the suite can be executed with different values. For example, your tests may have an input parameter that tells your test suite which environment to target when executing the automated tests. One test author may name this input parameter --base-url, while another test author in your organization may name it --target. It would be advantageous to have a common signature contract, with the same parameter naming conventions, when onboarding into your organization's CI/CD process.

Different Output Formats

The variety of technologies being used for your testing could produce different output formats by default. For example, your pytest test suite could generate plain-text output, while your Selenium test suite may produce HTML output.
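As a concrete illustration (a sketch, not a prescription; the report paths are placeholders), most mainstream runners can already be asked to emit the same xUnit/JUnit XML format:

# pytest: write results as JUnit/xUnit XML instead of plain text
pytest tests/ --junitxml=reports/pytest-results.xml

# Maven-based suites (JUnit, Selenium, Cucumber runners): the Surefire plugin
# writes XML reports to target/surefire-reports/ by default
mvn test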
Standardizing on a common output format will assist in collecting, aggregating, reporting, and analyzing the results of executing all the tests onboarded into your enterprise CI/CD process.

Difficulties Troubleshooting

If a test fails in your CI/CD pipeline, it may delay moving your software out to market, so there will be a need to debug the test failure and remediate it quickly. Having informative logging enabled for each phase of your test will be beneficial when the failure is triaged by others in your organization, such as your DevOps team.

Guidelines

Containerize Your Tests

By now, you have heard of containers and their many benefits. The advantage of containerizing your tests is that you have an isolated environment with all of your required technologies installed in the container. Also, containerizing your tests allows the container to be run in the cloud, on any machine, or in any continuous integration (CI) environment, because the container is designed to be portable.

Have a Common Input Contract

Having your test container follow a common contract for input parameters helps with portability. It also reduces the friction of running that test by giving the consumer clarity about what the test requires. When the tests are containerized, the input parameters should use environment variables. For example, the docker command below uses the -e option to define environment variables that are made available to the test container during runtime:

docker run -e BASE_URL=http://machine.com:80 -e TEST_USER=testuser -e TEST_PW=xxxxxxxxxxx -e TEST_LEVEL=smoke <your-test-image>

Also, there could be a large quantity of test containers onboarded into your pipeline that will be run at various stages. Having a standard naming convention for your input parameters is beneficial when other individuals in your organization need to run your test container for debugging or exploratory purposes. For example, if tests need an input parameter that defines the user to use in the tests, have a common naming convention that all test authors follow, such as TEST_USER.

Have a Common Output Contract

As mentioned earlier, the variety of technologies being used by your tests could produce different output formats by default. Following a contract to standardize the test output helps when collecting, aggregating, and analyzing the test results across all test containers to see whether the overall results meet your organization's software delivery guidelines. For example, suppose there are test containers using pytest, JUnit, Selenium, and Cucumber. If the contract says to produce output in xUnit format, then all the test results generated from running these containers can be collected and reported on in the same manner.

Provide Usage/Help Information

When onboarding your test container into your pipeline, you are sharing your tests with others in your organization, such as the DevOps and engineering teams that support the pipeline. Others in your organization might also want to use your test container as an example when they design their own. To assist others in the execution of your test container, having a common option to display help and usage information to the consumer would be beneficial.
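Pulling these guidelines together, here is a minimal sketch of what a test container's entrypoint could look like, assuming the environment variable names used above and a pytest-based suite (the --base-url option assumes the pytest-base-url plugin or an equivalent option defined in your own conftest):

#!/bin/sh
# entrypoint.sh -- illustrative contract for a containerized test suite (a sketch, not a standard)
set -eu

# Help/usage contract: a common option every test container supports
if [ "${1:-}" = "--help" ]; then
  cat <<'EOF'
Runs the automated test suite against BASE_URL.
Required env vars: BASE_URL, TEST_USER, TEST_PW
Optional env vars: TEST_LEVEL (smoke|regression, default: smoke)
Example:
  docker run -e BASE_URL=http://machine.com:80 -e TEST_USER=testuser \
             -e TEST_PW=xxxxxxxxxxx -e TEST_LEVEL=smoke <your-test-image>
EOF
  exit 0
fi

# Input contract: fail fast with a clear message if required variables are missing
: "${BASE_URL:?BASE_URL is required}"
: "${TEST_USER:?TEST_USER is required}"
: "${TEST_PW:?TEST_PW is required}"
TEST_LEVEL="${TEST_LEVEL:-smoke}"

# Output contract: always emit xUnit XML to /results for the pipeline to collect
exec pytest -m "$TEST_LEVEL" --base-url "$BASE_URL" --junitxml=/results/results.xml

The exact commands will differ per technology; the point is that every container answers --help, reads the same variable names, and drops its results in the same place and format.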
The help text could include: A description of what your test container is attempting to verify Descriptions of the available input parameters Descriptions of the output format One or two example command line executions of your test container Informative Logging Logging captures details about what took place during test execution at that moment in time. To assist with faster remediation when there is a failure, the following logging guidelines are beneficial: Implement a standard record format that would be easy to parse by industry tooling for observability Use descriptive messages about the stages and state of the tests at that moment in time Ensure that there is NO sensitive data, such as passwords or keys, in the generated log files that might violate your organization's security policies Log API (Application Program Interfaces) requests and responses to assist in tracking the workflow Package Test Dependencies Inside of the Container As mentioned earlier, tests can have various runtime dependencies such as input data, database tables, and binary inputs to name a few. When these dependencies are contained outside of the test container, they may not be available at runtime. To onboard your test container in your pipeline more efficiently, having your input dependencies built and contained directly inside of your container would ensure that they are always available. However, there are use cases where it may not make sense to build your dependencies directly inside of your test container. For example, you have a need to use a large input dataset that is gigabytes in size, in your test. In this case, it may make more sense to work with your DevOps team to have this input dataset available on a mounted filesystem that is made available in your container. Setup and Teardown Resources Automated tests may require the creation of resources during execution time. For example, there could be a requirement to create multiple Account resources in your shared deployment under test and perform multiple operations on these Account resources. If there are other tests running in parallel against the same deployment that might also have a requirement to perform some related operations on the same Account resource, then there could be some unexpected errors. A test design strategy that would create an Account resource with a unique naming convention, perform the operation, assert things were completed correctly, and then remove the Account resource at the end of the test would reduce the risk of failure. This strategy would ensure that there is a known state at the beginning and end of the test. Have Code Review Guidelines Code review is the process of evaluating new code by someone else on your team or organization before it is merged into the main branch and packaged for consumption by others. 
In addition to finding bugs much earlier, there are also benefits to having other engineers review your test code before it is merged: Verify the test is following the correct input and output contracts before it is onboarded into your CI/CD pipeline Ensure there is appropriate logging enabled for readability and observability Establish the tests have code comments and are well documented Ensure the tests have correct exception handling enabled and the appropriate exit codes Evaluate the quantity of the input dependencies Promote collaboration by reviewing if the tests are satisfying the requirements Conclusion It is important to consider how your test design could impact your CI/CD pipeline and the delivery of your software to market in a timely manner while maintaining quality. Having a defined contract for your tests will allow one to onboard tests into your organization’s software delivery pipeline more efficiently and reduce the rate of failure.
With the modern patterns and practices of DevOps and DevSecOps, it’s not clear who the front-line owners are anymore. Today, most organizations' internal audit processes have lots of toils and low efficacy. This is something John Willis, in his new role as Distinguished Researcher at Kosli, has referred to in previous presentations as “Security and Compliance Theatre.” It's a topic he has previously touched on with Dan Lines on the Dev Interrupted Podcast, also featured on DZone. In this talk, filmed at Exploring DevOps, Security, Audit compliance and Thriving in the Digital Age, John takes a deep dive into DevSecOps and what effective governance will look like as regulation and automation continue to have competing impacts on the way software is delivered. He'll ask how we came to be at the current pass with references to well-known risk and compliance failures at Equifax, Knight Capital, Capital One, and Solar Winds. Full Transcript So, if you think back in time to my evolution of trying to get people to change the way they think about what we do in security from a DevOps perspective, the Abraham Wald story, you have probably heard it; you just haven't heard it with his name. So, during World War Two, there were a bunch of mathematicians and statisticians whose job was to figure out how to do weight distribution and repair fighter planes that came back with bullet holes. This Abraham Wald one day woke up and said — and this is really the definition of survival bias — “Wait a minute, we're repairing the planes where the bullet holes are? They're the ones that are coming back. We should be repairing the planes where the bullet holes aren’t because they're the ones that aren't coming back.” I think that's a great metaphor for the way we think about security: maybe we're looking in the wrong place. And so I asked this meta question about three or four years ago, which I hope makes your brain hurt a little bit, but in the Abraham Wald way — which was “What if DevSecOps happened before DevOps?” Well, the world would be different. Because if you think about it — I'm pro-DevSecOps, and I think everybody should have a good DevSecOps reference architecture — basically what happened was we did all this DevOps work, and then we put an overlay of security on it, and that's good, it’s necessary. But maybe we already had this bias of bullet holes when we were thinking about that. What if we started with security? What if some security person said, “I’ve got a great way to do security. I'm going to call it DevSecOps!” and we started in that order? Could things be different? Would we be thinking differently, or have we not thought differently? So Shannon Lietz, who is one of my mentors — she wrote the DevSecOps Manifesto — coined the term “everyone is responsible for security.” We were talking at the break about these three lines of defense. So, I don't come from an auditor background. All I know is that I get brought into all these companies, and they would ask, “Hey John, look at our DevSecOps reference architecture!” And I’d go, “Well, that’s awesome,” and then we would have a conversation. “Yeah, we buy the three lines of defense model.” “Erm, yeah, that one is not so awesome!” Because Andrew Clay Shafer, in the earliest days of DevOps, pre-DevSecOps, made this beautiful character of what described the problem of the original DevOps problem statement. 
There was a wall of confusion — and some of you people look like you might be close to as old as I am — but there was a day when developers would throw their code over the wall, and operations would catch it and say, “It doesn’t work!” And the other side would say, “No, you broke it; it works! And this would go on for weeks and weeks and weeks, and Andrew would talk about figuring out a way to break that wall. And there were some of the original DevOps, these beautiful stories of developers working together in collaboration, and there is a whole industry that's been built out of it. So, there is busting the wall, and it becomes a metaphor for any non-collaborative groups in an organization. And so when I was thinking about what was the problem statement that drove DevOps, where are we now with the problem statement of what happens in a large organization between the first line, second line, and third line? The way I view it when I have these conversations, the second line is, by definition, a buffer for the third line. The second line has no way to communicate with the first. And this is what “dev” and “ops” looked like 15 years ago. We didn't have the tools. We didn't even have the cognitive mapping to have those discussions. We didn't even know that we should be having those concerns. In Investments Unlimited, we have a short description about how I'm not going to go to the Institute of Internal Auditors and say, “Hey, I'm John Willis; you’ve never heard of me; get rid of three lines of defense!” That ain't happening. But what I am going to say is, just like we do a separation, can we reframe — and we did this a little bit in the book — the conversation of how we think about this? Why can't we create that DSL I talked about, where the second line can meet the first line in designer requirements? And here's the kicker, right? I don't claim to be a genius about all things. But what I do know is in every bank and every company I've walked in, what's the purpose of that second line? They basically make sure they do the job that they need to do. What's the job they need to do? Protect the brand. That's it, right? Everything falls on protecting the brand. When you're Equifax, and you lose a $5 billion market cap in a day, or you're another company called Knight Capital that was the second largest high-frequency trading company on the NYSE, they lost $440million in 45 minutes and were out of business in 24 hours. That's what they're supposed to do. And our relationship now is to basically hide things from them. That's got to change. And that's why we get into the likes of Equifax, Ignite Capital, and Capital One. So what do you do? I had this idea when I started thinking about security. Have you ever been to RSA? You go into the exhibition hall at the RSA you're like, “I’ve gotta get out of here!” There are too many lights and too many vendors; this is just too confusing. It was almost impossible to come up with a taxonomy for security because there are just so many ways to discuss it and look at it. So, I started thinking about how do I make it simple. Could I come up with it? And like I always say — and I’ll keep saying it until I get punched in the face for saying it — a post-cloud native world or using cloud-native as the marker to make the conversation simpler. I'm not implying that security doesn't exist for legacy in mainframes; it certainly does. But we could have a simpler conversation if we just assumed there was a line where we could say everything to the right is cloud native. 
And so with that, I will tell you that what we need to do and what we do, are inconsistent. What we need to do or what we do is how do we prove we're safe when we have some form of usually subjective audit service to our records that tell stories about things that might lead to screen prints? And then how do we demonstrate we have a poor second line in our internal auditors or external auditors, try to figure out what all those subjective descriptions are, and we need to do both. So we need to be able to make a system that can prove we're safe and be very consistent with what we demonstrate. And that's the whole point of something like Kosli or, just in general, this idea of digitally signed, immutable data that represents what we say we do. So, then the audit isn't a subjective 40-day conversation, it's a one-second look at the SHA, and we're done. So we move from implicit security models to explicit proof models, and we change subjective to objective and then verifiable. Back to the cloud-native model, if you can accept there’s a post-cloud-native world, then I can tell you that I can give you a simple taxonomy for thinking about security and not having to think about it in like 40 horizontal and 50 vertical ways. I worked with a couple of groups, and I started hearing from these CISOs, and I said, “I don't want to call it taxonomy, but we can look at it as risk defense and trust, and we can look at it as a transition from subjective, to objective, to verifiable.” So we went through in the last presentation in 20 minutes the risk from a change at the attestation. I didn't talk about continuous verification, but there are some really interesting products that are basically trying to use chaos monkey-like tools to go beyond just breaking things to actually, for example, this port should never be open… let's just open the port. If this vulnerability should have never gotten through the pipeline, let's launch an application that has that vulnerability, right? So there's some really interesting continuous verification. I'll spend a little more on that. But then, on defense, it's table stakes that you detect and respond by Azure and all that stuff. And then everybody's basically trying to build a data lake right now, a cyber data lake. That’s the in, hip thing to do. I'm not making fun of it, it's required, but there's some real thought process that isn't happening about how you build a cyber data lake that isn’t just a bunch of junk. So there are a couple of vendors and projects that are thinking about "Can we normalize data and ingest like coming out of the provider?" So, for example, you take a message that might come from Amazon, the same message might come from Google Cloud and might come from Oracle, and it might be the same thing, like increased privileges. But the message is completely different; there's no normalization, and so if you shove that all the way to the right in a cyber data lake, you’re going to have a hard time figuring out what the message even is, let alone that each one of them has a different meta definition for the ID and all that stuff, and at some point, you really want to you want to attach that to a NIST or a minor attack framework tactic. So, let's do all that on the left side, and there's some good work happening there. And then the trust thing is interesting too because the thing that we're seeing anybody could file the SDS. 
When Mike said I sold a company to Docker, what I actually did is I had this crazy idea of ‘Could we do software-defined networking in containers?’ And we did it. It was literally me and this guy who pretty much invented software-defined networking; we built it, and with it came this whole idea of how you do trust and build around it. If you think about SDN, it changed the paradigm of how traffic came in and out from north-south to east-west. If you looked at some of the traffic patterns going back 15, 20 years ago, 90% of your traffic was north-south. And then all of a sudden, the more you got into high-scaled services, service mesh, all that stuff, it flipped. It went to 80% east-west. And we built a network around that. Well, I believe we have to do that for trust now. And we already see evidence of this when we start getting into Kubernetes and clusters and stuff like that. We're seeing projects like SPIFFE and SPIRE, some of the new service mesh stuff, ambient mesh — I am throwing out a lot of terms — but there is this possibility that, instead of building this on-or-off, north-south trust, we could create ephemeral trust within a cluster, and then it goes away. So, even things like secrets management — I think Vault is a great product today, but that stuff could really happen at the mesh level, where a secret just exists in this pod or cluster for the life of the cluster. And by the way, you're either in or you're out: you're authorized for that cluster or you're not. So, I think there's incredibly interesting stuff around what I call moving trust. And zero trust is table stakes, right? I'm talking about more of, let's really go to a level of trust where the world is going to be — I don't know if it's Kubernetes — but it's definitely going to be cluster-based compute, and then we could build our trust around that model. I know it sounds crazy, but hey. So, risk differently. We talked about this in Investments Unlimited. This was the Capital One breach, which is fun for everybody except Capital One! Basically, this was the Struts 2 Jakarta vulnerability. Oh, wait a minute, this is actually Equifax, but that's fine. So what happened was there was a vulnerability in one of the Struts 2 libraries, which almost everybody uses, where even as an unauthorized system, you put a command in, and it runs. And if that was their system, you could do whatever you want – this is what I told you about the breach. But this one's a little more interesting; this is Capital One's breach. If you follow aviation disasters, this is like the Air France 447 of computing, if that makes any sense to anybody. So what was interesting about this one is they were basically rolling IDSs, so there was this window of anywhere from seconds to about five minutes where an adversary could poke in. There was this woman, a crypto miner, who basically runs a billion curls a day, looking for one person to put up a proxy with the defaults on. And this team that was in a hurry got an exception and put up a proxy that left the defaults, and on this one proxy, the default bypass was on. So, this crypto miner got really lucky because the IDSs were rolling, they popped through, they hit that IP address — it’s a hardwired address that anybody who has ever worked with Amazon Web Services knows is the metadata server — so you had capitalone.com?url= pointing at the metadata server, and because they were in a hurry, they probably cut and pasted some VPC definitions from Stack Overflow.
And they were privileged, so they got right through, were able to dump the privileges, and assume a super user power. Meanwhile, some developers left 100 million business credit card applications in the S3 bucket. Here's where it gets really worse. In business credit cards, the PCI DSS requires Social Security numbers to be tokenized, but it doesn't require the corporation ID to be that. I'm sure it's everywhere, but basically, half of the S corps are small businesses, and they use the Social Security number as the corporation ID. So again, there are just all these loopholes that happen. And that's called server-side request forgery. I was actually brought into SolarWinds. One of the authors of the book worked for a big five, and they wanted to get the contract for the clean-up job. So I was brought in to talk about automated governance. Again, we can make fun of SolarWinds all day long, but every software company out there is basically as bad as they are. By the way, all the software that you're buying – now that I actually don't work for a software company, we’re SaaS-based, we’re good! – you look at what SolarWinds was, and it was terrible. The pipelines it was just horrendous. And so I go in talking about advanced stuff, and they're like, “No, no, we’ve just got to get DevOps!” So they weren't really that interested. But I thought I'd be Johnny on the spot and go in there and take this CrowdStrike minor attack framework and say, “Alright, I'm going to really show these guys what they should use.” Because basically, what happened was the adversary got into the Microsoft compiler. These are supply chain attacks; they are the really scary ones — where they're not even going after you; they're going after whoever you're delivering stuff to. So they got into there. And by the way, they apparently rolled their logs after 18 months, so they don't even know how long they were in there. They could have been there for years. So CrowdStrike did a really good analysis, and one of the ones that I just caught — in fact, it’s in our demo, that's why I sent our demo — was they weren't checking the image SHA. So what happened is they said, “I must build!” and they start injecting nefarious code into it so that when that product goes out to the Department of Defense or the bank, they've got this open backdoor out there. And a table stakes attestation would be if it's a clean image or a job file, is doing a baseline SHA and be able to look before or after, and be able to see if it should have been this, and there are other ways to detect. And the other thing that was really interesting about why this idea of automated governance has to have an immutable and non-tamperable data store, they went in and actually created logs. That's really scary if they get to live in your company. And by the way, they’re in your company right now, don't think they're not there now. They may not be finding a way around to do real damage, but you are incredibly naive if you don't think there are adversaries sitting in your corporation. There's polymorphic malware – I spent 20 minutes explaining how polymorphic malware works — they are in your company. The question is how hard it is or what opportunities arise. The Air France 447 allows them to get to the next step and to the next step. If they're really smart, this is where it gets really scary. They can actually tamper with the logs to remove the evidence that they actually did things. 
One of the biggest things to Equifax, when that was said and done, the Equifax story is really interesting. I know a lot of people who worked at Equifax, and so their external auditors are the thing that drove almost everybody to quit for the next two years after the breach: they wanted everybody to prove the negative and note the negative. In other words, they were like, you know what? We survived a nightmare because they didn't change the data. That's the scary thing. It's one thing to have an adversary that dumps a bunch of confidential data out in the wild, and that's not good; it's going to hurt the brand. You'll go out of business if they change your system or record data and they publish that. If you're a bank and I change your counts, they were in Marriott for five years for that breach. So, if they're really smart — and this is evidence that they do this — not only might they mutate your data, they’ll mutate the evidence of the data. That's why it has to be in an immutable, non-tamperable store. Defense differently: again, I talked a lot about this. You have to think about your site, don't just build a cyber data lake. There are some really good opportunities to think about how you ingest at the provider level. And there are a couple of providers now building these interesting SDKs — it's called automated cloud governance. It's from a group out in New York called ONUG, where you can basically use these SDKs from Microsoft, Oracle, IBM, and Google, where you can start building these NIST user data, and you can normalize the messages themselves. So by the time you get in the data lake, you're not doing an incredible amount of compute consumption to try to correlate. And trust differently; zero trust is table stakes. But I think the really interesting stuff becomes, certainly, in this state, the 207, and the good news is when I first was writing this, SPIFEE and SPIRE were just external projects. Now they're actually built into the Istio, Envoy, and service mesh, so they’re all there. But Sigstore is really interesting, a Merkle tree-based solution that deserves to be looked at. The thing that I'm trying to get Mike and James, and everybody, really excited about is what's coming down the pike. And here's the thing: in the past, we had this buffer as IT, our first line people who know that the adversaries and the auditors, we are ahead of them. We’ve got Kubernetes. They won't figure out all the dangers, ghosts, and dragons in Kubernetes until next year. We've been sort of living in that sort of delayed buffer. Well, now what's happening is the people who write the Google stuff, like service mesh, Istio, and Envoy, are now writing or getting contracted by NIST to write the documentation. So now some of the incredibly dangerous stuff that's in Istio and Envoy, which is the service mesh part of Kubernetes, is well documented in English — easily read, and both the adversaries and the auditors can see that there's something called the blue-green deploy that used to only happen in layer three. Basically, what happened now is all the way through; stuff can happen to layer seven. And in layer three, stuff or switch config, which is very hard for an adversary to get in your system and tamper with that stuff. But now an adversary just sees a find one leaky API or the YAML file, and they can basically say, “You know what? I'm going to take 1% of all the paid traffic and send it to some latest version.” I ask people, “Have you ever heard of the Envoy? 
Do you even turn on the Envoy access log?” “What's that?” So that means there are banks that are running production, customer payment, or funds-based stuff in service mesh — and they have no evidence of that. So this is for people like Kosli and us, who want to get ahead of the curve, a treasure trove. It's stuff that's going to happen so fast, and people aren't even ready. And again, the takeaway is we had the luxury to think about the adversaries, how we don't think they'll figure that out because it's so advanced. The people who write Envoy, Istio, and that stuff are now writing this documentation on how it works. So the adversaries are not stupid. You know, when you tell them there's something called the blue-green deploy, they might not know what it is, but once they realize it's a reroute of traffic, then they'll know exactly what to do with it. By the way, that's a GPT-3 image; all I put in was Bacciagalupe as John Willis, and that's what I got. And the only thing I will say is — and this is probably worth a drink of water — I think the internet thinks I'm a clown. So that's OK! We've got some time for a Q&A, so I'll bring a couple of chairs up, and we can have a bit of a fireside chat. If you have any questions, put them in on the QR code on your lanyard. So before we get into the questions from the audience, I'd like to pick up on what you were saying about the network stuff because I have to say, when you started talking about this, Istio and Envoy, can we just stick with what we've got for now? And the more I started thinking about it, the more I thought, “Oh wait, hold on, this is quite interesting because, again, it goes back to the DevOps story because it's another example of things that used to be in another department in the business where the developers get so pissed off with it, they decide that we're going to put this in software now. So first, it was billed as security, its deployments, its cloud, and its containers. Time after time, we talk about everything being code, but it's really developers doing civil disobedience against other parts of the org in some way. So networking is one area, but some of the conversations I've had this week are also about data. Maybe you could say a bit about that? Oh, yeah. I mean, that's another thing. So one of the things that John Rzeszotarski implemented this thing and one of the first interesting conversations that happened after, and he had he built it, and our whole thought process was this is for the software supply chain. And it turns out one of the API development teams saw it and did a pull request, and said, “Hey, we built a system for API development workflow.” “Ooh, that's interesting!” Since this isn’t really a software-defined workflow, it's a workflow that shows evidence of the decisions you made in defining an API, right? Like leaky API, all that stuff. And that sort of opened up this idea that it's just a model for workflow evidence. And so I started thinking about what else are we doing. And right at the time, this concept of data ops was starting. Go back 15 or 20 years, let's say 20 years ago; there were a lot of big banks that the way they put software into production was they just put the software into production. And then DevOps came to CI and CD, and then DevOps like it'd be very rare and almost probably pretty close to a criminal if that happened in a large bank today. 
There are people that sort of bypass the system, but in general, I don't run into any financial organizations that don't have some form of a pipeline. But those same organizations have developers that stuff 145 million business credit cards into an S3 bucket through no operational pattern. And so this movement of data ops is ‘Could we do a workflow for how we move data?’ And we've been sort of doing the data warehouse forever, but now whenever a developer wants to get some data, there should be a process of how you tokenize it. Does it get ETL’d? What's the evidence so when it’s sitting in an S3 bucket? So imagine all the way that you're processing. They say that you're taking it from the raw data, maybe ETL process, maybe you're sort of tokenizing, maybe it's going through Kafka and doing this, this, and this, and it's winding up here. What if you were keeping these Kosli or just this attestational evidence all the way so that when it goes to the S3 bucket, you could reject it? Just like you could reject the build, fail to build? Or even better, have a scanner scanning all the S3 buckets looking for any data that doesn't have a verifiable meta of evidence. Again, the two worlds have to mature and meet together, and I think the more of the conversations that happen about data ops, the more it puts us in a better or anybody who's doing this model of that kind of evidence should naturally happen, and it could happen for it. I've seen people talk about doing it for modeling, for example, Monte Carlo modeling. What were the decisions that you made, and what's the data that shows? When the model runs like it's force majeure, right? I mean, at that point, once it's been trained, it's going to do what it does. Now if it does really bad stuff, at least I can show evidence that these are decisions that we made when we were building the model. This gentleman had a question. I know you; you gave a great presentation the other night! Thanks! I was just thinking about the information within the data, and the kind of situation that we are in is that the regulations keep changing; everything changes, right? So if we even have this tokenization or verification of the data that you're using, whatever that is in the architecture, if the regulations change, what are you going to do about it? That’s what I was thinking because if you don't scan for it, but if you know where it is, that means that you can go out and you can pick it out. So the GDPR regulations, OK, we can't keep it for six months anymore; it's only three. If you get the meta, it will tell you everything. Then you know where you have what, so you can actually change on the spot. Here's the beauty part of that: it's the same thing with software delivery, right? Remember, I said earlier, the beauty of having that DSL associated as an artifact in the evidence chain is because if the requirements today are that you had to have this, this, and this, and then in six months now, there's some executive order where we realize, oh, you had to have this, this and this, it's the point in time evidence because the artifact is part of the evidence. So when you're looking at that data or that record, the evidence said you only had to have this. Well, it's even more true with data. With data, you might have reclassified. I did some work with Nike — you want to talk about how interesting their data classification is in the cloud? 
You guys might not know who Michael Jordan is because you don't follow American basketball, but at the time, the ownership of his data at Nike is cherished as a bank’s account data. The data about some of their big clients, so data classification, but then how do you mature the meta around it? And I think that's a great point — if the policy changes so that it needs to be from six months to three months, if you have the meta tagged — which this model I think works really well for that — then you could just scan around and say, “OK, we got rid of all the data that's been sitting around for four months, that used to be for six months, and now should be three months. I think, just to add to all of this, I agree with everything that's been said. But we know from the SRE book and from Google that 70% of system outages are due to changes in a live system. We focus a lot on changes nowadays in the DevOps world about deployments, but there's so much change that there isn't a deployment, right? There's a database migration, or a user is provisioned, or, you know, somebody needs to fix up a record in a transaction or something. It's just so much more. But it's the same thing, right? The data currently is siloed, ephemeral, and disconnected. And we talked about this the other day. What are the basics? And I'll just throw out the basic four – it’s probably five, maybe six — but what are the basic four about an audit? When did the change happen? Who did the change? Who approved the change? And then, usually, some variant of it was successful; was there a backup plan? And that's whether it's data, whether it's software artifact, whether it's configuration. And again, when the orders come in, they ask about this artifact, which is some library we still, without something like Kosli or a solution like that, spend a lot of time grabbing a lot of stuff to prove it. But when they ask us that same question about databases, I can tell you the answer is it's chaos because, one, we don't use data ops as a model, and two, if we had data ops, we could actually be more aligned with giving the evidence of who made it. Those should be standard in any delivery of any workflow, whether it's an API, whether it's Monte Carlo modeling, whether it's data ops, or whether it's software delivery. One-hundred percent agree. But in the interest of getting through some of the other questions, we had a question from Axel which I think is quite interesting: where do you think the CSO should be put in an organization, both in terms of the formal setup but also the activities and where they do it from. That's an interesting question. I had a great conversation during the break, so I'll give you my first uneducated answer. And it's not too uneducated because Mark Schwartz — he's another writer for IT Revolution and he's written a bunch of books — but one of his books is A Seat At The Table. And it's an interesting book that sort of talks about ‘Are you really an IT company if your CIO isn’t really on a real seat at the table?’ And actually what I've done when I go into large companies, not that I'm talking to the CEO – I do actually get to work for CIOs quite often, but not CEOs, I don't dress well enough for that – but the question I like to find out the answer to is ‘Where does your CIO sit today?’ Do they sit on the kiddies' table or the big grown-up table? Because if they're not on the grown-up table, I don't care how much you tell the world you’re a data company or software company – you’re not a software company. 
So he makes that point really well, and then he says that even companies that do create this category of achieving — and this might offend some people — but he says that creating a chief data officer is basically an excuse that you're doing it terribly, because why isn’t that part of the CIO? Is data not part of information technology? So. my only point is — the John Willis answer is — you call it CIO or whatever you want to call it, but they all should be aligned. Why is security completely segregated? Compliance and risk are over here, the CISOs here, and CIO here — is security, not information technology? Now, you pointed out that there are some requirements where they have to be firewalled, but then I go back to John Willis doesn't say get rid of the three lines of defense – I say we have to reframe the way we do things. So if I can't change you structurally, I'm not going to get rid of the three lines defenses, but I'm going to ask you until you kick me out of the building, “Why isn’t the second line in designer requirements?” every time I talk to you until you either tell me to get lost or you finally say, “OK, they're going to start showing up, John.” So, I think that there's somewhere in that's how you solve the problem, where it's a hardwired regulation. You work around it by reframing the mindset and the collaboration. But I think it's quite an interesting concept as well because I know some banks, even in this room, their second line doesn't report internally; it reports to the board as an independent control function, which makes a lot of sense. But it's interesting that you would take information security as a control function externally rather than an internal cultural thing that you need to do. Yeah, part of the legacy of our company. I'd say five years into the 10-year DevOps journey, oh my goodness, we forgot to bring security along. Our industry talks about bleeding end information. I've seen CTOs at banks like Citibank say, “We need to be more like Google in like the third slide. Fifteen slides later, they have a slide that says, ‘Do more with less.’ No, that's not how Google does its business! They don't do more with less! They hire incredibly expensive people. When a person tries to leave Google for a startup, they basically add about $1,000,000 to their yearly salary. So they don't do more with less. I was really surprised that the IT budget for places like JPMorgan. It’s incredible how much money they spend, though; it’s more than Google. So, a good friend of mine, I can't say who it is, but when you fire up a backup IBM mainframe, you immediately have to write a $1,000,000 check to IBM. And by the way, there are products called Net View — there are millions and millions of dollars that go into legacy budgets. But yes, the big banks — JPMorgan, Goldman Sachs — Goldman have been trying to figure out Quantum for trading applications. They put an incredible amount of investment money into bleeding-edge tech. I was at Docker, and they were literally the first large financial institution that was going all in on figuring out how they can use containers for tier-one trading applications. So, they definitely do spend money. Great, OK. So another question from an anonymous source. We have a whistle here in the room! So how do we overcome skepticism and resistance among non-tech stakeholders? You can't imagine life without a CAB. I have some opinions! It all goes back to trust. 
And there are actually a couple of really good books written by Scott Prugh, who is part of the Gene Kim tribe of people. There's a methodical way, and it all comes back to you just having to create the trust model. And it sounds simple, but it could be what we're talking about. One of the things I had to take out of the slide deck because I couldn't do it in 20 minutes; what got me really interested in working with Topo Pal is back in 2017 — he was the first fellow at Capital One — he wrote a blog article about how they did their pipelines. This is 2017, and it is a great article out there — if you want it, I can get a link to it; it’s still very relevant — he defined what he called 16 gates. And the idea was that they told the developers, “If you can provide evidence for these 16 things and we can pick it up, you don't have to go to the CAB.” So the first model is the way you get rid of the CAB is trusted data, right? And there are ways to create trust. I heard somebody say recently that their auditors don't want to hear anything about SHAs or anything like that. What are they thinking about when they're asking questions about funds? Because that tells you it's all encrypted. And if it's not, they’ve got way worse problems than worrying about what we do, you know? So it's how you frame things. If you go to a second line and you talk about SHAs and crypto, and we use Vault to do this, you're going to lose them. But if you try to explain it in a way that says, “The way we protect our system record data and data like our banking information is the same model we're using.” That reframes that conversation to, “Oh, I get it. Yeah, that makes sense.” I think we've got a question in the audience. There was a comment just before you started this about the trust model because I'm thinking that is what is important. If you skip the part about the governance coming down and we go back to DevOps, we need to have a little legitimacy. I think that developers need to have a mandate, or they need to feel a legitimacy to the auditors or the ones controlling them, that they can give away the data, they can give away the code, the 16 gates of trust kind of thing is really important. And I have an example if you want to hear it. I wrote a master's thesis on the security police in Norway because they had to do a complete reorg after the terror attacks that we had on 22 July a few years back. And my question to them was: how do you trust an organizational change? What they did was ask all the departments and department heads what they needed to work on. And they all said more money, more people, and then I'll fix it. And then they fired all of them. Literally, they had to apply for their own jobs. So the solution to all of this was that they asked everybody that worked on the very root level of the organization: what do you need to work? And they said, “Well, I need to talk to my colleague more. I need to sit in the same room. We need to establish the value chains from the bottom and then up.” So they did that, and they did it all internally without any external company auditing them. And it's a completely different matter. Don't even get me started on Dr. Deming because we will not end the day. But probably one of the greatest research projects of the 21st century in our industry is called Project Aristotle by Google. And they asked one question: how do you create great teams? 
And the single answer — although there was a ton of data; they talked to anthropologists, they talked to software engineers, they gathered an incredible wealth of information to figure out this question — the answer was psychological safety. And if you think about the umbrella of psychological safety, it includes everything you just talked about. Because if I'm a junior female worker in a corporation that's been around for 30 years and is full of fat old men like me, can that person say, "I don't think that's going to work," and not get, "You've only been here for a week! How would you know?!" In a psychologically safe organization, the response would be, "We need to take a look at that." So, I'm not saying this for you, but it's easy to say we need to collaborate. You can't have collaboration until you can take into account diversity and all these things that you can break down. And again, some of the best, strongest research that has ever happened in our industry comes out of something Google did. And there are some really great resources for people who want to track psychological safety. I think it's number one. I'll get on my meta-horse — put me in front of the CEO and make me king for a day where they're forced to listen to me, and there are two things I would tell them they have to do systemically in that organization. One is to make psychological safety pervasive throughout the whole company. And second, I'd want them to pervasively create a systemic mindset around systems thinking. Those are the two things I would create, and I tell you, everything else will fall into place.
I start some of my talks with a joke: back in my day, we didn't have monitoring or observability. We'd go to the server and give it a kick. Hear the hard drive spin? It's working! We didn't have DevOps. If we were lucky, we had some admins and a technician to solve hardware issues. That's it. In a small company, we would do all of that ourselves. Today this is no longer practical. The complexity of deployment and scale is so great that it's hard to imagine a growing company without an engineer dedicated to operations. In this series, I hope to introduce you to some of the core principles and tools used by DevOps. This is an important skill to master, whether in a startup where we might not have a DevOps role at all, or in a big corporation where we need to communicate with the DevOps team and explain our needs and requirements.

What Is DevOps?

DevOps is a software development methodology that aims to bridge the gap between development and operations teams. It emphasizes collaboration and communication between these two teams to ensure the seamless delivery of high-quality software products. The core principles behind it are:

Continuous Integration and Continuous Delivery (CI/CD) — CI/CD is one of the key principles of DevOps. It involves automated processes for building, testing, and deploying software. With CI/CD, developers can identify and fix bugs early in the development cycle, leading to faster and more reliable delivery of software. As a developer, CI/CD gives you a faster feedback loop, enabling you to make changes to the code and see the results in real time. This helps you quickly identify and fix issues, which saves time and ensures that your code is always in a releasable state. Notice that CD stands for both Continuous Delivery and Continuous Deployment, which makes it a terribly frustrating acronym. The difference between the two is simple, though: deployment relies on delivery. We can't deploy an application unless it was built and delivered. The deployment aspect means that merging our commits into the main branch will result in a change to production at some point, without any manual intervention.

Automation — Automation involves automating repetitive tasks such as building, testing, and deploying software. This helps reduce the time and effort required to perform these tasks, freeing up developers to focus on more important work. As a developer, automation frees up your time and lets you focus on writing code rather than on manual tasks. Additionally, automation helps reduce the risk of human error, ensuring that your code is always deployed correctly.

Collaboration and Communication — DevOps emphasizes collaboration and communication between development and operations teams. This helps ensure that everyone is on the same page and working towards a common goal. It also helps reduce the time and effort required to resolve any issues that arise.

Platform Engineering

Recently there's been a rise in the field of platform engineering. This is somewhat confusing, as the overlap between the role of DevOps and a Platform Engineer isn't necessarily clear. However, they are two related but distinct fields within software development. While both are concerned with improving software delivery and operation processes, they have different focuses and approaches. Platform Engineering is a discipline that focuses on building and maintaining the infrastructure and tools required to support the software development process.
This includes the underlying hardware, software, and network infrastructure, as well as the tools and platforms used by development and operations teams. In other words, DevOps is concerned with improving the way software is developed and delivered, while Platform Engineering is concerned with building and maintaining the platforms and tools that support that process. While DevOps and Platform Engineering complement each other, they serve different purposes: DevOps helps teams work together more effectively and deliver software faster, while Platform Engineering provides the infrastructure and tools needed to support that process.

Where Do We Start?

When learning DevOps, it is important to have a solid understanding of the tools and techniques commonly used in the field. Here are some of the most important ones to learn:

Version control systems: Understanding how to use version control systems, such as Git, is a key component of DevOps. Version control systems allow teams to track changes to their code, collaborate on projects, and roll back changes if necessary. I assume you know Git, so I will skip it and go directly to the next stage.

Continuous Integration (CI) and Continuous Deployment (CD) tools: CI/CD tools are at the heart of DevOps and are used to automate the build, test, and deployment of code. Popular CI/CD tools include Jenkins, Travis CI, CircleCI, and GitLab CI/CD. I will focus on GitHub Actions. It isn't a popular tool in the DevOps space since it's relatively limited, but for our needs as developers, it's pretty great.

Infrastructure as Code (IaC) tools: IaC tools let us manage our infrastructure as if it were source code. This makes it easier to automate the provisioning, configuration, and deployment of infrastructure. Popular IaC tools include Terraform, CloudFormation, and Ansible. I also like Pulumi, which lets you use regular programming languages to describe the infrastructure, including Java (see the sketch after this list).

Containerization: Containerization technologies, such as Docker, allow you to package and deploy applications in a consistent and portable way, making it easier to move applications between development, testing, and production environments.

Orchestration: Orchestration refers to the automated coordination and management of multiple tasks and processes, often across multiple systems and technologies. In DevOps, orchestration is used to automate the deployment and management of complex, multi-tier applications and infrastructure. Popular orchestration tools include Kubernetes, Docker Swarm, and Apache Mesos. These tools allow teams to manage and deploy containers, automate the scaling of applications, and manage the overall health and availability of their systems.

Monitoring and logging tools: Monitoring and logging tools allow you to keep track of the performance and behavior of your systems and applications. Popular monitoring tools include Nagios, Zabbix, and New Relic, with Prometheus and Grafana probably the most popular in recent years. Popular logging tools include the ELK Stack (Elasticsearch, Logstash, and Kibana), Graylog, and Fluentd.

Configuration management tools: Configuration management tools, such as Puppet, Chef, and Ansible, allow you to automate the configuration and management of your servers and applications.

Cloud computing platforms: Cloud computing platforms, such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP), provide the infrastructure and services necessary for DevOps practices.
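To make the IaC idea concrete, here is a minimal, hypothetical Pulumi sketch in Python. The resource name and the use of the AWS S3 provider are assumptions for illustration only, not something prescribed by this article:

```python
# A minimal Pulumi program (Python). Running `pulumi up` in a configured
# Pulumi project would create or update this infrastructure.
import pulumi
from pulumi_aws import s3  # assumes the pulumi and pulumi-aws packages are installed

# Declare a bucket for build artifacts as code; the name "build-artifacts"
# is a hypothetical example.
artifact_bucket = s3.Bucket("build-artifacts")

# Export the generated bucket name so other stacks or scripts can reference it.
pulumi.export("artifact_bucket_name", artifact_bucket.id)
```

The appeal of this approach is that the same language, IDE, tests, and review process used for application code also apply to infrastructure definitions.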
In addition to these tools, it is also important to understand DevOps practices and methodologies, such as agile. Remember, the specific tools and techniques you need to learn will depend on the needs of your organization and the projects you are working on. However, with a solid understanding of the most commonly used tools and techniques in DevOps, you will be well prepared to tackle a wide range of projects and challenges. Most features and capabilities are transferable: if you learn CI principles in one tool, moving to another won't be seamless, but it will be relatively easy.

Version Control

We all use Git, or at least I hope so. Git's dominance in version control has made it much easier to build solutions that integrate deeply with it. Developers primarily view Git as a version control system that helps us manage and track changes to our codebase. We use Git to collaborate with other developers, create and manage branches, merge code changes, and track issues and bugs. Git is an essential tool for developers, as it allows them to work efficiently and effectively on code projects. DevOps engineers have a different vantage point: Git is viewed as a critical component of the CI/CD pipeline. In this context, Git is used as a repository to store code and other artifacts such as configuration files, scripts, and build files. DevOps professionals use Git to manage the release pipeline, automate builds, and manage deployment configurations. Git is an important part of the DevOps toolchain, as it allows for the seamless integration of code changes into the CI/CD pipeline, ensuring the timely delivery of software to production.

Branch Protection

By default, GitHub projects allow anyone to commit changes to the main (master) branch. This is problematic in most projects. We usually want to prevent commits to that branch so we can control the quality of the mainline. This is especially true when working with CI, as a break in the master can stop the work of other developers. We can minimize this risk by forcing everyone to work on branches and submit pull requests to the master. This can be taken further with code review rules that require one or more reviewers. GitHub has highly configurable rules that can be enabled in the project settings. Enabling branch protection on the master branch in GitHub provides several benefits, including:

Preventing accidental changes to the master branch: By enabling branch protection on the master branch, you can prevent contributors from accidentally pushing changes to the branch. This helps ensure that the master branch always contains stable and tested code.

Enforcing code reviews: You can require that all changes to the master branch be reviewed by one or more people before they are merged. This helps ensure that changes to the master branch are high quality and meet the standards of your team.

Preventing force pushes: Enabling branch protection on the master branch can prevent contributors from force-pushing changes to the branch, which can overwrite changes made by others. This helps ensure that changes to the master branch are made intentionally and with careful consideration.

Enforcing status checks: You can require that certain criteria, such as passing tests or successful builds, are met before changes to the master branch are merged. This helps ensure that changes to the master branch are of high quality and do not introduce new bugs or issues.
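These rules are usually enabled in the repository settings UI, but they can also be applied programmatically. Below is a minimal sketch that calls GitHub's REST branch protection endpoint from Python; the owner, repository, status check name, and token are placeholders, and the exact payload your setup needs may differ, so treat this as an illustration rather than a definitive recipe:

```python
import requests

# Hypothetical repository coordinates and token; replace with real values.
OWNER, REPO, BRANCH = "my-org", "my-repo", "master"
TOKEN = "<personal-access-token>"

# Require passing status checks and at least one approving review before merging.
payload = {
    "required_status_checks": {"strict": True, "contexts": ["ci/build"]},
    "enforce_admins": True,
    "required_pull_request_reviews": {"required_approving_review_count": 1},
    "restrictions": None,
}

response = requests.put(
    f"https://api.github.com/repos/{OWNER}/{REPO}/branches/{BRANCH}/protection",
    json=payload,
    headers={
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {TOKEN}",
    },
)
response.raise_for_status()
print("Branch protection enabled for", BRANCH)
```

Keeping this call in a script or IaC tool means the same protection rules can be reapplied consistently across many repositories.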
Overall, enabling branch protection on the master branch in GitHub can help ensure that changes to your codebase are carefully reviewed, tested, and of high quality. This can improve the stability and reliability of your software.

Working With Pull Requests

As developers, we find that working with branches and pull requests allows us to collect multiple separate commits and changes into a single feature. This is one of the first areas of overlap between our role as developers and the role of DevOps. Pull requests let us collaborate and review each other's code before merging it into the main branch. This helps identify issues and ensures that the codebase remains stable and consistent. With pull requests, the team can discuss and review code changes, suggest improvements, and catch bugs before they reach production. This is critical for maintaining code quality, reducing technical debt, and ensuring that the codebase is maintainable.

The role of DevOps is to balance quality against churn. How many reviewers should we have for a pull request? Is a specific reviewer required? Do we require test coverage levels? DevOps needs to balance developer productivity, stability, and churn. By increasing the reviewer count or forcing review by a specific engineer, we create bottlenecks and slow development; the flip side is a potential increase in quality. We decide on these metrics based on rules of thumb and best practices, but a good DevOps engineer will follow through with metrics that support an informed decision down the road. For example, if we require two reviewers, we can look at the time it takes to merge a pull request, which will probably increase, and compare it to the number of regressions and issues after the policy took effect. That way, we have a clear and factual indication of the costs and benefits of a policy (a sketch of this kind of analysis appears at the end of this article).

The second benefit of pull requests is their crucial role in the CI/CD process. When a developer creates a pull request, it triggers an automated build and testing process, which verifies that the code changes are compatible with the rest of the codebase and that all tests pass. This helps identify issues early in the development process and prevents bugs from reaching production. Once the build and test processes are successful, the pull request can be merged into the main branch, triggering the release pipeline to deploy the changes to production. I will cover CI more in depth in the next installment of this series.

Finally, I feel that the discussion of DevOps is often very vague. There are no hard lines between the role of a DevOps engineer and the role of a developer, since DevOps engineers are developers and part of the R&D team. They walk a fine line between administration and development and need to satisfy the sometimes conflicting requirements on both ends. I think understanding their jobs and tools can help make us better developers, better teammates, and better managers. Next time, we'll discuss building a CI pipeline using GitHub Actions, working with your artifacts, managing secrets, and keeping everything in check. Notice we won't discuss continuous delivery in great detail at this stage, because that would drag us into a discussion of deployment. I fully intend to circle back and discuss CD as well once we cover deployment technologies such as IaC, Kubernetes, Docker, etc.
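As promised above, here is a minimal sketch of the kind of policy-impact analysis a DevOps engineer might run. The pull request records below are entirely hypothetical; in practice you would pull them from your Git hosting platform:

```python
from datetime import datetime
from statistics import median

# Hypothetical pull request records: (created_at, merged_at, merged after the
# two-reviewer policy took effect?). Real data would come from your Git host.
pull_requests = [
    ("2023-01-02T09:00:00", "2023-01-02T15:30:00", False),
    ("2023-01-05T10:00:00", "2023-01-06T11:00:00", False),
    ("2023-02-01T09:00:00", "2023-02-02T17:00:00", True),
    ("2023-02-03T08:00:00", "2023-02-06T09:30:00", True),
]

FMT = "%Y-%m-%dT%H:%M:%S"

def hours_to_merge(created: str, merged: str) -> float:
    """Return the time between PR creation and merge, in hours."""
    delta = datetime.strptime(merged, FMT) - datetime.strptime(created, FMT)
    return delta.total_seconds() / 3600

before = [hours_to_merge(c, m) for c, m, after_policy in pull_requests if not after_policy]
after = [hours_to_merge(c, m) for c, m, after_policy in pull_requests if after_policy]

# Comparing these medians against regression counts after the policy change
# gives a factual basis for keeping or relaxing the review policy.
print(f"Median time to merge before the policy: {median(before):.1f}h")
print(f"Median time to merge after the policy:  {median(after):.1f}h")
```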
What Exactly Is DevOps Anyway?

Way back in 2006, Amazon CTO Werner Vogels got the ball rolling when he famously said, "Giving developers operational responsibilities has greatly enhanced the quality of the services, both from a customer and a technology point of view. The traditional model is that you take your software to the wall that separates development and operations, throw it over, and then forget about it. Not at Amazon. You build it; you run it. This brings developers into contact with the day-to-day operation of their software." While the "you build it, you run it" mantra has become synonymous with DevOps, its actual application has not been so clear for many organizations. Dozens of articles and Reddit posts would seem to indicate that there is a very wide range of opinions on this topic. Is DevOps a cultural approach that bridges the gap between siloed ops and dev teams and gets them to cooperate more closely? Perhaps it really is what Werner probably intended: software developers taking full responsibility for infrastructure and operational processes. This would make it overlap a bit with the SRE role described by Benjamin Treynor Sloss, VP of Engineering at Google and founder of SRE, as "what you get when you treat operations as a software problem and you staff it with software engineers." Or maybe it's an overused and abused job title that, in the real world, confuses job seekers, recruiters, and executives alike? As such, should the term DevOps be used in the hiring process, or should recruiters just list out all the required skills the candidate will need to have?

And What of Platform Engineering? Is This Something New or Just a Variation of DevOps?

Some might even argue that DevOps is just a sexier way to describe a sysadmin role! Are companies hurting themselves by trying to force the organization to adopt DevOps without fully understanding what it's meant to accomplish? Maybe true DevOps is only relevant for very advanced organizations like Amazon, Netflix, and other tech elites? And what is the most important ingredient for successful DevOps? Is it about excellent communication between dev and ops? Is it about using the best automation and tooling? Or can great DevOps only be achieved if a company has developers who are infrastructure and operations experts and can handle managing it all on top of their daily coding? Finally, assuming that there is some consensus that DevOps is dead, is it time to pay our respects and say farewell? In this fast and furious session, these questions and more are hotly debated by Andreas Grabner, DevOps Activist at Dynatrace; Angel Rivera, Principal Developer Advocate at CircleCI; Oshrat Nir, Head of Product Marketing at Armo; and Fabian Met, CTO at Fullstaq. Each of the panelists brings years of technology experience at leading companies and shares their unique perspectives and opinions on whether DevOps is dead or not. The debate is moderated by Viktor Farcic, a popular YouTube educator, Upbound Developer Advocate, CDF ambassador, and published author. So put your headphones on, grab some popcorn, and enjoy!
Testing is a crucial part of the Software Development Lifecycle (SDLC). Testing should be included in every stage of the SDLC to get faster feedback and bake quality into the product. Test automation can get you excellent results if it is implemented and used efficiently, and continuous testing is the right approach. According to Markets and Markets, the continuous testing market is expected to grow at a compound annual growth rate of 15.9% during the forecast period of 2018-2023 and reach $2.41 billion by 2023, with 2017 considered the base year for estimating the market size. In this article, we will discuss what continuous testing is, how to implement it, and what benefits we can get out of it.

What Is Continuous Testing?

Continuous testing helps provide faster feedback in all stages of the Software Development Lifecycle (SDLC). In most SDLC cases, minimal automated tests are written at the core level, which increases the pressure on the top level of the test pyramid to rely on manual exploratory testing. This hurts quality, because catching an error after development is complete is very costly. Commonly cited data on the cost to fix a bug at Google shows that it costs a whopping $5,000 when a bug is discovered in the system testing phase. Continuous testing helps us alleviate this fear of software failing by providing early feedback as soon as the code is committed to the repository. The main goal of continuous testing is to test early at all stages of the SDLC with automation, test as often as possible, and get faster feedback on the builds. You might know about Go/No-Go meetings, which are set up before every release; they help you find out whether you are headed in the right direction and decide whether you are good to release the application with the respective features to production. Continuous testing works similarly: it provides test results based on which you decide whether you can move to the next stage of development. Using continuous testing, we can fix failures as soon as they occur, before moving on to the next stage, which eventually helps save time and money.

Why Is Continuous Testing Needed?

In one of my previous projects, we were working on a mobile application to be developed for the iOS and Android platforms. The client wanted everything to be automated from the inception itself. Any bug leakage into production would impact the business directly and cost millions of dollars. We were asked to present a plan for automation where testing would be carried out in every stage of development to minimize the risk of bug leakage. Hence, we decided to implement the test pyramid and create a CI/CD pipeline where testing would be done continuously at every stage. The pipeline described below can also be taken as a practical guide to implementing continuous testing in a project. To bake quality into the product, we came up with a plan to introduce testing at every stage in the pipeline, and as soon as any red flag appears, it should be fixed before we move on to another phase. So, as soon as the developer commits the code to the remote repository, the following scans would run:

Static code analysis: This will ensure that best coding practices are followed and alert us to code smells in case of any errors.
SecOps scan: This will scan the code and all the libraries used within the code for any security vulnerabilities and raise a red flag for "Critical," "High," "Medium," or "Low" level vulnerabilities that should be taken care of.

Once the above scans are successful, the pipeline would move ahead and run the following tests in the development environment:

Unit tests
Integration tests
System tests
End-to-end tests

All of the above tests ensure the code is working as expected. If any of them fails, the pipeline breaks and a red flag is raised. It is the developer's responsibility to fix the respective failing tests. It's not about playing the blame game, but about finding the commit that broke the build and fixing it; the team will offer help to the developer to fix the issue. After all the above-mentioned tests pass successfully, the build will be deployed to the QA environment, where end-to-end automated tests will run on the QA build as part of regression testing. Once the end-to-end automated tests pass on a QA build, QA will pick up the build and perform manual exploratory tests to uncover any further defects. Once QA signs off on the build, it will be deployed to the UAT environment, where further rounds of testing will be done by the UAT team of testers. Finally, after sign-off, the build will be deployed to production. This plan worked tremendously for us, as we uncovered many issues in the first and second stages of the pipeline, when the unit and integration tests were run. The static code analysis and SecOps scans helped us implement best coding practices and fix vulnerable libraries, either by updating them to the latest version or by replacing them with libraries less prone to vulnerabilities, and by updating them frequently so the code stays less exposed to security risks. We also discovered issues in the manual exploratory testing; however, those were not as critical. Most of the issues were resolved in the initial phases, which provided us with faster feedback.

Continuous Testing Is the Need of the Hour: No Longer a Good-to-Have but a Must-Have in the SDLC

From my experience, the following points explain why continuous testing is needed:

Requirements change frequently: With requirements changing frequently, the code needs to change as well, and every change carries risk. There are two risks involved here: whether the changed code will work as expected, and whether the change impacts the existing code. With continuous testing, we can tackle both of these risks by setting up an automated pipeline, which will run the unit, integration, and, eventually, automated regression tests.

Continuous integration: With agile development in place, continuous integration has gained a lot of popularity; developers merge their code to the main branch as often as possible to keep it production ready. Continuous testing helps here because, before merging takes place, the code goes through a pipeline where automated tests are run against the build. If there is a failure, the code does not merge and a red flag is raised.

Production ready: With continuous testing, we can be production ready, as all our checks and tests run in an automated pipeline as soon as the developer commits the code.

Reduce human errors: In the case of regression tests, an automated test can serve as documentation for the feature and help reduce human errors in testing.
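To illustrate the kind of automated test that runs at the lowest levels of such a pipeline, here is a minimal, hypothetical pytest example; the function under test and its rules are invented purely for illustration:

```python
# test_discount.py -- a hypothetical unit test that a CI pipeline would run
# on every commit, failing the build (raising a red flag) if it breaks.
import pytest

def apply_discount(price: float, percent: float) -> float:
    """Business logic under test (hypothetical example)."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

def test_apply_discount_happy_path():
    # Unit test: a single function checked in isolation.
    assert apply_discount(200.0, 10) == 180.0

def test_apply_discount_rejects_invalid_percent():
    # Guard-rail test: invalid input must be rejected, not silently accepted.
    with pytest.raises(ValueError):
        apply_discount(100.0, 150)
```

Running `pytest` locally or in the pipeline executes both tests; a failure stops the build before the change moves to the next stage.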
Advantages of Continuous Testing

Fast feedback: In the traditional software development process, the team had to wait for feedback from testers, who would test the build manually after the developer finished writing the feature. After that feedback, developers had to rework the code to fix the issues, which was time-consuming and costly. With continuous testing, we get faster feedback on newly committed code and save time and money.

Quality baked into the product: With all tests running in the automated pipeline, from unit, integration, functional, security, and performance tests to end-to-end user journeys, we can be sure that quality is baked into the product itself and need not worry about releasing it to production.

Reduces bug leakage: Continuous testing helps eliminate the chances of bugs reaching the build by providing timely updates about software failures.

Minimizes risk: It also helps find risks, address them, and improve the quality of the product.

Important Types of Continuous Testing

Unit tests: These test a piece of code in isolation, essentially testing every method written for the feature. The main objective is to check that the code works as expected, meaning all the functionalities, inputs, outputs, and performance of the code are as desired.

Integration tests: These test two modules together. The goal is to ensure the integration between the two components works correctly.

Regression tests: These are the most widely used tests and ensure the existing functionality of the application still works as expected after the latest addition or modification to the code repository.

End-to-end journey tests: These check the end-to-end working of the software. The goal is to ensure the end user is able to use the application from start to finish.

Future of Continuous Testing

With the ever-increasing demand for high-quality software and economies flourishing with digitalization at their core, continuous testing is an important aspect of delivery. A software company is required to respond to frequent changes happening daily in the SDLC, and continuous testing is the answer. The benefits of continuous testing include:

Adapting to frequent changes in the software.
Achieving maximum automation in the delivery cycle and avoiding loopholes in the process.
Minimizing human errors.
Providing a cost-effective solution to the end customer.
Beating the competition by releasing bug-free software.
Baking quality into the product.

As technology progresses, so does the need to upgrade the process, and continuous testing delivers the best results.

Role of Cloud Services Platforms in Continuous Testing

Last year, I was working on a mobile application development project with iOS and Android versions to be rolled out. As it was planned to be rolled out in different regions around Germany, we studied mobile phone usage across different areas of Germany and found that both Android phones and iPhones are used in all areas. Hence, we concluded that we would need a combination of at least six devices (three Android devices and three iPhones) to test our build: two devices with the minimum supported versions, two with the latest versions, and two with versions in between.
The hard part was getting these devices, as mobile phones and their OS versions are updated nearly every 2-3 months, so even if the organization invested in and bought these devices, it would have to keep updating them whenever new versions launched. Here, cloud platform services came to the rescue: we simply purchased the right plan for our requirements and got access to the required real devices, which helped us test the applications smoothly by running automated and manual exploratory tests on real devices on the cloud platform. In today's fast-paced world, there are multiple platforms where software runs, from browsers to mobile phones and tablets. As we release the application to production, we need to make sure it runs on all the desired platforms and fix anything we find that is not working. To do that, we need to test it on the respective devices and browsers to ensure it works hassle-free. This is possible in-house, but it costs money and time, as we would have to purchase the hardware and provide the required resources, from hiring engineers to setting up the infrastructure. Because we test continuously, performing parallel runs on different browsers and their respective versions, or on different mobile devices with different OS versions, these cloud services help us by providing the required devices, browsers, operating systems, and versions, so we catch bugs early and use that early feedback to fix issues and stop bug leakage (a minimal sketch of such a remote test follows after the conclusion).

Conclusion

Quality is a crucial part of software and needs to be baked into the software. Continuous testing helps us build the right product by implementing testing at every phase of the Software Development Lifecycle. We need to be production ready with every feature we build, which requires fast feedback and a fail-fast strategy. There are various testing types that help us implement continuous testing using automated pipelines, and cloud services platforms provide the infrastructure required to keep pace with testing continuously.
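As referenced above, here is a minimal sketch of running a browser test against a cloud-hosted Selenium grid from Python. The hub URL, credentials, and capability values are placeholders, and real providers (such as the LambdaTest grid mentioned earlier in this report) document their own capability formats, so treat this only as an illustration of the pattern:

```python
from selenium import webdriver

# Placeholder endpoint for a cloud-hosted Selenium grid; real providers give
# you a hub URL that embeds your username and access key.
HUB_URL = "https://<username>:<access-key>@hub.example-cloud-grid.com/wd/hub"

options = webdriver.ChromeOptions()
# Capability names vary by provider; these generic W3C-style fields are illustrative.
options.set_capability("browserVersion", "latest")
options.set_capability("platformName", "Windows 10")

# The same test script can target many browser/OS combinations in parallel
# simply by varying the options passed to the remote driver.
driver = webdriver.Remote(command_executor=HUB_URL, options=options)
try:
    driver.get("https://example.com")
    assert "Example" in driver.title, "page title did not match"
finally:
    driver.quit()
```

Pointing the same script at different capability sets is what makes parallel cross-browser and cross-device runs practical without owning the hardware.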
Boris Zaikin
Senior Software Cloud Architect,
Nordcloud GmbH
Pavan Belagatti
Developer Advocate,
Harness
Nicolas Giron
Site Reliability Engineer (SRE),
KumoMind
Alireza Chegini
DevOps Architect / Azure Specialist,
Smartwyre