The concept of observability involves understanding a system's internal states through the examination of logs, metrics, and traces. This approach provides a comprehensive view of the system, allowing for thorough investigation and analysis. While incorporating observability into a system may seem daunting, the benefits are significant. One well-known example is PhonePe, which experienced 2000% growth in its data infrastructure and a 65% reduction in data management costs after implementing a data observability solution, helping it mitigate performance issues and minimize downtime. The impact of Observability-Driven Development (ODD) is not limited to PhonePe: organizations adopting ODD have reported a 2.1 times higher likelihood of detecting issues and a 69% improvement in mean time to resolution.

What Is ODD?

Observability-Driven Development (ODD) is an approach that shifts observability left, to the earliest stages of the software development life cycle, and uses trace-based testing as a core part of the development process. In ODD, developers declare, alongside the code itself, the desired outputs and the signals needed to view the system's internal state. The approach applies both at the component level and to the system as a whole. ODD also serves to standardize instrumentation across programming languages, frameworks, SDKs, and APIs.

What Is TDD?

Test-Driven Development (TDD) is a widely adopted software development methodology that emphasizes writing automated tests before writing code. The TDD process involves defining the desired behavior of the software in a test case, running the test to confirm it fails, writing the minimum code necessary to make the test pass, and then refining the code through refactoring. This cycle is repeated for each new feature or requirement, and the resulting tests serve as a safeguard against future regressions. The philosophy behind TDD is that writing tests first compels developers to think through the problem at hand and produce focused, well-structured code. Adherence to TDD improves software quality and requirement compliance and facilitates the early detection and correction of bugs. TDD is recognized as an effective method for enhancing the quality, reliability, and maintainability of software systems.

Comparison of Observability-Driven and Test-Driven Development

Similarities

Observability-Driven Development (ODD) and Test-Driven Development (TDD) both strive to enhance the quality and reliability of software systems. Both methodologies aim to ensure that software operates as intended, minimizing downtime and user-facing issues while promoting a commitment to continuous improvement and monitoring.

Differences

- Focus: ODD focuses on continuously monitoring the behavior of software systems and their components in real time to identify potential issues and understand system behavior under different conditions. TDD, on the other hand, prioritizes detecting and correcting bugs before they can harm the system or its users and verifies that software functionality meets requirements.
- Time and resource allocation: Implementing ODD requires a substantial investment of time and resources for setting up monitoring and logging tools and infrastructure. TDD, in contrast, demands a significant investment of time and resources during the development phase for writing and executing tests.
- Impact on software quality: ODD can significantly improve software quality by providing real-time visibility into system behavior, enabling teams to detect and resolve issues before they escalate. TDD can also significantly improve software quality by catching and fixing bugs before they reach production; however, if the tests are not comprehensive, bugs may still evade detection.
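To make the red-green-refactor cycle concrete, here is a minimal sketch of one TDD pass in Python. The `slugify` function and the pytest-style tests are illustrative assumptions, not something from the text above; any language and test runner would do. In a real pass, the two test functions are written first and run to fail (red), then the implementation is added to make them pass (green) and cleaned up afterwards (refactor).

```python
import re

def slugify(text: str) -> str:
    # Minimum implementation written to turn the failing tests green,
    # then tidied up under the tests' protection (refactor).
    text = re.sub(r"[^a-z0-9\s]", "", text.lower())  # drop punctuation
    return re.sub(r"\s+", "-", text.strip())         # hyphenate word runs

# In practice these tests live in their own module and exist BEFORE
# slugify() does; running `pytest` at that point yields the red step.
def test_slugify_lowercases_and_hyphenates():
    assert slugify("Hello World") == "hello-world"

def test_slugify_strips_punctuation():
    assert slugify("Agile, Lean & DevOps!") == "agile-lean-devops"
```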
Moving From TDD to ODD in Production

Moving from a Test-Driven Development (TDD) methodology to an Observability-Driven Development (ODD) approach is a significant change. For years, TDD has been the established method for testing software before its release to production. While TDD provides consistency and accuracy through repeatable tests, it cannot provide insight into the performance of the entire application or the customer experience in a real-world scenario. The tests conducted through TDD are isolated and do not guarantee the absence of errors in the live application. Furthermore, TDD relies on a consistent environment for conducting automated tests, which is not representative of real-world conditions.

Observability, on the other hand, is an evolution of TDD that offers full-stack visibility into the infrastructure, application, and production environment. It identifies the root cause of issues affecting the user experience and product releases through telemetry data such as logs, traces, and metrics. This continuous monitoring and tracking helps predict how end users perceive the application. Additionally, because observability is part of the team's tools, processes, and culture, it makes it possible to write and ship better code before it even reaches source control.

Best Practices for Implementing ODD

Here are some best practices for implementing Observability-Driven Development (ODD):

- Prioritize observability from the outset: Incorporate observability considerations into the development process from the very beginning. This helps you identify potential issues early and make necessary changes in real time.
- Embrace an end-to-end approach: Ensure observability covers all aspects of the system, including the infrastructure, the application, and the end-user experience.
- Monitor and log everything: Gather data from all sources, including logs, traces, and metrics, to get a complete picture of the system's behavior (see the instrumentation sketch after this list).
- Use automated tools: Utilize automated observability tools to monitor the system in real time and alert you to any anomalies.
- Collaborate with other teams: Work with teams such as DevOps, QA, and production to ensure observability is integrated into the development process.
- Continuously monitor and improve: Regularly monitor the system, analyze the data, and make improvements as needed to ensure optimal performance.
- Embrace a culture of continuous improvement: Encourage the development team to treat monitoring and improving the system as an ongoing practice rather than a one-time task.
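As a small illustration of the "monitor and log everything" and "use automated tools" practices, here is a hedged sketch of instrumenting a function with OpenTelemetry's Python API. The service name, span name, and attributes are invented for the example, and the console exporter stands in for whatever observability backend a team actually uses:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter

# Wire the API to a concrete exporter; a real setup would point at an
# observability backend instead of printing spans to the console.
trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(
    SimpleSpanProcessor(ConsoleSpanExporter())
)
tracer = trace.get_tracer("checkout-service")  # illustrative service name

def process_order(order_id: str, items: int) -> None:
    # Declare the internal state you want observable alongside the logic,
    # so trace-based tests can assert on the emitted spans.
    with tracer.start_as_current_span("process_order") as span:
        span.set_attribute("order.id", order_id)
        span.set_attribute("order.items", items)
        # ... business logic would run here ...

process_order("A-1001", 3)  # emits one span with the declared attributes
```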
Conclusion

Both Observability-Driven Development (ODD) and Test-Driven Development (TDD) play an important role in ensuring the quality and reliability of software systems. TDD focuses on detecting and fixing bugs before they can harm the system or its users, while ODD focuses on monitoring the behavior of the software system in real time to identify potential problems and understand its behavior in different scenarios. Did I miss any important information? Let me know in the comments section below.
During a practice meeting at my organization, a team member mentioned taking a class on LeSS (Large-Scale Scrum). Many questions were asked about how LeSS differs from SAFe, and I volunteered to present a comparison at a later meeting. The larger Agile community might benefit from this information as well. This article attempts to answer the following questions:

- What are the differences?
- Why do companies choose one over the other?
- How do the roles differ?
- How do the events differ?
- How do the certifications differ?
- What percentage of organizations use SAFe vs. LeSS?
- Does the organizational structure differ?
- What are the pros and cons of implementation?
- What is the average cost and time of education?
- What is the average time to fully implement?
- When was SAFe vs. LeSS published?
- Geographically, where is SAFe vs. LeSS being adopted?

What Are the Differences Between SAFe and LeSS Frameworks?

SAFe (Scaled Agile Framework) and LeSS (Large-Scale Scrum) are both frameworks used for scaling Agile practices to large organizations, but they have different approaches and principles.

SAFe takes a more prescriptive approach, providing detailed guidance and structure for implementing Agile at scale. For example, SAFe defines three levels of planning and execution: portfolio, program, and team, and offers specific roles, artifacts, and ceremonies for each level. It also incorporates Lean-Agile principles such as Lean systems thinking, Agile development, and Lean portfolio management.

LeSS, on the other hand, emphasizes simplicity and adaptation to each organization's unique context. It promotes a single-team mindset, emphasizing that all teams should work towards a shared goal and collaborate closely. LeSS defines two frameworks: basic LeSS, for up to eight teams, and LeSS Huge, which can support up to thousands of team members.

Why Do Companies Choose One Over the Other?

The choice between SAFe and LeSS depends on several factors, such as the organization's size, culture, and goals. Companies with a more traditional management culture that want a more prescriptive approach may prefer SAFe, while those with a more Agile mindset that want more flexibility may prefer LeSS. SAFe is generally better suited for larger organizations, while LeSS may be more appropriate for smaller or mid-sized organizations. Ultimately, the decision between SAFe and LeSS should be based on the organization's specific needs and goals and involve careful consideration and evaluation of both frameworks.

How Do the Roles Differ From SAFe to LeSS?
| Framework | Level | Role | Description |
|---|---|---|---|
| SAFe | Portfolio | Portfolio Manager | Responsible for setting the strategic direction of the organization |
| SAFe | Portfolio | Enterprise Architect | Responsible for defining the technical direction of the organization |
| SAFe | Portfolio | Epic Owner | Responsible for defining the business value and prioritization of epics |
| SAFe | Program | Release Train Engineer (RTE) | Responsible for coordinating and facilitating the Agile Release Train (ART) |
| SAFe | Program | Product Owner (PO) | Responsible for defining the product vision and priorities |
| SAFe | Program | Scrum Master (SM) | Responsible for coaching the team and facilitating the Scrum process |
| SAFe | Program | Agile Team | The cross-functional team responsible for delivering value |
| SAFe | Team | Product Owner (PO) | Responsible for defining and prioritizing user stories |
| SAFe | Team | Scrum Master (SM) | Responsible for coaching the team and facilitating the Scrum process |
| SAFe | Team | Development Team | The cross-functional team responsible for delivering user stories |
| LeSS | Key Roles | Product Owner | Responsible for maximizing the value of the product and managing the product backlog |
| LeSS | Key Roles | Scrum Master | Responsible for facilitating the Scrum process and removing impediments |
| LeSS | Key Roles | Development Team | The cross-functional team responsible for delivering the product |
| LeSS | Other Roles | Area Product Owner | Responsible for managing the product backlog for a specific area of the product |
| LeSS | Other Roles | Chief Product Owner | Responsible for coordinating the work of multiple Product Owners across the organization |

How Do the Events Differ From SAFe to LeSS?

In SAFe, there are three levels of planning and execution: Portfolio, Program, and Team. Each level has its own set of events.

| Portfolio Level | Program Level | Team Level |
|---|---|---|
| Portfolio Kanban: Visualize and manage the flow of epics and features across the organization | Program Increment (PI) Planning: Two-day planning event where teams plan the work for the next Program Increment | Sprint Planning: Meeting where the team plans the work for the upcoming Sprint |
| Portfolio Sync: Regular meetings to align the portfolio backlog with the organization's strategy | Daily Stand-up: Daily meeting where teams synchronize their work and identify any obstacles | Daily Stand-up: Daily meeting where team members synchronize their work and identify any obstacles |
| Portfolio Review: Meeting to review progress and adjust the portfolio backlog | Iteration Review: Meeting to review progress and demonstrate the working software | Sprint Review: Meeting to review progress and demonstrate the working software |
| | Iteration Retrospective: Meeting to reflect on the previous iteration and identify areas for improvement | Sprint Retrospective: Meeting to reflect on the previous Sprint and identify areas for improvement |

In LeSS, the key events are:

| Event | Description |
|---|---|
| Sprint Planning | A meeting where the team plans the work for the upcoming Sprint |
| Daily Scrum | A daily meeting where team members synchronize their work and identify any obstacles |
| Sprint Review | A meeting to review progress and demonstrate the working product |
| Sprint Retrospective | A meeting to reflect on the previous Sprint and identify areas for improvement |
| Overall Retrospective | A meeting to reflect on the overall progress of the organization |
| Sprint Review (Whole Group) | A meeting where multiple teams come together to review progress and demonstrate their work |
| Sprint Planning (Whole Group) | A meeting where multiple teams come together to plan their work for the upcoming Sprint |

SAFe and LeSS share similar events, such as Sprint Planning, the Daily Stand-up, the Sprint Review, and the Sprint Retrospective.
However, SAFe also includes additional events such as Portfolio Kanban, Portfolio Sync, and PI Planning, while LeSS includes events such as the Overall Retrospective and the whole-group Sprint Review. The choice of events will depend on the specific needs of the organization and the scale of the Agile implementation.

How Do the Certifications Differ From SAFe to LeSS?

Both frameworks offer certifications to help practitioners develop their skills and knowledge. Here are some key differences between the certifications offered by SAFe and LeSS:

| Framework | Certification Levels | Focus | Approach | Requirements | Community |
|---|---|---|---|---|---|
| SAFe | Agilist; Practitioner; Program Consultant; Product Owner/Product Manager; Scrum Master; Advanced Scrum Master; Lean Portfolio Manager; Release Train Engineer; DevOps Practitioner; Architect; Agile Product Manager; Government Practitioner; Agile Software Engineer | Focuses on implementing Agile practices in large organizations using a framework that integrates several Agile methodologies | Uses a top-down approach to implementing Agile practices at scale, with a prescribed framework and set of practices | Candidates complete a two-day training course and pass an online exam | Has a large and active community of practitioners and trainers, with numerous resources available for certification candidates |
| LeSS | LeSS Practitioner (CLP); LeSS for Executives (CLFE); LeSS Basics (CLB) | Focuses exclusively on applying Scrum practices to large-scale projects | Takes a more flexible approach, emphasizing the need to adapt Scrum practices to the specific needs of the organization | Candidates attend a three-day training course, pass an online exam, and demonstrate practical experience applying LeSS practices | Has a smaller but fast-growing community of practitioners and trainers that offers a supportive and engaged network |

Overall, the certifications offered by SAFe and LeSS differ in their focus, approach, and requirements. However, both frameworks offer valuable tools and practices for implementing Agile at scale, and certification can help practitioners develop their skills and knowledge in this area.

What Percentage of Organizations Use SAFe vs. LeSS?

There is no definitive answer to what percentage of organizations use SAFe vs. LeSS, as no comprehensive public data exists on this topic, and adoption can vary by industry, organization size, and geographical location. However, according to some surveys and reports, SAFe is more widely adopted than LeSS. For example, the 14th Annual State of Agile Report by VersionOne found that SAFe was the most popular scaling framework, used by 30% of respondents, while LeSS was used by 6%. Similarly, a survey by the Agile Alliance found that SAFe was the most used scaling framework, used by 29% of respondents, while LeSS was used by 6%.

It's worth noting that both SAFe and LeSS have their proponents and critics, and the choice of scaling framework depends on various factors, including the organization's goals, culture, and context. Therefore, it's essential to evaluate each framework's strengths and weaknesses and choose the one that best fits the organization's needs.

Is the Organizational Structure Different Between SAFe and LeSS?

Yes, the organizational structure in SAFe and LeSS can differ in some ways, although both frameworks are designed to help large organizations scale Agile principles and practices.
In SAFe, the framework is designed around three levels of organizational structure:

| Team Level | Program Level | Portfolio Level |
|---|---|---|
| Cross-functional Agile teams work together to deliver value, following the principles of Scrum or Kanban | Agile teams work together to deliver larger initiatives, called Agile Release Trains (ARTs), aligned with the organization's strategic goals | Strategic planning and governance align the organization's initiatives and investments with its long-term objectives |

In LeSS:

| Design | The Framework | Organization |
|---|---|---|
| Designed around the principles of Scrum, with a focus on simplicity and minimizing unnecessary bureaucracy | Encourages organizations to adopt a flat, decentralized organizational structure where all teams work together as part of a single product development effort | Organized around a product, rather than a functional or departmental structure, to foster collaboration and focus on delivering value to customers |

Overall, while both SAFe and LeSS are designed to help organizations scale Agile practices, they take different approaches to organizational structure, with SAFe being more hierarchical and LeSS emphasizing a flatter, decentralized structure.

How Does the Organizational Structure Between SAFe and LeSS Differ?

While both SAFe and LeSS are designed to help organizations scale Agile practices, how they address organizational change differs.

SAFe emphasizes a virtual reporting structure, in which Agile teams are organized into Agile Release Trains (ARTs): virtual teams that work together to deliver value. The ARTs are aligned with the organization's strategic goals and have clear accountability for the value they deliver. SAFe encourages organizations to keep the existing reporting structure in place but to establish new roles and responsibilities that support Agile practices.

LeSS emphasizes a physical organizational change, in which organizations restructure themselves around products or product lines rather than functional or departmental silos. It recommends that organizations adopt a flat, decentralized structure, with all teams working as part of a single product development effort. LeSS holds that this physical reorganization is essential to break down barriers and silos between teams and to foster collaboration and innovation.

While both SAFe and LeSS can require some organizational change, they approach it differently: SAFe relies on a virtual reporting structure, while LeSS calls for a physical reorganization to break down silos and foster collaboration.

What Are the Pros and Cons of Implementing SAFe vs. LeSS?

Implementing SAFe vs. LeSS has several pros and cons. Here are some of the key advantages and disadvantages of each framework:
| Framework | Pros | Cons |
|---|---|---|
| SAFe | Provides a structured approach to scaling Agile practices to larger organizations. Offers a comprehensive framework with multiple layers of management and control, which can help manage complexity and align the organization's initiatives with its strategic goals. Provides a standardized vocabulary and set of practices, which can facilitate communication and collaboration between teams. | Implementation can be complex and challenging, particularly for organizations that have not yet adopted Agile practices. May be perceived as too hierarchical and bureaucratic by some Agile practitioners. Can be expensive to implement, particularly if the organization needs to train many people. |
| LeSS | Emphasizes simplicity and decentralized decision-making, which can foster collaboration, innovation, and continuous improvement. Encourages a flat, cross-functional organizational structure, which can help break down silos and improve communication and collaboration between teams. Offers a flexible framework that can be adapted to the organization's specific needs and context. | May require significant organizational change, which can be difficult and time-consuming. May be perceived as too loose and unstructured by organizations that prefer more standardized practices. May require a higher level of maturity and expertise in Agile practices to implement effectively. |

The choice between SAFe and LeSS depends on the organization's specific needs, context, and goals. SAFe may be a better fit for organizations that need a more structured approach to scaling Agile practices, while LeSS may be a better fit for organizations that prioritize flexibility, collaboration, and continuous improvement.

What Is the Average Cost and Time of Education for SAFe vs. LeSS?

The cost and time of education for SAFe vs. LeSS can vary depending on several factors, such as the level of certification or training, the location, and the training provider. However, here are some general estimates based on the most common training programs:

| Framework | Certification | Cost | Duration |
|---|---|---|---|
| SAFe | Agilist | $995 to $1,295 | 2-3 days |
| SAFe | Program Consultant (SPC) | $3,995 to $4,995 | 4-5 days |
| SAFe | Product Owner/Product Manager (POPM) | $995 to $1,295 | 2 days |
| LeSS | LeSS Practitioner (CLP) | $1,500 to $3,500 | 3 days |
| LeSS | LeSS for Executives (CLFE) | $500 to $1,500 | 1 day |
| LeSS | LeSS Basics (CLB) | $500 to $1,500 | 1 day |

It's important to note that these estimates are only general guidelines, and the actual cost and time of education can vary. Organizations may also incur additional costs for implementing SAFe or LeSS, such as hiring consultants or trainers, purchasing tools or software, and investing in infrastructure and resources to support Agile practices.

What Is the Average Time to Fully Implement SAFe vs. LeSS?

The time to fully implement SAFe or LeSS can vary depending on several factors, such as the size and complexity of the organization, the level of experience with Agile practices, and the level of commitment from leadership and teams. However, here are some general estimates based on the most common implementation programs:

| Framework | Timeframe | Description |
|---|---|---|
| SAFe Implementation Roadmap | 12-24 months | Provides a step-by-step guide for implementing SAFe in an organization. The roadmap includes several milestones, such as setting up Agile teams, establishing a portfolio management process, and aligning the organization's strategy with its Agile initiatives. |
| LeSS Implementation Guide | 6-12 months | Provides guidance on how to implement LeSS in an organization. The guide includes several steps, such as forming cross-functional teams, creating a shared product backlog, and establishing a continuous improvement process. |

It's important to note that these estimates are only general guidelines, and the actual time to fully implement SAFe or LeSS can vary. Additionally, organizations may implement these frameworks in phases, starting with a pilot project or a specific business unit and gradually expanding to other parts of the organization.
This phased approach can help manage the complexity and risk of implementing Agile practices at scale.

When Was SAFe vs. LeSS Published?

| Framework | Title | Year | Author(s) |
|---|---|---|---|
| SAFe | Agile Software Requirements: Lean Requirements Practices for Teams, Programs, and the Enterprise | 2011 | Dean Leffingwell |
| LeSS | Scaling Lean & Agile Development: Thinking and Organizational Tools for Large-Scale Scrum | 2010 | Craig Larman and Bas Vodde |

Since their initial publication, SAFe and LeSS have evolved and expanded to incorporate new ideas, best practices, and feedback from the Agile community. Today, both frameworks have a significant following and are widely used by organizations worldwide.

Geographically, Where Is SAFe vs. LeSS Being Adopted?

| Framework | Strong Presence In | Other Regions Deployed |
|---|---|---|
| SAFe | United States | Europe, Asia, and Australia |
| LeSS | Europe | United States, Asia, and Australia |

Both frameworks have been translated into multiple languages, and active communities of users and practitioners exist worldwide. However, adoption of either framework may depend on factors such as the local business culture, regulatory environment, and availability of trained professionals.

References

- "15 Bureaucratic Leadership Style Advantages and Disadvantages"
- Leffingwell, Dean. Agile Software Requirements: Lean Requirements Practices for Teams, Programs, and the Enterprise.
- Petrini, Stefano, and Jorge Muniz. "Scrum Management Approach Applied in Aerospace Sector." IIE Annual Conference Proceedings, Institute of Industrial and Systems Engineers (IISE), Jan. 2014, p. 434.
- Larman, Craig, and Bas Vodde. Scaling Lean & Agile Development: Thinking and Organizational Tools for Large-Scale Scrum.
- "Scrum Fundamentals Certified Exam Answers," Everything Trending.
Site Reliability Engineering (SRE) is a systematic and data-driven approach to improving the reliability, scalability, and efficiency of systems. It combines principles of software engineering, operations, and quality assurance to ensure that systems meet performance goals and business objectives. This article discusses the key elements of SRE, including reliability goals and objectives, reliability testing, workload modeling, chaos engineering, and infrastructure readiness testing. It also discusses the importance of SRE in improving user experience, system efficiency, scalability, and reliability, and in achieving better business outcomes.

Site Reliability Engineering (SRE) is an emerging field that seeks to address the challenge of delivering high-quality, highly available systems. It combines the principles of software engineering, operations, and quality assurance to ensure that systems meet performance goals and business objectives. SRE is a proactive and systematic approach to reliability optimization characterized by the use of data-driven models, continuous monitoring, and a focus on continuous improvement.

SRE combines software engineering and IT operations, applying the principles of DevOps with a focus on reliability. The goal of SRE is to automate repetitive tasks and to prioritize availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning. The benefits of adopting SRE include increased reliability, faster resolution of incidents, reduced mean time to recovery, improved efficiency through automation, and increased collaboration between development and operations teams. In addition, organizations that adopt SRE principles can improve their overall system performance, increase the speed of innovation, and better meet the needs of their customers.

SRE 5 Whys

1. Why Is SRE Important for Organizations?

SRE is important for organizations because it ensures the high availability, performance, and scalability of complex systems, leading to improved user experience and better business outcomes.

2. Why Is SRE Necessary in Today's Technology Landscape?

SRE is necessary in today's technology landscape because systems and infrastructure have become increasingly complex and prone to failure, and organizations need a reliable and efficient approach to managing them.

3. Why Does SRE Involve Combining Software Engineering and Systems Administration?

SRE combines software engineering and systems administration because both disciplines bring unique skills and expertise to the table. Software engineers have a deep understanding of how to design and build scalable and reliable systems, while systems administrators have a deep understanding of how to operate and manage those systems in production.

4. Why Is Infrastructure Readiness Testing a Critical Component of SRE?

Infrastructure readiness testing ensures that the infrastructure is prepared to support the desired system reliability goals. By testing the capacity and resilience of infrastructure before it is put into production, organizations can avoid critical failures and improve overall system performance.

5. Why Is Chaos Engineering an Important Aspect of SRE?

Chaos engineering tests the system's ability to handle and recover from failures in real-world conditions. By proactively identifying and fixing weaknesses, organizations can improve the resilience and reliability of their systems, reducing downtime and increasing confidence in their ability to respond to failures.
Key Elements of SRE

- Reliability metrics, goals, and objectives: Defining the desired reliability characteristics of the system and setting reliability targets.
- Reliability testing: Using reliability testing techniques to measure and evaluate system reliability, including disaster recovery testing, availability testing, and fault tolerance testing.
- Workload modeling: Creating mathematical models to represent system reliability, including Little's Law and capacity planning.
- Chaos engineering: Intentionally introducing controlled failures and disruptions into production systems to test their ability to recover and maintain reliability.
- Infrastructure readiness testing: Evaluating the readiness of an infrastructure to support the desired reliability goals of a system.

Reliability Metrics in SRE

Reliability metrics are used in SRE to measure the quality and stability of systems, as well as to guide continuous improvement efforts.

- Availability: The proportion of time a system is available and functioning correctly, often expressed as a percentage and calculated as the total uptime divided by the total time the system is expected to be running.
- Response time: The time it takes for the infrastructure to respond to a user request.
- Throughput: The number of requests that can be processed in a given time period.
- Resource utilization: The utilization of the infrastructure's resources, such as CPU, memory, network, heap, caching, and storage.
- Error rate: The number of errors or failures that occur during the testing process.
- Mean Time to Recovery (MTTR): The average time it takes to recover from a system failure or disruption, which provides insight into how quickly the system can be restored after a failure occurs.
- Mean Time Between Failures (MTBF): The average time between failures for a system. MTBF helps organizations understand how reliable a system is over time and can inform decisions about when to perform maintenance or upgrades.

Reliability Testing in SRE

- Performance testing: Evaluating the response time, processing time, and resource utilization of the infrastructure to identify performance issues under a business-as-usual (1x) load.
- Load testing: Simulating real-world user traffic and measuring the performance of the infrastructure under heavy (2x) load.
- Stress testing: Applying more load than the expected maximum (3x) to test the infrastructure's ability to handle unexpected traffic spikes.
- Chaos or resilience testing: Simulating different types of failures (e.g., network outages, hardware failures) to evaluate the infrastructure's ability to recover and continue operating.
- Security testing: Evaluating the infrastructure's security posture and identifying any potential vulnerabilities or risks.
- Capacity planning: Evaluating the current and future hardware, network, and storage requirements of the infrastructure to ensure it has the capacity to meet growing demand.

Workload Modeling in SRE

Workload modeling is a crucial part of SRE; it involves creating mathematical models to represent the expected behavior of systems. Little's Law is a key principle in this area. It states that the average number of items in a system (W) equals the average arrival rate (λ) multiplied by the average time each item spends in the system (T): W = λ * T. This formula can be used to determine the expected number of concurrent requests a system must handle under different conditions.

Example: Consider a system that receives an average of 200 requests per minute, with an average response time of 2 seconds. Converting to consistent units, λ = 200 requests/minute ≈ 3.33 requests/second and T = 2 seconds, so W = λ * T ≈ 3.33 * 2 ≈ 6.7 requests. On average, about seven requests are in flight at any moment; if the system cannot process at least that many concurrently, queues build up and reliability degrades.
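The worked example above can be scripted as a quick sanity check. This is a minimal sketch; the function name is illustrative rather than a standard API:

```python
def littles_law_concurrency(arrival_rate_per_min: float,
                            avg_time_in_system_sec: float) -> float:
    """Average number of requests in the system: W = lambda * T.

    Units must be consistent, so the per-minute arrival rate is
    converted to requests per second before multiplying.
    """
    arrival_rate_per_sec = arrival_rate_per_min / 60.0
    return arrival_rate_per_sec * avg_time_in_system_sec

# The example from the text: 200 requests/minute with a 2-second
# response time means roughly 6.7 requests in flight on average.
print(littles_law_concurrency(200, 2))  # ~6.67
```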
By using the right workload models, organizations can determine the maximum workload their systems can handle, take proactive steps to scale their infrastructure and improve reliability, and identify potential issues and design solutions before they become real problems.

Tools and techniques used for modeling and simulation include:

- Performance profiling: Monitoring the performance of an existing system under normal and peak loads to identify bottlenecks and determine the system's capacity limits.
- Load testing: Simulating real-world user traffic to test the performance and stability of an IT system. Load testing helps organizations identify performance issues and ensure that the system can handle expected workloads.
- Traffic modeling: Creating a mathematical model of the expected traffic patterns on a system, which can be used to predict resource utilization and system behavior under different workload scenarios.
- Resource utilization modeling: Creating a mathematical model of the expected resource utilization of a system, which can likewise be used to predict behavior under different workloads.
- Capacity planning tools: Various tools that automate the process of capacity planning, including spreadsheet tools, predictive analytics tools, and cloud-based tools.

Chaos Engineering and Infrastructure Readiness in SRE

Chaos engineering and infrastructure readiness testing are important components of a successful SRE strategy. Both involve intentionally inducing failures and stress in systems to assess their strength and identify weaknesses. Infrastructure readiness testing verifies the system's ability to handle failure scenarios, while chaos engineering tests the system's recovery and reliability under adverse conditions.

The benefits of chaos engineering include improved system reliability, reduced downtime, and increased confidence in the system's ability to handle real-world failures. By proactively identifying and fixing weaknesses, organizations can avoid costly downtime, improve customer experience, and reduce the risk of data loss or security breaches. Integrating chaos engineering into DevOps practices (CI/CD) helps ensure systems are thoroughly tested and validated before deployment.

Chaos engineering typically involves running experiments or simulations that stress a system's various components to identify weaknesses or bottlenecks and assess its overall reliability. This is done by introducing controlled failures, such as network partitions, simulated resource exhaustion, or random process crashes, and observing the system's behavior and response, as in the sketch below.
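Here is a minimal sketch of a chaos-style experiment in Python. It simulates a flaky downstream dependency and verifies that a retry wrapper keeps the observed error rate within budget; the failure rate, retry policy, and budget are illustrative assumptions, not prescriptions from the article:

```python
import random

random.seed(7)            # deterministic run for the illustration
FAILURE_RATE = 0.2        # injected fault: 20% of dependency calls fail
MAX_RETRIES = 3           # the resilience mechanism under test
ERROR_BUDGET = 0.015      # acceptable end-to-end failure rate

def flaky_dependency() -> str:
    """Stand-in for a downstream service with an injected fault."""
    if random.random() < FAILURE_RATE:
        raise ConnectionError("injected fault")
    return "ok"

def call_with_retries() -> bool:
    """The behavior under test: retry on failure, report final success."""
    for _ in range(MAX_RETRIES):
        try:
            flaky_dependency()
            return True
        except ConnectionError:
            continue
    return False

# Run the experiment and check the hypothesis: with three attempts, the
# end-to-end failure rate should stay near 0.2**3 = 0.008, inside budget.
trials = 10_000
failures = sum(not call_with_retries() for _ in range(trials))
observed = failures / trials
print(f"observed failure rate: {observed:.4f}")
assert observed <= ERROR_BUDGET, "resilience mechanism did not hold"
```

The same shape scales up: replace the simulated fault with a real injected failure (a killed instance, a dropped network link) and the assertion with your service-level objective.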
Example Scenarios for Chaos Testing

- Random instance termination: Selecting and terminating an instance from a cluster to test the system's response to the failure.
- Network partition: Partitioning the network between instances to simulate a network failure and assess the system's ability to recover.
- Increased load: Increasing the load on the system to test its response to stress, observing any performance degradation or resource exhaustion.
- Configuration change: Altering a configuration parameter and observing the system's response, including any unexpected behavior or errors.
- Database failure: Shutting down the database to simulate a failure and observing the system's reaction, including any errors or unexpected behavior.

By conducting both chaos experiments and infrastructure readiness testing, organizations can deepen their understanding of system behavior and improve their resilience and reliability.

Conclusion

SRE is a critical discipline for organizations that want to deliver highly reliable, highly available systems. By adopting SRE principles and practices, organizations can improve system reliability, reduce downtime, and improve the overall user experience.
Application development has become an integral part of modern business operations. With the rapid growth of technology and the widespread use of mobile devices, the demand for software applications has increased manifold. From mobile apps to web applications, businesses require custom solutions that cater to their specific needs and provide a seamless user experience. In this article, we will discuss the various types of application development, the stages involved in the development process, and the latest trends in the industry.

What Is Application Development?

Application development is the process of designing, building, and deploying software applications for platforms such as web, mobile, desktop, and cloud. It involves several stages: requirements gathering, design, development, performance testing, deployment, and maintenance. Application development aims to provide software solutions that meet the needs and requirements of businesses and users, and it requires a team of developers, designers, testers, project managers, and other professionals working collaboratively to ensure the application meets the required quality standards.

Application development is a complex process requiring technical skills, creativity, and project management expertise. However, a well-designed and well-developed application can provide significant benefits to users and the business, including increased productivity, improved efficiency, and enhanced customer experience. Let's look at the types of application development.

Types of Application Development

There are primarily two types of application development: mobile and web. Mobile applications are designed specifically for mobile devices, whereas web applications are accessible through a web browser.

Mobile Application Development

Mobile app development involves creating software applications specifically for mobile devices such as smartphones and tablets. These apps can be built for various platforms, such as Android, iOS, and Windows, and can be native, hybrid, or web-based. The future of mobile applications looks bright, as the number of mobile users grows daily.

Developers build native apps for a particular platform using the language specific to that platform. For instance, Java or Kotlin is used to develop Android apps, whereas Swift or Objective-C is used to create iOS apps. Native apps provide better performance, speed, and security than other types of apps.

Hybrid apps, on the other hand, are a combination of native and web apps. They are built using web technologies such as HTML, CSS, and JavaScript and are wrapped in a native app container. Hybrid apps provide a better user experience than web apps but may not be as fast as native apps.

Web apps are accessed through a web browser and do not require installation on the device. They are written in web technologies such as HTML, CSS, and JavaScript. Web apps are accessible from any device with an internet connection and are platform-independent. However, they may not provide the same level of functionality as native or hybrid apps.

Web Application Development

Web application development involves creating software applications that are accessible through a web browser. Developers build applications that can run on various devices, such as desktops, laptops, and mobile devices, using technologies such as HTML, CSS, and JavaScript, along with server-side languages and frameworks like PHP, Ruby on Rails, and Node.js; a minimal server-side example follows below.
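To ground the server-side piece, here is a minimal dynamic endpoint sketched in Python with Flask. The framework choice and the route are assumptions for illustration; PHP, Rails, or Node.js would play the same role:

```python
from flask import Flask, jsonify

app = Flask(__name__)

# A dynamic route: the response is computed per request on the server,
# unlike a static page served as a fixed file.
@app.route("/api/greet/<name>")
def greet(name: str):
    return jsonify({"message": f"Hello, {name}!"})

if __name__ == "__main__":
    app.run(debug=True)  # serves http://127.0.0.1:5000
```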
In some cases, developers also use ready-made admin templates. An admin template is a collection of web pages built with HTML, CSS, and JavaScript (or JavaScript libraries) that forms the user interface of a web application's back end. It can save much of the time and money that would otherwise be invested during web app development, and it can be used to build progressive web apps and single-page applications (SPAs).

There are basically two types of web applications: static and dynamic. Static web applications are basic websites that provide information to the user; they do not require any interaction with the server and are primarily used for informational purposes. Dynamic web applications, on the other hand, provide a more interactive experience. They require interaction with the server and can provide functionality such as user authentication and data storage and retrieval. Dynamic web applications can be built using various JavaScript frameworks such as AngularJS, ReactJS, and Vue.js.

Methodologies of Application Development

Successful projects are well-managed. To effectively manage a project, the manager or development team must select the software development methodology best suited to the project at hand. Every methodology has different strengths and weaknesses and exists for a reason. Here's an overview of the most commonly used software development methodologies and why they exist.

Agile Development Methodology

The Agile development methodology is an iterative and flexible approach to software development that emphasizes collaboration, customer satisfaction, and rapid delivery. It involves breaking the project down into smaller, manageable pieces delivered in sprints, which typically last between one and four weeks. At the end of each sprint, the team reviews its progress and adjusts priorities based on feedback and changing requirements.

Agile offers several benefits. It emphasizes communication and collaboration between developers, customers, and stakeholders, promoting flexibility and adaptability in response to changing business needs. The result is a more customer-focused approach that delivers high-quality software in a shorter timeframe.

DevOps Deployment Methodology

The DevOps deployment methodology is a software development approach that focuses on collaboration and automation between development and operations teams to improve software delivery speed, quality, and reliability. It involves using continuous integration and continuous deployment (CI/CD) tools to automate the build, test, and deployment process, ensuring code changes are thoroughly validated before they are deployed to production. DevOps enables teams to reduce the time and effort required to release new features or updates, allowing them to respond quickly to changing customer needs and market demands.

Waterfall Development Methodology

The Waterfall model is a traditional, linear software development methodology that flows sequentially through the conception, initiation, planning, design, construction, testing, deployment, and maintenance phases. The software development team defines requirements upfront and completes each phase before moving on to the next. The Waterfall methodology can become inflexible when requirements change or iterative development is needed. Additionally, it may not catch errors until later in the project cycle, making them more difficult and costly to address.
Rapid Application Development

Rapid Application Development (RAD) is a software development methodology that emphasizes speed and flexibility. It involves breaking the project down into small modules and using iterative development to quickly build, test, and refine the software. RAD typically involves the use of visual tools, prototyping tools, and user feedback to speed up the development process. RAD aims to deliver a working prototype to customers early in the development cycle, allowing for early feedback and adjustments. This approach enables developers to respond quickly to changing customer needs and deliver a high-quality product in a short amount of time.

Now, let's head toward the stages of application development.

Stages of Application Development

Application development involves several stages, each of which is essential to the success of the project.

Planning

The planning stage is crucial to the success of the project, as it sets the foundation for the rest of the development process. During this stage, the development team works with the client to define the project objectives, the target audience, and the features and functionality that the application should have. The team also determines the project scope, budget, and timelines. The outcome of this stage is a comprehensive project plan that outlines the requirements, scope, timelines, and budget. The plan serves as a roadmap for the development team and ensures that everyone is on the same page before proceeding to the design stage.

Design

The design stage involves creating wireframes and prototypes of the application, covering its user interface (UI) and user experience (UX). The design team works with the development team to ensure that the design is technically feasible and aligns with the project requirements, using tools such as UI kits, prototyping tools (like Adobe and Figma), and wireframes to design an appealing app. The outcome of this stage is a visual representation of the application, including the layout, color scheme, typography, and other design elements. The client usually reviews and approves the design before the development stage begins. A well-designed application is critical to success, as design directly impacts user engagement and retention.

Development

The development stage is where the actual coding takes place. The development team follows the design guidelines and uses the required technologies and tools to build the application, usually working in sprints, with each sprint delivering a set of features or functionality. During this stage, the team adheres to best practices for coding, documentation, and version control, and ensures the code is optimized for performance, security, and scalability. Regular communication and collaboration among the development team, the design team, and the client are essential to ensure that development aligns with the project requirements. The outcome of this stage is a working application that meets the project requirements. To make sure the application is bug-free, the team tests it rigorously with various testing methods, fixing any issues or bugs before proceeding to the deployment stage.
A well-developed application is critical to success, as it directly impacts user experience and satisfaction.

Testing

The testing stage involves validating the functionality, performance, and usability of the application. The testing team uses various testing methods, including unit testing, integration testing, system testing, usability testing, and user acceptance testing (UAT), to ensure that the application works as expected. During this stage, the team identifies and documents any issues or bugs and communicates them to the development team for fixing. The team also ensures that the application is accessible, responsive, and user-friendly. The testing stage is critical to the success of the project, as it confirms the application is ready for deployment. The outcome of this stage is a tested and validated application that meets the project requirements, along with a report of the testing results, including any issues or bugs, for the development team. Once all issues are resolved, the application is ready for deployment.

Deployment

The deployment stage involves releasing the application to the production environment. The deployment team follows a deployment plan that outlines the steps required to deploy the application and ensures the deployment happens without downtime or disruption to users. This stage is critical to the project's success, as it makes the application available to users. The deployment team also ensures that the application is secure and meets the required standards and regulations, and it monitors the application after deployment to ensure optimal performance. Any issues or challenges that arise during the deployment process are documented and communicated to the development team for future reference. After deployment, the development team provides ongoing maintenance and support to keep the application functioning optimally. A successful deployment ensures that the application is accessible to users and meets their expectations.

Maintenance

The maintenance stage involves the ongoing support and upkeep of the application. The development team monitors the application for any issues or bugs and fixes them promptly, keeps the application updated with the latest technologies and security patches, and adds new features and functionality as required by the client. The maintenance stage is critical to the project's success, as it ensures the application continues to meet users' requirements and expectations. The outcome is a well-maintained application that continues to function optimally, remains relevant, and keeps providing value to users.

Now, let's check the app development trends you should know.
App Development Trends in 2023

Many changes are expected in the app development world in 2023. The following application development trends are set to shape it.

Adoption of Cloud Technology

The future of cloud engineering is evolving, and cloud technology is a game changer in application development. It enables businesses to easily scale their IT infrastructure, reduce costs, and increase agility. Adoption of cloud technology keeps increasing because it provides on-demand access to resources and services, allowing businesses to focus on their core competencies. Application developers can use cloud technology to build and deploy applications in a distributed environment, which allows users to access them from any location on any device with an internet connection. As more businesses recognize the advantages of cloud technology and move their IT operations to the cloud, this trend will continue.

Usage of AI and Machine Learning Technologies

AI and machine learning technologies are transforming the way we interact with applications. From personalized recommendations to intelligent chatbots, AI and machine learning are revolutionizing the user experience; ChatGPT is the latest example. These technologies enable applications to learn from user behavior and preferences, providing a more personalized experience. Developers use them to improve application performance, optimize resource utilization, and reduce maintenance costs. As more data becomes available, AI and machine learning algorithms become more accurate and effective. This trend will continue to evolve as AI and machine learning technologies become more accessible to developers and businesses alike.

Metaverse-Like Experiences

Metaverse-like experiences are a new trend in application development. These experiences are immersive and interactive, providing users with a virtual environment to explore and interact with. With the increasing popularity of virtual and augmented reality technologies, this trend will persist in the coming years. The metaverse is poised to become a major part of the digital landscape, providing users with a new way to engage with applications and with each other. Developers are exploring new ways to incorporate metaverse-like experiences into their applications, creating new opportunities for businesses to engage with their customers.

Integration of Mobile Apps With Other Devices and Platforms

Integration of mobile apps with other devices and platforms is another trend in application development. The proliferation of mobile devices has led to increasing demand for applications that can be accessed from multiple devices and platforms. As a result, developers are using technologies such as APIs and SDKs to enable seamless integration between mobile apps and other devices and platforms. The core driver behind this trend is the need to offer users a consistent experience regardless of device or platform. Developers and businesses can expect this trend to persist as more devices become connected and new opportunities emerge.

Improved Native Cybersecurity

Improved native cybersecurity is a critical trend in application development, as privacy and security have become major concerns for businesses and users alike. With the increasing number of cyber threats, it is important for applications to be secure and resilient. Developers are incorporating security features into their applications from the ground up, making security an integral part of the development process. This includes features such as encryption, authentication, and authorization.
As cyber threats continue to evolve, developers are expected to keep improving native cybersecurity, ensuring that applications remain secure and resilient.

Low Code/No Code Is the Future

As per one report, the low-code development market will generate $187 billion by 2030. Low-code/no-code platforms are becoming increasingly popular among businesses and app developers. These platforms allow developers to create applications using visual interfaces and drag-and-drop components, without requiring extensive programming knowledge. This trend will continue in the coming years as more businesses and developers embrace the benefits of low-code/no-code platforms, such as faster development times and reduced costs.

Conclusion

Here we briefly discussed application development: its types, stages, and trends. The intention is to provide you with a comprehensive overview of the application development process, what it requires, and which trends will be vital in 2023. I hope you find this article helpful. If you have any input, you can share it with me through the comments section.
I had the opportunity to catch up with Andi Grabner, DevOps Activist at Dynatrace, during day two of Dynatrace Perform. I've known Andi for seven years, and he's one of the people who has helped me understand DevOps since I began writing for DZone. We covered several topics that I'll share in a series of posts.

How Do DevOps and SRE Work Together?

SRE as a term comes from Google. But essentially, it takes the automation that DevOps put into deployment, deploying changes faster by automating the pipeline, and applies it to operations. SRE is about automating the operational aspects of the software. What we call an SRE today is, in many ways, what we just called ITOps five years ago; now it's Site Reliability Engineering. I think both DevOps and SRE have evolved to use automation and code to automate things in a smart and codified way. Code is important because you can source-control it: you can keep the history of all of your pipelines. The same is true for SRE, which uses the same codified automation for the operational aspects of your software. Therefore, SRE and DevOps work really nicely in tandem. I have a slide where DevOps and SRE are holding hands. They're holding hands because, in the end, it's all about automating delivery, and SRE focuses on automating the resiliency of the stuff that comes out of DevOps.

How About Shift Left Versus Shift Right? Is That an "And," or Is It "And/Or?"

It's an "and." Shift left is really about thinking about all of these constraints earlier: how we deal with observability, and encouraging developers to think about what type of data they need to figure out whether the system is healthy. Traces, logs, and starting testing earlier: that's classical shifting left. Shifting right is about knowing how my system is performing. It's like knowing the heart rate of my system, such as my response time. For development, shifting right means telling the SRE team that is responsible for running my software: this is how you run it, this is what I want to see from an observability perspective, these are my thresholds, and if they are not met, I want you to execute these actions from a performance, availability, and reliability perspective.

I think we always had the classical Dev and Ops divide. Development would build something and throw it over the wall, and then Operations had to figure out how to run it properly, how to scale it, and how to do capacity control. Now we're saying we need to look at all of these aspects much earlier. We need to figure out upfront how we do observability in development, not just in operations. That's why we define observability: to test it out. We take all of these ingredients and identify what we are going to observe, and then we also observe it in production. We know what the thresholds are; we know what makes our system healthy; so let's validate this in production too. We know that if something fails in testing, we know what to do to bring the system back to an ideal state, so let's codify this for production as well, to bring the system back in an automated way. That's my definition of shifting right.
Companies are in continuous motion: new requirements, new data streams, and new technologies are popping up every day. When designing new data platforms to support the needs of your company, failing to perform a complete assessment of the options available can have disastrous effects on the company's capability to innovate and to keep its data assets usable and reusable in the long term. Having a standard assessment methodology is an absolute must to avoid personal bias and properly evaluate the various solutions across all the needed axes. The SOFT Methodology provides a comprehensive guide to all the evaluation points needed to define robust and future-proof data solutions. However, the original blog doesn't discuss a couple of important factors: why is applying a methodology like SOFT important? And, even more, what risks can we encounter if we don't do so? This blog aims to cover both aspects.

The Why

Data platforms are here to stay: the recent history of technology has shown us that data decisions made now have a long-lasting effect. We commonly see frequent rework of the front end, but radical changes in the back-end data platforms used are rare. Front-end rework can radically change the perception of a product, but when the same is done on the back end, the changes do not immediately impact the end users. Changing the product provider is nowadays quite frictionless, but porting a solution across different back-end tech stacks is, despite the eternal promise, very complex and costly, both financially and time-wise. Some options exist to ease the experience, but code compatibility and performance are never a 100% match. Furthermore, when talking about data solutions, performance consistency is key. Any change in the back-end technology is therefore seen as a high-risk scenario and, most of the time, refused with the statement "don't fix what isn't broken." The fear of change blocks both new tech adoption and upgrades of existing solutions. In summary, the world has plenty of examples of companies using back-end data platforms chosen ages ago, sometimes on old, unsupported versions. Therefore, any data decision made today needs to be robust and age well in order to support the company in its future data growth. Having a standard methodology helps you understand the playing field, evaluate all the possible directions, and accurately compare the options.

The Risks of Being (Data) Stuck

Ok, you're in the long-term game now. Swapping back-end or data pipeline solutions is not easy, so selecting the right one is crucial. But what problems will we face if we fail in our selection process? What are the risks of being stuck with a sub-optimal choice?

Features

When thinking about being stuck, it's tempting to compare the chosen solution with the new and shiny tooling available at the moment, and its promised future features. New options and functionalities could enhance a company's productivity, system management, and integration, and remove friction at any point of the data journey. Being stuck with a suboptimal solution, without a clear innovation path and without any capability to influence its direction, puts the company in a potentially weak position regarding innovation. Evaluating the community and the vendors behind a certain technology can help decrease the risk of stagnating tools. It's very important to evaluate which features and functionalities are relevant/needed and to define a list of "must-haves" to reduce time spent on due diligence.
Scaling

The SOFT methodology blog post linked above touches on several directions of scaling: human, technological, business case, and financial. Hitting any of these problems could mean that the identified solution:

Could not be supported due to a lack of talent
Could hit technical limits and prevent growth
Could expose security/regulatory problems
Could be perfectly fine to run on a sandbox, but financially impractical on production-size data volumes

Hitting scaling limits, therefore, means that companies adopting a specific technology could be forced to either slow down growth or completely rebuild solutions starting from a different technology choice.

Support and Upgrade Path

Sometimes the chosen technology advances, but companies are afraid of, or can't find the time/budget for, an upgrade to the new version. The associated risk is that the older the software version, the more complex (and risky) the upgrade path will be. In exceptional circumstances, an upgrade path might not exist at all, forcing a complete re-implementation of the solution. Support needs a similar discussion: staying on a very old version could mean a premium support fee in the best case, or a complete lack of vendor/community help in the vast majority of scenarios.

Community and Talent

The risk associated with talent shortage was already covered in the scaling chapter. New development and workload scaling heavily depend on the humans behind the tool. Moreover, not evaluating the community and talent pool behind a certain technology decision could create support problems once the chosen solution becomes mature and the first set of developers/supporters leaves the company without proper replacement. The lack of a vibrant community around a data solution can rapidly shrink the talent pool, creating issues for new features, new developments, and existing support.

Performance

It's impossible to know what the future will hold in terms of new technologies and integrations. But selecting a closed solution, with limited (or no) capabilities for integration, forces companies to run only "at the speed of the chosen technology," exposing them to the risk of not being able to unleash new use cases because of technical limitations. Moreover, not paying attention to the speed of development and recovery could expose limits on the innovation and resilience fronts.

Black Box

When defining new data solutions, an important aspect is the ability to make data assets and related pipelines discoverable and understandable. A black-box approach exposes companies to repeated effort and inconsistent results, which decrease trust in the solution and open the door to misalignments in results across departments.

Overthinking

The opposite risk is overthinking: the more time spent evaluating solutions, the more technologies, options, and needs will pile up, making the final decision process even longer. An inventory of the needs, timeframes, and acceptable performance is necessary to reduce the scope, make a decision, and start implementing.

Conclusion

When designing a data platform, it is very important to address the right questions and avoid the "risk of being stuck." The SOFT Methodology aims to provide all the important questions you should ask yourself in order to avoid pitfalls and create a robust solution. Do you feel all the risks are covered? Have a different opinion? Let me know!
When two people get together to write code on a single computer, it is called pair programming. Pair programming was popularized by the eXtreme Programming book by Kent Beck, in which he describes the technique of developing software in pairs, and it has spiked the interest of researchers in the subject. Lan Cao and Peng Xu found that pair programming leads to a deeper level of thinking and engagement in the task at hand. There are different styles of pair programming, such as driver/navigator, ping-pong, strong-style, and pair development. All of them are well described by Birgitta Böckeler and Nina Siessegger, whose article explains how to practice each style. Here, we will focus on only two of them, driver/navigator and ping-pong, as they seem to be the most commonly used. The objective is to look at what should be avoided when developing software in pairs. First, we briefly introduce each pair programming style, and then we go over the behaviors to avoid.

Driver/Navigator

To my own taste, driver/navigator is the most popular style among practitioners. In this style, the driver is the one writing the code and thinking about the solution in place, taking concrete steps to advance the task at hand. The navigator, on the other hand, watches the driver and gives insights on the task at hand. But not only that: the navigator is the one thinking in a broader way, and she is also in charge of giving support. Communication between the driver and navigator is constant. This style is also the one that fits well with the Pomodoro technique.

Ping/Pong

Ping-pong is the style that embraces the Test-Driven Development methodology; the reason is the way its dynamic works. Let's assume we have a pair that will start working together, Sandra and Clara. The ping/pong session would go something like the following:

Sandra starts by writing a failing test
Clara makes the test pass
Now, Clara can decide if she wants to refactor
Clara then writes a failing test for Sandra
The loop repeats

It is also possible to expand ping/pong into a broader approach. One might start a session writing a class diagram, and the next person in the pair implements the first set of classes. Regardless of the style, what is key to the success of pair programming is collaboration.

Behaviors To Avoid

Despite its popularity, pair programming seems to be a methodology that is not widely adopted by the industry. When it is, what "pair" and "programming" mean might vary in a specific context. Sometimes pair programming is used only at specific moments throughout the practitioners' day to fulfill specific tasks, as reported by Lauren Peate on the podcast Software Engineering Unlocked, hosted by Michaela Greiler. But in XP, pair programming is the default approach to developing all aspects of the software. Due to the variation in interpretations of what pair programming is, companies that adopt it might face some misconceptions about how to practice it. Often, these are the root cause of a poor experience while pairing:

Lack of soft (social) skills
Lack of knowledge in the practice of pair programming

In the following sections, we will go over some misconceptions about the practice. Avoiding them might lead to a better experience when pairing.

Lack of Communication

Driver/navigator is a style that requires the pair to focus on a single problem at once.
Therefore, the navigator is the one who should give support and question the driver's decisions to keep both in sync. When that does not happen, the collaboration session might suffer from a lack of interaction within the pair. The first misconception about the driver/navigator approach is that the navigator just watches the driver and does nothing; it should be the opposite. As much communication as possible is a sign that the pair is progressing. Of course, we haven't mentioned the knowledge variance that the driver and navigator might have.

Multi-Tasking

Checking the phone for notifications or deviating attention to anything that is not the problem at hand is a warning that the pair is not in sync. The advent of remote pair programming sessions might even facilitate such distraction during the session. The navigator should give as much support as possible, even more so when the driver is blocked for whatever reason. Some activities that the navigator might want to perform:

Checking documentation for the piece of code that the driver is writing
Verifying that the work moves toward the end goal of the task (this should prevent the pair from going down a path that is out of scope)
Controlling the Pomodoro cycle, if agreed

On the other hand, the driver is also expected to write the code and not just be the navigator's puppet. When that happens, the collaboration in the session might suffer, placing a heavy load on the navigator.
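To make the ping-pong loop described above concrete, here is a minimal sketch of a single round in Python with pytest. The leap-year rule and every name in it are hypothetical, chosen only to illustrate the handover between the pair:

```python
# test_leap_year.py - one ping-pong round, pytest style (hypothetical example).

# Round 1: Sandra starts by writing a failing test.
def test_year_divisible_by_4_is_a_leap_year():
    assert is_leap_year(2024) is True

# Clara writes the minimum code that makes the test pass
# (and may refactor before handing the keyboard back).
def is_leap_year(year: int) -> bool:
    return year % 4 == 0

# Round 2: Clara writes the next failing test; now it is Sandra's turn to code.
def test_century_not_divisible_by_400_is_not_a_leap_year():
    assert is_leap_year(1900) is False  # intentionally red: drives the next step
```

The second test is deliberately left failing: it is the "ping" that hands the keyboard back and keeps the loop going.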
Big teams often struggle with communication, coordination, decision-making, and the delivery of large-scale projects. Agile provides a framework to help reduce these issues, allowing teams to move quickly and adapt to changes. It encourages teams to work together more collaboratively, breaking down large projects into smaller, manageable chunks. Agile also helps to prioritize tasks, identify and manage dependencies, and provide clarity around the overall project goals. This helps big teams stay organized and on track and ensures everyone is working towards the same objectives. Perhaps you've heard about Scaled Agile Frameworks and wonder whether it's worth investing in one of them to get to the next level. But how far can you go with Agile frameworks? There are some blind spots that should be considered when using them. One of the main problems with Agile frameworks is that they structure communication in a way where domain experts are at the top, and their will is taken to production. This poses a risk because there is no guarantee that the software engineers will implement what the domain experts expect. Therefore, it is important to ensure that all stakeholders are included in the discussion, and software engineers must have a clear understanding of what is going on, as Alberto Brandolini says:

It is not the domain expert's knowledge that goes into production, it is the assumption of the developers that goes into production.

How are developers supposed to solve a problem when they do not know what the problem is? Are the developers' assumptions right? Is there any shared understanding? This article is about how we can use BDD and DDD tools and techniques to overcome complexities, blind spots, and misunderstandings in order to properly crunch knowledge and find a ubiquitous language.

Domain-Driven Design Is Linguistic

In order to communicate effectively, domain experts and the development team must have a shared understanding. Developers do not need to be experts, but they must be familiar with the terms and their context in that domain. For example, if they are working in the health domain, they do not need to know how to treat a patient, but they do need to know the exact meaning of the word shift in that domain. It is not enough to just know the meaning of a term; they must also understand the context in which it is used. Separation of the same term across different contexts is at the core of Domain-Driven Design. Though a term may have the same meaning within two different contexts, this is not our primary consideration. Instead, it is the context that defines the meaning of the specific term that matters most. Let me clarify further. Imagine a cup of coffee with a specific cup, taste, and even brand. When it is served in a coffee shop, it has value, and you will pay for it. But if the same coffee is left on a bench in a park, would you pay for it or even drink it? Of course not: it is the same coffee, yet in a different context. This is why context matters. If OOP thinks in terms of objects, DDD thinks in terms of contexts. Many developers make the mistake of unintentionally thinking in terms of data, taking care of certain states rather than behavior. Behavior is not data: data is the product of behaviors under fixed circumstances, and it is what we might base decisions on. In fact, the behavior that leads to a certain state is more important than the actual state. We can show this with math: f(x) = y and g(x) = y does not imply f = g.
We cannot conclude that f and g are interchangeable based on their final state. As I mentioned above, we preserve the meaning within a specific context instead of the final state. Remember the coffee example: we do not care how the coffee is made (as it is a normal cup of coffee), but we do care about the context. Coffee is our data, and the coffee shop and the park are the contexts in which its behavior is defined. Can we replace them? We see the final output as a normal cup of coffee, but we cannot replace the park with a coffee shop. In other words, we cannot say, "I want a cup of coffee; I can take it from the park or the coffee shop, who cares?" We separate park coffee and coffee shop coffee with the help of bounded contexts in Domain-Driven Design (DDD). We encapsulate and make ubiquitous each term within its own context. This means that we would not have a global ubiquitous language; instead, each bounded context would have its own ubiquitous language. Miscommunication during knowledge-crunching sessions can have different causes, such as cognitive bias, which is a type of error in reasoning, decision-making, and perception that occurs due to the way our brains perceive and process information. This type of bias occurs when an individual's cognitive processes lead them to form inaccurate conclusions or make irrational decisions. For example, when betting on a roulette table, if previous outcomes have landed on red, we might mistakenly assume that the next outcome will be black; however, these events are independent of each other (i.e., their outcomes do not affect each other's probability). Also, apophenia is the tendency to perceive meaningful connections between unrelated things, such as conspiracy theories, or the moment we think we get it when actually we do not. A good example of this is an image sent from Mars that includes a shape on a rock that you might think is the face of an alien, but it is just a random rock shape. Sometimes people lie, not in a bad way; I mean, not on purpose. Imagine you are working on your laptop, and your roommate asks you to turn the light off the next time you go to the kitchen. Twenty minutes later, you go but forget to turn the light off, without any intention. Such things are inevitable during collaboration sessions, and we need to find ways to reduce them. Developers should collaborate effectively to create a ubiquitous language and have a clear understanding of the different terms in the domain to create proper bounded contexts. However, making sure they understand every problem in the domain is a difficult task. Addressing this properly can only be achieved by employing DDD and BDD. These approaches have different tools to cover every blind spot and minimize ambiguity.

Domain (Problem Space)

For this article, I define a straightforward domain for an imaginary campervan rental company within Texas. The main service of the company is renting out campervans, which are managed in the HQ. Clients can only rent available campervans, each of which is identified by its unique car tag. They must be picked up and returned at one of the company's stations (e.g., Houston, San Antonio, Dallas, Austin, etc.). There is no limitation regarding the return station; that is, campervans can be returned to any of the stations. Clients can cancel their rental before pickup, which means they cannot cancel after they have picked up the campervan. Also, they cannot cancel three times in a row, or their account will be limited.
Campervans must be returned before the rental due date. If the client returns late, they must pay a penalty. The penalty is currently a fixed price. Every campervan can be serviced and repaired in the company's repair garage. Whenever a campervan is in the repair process, it is not available for rent. Every campervan must be repaired after five rentals or three months after the last repair. The company has portable equipment (like portable toilets, bed sheets, sleeping bags, camping tables, chairs, etc.); equipment is added or removed, with its respective stock, in the HQ of the company. Clients can book any number and type of available equipment for their rental, in addition to their campervan. Equipment is stored at stations and has a limited count at any given point in time. Once a client drives off the station, the available amount of equipment at that station is reduced (by the number of pieces of equipment the client took with them), and when the client returns the campervan, the amount of equipment at the station is increased accordingly. Since the amount of equipment is limited, and during the high season the number of campervans and the equipment used simultaneously is the highest, the business needs to plan ahead for the amount of equipment needed at each station per day in the future. This mitigates the risk of running out of equipment. For simplicity, I have omitted the payment part. Let's use event storming and BDD to model our domain.

Event Storming

One of the best approaches for knowledge crunching is event storming. It is a flexible workshop format for the collaborative exploration of complex business domains. I will publish a series of articles on event storming soon, but for this article, I will give just a glimpse of it. What makes event storming so efficient? It is a rapid, lightweight, intense, highly interactive, and collaborative workshop that helps to build a ubiquitous language. There are different kinds of event storming workshops: big picture, process modeling, and design. A big picture session is usually used to discover the business domain and share knowledge; it is the most important part of the whole storming, where the common goal is to share the maximum domain knowledge among all participants. Process modeling and design sessions are more focused on system design and defining aggregates, and they involve developers, product owners, UX/UI designers, and engineering managers. Let's start off with the big picture, like a blind man in a room: you may tackle the whole business in this session. As you know, in enterprise companies, knowledge is scattered; each department has its own expert, who knows little or nothing about the others. The real value of any brainstorming is the people and their knowledge. Therefore, for a successful event storming session, inviting the right people with the appropriate knowledge is essential. It starts with a chaotic exploration, sticking orange sticky notes to the wall. On each note, there is a past-tense sentence that describes a domain event.

Domain Events

Each event is expressed as a past-tense verb, written down on an orange sticky note, along with its respective actors. Here are a few points to help you understand what domain events are:

You could read about them in domain books.
Domain experts understand them.
Writing them in the past tense is a trick to create meaningful events.
They are not actions of someone or something. Even though some events will result from actions, we are not interested in actions yet.
They are not technical and should not be specific to our system's implementation. In the imaginary campervan rental company domain, the chaotic exploration phase would look like this. Whenever we come across or agree on a domain word, feel free to write a definition for it on a large yellow sticky note and add it to the wall. This is a way to build up the domain's ubiquitous language, which is very helpful for improving communication between all of us and, in turn, improves how we work in many different aspects. What about a question we cannot answer, something that does not seem right, or any problem we should look into? We use purple sticky notes to park these "problems." After this, ask attendees to identify actors (users with a role) who trigger or respond to events. The convention is to use a small yellow sticky note for that. Note: there is no need to add an actor to every event; sticking one at the beginning of a chain of events is enough. Similarly, complex systems also interact with external systems. External systems are not humans; they could be an online API, for example. The convention is to use blue post-its for external systems. Place them where the events interact with those systems.

Command

Now, it's time to focus on the commands that trigger the events. Write down each command, important to the business, on a light blue sticky note and place it to the left of the event it spawns. A command is a message that represents the intention of a user and can be expressed as an action, like request booking, cancel booking, request refund, etc.

Policy

The policy artifact is used to document the conditions and policies required for events to happen. On the storming wall, a policy sits between a domain event and a command. Policies are formalized like, "Whenever…X, then…Y," or, "If…X, then…Y." Imagine that when a client picks up a campervan, we need to update the related equipment stock for the origin and destination stations; for the destination, it must be applied on the expected return date. The policy would be an Equipment Stock Change Policy, which states that if a client picks up a campervan, then the related stock is updated.

Aggregate

In Domain-Driven Design, an aggregate is a cluster of domain objects that can be treated as a single unit. An aggregate has one of its component objects as the aggregate root. The aggregate root is the only member of the aggregate to which outside objects are allowed to hold references. All of the other objects within the aggregate can only be accessed through the aggregate root. This enforces data integrity and consistency within the aggregate. The hard part of Domain-Driven Design is identifying aggregates: it is always hard and confusing. However, with event storming, this becomes much easier and more understandable. In this case, go through all commands and events that are not linked to an external system and add an empty yellow sticky note there. Please don't call them aggregates; it works better if you call them "business rules." Ask participants to fill in these business rules (see the sketch after this list):

Preconditions: These are things that must be true before a method is called. The method tells clients, "This is what I expect from you."
Postconditions: These are the things that must be true after the method is complete. The method tells clients, "This is what I promise to do for you."
Invariants: These are the things that are always true and won't change. The method tells clients, "If this was true before you called me, I promise it'll still be true when I'm done."
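To ground the preconditions, postconditions, and invariants above, here is a minimal sketch of one business rule from this domain turned into an aggregate, in Python. The Campervan class, its field names, and the exception are hypothetical, and the three-month repair rule is omitted for brevity:

```python
from dataclasses import dataclass


class BusinessRuleViolation(Exception):
    """Raised when a precondition or invariant would be broken."""


@dataclass
class Campervan:
    """Aggregate root: outside code changes state only through its methods."""
    car_tag: str
    rentals_since_repair: int = 0
    in_repair: bool = False

    def rent(self) -> None:
        # Precondition: a campervan in the repair garage is not available.
        if self.in_repair:
            raise BusinessRuleViolation(f"{self.car_tag} is in the repair garage")
        self.rentals_since_repair += 1
        # Policy: every campervan must be repaired after five rentals.
        if self.rentals_since_repair >= 5:
            self.in_repair = True

    def complete_repair(self) -> None:
        # Postcondition: the campervan is available again with a reset counter.
        self.in_repair = False
        self.rentals_since_repair = 0
```

Because all state changes go through the root's methods, the "repaired after five rentals" rule cannot be bypassed from outside the aggregate.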
Let's gather the business rules, commands, and events that occur together, no matter where they are in the process. The last step is finding a proper name for your aggregate. Also, identify the bounded contexts.

What Is BDD (Behavior-Driven Development)?

BDD is a collaborative practice. These days, engineers think BDD is Gherkin (Given, When, Then), or they might know it as Cucumber, which is totally wrong; BDD is not about UI, log-in pages, etc. BDD is an iterative approach to filling gaps in Agile. BDD starts with the team: the team needs to discover first, then formulate, and finally implement. The discovery phase is an activity that leads to a ubiquitous language through collaboration, examples, and conversation about the rules. Due to its nature, the discovery practice works best when the team discusses the requirements together. The discussion of a user story usually generates examples. Collecting examples can start with simple questions, such as, "Could you please give an example of this?" We collect examples for the user story's rules, which are already defined on the event storming wall. After that, we take the examples and recheck what is going on on the event storming wall to see if we can find any errors or needed realignments. This helps improve and justify the model. At this stage, it is perfectly acceptable to capture the examples as a list of steps that describe the behavior of the system in a particular case. In the formulation step, we transform the examples into scenarios using the Given/When/Then keywords. This scenario format and these keywords are called Gherkin. Scenarios written in Gherkin syntax can be executed as automated tests by different tools, like Cucumber. Write Gherkin for the rules that have priority. A sample Gherkin scenario for the rental company domain would be:

Scenario: As a client, I want to update my rent date.
Given that I have a rental request for a campervan
When I decide to update my rent
Then the rent should be canceled, and I should be able to rent a new campervan on the desired date

Good Gherkin scenarios are business-readable, and, via automation, they verify the application. When they fail, the failure is understandable both by the business side and the delivery side of the project. Essentially, they make a connection between the problem and the solution, or in other words, between the members of the distributed team. Scenarios represent our shared knowledge and our shared responsibility to produce quality software.

Conclusion

Every tool has its own blind spots, and every blind spot might be a million-dollar mistake. Let's use DDD and BDD tools together to build better models.
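As a closing companion to the formulation step above, here is a minimal sketch of how the sample scenario could be glued to code with Python's behave library. The rental_service object, its methods, and all identifiers are hypothetical; in a real project, the service would be wired up in behave's environment.py:

```python
# features/steps/update_rent.py - behave step definitions (hypothetical domain API)
from behave import given, when, then


@given("that I have a rental request for a campervan")
def step_rental_request(context):
    # context.rental_service is assumed to be created in environment.py
    context.rental = context.rental_service.request_rental(
        client_id="client-1", car_tag="TX-0042"
    )


@when("I decide to update my rent")
def step_update_rent(context):
    context.rental_service.update_rental_date(context.rental.id, "2023-08-01")


@then("the rent should be canceled, and I should be able to rent a new campervan on the desired date")
def step_assert_cancel_and_rebook(context):
    rental = context.rental_service.find(context.rental.id)
    assert rental.status == "CANCELED"
    assert context.rental_service.is_available("TX-0042", "2023-08-01")
```

When such a step fails, the report points back to the business-readable scenario, which is exactly the shared-understanding loop the article argues for.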
DevOps has brought the topic of organizational culture firmly to the table. While culture was always an element of Agile and Lean, the research into DevOps has shown it's just as important as the more technical capabilities. The DevOps structural equation model has several elements related to people and culture, so it's clear that human issues are an important part of the DevOps picture. The five cultural capabilities in the model are:

Climate for learning
Westrum organizational culture
Psychological safety
Job satisfaction
Identity

The cultural capabilities drive software delivery and operations performance, which predict successful business outcomes. At the same time, several HR hot topics have trended across all industries over the past few years:

Work to rule (quiet quitting)
The Great Resignation / Reshuffle
The four-day workweek
Hybrid working

As Emily Freeman (author of DevOps for Dummies) said: "The biggest challenges facing tech aren't technical, but human." So, where should you start when it comes to understanding culture in the context of DevOps?

The Fundamental Assumption

In 1960, Douglas McGregor published a book called The Human Side of Enterprise. In the book, he described how a fundamental assumption about human behavior results in different management styles. You either believe that:

Theory X: People don't want to work and need to be motivated by rewards and punishments.
Theory Y: People are intrinsically motivated to do good work.

Many decisions in the workplace involve a trade-off, but these fundamental assumptions are mutually exclusive. If you believe Theory X, you:

Centralize decision-making.
Track individual performance.
Use rewards and punishments to motivate workers.

With Theory Y, you:

Focus on setting clear goals.
Let people direct their own efforts.

When you follow Theory Y, employees become the organization's most valuable asset. McGregor considered Theory X and Theory Y to be two options a manager would choose from after assessing a workplace: first, you'd review the work and the people and decide whether you need an authoritarian style or a more hands-off approach. We've since learned through the study of system failures that cultures with high trust and low blame are safer than bureaucratic or pathological cultures. Theory Y is foundational to Lean, Agile, and DevOps and is the underlying assumption of a generative culture.

Mission Command

Although military organizations are traditionally seen as Theory X cultures, modern military units operate using mission command. The mission command pattern decentralizes decision-making by providing clear goals. As a result, the soldiers with boots on the ground can respond dynamically as events unfold rather than waiting for orders. This is Theory Y culture in action. The civilian version of this is called workplace empowerment, which requires that:

You share information with everyone.
You create autonomy through boundaries.
You replace hierarchy with self-directed teams.

Workplace empowerment combines centralized intent with decentralized execution. In software delivery, this typically involves a shared vision implemented by a cross-functional, self-organizing team.

Culture Predicts Safety

When you feel safe speaking up, nobody is blamed, and near-misses and minor faults fuel learning. Each incident results in positive action to make the workplace safer, whether the industry is manufacturing, nuclear power, aviation, or software delivery.
If you don't feel safe reporting close calls, the unspoken risks accumulate until, very often, a disaster happens. You don't have to be in a safety-critical industry to benefit from this relationship. The same cultural traits that predict safety are also related to communication, collaboration, innovation, and problem-solving. In addition, culture affects the flow of information, which is critical to all these activities.

"In 2022, we found that the biggest predictor of an organization's application-development security practices was cultural, not technical: high-trust, low-blame cultures focused on performance were significantly more likely to adopt emerging security practices." - The Accelerate State of DevOps Report, 2022

Theory X management restricts the flow of information and limits who can take action: managers draw information up and pass decisions back down. Theory Y leadership leads to strong information flow and prompt action in response: information flows freely, and decisions are made close to the work.

Changing Culture

Changing team and organization culture is one of the toughest challenges in software delivery; not even the most complex automation task in your deployment pipeline comes close. You need a clear vision of your intended future state, and it needs to be pushed rapidly, firmly, and regularly to ensure the goal remains clear. You need leaders and managers to understand that their role is to enable self-organizing teams that use each team member's talent. You need to move away from systems that centralize information and decision-making and toward systems aligned with distributed responsibility. For example, suppose you use centralized tools to organize tasks and assign them to people. In that case, you need to move to a system that aligns with setting a clear mission without removing the teams' ability to self-organize and respond to dynamic situations. You may need to replace a tool entirely or use it in a new way. Your Gantt charts might have to go, but your task-tracking app can remain if the team can re-purpose it. The leadership role in a culture change is to:

Relentlessly push the desired end state.
Reinforce the role of leaders and managers as enablers.
Ensure teams become self-organized.

A healthy culture should also be clear about the importance of the flow of information and must set a standard for communication style. We follow the Radical Candor approach. This lets us be direct in our communications, but within a framework where we all care about each other. Radical Candor lets individuals show courage and challenge others when they might otherwise remain silent. This ultimately means we can all work better without harmful or toxic behavior. You won't make a dent in culture without a clear, robust, and sustained push. You have to overcome inertia and battle organizational immune responses. Despite the difficulty, the research is conclusive that culture is vital to high performance.

Conclusion

When people talk about culture in the context of DevOps, they're referring to Westrum's generative culture, which is based on McGregor's Theory Y assumption. Simply put, you should aim for a clear, shared mission combined with decentralized decision-making. All modern software delivery methods refer to this concept of empowered teams in different ways. We call this modern workplace culture, yet the ideas are over 100 years old.
For example, mission command dates back to the 1800s, Theories X and Y were explained in a book from 1960, and Westrum's typology of organizational cultures was designed in the 1980s. You'll find culture is the toughest nut to crack in DevOps. It's tempting to rely on research and statistics to prove the case for a generative culture. Still, the reality is that cultural change depends on compelling storytelling and on creating a clear vision of what the organization will look like after the transition.
TDD has been a subject of interest for practitioners for at least the last ten years or so, and even longer if we take into account the eXtreme Programming practices and the Agile Manifesto. Despite its claimed popularity today and its association with quality, the practice of writing the test before the production code is still uneven. It varies based on the practitioner's context, past experiences, and learning path. We could elaborate further on the uneven knowledge of TDD, starting from the formal education on the subject, which raises even more discussion about its applicability. Is it possible to teach TDD effectively without professional project experience? Some might argue that it is possible, while others will say the opposite. Despite great content published by renowned publishers such as O'Reilly, Packt, Addison-Wesley, Apress, and Manning, the practice of TDD is still a challenge; even the best books and the best examples cannot automatically translate their content to the unique problems that practitioners face on a daily basis. Katas are a tool that might be used to fill this gap on both fronts: formality in learning TDD and the unique problems that practitioners face. Practicing with katas is not a replacement; it should be understood as an aid instead.

The Mismatch With Real Problems

Practitioners have tried different approaches to internalize test-driven development. Despite the effort, the mismatch between training and production code persists. The patterns found in practicing katas are close to greenfield projects. In the day-to-day, it is most likely that practitioners will join a brownfield project that is not that friendly to maintain. There are books that focus only on this aspect of things, for example, Working Effectively with Legacy Code by Michael Feathers, Refactoring: Improving the Design of Existing Code by Martin Fowler, Refactoring to Patterns by Joshua Kerievsky, and many more. The patterns that practitioners use in katas but that are usually a mismatch with production code frequently appear together:

Approach to testing from the outside: when should I switch to the unit?
Persistence layer: I use an ORM (Object-Relational Mapping), or the layers of my application are mixed together

There are different approaches one might take to write code; what is usually shared across source code is the technique of splitting problems and then combining all the pieces to solve the whole. Let's dive into chunking and what it means to use it to start tackling the transition from katas to production code.

Chunking

The process of chunking happens without one noticing it, and practitioners are experts in this technique. The chunking approach is described in Learning How to Learn by Barbara Oakley and also depicted by Felienne Hermans in her book The Programmer's Brain. The process works like an algorithm: given a complex problem, what are the pieces that compose it, and which pieces can be split further? With each step, move the needle forward to get the problem solved. Splitting the problem is important, as it gives our brain room to work without overloading it; we do have limits on how much information we can hold. Looking specifically at practitioners, this is one of the reasons one might not have the entire system architecture in her head, as described in What Makes a Great Software Engineer? by Paul Luo Li, Amy J. Ko, and Jiamin Zhu.
Taking a step back, if we are talking about katas, they are the first step of chunking. In this stage, we are focused (but limited) on:

Learning something new (such as the practice of writing tests first)
Sharpening a skill: given that TDD is a known subject, one might want to try different styles, such as with or without test doubles, a new architectural style, a new programming language, etc.

Without this first step, it might be difficult for practitioners, on the job, to learn TDD, baby steps, simple design, refactoring, architectural styles that might fit the problem at hand, the pragmatic approach to a problem, and so on. There is a lot to take into account; katas abstract that away and focus on the single technique at hand. For example, take the following (non-exhaustive) list of focus points:

fizzbuzz focuses on baby steps
mars rover focuses on the TDD flow
smart fridge focuses on test doubles
gilded rose focuses on legacy code

Katas are here to ease the process of learning the techniques that practitioners use on a daily basis. Of course, this is just the first step, the first chunk, that allows practitioners to become effective in their work.

Expanding to Production Code

Moving from a kata setting to a professional project that is in production is not as transparent as it might look. Let's take into account brownfield projects, which are the projects practitioners are most likely to face. The first barrier, which does not transport easily from katas, is that the code might mix too many responsibilities in a single class, or there is too much code to understand, written by a developer who has already left the company, or the project has too many dependencies. Regardless of what it might be, the challenges add up. Referring back to chunking, the first step here is to identify a single point that can be tackled. This is an important step, as the technique has already been trained. Focusing on a single aspect at a time to improve production code plays an important role. Let's think about the following scenario: we have an application that was developed with an MVC (Model-View-Controller) framework, but the layers have been mixed and there is no clear layering going on; besides that, there is no testing in place, and the application is mainly tested through a manual approach. On top of that, practitioners want to apply the new techniques they've learned to make the code maintainable. As we already discussed, the key point here is to identify the pieces of the puzzle first. Trying to tackle all the listed problems at once might do more harm than good. Let's enumerate the key chunks:

Mixed business logic between layers: it is difficult to draw a line between the layers, often leading to a manual end-to-end test. If it were an API, it would be tested through POST requests; if it were a web application, by opening the browser and navigating as the end user would.
There is no automated testing in place.
Practitioners want to apply newly learned techniques; an example could be implementing a new algorithm that performs faster.
Restructure the code to fit an architectural style.

If we think about them one by one, we start to see a correlation with specific katas we might want to perform:

Mixed Business Logic Between Layers

Gilded Rose is a good candidate for that: applying the golden master technique helps to improve the internal structure of the code.
There Is No Automated Testing in Place

Once again, Gilded Rose helps: the previous step used the golden master to create the test cases that did not exist; now the code should allow practitioners to write new tests that cover specific edge cases that weren't covered before.

Practitioners Want To Apply New Techniques Learned

At this point, the two previous steps are already in place; with that, the code should be testable enough that the production code is not highly coupled with the test code. This should be a health check before implementing the new techniques. Can you answer the question, "If I refactor, do the tests stay unchanged, and do I have the confidence to release?" If the answer is yes, then applying the newly learned techniques should be doable.
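Since the golden master technique anchors the first two steps above, here is a minimal sketch of what it can look like in Python with pytest. The legacy_pricing module and its quote function are hypothetical stand-ins for untested legacy code; the assumption is that quote returns JSON-serializable values:

```python
# test_golden_master.py - a golden master safety net for untested legacy code.
import json
from pathlib import Path

from legacy_pricing import quote  # hypothetical legacy function under test

GOLDEN = Path("golden_master.json")
# Exercise the legacy code over a wide, deterministic range of inputs.
INPUTS = [(days, late) for days in range(1, 31) for late in (False, True)]


def record_golden_master():
    """Run once, against the untouched code, to lock in its current behavior."""
    GOLDEN.write_text(json.dumps([quote(days, late) for days, late in INPUTS]))


def test_refactoring_preserves_behavior():
    # Any refactoring that changes an output for any input fails this test.
    expected = json.loads(GOLDEN.read_text())
    assert [quote(days, late) for days, late in INPUTS] == expected
```

The recorded outputs act as the specification the code never had, giving you the confidence to refactor before writing the finer-grained tests the article describes.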
Stefan Wolpers
Agile Coach,
Berlin Product People GmbH
Søren Pedersen
Co-founder,
BuildingBetterSoftware
Hiren Dhaduk
CTO,
Simform
Daniel Stori
Software Development Manager,
AWS