Testing, Deployment, and Maintenance Resources

DZone's Featured Testing, Deployment, and Maintenance Resources

DevOps

The DevOps movement has paved the way for CI/CD and streamlined application delivery and release orchestration. These nuanced methodologies have not only increased the scale and speed at which we release software, but also redistributed responsibilities onto the developer and led to innovation and automation throughout the SDLC.DZone's 2023 DevOps: CI/CD, Application Delivery, and Release Orchestration Trend Report explores these derivatives of DevOps by diving into how AIOps and MLOps practices affect CI/CD, the proper way to build an effective CI/CD pipeline, strategies for source code management and branching for GitOps and CI/CD, and more. Our research builds on previous years with its focus on the challenges of CI/CD, a responsibility assessment, and the impact of release strategies, to name a few. The goal of this Trend Report is to provide developers with the information they need to further innovate on their integration and delivery pipelines.

The Four Pillars of Programming Logic in Software Quality Engineering

By Stelios Manioudakis

CORE

Software development, like constructing any intricate masterpiece, requires a strong foundation. This foundation isn't just made of lines of code, but also of solid logic. Just as architects rely on the laws of physics, software developers use the principles of logic. This article showcases the fundamentals of four powerful pillars of logic, each offering unique capabilities to shape and empower creations of quality. Imagine these pillars as bridges connecting different aspects of quality in our code. Propositional logic, the simplest among them, lays the groundwork with clear-cut true and false statements, like the building blocks of your structure. Then comes predicate logic, a more expressive cousin, allowing us to define complex relationships and variables, adding intricate details and dynamic behaviors. But software doesn't exist in a vacuum — temporal logic steps in, enabling us to reason about the flow of time in our code, ensuring actions happen in the right sequence and at the right moments. Finally, fuzzy logic acknowledges the nuances of the real world, letting us deal with concepts that aren't always black and white, adding adaptability and responsiveness to our code. I will explore the basic strengths and weaknesses of each pillar giving quick examples in Python. Propositional Logic: The Building Blocks of Truth A proposition is an unambiguous sentence that is either true or false. Propositions serve as the fundamental units of evaluation of truth. They are essentially statements that can be definitively classified as either true or false, offering the groundwork for clear and unambiguous reasoning. They are the basis for constructing sound arguments and logical conclusions. Key Characteristics of Propositions Clarity: The meaning of a proposition should be unequivocal, leaving no room for interpretation or subjective opinions. For example, "The sky is blue" is a proposition, while "This movie is fantastic" is not, as it expresses personal preference. Truth value: Every proposition can be conclusively determined to be either true or false. "The sun is a star" is demonstrably true, while "Unicorns exist" is definitively false. Specificity: Propositions avoid vague or ambiguous language that could lead to confusion. "It's going to rain tomorrow" is less precise than "The current weather forecast predicts a 90% chance of precipitation tomorrow." Examples of Propositions The number of planets in our solar system is eight. (True) All dogs are mammals. (True) This object is made of wood. (Either true or false, depending on the actual object) Pizza is the best food ever. (Expresses an opinion, not a factual statement, and therefore not a proposition) It's crucial to understand that propositions operate within the realm of factual statements, not opinions or subjective impressions. Statements like "This music is beautiful" or "That painting is captivating" express individual preferences, not verifiable truths. By grasping the essence of propositions, we equip ourselves with a valuable tool for clear thinking and logical analysis, essential for various endeavors, from scientific exploration to quality coding and everyday life. Propositional logic has operations, expressions, and identities that are very similar (in fact, they are isomorphic) to set theory. Imagine logic as a LEGO set, where propositions are the individual bricks. Each brick represents a simple, declarative statement that can be either true or false. We express these statements using variables like p and q, and combine them with logical operators like AND (∧), OR (∨), NOT (¬), IF-THEN (→), and IF-AND-ONLY-IF (↔). Think of operators as the connectors that snap the bricks together, building more complex logical structures. Strengths Simplicity: Easy to understand and implement, making it a great starting point for logic applications. After all, simplicity is a cornerstone of quality. Efficiency: Offers a concise way to represent simple conditions and decision-making in code. Versatility: Applicable to various situations where basic truth value evaluations are needed. Limitations Limited Expressiveness: Cannot represent relationships between objects or quantifiers like "for all" and "there exists." Higher-order logic can address this limitation. Focus on Boolean Values: Only deals with true or false, not more nuanced conditions or variables. Python Examples Checking if a user is logged in and has admin privileges: Python logged_in = True admin = False if logged_in and admin: print("Welcome, Administrator!") else: print("Please log in or request admin privileges.") Validating user input for age: Python age = int(input("Enter your age: ")) if age >= 18: print("You are eligible to proceed.") else: print("Sorry, you must be 18 or older.") Predicate Logic: Beyond True and False While propositional logic deals with individual blocks, predicate logic introduces variables and functions, allowing you to create more dynamic and expressive structures. Imagine these as advanced LEGO pieces that can represent objects, properties, and relationships. The core concept here is a predicate, which acts like a function that evaluates to true or false based on specific conditions. Strengths Expressive power: Can represent complex relationships between objects and express conditions beyond simple true/false. Flexibility: Allows using variables within predicates, making them adaptable to various situations. Foundations for more advanced logic: Forms the basis for powerful techniques like formal verification. Limitations Increased complexity: Requires a deeper understanding of logic and can be more challenging to implement. Computational cost: Evaluating complex predicates can be computationally expensive compared to simpler propositions. Python Examples Checking if a number is even or odd: Python def is_even(number): return number % 2 == 0 num = int(input("Enter a number: ")) if is_even(num): print(f"{num} is even.") else: print(f"{num} is odd.") Validating email format: Python import re def is_valid_email(email): regex = r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$" return re.match(regex, email) is not None email = input("Enter your email address: ") if is_valid_email(email): print("Valid email address.") else: print("Invalid email format.") Combining Forces: An Example Imagine an online store where a user needs to be logged in, have a valid email address, and have placed an order before they can write a review. Here's how we can combine propositional and predicate logic: Python def can_write_review(user): # Propositional logic for basic conditions logged_in = user.is_logged_in() has_email = user.has_valid_email() placed_order = user.has_placed_order() # Predicate logic to check email format def is_valid_email_format(email): # ... (implement email validation logic using regex) return logged_in and has_email(is_valid_email_format) and placed_order In this example, we use both: Propositional logic checks the overall conditions of logged_in, has_email, and placed_order using AND operations. Predicate logic is embedded within has_email, where we define a separate function is_valid_email_format (implementation not shown) to validate the email format using a more complex condition (potentially using regular expressions). This demonstrates how the two logics can work together to express intricate rules and decision-making in code. The Third Pillar: Temporal Logic While propositional and predicate logic focuses on truth values at specific points in time, temporal logic allows us to reason about the behavior of our code over time, ensuring proper sequencing and timing. Imagine adding arrow blocks to our LEGO set, connecting actions and states across different time points. Temporal logic provides operators like: Eventually (◇): Something will eventually happen. Always (□): Something will always happen or be true. Until (U): Something will happen before another thing happens. Strengths Expressive power: Allows reasoning about the behavior of systems over time, ensuring proper sequencing and timing. Verification: This can be used to formally verify properties of temporal systems, guaranteeing desired behavior. Flexibility: Various operators like eventually, always, and until offer rich expressiveness. Weaknesses Complexity: Requires a deeper understanding of logic and can be challenging to implement. Computational cost: Verifying complex temporal properties can be computationally expensive. Abstraction: Requires careful mapping between temporal logic statements and actual code implementation. Traffic Light Control System Imagine a traffic light system with two perpendicular roads (North-South and East-West). We want to ensure: Safety: No cars from both directions ever cross at the same time. Liveness: Each direction eventually gets a green light (doesn't wait forever). Logic Breakdown Propositional Logic: north_red = True and east_red = True represent both lights being red (initial state). north_green = not east_green ensures only one light is green at a time. Predicate Logic: has_waited_enough(direction): checks if a direction has waited for a minimum time while red. Temporal Logic: ◇(north_green U east_green): eventually, either north or east light will be green. □(eventually north_green ∧ eventually east_green): both directions will eventually get a green light. Python Example Python import time north_red = True east_red = True north_wait_time = 0 east_wait_time = 0 def has_waited_enough(direction): if direction == "north": return north_wait_time >= 5 # Adjust minimum wait time as needed else: return east_wait_time >= 5 while True: # Handle pedestrian button presses or other external events here... # Switch lights based on logic if north_red and has_waited_enough("north"): north_red = False north_green = True north_wait_time = 0 elif east_red and has_waited_enough("east"): east_red = False east_green = True east_wait_time = 0 # Update wait times if north_green: north_wait_time += 1 if east_green: east_wait_time += 1 # Display light states print("North:", "Red" if north_red else "Green") print("East:", "Red" if east_red else "Green") time.sleep(1) # Simulate time passing This example incorporates: Propositional logic for basic state changes and ensuring only one light is green. Predicate logic to dynamically determine when a direction has waited long enough. Temporal logic to guarantee both directions eventually get a green light. This is a simplified example. Real-world implementations might involve additional factors and complexities. By combining these logic types, we can create more robust and dynamic systems that exhibit both safety and liveness properties. Fuzzy Logic: The Shades of Grey The fourth pillar in our logic toolbox is Fuzzy Logic. Unlike the crisp true/false of propositional logic and the structured relationships of predicate logic, fuzzy logic deals with the shades of grey. It allows us to represent and reason about concepts that are inherently imprecise or subjective, using degrees of truth between 0 (completely false) and 1 (completely true). Strengths Real-world applicability: Handles imprecise or subjective concepts effectively, reflecting human decision-making. Flexibility: Can adapt to changing conditions and provide nuanced outputs based on degrees of truth. Robustness: Less sensitive to minor changes in input data compared to crisp logic. Weaknesses Interpretation: Defining fuzzy sets and membership functions can be subjective and require domain expertise. Computational cost: Implementing fuzzy inference and reasoning can be computationally intensive. Verification: Verifying and debugging fuzzy systems can be challenging due to their non-deterministic nature. Real-World Example Consider a thermostat controlling your home's temperature. Instead of just "on" or "off," fuzzy logic allows you to define "cold," "comfortable," and "hot" as fuzzy sets with gradual transitions between them. This enables the thermostat to respond more naturally to temperature changes, adjusting heating/cooling intensity based on the degree of "hot" or "cold" it detects. Bringing Them All Together: Traffic Light With Fuzzy Logic Now, let's revisit our traffic light control system and add a layer of fuzzy logic. Problem In our previous example, the wait time for each direction was fixed. But what if traffic volume varies? We want to prioritize the direction with more waiting cars. Solution Propositional logic: Maintain the core safety rule: north_red ∧ east_red. Predicate logic: Use has_waiting_cars(direction) to count cars in each direction. Temporal logic: Ensure fairness: ◇(north_green U east_green). Fuzzy logic: Define fuzzy sets for "high," "medium," and "low" traffic based on car count. Use these to dynamically adjust wait times. At a very basic level, our Python code could look like: Python import time from skfuzzy import control as ctrl # Propositional logic variables north_red = True east_red = True # Predicate logic function def has_waiting_cars(direction): # Simulate car count (replace with actual sensor data) if direction == "north": return random.randint(0, 10) > 0 # Adjust threshold as needed else: return random.randint(0, 10) > 0 # Temporal logic fairness rule fairness_satisfied = False # Fuzzy logic variables traffic_level = ctrl.Antecedent(np.arange(0, 11), 'traffic_level') wait_time_adjust = ctrl.Consequent(np.arange(-5, 6), 'wait_time_adjust') # Fuzzy membership functions for traffic level low_traffic = ctrl.fuzzy.trapmf(traffic_level, 0, 3, 5, 7) medium_traffic = ctrl.fuzzy.trapmf(traffic_level, 3, 5, 7, 9) high_traffic = ctrl.fuzzy.trapmf(traffic_level, 7, 9, 11, 11) # Fuzzy rules for wait time adjustment rule1 = ctrl.Rule(low_traffic, wait_time_adjust, 3) rule2 = ctrl.Rule(medium_traffic, wait_time_adjust, 0) rule3 = ctrl.Rule(high_traffic, wait_time_adjust, -3) # Control system and simulation wait_ctrl = ctrl.ControlSystem([rule1, rule2, rule3]) wait_sim = ctrl.ControlSystemSimulation(wait_ctrl) while True: # Update logic states # Propositional logic: safety rule north_red = not east_red # Ensure only one light is green at a time # Predicate logic: check waiting cars north_cars = has_waiting_cars("north") east_cars = has_waiting_cars("east") # Temporal logic: fairness rule if not fairness_satisfied: # Initial green light assignment (randomly choose a direction) if fairness_satisfied is False: if random.random() < 0.5: north_red = False else: east_red = False # Ensure both directions eventually get a green light if north_red and east_red: if north_cars >= east_cars: north_red = False else: east_red = False elif north_red or east_red: # At least one green light active fairness_satisfied = True # Fuzzy logic: calculate wait time adjustment if north_red: traffic_sim.input['traffic_level'] = north_cars else: traffic_sim.input['traffic_level'] = east_cars traffic_sim.compute() adjusted_wait_time = ctrl.control_output(traffic_sim, wait_time_adjust, defuzzifier=ctrl.Defuzzifier(method='centroid')) # Update wait times based on adjusted value and fairness considerations if north_red: north_wait_time += adjusted_wait_time else: north_wait_time = 0 # Reset wait time when light turns green if east_red: east_wait_time += adjusted_wait_time else: east_wait_time = 0 # Simulate light duration (replace with actual control mechanisms) time.sleep(1) # Display light states and wait times print("North:", "Red" if north_red else "Green") print("East:", "Red" if east_red else "Green") print("North wait time:", north_wait_time) print("East wait time:", east_wait_time) print("---") There are various Python libraries like fuzzywuzzy and scikit-fuzzy that can help to implement fuzzy logic functionalities. Choose one that suits your project and explore its documentation for specific usage details. Remember, this is a simplified example, and the actual implementation will depend on your specific requirements and chosen fuzzy logic approach. This basic example is written for the sole purpose of demonstrating the core concepts. The code is by no means optimal, and it can be further refined in many ways for efficiency, fairness, error handling, and realism, among others. Explanation We define fuzzy sets for traffic_level and wait_time_adjust using trapezoidal membership functions. Adjust the ranges (0-11 for traffic level, -5-5 for wait time) based on your desired behavior. We define three fuzzy rules that map the combined degrees of truth for each traffic level to a wait time adjustment. You can add or modify these rules for more complex behavior. We use the scikit-fuzzy library to create a control system and simulation, passing the traffic_level as input. The simulation outputs a fuzzy set for wait_time_adjust. We defuzzify this set using the centroid method to get a crisp wait time value. Wrapping Up This article highlights four types of logic as a foundation for quality code. Each line of code represents a statement, a decision, a relationship — essentially, a logical step in the overall flow. Understanding and applying different logical frameworks, from the simple truths of propositional logic to the temporal constraints of temporal logic, empowers developers to build systems that are not only functional but also efficient, adaptable, and elegant. Propositional Logic This fundamental building block lays the groundwork by representing basic truths and falsehoods (e.g., "user is logged in" or "file exists"). Conditional statements and operators allow for simple decision-making within the code, ensuring proper flow and error handling. Predicate Logic Expanding on propositions, it introduces variables and relationships, enabling dynamic representation of complex entities and scenarios. For instance, functions in object-oriented programming can be viewed as predicates operating on specific objects and data. This expressive power can enhance code modularity and reusability. Temporal Logic With the flow of time being crucial in software, temporal logic ensures proper sequencing and timing. It allows us to express constraints like "before accessing data, validation must occur" or "the system must respond within 10 milliseconds." This temporal reasoning leads to code that adheres to timing requirements and can avoid race conditions. Fuzzy Logic Not every situation is black and white. Fuzzy logic embraces the shades of grey by dealing with imprecise or subjective concepts. A recommendation system can analyze user preferences or item features with degrees of relevance, leading to more nuanced and personalized recommendations. This adaptability enhances user experience and handles real-world complexities. Each type of logic plays a role in constructing well-designed software. Propositional logic forms the bedrock, predicate logic adds structure, temporal logic ensures timing, and fuzzy logic handles nuances. Their combined power leads to more reliable, efficient, and adaptable code, contributing to the foundation of high-quality software. More

Refcard #368

Getting Started With OpenTelemetry

By Joana Carvalho

CORE

Code Search Using Retrieval Augmented Generation

By Raghavan Muthuregunathan

How To Implement Code Reviews Into Your DevOps Practice

By Joydip Kanjilal

CORE

O11y Guide, Cloud-Native Observability Pitfalls: Ignoring Existing Landscape

Are you looking at your organization's efforts to enter or expand into the cloud-native landscape and feeling a bit daunted by the vast expanse of information surrounding cloud-native observability? When you're moving so fast with agile practices across your DevOps, SREs, and platform engineering teams, it's no wonder this can seem a bit confusing. Unfortunately, the choices being made have a great impact on both your business, your budgets, and the ultimate success of your cloud-native initiatives that hasty decisions upfront lead to big headaches very quickly down the road. In the previous article, we looked at the problem of underestimating cardinality in our cloud-native observability solutions. Now it's time to move on to another common mistake organizations make, that of ignoring our existing landscape. By sharing common pitfalls in this series, the hope is that we can learn from them. This article could also have been titled, "Underestimating Our Existing Landscape." When we start planning to integrate our application landscape into our observability solution, we often end up with large discrepancies between planning and outcomes. They Can't Hurt Me The truth is we have a lot of applications out there in our architecture. The strange thing is during the decision-making process around cloud native observability and scoping solutions, they often are forgotten. Well, not necessarily forgotten, but certainly underestimated. The cost that they bring is in the hidden story around instrumentation. We have auto-instrumentation that suggests it's quick and easy, but often does not bring the exactly needed insights. On top of that, auto-instrumentation generates extra data from metrics and tracing activities that we are often not that interested in. Manual instrumentation is the real cost to provide our exact insights and the data we want to watch from our application landscape. This is what often results in unexpected or incorrectly scoped work (a.k.a., costs) with it as we change, test, and deploy new versions of existing applications. We want to stay with open source and open standards in our architecture, so we are going to end up in the cloud native standards found within the Cloud Native Computing Foundation. With that in mind, we can take a closer look at two technologies for our cloud-native observability solution: one for metrics and one for traces. Instrumenting Metrics Widely adopted and accepted standards for metrics can be found in the Prometheus project, including time-series storage, communication protocols to scrape (pull) data from targets, and PromQL, the query language for visualizing the data. Below you see an outline of the architecture used by Prometheus to collect metrics data. There are client libraries, exporters, and standards in communication to detect services across various cloud-native technologies. They make it look extremely low effort to ensure we can start collecting meaningful data in the form of standardized metrics from your applications, devices, and services. The reality is that we need to look much closer at scoping the efforts required to instrument our applications. Below you see an example of what is necessary to (either auto or manually) instrument a Java application. The process is the same for either method. While some of the data can be automatically gathered, that's just generic Java information for your applications and services. Manual instrumentation is the cost you can't forget, where you need to make code changes and redeploy. While it's nice to discuss manual instrumentation in the abstract sense, nothing beats getting hands-on with a real coding example. To that end, we can dive into what it takes to both auto and manually instrument a simple Java application in this workshop lab. Below you see a small example of the code you will apply to your example application in one of the workshop exercises to create a gauge metric: Java // Start thread and apply values to metrics. Thread bgThread = new Thread(() -> { while (true) { try { counter.labelValues("ok").inc(); counter.labelValues("ok").inc(); counter.labelValues("error").inc(); gauge.labelValues("value").set(rand(-5, 10)); TimeUnit.SECONDS.sleep(1); } catch (InterruptedException e) { e.printStackTrace(); } } }); bgThread.start(); Be sure to explore the free online workshop and get hands-on experience with what instrumentation for your Java applications entails. Instrumenting Traces In the case of tracing, a widely adopted and accepted standard is the OpenTelemetry (OTel) project, which is used to instrument and collect telemetry data through a push mechanism to an agent installed on the host. Below you see an outline of the architecture used by OTel to collect telemetry data: Whether we choose automatic or manual instrumentation, we have the same issues as previously discussed above. Our applications and services all require some form of cost to instrument our applications and we can't forget that when scoping our observability solutions. The telemetry data is pushed to an agent, known as the OTel Collector, which is installed on the application's host platform. It uses a widely accepted open standard to communicate known as the OpenTelemetry Protocol (OTLP). Note that OTel does not have a backend component, instead choosing to leverage other technologies for the backend and the collector sends all processed telemetry data onwards to that configured backend. Again, it's nice to discuss manual instrumentation in the abstract sense, but nothing beats getting hands-on with a real coding example. To that end, we can dive into what it takes to programmatically instrument a simple application using OTel in this workshop lab. Below, you see a small example of the code that you will apply to your example application in one of the workshop exercises to collect OTel telemetry data, and later in the workshop, view in the Jaeger UI: Python ... from opentelemetry.trace import get_tracer_provider, set_tracer_provider set_tracer_provider(TracerProvider()) get_tracer_provider().add_span_processor( BatchSpanProcessor(ConsoleSpanExporter()) ) instrumentor = FlaskInstrumentor() app = Flask(__name__) instrumentor.instrument_app(app) ... Be sure to explore the free online workshop and get hands-on yourself to experience how much effort it is to instrument your applications using OTel. The road to cloud-native success has many pitfalls. Understanding how to avoid the pillars and focusing instead on solutions for the phases of observability will save much wasted time and energy. Coming Up Next Another pitfall organizations struggle with in cloud native observability is the protocol jungle. In the next article in this series, I'll share why this is a pitfall and how we can avoid it wreaking havoc on our cloud-native observability efforts.

By Eric D. Schabell

CORE

Fundamentals of Functions and Relations for Software Quality Engineering

Understanding the fundamentals of functions and relations is paramount. Grasping these core concepts lays the groundwork for effective software development and testing. We will delve into the basics of functions and relations, exploring their significance in software engineering and their implications for ensuring software quality. We will highlight basic scenarios for testing to kickstart more intricate testing activities. Effective testing is not just about covering every line of code. It's about understanding the underlying relationships. How do we effectively test the complex relationships in our software code? Understanding functions and relations proves an invaluable asset in this endeavor. This article explores these mathematical concepts, weaving their definitions with practical testing applications. By leveraging this knowledge, you can design more targeted and efficient test strategies, ultimately strengthening your software's quality. Functions A function associates elements of sets and it is a special kind of relation. Our code contains functions that associate outputs with inputs. In the mathematical formulation of a function, the inputs are the domain and the outputs are the range of the function. Formally, a function f from set A to set B can be defined as a subset of the Cartesian product A × B. This technical definition essentially ensures that each element in A maps to a unique element in B. In simpler terms, a well-behaved function never links a single input to multiple different outputs. This characteristic is crucial for testing, as non-deterministic functions (producing unpredictable outputs for the same input) pose unique challenges. It's worth noting that while code can be viewed as functions in a broad sense, not all are "pure" functions. Pure functions have no side effects, meaning they solely rely on their inputs to produce outputs without altering any external state. In practice, our code involves side effects (such as modifying databases or interacting with external devices), complicating the pure function interpretation. Basic Function Types We will cover specific types of deterministic functions. Onto functions demand specific testing strategies due to their "every input has an output" requirement. By recognizing their characteristics and potential edge cases, software testers can create more effective test suites that uncover hidden issues and ensure reliable software behavior. Into functions require a nuanced testing approach due to their selective mapping nature. By comprehending their characteristics and potential edge cases, you can craft targeted test suites that ensure robust and reliable software behavior. Mastering the intricacies of one-to-one functions empowers you to craft effective test suites that safeguard your software against hidden mapping errors and ensure data integrity. With a keen eye for unique outputs and potential collisions, you can confidently navigate the testing terrain and contribute to building reliable and secure software. By harnessing the power of equivalence classes, thorough edge case testing, and an understanding of potential performance limitations, you can conquer the challenges of many-to-one functions. This empowers you to craft test suites that ensure your software accurately processes diverse inputs and delivers consistent, reliable outputs. Onto Functions An onto function maps every element in its domain (set of inputs) to exactly one element in its range. Imagine a function converting weekdays to corresponding weekend days (Mon -> Sat, Tue -> Sun). Every weekday has a unique weekend counterpart, satisfying the onto property. Testing implications: Coverage: Testing onto functions requires ensuring all possible inputs are covered. Missing even one input could lead to untested scenarios and potential bugs. Edge cases: Pay close attention to boundary values at the edges of the domain. For instance, testing the weekend conversion function with Sunday might reveal unexpected behavior if it doesn't map to Monday (assuming a cyclical mapping). Inverse function existence: If an inverse function exists (mapping weekend days back to weekdays), testing it can indirectly validate the onto function's correctness. Performance and scalability: For large domains, testing every possible input might not be feasible. Utilizing equivalence classes or randomized testing can balance coverage with efficiency. Examples: User authentication: Mapping usernames to unique user profiles is typically an onto function. Testing should involve diverse usernames to ensure all valid ones have corresponding profiles. Error code mapping: Different error codes might map to specific error messages. Onto functions ensure every error code has a corresponding message, requiring comprehensive testing of all potential error codes. Data encryption/decryption: Onto functions can be used to ensure every encrypted message has a unique decryption key. Testing involves encrypting various messages and verifying they decrypt correctly. Into Functions Imagine a function that converts Celsius temperatures to Fahrenheit. While every Celsius value has a corresponding Fahrenheit equivalent, not every Fahrenheit value has a corresponding Celsius counterpart (e.g., -40°F has no exact Celsius equivalent). This function exemplifies an into function, mapping some, but not all, elements in its domain to the range. Testing Implications: Focus on covered elements: Testing into functions primarily focuses on ensuring all valid inputs that do have outputs are covered. Unlike onto functions, missing some inputs might be permissible based on the function's design. Edge cases and invalid inputs: Pay close attention to invalid inputs that fall outside the domain. The function's behavior for these inputs should be well-defined, whether returning a specific error value or throwing an exception. Partial testing strategies: Since not all inputs have outputs, exhaustive testing might be unnecessary. Consider equivalence partitioning to group similar inputs with likely similar behavior, optimizing test coverage without redundancy. Inverse function considerations: Unlike onto functions, inverses for into functions are generally not possible. However, if a related function exists that maps elements back to the domain, testing its correctness can indirectly validate the into function's behavior. Examples: File extension to content type mapping: Not all file extensions have corresponding content types (e.g., a ".custom" extension might not be recognized). Testing involves verifying known extensions but also validating the function's handling of unknown ones. User permission checks: Certain actions might require specific user permissions. An into function could check if a user has the necessary permission. Tests would focus on valid permissions but also include cases where permissions are absent. Data validation functions: These functions might check if data adheres to specific formats or ranges. While valid data should be processed, testing should also include invalid data to ensure proper error handling or rejection. One-to-One Functions Imagine a function that assigns unique identification numbers to students. Each student receives a distinct ID, ensuring no duplicates exist. This function perfectly embodies the one-to-one principle: one input (student) leads to one and only one output (ID). Testing Implications: Unique outputs: The heart of testing lies in verifying that distinct inputs always produce different outputs. Focus on creating test cases that cover diverse input scenarios to expose any potential mapping errors. Inverse function potential: If an inverse function exists (mapping IDs back to students), testing its correctness indirectly validates the one-to-one property of the original function. Edge cases and collisions: Pay close attention to potential "collision" scenarios where different inputs might accidentally map to the same output. Thorough testing of boundary values and special cases is crucial. Equivalence classes: While exhaustive testing might seem necessary, consider grouping similar inputs (e.g., student age ranges) into equivalence classes. Testing one representative from each class can optimize coverage without redundancy. Examples: User login with unique usernames: Each username should map to a single user account, ensuring one-to-one functionality. Test with diverse usernames to uncover potential duplicate mappings. Generating unique random numbers: Random number generators often aim for one-to-one mappings to avoid predictability. Testing involves generating large sets of numbers and verifying their uniqueness. Hashing algorithms: These functions map data to unique "hash" values. Testing focuses on ensuring different data produces distinct hashes and that collisions (same hash for different data) are highly unlikely. Many-to-One Functions Imagine a function that categorizes books by genre. Here, multiple books of different titles and authors could belong to the same genre (e.g., Sci-Fi). This exemplifies a many-to-one function, where several inputs map to a single output. Testing Implications: Focus on valid mappings: While multiple inputs might share an output, your primary focus is ensuring valid inputs indeed map to the correct output. Test diverse input scenarios to catch errors in the mapping logic. Equivalence classes: Grouping similar inputs based on shared characteristics (e.g., book themes) allows you to test one representative from each class, optimizing coverage without redundantly testing every possible combination. Edge cases and invalid inputs: Pay close attention to how the function handles invalid inputs or those falling outside its defined domain. Does it return a specific error value, ignore them, or exhibit unexpected behavior? Inverse function considerations: In many cases, inverse functions don't exist for many-to-one functions. However, if a related function maps outputs back to specific input subsets, testing its correctness can indirectly validate the original function's behavior. Examples: Product discount functions: They usually map different product quantities (e.g., 1 item, 3 items, 5 items) to a single discount percentage (e.g., 10% off for bulk purchases). Ensure discounts apply correctly for various quantities within and outside designated ranges. Test edge cases like single-item purchases and quantities exceeding discount thresholds. Shipping cost calculators: They often map different combinations of origin, destination, and package weight to a single shipping cost. Cover diverse locations, weight ranges, and shipping options. Verify calculated costs against established pricing tables and consider edge cases like remote locations or unusual package sizes. Search algorithms: Search queries could return various relevant results. Test with diverse queries and ensure the returned results indeed match the query intent, even if they share the same "relevant" category. Relations: Beyond Simple Mappings While functions provide clear input-output connections, not all relationships in software are so straightforward. Imagine tracking dependencies between tasks in a project management tool. Here, multiple tasks might relate to each other, forming a more complex network. This is where relations come in: Reflexive, symmetric, transitive: Relations can exhibit specific attributes like reflexivity (a task relates to itself), symmetry (if task A depends on B, then B depends on A), and transitivity (if A depends on B and B depends on C, then A depends on C). These properties have testing implications. For instance, in a file system deletion operation, transitivity ensures that deleting a folder also deletes its contents. Equivalence relations and partitions: Relations can sometimes group elements into equivalence classes, where elements within a class behave similarly. Testers can leverage this by testing one element in each class, assuming similar behavior for others, saving time and resources. Transitive Dependency Relation Consider a project management tool where tasks have dependencies. This forms a more complex network of relationships, represented by relations. These go beyond simple input-output mappings: Python def can_start(task, dependencies): """Checks if a task can start given its dependencies (completed or not).""" for dep in dependencies: if not dep.is_completed(): return False return True # Transitive relation: A depends on B, B depends on C, implies A depends on C task_A = Task("Write requirements") task_B = Task("Design prototype") task_C = Task("Develop code") task_A.add_dependency(task_B) task_B.add_dependency(task_C) assert can_start(task_A) == False # Task A can't start while C is incomplete task_C.mark_completed() assert can_start(task_A) == True # Now A can start as C is complete This can_start function utilizes a transitive relation. If task A depends on B, and B depends on C, then A ultimately depends on C. Testing involves checking various dependency combinations to ensure tasks can only start when their transitive dependencies are fulfilled. Testing Basic transitive dependency: Ensure the function accurately reflects the transitive nature of dependencies. Test scenarios where task A depends on B, B depends on C, and so on, ensuring A can only start when C is completed. Circular dependencies: Verify the function's behavior when circular dependencies exist (e.g., A depends on B, B depends on A). Handle them appropriately, either preventing circular dependencies altogether or flagging them for manual evaluation. Multiple dependencies: Test cases where a task has multiple dependencies. Ensure the function only allows the task to start when all its dependencies are complete, regardless of their number or complexity. Reflexive Relation Consider a user login system where a user needs to be logged in to perform certain actions. Python def is_authorized(user, action): """Checks if a user is authorized to perform an action.""" return user.is_logged_in() and user.has_permission(action) Every user is considered authorized to perform the action of "logging in" (regardless of other permissions). This establishes a reflexive relation, where every user is related to the action of "logging in" (user -> "log in"). Testing Verify that is_authorized(user, "log in") is always True for any user object, regardless of their login status or permissions. Test edge cases like newly created users, users with specific permission sets, and even invalid user objects. Symmetric Relation Consider a social media platform where users can "follow" each other. Python def are_friends(user1, user2): """Checks if two users are friends (follow each other).""" return user1.follows(user2) and user2.follows(user1) In a friendship, the relationship flows both ways. If user A follows user B, then user B must also follow user A. This establishes a symmetric relation, where user A and user B are related in the same way ("follows") if the friendship exists. Testing Verify that are_friends(user1, user2) is True only if are_friends(user2, user1) is also True. Test various scenarios like mutual follows, one-way follows, and users who don't know each other. Consider edge cases like blocked users, deactivated accounts, and privacy settings affecting visibility. Equivalence Classes and Partitioning Relations can group elements with similar behavior into equivalence classes. Testers can leverage this by testing one element in each class, assuming similar behavior for others. Python def get_file_type(filename): """Classifies a file based on its extension (text, image, etc.).""" extension = filename.split(".")[-1].lower() if extension in {".txt", ".md"}: return "text" elif extension in {".jpg", ".png"}: return "image" else: return "unknown" # Equivalence classes: Test one file from each class text_files = ["readme.txt", "report.md"] image_files = ["photo.jpg", "banner.png"] for file in text_files: assert get_file_type(file) == "text" for file in image_files: assert get_file_type(file) == "image" In the get_file_type example, testing one file from each equivalence class (text and image) efficiently covers different file extensions without redundant testing. This principle applies to various scenarios, like testing error handling for different input types or user roles with similar permissions. Visualizing Relationships for Clarity Visualizing functions and relations can significantly enhance understanding and test design. Two popular ways to visualize are the following. Function mapping diagrams: Draw arrows connecting inputs to outputs, highlighting one-to-one, many-to-one scenarios. Relation network diagrams: Represent elements as nodes and connections as edges, indicating reflexivity, symmetry, and transitivity. Wrapping Up By understanding functions and relations both conceptually and practically, we gain valuable tools for effective software development and testing. Functions and relations provide a foundational framework for organizing and reasoning about the intricate relationships between different parts of our code, ultimately leading to more robust and reliable software. Remember, effective testing is not just about covering every line of code but about understanding the underlying relationships that make your software tick.

By Stelios Manioudakis

CORE

Mastering Prometheus: Unlocking Actionable Insights and Enhanced Monitoring in Kubernetes Environments

In the dynamic world of cloud-native technologies, monitoring and observability have become indispensable. Kubernetes, the de-facto orchestration platform, offers scalability and agility. However, managing its health and performance efficiently necessitates a robust monitoring solution. Prometheus, a powerful open-source monitoring system, emerges as a perfect fit for this role, especially when integrated with Kubernetes. This guide outlines a strategic approach to deploying Prometheus in a Kubernetes cluster, leveraging helm for installation, setting up an ingress nginx controller with metrics scraping enabled, and configuring Prometheus alerts to monitor and act upon specific incidents, such as detecting ingress URLs that return 500 errors. Prometheus Prometheus excels at providing actionable insights into the health and performance of applications and infrastructure. By collecting and analyzing metrics in real-time, it enables teams to proactively identify and resolve issues before they impact users. For instance, Prometheus can be configured to monitor system resources like CPU, memory usage, and response times, alerting teams to anomalies or thresholds breaches through its powerful alerting rules engine, Alertmanager. Utilizing PromQL, Prometheus's query language, teams can dive deep into their metrics, uncovering patterns and trends that guide optimization efforts. For example, tracking the rate of HTTP errors or response times can highlight inefficiencies or stability issues within an application, prompting immediate action. Additionally, by integrating Prometheus with visualization tools like Grafana, teams can create dashboards that offer at-a-glance insights into system health, facilitating quick decision-making. Through these capabilities, Prometheus not only monitors systems but also empowers teams with the data-driven insights needed to enhance performance and reliability. Prerequisites Docker and KIND: A Kubernetes cluster set-up utility (Kubernetes IN Docker.) Helm, a package manager for Kubernetes, installed. Basic understanding of Kubernetes and Prometheus concepts. 1. Setting Up Your Kubernetes Cluster With Kind Kind allows you to run Kubernetes clusters in Docker containers. It's an excellent tool for development and testing. Ensure you have Docker and Kind installed on your machine. To create a new cluster: kind create cluster --name prometheus-demo Verify your cluster is up and running: kubectl cluster-info --context kind-prometheus-demo 2. Installing Prometheus Using Helm Helm simplifies the deployment and management of applications on Kubernetes. We'll use it to install Prometheus: Add the Prometheus community Helm chart repository: helm repo add prometheus-community https://prometheus-community.github.io/helm-charts helm repo update Install Prometheus: helm install prometheus prometheus-community/kube-prometheus-stack --namespace monitoring --create-namespace helm upgrade prometheus prometheus-community/kube-prometheus-stack \ --namespace monitoring \ --set prometheus.prometheusSpec.podMonitorSelectorNilUsesHelmValues=false \ --set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false This command deploys Prometheus along with Alertmanager, Grafana, and several Kubernetes exporters to gather metrics. Also, customize your installation to scan for service monitors in all the namespaces. 3. Setting Up Ingress Nginx Controller and Enabling Metrics Scraping Ingress controllers play a crucial role in managing access to services in a Kubernetes environment. We'll install the Nginx Ingress Controller using Helm and enable Prometheus metrics scraping: Add the ingress-nginx repository: helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx helm repo update Install the ingress-nginx chart: helm upgrade --install ingress-nginx ingress-nginx/ingress-nginx \ --namespace ingress-nginx --create-namespace \ --set controller.metrics.enabled=true \ --set controller.metrics.serviceMonitor.enabled=true \ --set controller.metrics.serviceMonitor.additionalLabels.release="prometheus" This command installs the Nginx Ingress Controller and enables Prometheus to scrape metrics from it, essential for monitoring the performance and health of your ingress resources. 4. Monitoring and Alerting for Ingress URLs Returning 500 Errors Prometheus's real power shines in its ability to not only monitor your stack but also provide actionable insights through alerting. Let's configure an alert to detect when ingress URLs return 500 errors. Define an alert rule in Prometheus: Create a new file called custom-alerts.yaml and define an alert rule to monitor for 500 errors: apiVersion: monitoring.coreos.com/v1 kind: PrometheusRule metadata: name: ingress-500-errors namespace: monitoring labels: prometheus: kube-prometheus spec: groups: - name: http-errors rules: - alert: HighHTTPErrorRate expr: | sum (rate(nginx_ingress_controller_requests{status=~"5.."}[1m])) > 0.1 OR absent(sum (rate(nginx_ingress_controller_requests{status=~"5.."}[1m]))) for: 1m labels: severity: critical annotations: summary: High HTTP Error Rate description: "This alert fires when the rate of HTTP 500 responses from the Ingress exceeds 0.1 per second over the last 5 minutes." Apply the alert rule to Prometheus: You'll need to configure Prometheus to load this alert rule. If you're using the Helm chart, you can customize the values.yaml file or create a ConfigMap to include your custom alert rules. Verify the alert is working: Trigger a condition that causes a 500 error and observe Prometheus firing the alert. For example, launch the following application: kubectl create deploy hello --image brainupgrade/hello:1.0 kubectl expose deploy hello --port 80 --target-port 8080 kubectl create ingress hello --rule="hello.internal.brainupgrade.in/=hello:80" --class nginx Access the application using the below command: curl -H "Host: hello.internal.brainupgrade.in" 172.18.0.3:31080 Wherein: 172.18.0.3 is the IP of the KIND cluster node. 31080 is the node port of the ingress controller service. This could be different in your case. Bring down the hello service pods using the following command: kubectl scale --replicas 0 deploy hello You can view active alerts in the Prometheus UI (localhost:9999) by running the following command. kubectl port-forward -n monitoring svc/prometheus-operated 9999:9090 And you will see the alert being fired. See the following snapshot: Error alert on Prometheus UI. You can also configure Alertmanager to send notifications through various channels (email, Slack, etc.). Conclusion Integrating Prometheus with Kubernetes via Helm provides a powerful, flexible monitoring solution that's vital for maintaining the health and performance of your cloud-native applications. By setting up ingress monitoring and configuring alerts for specific error conditions, you can ensure your infrastructure not only remains operational but also proactively managed. Remember, the key to effective monitoring is not just collecting metrics but deriving actionable insights that lead to improved reliability and performance.

By Rajesh Gheware

The Best Way To Diagnose a Patient Is To Cut Him Open

"The most effective debugging tool is still careful thought, coupled with judiciously placed print statements." — Brian Kernighan. Cutting a patient open and using print for debugging used to be the best way to diagnose problems. If you still advocate either one of those as the superior approach to troubleshooting, then you're either facing a very niche problem or need to update your knowledge. This is a frequent occurrence, e.g., this recent tweet: This specific tweet got to the HN front page, and people chimed in with that usual repetitive nonsense. No, it’s not the best way for the vast majority of developers. It should be discouraged just as surgery should be avoided when possible. Fixating on print debugging is a form of a mental block; debugging isn’t just stepping over code. It requires a completely new way of thinking about issue resolution. A way that is far superior to merely printing a few lines. Before I continue, my bias is obvious. I wrote a book about debugging, and I blog about it a lot. This is a pet peeve of mine. I want to start with the exception to the rule, though: when do we need to print something... Logging Is NOT Print Debugging! One of the most important debugging tools in our arsenal is a logger, but it is not the same as print debugging in any way: Logger Print Permanence of output Permanent Ephemeral Permanence in code Permanent Should be removed Globally Toggleable Yes No Intention Added as part of the design Added ad-hoc A log is something we add with forethought; we want to keep the log for future bugs and might even want to expose it to the users. We can control its verbosity often at the module level and can usually disable it entirely. It’s permanent in code and usually writes to a permanent file we can review at our leisure. Print debugging is code we add to locate a temporary problem. If such a problem has the potential of recurring, then a log would typically make more sense in the long run. This is true for almost every type of system. We see developers adding print statements and removing them constantly instead of creating a simple log to track frequent problems. There are special cases where print debugging make some sense: in mission-critical embedded systems, a log might be impractical in terms of device constraints. Debuggers are awful in those environments, and print debugging is a simple hack. Debugging system-level tools like a kernel, compiler, debugger, or JIT can be difficult with a debugger. Logging might not make sense in all of these cases, e.g., I don’t want my JIT to print every bytecode it’s processing and the metadata involved. Those are the exceptions, not the rules. Very few of us write such tools. I do, and even then, it’s a fraction of my work. For example, when working at Lightrun, I was working on a production debugger. Debugging the agent code that’s connected to the executable was one of the hardest things to do. A mix of C++ and JVM code that’s connected to a completely separate binary... Print debugging of that portion was simpler, and even then, we tried to aim towards logging. However, the visual aspects of the debugger within the server backend and the IDE were perfect targets for the debugger. Why Debug? There are three reasons to use a debugger instead of printouts or even logs: Features: Modern debuggers can provide spectacular capabilities that are unfamiliar to many developers. Sadly, there are very few debugging courses in academia since it’s a subject that’s hard to test. Low overhead: In the past, running with the debugger meant slow execution and a lot of overhead. This is no longer true. Many of us use the debug action when launching an application instead of running, and there’s no noticeable overhead for most applications. When there is overhead, some debuggers provide means to improve performance by disabling some features. Library code: A debugger can step into a library or framework and track the bug there. Doing this with print debugging will require compiling code that you might not want to deal with. I dug into the features I mentioned in my book and series on debugging (linked above), but let’s pick a few fantastic capabilities of the debugger that I wrote about in the past. For the sake of positive dialog, here are some of my top features of modern debuggers. Tracepoints Whenever someone opens the print debugging discussion, all I hear is, “I don’t know about tracepoints.” They aren’t a new feature in debuggers, yet so few are aware of them. A tracepoint is a breakpoint that doesn’t stop; it just keeps running. Instead of stopping, you can do other things at that point, such as print to the console. This is similar to print debugging; only it doesn’t suffer from many of the drawbacks: no runtime overhead, no accidental commit to the code base, no need to restart the application when changing it, etc. Grouping and Naming The previous video/post included a discussion of grouping and naming. This lets us group tracepoints together, disable them as a group, etc. This might seem like a minor feature until you start thinking about the process of print debugging. We slowly go through the code, adding a print and restarting. Then suddenly, we need to go back, or if a call comes in and we need to debug something else... When we package the tracepoints and breakpoints into a group, we can set aside a debugging session like a branch in version control. It makes it much easier to preserve our train of thought and jump right back to the applicable lines of code. Object Marking When asked about my favorite debugging feature I’m always conflicted, Object Marking is one of my top two features... It seems like a simple thing; we can mark an object, and it gets saved with a specific name. However, this is a powerful and important feature. I used to write down the pointers to objects or memory areas while debugging. This is valuable as sometimes an area of memory would look the same but would have a different address, or it might be hard to track objects with everything going on. Object Marking allows us to save a global reference to an object and use it in conditional breakpoints or for visual comparison. Renderers My other favorite feature is the renderer. It lets us define how elements look in the debugger watch area. Imagine you have a sophisticated object hierarchy but rarely need that information... A renderer lets you customize the way IntelliJ/IDEA presents the object to you. Tracking New Instances One of the often overlooked capabilities of the debugger is memory tracking. A Java debugger can show you a searchable set of all object instances in the heap, which is a fantastic capability that can expose unintuitive behavior But it can go further, it can track new allocations of an object and provide you with the stack to the applicable object allocation. Tip of the Iceberg I wrote a lot about debugging, so there’s no point in repeating all of it in this post. If you’re a person who feels more comfortable using print debugging, then ask yourself this: why? Don’t hide behind an out-of-date Brian Kernighan quote. Things change. Are you working in one of the edge cases where print debugging is the only option? Are you treating logging as print debugging or vice versa? Or is it just that print debugging was how your team always worked, and it stuck in place? If it’s one of those, then it might be time to re-evaluate the current state of debuggers.

By Shai Almog

CORE

Mastering Test Code Quality Assurance

Over the years, many articles have highlighted the importance of unit and integration tests and their benefits. They enable quick and accurate identification of errors, simplify the debugging process, support safe refactoring, and prove invaluable during code reviews. These tests can also significantly reduce development costs, help catch mistakes early, and ensure the final product aligns well with its specifications. As such, testing is often viewed as a central part of the development process. However, within the developer community, it's become clear in recent years that merely having unit and integration tests isn't enough. A growing number of blog posts and articles emphasize the need for well-structured and formatted tests. So, why is this aspect so crucial? Best Practices In short, poorly formatted tests or those exhibiting anti-patterns can significantly hamper a project's progress. It's not just my perspective. Many articles stress the significance of well-structured testsand provide best practices and insights on this topic. One element that frequently emerges as pivotal in these discussions is the naming of tests. Two articles in particular, Anatomy of a Good Java Test and Importance of Unit Testing underscore the crucial role of effective test naming. They advise against using the word "test" in test names, suggesting that appropriate naming can clearly describe the test's objective or what it intends to verify. Additionally, the article Clean Unit Testing highlights not only the naming of test methods but also the importance formaintainability of correct naming and ordering test variables. Branching out from naming assertions is another cornerstone in testing best practices. Take, for instance, the article 7 Tips for Writing Better Unit Tests in Java that highlights the advantage of using assertions over print statements. Other industry experts often emphasize limiting the number of assertions and correctly positioning them within a single test. The AAA pattern (Arrange, Act, Assert) is the perfect example of this intention: positioning assertions at the end of the test method ensures clarity and readability for other developers. Moreover, the transparency of the assertions themselves is also important. For instance, they should comewith descriptive messages. In fact, there are more suggestions to keep in mind: Appropriate usage of mocks and stubs. Avoiding "if" statements in test blocks. Focusing on a single case in each unit Making tests as isolated and automated as possible. Maintaining high test and code coverage. Testing negative scenarios and borderline cases, in addition to positive ones. Avoiding non-deterministic results and flaky tests Avoiding unit-test anti-patterns Yet, the realm of best practices is ever-evolving, and this list isn't exhaustive. New best practices continue to emerge. For example, the recent idea about the layout of tests highlights the importance of structuring both unit and integration tests within the source code. It's not just about refactoring tests anymore but also about organizing them systematically within the source code. In summation, as you can see, the community provides a variety of best practices for creating quality tests. The real question, however, is: Are these principles just theoretical, or are there practical solutions that can help us achieve such quality? Gap Identification Yes, I'm referring to static analyzers. Let's briefly examine the most widely used ones, even though there are many similar tools available. I will focus only on rules and checks that help to address at least some of the best practices discovered previously. Checkstyle Checkstyle is a development tool that helps programmers write Java code that adheres to a coding standard. In other words, Checkstyle is a static code analysis tool (linter) used in the Java world. Although Checkstyle doesn't provide features specifically tailored for tests, many of its features areapplicable to test code, just as they are to production code. It can assist with Javadoc comments, indentation, line length, cyclomatic complexity, etc. However, to the best of my knowledge, the only feature related to tests is the ability to enforce the test names convention by developing a specific checker. So, yes, before using it, you need to develop your own checker first.Thus, while Checkstyle is a general tool that focuses solely on Java code, it doesn't specifically address issues with tests. It doesn't consider specific rules related to assertion checks, identification of anti-patterns, or maintaining the layout of tests - all of which are essential to keep tests consistent and clear in line with industry requirements and best practices. PMD PMD is one more source code analyzer similar to Checkstyle. It finds common programming flaws like unused variables, empty catch blocks, unnecessary object creation, and so forth. While it supports many different languages, we are only interested in Java. PMD, compared with Checkstyle, has many more rules that check test quality, for example (but not limited to): JUnitAssertionsShouldIncludeMessage requires JUnit assertions to include a message. JUnitTestContainsTooManyAsserts checks if the JUnit or TestNG test contains too many assertion statements. JUnitTestsShouldIncludeAssert checks that JUnit tests include at least one assertion. TestClassWithoutTestCases checks that test classes have at least one testing method. UnnecessaryBooleanAssertion checks that JUnit assertions are used correctly without assertTrue(true) statements (line-hitter anti-pattern detection.) Here is a short example of test violations that PMD can find: Java public class Foo extends TestCase { public void testSomething() { // [JUnitAssertionsShouldIncludeMessage] Use the form: // assertEquals("Foo does not equals bar", "foo", "bar"); // instead assertEquals("foo", "bar"); } //[TestClassWithoutTestCases] Consider adding test methods if it is a test: public class Bar extends TestCase {} public class MyTestCase extends TestCase { // Ok public void testMyCaseWithOneAssert() { boolean myVar = false; assertFalse("should be false", myVar); } //[JUnitTestsShouldIncludeAssert] //Bad, don't have any asserts public void testSomething() { Bar b = findBar(); b.work(); } //[JUnitTestContainsTooManyAsserts]: //Bad, too many asserts (assuming max=1) public void testMyCaseWithMoreAsserts() { boolean myVar = false; assertFalse("myVar should be false", myVar); assertEquals("should equals false", false, myVar); //[UnnecessaryBooleanAssertion] Bad, serves no real purpose - remove it: assertTrue(true); } However, all these checks are designed primarily for JUnit assertions and, in some cases, for AssertJ. They don't support Hamcrest assertions, which are widely adopted in the industry. Also, while PMD can check method names, these checks are relatively simple. They focus on aspects such as method name length, avoiding special characters like underscores, and adhering to camel case naming conventions. Consequently, these checks are primarily intended for production code only and don't examine specific test name patterns. Moreover, to the best of my knowledge, PMD doesn't identify structural mistakes or verify the correct placement of methods. Thus, PMD provides a rather limited set of checks for tests. Sonar Qube SonarQube is also a widely used tool for checking code quality. SonarQube has a lot of rules similar to PMD that can be applied to tests, for example: TestCases should contain tests. Literal boolean values and nulls should not be used in assertions. Assertions should not compare an object to itself. Test assertions should include messages. Test methods should not contain too many assertions. Similar tests should be grouped in a single Parameterized test. At the time of writing this text, there are around 45 rules specifically designed for tests. As you might have noticed, SonarQube has more rules than PMD, although many of them overlap. However, to the best of my knowledge, SonarQube doesn't check Hamcrest assertions and doesn't maintain the layout of tests. It also doesn't show much concern about checking test anti-patterns. Others Actually, there are other tools available for detecting issues related to test quality. Some notable ones include: SpotBugs checks for correct usage of setUp/tearDown methods, empty test cases, andimproper use of assertions. ErrorProne examines test signatures and forbids the use of "test" in test names, identifies redundant methods without @Test and @Ignore and offers some other test-related checks. MegaLinter and Qulice primarily combine previously mentioned linters like PMD and Checkstyle. Essentially, they just bundle checks from other linters. Coverity is a proprietary tool that has numerous checks, including those for assertions and various resource leaks. However, some users argue that its features are similar to those PMD and SpotBugs. Jtest is another proprietary tool that has a comprehensive set of features. This includes checks for assertion statements, initialization methods, and more. The complete list of checks can be found here. There are numerous other tools, including Checkmarx Glossary, Klocwork, CodeSonar, among many others, that we simply can't cover in this article. In summary, tools like Checkstyle, PMD, SonarQube, and others offer numerous rules to ensure test code quality. However, noticeable gaps exist in their ability to tackle certain test-related issues. Checkstyle is primarily designed for Java production code, and its features for tests are limited. This often requires users to develop their own checkers for specific scenarios. PMD has a robust set of rules for JUnit assertions, yet it doesn't support popular frameworks like Hamcrest or method naming patterns. SonarQube provides an extensive rule set, which overlaps with PMD in many areas. However, it lacks some vital test checks, including those for Hamcrest assertions and test anti-patterns. Other tools have their own limitations, or they are proprietary. Significantly, none of the aforementioned tools focus on the proper placement and naming of test classes. Thus, even though these tools provide a foundation for test code quality, there's a notable gap in terms of aligning with industry test standards and best practices. Introducing jtcop To address the aforementioned gaps, we developed a new static analyzer called jtcop that focuseson test quality in Java projects. It is a simple Maven plugin that checks tests for common mistakes and anti-patterns. We use it in our projects, and it has helped us maintain consistent and clear tests. It also speeds up PR reviews significantly by preventing recurring comments about issues like improper test placement or naming. Although, we don't think our rules are the only good way to set up tests, so feel free to share your ideas and suggestions by submitting tickets and PRs. In the following, I'll explain how jtcop fits into the landscape of static analysis tools, which checks it utilizes, and how it can assist you in youreveryday programming. Test Names I'm sure you know there are many ways to name your test. For example, you can find various test naming conventions or even some threads that have lengthy discussions on how to do it correctly. Here is just a short summary of how you can name your tests: Pattern Example methodName_stateUnderTest_expected add_negativeNumbers_throwsException() when_condition_then_expected when_ageLessThan18_then_isUnderageIsTrue() given_precondition_when_action_then_result given_userIsAdmin_when_deleteIsCalled_then_deleteSuccess() test[methodName] testAdd() or testIsUnderage() should_expectedBehavior_when_condition should_throwException_when_negativeNumbersAreAdded() methodName_expected add_returnsSum() or isUnderage_returnsTrue() canAction canDeleteUser() or canCalculateSum( methodName_doesExpectedBehavior add_doesReturnSum() or isUnderage_returnsTrue() verbCondition (or verbResult) calculatesSum() or deletesSuccessfully() jtcopprefers the last pattern: Test names should use the present tense without a subject. For example, if you're testing a class Animal with a method eat(), the test name should be eats(). If you need to add more context, do it after the verb – for instance, eatsApplesOnly(). Test names should use camelCase. Name shouldn't use the word "test", as it is redundant. The @Test annotation is sufficient. Special characters like _ and $ are forbidden. Correct Names Incorrect Names eats() testEats() eatsApplesOnly() TestEatsApplesOnly() runsQuickly() _runsQuickly() jumpsOverFence() jumps_over_fence() drinksWater() drinks$Water() sleepsAtNight() sleepsZZZ() chewsGum() test_chewsGum() listensToMusic() listens_To_Music() walksInPark() WalksInPark() barksLoudly() barks__loudly() This style has been chosen by many developers and is widely used in numerous projects. If you prefer a different pattern for test naming, just let us know, and we'll be happy to add it to the plugin. Corresponding Production Class Now, let's imagine we have a test class named SumTest.java with the test method checksSum(). But what if the test occasionally fails? Most would attempt to locate the issue and find the original class where the problem occurred. But which class is it? The first guess would likely be Sum.java, right? Yet, you might not find it, perhaps because the production class is named something like Addition.java or Calculator.java. This mismatch in naming conventions can lead to significant confusion and longertroubleshooting times. In other words, if you have a test class named SumTest.java and the corresponding production class is Addition.java, it can be very confusing. The more appropriate name for the test class would be AdditionTest.java. Essentially, the name of the test class isn't merely a label; it serves as a pointer to the production class, helping developers pinpoint potential issues.This is where jtcop comes into play. It helps ensure that your tests are consistent with your production classes and suggests appropriate naming conventions for them, effectively addressing the problem described. If you're further interested in this issue, you can read about it here.The only exception in this case is integration tests. They are usually named like AdditionIT.java or AdditionIntegrationTest.java. However, they should be placed in a separate package, such as it, and have an appropriate suffix like IT or ITCase. Test Methods Only The next check is rather strict and is still considered an experimental feature. However, the rule itself is simple: test classes should contain methods that are only annotated with the @Test annotation. You might wonder what to do with initialization methods or common code shared among different test cases. The answer isn't straightforward and this rule is designed to guide you with it. There aren't actually many options available. I'm referring to methods such as static initialization methods, setup methods @BeforeEach and @AfterEach annotations, JUnit extensions, and Fake Objects. The approach you choose for initializing your tests will determine their quality. Static Methods The first idea that comes to mind is using static methods. Developers often use static methods to configure a common setup for several tests in the class. Here's a simple example: Java @Test void calculatesSum(){ Summator s = init(); Assertions.assertEquals( 2, sum(1, 1), "Something went wrong, because 1 + 1 != 2" ); } private static Summator init(){ Summator s = new Summator(); // Setup return s; } At first glance, it might seem like a good solution, but it does have inherent problems. When such a method is used within a single class, it's usually not a major concern, even though static methods typically lead to low cohesion and tight coupling. However, issues arise when you begin to use it across multipleclasses or try to consolidate such methods into a centralized TestUtils.java class. In this case, the approach with static methods can become problematic: It can lead to confusion for developers since TestUtils.java doesn't correspond to any class in the production code. TestUtils.java might be considered an anti-pattern. Thus, jtcop deems static methods in tests and utility classes as dangerous and prohibits them. If you attempt to run jtcop against the previous code sample, you'll receive the following warning message: Shell All methods should be annotated with @Test annotation. SetUp and TearDown Methods The next widely-used approach involves the so-called "setUp" methods. By "setUp" methods, I'm referring to those annotated with @BeforeAll, @BeforeEach, @AfterAll, or @AfterEach. An example of using these annotations is as follows: Java Summator s; @BeforeEach void setUp(){ s = new Summator(); // Setup } @Test void calculatesSum(){ Summator s=init(); Assertions.assertEquals( 2, sum(1,1), "Something went wrong, because 1 + 1 != 2" ); } This approach makes the situation even worse for many reasons. The most obvious reason, familiar to most developers, is the need to "jump" between test methods and the initialization part. Then, over time, as the codebase grows and changes and as the number of test cases in the test class increases, developers may become unaware of the setup/teardown that happens for each test and may end up with setup code that is unnecessary for certain tests, thus violating the principle of keeping tests minimal and setting up only what is needed. Next, using such methods can introduce another problem. They can lead to ashared state between tests if not managed properly. This harms test isolation, an extremely important quality of any test, which in turn can result in flaky tests. Moreover, using @BeforeAll and @AfterAll use static methods, which inherit all the disadvantages of the previous approach. Hence, jtcop doesn't allow the use of such setUp/tearDown methods. Test Extensions Now, let's examine the approach supported by jtcop. JUnit 5 offers Test Extensions that allow for the creation of custom extensions. These extensions. can be used to configure setup and teardown logic for all the tests in a class. Java @ExtendWith(SummatorExtension.class) public class SumTest { @Test void calculatesSum(Summator s) { Assertions.assertEquals( 2, s.sum(1, 1), "Something went wrong, because 1 + 1 != 2" ); } class SummatorExtension implements ParameterResolver { @Override public boolean supportsParameter(ParameterContext pctx, ExtensionContext ectx) { return pctx.getParameter().getType() == Summator.class; } @Override public Object resolveParameter( Summator s =new Summator(); // Setup return s; } Extensions offer a way to craft more modular and reusable test setups. In this scenario, we've bypassed the need for utility classes, static methods, and shared states between tests. These extensions are easily reused across a multitude of test classes and standalone unit tests. What's more, theseextensions often have insight into the current test class, method, annotationsused, and other contextual details, paving the way for versatile and reusablesetup logic. Fake Objects Another method for test configuration and setup that jtcop supports is the use of Fake objects, as recommended here. These are positioned with other production objects, yet they provide a unique"fake" behavior. By leveraging these objects, all setup can be handled directly in a test, making the code cleaner and easier to read. Java abstract class Discount { // Usually we have rather complicated // logic here for calculating a discount. abstract double multiplier(); static class Fake extends Discount { @Override double multiplier() { return 1; } } public class PriceTest { @Test void retrievesSamePrice() { Price p = new Price(100, new Discount.Fake()); Assertions.assertEquals( 100, p.total(), "Something went wrong; the price shouldn't have changed" ); } Fake objects often sit alongside production code, which is why jtcop doesn't classify them as test classes. While mixing production and test code might seem questionable, Fake objects aren't exclusively for testing; you might sometimes integrate them into your production code, too. Many projects have embraced the use of Fake objects, finding it a practical way to set up tests. Additionally, this strategy eliminates the need for using Mock frameworks with intricate initialization logic. Test Assertions jtcop also underscores the need to validate assertions in tests. Several tools out there offer similar checks. Yet, many of them focus solely on JUnit assertions or only catch high-level errors. jtcop supports both Hamcrest and JUnit assertions and adheres to stricter guidelines for assertions. To paint aclearer picture, let's dive into a few code snippets. Java @Test void calculatesSum(){ if(sum(1, 1) != 2){ throw new RuntimeException("1 + 1 != 2"); } } This code snippet lacks any assertions, meaning jtcop will warn about it. Check out the next snippet as a proper replacement, and note the use of the Hamcrest assertion. Java @Test void calculatesSum(){ assertThat( "Something went wrong, because 1 + 1 != 2", sum(1, 1), equalTo(2) ); } Pay attention to the explanatory messages in the assertion Something went wrong, because 1 + 1 != 2 from the code above. They're essential. Without such messages, it can sometimes be challenging to understand what went wrong during test execution, which can puzzle developers. For instance, consider this real example. I've simplified it for clarity: Java @Test void checksSuccessfully(){ assertThat( new Cop(new Project.Fake()).inspection(), empty() ); } Now, suppose this test fails. In that scenario, you'll receive the following exception message: Shell Expected: an empty collection but: <[Complaint$Text@548e6d58]> Not very informative, right? However, if you include an explanatory message in the assertion: Java void checksSuccessfully(){ assertThat( "Cop should not find any complaints in this case, but it has found something.", new Cop(new Project.Fake()).inspection(), empty() ); } With this inclusion, you're greeted with a far more insightful message: Shell java.lang.AssertionError: Cop should not find any complaints in this case, but it has found something. Expected: an empty collection but: <[Complaint$Text@548e6d58]> In a perfect world, we'd offer more details — specifically, some context. This sheds light on initialization values and provides developers with valuable hints. Line Hitters The last feature I'd like to spotlight is the Line Hitter anti-pattern detection. At first glance, the tests cover everything and code coverage tools confirm it with 100%, but in reality the tests merely hit the code, without doing any output analysis. What this means is that you might stumble upon a test method in a program that does not really verify anything. Take this for instance: Java @Test void calculatesSum(){ sum(1, 1); } This typically happens when a developer is more into their code coverage numbers than genuinely ensuring the robustness of the test. There are tools that can spot when assertions are missing in tests. But, as you know, developers might always find a way around: Java @Test void calculatesSum(){ sum(1,1); assertThat( "I'm just hanging around", true, is(true) ); } Yep, that’s our "Line Hitter" again, only this time, it's wearing the disguise of an assertion statement. Luckily, jtcop can detect such tests and flag them as unreliable. Setting up jtcop To get started with jtcop, simply add the plugin to your build configuration file. If you're using Maven, here's how you can do it: XML <build> <plugins> <plugin> <groupId>com.github.volodya-lombrozo</groupId> <artifactId>jtcop-maven-plugin</artifactId> <version>1.1.1</version> <executions> <execution> <goals> <goal>check</goal> </goals> </execution> </executions> </plugin> </plugins> </build> By default, the plugin operates in the verify phase, so there is no need to specify it. However, if you wish to modify it, simply add the desired phase to the execution section. Then, to run jtcop, use the mvn jtcop:checkcommand. If you stumble upon an issue, say, a test lacking a corresponding productionclass, you'll get a clear error message: Shell [ERROR] Test SumTest doesn't have corresponding production class. [ERROR] Either rename or move the test class ./SumTest.java. [ERROR] You can also ignore the rule by adding @SuppressWarnings("JTCOP.RuleAllTestsHaveProductionClass") annotation. [ERROR] Rule: RuleAllTestsHaveProductionClass. [ERROR] You can read more about the rule here: <link> Similarly, for the "Line Hitter" pattern previously mentioned: Shell [ERROR] Method 'calculatesSum' contains line hitter anti-pattern. [ERROR] Write valuable assertion for this test. [ERROR] You can also ignore the rule by adding @SuppressWarnings("JTCOP.RuleLineHitter") annotation. [ERROR] Rule: RuleLineHitter. [ERROR] You can read more about the rule here: <link> By default, jtcop will halt the build if it detects issues with your tests. If you only want to use it to highlight problems without interrupting the build, you can configure jtcop to display only warning messages by adjusting the failOnError property. XML <configuration> <failOnError>false</failOnError> </configuration> However, I highly recommend keeping the default setting to maintain high-quality tests. Experimental Features As I mentioned earlier, some features are still experimental. To try them out, just add the following configuration to your pom.xml file: XML <configuration> <experimental>true</experimental> </configuration> Once done, all experimental features will be active in your project, ensuring cleaner and more organized tests. Benefits jtcop has already helped us in several ways: Code Review: The primary issue addressed by jtcop is the frequent appearance of comments such as "place this test class here," "rename this test method," or "that's a testing anti-pattern." `jtcop` saves time and aids developers in resolving these issues before even making a PR into arepository. Onboarding: Another advantage we've observed is that well-structured and appropriately named test methods not only facilitate code understanding and maintenance but also reduce the time spent explaining or documenting code style guides. As a result, we often receive well-formatted pull requests from new team members with little to no additional guidance. Consistency: jtcop ensures our tests remain consistent across numerous projects. So, when you delve into a project that uses jtcop, it becomes significantly easier to comprehend its workings and start contributing to it. Overall, integrating `jtcop` has significantly streamlined our processes, enhancing collaboration and understanding across our development projects. Future Plans Looking ahead, we're preparing to enhance jtcop with additional rules. One of our primary focuses is to address several anti-patterns like the ones highlighted in this StackOverflow thread. Just to name a few: The Mockery: Tests that have too many mocks. Excessive Setup: Tests that demand extensive setup. Wait and See: Tests that need to pause for a specific duration before verifying if the tested code works as intended. It's worth noting that these are just a few examples; there's a broader spectrum of anti-patterns we're considering. Additionally, we've also encountered issues with projects that have many tests written in various styles. At times, it's incredibly tedious to address these issues manually. Thus, another viable avenue is developing an application that will automatically solve most of these problems. So, if you have ideas or suggestions, please don't hesitate to open an issue or submit a pull request in our repository and share your thoughts with us. We're always eager to get feedback or contributions from the community. Feel free to fork it if you want and craft your own test checkers that fit your needs, or simply use jtcop as is.

By Volodya Lombrozo

DORA Metrics in DevOps

DevOps Research and Assessment (DORA) is a research group in Google Cloud. They conduct a long-running research program trying to assess and understand the velocity and reliability of the software development process. They try to capture what makes teams move fast, how to measure these KPIs automatically, and finally, how to improve based on the captured data. DORA wrote a famous article in 2020 titled Are you an Elite DevOps performer? Find out with the Four Keys Project. They defined two broad areas — velocity and stability — to measure four important metrics: deployment frequency, lead time for changes, change failure rate, and time to restore service (also known as mean time to recover). These four metrics are now known as DORA Metrics or DORA Four Keys. Even though they later added a fifth metric — reliability, we still start on the four original metrics to assess the performance. While the metrics focus on software deployments in general, people typically relate them with deploying application code. However, DORA metrics apply to everything that we deploy, including changes around our databases. Just like it’s important to monitor metrics for deploying microservices or applications in general, we should pay attention to everything that affects our databases, including schema migrations, query changes, configuration modifications, or scheduled background tasks. We need to track these metrics since every change in our databases may affect our customers and impact business operations. Slow databases lead to slow applications, which in turn lead to frustrated clients and lost revenue. Therefore, DevOps performance metrics should include metrics from our databases to accurately reflect the healthiness of our whole business. Let’s read on to understand how to achieve that. Exploring the Four Key DORA Metrics DORA identified four important metrics to measure DevOps performance. These metrics are: Deployment frequency Lead time for changes Change failure rate Time to restore service Let’s walk through them one by one. Deployment Frequency Deployment frequency measures the successful deployments over a given time period. We want to maximize this metric as this means that we have more successful deployments. This indicates that we can get our changes more often to production. In the database world, this means that we can apply modifications to the database more often. We need to understand that there are different types of changes. Some changes can be applied in a short time, while some other changes will require pushing to production outside of office hours or even taking the database offline. It’s important to understand what we measure and if we need to have different dimensions in this metric. To improve the metric, we need to make sure that our deployments are fast and do not fail. We need to add automated testing along the way to check all the changes before we try deploying them in production. This includes: Code reviews Static code analysis Unit tests Integration tests Load tests Configuration checks Schema migrations analysis Other areas of our changes We may also consider breaking changes into smaller batches to deploy them independently. However, we shouldn’t do that just for the sake of increasing the metric. Lead Time for Changes Lead time for changes measures the time it takes a code change to get into production. Let’s clarify some misconceptions before explaining how to work on this metric. First, lead time for changes may sound similar to deployment frequency. However, lead time for changes covers the end-to-end time it takes to get some changes deployed. For instance, we may be deploying changes daily and have high deployment frequency, but a particular change may take a month to get deployed to production. Second, lead time for changes measures how fast we can push a change through the CI/CD pipeline to production once we have the change ready. It’s not the same as lead time, which measures the time between opening an issue and closing it. This metric measures the efficiency of our automated process, mostly our CI/CD pipeline. We can think of it as the time between merging the changes to the main branch and deploying things to production. We want to minimize this metric as this indicates that we can push changes faster. To improve this metric, we should automate the deployment process as much as possible. We should minimize the amount of manual steps needed to verify the change and deploy it to production. Keep in mind that lead time for changes includes the time needed for code reviews, which are known to slow down the process significantly. This is especially important in the area of databases because there are no tools but Metis that can automatically review your database changes. Change Failure Rate The Change failure rate metric measures how often a change causes a failure in production. Even though we reviewed all the changes and tested them automatically, sometimes things break after the deployment. This metric shows at a glance how often that happens, and we want to keep the metric as low as possible. To improve the metric, we need to understand why things break after the deployment. Sometimes, it’s caused by inefficient testing methods. In that case, we need to improve CI/CD pipelines, add more tests, and cover scenarios that fail often. Sometimes, it’s caused by differences between production and non-production environments like traffic increase, different data distribution, different configuration, parallelism, background tasks, permissions, or even different versions of the database running in production. In that case, we need to focus on replicating the production database in testing environments to find the issues during the CI/CD phase. It’s important to understand that there is no point in moving fast (i.e., having high deployment frequency and low lead time for changes) if we break things in production. Stability is crucial, and we need to find the right balance between moving fast and still keeping high quality of our solutions. Time To Restore Service The Time to restore service indicates how long it takes to recover from a failure in production. We want to minimize this metric. This metric can be increased by many aspects: long time for the teams to react, long investigation time, long time for applying the fix or rolling back the solution. Since each issue is different, this metric may be prone to high variation. To improve the metric, we should keep well-written playbooks on how to investigate and fix issues. Teams shouldn’t spend time figuring out what to do. They should have their standard operating procedure written down and accessible whenever the issue pops up. Also, the investigation should be automated as much as possible to save time. We also need good database monitoring to have metrics in place that will fire alarms that will roll deployments back automatically. In DevOps, studies have indicated that high-performing teams can have a recovery time (the time it takes to recover from a failure) of less than an hour, significantly quicker than lower-performing counterparts that may need 24 hours or more. Other Metrics DORA added a fifth metric — reliability — which is now tracked apart from availability in their reports. However, most of the tools and solutions focus on the four key metrics presented above. It’s important to understand that metrics are not the true goal. We want our software to be reliable and always available, and we want the changes to go swiftly and smoothly. Optimizing metrics just for the sake of optimization is not the point. Implementing DORA Metrics in Your DevOps Practices Let’s understand how to implement DORA Metrics in your space. The easiest way to start is to integrate with the Four Keys project. The project provides a solution for measuring software delivery performance metrics and for visualizing them. It’s also worth checking out DORA Presentation Video to see it in action. In general, we need to have the following elements: Signals source Metrics aggregation and calculation Visualization Feedback loop Let’s see this in some greater detail. Signals Source We need to identify sources in our ecosystem and capture their signals. Typical examples of sources are: Source control repository: For instance, pull request created, code review created, comments added, or pull request accepted. CI/CD pipeline: For instance, tests executed, tests failing, deployments, rollbacks, or alarms. Deployment tools like Octopus Deploy: For instance, deployments, rollbacks, or alarms. Incidents: For instance, reported issues, triggered alarms, and faulted queries. However, there are also specific signals around databases that we should capture: Configuration changes: For instance, changing parameters, changing Schema migrations: For instance, when a migration is triggered. Background tasks: For instance, vacuuming, partitioning, and defragmenting. Data migrations: For instance, moving data from hot storage to cold storage. Queries: For instance, slow queries, deadlocks, and unused indexes. We need to capture these signals, transform them into a common form, and then deliver them to a centralized store. The Four Keys project can do that automatically from Cloud Build and GitHub events, and it can be extended with more signal sources if needed. We want to capture signals automatically as much as possible. Ideally, we don’t need to implement anything on our end, but we just want to reuse existing emitters of our infrastructure and frameworks. If we build the code with cloud providers like AWS CodeBuild or Google Cloud Build, then we should capture the metrics using the event mechanisms these infrastructures provide. Same with GitHub, GitLab, or any other build server that we use. Metrics Aggregation and Calculation Once we have the signals accessible from one place, we need to aggregate them and calculate the key figures representing our process performance. Here, we calculate all the four metrics we defined in the previous section. To calculate the metrics, we typically run a daily background job that aggregates the signals, calculates metrics, and exports the results in a form that can be later queried or browsed. This can be a database with all the metrics, JSON files, or some pre-generated dashboards. The Four Keys project includes this part and emits data to BigQuery tables. We don’t need to come up with custom logic to calculate the metrics. We can use the Four Keys project as our starting point and then adjust as needed. If we emit our signals in the same format as the Four Keys, then we don’t need to modify the code at all to calculate the metrics. Visualization Once we calculate the metrics, we can start visualizing them. It’s up to us how we do that, and we can tune this part to our needs. We would like to get dashboards that can be reviewed quickly and can easily show if there are any issues or if we need to focus on improving some metrics. The Four Keys project prepares dashboards like this one: We can see all four metrics presented and some historical data showing how things change over time. We could extend such a dashboard with links to tickets, anomaly detection, or analysis of how to improve metrics. We should look for a balance between how many details we show on the dashboard and how readable it is. Keep in mind that aggregating data may lead to hiding some issues. For instance, if we take the time to restore service from all incidents and average it, then we will include the long tail that may skew the results. On the other hand, if we ignore the long tail, then we may not see some issues that actually stop us from moving fast. Dashboards need to show enough data to easily tell if all is good, and at the same time, they should enable us to dig deeper and analyze details. Feedback Loop Last but not least, we need to have a feedback loop. We don’t track metrics just for the sake of doing so. We need to understand how things change over time and how we can improve metrics later on. To do that, we should build a process to take metrics, analyze them, suggest improvements, implement them, and verify how they affected the pipeline. Ultimately, our goal is to make our business move fast and be reliable. Metrics can only point us to what can be improved, but they won’t fix the issues on our behalf. We need to incorporate them into our day-to-day lives and tune processes to fix the issues along the way. Measuring and Improving With DORA Metrics DORA metrics can show us how to improve processes and technology and how to change the culture within our organization. Since the metrics focus on four key areas (deployment frequency, lead time for changes, time to restore service, and change failure rate), we need to focus on each aspect independently and improve it. Below, we consider some of the strategies on how to use DORA metrics to improve our business. Automation and Tooling We want to automate our deployments and processes. We can do that with: Continuous Integration/Continuous Deployment (CI/CD): Automate testing, building, and deployment processes to streamline the delivery pipeline. Infrastructure as Code (IaC): Automate infrastructure provisioning and configuration, ensuring consistency and repeatability. Code quality tools: Use tools for static code analysis, linters, semantic diffs, and theorem provers. Database tools: Analyze your databases, focusing on things that often go unnoticed, like slow queries, deadlocking transactions, or unused indexes. GitOps: Describe and manage your system declaratively using version control NoOps and AIOps: Automate operations to the extent that they are nearly invisible. Use machine learning and artificial intelligence to remove manual tasks. Culture and Collaboration DORA metrics can’t be fixed without cultural changes. We need to promote DevOps focusing on shorter communication and faster feedback loops. We can improve that by having: Have cross-functional teams: Encourage collaboration between development, operations, and other relevant teams to foster shared responsibility and knowledge. Implement feedback loops: Implement mechanisms for rapid feedback and learning from failures or successes. Measure and analyze: Continuously measure and analyze metrics to identify bottlenecks or areas for improvement. Favor iterative improvements: Use data to iteratively improve processes and workflows. Invest in training: Provide training and resources to empower teams with the necessary skills and knowledge. Encourage experimentation: Create an environment where experimentation and trying new approaches are encouraged. Have supportive leadership: Ensure leadership buy-in and support for DevOps practices and initiatives. Remember, improvements in DORA metrics often require a cultural shift, where continuous improvement and collaboration are valued. Start with small, manageable changes and gradually scale up improvements as the organization adapts to the new practices. Reducing Lead Time for Changes To improve the lead time for changes, we can try the following: Smaller batch sizes: Break down work into smaller, manageable chunks to reduce lead time for changes. Parallel development: Encourage parallel development of features by different teams or individuals. Parallel testing: Run tests early and in parallel. Do not wait with load tests until the very end but start them as early as possible to not block the pipeline. Improving Time To Restore Service To improve the time to restore service, we can make the following improvements: Monitoring and observability: Implement robust monitoring to detect issues early and facilitate faster troubleshooting. Blameless post-mortems: Encourage a blame-free culture to learn from incidents and improve processes without fear of retribution. Anomaly detection: Check your metrics automatically to detect anomalies and have low-priority alerts for those. Manual reviews: Encourage your stakeholders to periodically review metrics showing business performance to not let any business issues go unnoticed. Feature flags and rollbacks: Deploy changes behind feature flags to be able to roll back them much faster. Reducing Change Failure Rate To reduce the change failure rate, we need to make sure we identify as many issues as possible before going to production. Testing strategies: Enhance testing practices (unit, integration, regression) to catch issues before deployment. Feature flags and rollbacks: Implement feature toggles to enable easy rollback of features if issues arise. Maintaining documentation: Capture the issues that happened in the past and extend your pipelines to automatically make sure these issues won’t happen again. Conclusion: The Future of DevOps With DORA Metrics The future of DevOps with DORA metrics will likely involve a continued evolution towards greater automation, enhanced collaboration, stronger security integration, and a deeper understanding of how to measure and optimize software delivery and operational performance. Flexibility, adaptability, and a culture of continuous improvement will remain key aspects of successful DevOps implementations. We’ll include more and more domains like ML, security, and databases. We’ll also go towards NoOps and replace all manual work with automated machine-learning solutions. FAQ What Are the Four Key Metrics of DevOps? These are: Deployment frequency Lead time for changes Time to restore service (also known as mean time to restore or MTTR) Change failure rate How Does the DORA Framework Improve DevOps Performance? DORA Metrics improves DevOps performance by providing a structured approach to measure and assess key metrics associated with software delivery and operational excellence. What Is the Role of Continuous Deployment in DevOps? Continuous deployment is a practice that focuses on automating the deployment of code changes to production or a live environment after they pass through the entire pipeline of tests and checks. It improves the business by automating the release process, enabling frequent and reliable software deployments. How Do You Calculate Lead Time for Changes in DevOps? The lead time for changes in DevOps represents the duration it takes for a code change to move from the initial commit (when the change is introduced) to its deployment in a production environment. What Strategies Reduce the Change Failure Rate in DevOps Environments? Reducing change failure rate in DevOps environments involves implementing strategies that prioritize reliability, risk mitigation, and thorough testing throughout the software delivery lifecycle. Automate your tests, remove manual steps, test early, and test often. How Is the Mean Time to Recover Crucial for DevOps Success? In essence, a lower MTTR is indicative of a more responsive, efficient, and resilient DevOps environment. It's not just about reacting quickly to incidents but also about learning from them to prevent similar issues in the future, ultimately contributing to the success of DevOps practices and the overall stability of systems and services. What Tools Are Used for Measuring DORA Metrics? The most important is the Four Keys project. However, we can build our own pipelines with any tools that allow us to capture signals from CI/CD and deployment, aggregate these signals, calculate metrics, and then visualize the results with dashboards.

By Adam Furmanek

Improving Upon My OpenTelemetry Tracing Demo

Last year, I wrote a post on OpenTelemetry Tracing to understand more about the subject. I also created a demo around it, which featured the following components: The Apache APISIX API Gateway A Kotlin/Spring Boot service A Python/Flask service And a Rust/Axum service I've recently improved the demo to deepen my understanding and want to share my learning. Using a Regular Database In the initial demo, I didn't bother with a regular database. Instead: The Kotlin service used the embedded Java H2 database The Python service used the embedded SQLite The Rust service used hard-coded data in a hash map I replaced all of them with a regular PostgreSQL database, with a dedicated schema for each. The OpenTelemetry agent added a new span when connecting to the database on the JVM and in Python. For the JVM, it's automatic when one uses the Java agent. One needs to install the relevant package in Python — see next section. OpenTelemetry Integrations in Python Libraries Python requires you to explicitly add the package that instruments a specific library for OpenTelemetry. For example, the demo uses Flask; hence, we should add the Flask integration package. However, it can become a pretty tedious process. Yet, once you've installed opentelemetry-distro, you can "sniff" installed packages and install the relevant integration. Shell pip install opentelemetry-distro opentelemetry-bootstrap -a install For the demo, it installs the following: Plain Text opentelemetry_instrumentation-0.41b0.dist-info opentelemetry_instrumentation_aws_lambda-0.41b0.dist-info opentelemetry_instrumentation_dbapi-0.41b0.dist-info opentelemetry_instrumentation_flask-0.41b0.dist-info opentelemetry_instrumentation_grpc-0.41b0.dist-info opentelemetry_instrumentation_jinja2-0.41b0.dist-info opentelemetry_instrumentation_logging-0.41b0.dist-info opentelemetry_instrumentation_requests-0.41b0.dist-info opentelemetry_instrumentation_sqlalchemy-0.41b0.dist-info opentelemetry_instrumentation_sqlite3-0.41b0.dist-info opentelemetry_instrumentation_urllib-0.41b0.dist-info opentelemetry_instrumentation_urllib3-0.41b0.dist-info opentelemetry_instrumentation_wsgi-0.41b0.dist-info The above setup adds a new automated trace for connections. Gunicorn on Flask Every time I started the Flask service, it showed a warning in red that it shouldn't be used in production. While it's unrelated to OpenTelemetry, and though nobody complained, I was not too fond of it. For this reason, I added a "real" HTTP server. I chose Gunicorn, for no other reason than because my knowledge of the Python ecosystem is still shallow. The server is a runtime concern. We only need to change the Dockerfile slightly: Dockerfile RUN pip install gunicorn ENTRYPOINT ["opentelemetry-instrument", "gunicorn", "-b", "0.0.0.0", "-w", "4", "app:app"] The -b option refers to binding; you can attach to a specific IP. Since I'm running Docker, I don't know the IP, so I bind to any. The -w option specifies the number of workers Finally, the app:app argument sets the module and the application, separated by a colon Gunicorn usage doesn't impact OpenTelemetry integrations. Heredocs for the Win You may benefit from this if you write a lot of Dockerfile. Every Docker layer has a storage cost. Hence, inside a Dockerfile, one tends to avoid unnecessary layers. For example, the two following snippets yield the same results. Dockerfile RUN pip install pip-tools RUN pip-compile RUN pip install -r requirements.txt RUN pip install gunicorn RUN opentelemetry-bootstrap -a install RUN pip install pip-tools \ && pip-compile \ && pip install -r requirements.txt \ && pip install gunicorn \ && opentelemetry-bootstrap -a install The first snippet creates five layers, while the second is only one; however, the first is more readable than the second. With heredocs, we can access a more readable syntax that creates a single layer: Dockerfile RUN <<EOF pip install pip-tools pip-compile pip install -r requirements.txt pip install gunicorn opentelemetry-bootstrap -a install EOF Heredocs are a great way to have more readable and more optimized Dockerfiles. Try them! Explicit API Call on the JVM In the initial demo, I showed two approaches: The first uses auto-instrumentation, which requires no additional action The second uses manual instrumentation with Spring annotations I wanted to demo an explicit call with the API in the improved version. The use-case is analytics and uses a message queue: I get the trace data from the HTTP call and create a message with such data so the subscriber can use it as a parent. First, we need to add the OpenTelemetry API dependency to the project. We inherit the version from the Spring Boot Starter parent POM: XML <dependency> <groupId>io.opentelemetry</groupId> <artifactId>opentelemetry-api</artifactId> </dependency> At this point, we can access the API. OpenTelemetry offers a static method to get an instance: Kotlin val otel = GlobalOpenTelemetry.get() At runtime, the agent will work its magic to return the instance. Here's a simplified class diagram focused on tracing: In turn, the flow goes something like this: Kotlin val otel = GlobalOpenTelemetry.get() //1 val tracer = otel.tracerBuilder("ch.frankel.catalog").build() //2 val span = tracer.spanBuilder("AnalyticsFilter.filter") //3 .setParent(Context.current()) //4 .startSpan() //5 // Do something here span.end() //6 Get the underlying OpenTelemetry Get the tracer builder and "build" the tracer Get the span builder Add the span to the whole chain Start the span End the span; after this step, send the data to the OpenTelemetry endpoint configured Adding a Message Queue When I did the talk based on the post, attendees frequently asked whether OpenTelemetry would work with messages such as MQ or Kafka. While I thought it was the case in theory, I wanted to make sure of it: I added a message queue in the demo under the pretense of analytics. The Kotlin service will publish a message to an MQTT topic on each request. A NodeJS service will subscribe to the topic. Attaching OpenTelemetry Data to the Message So far, OpenTelemetry automatically reads the context to find out the trace ID and the parent span ID. Whatever the approach, auto-instrumentation or manual, annotations-based or explicit, the library takes care of it. I didn't find any existing similar automation for messaging; we need to code our way in. The gist of OpenTelemetry is the traceparent HTTP header. We need to read it and send it along with the message. First, let's add MQTT API to the project. XML <dependency> <groupId>org.eclipse.paho</groupId> <artifactId>org.eclipse.paho.mqttv5.client</artifactId> <version>1.2.5</version> </dependency> Interestingly enough, the API doesn't allow access to the traceparent directly. However, we can reconstruct it via the SpanContext class. I'm using MQTT v5 for my message broker. Note that the v5 allows for metadata attached to the message; when using v3, the message itself needs to wrap them. JavaScript val spanContext = span.spanContext //1 val message = MqttMessage().apply { properties = MqttProperties().apply { val traceparent = "00-${spanContext.traceId}-${spanContext.spanId}-${spanContext.traceFlags}" //2 userProperties = listOf(UserProperty("traceparent", traceparent)) //3 } qos = options.qos isRetained = options.retained val hostAddress = req.remoteAddress().map { it.address.hostAddress }.getOrNull() payload = Json.encodeToString(Payload(req.path(), hostAddress)).toByteArray() //4 } val client = MqttClient(mqtt.serverUri, mqtt.clientId) //5 client.publish(mqtt.options, message) //6 Get the span context Construct the traceparent from the span context, according to the W3C Trace Context specification Set the message metadata Set the message body Create the client Publish the message Getting OpenTelemetry Data From the Message The subscriber is a new component based on NodeJS. First, we configure the app to use the OpenTelemetry trace exporter: JavaScript const sdk = new NodeSDK({ resource: new Resource({[SemanticResourceAttributes.SERVICE_NAME]: 'analytics'}), traceExporter: new OTLPTraceExporter({ url: `${collectorUri}/v1/traces` }) }) sdk.start() The next step is to read the metadata, recreate the context from the traceparent, and create a span. JavaScript client.on('message', (aTopic, payload, packet) => { if (aTopic === topic) { console.log('Received new message') const data = JSON.parse(payload.toString()) const userProperties = {} if (packet.properties['userProperties']) { //1 const props = packet.properties['userProperties'] for (const key of Object.keys(props)) { userProperties[key] = props[key] } } const activeContext = propagation.extract(context.active(), userProperties) //2 const tracer = trace.getTracer('analytics') const span = tracer.startSpan( //3 'Read message', {attributes: {path: data['path'], clientIp: data['clientIp']}, activeContext, ) span.end() //4 } }) Read the metadata Recreate the context from the traceparent Create the span End the span For the record, I tried to migrate to TypeScript, but when I did, I didn't receive the message. Help or hints are very welcome! Apache APISIX for Messaging Though it's not common knowledge, Apache APISIX can proxy HTTP calls as well as UDP and TCP messages. It only offers a few plugins at the moment, but it will add more in the future. An OpenTelemetry one will surely be part of it. In the meantime, let's prepare for it. The first step is to configure Apache APISIX to allow both HTTP and TCP: YAML apisix: proxy_mode: http&stream #1 stream_proxy: tcp: - addr: 9100 #2 tls: false Configure APISIX for both modes Set the TCP port The next step is to configure TCP routing: YAML upstreams: - id: 4 nodes: "mosquitto:1883": 1 #1 stream_routes: #2 - id: 1 upstream_id: 4 plugins: mqtt-proxy: #3 protocol_name: MQTT protocol_level: 5 #4 Define the MQTT queue as the upstream Define the "streaming" route. APISIX defines everything that's not HTTP as streaming Use the MQTT proxy. Note APISIX offers a Kafka-based one Address the MQTT version. For version above 3, it should be 5 Finally, we can replace the MQTT URLs in the Docker Compose file with APISIX URLs. Conclusion I've described several items I added to improve my OpenTelemetry demo in this post. While most are indeed related to OpenTelemetry, some of them aren't. I may add another component in another different stack, a front-end. The complete source code for this post can be found on GitHub.

By Nicolas Fränkel

CORE

A Brief History of DevOps and the Link to Cloud Development Environments

The history of DevOps is definitely worth reading in a few good books about it. On that topic, “The Phoenix Project,” self-characterized as “a novel of IT and DevOps,” is often mentioned as a must-read. Yet for practitioners like myself, a more hands-on one is “The DevOps Handbook” (which shares Kim as an author in addition to Debois, Willis, and Humble) that recounts some of the watershed moments around the evolution of software engineering and provides good references around implementation. This book actually describes how to replicate the transformation explained in the Phoenix Project and provides case studies. In this brief article, I will use my notes on this great book to regurgitate a concise history of DevOps, add my personal experience and opinion, and establish a link to Cloud Development Environments (CDEs), i.e., the practice of providing access to and running, development environments online as a service for developers. In particular, I explain how the use of CDEs concludes the effort of bringing DevOps “fully online.” Explaining the benefits of this shift in development practices, plus a few personal notes, is my main contribution in this brief article. Before clarifying the link between DevOps and CDEs, let’s first dig into the chain of events and technical contributions that led to today’s main methodology for delivering software. The Agile Manifesto The creation of the Agile Manifesto in 2001 sets forth values and principles as a response to more cumbersome software development methodologies like Waterfall and the Rational Unified Process (RUP). One of the manifesto's core principles emphasizes the importance of delivering working software frequently, ranging from a few weeks to a couple of months, with a preference for shorter timescales. The Agile movement's influence expanded in 2008 during the Agile Conference in Toronto, where Andrew Shafer suggested applying Agile principles to IT infrastructure rather than just to the application code. This idea was further propelled by a 2009 presentation at the Velocity Conference, where a paper from Flickr demonstrated the impressive feat of "10 deployments a day" using Dev and Ops collaboration. Inspired by these developments, Patrick Debois organized the first DevOps Days in Belgium, effectively coining the term "DevOps." This marked a significant milestone in the evolution of software development and operational practices, blending Agile's swift adaptability with a more inclusive approach to the entire IT infrastructure. The Three Ways of DevOps and the Principles of Flow All the concepts that I discussed so far are today incarnated into the “Three Ways of DevOps,” i.e., the foundational principles that guide the practices and processes in DevOps. In brief, these principles focus on: Improving the flow of work (First Way), i.e., the elimination of bottlenecks, reduction of batch sizes, and acceleration of workflow from development to production, Amplifying feedback loops (Second Way), i.e., quickly and accurately collect information about any issues or inefficiencies in the system and Fostering a culture of continuous learning and experimentation (Third Way), i.e., encouraging a culture of continuous learning and experimentation. Following the leads from Lean Manufacturing and Agile, it is easy to understand what led to the definition of the above three principles. I delve more deeply into each of these principles in this conference presentation. For the current discussion, though, i.e., how DevOps history leads to Cloud Development Environments, we just need to look at the First Way, the principle of flow, to understand the causative link. Chapter 9 of the DevOps Handbook explains that the technologies of version control and containerization are central to implementing DevOps flows and establishing a reliable and consistent development process. At the center of enabling the flow is the practice of incorporating all production artifacts into version control to serve as a single source of truth. This enables the recreation of the entire production environment in a repeatable and documented fashion. It ensures that production-like code development environments can be automatically generated and entirely self-serviced without requiring manual intervention from Operations. The significance of this approach becomes evident at release time, which is often the first time where an application's behavior is observed in a production-like setting, complete with realistic load and production data sets. To reduce the likelihood of issues, developers are encouraged to operate production-like environments on their workstations, created on-demand and self-serviced through mechanisms such as virtual images or containers, utilizing tools like Vagrant or Docker. Putting these environments under version control allows for the entire pre-production and build processes to be recreated. Note that production-like environments really refer to environments that, in addition to having the same infrastructure and application configuration as the real production environments, also contain additional applications and layers necessary for development. Developers are encouraged to operate production-like environments (Docker icon) on their workstations using mechanisms such as virtual images or containers to reduce the likelihood of execution issues in production. From Developer Workstations to a CDE Platform The notion of self-service is already emphasized in the DevOps Handbook as a key enabler to the principle of flow. Using 2016 technology, this is realized by downloading environments to the developers’ workstations from a registry (such as DockerHub) that provides pre-configured, production-like environments as files (dubbed infrastructure-as-code). Docker is often a tool to implement this function. Starting from this operation, developers create an application in effect as follows: They access and copy files with development environment information to their machines, Add source code to it in the local storage, and Build the application locally using their workstation computing resources. This is illustrated in the left part of the figure below. Once the application works correctly, the source code is sent (“pushed) to a central code repository, and the application is built and deployed online, i.e., using Cloud-based resources and applications such as CI/CD pipelines. The three development steps listed above are, in effect, the only operations in addition to the authoring of source code using an IDE that is “local,” i.e., they use workstations’ physical storage and computing resources. All the rest of the DevOps operations are performed using web-based applications and used as-a-service by developers and operators (even when these applications are self-hosted by the organization.). The basic goal of Cloud Development Environments is to move these development steps online as well. To do that, CDE platforms, in essence, provide the following basic services, illustrated in the right part of the figure below: Manage development environments online as containers or virtual machines such that developers can access them fully built and configured, substituting step (1) above; then Provide a mechanism for authoring source code online, i.e., inside the development environment using an IDE or a terminal, substituting step (2); and finally Provide a way to execute build commands inside the development environment (via the IDE or terminal), substituting step (3). Figure: (left) The classic development data flow requires the use of the local workstation resources. (right) The cloud development data flow replaced local storage and computing while keeping a similar developer experience. On each side, operations are (1) accessing environment information, (2) adding code, and (3) building the application. Note that the replacement of step (2) can be done in several ways. For example, for example, the IDE can be browser-based (aka a Cloud IDE), or a locally installed IDE can implement a way to remotely author the code in the remote environment. It is also possible to use a console text editor via a terminal such as vim. I cannot conclude this discussion without mentioning that, often multiple containerized environments are used for testing on the workstation, in particular in combination with the main containerized development environment. Hence, cloud IDE platforms need to reproduce the capability to run containerized environments inside the Cloud Development Environment (itself a containerized environment). If this recursive process becomes a bit complicated to grasp, don’t worry; we have reached the end of the discussion and can move to the conclusion. What Comes Out of Using Cloud Development Environments in DevOps A good way to conclude this discussion is to summarize the benefits of moving development environments from the developers’ workstations online using CDEs. As a result, the use of CDEs for DevOps leads to the following advantages: Streamlined Workflow: CDEs enhance the workflow by removing data from the developer's workstation and decoupling the hardware from the development process. This ensures the development environment is consistent and not limited by local hardware constraints. Environment Definition: With CDEs, version control becomes more robust as it can uniformize not only the environment definition but all the tools attached to the workflow, leading to a standardized development process and consistency across teams across the organization. Centralized Environments: The self-service aspect is improved by centralizing the production, maintenance, and evolution of environments based on distributed development activities. This allows developers to quickly access and manage their environments without the need for Operations manual work. Asset Utilization: Migrating the consumption of computing resources from local hardware to centralized and shared cloud resources not only lightens the load on local machines but also leads to more efficient use of organizational resources and potential cost savings. Improved Collaboration: Ubiquitous access to development environments, secured by embedded security measures in the access mechanisms, allows organizations to cater to a diverse group of developers, including internal, external, and temporary workers, fostering collaboration across various teams and geographies. Scalability and Flexibility: CDEs offer scalable cloud resources that can be adjusted to project demands, facilitating the management of multiple containerized environments for testing and development, thus supporting the distributed nature of modern software development teams. Enhanced Security and Observability: Centralizing development environments in the Cloud not only improves security (more about secure CDEs) but also provides immediate observability due to their online nature, allowing for real-time monitoring and management of development activities. By integrating these aspects, CDEs become a solution for modern, in particular cloud-native software development, and align with the principles of DevOps to improve flow, but also feedback, and continuous learning. In an upcoming article, I will discuss the contributions of CDEs across all three ways of DevOps. In the meantime, you're welcome to share your feedback with me.

By Laurent Balmelli, PhD

DTrace Revisited: Advanced Debugging Techniques

When we think of debugging, we think of breakpoints in IDEs, stepping over, inspecting variables, etc. However, there are instances where stepping outside the conventional confines of an IDE becomes essential to track and resolve complex issues. This is where tools like DTrace come into play, offering a more nuanced and powerful approach to debugging than traditional methods. This blog post delves into the intricacies of DTrace, an innovative tool that has reshaped the landscape of debugging and system analysis. DTrace Overview DTrace was first introduced by Sun Microsystems in 2004, DTrace quickly garnered attention for its groundbreaking approach to dynamic system tracing. Originally developed for Solaris, it has since been ported to various platforms, including MacOS, Windows, and Linux. DTrace stands out as a dynamic tracing framework that enables deep inspection of live systems – from operating systems to running applications. Its capacity to provide real-time insights into system and application behavior without significant performance degradation marks it as a revolutionary tool in the domain of system diagnostics and debugging. Understanding DTrace’s Capabilities DTrace, short for Dynamic Tracing, is a comprehensive toolkit for real-time system monitoring and debugging, offering an array of capabilities that span across different levels of system operation. Its versatility lies in its ability to provide insights into both high-level system performance and detailed process-level activities. System Monitoring and Analysis At its core, DTrace excels in monitoring various system-level operations. It can trace system calls, file system activities, and network operations. This enables developers and system administrators to observe the interactions between the operating system and the applications running on it. For instance, DTrace can identify which files a process accesses, monitor network requests, and even trace system calls to provide a detailed view of what's happening within the system. Process and Performance Analysis Beyond system-level monitoring, DTrace is particularly adept at dissecting individual processes. It can provide detailed information about process execution, including CPU and memory usage, helping to pinpoint performance bottlenecks or memory leaks. This granular level of detail is invaluable for performance tuning and debugging complex software issues. Customizability and Flexibility One of the most powerful aspects of DTrace is its customizability. With a scripting language based on C syntax, DTrace allows the creation of customized scripts to probe specific aspects of system behavior. This flexibility means that it can be adapted to a wide range of debugging scenarios, making it a versatile tool in a developer’s arsenal. Real-World Applications In practical terms, DTrace can be used to diagnose elusive performance issues, track down resource leaks, or understand complex interactions between different system components. For example, it can be used to determine the cause of a slow file operation, analyze the reasons behind a process crash, or understand the system impact of a new software deployment. Performance and Compatibility of DTrace A standout feature of DTrace is its ability to operate with remarkable efficiency. Despite its deep system integration, DTrace is designed to have minimal impact on overall system performance. This efficiency makes it a feasible tool for use in live production environments, where maintaining system stability and performance is crucial. Its non-intrusive nature allows developers and system administrators to conduct thorough debugging and performance analysis without the worry of significantly slowing down or disrupting the normal operation of the system. Cross-Platform Compatibility Originally developed for Solaris, DTrace has evolved into a cross-platform tool, with adaptations available for MacOS, Windows, and various Linux distributions. Each platform presents its own set of features and limitations. For instance, while DTrace is a native component in Solaris and MacOS, its implementation in Linux often requires a specialized build due to kernel support and licensing considerations. Compatibility Challenges on MacOS On MacOS, DTrace's functionality intersects with System Integrity Protection (SIP), a security feature designed to prevent potentially harmful actions. To utilize DTrace effectively, users may need to disable SIP, which should be done with caution. This process involves booting into recovery mode and executing specific commands, a step that highlights the need for a careful approach when working with such powerful system-level tools. We can disable SIP using the command: csrutil disable We can optionally use a more refined approach of enabling SIP without dtrace using the following command: csrutil enable --without dtrace Be extra careful when issuing these commands and when working on machines where dtrace is enabled. Back up your data properly! Customizability and Flexibility of DTrace A key feature that sets DTrace apart in the realm of system monitoring tools is its highly customizable nature. DTrace employs a scripting language that bears similarity to C syntax, offering users the ability to craft detailed and specific diagnostic scripts. This scripting capability allows for the creation of custom probes that can be fine-tuned to target particular aspects of system behavior, providing precise and relevant data. Adaptability to Various Scenarios The flexibility of DTrace's scripting language means it can adapt to a multitude of debugging scenarios. Whether it's tracking down memory leaks, analyzing CPU usage, or monitoring I/O operations, DTrace can be configured to provide insights tailored to the specific needs of the task. This adaptability makes it an invaluable tool for both developers and system administrators who require a dynamic approach to problem-solving. Examples of Customizable Probes Users can define probes to monitor specific system events, track the behavior of certain processes, or gather data on system resource usage. This level of customization ensures that DTrace can be an effective tool in a variety of contexts, from routine maintenance to complex troubleshooting tasks. The following is a simple "Hello, world!" dtrace probe: sudo dtrace -qn 'syscall::write:entry, syscall::sendto:entry /pid == $target/ { printf("(%d) %s %s", pid, probefunc, copyinstr(arg1)); }' -p 9999 The kernel is instrumented with hooks that match various callbacks. dtrace connects to these hooks and can perform interesting tasks when these hooks are triggered. They have a naming convention, specifically provider:module:function:name. In this case, the provider is a system call in both cases. We have no module so we can leave that part blank between the colon (:) symbols. We grab a write operation and sendto entries. When an application writes or tries to send a packet, the following code event will trigger. These things happen frequently, which is why we restrict the process ID to the specific target with pid == $target. This means the code will only trigger for the PID passed to us in the command line. The rest of the code should be simple for anyone with basic C experience: it's a printf that would list the processes and the data passed. Real-World Applications of DTrace DTrace's diverse capabilities extend far beyond theoretical use, playing a pivotal role in resolving real-world system complexities. Its ability to provide deep insights into system operations makes it an indispensable tool in a variety of practical applications. To get a sense of how DTrace can be used, we can use the man -k dtrace command whose output on my Mac is below: bitesize.d(1m) - analyse disk I/O size by process. Uses DTrace cpuwalk.d(1m) - Measure which CPUs a process runs on. Uses DTrace creatbyproc.d(1m) - snoop creat()s by process name. Uses DTrace dappprof(1m) - profile user and lib function usage. Uses DTrace dapptrace(1m) - trace user and library function usage. Uses DTrace dispqlen.d(1m) - dispatcher queue length by CPU. Uses DTrace dtrace(1) - dynamic tracing compiler and tracing utility dtruss(1m) - process syscall details. Uses DTrace errinfo(1m) - print errno for syscall fails. Uses DTrace execsnoop(1m) - snoop new process execution. Uses DTrace fddist(1m) - file descriptor usage distributions. Uses DTrace filebyproc.d(1m) - snoop opens by process name. Uses DTrace hotspot.d(1m) - print disk event by location. Uses DTrace iofile.d(1m) - I/O wait time by file and process. Uses DTrace iofileb.d(1m) - I/O bytes by file and process. Uses DTrace iopattern(1m) - print disk I/O pattern. Uses DTrace iopending(1m) - plot number of pending disk events. Uses DTrace iosnoop(1m) - snoop I/O events as they occur. Uses DTrace iotop(1m) - display top disk I/O events by process. Uses DTrace kill.d(1m) - snoop process signals as they occur. Uses DTrace lastwords(1m) - print syscalls before exit. Uses DTrace loads.d(1m) - print load averages. Uses DTrace newproc.d(1m) - snoop new processes. Uses DTrace opensnoop(1m) - snoop file opens as they occur. Uses DTrace pathopens.d(1m) - full pathnames opened ok count. Uses DTrace perldtrace(1) - Perl's support for DTrace pidpersec.d(1m) - print new PIDs per sec. Uses DTrace plockstat(1) - front-end to DTrace to print statistics about POSIX mutexes and read/write locks priclass.d(1m) - priority distribution by scheduling class. Uses DTrace pridist.d(1m) - process priority distribution. Uses DTrace procsystime(1m) - analyse system call times. Uses DTrace rwbypid.d(1m) - read/write calls by PID. Uses DTrace rwbytype.d(1m) - read/write bytes by vnode type. Uses DTrace rwsnoop(1m) - snoop read/write events. Uses DTrace sampleproc(1m) - sample processes on the CPUs. Uses DTrace seeksize.d(1m) - print disk event seek report. Uses DTrace setuids.d(1m) - snoop setuid calls as they occur. Uses DTrace sigdist.d(1m) - signal distribution by process. Uses DTrace syscallbypid.d(1m) - syscalls by process ID. Uses DTrace syscallbyproc.d(1m) - syscalls by process name. Uses DTrace syscallbysysc.d(1m) - syscalls by syscall. Uses DTrace topsyscall(1m) - top syscalls by syscall name. Uses DTrace topsysproc(1m) - top syscalls by process name. Uses DTrace Tcl_CommandTraceInfo(3tcl), Tcl_TraceCommand(3tcl), Tcl_UntraceCommand(3tcl) - monitor renames and deletes of a command bitesize.d(1m) - analyse disk I/O size by process. Uses DTrace cpuwalk.d(1m) - Measure which CPUs a process runs on. Uses DTrace creatbyproc.d(1m) - snoop creat()s by process name. Uses DTrace dappprof(1m) - profile user and lib function usage. Uses DTrace dapptrace(1m) - trace user and library function usage. Uses DTrace dispqlen.d(1m) - dispatcher queue length by CPU. Uses DTrace dtrace(1) - dynamic tracing compiler and tracing utility dtruss(1m) - process syscall details. Uses DTrace errinfo(1m) - print errno for syscall fails. Uses DTrace execsnoop(1m) - snoop new process execution. Uses DTrace fddist(1m) - file descriptor usage distributions. Uses DTrace filebyproc.d(1m) - snoop opens by process name. Uses DTrace hotspot.d(1m) - print disk event by location. Uses DTrace iofile.d(1m) - I/O wait time by file and process. Uses DTrace iofileb.d(1m) - I/O bytes by file and process. Uses DTrace iopattern(1m) - print disk I/O pattern. Uses DTrace iopending(1m) - plot number of pending disk events. Uses DTrace iosnoop(1m) - snoop I/O events as they occur. Uses DTrace iotop(1m) - display top disk I/O events by process. Uses DTrace kill.d(1m) - snoop process signals as they occur. Uses DTrace lastwords(1m) - print syscalls before exit. Uses DTrace loads.d(1m) - print load averages. Uses DTrace newproc.d(1m) - snoop new processes. Uses DTrace opensnoop(1m) - snoop file opens as they occur. Uses DTrace pathopens.d(1m) - full pathnames opened ok count. Uses DTrace perldtrace(1) - Perl's support for DTrace pidpersec.d(1m) - print new PIDs per sec. Uses DTrace plockstat(1) - front-end to DTrace to print statistics about POSIX mutexes and read/write locks priclass.d(1m) - priority distribution by scheduling class. Uses DTrace pridist.d(1m) - process priority distribution. Uses DTrace procsystime(1m) - analyse system call times. Uses DTrace rwbypid.d(1m) - read/write calls by PID. Uses DTrace rwbytype.d(1m) - read/write bytes by vnode type. Uses DTrace rwsnoop(1m) - snoop read/write events. Uses DTrace sampleproc(1m) - sample processes on the CPUs. Uses DTrace seeksize.d(1m) - print disk event seek report. Uses DTrace setuids.d(1m) - snoop setuid calls as they occur. Uses DTrace sigdist.d(1m) - signal distribution by process. Uses DTrace syscallbypid.d(1m) - syscalls by process ID. Uses DTrace syscallbyproc.d(1m) - syscalls by process name. Uses DTrace syscallbysysc.d(1m) - syscalls by syscall. Uses DTrace topsyscall(1m) - top syscalls by syscall name. Uses DTrace topsysproc(1m) - top syscalls by process name. Uses DTrace There's a lot here; we don't need to read everything. The point is that when you run into a problem you can just search through this list and find a tool dedicated to debugging that problem. Let’s say you're facing elevated disk write issues that are causing the performance of your application to degrade. . . But is it your app at fault or some other app? rwbypid.d can help you with that: it can generate a list of processes and the number of calls they have for read/write based on the process ID as seen in the following screenshot: We can use this information to better understand IO issues in our code or even in 3rd party applications/libraries. iosnoop is another tool that helps us track IO operations but with more details: In diagnosing elusive system issues, DTrace shines by enabling detailed observation of system calls, file operations, and network activities. For instance, it can be used to uncover the root cause of unexpected system behaviors or to trace the origin of security breaches, offering a level of detail that is often unattainable with other debugging tools. Performance optimization is the main area where DTrace demonstrates its strengths. It allows administrators and developers to pinpoint performance bottlenecks, whether they lie in application code, system calls, or hardware interactions. By providing real-time data on resource usage, DTrace helps in fine-tuning systems for optimal performance. Final Words In conclusion, DTrace stands as a powerful and versatile tool in the realm of system monitoring and debugging. We've explored its broad capabilities, from in-depth system analysis to individual process tracing, and its remarkable performance efficiency that allows for its use in live environments. Its cross-platform compatibility, coupled with the challenges and solutions specific to MacOS, highlights its widespread applicability. The customizability through scripting provides unmatched flexibility, adapting to a myriad of diagnostic needs. Real-world applications of DTrace in diagnosing system issues and optimizing performance underscore its practical value. DTrace's comprehensive toolkit offers an unparalleled window into the inner workings of systems, making it an invaluable asset for system administrators and developers alike. Whether it's for routine troubleshooting or complex performance tuning, DTrace provides insights and solutions that are essential in the modern computing landscape.

By Shai Almog

CORE

Test Automation Guilds: Advancing Excellence in Testing

Have you ever found yourself in the position of a test engineer embedded in one of the Agile engineering teams? While you have daily interactions with peers, connecting with them on a profound level for the successful execution of job duties might be challenging. Although there is a shared goal to release features successfully, we often experience isolation, especially while others, like developers, find comfort within the team. In the realm of dispersed Agile teams with time zones adding an extra layer of complexity, the longing for a team to resonate with, connect with, and brainstorm on all test automation challenges is prevalent. In the expansive landscape of test automation, the creation of an automation guild is more than just collaboration; it stands as a testament to the resilience of SDETs working across diverse time zones and Agile teams. Through this guide, I aim to share the benefits and challenges overcome, the enrichment of test engineers or SDETs, and the establishment of a collective force dedicated to advancing excellence in testing. Breaking Silos In a world where time zones separate teams and Agile methodologies dictate the rhythm of development, test engineers face a unique challenge. Even though they are part of an Agile team with a shared goal, i.e., successful release, they must navigate independently without a clear direction or purpose. The guild, however, becomes a bridge across these temporal gaps, offering a platform for asynchronous collaboration. It not only allows them to demo their progress, accomplishments, and new utility that can be leveraged by others but also their challenges and blockers. It will surprise you to see how often those obstacles are common among other guild members. Now that they have each other, all heads come together to brainstorm and find common, effective solutions for any testing problem. Fostering Through Training and Contribution As important as regular guild meet-ups and collective commitment are, continuous learning and training initiatives are equally vital to empower test engineers to contribute effectively. From workshops on emerging testing methodologies to skill-building webinars, the guild evolves into a learning haven where members grow together, ensuring each test engineer is equipped to make a meaningful impact. It enhances members’ efficiency by reducing redundant efforts. Understanding what others are working on and what tools are available for use, such as common utilities and shared definitions, enables them to save time by avoiding duplication of efforts and contribute more effectively. This isn’t just about individual efficiency; it’s a strategic move toward collective empowerment. Grow Your Network and Your Profile Within the guild, networking is not confined to individual teams. It offers the creation of a network that spans across Agile teams, allowing Test Engineers to understand overall solutions from diverse perspectives. This isn’t just about sharing knowledge; it’s about broadening domain knowledge. Turning new members into seasoned members who can then mentor new juniors, ensuring that the guild is not just a community but a mentorship ecosystem that thrives on collective wisdom. If there’s one aspect that has been repeatedly demonstrated in the guild, it would be that challenges are not roadblocks but opportunities for innovation and collaboration. The guild stands as a testament to the fact that, even in the world of test automation, where distances and time zones pose challenges, excellence can be achieved through collective strength. Automation guild is not just about crafting code; it’s about crafting a community that advances excellence in testing, collectively and collaboratively. The future, as envisioned through the chronicles, is one where Test Engineers, regardless of time zones, work seamlessly in a guild that stands as a beacon of innovation, knowledge-sharing, and collective growth.

By Priyanka Chauhan