The Testing, Tools, and Frameworks Zone encapsulates one of the final stages of the SDLC as it ensures that your application and/or environment is ready for deployment. From walking you through the tools and frameworks tailored to your specific development needs to leveraging testing practices to evaluate and verify that your product or application does what it is required to do, this Zone covers everything you need to set yourself up for success.
In this article, you will learn how to run an Ansible Playbook from Azure DevOps. By incorporating Ansible Playbooks into the Azure release pipeline, organizations can achieve streamlined and automated workflows, reducing manual intervention and minimizing the risk of errors. This enhances the efficiency of the release process, accelerates time-to-market, and ensures a standardized and reliable deployment of applications and infrastructure on the Azure platform.

What Is an Ansible Playbook?

An Ansible Playbook is a YAML file that Ansible, a configuration management and automation tool, uses to define and execute a series of tasks. It is particularly valuable in managing infrastructure as code, ensuring consistency and repeatability in the deployment and configuration of systems. In the context of Azure release pipelines, Ansible Playbooks play a crucial role in automating the deployment and configuration of resources within the Azure environment. They allow for the definition of tasks such as provisioning virtual machines, configuring networking, and installing software components.

This tutorial assumes that the Ansible utility is installed and enabled for your project in Azure DevOps. You can download and install the utility from this link and have it enabled by your Azure DevOps administrator.

How to Run an Ansible Playbook From Azure DevOps

Step 1: Create a New Release Pipeline

Create a new release pipeline with an empty job.

Step 2: Add Artifacts in the Release Pipeline Job

Next, add Azure DevOps as the artifact source, since I am using an Azure repository to store the playbook and inventory file. I have already pushed the inventory file and the tutorial.yml playbook to my Azure repo branch, ansible-tutorial. Select your project, repo, and branch to add the artifacts to your release pipeline.

```yaml
# tutorial.yml
- hosts: "{{ host }}"
  tasks:
    - name: create a test file for ansible
      shell: touch /tmp/tutorial.yml
```

Step 3: Upload and Configure a Secure Key in Stage 1 for Ansible Playbook Authentication

I will use an SSH key for authentication on the target machine. To pass the SSH key, I will upload it using the Download Secure File utility. This utility stores secure files such as SSH keys, SSL certs, and CA certs in your release pipeline. During execution, the files are downloaded to a temp folder, and their path can be accessed through a reference variable (shown below). The files are deleted once the release job is completed.

Enter the reference name as shown below. To access the file, use the variable $(<reference name>.secureFilePath), e.g., $(pemKey.secureFilePath).

Step 4: Change the File Permission

We will add a shell command-line utility to change the file permission to 400 before using the key in the playbook. I have used $(pemKey.secureFilePath) to access the SSH key.

Step 5: Add and Configure the Ansible Task

Add the Ansible task and enter the playbook path as shown below. For the inventory, select "File" as the location and provide the file path as shown below. Use additional parameters to pass variables and other command-line options to the playbook at run time. To pass the path of the SSH key, I have used ansible_ssh_private_key_file=$(pemKey.secureFilePath). You can also use ansible_ssh_common_args='-o StrictHostKeyChecking=no' to disable host key checking if your playbook fails with a host key verification error.
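For reference, the Additional parameters field might combine these options. This is only a sketch: the host value webservers is an illustrative inventory group name, not something defined in this tutorial.

```
-e "host=webservers" -e "ansible_ssh_private_key_file=$(pemKey.secureFilePath)" -e "ansible_ssh_common_args='-o StrictHostKeyChecking=no'"
```

Each -e flag maps to ansible-playbook's --extra-vars option, so the same values could also be passed from any shell.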
Step 6: Save the Release Pipeline and Create a Release To Run the Playbook

We can see that our release completed successfully.

Summary

The Ansible playbook ran successfully from Azure DevOps. If you want to use a username and password instead of an SSH key, you can pass the Linux credentials through additional parameters backed by secret variables so that the credentials are masked, or you can use a shell command-line utility to set the credentials in environment variables for Ansible to read, as sketched below.
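For example, assuming you have defined secret pipeline variables named linuxUser and linuxPassword (hypothetical names of your choosing), the additional parameters could look like the following; note that password-based SSH typically also requires sshpass to be available on the agent.

```
-e "ansible_user=$(linuxUser)" -e "ansible_password=$(linuxPassword)"
```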
I'm pretty sure that you've had a situation where you deployed a major UX change on your web app and missed the most obvious issues, like a misaligned button or distorted images. Unintended changes on your site can cause not only a sharp decline in user satisfaction but also a large fall in sales and customer retention. By identifying and resolving these discrepancies before the update went live, you could have prevented these outcomes. This is where visual regression testing comes in, helping you validate visual elements on your web app. Having a visual regression testing strategy prevents issues that could disrupt the user experience and cause users to bounce.

In this article, we're focusing on visual regression testing with Playwright. We'll break down the concept, go through the benefits of visual testing, and see how to implement it into your testing strategy. Finally, I'll show you how to combine Checkly and Playwright for effective visual regression testing.

What Is Visual Regression Testing With Playwright?

Visual regression testing with Playwright is the most reliable way to ensure your web app looks and behaves correctly. It does this by comparing two snapshots of an area of your choice and immediately detecting issues related to the layout, content, or design of your web app. When web apps are updated iteratively with new features or optimizations, there is a chance that unintentional visual changes will occur. If these changes go unnoticed, they can negatively affect the user experience, cause customer annoyance, or even lower engagement or sales. The main goal of visual regression testing is to provide predictable and consistent visual behavior over iterations, avoiding regressions and improving the application's overall quality.

Visual Regression Testing Key Concepts

To understand visual regression testing, let's look at these key concepts related to it:

- Image comparison: To identify changes between older and more recent iterations of web pages, visual regression testing uses image comparison techniques. To draw attention to any visual changes or discrepancies, Playwright takes screenshots and applies image-diffing methods to them.
- Baseline establishment: Setting up a baseline entails establishing a point of reference for the visual design of the application's initial release. This baseline is used as a benchmark for comparison in later testing iterations.
- Automated workflows: With Playwright, automation is essential to visual regression testing since it makes it possible to execute tests repeatedly and smoothly across a variety of devices and browsers.
- Version control system integration: By incorporating visual regression tests into version control systems, development teams can work together more easily, and traceability is guaranteed.

Benefits of Visual Regression Testing

Now, let's look at the benefits and advantages of visual regression testing:

- Consistent UI/UX stability: Visual regression testing ensures a consistent and stable user interface and experience across different versions, guaranteeing reliability for end-users.
- Efficiency and cost-effectiveness: Automating visual regression tests using Playwright saves time and resources by reducing the need for manual checks, leading to more efficient testing processes.
- Early identification of issues: Detecting visual defects early in the development cycle allows for swift issue resolution, minimizing the chances of releasing flawed features and enhancing overall software quality.
- Cross-platform compatibility: With Playwright, visual regression testing verifies visual elements across multiple browsers and devices, ensuring uniformity and compatibility in diverse environments.
- Confidence in deployments: Regular visual regression testing instills confidence in software releases, decreasing the likelihood of unexpected visual regressions when rolling out updates or new features.

What Does Visual Regression Testing Not Detect?

Visual testing, while crucial for assessing the graphical interface and layout of web applications, does not cover certain aspects related to the functionality and underlying code. Some areas that visual testing does not encompass include:

- Functional testing: This testing verifies that the application operates in accordance with its specifications, ensuring that each feature, component, or interaction functions correctly and performs as intended, encompassing form submissions, user actions, data processing, and other essential operations.
- Security testing: Assessing security measures within the application is vital. Security testing identifies vulnerabilities, weaknesses, and potential threats, aiming to prevent data breaches, unauthorized access, or any form of security compromise. It involves examining authentication methods, data protection mechanisms, encryption, and defenses against diverse cyber threats.
- Accessibility testing: Ensuring the application's accessibility to all user groups, including individuals with disabilities, is crucial. Accessibility testing confirms compliance with accessibility standards like WCAG, focusing on features such as compatibility with screen readers, keyboard navigation, contrast adjustments, and other elements to ensure an inclusive user experience for diverse audiences.
- API and backend testing: API and backend testing involve evaluating the functionality and responses of the application's backend components, such as APIs, databases, and server-side operations. Unlike visual testing, which focuses on the front end, this testing requires distinct methodologies to directly interact with and assess the backend systems, ensuring their proper functioning and accuracy in handling data and operations.

Getting Started With Visual Regression Testing

Checkly natively supports the Playwright Test Runner for browser checks, so you can now use its visual snapshot testing feature with Checkly. These are two important assertions to get you started:

- .toHaveScreenshot() will help you visually compare a screenshot of your page to a golden image/reference snapshot
- .toMatchSnapshot() will compare any string or Buffer value to a golden image/reference snapshot

Before getting started, make sure you've downloaded the newest version of Checkly:

- Checkly CLI v4.4.0 or later
- Checkly Agent v3.2.0 or later

Step 1

Add expect(page).toHaveScreenshot() to your browser check script. Here's an example (see the sketch after these steps).

Step 2

Run your browser check. The first time you run it, you will get an error indicating that no golden image/reference snapshot exists yet:

A snapshot doesn't exist at /tmp/19g67loplhq0j/script.spec.js-snapshots/Playwright-homepage-1-chromium-linux.png.

Step 3

Generate a golden image snapshot by clicking the "Run script and update golden image" option in the "Run script" button. This step will generate a golden image.
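Here is a minimal sketch of the Step 1 browser check script, assuming the Node-based Playwright Test syntax that Checkly browser checks run; the URL and test title are illustrative and simply line up with the snapshot name shown in the Step 2 error message.

```javascript
const { test, expect } = require('@playwright/test')

test('Playwright homepage', async ({ page }) => {
  // Navigate to the page whose appearance you want to pin down
  await page.goto('https://playwright.dev/')

  // Compare the current rendering against the stored golden image;
  // the very first run fails until a golden image has been generated
  await expect(page).toHaveScreenshot()
})
```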
You can check your golden image in the "Golden Files" tab in the editor. You can now save your check, and on each check run, the golden image will be compared to the actual screenshot. When your check fails due to a visual difference, you will see the difference between the golden image and the actual screenshot in your check result. To find out how to configure visual regression testing with Checkly, please check our docs.

Conclusion

Mastering the intricacies of visual regression testing and API testing is paramount for delivering resilient and high-performing software applications. While visual testing ensures the visual integrity of your application, API testing guarantees the reliability of backend functionalities. To enhance your testing strategy and safeguard against regressions, consider combining Playwright with monitoring tools like Checkly.

More Resources

- The official Playwright guide on visual comparison and snapshot testing
- The .toHaveScreenshot() API reference
- The .toMatchSnapshot() API reference
Anticipated to shatter records and surpass an extraordinary USD 813 billion in revenues by 2027, the global software market is set to achieve unprecedented growth. This surge is propelled by the pivotal role software products play in enabling businesses to attain a competitive edge in the digital era. As organizations strive for excellence in their digital offerings, the imperative to elevate software quality has propelled technology assessments to new heights. In this blog, we will guide you through the forefront of the industry, unveiling the most prominent trends shaping the landscape of software testing. Let's explore the future of software testing by delving into the first trend on the list: 1. Scriptless Test Automation: Accelerating SDLC With Simplicity Automation testing has long been a cornerstone in expediting the Software Development Life Cycle (SDLC). In response to the evolving demands of modern software, a prominent trend has emerged: Scriptless Test Automation. This innovative approach addresses the need for enhanced scalability beyond traditional manual testing methods. Key Attributes Accessibility for non-coders: Scriptless automation eliminates the requirement for extensive coding skills. Testers and developers can create and execute tests without delving into complex code scripting, making it a user-friendly approach. Shortcut to high-speed testing: Offering a shortcut to accelerate testing processes, scriptless automation significantly reduces the time and effort traditionally associated with creating and maintaining test scripts. 2. Robotic Process Automation (RPA): Transforming Testing Dynamics In the realm of software testing trends, Robotic Process Automation (RPA) stands out as a transformative force, integrating artificial intelligence (AI), cognitive computing, and the Internet of Things (IoT). This cutting-edge technology has not only redefined the industry landscape but is anticipated to become a specialized testing domain by 2024. Key Attributes AI, cognitive computing, and IoT integration: RPA leverages a synergy of artificial intelligence, cognitive computing capabilities, and IoT functionalities, creating a comprehensive testing approach. Market growth: Reports indicate substantial growth in the RPA market, with projections soaring from USD 2,599.3 million in 2022 to an estimated USD 24,633.4 million by 2033. This growth signifies its increasing significance in various industries. 3. Combined Automation and Manual Testing: Striking a Balance for Holistic Testing In the intricate landscape of software testing, a trend has emerged that emphasizes the synergy of both Automation and Manual Testing. Recognizing the unique strengths and limitations of each approach, this trend advocates for a balanced testing strategy to achieve comprehensive software quality. Key Attributes Security and speed with automation: Automation testing, a burgeoning trend, is celebrated for enhancing security and expediting testing processes. Its strengths lie in repetitive tasks, regression testing, and rapid feedback. Manual testing for varied considerations: Acknowledging that certain aspects like accessibility, user interface intricacies, and architectural nuances cannot be fully addressed by automation alone, manual testing remains a crucial component of the testing process. 4. 
API and Service Test Automation: Navigating the Era of Microservices The prevalence of microservice architecture in application development has given rise to the prominence of API and Service Test Automation. As client-server platforms proliferate, the need for robust testing of APIs, which operate independently and collaboratively, becomes paramount. Key Attributes Microservices and API integration: Microservice architecture's popularity underscores the significance of APIs in web services automation. APIs enable seamless communication between independently functioning microservices. Interoperability and system integration: APIs not only facilitate communication between microservices but also enable seamless integration with other systems and applications, contributing to the development of a cohesive and interoperable system. 5. More Data, Better Data: Elevating Data Testing for Quality Assurance In the realm of software testing, the meticulous examination of data is integral to ensuring quality assurance. The trend of "More Data, Better Data" underscores the significance of systematic approaches to verify and process data. This includes assessing data accuracy, relevance, believability, repetition, and the sheer number of records. Modern software testing tools are now enhancing the practicality of data collection, paving the way for the emergence of more advanced data testing tools in the pursuit of high-quality products. Key Attributes Methodical data verification: Test teams employ systematic approaches to scrutinize data across multiple parameters, ensuring its accuracy, relevance, believability, and more. Technological advancements: New software testing tools are revolutionizing the collection and analysis of data, making data testing more feasible and comprehensive. 6. Performance Testing: Ensuring Excellence in Software Functionality Performance testing emerges as a pivotal trend, solidifying its place as a crucial component in achieving optimal results for applications. Developers recognize the imperative of creating test scripts not only to safeguard the application but also to ensure its functionality and efficiency. Key Attributes Quality and efficiency: Performance testing is employed to enhance software quality, ensuring that applications operate seamlessly, meet performance benchmarks, and deliver a satisfying user experience. Cross-platform compatibility: Testing scripts are crafted to guarantee that applications function effectively across multiple operating systems and browsers, addressing the diverse technological landscape. 7. Testing Centers for Quality: A Global Shift Towards Excellence Quality Test Centers have emerged as a pivotal trend in global software testing. These centers play a vital role in fostering the creation of high-quality software applications and enhancing the entire application development phase. Going beyond conventional testing practices, these centers house competent QA teams that efficiently reduce the testing period while upholding the consistency, reliability, and effectiveness of the product. Additionally, a focus on robust test automation aligns software development with QA demands, ensuring a comprehensive and efficient Software Testing Life Cycle. Key Attributes Promotion of quality software: Quality Test Centers contribute to the creation of robust software applications, elevating the quality standards of the products in the long run. 
Competent QA teams: These centers boast skilled QA teams that optimize testing periods without compromising on the fundamental attributes of a product, such as consistency, reliability, and effectiveness. 8. Cyber Security and Risk Compliance: Safeguarding the Digital Realm The digital revolution brings unprecedented benefits but also introduces threats such as cyber threats and various forms of digital attacks. In response to the critical need for security testing, the trend of Cyber Security and Risk Compliance has emerged. This trend is pivotal in ensuring the security of products, networks, and systems against cyber threats and diverse risks. Key Attributes Security testing imperative: Digital dependency underscores the critical nature of security testing. Products, networks, and systems must undergo rigorous security checks to safeguard against cyber threats and ensure user safety. Standardization of security practices: Transaction processing and user safety are now standard requirements, leading to improved coding practices for secure software. 9. IoT (Internet of Things) Testing: Navigating the Proliferation of IoT Devices As the landscape of IoT development expands significantly, so does the need for robust testing. By 2028, the market revenue for IoT devices is projected to reach USD 282.25 billion. In anticipation of this growth, IoT Testing emerges as a core trend in software testing for 2024. This trend is designed to enable testers to comprehensively test and analyze risks associated with IoT instruments, with a focus on security, accessibility, software compatibility, data integrity, performance, and scalability. Key Attributes Addressing diverse risks: IoT Testing aims to address a spectrum of risks associated with IoT devices, ensuring their secure, efficient, and scalable integration into the digital landscape. Key focus areas: Testing in the realm of IoT will concentrate on crucial aspects such as security, ensuring data integrity, assessing performance, and evaluating scalability in the context of IoT devices. 11. QAOps: Bridging Development, Testing, and Operations In the holistic approach of QAOps, developers, testers, and operations teams converge to enhance collaboration and streamline processes. This practice incorporates continuous testing within the broader DevOps framework, improving Continuous Integration/Continuous Delivery (CI/CD) pipelines and fostering a seamless collaboration between testers, QA professionals, and developers. Key Attributes Collaboration across teams: QAOps emphasizes collaboration among developers, testers, and operations teams, breaking down silos and fostering a shared responsibility for software quality. Continuous testing integration: The integration of QA testing into the DevOps approach enhances CI/CD pipelines, ensuring a continuous and efficient testing process throughout the software development life cycle.
Phase 1: Establishing the Foundation In the dynamic realm of test automation, GitHub Copilot stands out as a transformative force, reshaping the approach of developers and Quality Engineers (QE) towards testing. As QA teams navigate the landscape of this AI-driven coding assistant, a comprehensive set of metrics has emerged, shedding light on productivity and efficiency. Join us on a journey through the top key metrics, unveiling their rationale, formulas, and real-time applications tailored specifically for Test Automation Developers. 1. Automation Test Coverage Metrics Test Coverage for Automated Scenarios Rationale: Robust test coverage is crucial for effective test suites, ensuring all relevant scenarios are addressed. Test Coverage = (Number of Automated Scenarios / Total Number of Scenarios) * 100 Usage in real-time scenarios: Provides insights into the effectiveness of test automation in scenario coverage. Cost savings: Higher automation test coverage reduces the need for manual testing, resulting in significant cost savings. 2. Framework Modularity Metrics Modularity Index Rationale: Modularity is key for maintainability and scalability. The Modularity Index assesses independence among different modules in your automation framework. Modularity Index = (Number of Independent Modules / Total Number of Modules) * 100 Usage in real-time scenarios: Evaluate modularity during framework development and maintenance phases for enhanced reusability. Cost savings: A higher modularity index reduces time and effort for maintaining and updating the automation framework. 3. Test Script Efficiency Metrics Script Execution Time Rationale: Script execution time impacts the feedback loop. A shorter execution time ensures quicker issue identification and faster development cycles. Script Execution Time = Total time taken to execute all test scripts Usage in real-time scenarios: Monitor script execution time during continuous integration for optimization. Cost savings: Reduced script execution time contributes to shorter build cycles, saving infrastructure costs. Test Script Success Rate Rationale: The success rate reflects the reliability of your automation suite. Test Script Success Rate = (Number of Successful Test Scripts / Total Number of Test Scripts) * 100 Usage in real-time scenarios: Continuously monitor the success rate to identify and rectify failing scripts promptly. Cost savings: Higher success rates reduce the need for manual intervention, saving both time and resources. 4. Assertion Effectiveness Assertion Success Rate Rationale: Assertions ensure correctness in test results. The assertion success rate measures the percentage of assertions passing successfully. Assertion Success Rate = (Number of Successful Assertions / Total Number of Assertions) * 100 - Number of Successful Assertions: The count of assertions that produced the expected outcomes without failures or errors. - Total Number of Assertions: The overall count of assertions evaluated, including both passing and failing assertions. Usage in real-time scenarios: Regularly track this metric during test execution to ensure the reliability of your test results. Cost savings: Improved assertion effectiveness reduces false positives, minimizing debugging efforts and saving valuable time. 5. Parallel Execution Metrics Rationale: Parallel execution enhances test suite efficiency. 
Parallel Execution Utilization = (Time with Parallel Execution / Time without Parallel Execution) * 100 Usage in real-time scenarios: Monitor parallel execution utilization during large test suites to optimize test execution times. Cost savings: Efficient use of parallel execution reduces overall testing time, leading to cost savings in infrastructure and resources. 6. Cross-Browser Testing Metrics Number of Supported Browsers Rationale: Cross-browser testing ensures compatibility across various browsers, a critical factor in user satisfaction. Usage in real-time scenarios: Regularly update and track the supported browsers to ensure coverage for the target audience. Cost savings: Identifying and fixing browser-specific issues in the testing phase prevents costly post-production bug fixes. Cross-Browser Test Success Rate Rationale: The success rate of tests across different browsers is vital for delivering a consistent user experience. Cross-Browser Test Success Rate = (Number of Successful Cross-Browser Tests / Total Number of Cross-Browser Tests) * 100 Usage in real-time scenarios: Regularly assess the success rate to catch potential issues with browser compatibility. Cost savings: Early detection of cross-browser issues reduces the time and resources spent on fixing them later in the development process. Conclusion In Phase 1, we've set the stage by exploring essential metrics such as test coverage, framework modularity, and script efficiency. GitHub Copilot's influence is unmistakable. But what's next? As we embark on Phase 2, expect insights into Test Script Efficiency Metrics. How does Copilot enhance script execution time and success rates? Stay tuned for more discoveries in Phase 2! The journey into GitHub Copilot's impact on test automation efficiency continues.
In the data-driven landscape of today, automation has become indispensable across industries, not just to maximize efficiency but, more importantly, to ensure quality. This holds true for the critical field of data engineering as well. As organizations gather and process astronomical volumes of data, manual testing is no longer feasible or reliable. Automated testing methodologies are now imperative to deliver speed, accuracy, and integrity. This comprehensive guide takes an in-depth look at automated testing in the data engineering domain. It covers the vital components of test automation, the diverse tools available, quantifiable benefits, real-world applications, and best practices to integrate automation seamlessly. The Pillars of Automated Testing Any holistic, automated testing framework rests on these key pillars: Structured Test Automation Environment This involves predefined guidelines, coding standards, best practices, and tools to enable automation. A robust framework optimizes maintainability and reuse while minimizing redundancy. Popular examples include Selenium, Robot Framework, and TestComplete. Data Validation Techniques These include methods to validate the correctness, accuracy, consistency, and completeness of data. These techniques are the crux of quality checks, from basic assertions to complex validation rule engines. Performance Testing This testing determines system behavior under real-world load conditions, identifying bottlenecks. Load testing, stress testing, endurance testing, and scalability testing are common performance tests. Integration With CI/CD Pipelines Incorporating automation into Continuous Integration and Continuous Delivery pipelines helps achieve accelerated release cycles without compromising quality. Automated Testing Tools Stack The test automation ecosystem offers open-source and licensed tools to cater to diverse needs: Load Testing Apache JMeter is an open-source tool for load and performance testing that simulates heavy user loads to gauge system stability. API Testing Postman is a feature-rich tool for API testing with test automation capabilities. Web Application Testing Selenium is the leading open-source test automation tool specifically for web apps. Data Quality Testing Talend provides complete data health testing with profiling, validation, and quality checks. Data Pipeline Testing Great Expectations is specialized for testing data pipelines, data integrity, and transformations. dbt (data build tool) enables data transformation testing in warehouses through analytics code. Why Is Automated Testing Indispensable? The overarching goal of automated testing is to deliver quality at speed. It empowers data teams with tangible benefits: Enhanced Accuracy Automated tests perform precisely as coded every single time, eliminating human error-prone manual testing. Rigorous test coverage leaves no scope for defects. Improved Efficiency Automated testing parallelizes testing to deliver exponentially faster test cycles, optimized resource utilization, and on-demand scalability. Risk Mitigation Automated unit tests, integration tests, and monitoring provide an early warning system for potential issues. This allows proactive resolution. Compliance Automated audit trails, alerts, and reports provide tangible visibility to demonstrate compliance with data regulations. Accelerated Release Cycles Integration with CI/CD pipelines enables reliable continuous delivery with automated quality gates, facilitating rapid iterations. 
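To make the data validation pillar concrete, here is a minimal, hypothetical sketch of the kind of automated check a data team might run with pytest; the orders records and the validation rules are illustrative and are not taken from any of the tools above.

```python
# test_orders_quality.py -- run with: pytest test_orders_quality.py
orders = [
    {"order_id": 1, "amount": 120.50, "currency": "USD"},
    {"order_id": 2, "amount": 75.00, "currency": "EUR"},
]

def test_completeness():
    # Every record must carry the mandatory fields
    required = {"order_id", "amount", "currency"}
    assert all(required <= record.keys() for record in orders)

def test_accuracy():
    # Amounts must be positive and currencies drawn from an allowed set
    assert all(record["amount"] > 0 for record in orders)
    assert all(record["currency"] in {"USD", "EUR", "GBP"} for record in orders)

def test_consistency():
    # Order IDs must be unique across the dataset
    ids = [record["order_id"] for record in orders]
    assert len(ids) == len(set(ids))
```

In practice, the same assertions would be pointed at a warehouse table or pipeline output and wired into the CI/CD quality gates described above.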
As key representatives from leading data analytics firm Fivetran stated to Harvard Business Review, "Automated testing is crucial to enabling continuous delivery and ensuring velocity." Real-World Implementation Landscape Leading organizations across domains demonstrate the real-world impact of test automation: Fortune 500 retail giant Walmart automated over 100,000 test cases across its e-commerce platforms to bolster quality. Prominent healthcare provider Anthem decreased release cycles from six months to six days through test automation. Digital payments firm Stripe executes over 150,000 automated tests daily across its global data infrastructure to prevent defects. Ride-sharing platform Uber credits its automated testing strategy for facilitating rapid geographic expansion while maintaining stability. Best Practices for Automation Success Gradual Adoption: Start small with critical areas before enterprise-wide automation to master efficiencies. Analytics-driven: Leverage intelligent analytics of test results for optimization opportunities. Integrated Process: Incorporate automation into product life cycles through DevOps collaboration. Continuous Updates: Actively maintain testware as requirements evolve to prevent technical debt. Specialized Training: Invest in upskilling resources on tools and best practices for maximum ROI. Compliance Focus: Prioritize automated compliance reporting to satisfy data regulations like GDPR. As leading analysts highlight, test automation is no longer optional but fundamental to staying competitive. The State of Testing Report 2021 finds that leading teams spend over 50% of testing cycles on automation. The data engineering sphere is no exception to this. The Road Ahead Automated testing unlocks unparalleled quality, speed, and risk reduction. While adoption has steadily increased, challenges remain in aspects like capabilities mapping, maintenance overhead, and integration complexities, especially with legacy systems. As forward-looking data engineering leaders double down on automation equipped with the right strategies and expertise, they are gearing up to dominate as champions of quality engineering.
Fuzzing, also known as fuzz testing, is an automated software testing technique that involves providing invalid, unexpected, or random data (fuzz) as inputs to a computer program. The goal is to find coding errors, bugs, security vulnerabilities, and loopholes that can be exploited. This article starts by explaining some basic types of fuzzing. The "testing the lock" metaphor is then used to explain the nuts and bolts of this technique. A list of available tools is given and a set of best practices are explored for fuzzing to be conducted ethically, effectively, and safely. Types of Fuzzing Fuzzing, as a versatile software testing technique, can be categorized into several types based on the methodology and the level of knowledge about the software being tested. Each type of fuzzing has its unique approach and is suitable for different testing scenarios. 1. Black-Box Fuzzing Definition: Black-box fuzzing is performed without any knowledge of the internal structures or implementation details of the software being tested. Testers treat the software as a black box that receives inputs and generates outputs. Approach: It involves generating random inputs or using predefined datasets to test the software. The main goal is to observe how the software behaves with unexpected or malformed inputs. Use cases: Black-box fuzzing is often used in situations where source code access is unavailable, like with proprietary or third-party applications. It's also commonly used in web application testing. 2. White-Box Fuzzing Definition: White-box fuzzing requires a thorough understanding of the program’s source code. Testers use this knowledge to create more sophisticated and targeted test cases. Approach: It often involves static code analysis to understand the program flow and identify potential areas of vulnerability. Inputs are then crafted to specifically target these areas. Use cases: White-box fuzzing is ideal for in-depth testing of specific components, especially where the source code is available. It's widely used in development environments and for security audits. 3. Grey-Box Fuzzing Definition: Grey-box fuzzing is a hybrid approach that sits between black-box and white-box fuzzing. It involves having some knowledge of the internal workings of the software, but not as detailed as in white-box fuzzing. Approach: This type of fuzzing might use instrumented binaries or partial access to source code. Testers typically have enough information to create more meaningful test cases than in black-box fuzzing but don’t require the comprehensive understanding necessary for white-box fuzzing. Use cases: Grey-box fuzzing is particularly effective in integration testing and for security testing of complex applications where partial code access is available. 4. Mutation-Based Fuzzing Definition: Mutation-based fuzzing involves modifying existing data inputs to create new test cases. It starts with a set of pre-existing input data, known as seed inputs, and then applies various mutations to generate new test inputs. Approach: Common mutations include flipping bits, changing byte values, or rearranging data sequences. This method relies on the quality and variety of the seed inputs. Use cases: It is widely used when there is already a comprehensive set of valid inputs available. This approach is effective in finding deviations in software behavior when subjected to slightly altered valid inputs. 5. 
Generation-Based Fuzzing Definition: Generation-based fuzzing creates test inputs from scratch based on models or specifications of valid input formats. Approach: Testers use knowledge about the input format (like protocol specifications, file formats, or API contracts) to generate inputs that conform to, or intentionally deviate from, these specifications. Use cases: This approach is particularly useful for testing systems with well-defined input formats, such as compilers, interpreters, or protocol implementations. Each fuzzing type has its specific applications and strengths. The choice of fuzzing method depends on factors like the availability of source code, the depth of testing required, and the nature of the software being tested. In practice, combining different fuzzing techniques can yield the most comprehensive results, covering a wide range of potential vulnerabilities and failure scenarios. Understanding Fuzzing: Testing the Lock Imagine you're testing the durability and quality of a lock - a device designed with specific rules and mechanisms, much like software code. Fuzzing, in this metaphor, is like trying to unlock it with a vast array of keys that you randomly generate or alter in various ways. These keys are not crafted with the intention of fitting the lock perfectly; instead, they're meant to test how the lock reacts to unexpected or incorrect inputs. The Process of Fuzzing: Key Generation and Testing Random key creation (black-box fuzzing): Here, you're blindly crafting keys without any knowledge of the lock's internal mechanisms. This approach is akin to black-box fuzzing, where you test software by throwing random data at it to see how it reacts. You're not concerned with the specifics of the lock's design; you're more interested in whether any odd key shape or size could cause an unexpected reaction, like getting stuck or, paradoxically, turning the lock. Crafted key design (white-box fuzzing): In this scenario, you have a blueprint of the lock. With this knowledge, you create keys that are specifically designed to test the lock's weaknesses or limits. This is similar to white-box fuzzing in software testing, where you use your understanding of the software’s code to create highly targeted test inputs. Combination of both (grey-box fuzzing): Here, you have some knowledge about the lock, perhaps its brand or the type of keys it usually accepts. You use this information to guide your random key generation process. This is akin to grey-box fuzzing, which uses some knowledge of the software to create more effective test cases than random testing but doesn’t require as detailed an understanding as white-box fuzzing. Fuzzing Tools Available There are several well-known fuzzing tools available, each designed for different types of fuzzing and targeting various kinds of software vulnerabilities. 1. American Fuzzy Lop (AFL) Type: Grey-box fuzzer Description: AFL is one of the most popular fuzzers and is known for its efficiency. It uses genetic algorithms to automatically discover new test cases. AFL is particularly good at finding memory corruption bugs and is used widely in security and software development communities. 2. LibFuzzer Type: White-box fuzzer Description: Part of the LLVM project, LibFuzzer is a library for in-process, coverage-guided evolutionary fuzzing of other libraries. It is particularly effective for testing code that can be isolated into a library. 3. 
OSS-Fuzz Type: Continuous fuzzing as a service Description: OSS-Fuzz is a free service provided by Google to open-source projects. It integrates with other fuzzing tools like AFL and LibFuzzer to continuously test target software and report back any bugs found. 4. Peach Fuzzer Type: Generation-based fuzzer Description: Peach is a framework for performing fuzz testing on network protocols, file formats, and APIs. It is highly customizable and allows testers to define their own data models for generating test inputs. 5. Fuzzilli Type: Grey-box fuzzer Description: Fuzzilli is a JavaScript engine fuzzer focused on finding bugs in JavaScript engines like V8 (Chrome, Node.js) and JavaScriptCore (Safari). It uses a unique approach of generating and mutating JavaScript programs. 6. Boofuzz Type: Network protocol fuzzer Description: Boofuzz is a fork of the Sulley Fuzzing Framework and is an easy-to-use tool for network protocol fuzzing. It allows testers to define custom network protocol specifications for testing. 7. Radamsa Type: Mutation-based fuzzer Description: Radamsa is a general-purpose fuzzer capable of generating a wide range of mutation-based test inputs. It is particularly useful for testing software that processes complex inputs like texts, binaries, or structured data. 8. Burp Suite Intruder Type: Mostly black-box fuzzer Description: Part of the Burp Suite set of tools, the Intruder module is used for web application fuzzing. It is excellent for testing web applications by automating customized attacks against web parameters. 9. Jazzer Type: White-box fuzzer Description: Jazzer enables developers to find bugs in Java applications using LibFuzzer. It’s particularly suited for projects that use Java or JVM-based languages. Best Practices Fuzzing requires careful planning and execution to ensure it's both effective and responsible. Below are some best practices to consider. 1. Ethical Considerations Responsible testing: Always obtain permission before conducting fuzz tests on systems you don't own. Unauthorized testing, even with good intentions, can be illegal and unethical. Data sensitivity: Be cautious when fuzzing applications that handle sensitive data. Ensure that testing doesn't compromise data privacy or integrity. Avoid disruptive testing on live systems: If you're testing live systems, plan your tests to minimize disruption. Fuzzing can cause systems to crash or become unresponsive, which can be problematic for production environments. Inform stakeholders: Ensure that all relevant stakeholders are aware of the testing and its potential impacts. This includes system administrators, security teams, and the user base. Legal compliance: Adhere to relevant laws and regulations, especially those relating to cybersecurity and data protection. 2. Comprehensive Coverage Diverse techniques: Employ various fuzzing techniques (black-box, white-box, grey-box, etc.) to cover different attack vectors and scenarios. Test across different layers: Fuzz not just the application layer, but also the network, data storage, and APIs if applicable. This ensures a thorough evaluation of the system’s resilience. Input variety: Use a wide range of input data, including unexpected and malformed data, to test how the system handles different scenarios. Automate where possible: Automation can help in generating a high volume of diverse test cases, ensuring more comprehensive coverage. Iterative approach: Continually refine your fuzzing strategies based on previous test outcomes. 
This iterative approach helps in covering new areas and improving test effectiveness. 3. Continuous Monitoring Real-time monitoring: Implement monitoring tools to track the system's performance and behavior in real-time during fuzzing. This helps in promptly identifying issues like crashes, hangs, or performance degradation. Logging and documentation: Ensure that all fuzzing activities and observed anomalies are logged systematically. This documentation is crucial for debugging and for future reference. Resource utilization monitoring: Keep an eye on system resources (CPU, memory, disk usage, etc.) to detect potential resource leaks or performance bottlenecks. Alerting mechanisms: Set up alerting systems to notify relevant teams if critical issues or anomalies are detected during fuzzing. Follow-up analysis: After fuzzing, conduct a thorough analysis of the outcomes. Investigate the root causes of any failures and document the lessons learned. Adhering to these best practices helps fuzzing to be conducted ethically, effectively, and safely. It's about striking a balance between aggressively testing the software to uncover hidden vulnerabilities and doing so in a manner that is responsible and mindful of the potential impacts. Wrapping Up Just as testing a lock with a multitude of keys can reveal its strengths and weaknesses, fuzzing tests the robustness and security of software. It's a way to probe software with unexpected conditions, much like challenging a lock with an array of unconventional keys. This method helps uncover vulnerabilities that would otherwise remain hidden under standard testing procedures, ensuring that the software (like a good lock) only responds as intended under the right conditions.
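To connect the lock metaphor back to code, here is a minimal, self-contained sketch of the mutation-based approach described earlier; the parse_header target function and the seed input are purely illustrative.

```python
import random

def parse_header(data: bytes) -> str:
    # Hypothetical target: expects b"NAME: VALUE" and is fragile on odd input
    name, value = data.split(b":", 1)
    return name.decode("ascii").strip()

def mutate(seed: bytes) -> bytes:
    # Apply one random mutation: flip a bit, drop a byte, or duplicate a byte
    data = bytearray(seed)
    index = random.randrange(len(data))
    choice = random.choice(["flip", "drop", "dup"])
    if choice == "flip":
        data[index] ^= 1 << random.randrange(8)
    elif choice == "drop":
        del data[index]
    else:
        data.insert(index, data[index])
    return bytes(data)

if __name__ == "__main__":
    seed = b"Content-Type: text/plain"
    crashes = set()
    for _ in range(10_000):
        try:
            parse_header(mutate(seed))
        except Exception as exc:
            # Any unhandled exception is a finding worth triaging
            crashes.add(type(exc).__name__)
    print("Exception types triggered:", sorted(crashes))
```

Real fuzzers such as AFL or Radamsa add coverage feedback, corpus management, and far smarter mutations, but the core loop is the same: mutate, execute, and watch for misbehavior.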
Data breaches, system failures, bugs, and website defacement can seriously harm a company's reputation and profits. Typically, companies realize the importance of auditing their infrastructure, evaluating established interaction patterns, and assessing the business logic of their services only after developing security processes or facing urgent challenges. This maturity often stems from the necessity to ensure product or service security and to meet regulatory requirements. One effective method for conducting an information security audit is through penetration testing (pen test). Companies can either develop this expertise internally or choose a skilled and trustworthy contractor to perform the tests. The contractor would conduct thorough testing and provide detailed penetration reports, complete with recommendations for safeguarding corporate data. The latter option, hiring a skilled contractor for penetration testing, is more frequently chosen, particularly by small and medium-sized businesses (SMBs), as it offers considerable savings in both time and money. The service provider outlines all stages of the process, develops a pen testing strategy, and suggests ways to eliminate threats. This approach ensures transparency, with a defined scope for testing, clear results, and compliance with both regulatory and business requirements. What Is Penetration Testing? Penetration testing broadly involves evaluating the security of information systems by mimicking the tactics of an actual attacker. However, it is not just about finding vulnerabilities and security gaps. It also includes a thorough examination of the business logic behind services. This means manually analyzing financial transactions and the flow of goods, scrutinizing mobile applications, web forms, etc. Sometimes, it is not the security perimeter that poses the risk but rather the business logic itself. This can inadvertently provide opportunities for an attacker or fraudster with legitimate access to the company's systems to siphon off funds or cause harm in various other ways. Penetration Testing Methodologies Let's now explore the diverse methodologies of penetration testing: Black Box Method In a black box testing method, the tester has little to no prior knowledge about the target system. They may only have basic information like URLs, IP addresses, or a list of systems and services. This method is primarily used for auditing perimeter security and externally accessible web services, where the tester simulates an external attack with limited initial data. Gray Box Method Here, the tester has some knowledge about the system they are testing but lacks admin rights or detailed operational patterns. This methodology is often applied to audit open banking services, mobile applications, and internal infrastructure. The penetration tester operates with a regular user's credentials, requiring them to independently analyze the business logic, conduct reverse engineering, attempt to escalate their privileges, and potentially breach more secure segments like processing centers, databases, or payment services. White Box Method In the white box approach, the tester has complete knowledge of the system, including source code, architecture diagrams, and administrative privileges. 
This method goes beyond just demonstrating hacking skills; it is more about identifying existing defects in software products or business services, understanding the implications of improper product use, exploring potential action vectors that could lead to malfunctions, and pinpointing process shortcomings, such as inadequate controls or regulatory non-compliance. A unique aspect of pen tests involves social engineering, where testers try to trick company employees into revealing critical data, assessing their awareness of information security. This may include making QR codes as part of social engineering tactics to evaluate employee susceptibility to phishing. Alongside, advanced AI language tools and specialized essay writing services are employed to create convincing phishing messages, making them challenging for even security professionals to detect. Additionally, the contractor might provide services like controlled DDoS attacks (stress testing) or simulated spam attacks. How To Implement Penetration Tests Implementing penetration tests begins with defining specific objectives and the scope of the test, which includes determining the systems, networks, or apps to be examined. Depending on these objectives, a suitable testing methodology is chosen. The next step is selecting the testing team, which can either be an internal group or external experts. Once the testing starts, the team simulates various attacks to identify system vulnerabilities, covering potential weaknesses in software, hardware, and human factors. After the test, analyzing the results is critical to understanding the vulnerabilities and their potential impacts. A Non-Disclosure Agreement A non-disclosure agreement (NDA) is signed with the contractor during a penetration test to ensure confidentiality. In some cases, a contrasting agreement, known as a "disclosure agreement," is also executed. This agreement permits the legitimate disclosure of discovered bugs or zero-day vulnerabilities, allowing for transparent communication of critical security findings under specific conditions. Pen Test Frequency and Duration In terms of frequency, it is recommended to run penetration testing after every noticeable change in the infrastructure. How often these changes occur depends on your business processes. Usually, full-fledged pen tests are done every six months or once a year - but agile businesses should consider running continuous pen testing if they are deploying at a faster pace. The rest of the time, after each minor configuration change, you can use scanners. Scans are cheaper and reveal basic problems. On average, the pen test lasts a month, sometimes longer. If they last for several months, it is already red teaming. Bug Bounty One of the methods for carrying out a penetration test is through a bug bounty program. This approach offers extensive coverage as numerous specialists attempt to uncover vulnerabilities in the company's services and products. A key benefit of this method is that it is cost-free until a threat is identified. However, there are drawbacks. A specialist might only report a vulnerability to the extent needed to claim a reward without delving deeper into the analysis. Additionally, there is a risk of vulnerabilities being disclosed before the company can address them, or even specialists may sell the discovered vulnerabilities on the black market if the offered reward is deemed insufficient. Red Teaming For a large or rapidly expanding operation, you may wish to consider a Red Team Assessment. 
This approach stands out for its complexity, thoroughness, and element of surprise. In such assessments, your information security specialists are kept in the dark about when, where, and on which systems the test attacks will occur. They will not know which logs to monitor or what precisely to look out for, as the testing team will endeavor to conceal their activities, just as an actual attacker would. Why a Pen Test May Fail Potential downsides of a pen test can include too much interference from the client, restrictions on specific testing actions (intended to prevent damage), and limiting the scope to a very narrow range of systems for evaluation. It is crucial to understand that even the most diligent contractor might not uncover critical or high-level vulnerabilities. However, this does not necessarily mean they have underperformed. Often, it may be the customer who has set conditions for the pen test that make it extremely challenging, if not impossible, to identify any vulnerabilities. Penetration testing is, by nature, a creative process. When a customer restricts the scope of work or the tools available to the contractor, they may inadvertently hinder the effectiveness of the test. This can lead to receiving a report that does not accurately reflect the actual state of their security, wasting both time and money on the service. How Not To Run Pen Tests BAS (breach and attack simulation) systems, which automate the testing and modeling of attacks, along with vulnerability scanners, are tools some might consider sufficient for pen testing. However, this is not entirely accurate. Not all business services can be translated into a machine-readable format, and the verification of business logic has its limitations. Artificial intelligence, while helpful, still falls short of the intelligence and creativity of a human specialist. Therefore, while BAS and scanners are valuable for automating routine checks, they should be integrated as part of a comprehensive penetration testing process rather than being relied upon exclusively. Pen Testing Stages From the perspective of the attacking team, penetration testing typically involves these stages: Planning and reconnaissance: Define test scope and goals and gather intelligence on the target system or network to identify vulnerabilities. Scanning: Use static (code analysis) and dynamic (running code analysis) methods to understand the target's reactions to intrusion attempts. Gaining access: Exploit vulnerabilities using attacks like SQL injection or cross-site scripting (XSS) to understand the potential damage. Maintaining access: Test if the vulnerability allows for prolonged system access, mimicking persistent threats that aim to steal sensitive data. Analysis: Compile findings into a report detailing exploited vulnerabilities, accessed data, undetected duration in the system, and security recommendations. How To Choose a Reliable Penetration Testing Provider When selecting a provider for penetration testing services, it is important to establish a level of trust with the contractor. 
Key factors to consider include:

- The contractor's overall experience and history in providing these services
- Achievements and awards received by specific individuals, teams, or projects within the contractor's organization; recent involvement in CREST is also a notable indicator
- Certifications held by the contractor's team members, as well as licenses for conducting such activities
- Customer testimonials and recommendations, which may also include anonymous feedback
- The contractor's expertise in particular audit areas, with examples of involvement in complex projects, such as those with high-tech companies or process control systems
- The option of arranging small-scale test tasks first, especially if the contractor is relatively unknown in the market

The availability of qualified penetration testing specialists is limited, so it is crucial to prioritize companies for whom pen testing is a primary service. These companies should have a dedicated team of qualified specialists and a separate project manager to oversee pen tests. Opting for a non-specialized company often leads to outsourcing, resulting in unpredictable outcomes. If you consistently use the same pen test provider over the years, especially if your infrastructure remains static or undergoes minimal changes, there is a risk that the contractor's specialists might become complacent or overlook certain aspects. To maintain a high level of scrutiny and fresh perspectives, it is advisable to periodically rotate between different contractors.

Best Penetration Testing Services

1. BreachLock

BreachLock's pen testing service offers human-verified results, DevOps fix guidance, robust client support, and a secure portal for retests. It also provides third-party security certification and thorough, compliance-ready reports.

Benefits:
- Human-verified results with in-depth fix guidance
- Retest-capable client portal, adding service value
- Delivers third-party security certification and detailed reports for compliance
- Strong client support during and post-testing

Drawbacks:
- Somewhat unclear documentation that requires expertise in the field

Clients may prefer BreachLock for its blend of human and tech solutions and focus on detailed, compliance-ready reports.

2. SecureWorks

SecureWorks' penetration testing service is recognized for its comprehensive offerings and high-quality services, which have earned it a strong reputation in the field. They offer personalized solutions and tailor their services to industry-specific standards. While the cost is on the higher side, it is justified by their in-depth expertise and the overall value provided.

Benefits:
- Comprehensive service offerings with strong expertise
- Services are well-tailored for large enterprises
- Focus on long-term regulatory compliance and personalized solutions
- Recognized for high-quality services and strong industry reputation

Drawbacks:
- More expensive compared to some lower-cost options

Clients seeking depth in security expertise and comprehensive, enterprise-level service might find SecureWorks a preferable option, especially for long-term, strategic IT security planning for evolving infrastructure.

3. CrowdStrike

CrowdStrike's penetration testing service offers testing of various IT environment components using real-world threat actor tools, derived from CrowdStrike Threat Intelligence. This approach aims to exploit vulnerabilities to assess the risk and impact on an organization.
Benefits:
- Utilizes real-world threat actor tools for effective vulnerability assessment
- Focuses on testing different IT environment components comprehensively
Drawbacks:
- Focus on larger enterprises
Clients might prefer CrowdStrike for its use of advanced threat intelligence tools and comprehensive testing of diverse IT components, suitable for organizations seeking detailed risk and impact analysis. Conclusion Security analysts predict a rise in the demand for penetration testing services, driven by the rapid digitalization of business operations and by growth in telecommunications, online banking, and social and government services. As new information technologies are adopted, businesses and institutions increasingly focus on identifying security vulnerabilities to prevent hacks and comply with regulatory requirements.
It's one thing to build powerful machine-learning models and another thing to be able to make them useful. A big part of that is being able to build applications that expose their features to end users. Popular examples include ChatGPT, Midjourney, etc. Streamlit is an open-source Python library that makes it easy to build web applications for machine learning and data science. It has a set of rich APIs for visual components, including several chat elements, making it quite convenient to build conversational agents or chatbots, especially when combined with LLMs (Large Language Models). And that's the example for this blog post as well — a Streamlit-based chatbot deployed to a Kubernetes cluster on Amazon EKS. But that's not all! We will use Streamlit with LangChain, which is a framework for developing applications powered by language models. The nice thing about LangChain is that it supports many platforms and LLMs, including Amazon Bedrock (which will be used for our application). A key part of chat applications is the ability to refer to historical conversation(s) — at least within a certain time frame (window). In LangChain, this is referred to as Memory. Just like LLMs, you can plug in different systems to work as the memory component of a LangChain application. This includes Redis, which is a great choice for this use case since it's a high-performance in-memory database with flexible data structures. Combined with Pub/Sub and WebSocket, Redis is already a preferred choice for real-time applications (including chat). This application will use Amazon ElastiCache Serverless for Redis, an option that simplifies cache management and scales instantly. This was announced at re:Invent 2023, so let's explore while it's still fresh! To be honest, the application can be deployed on other compute options such as Amazon ECS, but I figured since it needs to invoke Amazon Bedrock, it's a good opportunity to also cover how to use EKS Pod Identity (also announced at re:Invent 2023!). The GitHub repository for the app is linked in the clone command below. Here is a simplified, high-level diagram: Let's go!! Basic Setup
- Amazon Bedrock: Use the instructions in this blog post to set up and configure Amazon Bedrock.
- EKS cluster: Start by creating an EKS cluster. Point kubectl to the new cluster using aws eks update-kubeconfig --region <cluster_region> --name <cluster_name>
- Create an IAM role: Use the trust policy and IAM permissions from the application GitHub repository.
- EKS Pod Identity Agent configuration: Set up the EKS Pod Identity Agent and associate EKS Pod Identity with the IAM role you created.
- ElastiCache Serverless for Redis: Create a Serverless Redis cache. Make sure it shares the same subnets as the EKS cluster. Once the cluster creation is complete, update the ElastiCache security group to add an inbound rule (TCP port 6379) to allow the application on the EKS cluster to access the ElastiCache cluster.
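Before containerizing anything, it helps to see roughly how these pieces fit together in code. Here is a minimal sketch of such an app, not the exact code from the repository: import paths vary across LangChain versions, and the model ID, environment variable name, and Redis URL format shown here are assumptions.
Python
import os
import uuid

import streamlit as st
from langchain.chains import ConversationChain
from langchain.llms import Bedrock
from langchain.memory import ConversationBufferMemory, RedisChatMessageHistory

# Keep a unique chat/session ID across Streamlit reruns
if "session_id" not in st.session_state:
    st.session_state.session_id = str(uuid.uuid4())

# Chat history persisted in ElastiCache Redis under message_store:<session_id>
history = RedisChatMessageHistory(
    session_id=st.session_state.session_id,
    url=os.environ["REDIS_URL"],  # e.g. rediss://user:password@host:6379 (assumed variable name)
)
memory = ConversationBufferMemory(chat_memory=history)

# Anthropic Claude on Amazon Bedrock as the LLM (model ID is an assumption)
llm = Bedrock(model_id="anthropic.claude-v2")
chain = ConversationChain(llm=llm, memory=memory)

st.title("Streamlit chatbot with Bedrock and Redis memory")
if prompt := st.chat_input("Say something"):
    st.chat_message("user").write(prompt)
    st.chat_message("assistant").write(chain.predict(input=prompt))
The message_store:<session_id> keys created by LangChain's Redis history are exactly what we will look at later with redis-cli.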
Push the Docker Image to ECR and Deploy the App to EKS Clone the GitHub repository: Shell git clone https://github.com/abhirockzz/streamlit-langchain-chatbot-bedrock-redis-memory cd streamlit-langchain-chatbot-bedrock-redis-memory Create an ECR repository: Shell export REPO_NAME=streamlit-chat export REGION=<AWS region> ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text) aws ecr get-login-password --region $REGION | docker login --username AWS --password-stdin $ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com aws ecr create-repository --repository-name $REPO_NAME Create the Docker image and push it to ECR: Shell docker build -t $REPO_NAME . docker tag $REPO_NAME:latest $ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com/$REPO_NAME:latest docker push $ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com/$REPO_NAME:latest Deploy Streamlit Chatbot to EKS Update the app.yaml file: Enter the ECR Docker image info. In the Redis connection string format, enter the ElastiCache username and password along with the endpoint. Deploy the application: Shell kubectl apply -f app.yaml To check logs: kubectl logs -f -l=app=streamlit-chat Start a Conversation! To access the application: Shell kubectl port-forward deployment/streamlit-chat 8080:8501 Navigate to http://localhost:8080 using your browser and start chatting! The application uses the Anthropic Claude model on Amazon Bedrock as the LLM and an ElastiCache Serverless instance to persist the chat messages exchanged during a particular session. Behind the Scenes in ElastiCache Redis To better understand what's going on, you can use redis-cli to access the ElastiCache Redis instance from EC2 (or Cloud9) and introspect the data structure used by LangChain for storing chat history: keys * Don't run keys * in a production Redis instance — this is just for demonstration purposes. You should see a key similar to this — "message_store:d5f8c546-71cd-4c26-bafb-73af13a764a5" (the name will differ in your case). Check its type: type message_store:d5f8c546-71cd-4c26-bafb-73af13a764a5 - you will notice that it's a Redis List. To check the list contents, use the LRANGE command: Shell LRANGE message_store:d5f8c546-71cd-4c26-bafb-73af13a764a5 0 10 You should see a similar output: Shell 1) "{\"type\": \"ai\", \"data\": {\"content\": \" Yes, your name is Abhishek.\", \"additional_kwargs\": {}, \"type\": \"ai\", \"example\": false}}" 2) "{\"type\": \"human\", \"data\": {\"content\": \"Thanks! 
But do you still remember my name?\", \"additional_kwargs\": {}, \"type\": \"human\", \"example\": false}}" 3) "{\"type\": \"ai\", \"data\": {\"content\": \" Cloud computing enables convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort.\", \"additional_kwargs\": {}, \"type\": \"ai\", \"example\": false}}" 4) "{\"type\": \"human\", \"data\": {\"content\": \"Tell me about Cloud computing in one line\", \"additional_kwargs\": {}, \"type\": \"human\", \"example\": false}}" 5) "{\"type\": \"ai\", \"data\": {\"content\": \" Nice to meet you, Abhishek!\", \"additional_kwargs\": {}, \"type\": \"ai\", \"example\": false}}" 6) "{\"type\": \"human\", \"data\": {\"content\": \"Nice, my name is Abhishek\", \"additional_kwargs\": {}, \"type\": \"human\", \"example\": false}}" 7) "{\"type\": \"ai\", \"data\": {\"content\": \" My name is Claude.\", \"additional_kwargs\": {}, \"type\": \"ai\", \"example\": false}}" 8) "{\"type\": \"human\", \"data\": {\"content\": \"Hi what's your name?\", \"additional_kwargs\": {}, \"type\": \"human\", \"example\": false}}" Basically, the Redis memory component for LangChain persists the messages as a List and passes its contents as additional context with every message. Conclusion To be completely honest, I am not a Python developer (I mostly use Go, Java, or sometimes Rust), but I found Streamlit relatively easy to start with, except for some of the session-related nuances. I figured out that for each conversation, the entire Streamlit app is executed (this was a little unexpected coming from a backend dev background). That's when I moved the chat ID (a unique session ID of sorts for each conversation) to the Streamlit session state, and things worked. This is also used as part of the name of the Redis List that stores the conversation (message_store:<session_id>) — each Redis List is mapped to a Streamlit session. I also found the Streamlit component-based approach to be quite intuitive and pretty extensive. I was wondering if there are similar solutions in Go. If you know of something, do let me know. Happy building!
The landscape of software development is undergoing a paradigm shift with the emergence of Generative AI (GenAI) code generation tools. These tools leverage the power of machine learning to automatically generate code, potentially revolutionizing the way software is built. This white paper explores the potential of GenAI in coding, analyzing its impact on developer productivity, code quality, and overall software development workflows. Software development is a complex and time-consuming process, often plagued by bottlenecks and inefficiencies. Developers spend significant time on repetitive tasks like bug fixing, boilerplate code generation, and documentation. GenAI code generation tools offer a compelling solution by automating these tasks, freeing up developers to focus on higher-level problem-solving and innovation. How Does GenAI Code Generation Work? GenAI code generation tools are trained on massive datasets of existing codebases, learning the patterns and relationships between code elements. This allows them to statistically predict the most likely code sequence to complete a given task or fulfill a specific function. Users provide prompts or examples, and the tool generates code that aligns with the provided context. Benefits
- Increased Developer Productivity: GenAI can automate repetitive tasks, freeing developers to focus on more complex and creative aspects of software development. This can lead to significant time savings and increased output.
- Improved Code Quality: GenAI can generate code that adheres to best practices and coding standards, potentially reducing bugs and improving code maintainability.
- Enhanced Collaboration: GenAI can facilitate collaboration by generating code snippets that fulfill shared objectives, aiding team development and reducing communication overhead.
- Democratizing Software Development: GenAI has the potential to lower the barrier to entry for software development, making it more accessible to individuals with less technical expertise.
Challenges and Considerations While GenAI holds immense promise, it's crucial to acknowledge potential challenges and considerations:
- Limited Creativity: GenAI models are inherently data-driven, potentially limiting their ability to generate truly innovative or groundbreaking code.
- Security Concerns: Malicious actors could potentially exploit GenAI tools to generate harmful code or automate cyberattacks.
- Ethical Implications: Biases present in training data could be reflected in generated code, leading to ethical concerns around fairness and discrimination.
- Job Displacement: Concerns exist around GenAI potentially displacing certain developer roles, necessitating workforce adaptation and reskilling initiatives.
Major AI Code Generation Tools Various code-generation tools are available, but the major ones are:
- GitHub Copilot: A popular tool offering code completion and suggestions within various IDEs. This extension for popular IDEs has gained immense popularity due to its seamless integration and wide range of features, including code completion, generation, and translation.
- OpenAI Codex: A powerful code generation model with wide language support and the ability to translate languages and write different kinds of creative content.
- Google AI Codey: A suite of models for code generation, chat assistance, and code completion.
This suite of models from Google AI, incorporating PaLM 2, offers code generation, code completion, and natural language assistance, particularly for data science and machine learning tasks.
- Tabnine: An AI-powered code completion tool with language-specific models and cross-language translation capabilities. Known for its speed and language-specific models, Tabnine provides accurate code completion, context-aware suggestions, and the ability to translate between programming languages.
- Ponicode: A tool focused on generating unit tests to ensure code quality. While specialized in generating unit tests for Python code, Ponicode's focus on ensuring code quality makes it a valuable tool for developers aiming to build robust and reliable software.
*Data collected from Google Bard.
Estimated Usage Percentage These are estimations based on available data and industry insights. Actual usage figures might vary. User adoption within different programming languages and communities can differ significantly. Usage numbers don't necessarily reflect overall tool preference, as developers might use multiple tools interchangeably. The market remains dynamic, and these usage shares could change as new tools emerge and existing ones evolve.
Tool | Estimated Usage Share | Notes
GitHub Copilot | 40-50% | Largest market share due to IDE integration, active development, and wide user base.
OpenAI Codex | 20-30% | Highly accurate and versatile, gaining traction within the developer community.
Tabnine | 15-20% | Free-to-use option with strong performance, attracting a loyal user base.
Google AI Codey (Beta) | 5-10% | Relatively new; focus on data science/ML tasks holds potential for growth.
Ponicode | <5% | Specialized in unit testing for Python; niche user base but valuable for specific needs.
Programming Languages Support All tools support a wide range of popular programming languages. OpenAI Codex offers the most versatility in terms of language support and translation. GitHub Copilot and Tabnine support a broad range of languages but might have limitations with less popular ones. Google AI Codey focuses on data science and ML-related languages. Ponicode is exclusively for Python but provides deep support for unit testing within that language.
Tool | Supported Languages | Notes
GitHub Copilot | Python, JavaScript, TypeScript, Ruby, Java, Go, C++, C#, PHP, Dockerfile, Markdown, and more | Expands support based on user community contributions.
OpenAI Codex | Python, JavaScript, Java, Go, C++, C#, shell scripting, SQL, HTML, CSS, and more | Can translate between languages and learn new ones with additional training.
Tabnine | Python, JavaScript, Java, Go, C++, C#, PHP, Ruby, Rust, Swift, Kotlin, TypeScript, SQL, HTML, CSS, and more | Offers language-specific models for improved accuracy.
Google AI Codey (Beta) | Python, JavaScript, Java, Go, C++, C#, shell scripting, SQL, and more | Focuses on data science and machine learning tasks; supports languages relevant to data analysis.
Ponicode | Python (exclusively) | Specializes in generating unit tests for Python code.
IDE Support As development is ongoing, IDE integrations and plugins have not yet been built for every code generator. GitHub Copilot offers seamless integration with popular IDEs. OpenAI Codex requires specific integration methods but allows customization. Tabnine supports the widest range of editors, promoting flexibility. Google AI Codey is currently limited to Google Cloud Tools and Colab. Ponicode integrates with major Python-focused IDEs.
Tool | Integrated Code Editors | Additional Integration Methods
GitHub Copilot | Visual Studio Code, JetBrains IDEs (IntelliJ IDEA, PyCharm, WebStorm, etc.), Neovim, Visual Studio 2022, Codespaces | None
OpenAI Codex | GitHub Codespaces, JetBrains IDEs (via plugin) | Custom integrations via API; web-based playground for testing
Tabnine | 20+ editors including Visual Studio Code, JetBrains IDEs, Vim, Emacs, Sublime Text, Atom, Spyder, Jupyter Notebook, VS Codespaces, and more | Custom integrations via API
Google AI Codey (Beta) | Google Cloud Tools for VS Code, Google Colab | Limited integration with other platforms
Ponicode | Visual Studio Code, PyCharm, and IntelliJ IDEA | None
Features Support Different tools support different features, as below. All tools offer code suggestion, function generation, and code completion. OpenAI Codex excels in code translation, explanation, and natural language to code capabilities. Google AI Codey focuses on data science and natural language to code. Ponicode uniquely specializes in unit test generation for Python. Other features like bug detection, code formatting, and refactoring are not widely available yet.
Feature | GitHub Copilot | OpenAI Codex | Tabnine | Google AI Codey (Beta) | Ponicode
Code Suggestion | Yes | Yes | Yes | Yes | Yes
Function Generation | Yes | Yes | Yes | Yes | Yes
Code Translation | No | Yes | No | No | No
Code Explanation | No | Yes | No | Limited | No
Code Completion | Yes | Yes | Yes | Yes | Yes
Unit Test Generation | No | No | No | No | Yes
Bug Detection | No | Limited | No | No | No
Code Formatting | No | No | No | No | No
Code Refactoring | No | Limited | No | No | No
Data Science Code | Limited | Limited | Limited | Strong | No
Natural Language to Code | Limited | Strong | Limited | Strong | No
Cost Individual vs. organization pricing plans often offer different features and usage limits. Some tools require additional costs for integration with specific platforms or services. Free trial periods or limited free plans might be available for some tools. Always check the official website or documentation for the latest pricing information and available plans. Choosing the best cost option depends on your budget and usage needs, whether you are an individual developer or part of an organization, and the features and level of support you require.
Tool | Individual | Organization | Notes
GitHub Copilot | $10 USD/month, $100 USD/year | Custom pricing available for organizations with 5+ users |
OpenAI Codex | Pay-per-use via API calls and resources, or through integration costs (e.g., GitHub Codespaces) | Custom pricing available for enterprise licenses | Requires technical setup and management
Tabnine | Free Basic plan with limited features, Pro plan for $49 USD/year | Custom pricing available for teams with additional features and management options |
Google AI Codey (Beta) | Currently in Beta, pricing not yet finalized | Likely tiered pricing models for individuals and organizations based on Google Cloud Tools usage |
Ponicode | Free Community plan with limited features, Personal plan for $5 USD/month or $50 USD/year, Professional plan for $25 USD/month or $250 USD/year | Custom pricing available for enterprise licenses with advanced features and integrations |
Ease Of Use GitHub Copilot and Tabnine generally offer the easiest setup and usage. OpenAI Codex provides more flexibility and power but requires more technical expertise. Google AI Codey's Beta status means its ease of use is still evolving. Ponicode's focus on unit testing for Python makes it easy to adopt for Python developers.
Feature | GitHub Copilot | OpenAI Codex | Tabnine | Google AI Codey (Beta) | Ponicode
Learning Curve | Easy | Moderate | Easy | Moderate | Easy
Configuration | Minimal | High | Minimal | Moderate | Minimal
Integration | Seamless with popular IDEs | Varies (API, Codespaces, custom) | Seamless with most editors | Platform-specific (integrated with Google Cloud Tools) | Integrates with major Python IDEs
Interface | User-friendly and intuitive | Technical and complex | Simple and minimal | Unfamiliar (Beta) | User-friendly and intuitive
Customization | Limited | Extensive | Minimal | Moderate | Limited
Error Handling | Forgiving | Requires user intervention | Forgiving | Beta, error handling not fully tested | Forgiving
Best for Beginners | Yes | No | Yes | No | Yes
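To make the prompt-to-code workflow described earlier a bit more concrete, here is a minimal sketch of a custom API integration, using the OpenAI Python client purely as an illustration; the model name, prompt, and environment setup are assumptions rather than the documented configuration of any tool compared above.
Python
# Minimal prompt-to-code sketch (illustrative only).
# Assumes the OPENAI_API_KEY environment variable is set; the model name is a placeholder.
from openai import OpenAI

client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "system", "content": "You are a coding assistant. Return only code."},
        {"role": "user", "content": "Write a Python function that checks whether a string is a palindrome."},
    ],
)

# The generated code comes back as plain text in the first choice
print(completion.choices[0].message.content)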
When you start writing tests for your code, you'll likely have the feeling — bloody hell, how do I drag this thing into a test? There is code that tests clearly like and code they don't. Apart from checking the correctness of our code, tests also give us hints about how to write it. And it's a good idea to listen. A test executes your code in the simplest possible setting, independent of the larger system it's part of. But if the simplest possible setting is how it's run in our app, and it's impossible to tease out the individual pieces — that's a bad sign. If we're saying — nah, we don't need tests. All the code is already executed in the app — that's a sign that we've created a large slab that is hard to change and maintain. As Uncle Bob put it: "Another word for testable is decoupled." It's been said plenty of times that good architecture gets us good testability. Let's come at this idea from another angle: what hints do our tests give about our architecture? We've already talked about how tests help prevent creeping code rot —now, we'll explore this idea in a particular example. As a side note, we'll talk mostly about tests that developers write themselves — unit tests, the first line of defense. Our example is going to be a primitive Python script that checks the user's IP, determines their region, and tells the current weather in the region (the complete example is available here). We'll write tests for that code and see how it gets improved in the process. Each major step is in a separate branch. Step 1: A Quick and Dirty Version Our first version is bad and untestable. Python def local_weather(): # First, get the IP url = "https://api64.ipify.org?format=json" response = requests.get(url).json() ip_address = response["ip"] # Using the IP, determine the city url = f"https://ipinfo.io/{ip_address}/json" response = requests.get(url).json() city = response["city"] with open("secrets.json", "r", encoding="utf-8") as file: owm_api_key = json.load(file)["openweathermap.org"] # Hit up a weather service for weather in that city url = ( "https://api.openweathermap.org/data/2.5/weather?q={0}&" "units=metric&lang=ru&appid={1}" ).format(city, owm_api_key) weather_data = requests.get(url).json() temperature = weather_data["main"]["temp"] temperature_feels = weather_data["main"]["feels_like"] # If past measurements have already been taken, compare them to current results has_previous = False history = {} history_path = Path("history.json") if history_path.exists(): with open(history_path, "r", encoding="utf-8") as file: history = json.load(file) record = history.get(city) if record is not None: has_previous = True last_date = datetime.fromisoformat(record["when"]) last_temp = record["temp"] last_feels = record["feels"] diff = temperature - last_temp diff_feels = temperature_feels - last_feels # Write down the current result if enough time has passed now = datetime.now() if not has_previous or (now - last_date) > timedelta(hours=6): record = { "when": datetime.now().isoformat(), "temp": temperature, "feels": temperature_feels } history[city] = record with open(history_path, "w", encoding="utf-8") as file: json.dump(history, file) # Print the result msg = ( f"Temperature in {city}: {temperature:.0f} °C\n" f"Feels like {temperature_feels:.0f} °C" ) if has_previous: formatted_date = last_date.strftime("%c") msg += ( f"\nLast measurement taken on {formatted_date}\n" f"Difference since then: {diff:.0f} (feels {diff_feels:.0f})" ) print(msg) (source) Let's not get into why this is bad code; instead, 
let's ask ourselves: how would we test it? Well, right now, we can only write an E2E test: Python def test_local_weather(capsys: pytest.CaptureFixture): local_weather() assert re.match( ( r"^Temperature in .*: -?\d+ °C\n" r"Feels like -?\d+ °C\n" r"Last measurement taken on .*\n" r"Difference since then: -?\d+ \(feels -?\d+\)$" ), capsys.readouterr().out ) (source) This executes most of our code once — so far, so good. But testing is not just about achieving good line coverage. Instead of thinking about lines, it's better to think about behavior — what systems the code manipulates and what the use cases are. So here's what our code does: It calls some external services for data It does some read/write operations to store that data and retrieve previous measurements It generates a message based on the data It shows the message to the user But right now, we can't test any of those things separately because they are all stuffed into one function. In other words, it will be tough to test the different execution paths of our code. For instance, we might want to know what happens if the city provider returns nothing. Even if we've dealt with this case in our code (which we haven't), we'd need to test what happens when the value of city is None. Currently, doing that isn't easy. You could physically travel to a place that the service we use doesn't recognize - and, while fun, this is not a viable long-term testing strategy. You could use a mock. Python's requests-mock library lets you make it so that requests doesn't make an actual request but returns whatever you told it to return. While the second solution is less cumbersome than moving to a different city, it's still problematic because it messes with global states. For instance, we wouldn't be able to execute our tests in parallel (since each changes the behavior of the same requests module). If we want to make code more testable, we first need to break it down into separate functions according to area of responsibility (I/O, app logic, etc.). Step 2: Creating Separate Functions Our main job at this stage is to determine areas of responsibility. Does a piece of code implement the application logic or some form of IO - web, file, or console? 
Here's how we break it down: Python # IO logic: save history of measurements class DatetimeJSONEncoder(json.JSONEncoder): def default(self, o: Any) -> Any: if isinstance(o, datetime): return o.isoformat() elif is_dataclass(o): return asdict(o) return super().default(o) def get_my_ip() -> str: # IO: load IP from HTTP service url = "https://api64.ipify.org?format=json" response = requests.get(url).json() return response["ip"] def get_city_by_ip(ip_address: str) -> str: # IO: load city by IP from HTTP service url = f"https://ipinfo.io/{ip_address}/json" response = requests.get(url).json() return response["city"] def measure_temperature(city: str) -> Measurement: # IO: Load API key from file with open("secrets.json", "r", encoding="utf-8") as file: owm_api_key = json.load(file)["openweathermap.org"] # IO: load measurement from weather service url = ( "https://api.openweathermap.org/data/2.5/weather?q={0}&" "units=metric&lang=ru&appid={1}" ).format(city, owm_api_key) weather_data = requests.get(url).json() temperature = weather_data["main"]["temp"] temperature_feels = weather_data["main"]["feels_like"] return Measurement( city=city, when=datetime.now(), temp=temperature, feels=temperature_feels ) def load_history() -> History: # IO: load history from file history_path = Path("history.json") if history_path.exists(): with open(history_path, "r", encoding="utf-8") as file: history_by_city = json.load(file) return { city: HistoryCityEntry( when=datetime.fromisoformat(record["when"]), temp=record["temp"], feels=record["feels"] ) for city, record in history_by_city.items() } return {} def get_temp_diff(history: History, measurement: Measurement) -> TemperatureDiff|None: # App logic: calculate temperature difference entry = history.get(measurement.city) if entry is not None: return TemperatureDiff( when=entry.when, temp=measurement.temp - entry.temp, feels=measurement.feels - entry.feels ) def save_measurement(history: History, measurement: Measurement, diff: TemperatureDiff|None): # App logic: check if should save the measurement if diff is None or (measurement.when - diff.when) > timedelta(hours=6): # IO: save new measurement to file new_record = HistoryCityEntry( when=measurement.when, temp=measurement.temp, feels=measurement.feels ) history[measurement.city] = new_record history_path = Path("history.json") with open(history_path, "w", encoding="utf-8") as file: json.dump(history, file, cls=DatetimeJSONEncoder) def print_temperature(measurement: Measurement, diff: TemperatureDiff|None): # IO: format and print message to user msg = ( f"Temperature in {measurement.city}: {measurement.temp:.0f} °C\n" f"Feels like {measurement.feels:.0f} °C" ) if diff is not None: last_measurement_time = diff.when.strftime("%c") msg += ( f"\nLast measurement taken on {last_measurement_time}\n" f"Difference since then: {diff.temp:.0f} (feels {diff.feels:.0f})" ) print(msg) def local_weather(): # App logic (Use Case) ip_address = get_my_ip() # IO city = get_city_by_ip(ip_address) # IO measurement = measure_temperature(city) # IO history = load_history() # IO diff = get_temp_diff(history, measurement) # App save_measurement(history, measurement, diff) # App, IO print_temperature(measurement, diff) # IO (source) Notice that we now have a function that represents our use case, the specific scenario in which all the other functions are used: local_weather(). Importantly, this is also part of app logic; it specifies how everything else should work together. 
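For reference, the return values above rely on a handful of small data classes. Here is a plausible sketch of their definitions, with field names inferred from the code; the actual module in the project may differ:
Python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Measurement:
    city: str
    when: datetime
    temp: float
    feels: float


@dataclass
class HistoryCityEntry:
    when: datetime
    temp: float
    feels: float


@dataclass
class TemperatureDiff:
    when: datetime
    temp: float
    feels: float


# Maps a city name to its last saved entry
History = dict[str, HistoryCityEntry]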
These data classes (Measurement, HistoryCityEntry, and TemperatureDiff) were introduced to make the return values of functions less messy; in the project, they live in the new typings module. As a result of the changes, our code has become more cohesive — all stuff inside one function mostly relates to doing one thing. By the way, the principle we've applied here is called the Single-responsibility principle (the "S" from SOLID). Of course, there's still room for improvement — e.g., in measure_temperature(), we do both file IO (read a secret from disk) and web IO (send a request to a service). Let's recap: We wanted to have separate tests for things our code does; that got us thinking about the responsibilities of different areas of our code; by making each piece have just a single responsibility, we've made them testable. So, let's write the tests now. Tests for Step 2 Python @pytest.mark.slow def test_city_of_known_ip(): assert get_city_by_ip("69.193.168.152") == "Astoria" @pytest.mark.fast def test_get_temp_diff_unknown_city(): assert get_temp_diff({}, Measurement( city="New York", when=datetime.now(), temp=10, feels=10 )) is None A couple of things to note here. Our app logic and console output execute an order of magnitude faster than IO, and since our functions are somewhat specialized now, we can differentiate between fast and slow tests. Here, we do it with custom pytest marks (pytest.mark.fast) defined in the project's config file. This is useful, but more on it later. Also, take a look at this test: Python @pytest.mark.fast def test_print_temperature_without_diff(capsys: pytest.CaptureFixture): print_temperature( Measurement( city="My City", when=datetime(2023, 1, 1), temp=21.4, feels=24.5, ), None ) assert re.match( ( r"^Temperature in .*: -?\d+ °C\n" r"Feels like -?\d+ °C$" ), capsys.readouterr().out ) Note that before, we'd have to drag the whole application along if we wanted to check print output, and manipulating output was very cumbersome. Now, we can pass the print_temperature() function whatever we like. Problem: Fragility Our tests for high-level functionality call details of implementations directly. For instance, the E2E test we've written in step 1 (`test_local_weather`) relies on the output being sent to the console. If that detail changes, the test breaks. This isn't a problem for a test written specifically for that detail (like test_print_temperature_without_diff(), which you can find here) — it makes sense we need to change it if the feature has changed. However, our E2E test wasn't written to test the print functionality, nor was it written specifically for testing the provider. But if the provider changes, the test breaks. We might also want to change the implementation of some functions — for instance, break down the measure_temperature() function into two to improve cohesion. A test calling that function would break. All in all, our tests are fragile. If we want to change our code, we also have to rewrite tests for that code - thus, the cost of change is higher. Problem: Dependence This is related to the previous problems. If our tests call the provider directly, then any problem on that provider's end means our tests will crash and won't test the things they were written for. If the IP service is down for a day, then your tests won't be able to execute any code that runs after determining the IP inside local_weather(), and you won't be able to do anything about it. 
If you've got a problem with an internet connection, none of the tests will run at all, even though the code might be fine. Problem: Can’t Test the Use Case On the one hand, our tests do run local_weather(), which is the use case. But on the other, they don't test that function specifically, they just execute everything there is in the application. This means it's difficult to read the results of such a test, and it will take you more time to understand where the failure is localized. Test results should be easy to read. Problem: Excessive Coverage One more problem is that with each test run, the web and persistence functions get called twice: by the E2E test from step 1 and by the more targeted tests from step 2. Excessive coverage isn't great — for one, the services we're using count our calls, so we better not make them willy-nilly. Also, if the project continues to grow, our test base will get too slow. All these problems are related, and to solve them, we need to write a test for our coordinating functions that doesn't invoke the rest of the code. To do that, we'd need test doubles that could substitute for real web services or writing to the disk. And we'd need a way to control what specific calls our functions make. Step 3: Decoupling Dependencies To achieve those things, we have to write functions that don't invoke stuff directly but instead call things passed to them from the outside —i.e., we inject dependencies. In our case, we'll pass functions as variables instead of specifying them directly when calling them. An example of how this is done is presented in step 3: Python def save_measurement( save_city: SaveCityFunction, # the IO call is now passed from the outside measurement: Measurement, diff: TemperatureDiff|None ): """ If enough time has passed since last measurement, save measurement. """ if diff is None or (measurement.when - diff.when) > timedelta(hours=6): new_record = HistoryCityEntry( when=measurement.when, temp=measurement.temp, feels=measurement.feels ) save_city(measurement.city, new_record) In step 2, the save_measurement() function contained both app logic (checking if we should perform the save operation) and IO (actually saving). Now, the IO part is injected. Because of this, we now have more cohesion: the function knows nothing about IO, and its sole responsibility is your app logic. Note that the injected part is an abstraction: we've created a separate type for it, SaveCityFunction, which can be implemented in multiple ways. Because of this, the code has less coupling. The function does not depend directly on an external function; instead, it relies on an abstraction that can be implemented in many different ways. This abstraction that we've injected into the function means we have inverted dependencies: the execution of high-level app logic no longer depends on particular low-level functions from other modules. Instead, both now only refer to abstractions. This approach has plenty of benefits: Reusability and changeability: We can change e. g. the function that provides the IP, and execution will look the same Resistance to code rot: Because the modules are less dependent on each other, changes are more localized, so growing code complexity doesn't impact the cost of change as much And, of course, testability Importantly, we did it all because we wanted to run our app logic in tests without executing the entire application. In fact, why don't we write these tests right now? 
Tests for Step 3 So far, we've applied the new approach to save_measurement() — so let's test it. Dependency injection allows us to write a test double that we're going to use instead of executing actual IO: Python @dataclass class __SaveSpy: calls: int = 0 last_city: str | None = None last_entry: HistoryCityEntry | None = None @pytest.fixture def save_spy(): spy = __SaveSpy() def __save(city, entry): spy.calls += 1 spy.last_city = city spy.last_entry = entry yield __save, spy This double is called a spy; it records any calls made to it, and we can check what it wrote afterward. Now, here's how we've tested save_measurement() with that spy: Python @pytest.fixture def measurement(): yield Measurement( city="New York", when=datetime(2023, 1, 2, 0, 0, 0), temp=8, feels=12, ) @allure.title("save_measurement should save if no previous measurements exist") def test_measurement_with_no_diff_saved(save_spy, measurement): save, spy = save_spy save_measurement(save, measurement, None) assert spy.calls == 1 assert spy.last_city == "New York" assert spy.last_entry == HistoryCityEntry( when=datetime(2023, 1, 2, 0, 0, 0), temp=8, feels=12, ) @allure.title("save_measurement should not save if a recent measurement exists") def test_measurement_with_recent_diff_not_saved(save_spy, measurement): save, spy = save_spy # Less than 6 hours have passed save_measurement(save, measurement, TemperatureDiff( when=datetime(2023, 1, 1, 20, 0, 0), temp=10, feels=10, )) assert not spy.calls @allure.title("save_measurement should save if enough time has passed since last measurement") def test_measurement_with_old_diff_saved(save_spy, measurement): save, spy = save_spy # More than 6 hours have passed save_measurement(save, measurement, TemperatureDiff( when=datetime(2023, 1, 1, 17, 0, 0), temp=-2, feels=2, )) assert spy.calls == 1 assert spy.last_city == "New York" assert spy.last_entry == HistoryCityEntry( when=datetime(2023, 1, 2, 0, 0, 0), temp=8, feels=12, ) (source) Note how much control we've got over save_measurement(). Before, if we wanted to test how it behaves with or without previous measurements, we'd have to manually delete the file with those measurements — yikes. Now, we can simply use a test double. There are plenty of other advantages to such tests, but to fully appreciate them, let's first achieve dependency inversion in our entire code base. Step 4: A Plugin Architecture At this point, our code is completely reborn. 
Here's the central module, app_logic.py: Python def get_temp_diff( last_measurement: HistoryCityEntry | None, new_measurement: Measurement ) -> TemperatureDiff|None: if last_measurement is not None: return TemperatureDiff( when=last_measurement.when, temp=new_measurement.temp - last_measurement.temp, feels=new_measurement.feels - last_measurement.feels ) def save_measurement( save_city: SaveCityFunction, measurement: Measurement, diff: TemperatureDiff|None ): if diff is None or (measurement.when - diff.when) > timedelta(hours=6): new_record = HistoryCityEntry( when=measurement.when, temp=measurement.temp, feels=measurement.feels ) save_city(measurement.city, new_record) # injected IO def local_weather( get_my_ip: GetIPFunction, get_city_by_ip: GetCityFunction, measure_temperature: MeasureTemperatureFunction, load_last_measurement: LoadCityFunction, save_city_measurement: SaveCityFunction, show_temperature: ShowTemperatureFunction, ): # App logic (Use Case) # Low-level dependencies are injected at runtime # Initialization logic is in __init__.py now # Can be tested with dummies, stubs and spies! ip_address = get_my_ip() # injected IO city = get_city_by_ip(ip_address) # injected IO if city is None: raise ValueError("Cannot determine the city") measurement = measure_temperature(city) # injected IO last_measurement = load_last_measurement(city) # injected IO diff = get_temp_diff(last_measurement, measurement) # App save_measurement(save_city_measurement, measurement, diff) # App (with injected IO) show_temperature(measurement, diff) # injected IO (source) Our code is like a Lego now. The functions are assembled when the app is initialized (in __init__.py), and the central module just executes them. As a result, none of the low-level code is referenced in the main module; it's all hidden away in sub-modules (console_io.py, file_io.py, and web_io.py). This is what dependency inversion looks like: the central module only works with abstractions. The specific functions are passed from elsewhere — in our case, the __init__.py module: Python def local_weather( get_my_ip=None, get_city_by_ip=None, measure_temperature=None, load_last_measurement=None, save_city_measurement=None, show_temperature=None, ): # Initialization logic default_load_last_measurement, default_save_city_measurement =\ file_io.initialize_history_io() return app_logic.local_weather( get_my_ip=get_my_ip or web_io.get_my_ip, get_city_by_ip=get_city_by_ip or web_io.get_city_by_ip, measure_temperature=measure_temperature or web_io.init_temperature_service( file_io.load_secret ), load_last_measurement=load_last_measurement or default_load_last_measurement, save_city_measurement=save_city_measurement or default_save_city_measurement, show_temperature=show_temperature or console_io.print_temperature, ) As a side note, initialization is here done with functions (file_io.initialize_history_io() and web_io.init_temperature_service()). We could just as easily have done the same with, say, a WeatherClient class and created an object of that class. It's just that the rest of the code was written in a more functional style, so we decided to keep to functions for consistency. To conclude, we've repeatedly applied dependency inversion through dependency injection on every level until the highest, where the functions are assembled. With this architecture, we've finally fully decoupled all the different areas of responsibility from each other. Now, every function truly does just one thing, and we can write granular tests for them all. 
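One more detail worth spelling out: the injected parameters (GetIPFunction, SaveCityFunction, and so on) are simply named callable types. A plausible sketch of how they could be declared, assuming plain Callable aliases rather than the project's exact typings:
Python
# Hypothetical aliases for the injected "plug points".
from typing import Callable, Optional

from typings import HistoryCityEntry, Measurement, TemperatureDiff  # data classes shown earlier

GetIPFunction = Callable[[], str]
GetCityFunction = Callable[[str], Optional[str]]  # may return None if the city cannot be determined
MeasureTemperatureFunction = Callable[[str], Measurement]
LoadCityFunction = Callable[[str], Optional[HistoryCityEntry]]
SaveCityFunction = Callable[[str, HistoryCityEntry], None]
ShowTemperatureFunction = Callable[[Measurement, Optional[TemperatureDiff]], None]
Any function with a matching signature can be plugged in, which is exactly what the tests exploit with dummies, stubs, and spies.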
Tests for Step 4 Here's the final version of our test base. There are three separate modules here: e2e_test.py : We only need one E2E test because our use case is really simple. We've written that test in step 1. plugin_test.py : Those are tests for low-level functions; useful to have but slow and fragile. We've written them in step 2. unit_test.py : That's where all the hot stuff has been happening. The last module became possible once we introduced dependency injection. There, we've used all kinds of doubles: dummies: They are simple placeholders stubs: They return a hard-coded value spies: We've talked about them earlier They allow us a very high level of control over high-level app logic functions. Before, they would only be executed with the rest of our application. Now, we can do something like this: Exercising Our Use Case Python @allure.title("local_weather should use the city that is passed to it") def test_temperature_of_current_city_is_requested(): def get_ip_stub(): return "1.2.3.4" def get_city_stub(*_): return "New York" captured_city = None def measure_temperature(city): nonlocal captured_city captured_city = city # Execution of local_weather will stop here raise ValueError() def dummy(*_): raise NotImplementedError() # We don't care about most of local_weather's execution, # so we can pass dummies that will never be called with pytest.raises(ValueError): local_weather( get_ip_stub, get_city_stub, measure_temperature, dummy, dummy, dummy ) assert captured_city == "New York" We're testing our use case (the local_weather() function), but we're interested in a very particular aspect of its execution — we want to ensure it uses the correct city. Most of the function is untouched. Here's another example of how much control we have now. As you might remember, our use case should only save a new measurement if more than 6 hours have passed since the last measurement. How do we test that particular piece of behavior? Before, we'd have to manually delete existing measurements - very clumsy. Now, we do this: Python @allure.title("Use case should save measurement if no previous entries exist") def test_new_measurement_is_saved(measurement, history_city_entry): # We don't care about this value: def get_ip_stub(): return "Not used" # Nor this: def get_city_stub(*_): return "Not used" # This is the thing we'll check for: def measure_temperature(*_): return measurement # With this, local_weather will think there is # no last measurement on disk: def last_measurement_stub(*_): return None captured_city = None captured_entry = None # This spy will see what local_weather tries to # write to disk: def save_measurement_spy(city, entry): nonlocal captured_city nonlocal captured_entry captured_city = city captured_entry = entry def show_temperature_stub(*_): pass local_weather( get_ip_stub, get_city_stub, measure_temperature, last_measurement_stub, save_measurement_spy, show_temperature_stub, ) assert captured_city == "New York" assert captured_entry == history_city_entry We can control the execution flow for local_weather() to make it think there's nothing on disk without actually reading anything. Of course, it's also possible to test for opposite behavior - again, without any IO (this is done in test_recent_measurement_is_not_saved()). These and other tests check all the steps of our use case, and with that, we've covered all possible execution paths. A Test Base With Low Coupling The test base we've built has immense advantages over what we had before. 
Execution Speed and Independence Because our code has low coupling and our tests are granular, we can separate the fast and slow tests. In pytest, if you've created the custom "fast" and "slow" marks as we've discussed above, you can run the fast ones with a console command: Shell pytest tests -m "fast" Alternatively, you could do a selective run from Allure TestOps. Pytest custom marks are automatically converted into Allure tags, so just select tests by the "fast" tag and execute them. The fast tests are mainly from the unit_test.py module — it has the "fast" mark applied globally to the entire module. How can we be sure that everything there is fast? Because everything there is decoupled, you can unplug your computer from the internet, and everything will run just fine. The unit tests run far quicker than the tests that have to deal with external resources. We can easily run these fast tests every time we make a change, so if there is a bug, there's much less code to dig through since the last green test run. Longevity Another benefit of those quick tests we've just made is longevity. Unfortunately, throwing away unit tests is something you'll inevitably have to do. In our case, thanks to interface segregation and dependency injection, we're testing a small abstract plug point and not technical implementation details. Such tests are likely to survive longer. Taking the User's Point of View Any test, no matter what level, forces you to look at your code from the outside and see how it might be taken out of the context where it was created to be used somewhere else. With low-level tests, you extract yourself from the local context, but you're still elbow-deep in code. However, if you're testing an API, you're taking the view of a user (even if that user is future you or another programmer). In an ideal world, a test always imitates a user. A public API that doesn't depend on implementation details is a contract. It's a promise to the user: here's a handle; it won't change (within reason). If tests are tied to this API (as our unit tests are), writing them makes you view your code through that contract. You get a better idea of how to structure your application. And if the API is clunky and uncomfortable to use, you'll see that, too. Looking Into the Future The fact that tests force you to consider different usage scenarios also means tests allow you to peek into the future of your code's evolution. Let's compare step 4 (inverted dependencies) with step 2 (where we've just hidden stuff into functions). The major improvement was decoupling, with its many benefits, including lower cost of change. But at first sight, step 2 is simpler, and in Python, simple is better than complex, right? Without tests, the benefits of dependency inversion in our code would only become apparent if we tried to add more stuff to the application. Why would they become apparent? Because we'd get more use cases and need to add other features. That would both expose us to code rot and make us think. The different use cases would make us see the cost of change. Well, writing tests forces you to consider other use cases here and now. This is why the structure we've used to benefit our tests turns out to be super convenient when we need to introduce other changes into code. To show that, let's try to change our city provider and output. 
Step 4 (Continued): Changing Without Modifying New City Provider First, we'll need to write a new function that will call our new provider: Python def get_city_by_ip(ip: str): """Get user's city based on their IP""" url = f"https://geolocation-db.com/json/{ip}?position=true" response = requests.get(url).json() return response["city"] (source) Then, we'll have to call local_weather() (our use case) with that new function in __main__.py: Python local_weather(get_city_by_ip=weather_geolocationdb.get_city_by_ip) Literally just one line. As much as possible, we should add new code, not rewrite what already exists. This is such a great design principle that it has its own name, the open/closed principle. We were led to it because we wanted a test to run our app logic independently of the low-level functions. As a result, our city provider has become a technical detail that can be exchanged at will. Also, remember that these changes don't affect anything in our test base. We can add a new test for the new provider, but the existing test base runs unchanged because we're adding, not modifying. New Output Now, let's change how we show users the weather. At first, we did it with a simple print() call. Then, to make app logic testable, we had to replace that with a function passed as a variable. It might seem like an unnecessary complication we've introduced just for the sake of testability. But what if we add a simple UI and display the message there? It's a primitive Tkinter UI; you can take a look at the code here, but the implementation doesn't matter much right now; it's just a technical detail. The important thing is: what do we have to change in our app logic? Literally nothing at all. Our app is just run from a different place (the Tkinter module) and with a new function for output: Python def local_weather(): weather.local_weather( show_temperature=show_temperature ) This is triggered by a Tkinter button. It's important to understand that how we launch our application's main logic is also a technical detail; it doesn't matter that it's at the top of the stack. From the point of view of architecture, it's periphery. So, again, we're extending, not modifying. Conclusion Testability and SOLID Alright, enough fiddling with the code. What have we learned? Throughout this example, we've been on a loop: We want to write tests But we can't because something is messy in the code So we rewrite it And get many more benefits than just testability We wanted to make our code easier to test, and the SOLID principles helped us here. In particular: Applying the single responsibility principle allowed us to run different behaviors separately, in isolation from the rest of the code. Dependency inversion allowed us to substitute expensive calls with doubles and get a lightning-fast test base. The open/closed principle was an outcome of the previous principles, and it meant that we could add new functionality without changing anything in our existing tests. It's no big surprise why people writing about SOLID mention testability so much: after all, the SOLID principles were formulated by the man who was a major figure in Test-Driven Development. But TDD is a somewhat controversial topic, and we won't get into it here. We aren't advocating for writing tests before writing code (though if you do — that's awesome). As long as you do write tests and listen to testers in your team — your code will be better. Beyond having fewer bugs, it will be better structured. 
Devs need to take part in quality assurance; if the QA department just quietly does its own thing, its efficiency is greatly reduced. As a matter of fact, it has been measured that: "Having automated tests primarily created and maintained either by QA or an outsourced party is not correlated with IT performance." Tests Give Hints on How Well-Structured Your Code Is Let's clarify causation here: we're not saying that testability is at the core of good design. It's the other way around; following engineering best practices has good testability as one of many benefits. What we're saying is writing tests gets you thinking about those practices. Tests make you look at your code from the outside. They make you ponder its changeability and reusability — which you then achieve through best engineering practices. Tests give you hints about improving the coupling and cohesion of your code. If you want to know how well-structured your code is, writing tests is a good test.