Getting Started With Large Language Models
In the rapidly evolving landscape of artificial intelligence, text generation models have emerged as a cornerstone, revolutionizing how we interact with machine learning technologies. Among these models, GPT-4 stands out, showcasing an unprecedented ability to understand and generate human-like text. This article delves into the basics of text generation using GPT-4, providing Python code examples to guide beginners in creating their own AI-driven text generation applications.

Understanding GPT-4

GPT-4, or Generative Pre-trained Transformer 4, represents the latest advancement in OpenAI's series of text generation models. It builds on the success of its predecessors by offering greater depth and a more nuanced understanding of context, making it capable of producing text that closely mimics human writing in various styles and formats. At its core, GPT-4 operates on the principles of deep learning, utilizing a transformer architecture. This architecture enables the model to weight different parts of the input text differently, allowing it to grasp the nuances of language and generate coherent, contextually relevant responses.

Getting Started With GPT-4 and Python

To experiment with GPT-4, you need access to OpenAI's API, which provides a straightforward way to use the model without training it from scratch. The following Python code snippet demonstrates how to use the OpenAI API to generate text with GPT-4:

Python

from openai import OpenAI

# Set the OpenAI API key
client = OpenAI(api_key='your_api_key_goes_here')  # Get your key at https://platform.openai.com/api-keys

response = client.chat.completions.create(
    model="gpt-4-0125-preview",  # The latest GPT-4 preview model, trained with data through the end of 2023
    messages=[{'role': 'user', 'content': "Write a short story about a robot saving Earth from aliens."}],
    max_tokens=250,       # Maximum length of the response
    temperature=0.6,      # Ranges from 0 to 2; lower values => more deterministic, higher values => more random
    top_p=1,              # Ranges from 0 to 1; lower values => narrower selection of candidate tokens
    frequency_penalty=0,  # Discourages the model from repeating the same words or phrases too frequently
    presence_penalty=0)   # Encourages the model to include a diverse range of tokens in the generated text

print(response.choices[0].message.content)

In this example, we use the client.chat.completions.create function to generate text. The model parameter specifies which version of the model to use, with "gpt-4-0125-preview" representing the latest GPT-4 preview, trained on data available up to December 2023. The messages parameter feeds the initial text to the model, serving as the basis for the generated content. Other parameters like max_tokens, temperature, and top_p allow us to control the length and creativity of the output.

Applications and Implications

The applications of GPT-4 extend far beyond simple text generation. Industries ranging from entertainment to customer service find value in its ability to create compelling narratives, generate informative content, and even converse with users in a natural manner. However, as we integrate these models more deeply into our digital experiences, ethical considerations come to the forefront. Issues such as bias, misinformation, and the potential for misuse necessitate a thoughtful approach to deployment and regulation.
Conclusion

GPT-4's capabilities represent a significant leap forward in the field of artificial intelligence, offering tools that can understand and generate human-like text with remarkable accuracy. The Python example provided herein serves as a starting point for exploring the vast potential of text generation models. As we continue to push the boundaries of what AI can achieve, it remains crucial to navigate the ethical landscape with care, ensuring that these technologies augment human creativity and knowledge rather than detract from it.

In summary, GPT-4 not only showcases the power of modern AI but also invites us to reimagine the future of human-computer interaction. With each advancement, we step closer to a world where machines understand not just the words we say but the meaning and emotion behind them, unlocking new possibilities for creativity, efficiency, and understanding.
In the Oxford Dictionary, the word agility is defined as "the ability to move quickly and easily." It is, therefore, understandable that many people relate agility to speed, which I think is unfortunate. I much prefer the description from Sheppard and Young, two academics in the field of sports science, who proposed a new definition of agility within the sports science community as "a rapid whole-body movement with a change of velocity or direction in response to a stimulus" [1].

The term "agility" is often used to describe "change of direction speed." However, there is a distinct difference between the two. Agility involves the ability to react in unpredictable environments. Change of direction speed, on the other hand, focuses purely on maintaining speed as the direction of travel is changed. Maintaining speed while changing direction is usually only possible when each change of direction is known in advance.

Using a sports analogy as an example, in soccer, we can say that a defender's reaction to an attacker's sudden movement is an agility-based movement. The defender has to react based on what the attacker is doing. Compare this to an athlete running in a zig-zag through a course of pre-positioned cones, and the reactive component is missing. There are no impulsive or unpredictable events happening. The athlete is trying to maintain speed while changing direction.

Often, when leaders of organizations want to adopt Agile, they do so for reasons such as "to deliver faster." In this case, they are thinking of agile as a way to enable change of direction speed like the athlete, and not in the sense of agility needed by the soccer defender when faced with the attacker. This may explain why agile ways of working do not always live up to expectations, even though more and more companies are adopting them.

Sticking with the sports analogy, the athlete running through the cones tries to reach each one as quickly as possible and then runs in the direction of the next until the end of the course. This works as a metaphor for defining the scope of a project and having teams work in short iterations in which they deliver each planned feature as quickly as possible and then move on to the next. This may be fine in a predictable environment, where the plan does not need to change, where requirements stay fixed, where the market stays the same, and where customer behaviors are well understood and set in stone. In many environments, however, change is a constant: customer expectations and behavior, market trends, the actions of competitors, and more. These are the VUCA environments (volatile, uncertain, complex, and ambiguous) where there is a need to react or, to put it another way, where agility is needed.

Frameworks such as Scrum are meant to support agility. Sprints are short planning horizons, and the artifacts and events in Scrum are there to provide greater transparency and opportunities to inspect and adapt based on early feedback, changes in conditions, and new information. They give an opportunity to pivot and react in a timely manner. However, Scrum is unfortunately often misunderstood as a mechanism to deliver with speed. Focusing only on speed and delivery and not investing in the practices that enable true agility is likely to actually slow things down in the long run. When the focus is only on speed, it becomes harder to maintain that speed, let alone to increase it, and any semblance of agility is a fantasy.
Let me talk about a pattern that I see again and again as an example. Company A has a goal to build an e-commerce site through which they can sell their goods. Their first slice of functionality is delivered in a 1-month Sprint and consists of a static catalog page in which the company can upload its products with a short description. The first delivery is received by happy and excited stakeholders who are hungry for more. The team keeps on building, and the product keeps growing. Stakeholders make more requests, and the team works harder to keep up and deliver at the pace that stakeholders have come to expect from them.

The team does not have time to invest in improving their practices. Manual regression testing becomes a bigger and bigger burden and is more challenging to complete within a Sprint. The codebase becomes more complex and brittle. The more bloated the product becomes, the more the team struggles to deliver at the same pace. To try to meet expectations, the team begins to cut corners. They end up carrying testing work over from one Sprint to the next. And there is no time for validating whether what is being delivered actually produces value. This is just as well, as no analytics have been set up anyway.

In the meantime, whilst the team is so busy trying to build new features and carrying out manual integration and regression testing, they do not have time either to look at improvements to their original build pipeline or to build in any automation. A release to production involves several hours of downtime, so it has to be done overnight and manually. To make matters worse, the market has been changing. The sales team has made deals with new suppliers, but this means further customizations to the site are needed for their products. Finally, the company has pushed for the platform to be available in different timezones, so the downtime for a release is a big problem and must be minimized; as a result, releases are only allowed to happen once every six months.

Progress comes to a standstill. The product is riddled with technical debt. The team has lost the ability to get early feedback and the ability to react to what their attackers are doing, i.e., customers' changing needs, competitors' actions, changing market conditions, etc.

Just implementing the mechanics of a framework like Scrum does not ensure agility and does not automatically lead to a more beneficial way of working. The Agile Manifesto includes principles such as "continuous delivery of valuable software," "continuous attention to technical excellence," and "at regular intervals, the team reflects on how to become more effective." Following these principles is a greater enabler of agility than following the Scrum framework alone. One effective way of enabling greater agility is to complement something like Scrum with agile engineering practices to get those expected benefits that organizations are looking for with agile.

Over the years, I have encountered many Agile adoptions at companies where a lot of passion, energy, focus, and budget went into training and coaching people in implementing certain frameworks such as Scrum. However, what I do not encounter so much are companies spending the same amount of passion, energy, focus, and budget on implementing good Agile engineering practices such as Pair Programming, Test-Driven Development, Continuous Integration, and Continuous Delivery.
When challenged on this, responses are typically something like "We don't have time for that now," or "Let's just deliver this next release, and we'll look at doing it at a later date when we have time." And, of course, that time usually never arrives.

Now, of course, something like Scrum can be used without using any Agile engineering practices. After all, Scrum is just a framework. However, without good engineering practices and without striving for technical excellence, a Scrum team developing software will only get so far. Agile engineering practices are essential to achieving agility, as they shorten validation cycles and provide early feedback. For example, pairing gives real-time validation and feedback on quality, as does having proper Continuous Integration in place.

Many of the Agile engineering practices are viewed as daunting and expensive to implement, and as something that will get in the way of delivery. However, I would argue that investing in engineering practices that help to build quality in or allow tasks to be automated, for example, enables sustainable development in the long run. While investing in Agile engineering practices may seem to slow things down in the short term, the aim is to be able to maintain or actually increase speed into the future while still retaining the ability to pivot.

To me, it is an obvious choice to invest in implementing Agile engineering practices, but surprisingly, many companies do not. Instead, they choose to sacrifice long-term sustainability and quality for short-term speed. Creating a shared understanding of the challenges that teams face, and of the trade-off between short-term speed and the problems that arise when good engineering practices are not in place, can help to start a conversation. It is important that everyone, including developers and a team's stakeholders, understands the importance of investing in good Agile engineering practices and the impact on agility of not doing so. These investments can also be thought of as experiments — trying out or investigating a certain practice for a few iterations to see how it might help can be a way to get started and make it less daunting.

Either way, it is questionable whether an Agile process without the underlying agile practices applied can be agile at all. For sustainability, robustness, quality, customer service, fitness for purpose, and true agility in software development teams, continuous investment in Agile engineering practices is essential.

[1] J. M. Sheppard & W. B. Young (2006). Agility literature review: Classifications, training and testing. Journal of Sports Sciences, 24:9, 919-932. DOI: 10.1080/02640410500457109
MongoDB is one of the most reliable and robust document-oriented NoSQL databases. It allows developers to provide feature-rich applications and services with various modern built-in functionalities, like machine learning, streaming, full-text search, etc. While not a classical relational database, MongoDB is nevertheless used by a wide range of different business sectors, and its use cases cover all kinds of architecture scenarios and data types.

Document-oriented databases are inherently different from traditional relational ones, where data are stored in tables and a single entity might be spread across several such tables. In contrast, document databases store data in separate, unrelated collections, which eliminates the intrinsic heaviness of the relational model. However, given that real-world domain models are never so simplistic as to consist of unrelated, separate entities, document databases (including MongoDB) provide several ways to define multi-collection connections similar to classical database relationships, but much lighter, more economical, and more efficient.

Quarkus, the "supersonic and subatomic" Java stack, is the new kid on the block that the most trendy and influential developers are desperately grabbing and fighting over. Its modern cloud-native facilities, its design (compliant with best-of-breed standard libraries), and its ability to build native executables have seduced Java developers, architects, engineers, and software designers for a couple of years.

We cannot go into further details here of either MongoDB or Quarkus: the reader interested in learning more is invited to check the documentation on the official MongoDB website or Quarkus website. What we are trying to achieve here is to implement a relatively complex use case consisting of CRUDing a customer-order-product domain model using Quarkus and its MongoDB extension. In an attempt to provide a real-world inspired solution, we're trying to avoid simplistic and caricatural examples based on a zero-connections, single-entity model (there are dozens nowadays). So, here we go!

The Domain Model

The diagram below shows our customer-order-product domain model:

As you can see, the central document of the model is Order, stored in a dedicated collection named Orders. An Order is an aggregate of OrderItem documents, each of which points to its associated Product. An Order document also references the Customer who placed it. In Java, this is implemented as follows:

Java

@MongoEntity(database = "mdb", collection = "Customers")
public class Customer {
    @BsonId
    private Long id;
    private String firstName, lastName;
    private InternetAddress email;
    private Set<Address> addresses;
    ...
}

The code above shows a fragment of the Customer class. This is a POJO (Plain Old Java Object) annotated with the @MongoEntity annotation, whose parameters define the database name and the collection name. The @BsonId annotation is used to configure the document's unique identifier. While the most common use case is to implement the document's identifier as an instance of the ObjectID class, this would introduce a useless tight coupling between the MongoDB-specific classes and our document. The other properties are the customer's first and last name, the email address, and a set of postal addresses. Let's look now at the Order document.
Java

@MongoEntity(database = "mdb", collection = "Orders")
public class Order {
    @BsonId
    private Long id;
    private DBRef customer;
    private Address shippingAddress;
    private Address billingAddress;
    private Set<DBRef> orderItemSet = new HashSet<>();
    ...
}

Here we need to create an association between an order and the customer who placed it. We could have embedded the associated Customer document in our Order document, but this would have been a poor design because it would have redundantly defined the same object twice. We need to use a reference to the associated Customer document, and we do this using the DBRef class. The same thing happens for the set of associated order items where, instead of embedding the documents, we use a set of references.

The rest of our domain model is quite similar and based on the same normalization ideas; for example, the OrderItem document:

Java

@MongoEntity(database = "mdb", collection = "OrderItems")
public class OrderItem {
    @BsonId
    private Long id;
    private DBRef product;
    private BigDecimal price;
    private int amount;
    ...
}

We need to reference the product that is the subject of the current order item. Last but not least, we have the Product document:

Java

@MongoEntity(database = "mdb", collection = "Products")
public class Product {
    @BsonId
    private Long id;
    private String name, description;
    private BigDecimal price;
    private Map<String, String> attributes = new HashMap<>();
    ...
}

That's pretty much all as far as our domain model is concerned. There are, however, some additional packages that we need to look at: serializers and codecs.

In order to be exchanged on the wire, all our objects, be they business or purely technical ones, have to be serialized and deserialized. These operations are the responsibility of specially designated components called serializers/deserializers. As we have seen, we're using the DBRef type in order to define the association between different collections. Like any other object, a DBRef instance should be able to be serialized/deserialized. The MongoDB driver provides serializers/deserializers for the majority of the data types supposed to be used in the most common cases. However, for some reason, it doesn't provide serializers/deserializers for the DBRef type. Hence, we need to implement our own, and this is what the serializers package does. Let's look at these classes:

Java

public class DBRefSerializer extends StdSerializer<DBRef> {

    public DBRefSerializer() {
        this(null);
    }

    protected DBRefSerializer(Class<DBRef> dbrefClass) {
        super(dbrefClass);
    }

    @Override
    public void serialize(DBRef dbRef, JsonGenerator jsonGenerator, SerializerProvider serializerProvider) throws IOException {
        if (dbRef != null) {
            jsonGenerator.writeStartObject();
            jsonGenerator.writeStringField("id", (String) dbRef.getId());
            jsonGenerator.writeStringField("collectionName", dbRef.getCollectionName());
            jsonGenerator.writeStringField("databaseName", dbRef.getDatabaseName());
            jsonGenerator.writeEndObject();
        }
    }
}

This is our DBRef serializer and, as you can see, it's a Jackson serializer. This is because the quarkus-mongodb-panache extension that we're using here relies on Jackson. Perhaps, in a future release, JSON-B will be used but, for now, we're stuck with Jackson. It extends the StdSerializer class as usual and serializes its associated DBRef object by using the JSON generator, passed as an input argument, to write on the output stream the DBRef components, i.e., the object ID, the collection name, and the database name.
For more information concerning the DBRef structure, please see the MongoDB documentation. The deserializer performs the complementary operation, as shown below:

Java

public class DBRefDeserializer extends StdDeserializer<DBRef> {

    public DBRefDeserializer() {
        this(null);
    }

    public DBRefDeserializer(Class<DBRef> dbrefClass) {
        super(dbrefClass);
    }

    @Override
    public DBRef deserialize(JsonParser jsonParser, DeserializationContext deserializationContext) throws IOException, JacksonException {
        JsonNode node = jsonParser.getCodec().readTree(jsonParser);
        return new DBRef(node.findValue("databaseName").asText(), node.findValue("collectionName").asText(), node.findValue("id").asText());
    }
}

This is pretty much all that may be said as far as the serializers/deserializers are concerned. Let's move on to see what the codecs package brings us.

Java objects are stored in a MongoDB database using the BSON (Binary JSON) format. In order to store information, the MongoDB driver needs the ability to map Java objects to their associated BSON representation. It does that by means of the Codec interface, which contains the required abstract methods for mapping Java objects to BSON and the other way around. By implementing this interface, one can define the conversion logic between Java and BSON, and conversely. The MongoDB driver includes the required Codec implementations for the most common types but again, for some reason, when it comes to DBRef, this implementation is only a dummy one, which raises UnsupportedOperationException. Having contacted the MongoDB driver implementers, I didn't succeed in finding any other solution than implementing my own Codec mapper, as shown by the class DocstoreDBRefCodec. For brevity reasons, we won't reproduce this class' source code here (a hedged sketch of what such a codec might look like is included at the end of this article).

Once our dedicated Codec is implemented, we need to register it with the MongoDB driver, such that it is used when mapping DBRef types to Java objects and conversely. In order to do that, we need to implement the CodecProvider interface which, as shown by the class DocstoreDBRefCodecProvider, returns via its get() method the concrete codec responsible for performing the mapping; i.e., in our case, DocstoreDBRefCodec. And that's all we need to do here, as Quarkus will automatically discover and use our customized CodecProvider implementation. Please have a look at these classes to see and understand how things are done.

The Data Repositories

Quarkus Panache greatly simplifies the data persistence process by supporting both the active record and the repository design patterns. Here, we'll be using the second one. As opposed to similar persistence stacks, Panache relies on compile-time bytecode enhancement of the entities. It includes an annotation processor that automatically performs these enhancements. All that this annotation processor needs in order to do its enhancement job is a class like the one below:

Java

@ApplicationScoped
public class CustomerRepository implements PanacheMongoRepositoryBase<Customer, Long> {}

The code above is all that you need in order to define a complete service able to persist Customer document instances. Your class needs to implement the PanacheMongoRepositoryBase interface and parameterize it with your object ID type, in our case a Long.
The Panache annotation processor will generate all the endpoints required to perform the most common CRUD operations, including but not limited to saving, updating, deleting, querying, paging, sorting, transaction handling, etc. All these details are fully explained here. Another possibility is to extend PanacheMongoRepository instead of PanacheMongoRepositoryBase and to use the provided ObjectID keys instead of customizing them as Long, as we did in our example. Whether you choose the first or the second alternative is only a matter of preference.

The REST API

In order for our Panache-generated persistence service to become effective, we need to expose it through a REST API. In the most common case, we have to manually craft this API, together with its implementation, consisting of the full set of required REST endpoints. This tedious and repetitive operation can be avoided by using the quarkus-mongodb-rest-data-panache extension, whose annotation processor is able to automatically generate the required REST endpoints out of interfaces having the following pattern:

Java

public interface CustomerResource extends PanacheMongoRepositoryResource<CustomerRepository, Customer, Long> {}

Believe it or not, this is all you need to generate a full REST API implementation with all the endpoints required to invoke the persistence service generated previously by the mongodb-panache extension annotation processor.

Now we are ready to build our REST API as a Quarkus microservice. We chose to build this microservice as a Docker image, using the quarkus-container-image-jib extension, by simply including the following Maven dependency:

XML

<dependency>
  <groupId>io.quarkus</groupId>
  <artifactId>quarkus-container-image-jib</artifactId>
</dependency>

The quarkus-maven-plugin will create a local Docker image to run our microservice. The parameters of this Docker image are defined in the application.properties file, as follows:

Properties files

quarkus.container-image.build=true
quarkus.container-image.group=quarkus-nosql-tests
quarkus.container-image.name=docstore-mongodb
quarkus.mongodb.connection-string = mongodb://admin:admin@mongo:27017
quarkus.mongodb.database = mdb
quarkus.swagger-ui.always-include=true
quarkus.jib.jvm-entrypoint=/opt/jboss/container/java/run/run-java.sh

Here we define the name of the newly created Docker image as quarkus-nosql-tests/docstore-mongodb. This is the concatenation of the parameters quarkus.container-image.group and quarkus.container-image.name separated by a "/". The property quarkus.container-image.build having the value true instructs the Quarkus plugin to bind the build operation to the package phase of Maven. This way, by simply executing a mvn package command, we generate a Docker image able to run our microservice. This may be tested by running the docker images command. The property named quarkus.jib.jvm-entrypoint defines the command to be run by the newly generated Docker image. quarkus-run.jar is the Quarkus microservice standard startup file used when the base image is ubi8/openjdk-17-runtime, as in our case. Other properties are quarkus.mongodb.connection-string and quarkus.mongodb.database = mdb, which define the MongoDB database connection string and the name of the database. Last but not least, the property quarkus.swagger-ui.always-include includes the Swagger UI interface in our microservice space such that we can test it easily. Let's see now how to run and test the whole thing.
Running and Testing Our Microservices

Now that we have looked at the details of our implementation, let's see how to run and test it. We chose to do it using the docker-compose utility. Here is the associated docker-compose.yml file:

YAML

version: "3.7"
services:
  mongo:
    image: mongo
    environment:
      MONGO_INITDB_ROOT_USERNAME: admin
      MONGO_INITDB_ROOT_PASSWORD: admin
      MONGO_INITDB_DATABASE: mdb
    hostname: mongo
    container_name: mongo
    ports:
      - "27017:27017"
    volumes:
      - ./mongo-init/:/docker-entrypoint-initdb.d/:ro
  mongo-express:
    image: mongo-express
    depends_on:
      - mongo
    hostname: mongo-express
    container_name: mongo-express
    links:
      - mongo:mongo
    ports:
      - 8081:8081
    environment:
      ME_CONFIG_MONGODB_ADMINUSERNAME: admin
      ME_CONFIG_MONGODB_ADMINPASSWORD: admin
      ME_CONFIG_MONGODB_URL: mongodb://admin:admin@mongo:27017/
  docstore:
    image: quarkus-nosql-tests/docstore-mongodb:1.0-SNAPSHOT
    depends_on:
      - mongo
      - mongo-express
    hostname: docstore
    container_name: docstore
    links:
      - mongo:mongo
      - mongo-express:mongo-express
    ports:
      - "8080:8080"
      - "5005:5005"
    environment:
      JAVA_DEBUG: "true"
      JAVA_APP_DIR: /home/jboss
      JAVA_APP_JAR: quarkus-run.jar

This file instructs the docker-compose utility to run three services:

A service named mongo running the MongoDB 7 database
A service named mongo-express running the MongoDB administrative UI
A service named docstore running our Quarkus microservice

We should note that the mongo service uses an initialization script mounted on the docker-entrypoint-initdb.d directory of the container. This initialization script creates the MongoDB database named mdb such that it can be used by the microservices.

JavaScript

db = db.getSiblingDB(process.env.MONGO_INITDB_ROOT_USERNAME);
db.auth(
  process.env.MONGO_INITDB_ROOT_USERNAME,
  process.env.MONGO_INITDB_ROOT_PASSWORD,
);
db = db.getSiblingDB(process.env.MONGO_INITDB_DATABASE);
db.createUser({
  user: "nicolas",
  pwd: "password1",
  roles: [{ role: "dbOwner", db: "mdb" }]
});
db.createCollection("Customers");
db.createCollection("Products");
db.createCollection("Orders");
db.createCollection("OrderItems");

This initialization JavaScript creates a user named nicolas and a new database named mdb. The user has administrative privileges on the database. Four new collections, respectively named Customers, Products, Orders, and OrderItems, are created as well.

In order to test the microservices, proceed as follows:

Clone the associated GitHub repository: $ git clone https://github.com/nicolasduminil/docstore.git
Go to the project: $ cd docstore
Build the project: $ mvn clean install
Check that all the required Docker containers are running: $ docker ps

CONTAINER ID   IMAGE                                               COMMAND                  CREATED         STATUS         PORTS                                                                                             NAMES
7882102d404d   quarkus-nosql-tests/docstore-mongodb:1.0-SNAPSHOT   "/opt/jboss/containe…"   8 seconds ago   Up 6 seconds   0.0.0.0:5005->5005/tcp, :::5005->5005/tcp, 0.0.0.0:8080->8080/tcp, :::8080->8080/tcp, 8443/tcp   docstore
786fa4fd39d6   mongo-express                                       "/sbin/tini -- /dock…"   8 seconds ago   Up 7 seconds   0.0.0.0:8081->8081/tcp, :::8081->8081/tcp                                                         mongo-express
2e850e3233dd   mongo                                               "docker-entrypoint.s…"   9 seconds ago   Up 7 seconds   0.0.0.0:27017->27017/tcp, :::27017->27017/tcp                                                     mongo

Run the integration tests: $ mvn -DskipTests=false failsafe:integration-test

This last command will run all the integration tests, which should succeed. These integration tests are implemented using the REST Assured library.
The listing below shows one of these integration tests, located in the docstore-domain project:

Java

@QuarkusIntegrationTest
@TestMethodOrder(MethodOrderer.OrderAnnotation.class)
public class CustomerResourceIT {

    private static Customer customer;

    @BeforeAll
    public static void beforeAll() throws AddressException {
        customer = new Customer("John", "Doe", new InternetAddress("john.doe@gmail.com"));
        customer.addAddress(new Address("Gebhard-Gerber-Allee 8", "Kornwestheim", "Germany"));
        customer.setId(10L);
    }

    @Test
    @Order(10)
    public void testCreateCustomerShouldSucceed() {
        given()
            .header("Content-type", "application/json")
            .and().body(customer)
            .when().post("/customer")
            .then()
            .statusCode(HttpStatus.SC_CREATED);
    }

    @Test
    @Order(20)
    public void testGetCustomerShouldSucceed() {
        assertThat(given()
            .header("Content-type", "application/json")
            .when().get("/customer")
            .then()
            .statusCode(HttpStatus.SC_OK)
            .extract().body().jsonPath().getString("firstName[0]")).isEqualTo("John");
    }

    @Test
    @Order(30)
    public void testUpdateCustomerShouldSucceed() {
        customer.setFirstName("Jane");
        given()
            .header("Content-type", "application/json")
            .and().body(customer)
            .when().pathParam("id", customer.getId()).put("/customer/{id}")
            .then()
            .statusCode(HttpStatus.SC_NO_CONTENT);
    }

    @Test
    @Order(40)
    public void testGetSingleCustomerShouldSucceed() {
        assertThat(given()
            .header("Content-type", "application/json")
            .when().pathParam("id", customer.getId()).get("/customer/{id}")
            .then()
            .statusCode(HttpStatus.SC_OK)
            .extract().body().jsonPath().getString("firstName")).isEqualTo("Jane");
    }

    @Test
    @Order(50)
    public void testDeleteCustomerShouldSucceed() {
        given()
            .header("Content-type", "application/json")
            .when().pathParam("id", customer.getId()).delete("/customer/{id}")
            .then()
            .statusCode(HttpStatus.SC_NO_CONTENT);
    }

    @Test
    @Order(60)
    public void testGetSingleCustomerShouldFail() {
        given()
            .header("Content-type", "application/json")
            .when().pathParam("id", customer.getId()).get("/customer/{id}")
            .then()
            .statusCode(HttpStatus.SC_NOT_FOUND);
    }
}

You can also use the Swagger UI interface for testing purposes by pointing your preferred browser at http://localhost:8080/q/swagger-ui. Then, in order to test the endpoints, you can use the payloads in the JSON files located in the src/resources/data directory of the docstore-api project. You can also use the MongoDB administrative UI by going to http://localhost:8081 and authenticating yourself with the default credentials (admin/pass). You can find the project source code in my GitHub repository. Enjoy!
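A final note on the codecs discussed above: since the DocstoreDBRefCodec and DocstoreDBRefCodecProvider sources are not reproduced in this article, the listing below is only a hedged, illustrative sketch of what such a codec pair could look like, based on the MongoDB driver's Codec and CodecProvider interfaces. The class names and the BSON field names are assumptions chosen for illustration, not the article's actual implementation.

Java

import com.mongodb.DBRef;
import org.bson.BsonReader;
import org.bson.BsonWriter;
import org.bson.codecs.Codec;
import org.bson.codecs.DecoderContext;
import org.bson.codecs.EncoderContext;
import org.bson.codecs.configuration.CodecProvider;
import org.bson.codecs.configuration.CodecRegistry;

// Illustrative codec: writes a DBRef as a small document holding its three components.
class IllustrativeDBRefCodec implements Codec<DBRef> {

    @Override
    public void encode(BsonWriter writer, DBRef value, EncoderContext encoderContext) {
        writer.writeStartDocument();
        writer.writeString("databaseName", value.getDatabaseName());
        writer.writeString("collectionName", value.getCollectionName());
        writer.writeString("id", value.getId().toString());
        writer.writeEndDocument();
    }

    @Override
    public DBRef decode(BsonReader reader, DecoderContext decoderContext) {
        // Read the three components back, in the same order they were written
        reader.readStartDocument();
        String databaseName = reader.readString("databaseName");
        String collectionName = reader.readString("collectionName");
        String id = reader.readString("id");
        reader.readEndDocument();
        return new DBRef(databaseName, collectionName, id);
    }

    @Override
    public Class<DBRef> getEncoderClass() {
        return DBRef.class;
    }
}

// Illustrative provider: tells the driver which codec handles the DBRef type.
class IllustrativeDBRefCodecProvider implements CodecProvider {

    @Override
    @SuppressWarnings("unchecked")
    public <T> Codec<T> get(Class<T> clazz, CodecRegistry registry) {
        if (DBRef.class.isAssignableFrom(clazz)) {
            return (Codec<T>) new IllustrativeDBRefCodec();
        }
        return null; // let the registry fall back to other providers
    }
}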
Unit testing has become a standard part of development. Many tools can be utilized for it in many different ways. This article demonstrates a couple of hints or, let's say, best practices that work well for me.

In This Article, You Will Learn

How to write clean and readable unit tests with the JUnit and AssertJ frameworks
How to avoid false positive tests in some cases
What to avoid when writing unit tests

Don't Overuse NPE Checks

We all tend to avoid NullPointerException as much as possible in the main code because it can lead to ugly consequences. However, I believe that avoiding NPE is not our main concern in tests. Our goal is to verify the behavior of a tested component in a clean, readable, and reliable way.

Bad Practice

Many times in the past, I've used the isNotNull assertion even when it wasn't needed, like in the example below:

Java

@Test
public void getMessage() {
    assertThat(service).isNotNull();
    assertThat(service.getMessage()).isEqualTo("Hello world!");
}

This test produces errors like this:

Plain Text

java.lang.AssertionError:
Expecting actual not to be null
	at com.github.aha.poc.junit.spring.StandardSpringTest.test(StandardSpringTest.java:19)

Good Practice

Even though the additional isNotNull assertion is not really harmful, it should be avoided for the following reasons:

It doesn't add any additional value. It's just more code to read and maintain.
The test fails anyway when service is null, and we see the real root cause of the failure. The test still fulfills its purpose.
The produced error message is even better with the AssertJ assertion.

See the modified test assertion below.

Java

@Test
public void getMessage() {
    assertThat(service.getMessage()).isEqualTo("Hello world!");
}

The modified test produces an error like this:

Plain Text

java.lang.NullPointerException: Cannot invoke "com.github.aha.poc.junit.spring.HelloService.getMessage()" because "this.service" is null
	at com.github.aha.poc.junit.spring.StandardSpringTest.test(StandardSpringTest.java:19)

Note: The example can be found in SimpleSpringTest.

Assert Values and Not the Result

From time to time, we write a correct test, but in a "bad" way. It means the test works exactly as intended and verifies our component, but the failure isn't providing enough information. Therefore, our goal is to assert the value and not the comparison result.

Bad Practice

Let's see a couple of such bad tests:

Java

// #1
assertThat(argument.contains("o")).isTrue();

// #2
var result = "Welcome to JDK 10";
assertThat(result instanceof String).isTrue();

// #3
assertThat("".isBlank()).isTrue();

// #4
Optional<Method> testMethod = testInfo.getTestMethod();
assertThat(testMethod.isPresent()).isTrue();

Some errors from the tests above are shown below.

Plain Text

#1
Expecting value to be true but was false
	at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
	at com.github.aha.poc.junit5.params.SimpleParamTests.stringTest(SimpleParamTests.java:23)

#3
Expecting value to be true but was false
	at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
	at com.github.aha.poc.junit5.ConditionalTests.checkJdk11Feature(ConditionalTests.java:50)

Good Practice

The solution is quite easy with AssertJ and its fluent API.
All the cases mentioned above can be easily rewritten as:

Java

// #1
assertThat(argument).contains("o");

// #2
assertThat(result).isInstanceOf(String.class);

// #3
assertThat("").isBlank();

// #4
assertThat(testMethod).isPresent();

The very same errors as mentioned before provide more value now.

Plain Text

#1
Expecting actual:
  "Hello"
to contain:
  "f"
	at com.github.aha.poc.junit5.params.SimpleParamTests.stringTest(SimpleParamTests.java:23)

#3
Expecting blank but was: "a"
	at com.github.aha.poc.junit5.ConditionalTests.checkJdk11Feature(ConditionalTests.java:50)

Note: The example can be found in SimpleParamTests.

Group Related Assertions Together

Assertion chaining and related code indentation help a lot with test clarity and readability.

Bad Practice

As we write a test, we can end up with a correct, but less readable test. Let's imagine a test where we want to find countries and do these checks:

Count the found countries.
Assert the first entry with several values.

Such a test can look like this example:

Java

@Test
void listCountries() {
    List<Country> result = ...;

    assertThat(result).hasSize(5);
    var country = result.get(0);
    assertThat(country.getName()).isEqualTo("Spain");
    assertThat(country.getCities().stream().map(City::getName)).contains("Barcelona");
}

Good Practice

Even though the previous test is correct, we can improve the readability a lot by grouping the related assertions together. The goal here is to assert result once and write as many chained assertions as needed. See the modified version below.

Java

@Test
void listCountries() {
    List<Country> result = ...;

    assertThat(result)
        .hasSize(5)
        .first()
        .satisfies(c -> {
            assertThat(c.getName()).isEqualTo("Spain");
            assertThat(c.getCities().stream().map(City::getName)).contains("Barcelona");
        });
}

Note: The example can be found in CountryRepositoryOtherTests.

Prevent False Positive Successful Tests

When any assertion method with a ThrowingConsumer argument is used, the argument has to contain assertThat in the consumer as well. Otherwise, the test would pass all the time, even when the comparison fails, which means the test is wrong. The test fails only when an assertion throws a RuntimeException or AssertionError exception. I guess it's clear, but it's easy to forget about it and write a wrong test. It happens to me from time to time.

Bad Practice

Let's imagine we have a couple of country codes and we want to verify that every code satisfies some condition. In our dummy case, we want to assert that every country code contains the "a" character. As you can see, it's nonsense: we have codes in uppercase, but we aren't applying case insensitivity in the assertion.

Java

@Test
void assertValues() throws Exception {
    var countryCodes = List.of("CZ", "AT", "CA");
    assertThat( countryCodes )
        .hasSize(3)
        .allSatisfy(countryCode -> countryCode.contains("a"));
}

Surprisingly, our test passes successfully.

Good Practice

As mentioned at the beginning of this section, our test can be corrected easily with an additional assertThat in the consumer. The correct test should be like this:

Java

@Test
void assertValues() throws Exception {
    var countryCodes = List.of("CZ", "AT", "CA");
    assertThat( countryCodes )
        .hasSize(3)
        .allSatisfy(countryCode -> assertThat( countryCode ).containsIgnoringCase("a"));
}

Now the test fails as expected with the correct error message.
Plain Text

java.lang.AssertionError:
Expecting all elements of:
  ["CZ", "AT", "CA"]
to satisfy given requirements, but these elements did not:

"CZ"
error:
Expecting actual:
  "CZ"
to contain:
  "a"
 (ignoring case)
	at com.github.aha.sat.core.clr.AppleTest.assertValues(AppleTest.java:45)

Chain Assertions

The last hint is not really a practice, but rather a recommendation. The AssertJ fluent API should be utilized in order to create more readable tests.

Non-Chaining Assertions

Let's consider the listLogs test, whose purpose is to test the logging of a component. The goal here is to:

Assert the number of collected logs
Assert the existence of the DEBUG and INFO log messages

Java

@Test
void listLogs() throws Exception {
    ListAppender<ILoggingEvent> logAppender = ...;

    assertThat( logAppender.list ).hasSize(2);
    assertThat( logAppender.list ).anySatisfy(logEntry -> {
        assertThat( logEntry.getLevel() ).isEqualTo(DEBUG);
        assertThat( logEntry.getFormattedMessage() ).startsWith("Initializing Apple");
    });
    assertThat( logAppender.list ).anySatisfy(logEntry -> {
        assertThat( logEntry.getLevel() ).isEqualTo(INFO);
        assertThat( logEntry.getFormattedMessage() ).isEqualTo("Here's Apple runner" );
    });
}

Chaining Assertions

With the mentioned fluent API and chaining, we can change the test this way:

Java

@Test
void listLogs() throws Exception {
    ListAppender<ILoggingEvent> logAppender = ...;

    assertThat( logAppender.list )
        .hasSize(2)
        .anySatisfy(logEntry -> {
            assertThat( logEntry.getLevel() ).isEqualTo(DEBUG);
            assertThat( logEntry.getFormattedMessage() ).startsWith("Initializing Apple");
        })
        .anySatisfy(logEntry -> {
            assertThat( logEntry.getLevel() ).isEqualTo(INFO);
            assertThat( logEntry.getFormattedMessage() ).isEqualTo("Here's Apple runner" );
        });
}

Note: The example can be found in AppleTest.

Summary and Source Code

The AssertJ framework provides a lot of help with its fluent API. In this article, several tips and hints were presented in order to produce clearer and more reliable tests. Please be aware that most of these recommendations are subjective; it depends on personal preferences and code style. The source code used can be found in my repositories:

spring-advanced-training
junit-poc
Think of data pipeline orchestration as the backstage crew of a theater, ensuring every scene flows seamlessly into the next. In the data world, tools like Apache Airflow and AWS Step Functions are the unsung heroes that keep the show running smoothly, especially when you're working with dbt (data build tool) to whip your data into shape and ensure that the right data is available at the right time. Both tools are often used alongside dbt, which has emerged as a powerful tool for transforming data in a warehouse.

In this article, we will introduce dbt, Apache Airflow, and AWS Step Functions and then delve into the pros and cons of using Apache Airflow and AWS Step Functions for data pipeline orchestration involving dbt. Note that dbt has a paid offering, dbt Cloud, and a free open-source version; we are focusing on dbt-core, the free version of dbt.

dbt (Data Build Tool)

dbt-core is an open-source command-line tool that enables data analysts and engineers to transform data in their warehouses more effectively. It allows users to write modular SQL queries, which it then runs on top of the data warehouse in the appropriate order with respect to their dependencies.

Key Features

Version control: It integrates with Git to help track changes, collaborate, and deploy code.
Documentation: Autogenerated documentation and a searchable data catalog are created based on the dbt project.
Modularity: Reusable SQL models can be referenced and combined to build complex transformations.

Airflow vs. AWS Step Functions for dbt Orchestration

Apache Airflow

Apache Airflow is an open-source tool that helps to create, schedule, and monitor workflows. It is used by data engineers and analysts to manage complex data pipelines.

Key Features

Extensibility: Custom operators, executors, and hooks can be written to extend Airflow’s functionality.
Scalability: Offers dynamic pipeline generation and can scale to handle multiple data pipeline workflows.

Example: DAG

Python

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.providers.slack.operators.slack import SlackAPIPostOperator
from datetime import datetime, timedelta

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime.now() - timedelta(days=1),
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

dag = DAG('dbt_daily_job',
          default_args=default_args,
          description='A simple DAG to run dbt jobs',
          schedule_interval=timedelta(days=1))

dbt_run = BashOperator(
    task_id='dbt_run',
    bash_command='dbt build --select sales',
    dag=dag,
)

slack_notify = SlackAPIPostOperator(
    task_id='slack_notify',
    dag=dag,
    # Replace with your actual Slack notification parameters (connection, channel, text, ...)
)

dbt_run >> slack_notify

Pros

Flexibility: Apache Airflow offers unparalleled flexibility with the ability to define custom operators and is not limited to AWS resources.
Community support: A vibrant open-source community actively contributes plugins and operators that provide extended functionality.
Complex workflows: Better suited to complex task dependencies and can manage task orchestration across various systems.

Cons

Operational overhead: Requires management of underlying infrastructure unless managed services like Astronomer or Google Cloud Composer are used.
Learning curve: The rich feature set comes with a complexity that may present a steeper learning curve for some users.
AWS Step Functions

AWS Step Functions is a fully managed service provided by Amazon Web Services that makes it easier to orchestrate microservices, serverless applications, and complex workflows. It uses a state machine model to define and execute workflows, which can consist of various AWS services like Lambda, ECS, SageMaker, and more.

Key Features

Serverless operation: No need to manage infrastructure, as AWS provides a managed service.
Integration with AWS services: Seamless connection to AWS services is supported for complex orchestration.

Example: State Machine CloudFormation Template (Step Function)

YAML

AWSTemplateFormatVersion: '2010-09-09'
Description: State Machine to run a dbt job
Resources:
  DbtStateMachine:
    Type: 'AWS::StepFunctions::StateMachine'
    Properties:
      StateMachineName: DbtStateMachine
      RoleArn: !Sub 'arn:aws:iam::${AWS::AccountId}:role/service-role/StepFunctions-ECSTaskRole'
      DefinitionString: !Sub |
        Comment: "A Step Functions state machine that executes a dbt job using an ECS task."
        StartAt: RunDbtJob
        States:
          RunDbtJob:
            Type: Task
            Resource: "arn:aws:states:::ecs:runTask.sync"
            Parameters:
              Cluster: "arn:aws:ecs:${AWS::Region}:${AWS::AccountId}:cluster/MyECSCluster"
              TaskDefinition: "arn:aws:ecs:${AWS::Region}:${AWS::AccountId}:task-definition/MyDbtTaskDefinition"
              LaunchType: FARGATE
              NetworkConfiguration:
                AwsvpcConfiguration:
                  Subnets:
                    - "subnet-0193156582abfef1"
                    - "subnet-abcjkl0890456789"
                  AssignPublicIp: "ENABLED"
            End: true
Outputs:
  StateMachineArn:
    Description: The ARN of the dbt state machine
    Value: !Ref DbtStateMachine

When using AWS ECS with AWS Fargate to run dbt workflows, while you can define the dbt command in the task definition (MyDbtTaskDefinition), it's also common to create a Docker image that contains not only the dbt environment but also the specific dbt commands you wish to run.

Pros

Fully managed service: AWS manages the scaling and operation under the hood, leading to reduced operational burden.
AWS integration: A natural fit for AWS-centric environments, allowing easy integration of various AWS services.
Reliability: Step Functions provide a high level of reliability and support, backed by the AWS SLA.

Cons

Cost: Pricing might be higher for high-volume workflows compared to running a self-hosted or cloud-provider-managed Airflow instance. Step Functions incur costs based on the number of state transitions.
Lock-in with AWS: Tightly coupled with AWS services, which can be a downside if you're aiming for a cloud-agnostic architecture.
Complexity in handling large workflows: While capable, it can become difficult to manage larger, more complex workflows compared to using Airflow's DAGs. There are limitations on the number of parallel executions of a state machine.
Learning curve: The service also presents a learning curve with specific paradigms, such as the Amazon States Language.
Scheduling: AWS Step Functions needs to rely on other AWS services, like Amazon EventBridge, for scheduling.

Summary

Choosing the right tool for orchestrating dbt workflows comes down to assessing specific features and how they align with a team's needs. The main attributes that inform this decision include customization, cloud alignment, infrastructure flexibility, managed services, and cost considerations.

Customization and Extensibility

Apache Airflow is highly customizable and extends well, allowing teams to create tailored operators and workflows for complex requirements.
Integration With AWS

AWS Step Functions is the clear winner for teams operating solely within AWS, offering deep integration with the broader AWS ecosystem.

Infrastructure Flexibility

Apache Airflow supports a wide array of environments, making it ideal for multi-cloud or on-premises deployments.

Managed Services

Here, it’s a tie. For managed services, teams can opt for Amazon Managed Workflows for Apache Airflow (MWAA) for an AWS-centric approach or a vendor like Astronomer for hosting Airflow in different environments. There are also platforms like Dagster that offer similar features to Airflow and can be managed as well. This category is highly competitive and will come down to the level of integration and vendor preference.

Cost at Scale

Apache Airflow may prove more cost-effective at scale, given its open-source nature and the potential for optimized cloud or on-premises deployment. AWS Step Functions may be more economical at smaller scales or for teams with existing AWS infrastructure.

Conclusion

The choice between Apache Airflow and AWS Step Functions for orchestrating dbt workflows is nuanced. For operations deeply rooted in AWS with a preference for serverless execution and minimal maintenance, AWS Step Functions is the recommended choice. For those requiring robust customizability, diverse infrastructure support, or cost-effective scalability, Apache Airflow—whether self-managed or via a platform like Astronomer or MWAA (AWS-managed)—emerges as the optimal solution.
The cost of services is on everybody’s mind right now, with interest rates rising, economic growth slowing, and organizational budgets increasingly feeling the pinch. But I hear a special edge in people’s voices when it comes to their observability bill, and I don’t think it’s just about the cost of goods sold. I think it’s because people are beginning to correctly intuit that the value they get out of their tooling has become radically decoupled from the price they are paying.

In the happiest cases, the price you pay for your tools is “merely” rising at a rate several times faster than the value you get out of them. But that’s actually the best-case scenario. For an alarming number of people, the value they get actually decreases as their bill goes up.

Observability 1.0 and the Cost Multiplier Effect

Are you familiar with this chestnut? “Observability has three pillars: metrics, logs, and traces.” This isn’t exactly true, but it’s definitely true of a particular generation of tools—one might even say it's definitionally true of a particular generation of tools. Let’s call it “observability 1.0.”

From an evolutionary perspective, you can see how we got here. Everybody has logs… so we spin up a service for log aggregation. But logs are expensive, and everybody wants dashboards… so we buy a metrics tool. Software engineers want to instrument their applications… so we buy an APM tool. We start unbundling the monolith into microservices, and pretty soon, we can’t understand anything without traces… so we buy a tracing tool. The front-end engineers point out that they need sessions and browser data… so we buy a RUM tool. On and on it goes.

Logs, metrics, traces, APM, RUM. You’re now paying to store telemetry five different ways, in five different places, for every single request. And a 5x multiplier is on the modest side of the spectrum, given how many companies pay for multiple overlapping tools in the same category. You may also be collecting:

Profiling data
Product analytics
Business intelligence data
Database monitoring/query profiling tools
Mobile app telemetry
Behavioral analytics
Crash reporting
Language-specific profiling data
Stack traces
CloudWatch or hosting provider metrics
…and so on.

So, how many times are you paying to store data about your user requests? What’s your multiplier? (If you have one consolidated vendor bill, this may require looking at your itemized bill.)

There are many types of tools, each gathering slightly different data for a slightly different use case, but underneath the hood, there are really only three basic data types: metrics, unstructured logs, and structured logs. Each of these has its own distinctive trade-offs when it comes to how much they cost and how much value you can get out of them.

Metrics

Metrics are the great-granddaddy of telemetry formats: tiny, fast, and cheap. A “metric” consists of a single number, often with tags appended. All of the context of the request gets discarded at write time; each individual metric is emitted separately. This means you can never correlate one metric with another from the same request, or select all the metrics for a given request ID, user, or app ID, or ask arbitrary new questions about your metrics data. Metrics-based tools include vendors like Datadog and open-source projects like Prometheus. RUM tools are built on top of metrics to understand browser user sessions; APM tools are built on top of metrics to understand application performance.
When you set up a metrics tool, it generally comes prepopulated with a bunch of basic metrics, but the useful ones are typically the custom metrics you emit from your application. Your metrics bill is usually dominated by the cost of these custom metrics. At a minimum, your bill goes up linearly with the number of custom metrics you create. This is unfortunate because, to restrain your bill from unbounded growth, you have to regularly audit your metrics, do your best to guess which ones are going to be useful in the future, and prune any you think you can afford to go without. Even in the hands of experts, these tools require significant oversight.

Linear cost growth is the goal, but it’s rarely achieved. The cost of each metric varies wildly depending on how you construct it, what the values are, how often it gets hit, etc. I’ve seen a single custom metric cost $30k per month. You probably have dozens of custom metrics per service, and it’s almost impossible to tell how much each of them costs you. Metrics bills tend to be incredibly opaque (possibly by design).

Nobody can understand their software or their systems with a metrics tool alone because the metric is extremely limited in what it can do. No context, no cardinality, no strings… only basic static dashboards. For richer data, we must turn to logs.

Unstructured Logs

You can understand much more about your code with logs than you can with metrics. Logs are typically emitted multiple times throughout the execution of the request, with one or a small number of nouns per log line plus the request ID. Unstructured logs are still the default, although this is slowly changing. The cost of unstructured logs is driven by a few things:

Write amplification: If you want to capture lots of rich context about the request, you need to emit a lot of log lines. If you are printing out just 10 log lines per request, per service, and you have half a dozen services, that’s 60 log events for every request.
Noisiness: It’s extremely easy to accidentally blow up your log footprint yet add no value—e.g., by putting a print statement inside a loop instead of outside the loop. Here, the usefulness of the data goes down as the bill shoots up.
Constraints on physical resources: Due to the write amplification of log lines per request, it’s often physically impossible to log everything you want to log for all requests or all users—it would saturate your NIC or disk.

Therefore, people tend to use blunt instruments like these to blindly slash the log volume:

Log levels
Consistent hashes
Dumb sample rates

When you emit multiple log lines per request, you end up duplicating a lot of raw data; sometimes, over half the bits are consumed by request ID, process ID, and timestamp. This can be quite meaningful from a cost perspective.

All of these factors can be annoying. But the worst thing about unstructured logs is that the only thing you can do to query them is a full-text search. The more data you have, the slower it becomes to search that data, and there’s not much you can do about it. Searching your logs over any meaningful length of time can take minutes or even hours, which means experimenting and looking around for unknown unknowns is prohibitively time-consuming. You have to know what to look for in order to find it. Once again, as your logging bill goes up, the value goes down.

Structured Logs

Structured logs are gaining adoption across the industry, especially as OpenTelemetry picks up steam.
The nice thing about structured logs is that you can actually do things with the data other than slow, dumb string searches. If you’ve structured your data properly, you can perform calculations! Compute percentiles! Generate heatmaps! Tools built on structured logs are so clearly the future. But just taking your existing logs and adding structure isn’t quite good enough. If all you do is stuff your existing log lines into key/value pairs, the problems of amplification, noisiness, and physical constraints remain unchanged—you can just search more efficiently and do some math with your data. There are a number of things you can and should do to your structured logs in order to use them more effectively and efficiently. In order of achievability: Instrument your code using the principles of canonical logs, which collect all the vital characteristics of a request into one wide, dense event. It is difficult to overstate the value of doing this for reasons of usefulness and usability as well as cost control. Add trace IDs and span IDs so you can trace your code using the same events instead of having to use an entirely separate tool. Feed your data into a columnar storage engine so you don’t have to predefine a schema or indexes to decide in advance which dimensions you will be able to search or compute on in the future. Use a storage engine that supports high cardinality with an explorable interface. If you go far enough down this path of enriching your structured events, instrumenting your code with the right data, and displaying it in real time, you will reach an entirely different set of capabilities, with a cost model so distinct it can only be described as “observability 2.0.” More on that in a second. Ballooning Costs Are Baked Into Observability 1.0 To recap, high costs are baked into the observability 1.0 model. Every pillar has a price. You have to collect and store your data—and pay to store it—again and again and again for every single use case. Depending on how many tools you use, your observability bill may be growing at a rate 3x faster than your traffic is growing, or 5x, or 10x, or even more. It gets worse. As your costs go up, the value you get out of your tools goes down. Your logs get slower and slower to search. You have to know what you’re searching for in order to find it. You have to use a blunt force sampling technique to keep the log volume from blowing up. Any time you want to be able to ask a new question, you first have to commit new code and deploy it. You have to guess which custom metrics you’ll need and which fields to index in advance. As the volume goes up, your ability to find a needle in the haystack—any unknown unknowns—goes down commensurately. And nothing connects any of these tools. You cannot correlate a spike in your metrics dashboard with the same requests in your logs, nor can you trace one of the errors. It’s impossible. If your APM and metrics tools report different error rates, you have no way of resolving this confusion. The only thing connecting any of these tools is the intuition and straight-up guesses made by your most senior engineers. This means that the cognitive costs are immense, and your bus factor risks are very real. The most important connective data in your system—connecting metrics with logs and logs with traces—exists only in the heads of a few people. At the same time, the engineering overhead required to manage all these tools (and their bills) rises inexorably. 
With metrics, an engineer needs to spend time auditing your metrics, tracking people down to fix poorly constructed metrics, and reaping those that are too expensive or don’t get used. With logs, an engineer needs to spend time monitoring the log volume, watching for spammy or duplicate log lines, pruning or consolidating them, and choosing and maintaining indexes. But all this time spent wrangling observability 1.0 data types isn’t even the costliest part. The most expensive part is the unseen cost inflicted on your engineering organization as development slows down and tech debt piles up due to low visibility and, thus, low confidence. Is there an alternative? Yes. The Cost Model of Observability 2.0 Is Very Different Observability 2.0 has no three pillars; it has a single source of truth. Observability 2.0 tools are built on top of arbitrarily wide structured log events, also known as spans. From these wide, context-rich structured log events, you can derive the other data types (metrics, logs, or traces). Since there is only one data source, you can correlate and cross-correlate to your heart’s content. You can switch fluidly back and forth between slicing and dicing, breaking down or grouping by events, and viewing them as a trace waterfall. You don’t have to worry about cardinality or key space limitations. You also effectively get infinite custom metrics since you can append as many as you want to the same events. Not only does your cost not go up linearly as you add more custom metrics, but your telemetry just gets richer and more valuable the more key-value pairs you add! Nor are you limited to numbers; you can add any and all types of data, including valuable high-cardinality fields like “App Id” or “Full Name.” Observability 2.0 has its own amplification factor to consider. As you instrument your code with more spans per request, the number of events you have to send (and pay for) goes up. However, you have some very powerful tools for dealing with this: you can perform dynamic head-based sampling or even tail-based sampling, where you decide whether or not to keep the event after it’s finished, allowing you to capture 100% of slow requests and other outliers. Engineering Time Is Your Most Precious Resource But the biggest difference between observability 1.0 and 2.0 won’t show up on any invoice. The difference shows up in your engineering team’s ability to move quickly and with confidence. Modern software engineering is all about hooking up fast feedback loops. Observability 2.0 tooling is what unlocks the kind of fine-grained, exploratory experience you need in order to accelerate those feedback loops. Where observability 1.0 is about MTTR, MTTD, reliability, and operating software, observability 2.0 is what underpins the entire software development lifecycle, setting the bar for how swiftly you can build and ship software, find problems, and iterate on them. Observability 2.0 is about being in conversation with your code, understanding each user’s experience, and building the right things. Observability 2.0 isn’t exactly cheap either, although it is often less expensive. But the key difference between o11y 1.0 and o11y 2.0 has never been that either is cheap; it’s that with observability 2.0, when your bill goes up, the value you derive from your telemetry goes up too. You pay more money, and you get more out of your tools. 
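To ground the idea of arbitrarily wide structured events in something concrete, here is a minimal sketch of what one of these canonical, context-rich events might look like in code. The field names and the emitCanonicalLine helper are hypothetical rather than any specific vendor's API; the point is simply that all of the request's context, including high-cardinality fields, accumulates into a single structured event that is emitted once.
Java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

public class CanonicalLogSketch {
    public static void main(String[] args) {
        // One wide, dense event accumulated over the life of the request.
        Map<String, Object> event = new LinkedHashMap<>();
        event.put("trace_id", UUID.randomUUID().toString());
        event.put("span_id", UUID.randomUUID().toString());
        event.put("request_id", "req-8814");
        event.put("user_id", "user-42");        // high cardinality is fine here
        event.put("app_id", "checkout-service");
        event.put("endpoint", "/checkout");
        event.put("cart_items", 7);
        event.put("duration_ms", 412);
        event.put("status_code", 200);

        // Emitted exactly once, at the end of the request, instead of dozens
        // of scattered log lines. A columnar store can then slice, dice, and
        // group by any of these fields after the fact.
        emitCanonicalLine(event);
    }

    // Hypothetical emitter: in practice this would serialize the event to JSON
    // and ship it to your structured-log or tracing backend.
    static void emitCanonicalLine(Map<String, Object> event) {
        System.out.println(event);
    }
}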
Note: Earlier, I said, “Nothing connects any of these tools.” If you are using a single unified vendor for your metrics, logging, APM, RUM, and tracing tools, this is not strictly true. Vendors like New Relic or Datadog now let you define certain links between your traces and metrics, which allows you to correlate between data types in a few limited, predefined ways. This is better than nothing! But it’s very different from the kind of fluid, open-ended correlation capabilities that we describe with o11y 2.0. With o11y 2.0, you can slice and dice, break down, and group by your complex data sets, then grab a trace that matches any specific set of criteria at any level of granularity. With o11y 1.0, you can define a metric up front, then grab a random exemplar of that metric, and that’s it. All the limitations of metrics still apply; you can’t correlate any metric with any other metric from that request, app, user, etc., and you certainly can’t pull a trace matching arbitrary criteria. But it’s not nothing.
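Finally, to make the observability 2.0 amplification trade-off mentioned above concrete, here is a minimal sketch of a tail-based sampling decision. The thresholds and the shouldKeep function are hypothetical, not any particular vendor's API; the key idea is that because the keep-or-drop decision happens after the request has finished, slow requests and errors can always be kept while routine traffic is sampled down.
Java
import java.util.concurrent.ThreadLocalRandom;

public class TailSamplingSketch {
    // Keep 100% of slow or failed requests, and roughly 1 in 20 of the rest.
    static boolean shouldKeep(long durationMs, int statusCode) {
        if (durationMs > 1000 || statusCode >= 500) {
            return true; // always keep the outliers, since they are the interesting ones
        }
        return ThreadLocalRandom.current().nextInt(20) == 0;
    }

    public static void main(String[] args) {
        System.out.println(shouldKeep(1500, 200)); // slow request: always kept
        System.out.println(shouldKeep(80, 503));   // server error: always kept
        System.out.println(shouldKeep(80, 200));   // fast and healthy: kept ~5% of the time
    }
}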
In this example, we'll learn about the Strategy pattern in Spring. We'll cover different ways to inject strategies, moving from a simple list-based approach to a more efficient map-based method. To illustrate the concept, we'll use the three Unforgivable curses from the Harry Potter series — Avada Kedavra, Crucio, and Imperio. What Is the Strategy Pattern? The Strategy pattern is a design pattern that allows you to switch between different algorithms or behaviors at runtime. It helps make your code flexible and adaptable by allowing you to plug in different strategies without changing the core logic of your application. This approach is useful in scenarios where you have different implementations for a specific task or piece of functionality and want to make your system more adaptable to changes. It promotes a more modular code structure by separating the algorithmic details from the main logic of your application. Step 1: Implementing Strategy Picture yourself as a dark wizard who strives to master the power of Unforgivable curses with Spring. Our mission is to implement all three curses — Avada Kedavra, Crucio and Imperio. After that, we will switch between curses (strategies) at runtime. Let's start with our strategy interface: Java public interface CurseStrategy { String useCurse(); String curseName(); } In the next step, we need to implement all Unforgivable curses: Java @Component public class CruciatusCurseStrategy implements CurseStrategy { @Override public String useCurse() { return "Attack with Crucio!"; } @Override public String curseName() { return "Crucio"; } } @Component public class ImperiusCurseStrategy implements CurseStrategy { @Override public String useCurse() { return "Attack with Imperio!"; } @Override public String curseName() { return "Imperio"; } } @Component public class KillingCurseStrategy implements CurseStrategy { @Override public String useCurse() { return "Attack with Avada Kedavra!"; } @Override public String curseName() { return "Avada Kedavra"; } } Step 2: Inject Curses as List Spring brings a touch of magic that allows us to inject multiple implementations of an interface as a List so we can use it to inject strategies and switch between them. But let's first create the foundation: the Wizard interface. Java public interface Wizard { String castCurse(String name); } And we can inject our curses (strategies) into the Wizard and filter the desired one. Java @Service public class DarkArtsWizard implements Wizard { private final List<CurseStrategy> curses; public DarkArtsWizard(List<CurseStrategy> curses) { this.curses = curses; } @Override public String castCurse(String name) { return curses.stream() .filter(s -> name.equals(s.curseName())) .findFirst() .orElseThrow(UnsupportedCurseException::new) .useCurse(); } } We also create an UnsupportedCurseException, which is thrown if the requested curse does not exist. 
Java public class UnsupportedCurseException extends RuntimeException { } And we can verify that curse casting is working: Java @SpringBootTest class DarkArtsWizardTest { @Autowired private DarkArtsWizard wizard; @Test public void castCurseCrucio() { assertEquals("Attack with Crucio!", wizard.castCurse("Crucio")); } @Test public void castCurseImperio() { assertEquals("Attack with Imperio!", wizard.castCurse("Imperio")); } @Test public void castCurseAvadaKedavra() { assertEquals("Attack with Avada Kedavra!", wizard.castCurse("Avada Kedavra")); } @Test public void castCurseExpelliarmus() { assertThrows(UnsupportedCurseException.class, () -> wizard.castCurse("Abrakadabra")); } } Another popular approach is to define the canUse method instead of curseName. This returns a boolean and allows us to use more complex filtering, like: Java public interface CurseStrategy { String useCurse(); boolean canUse(String name, String wizardType); } @Component public class CruciatusCurseStrategy implements CurseStrategy { @Override public String useCurse() { return "Attack with Crucio!"; } @Override public boolean canUse(String name, String wizardType) { return "Crucio".equals(name) && "Dark".equals(wizardType); } } @Service public class DarkArtsWizard implements Wizard { private final List<CurseStrategy> curses; public DarkArtsWizard(List<CurseStrategy> curses) { this.curses = curses; } @Override public String castCurse(String name) { return curses.stream() .filter(s -> s.canUse(name, "Dark")) .findFirst() .orElseThrow(UnsupportedCurseException::new) .useCurse(); } } Pros: Easy to implement. Cons: Runs through a loop every time, which can lead to slower execution times and increased processing overhead. Step 3: Inject Strategies as Map We can easily address the cons from the previous section. Spring lets us inject a Map with bean names and instances. It simplifies the code and improves its efficiency. Java @Service public class DarkArtsWizard implements Wizard { private final Map<String, CurseStrategy> curses; public DarkArtsWizard(Map<String, CurseStrategy> curses) { this.curses = curses; } @Override public String castCurse(String name) { CurseStrategy curse = curses.get(name); if (curse == null) { throw new UnsupportedCurseException(); } return curse.useCurse(); } } This approach has a downside: Spring injects the bean name as the key for the Map, so strategy names are the same as the bean names, like cruciatusCurseStrategy. This dependency on Spring's internal bean names might cause problems if Spring's code or our class names change without notice. Let's check that we're still capable of casting those curses: Java @SpringBootTest class DarkArtsWizardTest { @Autowired private DarkArtsWizard wizard; @Test public void castCurseCrucio() { assertEquals("Attack with Crucio!", wizard.castCurse("cruciatusCurseStrategy")); } @Test public void castCurseImperio() { assertEquals("Attack with Imperio!", wizard.castCurse("imperiusCurseStrategy")); } @Test public void castCurseAvadaKedavra() { assertEquals("Attack with Avada Kedavra!", wizard.castCurse("killingCurseStrategy")); } @Test public void castCurseExpelliarmus() { assertThrows(UnsupportedCurseException.class, () -> wizard.castCurse("Crucio")); } } Pros: No loops. Cons: Dependency on bean names, which makes the code less maintainable and more prone to errors if names are changed or refactored. 
Step 4: Inject List and Convert to Map The cons of Map injection can be easily eliminated if we inject a List and convert it to a Map: Java @Service public class DarkArtsWizard implements Wizard { private final Map<String, CurseStrategy> curses; public DarkArtsWizard(List<CurseStrategy> curses) { this.curses = curses.stream() .collect(Collectors.toMap(CurseStrategy::curseName, Function.identity())); } @Override public String castCurse(String name) { CurseStrategy curse = curses.get(name); if (curse == null) { throw new UnsupportedCurseException(); } return curse.useCurse(); } } With this approach, we can move back to using curseName instead of Spring's bean names for the Map keys (strategy names). Step 5: @Autowire in Interface Spring supports autowiring into methods. The simplest example of autowiring into methods is setter injection. This feature allows us to use @Autowired in a default method of an interface so we can register each CurseStrategy with the Wizard without needing to implement a registration method in every strategy implementation. Let's update the Wizard interface by adding a registerCurse method: Java public interface Wizard { String castCurse(String name); void registerCurse(String curseName, CurseStrategy curse); } This is the Wizard implementation: Java @Service public class DarkArtsWizard implements Wizard { private final Map<String, CurseStrategy> curses = new HashMap<>(); @Override public String castCurse(String name) { CurseStrategy curse = curses.get(name); if (curse == null) { throw new UnsupportedCurseException(); } return curse.useCurse(); } @Override public void registerCurse(String curseName, CurseStrategy curse) { curses.put(curseName, curse); } } Now, let's update the CurseStrategy interface by adding a method with the @Autowired annotation: Java public interface CurseStrategy { String useCurse(); String curseName(); @Autowired default void registerMe(Wizard wizard) { wizard.registerCurse(curseName(), this); } } At the moment dependencies are injected, we register each curse with the Wizard. Pros: No loops, and no reliance on internal Spring bean names. Cons: No cons, pure dark magic. Conclusion In this article, we explored the Strategy pattern in the context of Spring. We assessed different strategy injection approaches and demonstrated an optimized solution using Spring's capabilities. The full source code for this article can be found on GitHub.
I started researching an article on how to add a honeytrap to a GitHub repo. The idea behind a honeypot weakness is that a hacker will follow through on it and make their presence known in the process. My plan was to place a GitHub personal access token in an Ansible vault protected by a weak password. Should an attacker crack the password and use the token to clone the private repository, a webhook would have been triggered, mailing a notification that the honeypot repo had been cloned and the password cracked. Unfortunately, GitHub does not seem to allow webhooks to be triggered by cloning, as it does for some of its higher-level actions. This set me thinking that platforms as standalone systems are not designed with Dev(Sec)Ops integration in mind. DevOps engineers have to bite the bullet and always find ways to secure pipelines end-to-end. I, therefore, instead decided to investigate how to prevent code theft using tokens or private keys gained by nefarious means. Prevention Is Better Than Detection It is not best practice to keep secret material on hard drives in the belief that root-only access is sufficient security. Any system administrator or hacker who gains root can view the secret in the open. Secrets should, rather, be kept inside Hardware Security Modules (HSMs) or, at the very least, a secret manager. Furthermore, tokens and private keys should never be passed in as command line arguments since they might be written to a log file. A way to solve this problem is to make use of a super-secret master key to initiate proceedings and to finalize them using short-lived lesser keys. This is similar to the problem of sharing the first key in applied cryptography. Once the first key has been agreed upon, successive transactions can be secured using session keys. It goes without saying that the first key has to be stored in a Hardware Security Module, and all operations against it have to happen inside the HSM. I decided to try out something similar when Ansible clones private Git repositories. Although I will illustrate this using GitHub, I am pretty sure something similar can be set up for other Git platforms as well. First Key GitHub personal access tokens can be used to perform a wide range of actions on your GitHub account and its repositories. They authenticate and authorize requests from both the command line and the GitHub API. A personal access token can therefore clearly serve as the first key. Personal access tokens are created by clicking your avatar in the top right and selecting Settings: A left nav panel should appear from where you select Developer settings: The menu for personal access tokens will display where you can create the token: I created a classic token and gave it the following scopes/permissions: repo, admin:public_key, user, and admin:gpg_key. Take care to store the token in a reputable secret manager from where it can be copied and pasted in when the Ansible play prompts for it at the start. This secret manager should clear the copy buffer after a few seconds to prevent attacks that rely on diverting your attention. vars_prompt: - name: github_token prompt: "Enter your github personal access token?" private: true Establishing the Session GitHub deployment keys give access to private repositories. They can be created by an API call or from the repo's top menu by clicking on Settings: With the personal access token as the first key, a deployment key can finish the operation as the session key. 
Specifically, Ansible authenticates itself using the token, creates the deployment key, authorizes the clone, and deletes it immediately afterward. The code from my previous post relied on adding Git URLs that contain the tokens to the Ansible vault. This has now been improved to use temporary keys as envisioned in this post. An Ansible role provided by Asif Mahmud has been amended for this as can be seen in the usual GitHub repo. The critical snippets are: - name: Add SSH public key to GitHub account ansible.builtin.uri: url: "https://api.{{ git_server_fqdn }}/repos/{{ github_account_id }}/{{ repo }}/keys" validate_certs: yes method: POST force_basic_auth: true body: title: "{{ key_title }}" key: "{{ key_content.stdout }}" read_only: true body_format: json headers: Accept: application/vnd.github+json X-GitHub-Api-Version: 2022-11-28 Authorization: "Bearer {{ github_access_token }}" status_code: - 201 - 422 register: create_result The GitHub API is used to add the deploy key to the private repository. Note the use of the access token, typed in at the start of the play, to authenticate and authorize the request. - name: Clone the repository shell: | GIT_SSH_COMMAND="ssh -i {{ key_path }} -v -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null" {{ git_executable }} clone git@{{ git_server_fqdn }}:{{ github_account_id }}/{{ repo }}.git {{ clone_dest }} - name: Switch branch shell: "{{ git_executable }} checkout {{ branch }}" args: chdir: "{{ clone_dest }}" The repo is cloned, followed by a switch to the required branch. - name: Delete SSH public key ansible.builtin.uri: url: "https://api.{{ git_server_fqdn }}/repos/{{ github_account_id }}/{{ repo }}/keys/{{ create_result.json.id }}" validate_certs: yes method: DELETE force_basic_auth: true headers: Accept: application/vnd.github+json X-GitHub-Api-Version: 2022-11-28 Authorization: "Bearer {{ github_access_token }}" status_code: - 204 Deletion of the deployment key happens directly after the clone and switch, again via the API. Conclusion The short life of the deployment key enhances the security of the DevOps pipeline tremendously. Only the token has to be kept secured at all times, as is the case for any first key. Ideally, you should integrate Ansible with a compatible HSM platform. I thank Asif Mahmud, whose code I amended to illustrate the concept of using temporary session keys when cloning private Git repositories.
Introduction to Datafaker Datafaker is a modern framework that enables JVM programmers to efficiently generate fake data for their projects using over 200 data providers, allowing for quick setup and usage. Custom providers can be written when you need some domain-specific data. In addition to providers, the generated data can be exported to popular formats like CSV, JSON, SQL, XML, and YAML. For a good introduction to the basic features, see "Datafaker: An Alternative to Using Production Data." Datafaker offers many features, such as working with sequences and collections and generating custom objects based on schemas (see "Datafaker 2.0"). Bulk Data Generation In software development and testing, the need to frequently generate data for various purposes arises, whether it's to conduct non-functional tests or to simulate burst loads. Let's consider a straightforward scenario in which we have the task of generating 10,000 messages in JSON format to be sent to RabbitMQ. From my perspective, these options are worth considering: Developing your own tool: One option is to write a custom application from scratch to generate these records (messages). If the generated data needs to be more realistic, it makes sense to use Datafaker or JavaFaker. Using specific tools: Alternatively, we could select specific tools designed for particular databases or message brokers. For example, tools like voluble for Kafka provide specialized functionalities for generating and publishing messages to Kafka topics. There is also a more modern tool, ShadowTraffic, which is currently under development and geared towards a container-based approach that may not always be necessary. Datafaker Gen: Finally, we have the option to use Datafaker Gen, which I want to consider in the current article. Datafaker Gen Overview Datafaker Gen offers a command-line generator based on the Datafaker library, which allows for the continuous generation of data in various formats and integration with different storage systems, message brokers, and backend services. Since this tool uses Datafaker, the generated data can be made quite realistic. Configuration of the schema, format type, and sink can be done without rebuilding the project. Datafaker Gen consists of the following main components that can be configured: 1. Schema Definition Users can define the schema for their records in the config.yaml file. The schema specifies the field definitions of the record based on the Datafaker provider. It also allows for the definition of embedded fields. YAML default_locale: en-EN fields: - name: lastname generators: [ Name#lastName ] - name: firstname generators: [ Name#firstName ] 2. Format Datafaker Gen allows users to specify the format in which records will be generated. Currently, there are basic implementations for CSV, JSON, SQL, XML, and YAML formats. Additionally, formats can be extended with custom implementations. The configuration for formats is specified in the output.yaml file. YAML formats: csv: quote: "@" separator: $$$$$$$ json: formattedAs: "[]" yaml: xml: pretty: true 3. Sink The sink component determines where the generated data will be stored or published. The basic implementation includes command-line output and text file sinks. Additionally, sinks can be extended with custom implementations such as RabbitMQ, as demonstrated in the current article. The configuration for sinks is specified in the output.yaml file. 
YAML sinks: rabbitmq: batchsize: 1 # when 1 message contains 1 document, when >1 message contains a batch of documents host: localhost port: 5672 username: guest password: guest exchange: test.direct.exchange routingkey: products.key Extensibility via Java SPI Datafaker Gen uses the Java SPI (Service Provider Interface) to make it easy to add new formats or sinks. This extensibility allows for customization of Datafaker Gen according to specific requirements. How To Add a New Sink in Datafaker Gen Before adding a new sink, you may want to check if it already exists in the datafaker-gen-examples repository. If it does not exist, you can refer to examples on how to add a new sink. When it comes to extending Datafaker Gen with new sink implementations, developers have two primary options to consider. The first is to use the parent project, in which case they can implement the sink interfaces for their sink extensions, similar to those available in the datafaker-gen-examples repository. The second is to include dependencies from the Maven repository to access the required interfaces; for this approach, Datafaker Gen should be built and exist in the local Maven repository. This approach provides flexibility in project structure and requirements. 1. Implementing RabbitMQ Sink To add a new RabbitMQ sink, one simply needs to implement the net.datafaker.datafaker_gen.sink.Sink interface. This interface contains two methods: getName - This method defines the sink name. run - This method triggers the generation of records and then sends or saves all the generated records to the specified destination. The method parameters include the configuration specific to this sink retrieved from the output.yaml file, as well as the data generation function and the desired number of lines to be generated. Java import java.util.Map; import java.util.function.Function; import com.google.gson.JsonArray; import com.google.gson.JsonParser; import com.rabbitmq.client.Channel; import com.rabbitmq.client.Connection; import com.rabbitmq.client.ConnectionFactory; import net.datafaker.datafaker_gen.sink.Sink; public class RabbitMqSink implements Sink { @Override public String getName() { return "rabbitmq"; } @Override public void run(Map<String, ?> config, Function<Integer, ?> function, int numberOfLines) { // Read output configuration ... int numberOfLinesToPrint = numberOfLines; String host = (String) config.get("host"); // Generate lines String lines = (String) function.apply(numberOfLinesToPrint); // Sending or saving results to the expected resource // In this case, this is connecting to RabbitMQ and sending messages. ConnectionFactory factory = getConnectionFactory(host, port, username, password); try (Connection connection = factory.newConnection()) { Channel channel = connection.createChannel(); JsonArray jsonArray = JsonParser.parseString(lines).getAsJsonArray(); jsonArray.forEach(jsonElement -> { try { channel.basicPublish(exchange, routingKey, null, jsonElement.toString().getBytes()); } catch (Exception e) { throw new RuntimeException(e); } }); } catch (Exception e) { throw new RuntimeException(e); } } } 2. Adding Configuration for the New RabbitMQ Sink As previously mentioned, the configuration for sinks or formats can be added to the output.yaml file. The specific fields may vary depending on your custom sink. Below is an example configuration for a RabbitMQ sink: YAML sinks: rabbitmq: batchsize: 1 # when 1 message contains 1 document, when >1 message contains a batch of documents host: localhost port: 5672 username: guest password: guest exchange: test.direct.exchange routingkey: products.key 
3. Adding Custom Sink via SPI Adding a custom sink via SPI (Service Provider Interface) involves including the provider configuration in the ./resources/META-INF/services/net.datafaker.datafaker_gen.sink.Sink file. This file contains paths to the sink implementations: Properties files net.datafaker.datafaker_gen.sink.RabbitMqSink Those are all three simple steps needed to extend Datafaker Gen. In this example, we have not provided a complete implementation of the sink, nor shown how to use the additional libraries. To see the complete implementations, you can refer to the datafaker-gen-rabbitmq module in the example repository. How To Run Step 1 Build a JAR file based on the new implementation: Shell ./mvnw clean verify Step 2 Define the schema for records in the config.yaml file and place this file in the appropriate location where the generator should run. Additionally, define the sinks and formats in the output.yaml file, as demonstrated previously. Step 3 Datafaker Gen can be executed in two ways: 1. Use the bash script from the bin folder in the parent project: Shell # Format json, number of lines 10000 and new RabbitMq Sink bin/datafaker_gen -f json -n 10000 -sink rabbitmq 2. Execute the JAR directly, like this: Shell java -cp [path_to_jar] net.datafaker.datafaker_gen.DatafakerGen -f json -n 10000 -sink rabbitmq How Fast Is It? The test was done based on the schema described above, which means that one document consists of two fields. Documents are written one by one to the RabbitMQ queue in JSON format. The results below show the speed for 10,000, 100,000, and 1M records on my local machine: 10,000 records in 401 ms; 100,000 records in 11,613 ms; 1,000,000 records in 121,601 ms. Conclusion The Datafaker Gen tool enables the creation of flexible and fast data generators for various types of destinations. Built on Datafaker, it facilitates realistic data generation. Developers can easily configure the content of records, formats, and sinks to suit their needs. As a simple Java application, it can be deployed anywhere you want, whether in Docker or on on-premises machines. The full source code is available here. I would like to thank Sergey Nuyanzin for reviewing this article. Thank you for reading, and I am glad to be of help.
We don't usually think of Git as a debugging tool. Surprisingly, Git shines not just as a version control system, but also as a potent debugging ally when dealing with the tricky matter of regressions. The Essence of Debugging with Git Before we tap into the advanced aspects of git bisect, it's essential to understand its foundational premise. Git is known for tracking changes and managing code history, but the git bisect tool is a hidden gem for regression detection. Regressions are distinct from generic bugs. They signify a backward step in functionality—where something that once worked flawlessly now fails. Pinpointing the exact change causing a regression can be akin to finding a needle in a haystack, particularly in extensive codebases with long commit histories. Traditionally, developers would employ a manual, binary search strategy—checking out different versions, testing them, and narrowing down the search scope. This method, while effective, is painstakingly slow and error-prone. Git bisect automates this search, transforming what used to be a marathon into a swift sprint. Setting the Stage for Debugging Imagine you're working on a project, and recent reports indicate a newly introduced bug affecting the functionality of a feature that previously worked flawlessly. You suspect a regression but are unsure which commit introduced the issue among the hundreds made since the last stable version. Initiating Bisect Mode To start, you'll enter bisect mode in your terminal within the project's Git repository: git bisect start This command signals Git to prepare for the bisect process. Marking the Known Good Revision Next, you identify a commit where the feature functioned correctly, often a commit tagged with a release number or dated before the issue was reported. Mark this commit as "good": git bisect good a1b2c3d Here, a1b2c3d represents the hash of the known good commit. Marking the Known Bad Revision Similarly, you mark the current version or a specific commit where the bug is present as "bad": git bisect bad z9y8x7w z9y8x7w is the hash of the bad commit, typically the latest commit in the repository where the issue is observed. Bisecting To Find the Culprit Upon marking the good and bad commits, Git automatically jumps to a commit roughly in the middle of the two and waits for you to test this revision. After testing (manually or with a script), you inform Git of the result: If the issue is present: git bisect bad If the issue is not present: git bisect good Git then continues to narrow down the range, selecting a new commit to test based on your feedback. Expected Output After several iterations, Git will isolate the problematic commit, displaying a message similar to: Bisecting: 0 revisions left to test after this (roughly 3 steps) [abcdef1234567890] Commit message of the problematic commit Reset and Analysis Once the offending commit is identified, you conclude the bisect session to return your repository to its initial state: git bisect reset Notice that bisect isn't linear. Bisect doesn't scan through the revisions in a sequential manner. Based on the good and bad markers, Git automatically selects a commit approximately in the middle of the range for testing. This is where the non-linear, binary search pattern starts, as Git divides the search space in half instead of examining each commit sequentially. This means fewer revisions get scanned and the process is faster. 
Advanced Usage and Tips The magic of git bisect lies in its ability to automate the binary search algorithm within your repository, systematically halving the search space until the rogue commit is identified. Git bisect offers a powerful avenue for debugging, especially for identifying regressions in a complex codebase. To elevate your use of this tool, consider delving into more advanced techniques and strategies. These tips not only enhance your debugging efficiency but also provide practical solutions to common challenges encountered during the bisecting process. Script Automation for Precision and Efficiency Automating the bisect process with a script is a game-changer, significantly reducing manual effort and minimizing the risk of human error. This script should ideally perform a quick test that directly targets the regression, returning an exit code based on the test's outcome. Example Imagine you're debugging a regression where a web application's login feature breaks. You could write a script that attempts to log in using a test account and checks if the login succeeds. The script might look something like this in a simplified form: #!/bin/bash # Attempt to log in and check for success if curl -s http://yourapplication/login -d "username=test&password=test" | grep -q "Welcome"; then exit 0 # Login succeeded, mark this commit as good else exit 1 # Login failed, mark this commit as bad fi By passing this script to git bisect run, Git automatically executes it at each step of the bisect process, effectively automating the regression hunt. Handling Flaky Tests With Strategy Flaky tests, which sometimes pass and sometimes fail under the same conditions, can complicate the bisecting process. To mitigate this, your automation script can include logic to rerun tests a certain number of times or to apply more sophisticated checks to differentiate between a true regression and a flaky failure. Example Suppose you have a test that's known to be flaky. You could adjust your script to run the test multiple times, considering the commit "bad" only if the test fails consistently: #!/bin/bash # Run the flaky test three times success_count=0 for i in {1..3}; do if ./run_flaky_test.sh; then ((success_count++)) fi done # If the test succeeds twice or more, consider it a pass if [ "$success_count" -ge 2 ]; then exit 0 else exit 1 fi This approach reduces the chances that a flaky test will lead to incorrect bisect results. Skipping Commits With Care Sometimes, you'll encounter commits that cannot be tested due to reasons like broken builds or incomplete features. git bisect skip is invaluable here, allowing you to bypass these commits. However, use this command judiciously to ensure it doesn't obscure the true source of the regression. Example If you know that commits related to database migrations temporarily break the application, you can skip testing those commits. During the bisect session, when Git lands on a commit you wish to skip, you would manually issue: git bisect skip This tells Git to exclude the current commit from the search and adjust its calculations accordingly. It's essential to only skip commits when absolutely necessary, as skipping too many can interfere with the accuracy of the bisect process. These advanced strategies enhance the utility of git bisect in your debugging toolkit. 
By automating the regression testing process, handling flaky tests intelligently, and knowing when to skip untestable commits, you can make the most out of git bisect for efficient and accurate debugging. Remember, the goal is not just to find the commit where the regression was introduced but to do so in the most time-efficient manner possible. With these tips and examples, you're well-equipped to tackle even the most elusive regressions in your projects. Unraveling a Regression Mystery In the past, we got to use git bisect when working on a large-scale web application. After a routine update, users began reporting a critical feature failure: the application's payment gateway stopped processing transactions correctly, leading to a significant business impact. We knew the feature worked in the last release but had no idea which of the hundreds of recent commits introduced the bug. Manually testing each commit was out of the question due to time constraints and the complexity of the setup required for each test. Enter git bisect. The team started by identifying a "good" commit where the payment gateway functioned correctly and a "bad" commit where the issue was observed. We then crafted a simple test script that would simulate a transaction and check if it succeeded. By running git bisect start, followed by marking the known good and bad commits, and executing the script with git bisect run, we set off on an automated process that identified the faulty commit. Git efficiently navigated through the commits, automatically running the test script on each step. In a matter of minutes, git bisect pinpointed the culprit: a seemingly innocuous change to the transaction logging mechanism that inadvertently broke the payment processing logic. Armed with this knowledge, we reverted the problematic change, restoring the payment gateway's functionality and averting further business disruption. This experience not only resolved the immediate issue but also transformed our approach to debugging, making git bisect a go-to tool in our arsenal. Final Word The story of the payment gateway regression is just one example of how git bisect can be a lifesaver in the complex world of software development. By automating the tedious process of regression hunting, git bisect not only saves precious time but also brings a high degree of precision to the debugging process. As developers continue to navigate the challenges of maintaining and improving complex codebases, tools like git bisect underscore the importance of leveraging technology to work smarter, not harder. Whether you're dealing with a mysterious regression or simply want to refine your debugging strategies, git bisect offers a powerful, yet underappreciated, solution to swiftly and accurately identify the source of regressions. Remember, the next time you're faced with a regression, git bisect might just be the debugging partner you need to uncover the truth hidden within your commit history.