The software development landscape is rapidly evolving. New tools, technologies, and trends are always bubbling to the top of our workflows and conversations. One of those paradigm shifts that has become more pronounced in recent years is the adoption of microservices architecture by countless organizations. Managing microservices communication has been a sticky challenge for many developers. As a microservices developer, I want to focus my efforts on the core business problems and functionality that my microservices need to achieve. I’d prefer to offload the inter-service communication concerns—just like I do with authentication or API security. So, that brings me to the KubeMQ Control Center (KCC). It’s a service for managing microservices communication that’s quick to set up and designed with an easy-to-use UI. In this article, I wanted to unpack some of the functionality I explored as I tested it in a real-world scenario. Setting the Scene Microservices communication presents a complex challenge, akin to orchestrating a symphony with numerous distinct instruments. It demands precision and a deep understanding of the underlying architecture. Fortunately, KCC—with its no-code setup and Kubernetes-native integration—aims to abstract away this complexity. Let's explore how it simplifies microservices messaging. Initial Setup and Deployment Deploy KubeMQ Using Docker The journey with KCC starts with a Docker-based deployment. This process is straightforward: Shell $ docker run -d \ -p 8080:8080 \ -p 50000:50000 \ -p 9090:9090 \ -e KUBEMQ_TOKEN=(add token here) kubemq/kubemq This command sets up KubeMQ, aligning the necessary ports and establishing secure access. Send a "Hello World" Message After deployment, you can access the KubeMQ dashboard in your browser at http://localhost:8080/. Here, you have a clean, intuitive UI to help you manage your microservices. We can send a “Hello World” message to test the waters. In the Dashboard, click Send Message and select Queues. We set a channel name (q1) and enter "hello world!" in the body. Then, we click Send. Just like that, we successfully created our first message! And it’s only been one minute since we deployed KubeMQ and started using KCC. Pulling a Message Retrieving messages is a critical aspect of any messaging platform. From the Dashboard, select your channel to open the Queues page. Under the Pull tab, click Pull to retrieve the message that you just sent. The process is pretty smooth and efficient. We can review the message details for insights into its delivery and content. Send “Hello World” With Code Moving beyond the UI, we can send a “Hello world” message programmatically too. For example, here’s how you would send a message using C#. KCC integrates with most of the popular programming languages, which is essential for diverse development environments. Here are the supported languages and links to code samples and SDKs: C# and .NET Java Go Node.js Python Deploying KubeMQ in Kubernetes Transitioning to Kubernetes with KCC is pretty seamless, too. KubeMQ is shooting to design with scalability and the developer in mind. Here’s a quick guide to getting started. Download KCC Download KCC from KubeMQ’s account area. They offer a 30-day free trial so you can do a comprehensive evaluation. Unpack the Zip File Shell $ unzip kcc_mac_apple.zip -d /kubemq/kcc Launch the Application Shell $ ./kcc The above step integrates you into the KubeMQ ecosystem, which is optimized for Kubernetes. 
Add a KubeMQ Cluster Adding a KubeMQ cluster is crucial for scaling and managing your microservices architecture effectively. Monitor Cluster Status The dashboard provides an overview of your KubeMQ components, essential for real-time system monitoring. Explore Bridges, Targets, and Sources KCC has advanced features like Bridges, Targets, and Sources, which serve as different types of connectors between KubeMQ clusters, external messaging systems, and external cloud services. These tools will come in handy when you have complex data flows and system integrations, as many microservices architectures do. Conclusion That wraps up our journey through KubeMQ's Control Center. Dealing with the complexities of microservice communication can be a burden, taking the developer away from core business development. Developers can offload that burden to KCC. With its intuitive UI and suite of features, KCC helps developers be more efficient as they build their applications on microservice architectures. Of course, we’ve only scratched the surface here. Unlocking the true potential of any tool requires deeper exploration and continued use. For that, you can check out KubeMQ’s docs site. Or you can build on what we’ve shown above, continuing to play around on your own. With the right tools in your toolbox, you’ll quickly be up and running with a fleet of smoothly communicating microservices! Have a really great day!
In the ever-evolving world of software development, staying up-to-date with the latest tools and frameworks is crucial. One such framework that has been making waves in NoSQL databases is Eclipse JNoSQL. This article will dive deep into the latest release, version 1.1.0, and explore its compatibility with Oracle NoSQL.

Understanding Eclipse JNoSQL

Eclipse JNoSQL is a Java-based framework that facilitates seamless integration between Java applications and NoSQL databases. It leverages Java enterprise standards, specifically Jakarta NoSQL and Jakarta Data, to simplify working with NoSQL databases. The primary objective of this framework is to reduce the cognitive load associated with using NoSQL databases while harnessing the full power of Jakarta EE and Eclipse MicroProfile. With Eclipse JNoSQL, developers can easily integrate NoSQL databases into their projects using WildFly, Payara, Quarkus, or other Java platforms. This framework bridges the Java application layer and various NoSQL databases, making it easier to work with these databases without diving deep into their intricacies.

What’s New in Eclipse JNoSQL Version 1.1.0

The latest version of Eclipse JNoSQL, 1.1.0, comes with several enhancements and upgrades to make working with NoSQL databases smoother. Let’s explore some of the notable changes.

Jakarta Data Version Upgrade

One of the significant updates in Eclipse JNoSQL version 1.1.0 is the upgrade of the Jakarta Data version to M2. To better understand the importance of this upgrade, let’s delve into what Jakarta Data is and how it plays a crucial role in simplifying data access across various database types.

Jakarta Data

Jakarta Data is a specification that provides a unified API for simplified data access across different types of databases, including both relational and NoSQL databases. This specification is part of the broader Jakarta EE ecosystem, which aims to offer a standardized and consistent programming model for enterprise Java applications. Jakarta Data empowers Java developers to access diverse data repositories straightforwardly and consistently. It achieves this by introducing concepts like Repositories and custom query methods, making data retrieval and manipulation more intuitive and developer-friendly. One of the key features of Jakarta Data is its flexibility in allowing developers to compose custom query methods on Repository interfaces. This flexibility means that developers can craft specific queries tailored to their application’s needs without manually writing complex SQL or NoSQL queries. This abstraction simplifies the interaction with databases, reducing the development effort required to access and manipulate data.

The Goal of Jakarta Data

The primary goal of Jakarta Data is to provide a familiar and consistent programming model for data access while preserving the unique characteristics and strengths of the underlying data stores. In other words, Jakarta Data aims to abstract away the intricacies of interacting with different types of databases, allowing developers to focus on their application logic rather than the specifics of each database system. By upgrading Eclipse JNoSQL to use Jakarta Data version M2, the framework aligns itself with the latest Jakarta EE standards and leverages the newest features and improvements introduced by Jakarta Data. It ensures that developers using Eclipse JNoSQL can benefit from the enhanced capabilities and ease of data access that Jakarta Data brings.
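To make the Repository concept more concrete, here is a minimal sketch of what a Jakarta Data repository could look like for the Car entity that appears later in this article. This interface is not part of the article's code: the Cars name and the findByMake method are illustrative assumptions, and the imports assume the Jakarta Data M2 API that Eclipse JNoSQL 1.1.0 targets.

Java
import jakarta.data.repository.DataRepository;
import jakarta.data.repository.Repository;
import java.util.List;

// Hypothetical repository for the Car entity defined later in the article.
@Repository
public interface Cars extends DataRepository<Car, String> {

    // Derived query: Jakarta Data builds the query from the method name,
    // so no SQL or NoSQL query string is written by hand.
    List<Car> findByMake(String make);
}

The same pattern is used by the article's Garage repository further down, where the methods are additionally annotated with @Save and @Delete.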
Enhanced Support for Inheritance One of the standout features of Eclipse JNoSQL is its support for Object-Document Mapping (ODM). It allows developers to work with NoSQL databases in an object-oriented manner, similar to how they work with traditional relational databases. In version 1.1.0, the framework has enhanced its support for inheritance, making it even more versatile when dealing with complex data models. Oracle NoSQL Database: A Brief Overview Before we conclude, let’s take a moment to understand the database we’ll work with – Oracle NoSQL Database. Oracle NoSQL Database is a distributed key-value and document database developed by Oracle Corporation. It offers robust transactional capabilities for data manipulation, horizontal scalability to handle large workloads, and simplified administration and monitoring. It is particularly well-suited for applications that require low-latency access to data, flexible data models, and elastic scaling to accommodate dynamic workloads. The Oracle NoSQL Database Cloud Service provides a managed cloud platform for deploying applications that require the capabilities of Oracle NoSQL Database, making it even more accessible and convenient for developers. Show Me the Code We’ll create a simple demo application to showcase the new features of Eclipse JNoSQL 1.1.0 and its compatibility with Oracle NoSQL. This demo will help you understand how to set up the environment, configure dependencies, and interact with Oracle NoSQL using Eclipse JNoSQL. Prerequisites Before we begin, ensure you have an Oracle NoSQL instance running. You can use either the “primes” or “cloud” flavor. For local development, you can run Oracle NoSQL in a Docker container with the following command: Shell docker run -d --name oracle-instance -p 8080:8080 ghcr.io/oracle/nosql:latest-ce Setting up the Project We’ll create a Java SE project using the Maven Quickstart Archetype to keep things simple. It will give us a basic project structure to work with. Project Dependencies For Eclipse JNoSQL to work with Oracle NoSQL, we must include specific dependencies. Additionally, we’ll need Jakarta CDI, Eclipse MicroProfile, Jakarta JSONP, and the Eclipse JNoSQL driver for Oracle NoSQL. We’ll also include “datafaker” for generating sample data. Here are the project dependencies: XML <dependencies> <dependency> <groupId>org.eclipse.jnosql.databases</groupId> <artifactId>jnosql-oracle-nosql</artifactId> <version>1.1.0</version> </dependency> <dependency> <groupId>net.datafaker</groupId> <artifactId>datafaker</artifactId> <version>2.0.2</version> </dependency> </dependencies> Configuration Eclipse JNoSQL relies on configuration properties to establish a connection to the database. This is where the flexibility of Eclipse MicroProfile Config shines. You can conveniently define these properties in your application.properties or application.yml file, allowing for easy customization of your database settings. Remarkably, Eclipse JNoSQL caters to key-value and document databases, a testament to its adaptability. Despite Oracle NoSQL's support for both data models, the seamless integration and configuration options provided by Eclipse JNoSQL ensure a smooth experience, empowering developers to effortlessly switch between these database paradigms to meet their specific application needs. 
Properties files # Oracle NoSQL Configuration jnosql.keyvalue.database=cars jnosql.document.database=cars jnosql.oracle.nosql.host=http://localhost:8080 Creating a Model for Car Data in Eclipse JNoSQL After setting up the database configuration, the next step is to define the data model to be stored in it. The process of defining a model is consistent across all databases in Eclipse JNoSQL. For instance, in this example, we will form a data model for cars using a basic Car class. Java @Entity public class Car { @Id private String id; @Column private String vin; @Column private String model; @Column private String make; @Column private String transmission; // Constructors, getters, and setters @Override public boolean equals(Object o) { if (this == o) { return true; } if (o == null || getClass() != o.getClass()) { return false; } Car car = (Car) o; return Objects.equals(id, car.id); } @Override public int hashCode() { return Objects.hashCode(id); } @Override public String toString() { return "Car{" + "id='" + id + '\'' + ", vin='" + vin + '\'' + ", model='" + model + '\'' + ", make='" + make + '\'' + ", transmission='" + transmission + '\'' + '}'; } // Factory method to create a Car instance public static Car of(Faker faker) { Vehicle vehicle = faker.vehicle(); Car car = new Car(); car.id = UUID.randomUUID().toString(); car.vin = vehicle.vin(); car.model = vehicle.model(); car.make = vehicle.make(); car.transmission = vehicle.transmission(); return car; } } In the Car class, we use annotations to define how the class and its fields should be persisted in the database: @Entity: This annotation marks the class as an entity to be stored in the database. @Id: Indicates that the id field will serve as the unique identifier for each Car entity. @Column: Annotations like @Column specify that a field should be persisted as a column in the database. In this case, we annotate each field we want to store in the database. Additionally, we provide methods for getters, setters, equals, hashCode, and toString for better encapsulation and compatibility with database operations. We also include a factory method Car.of(Faker faker), to generate random car data using the “datafaker” library. This data model encapsulates the structure of a car entity, making it easy to persist and retrieve car-related information in your Oracle NoSQL database using Eclipse JNoSQL. Simplifying Database Operations With Jakarta Data Annotations In Eclipse JNoSQL’s latest version, 1.1.0, developers can harness the power of Jakarta Data annotations to streamline and clarify database operations. These annotations allow you to express your intentions in a more business-centric way, making your code more expressive and closely aligned with the actions you want to perform on the database. Here are some of the Jakarta Data annotations introduced in this version: @Insert: Effortless Data Insertion The @Insert annotation signifies the intent to perform an insertion operation in the database. When applied to a method, it indicates that it aims to insert data into the database. This annotation provides clarity and conciseness, making it evident that the method is responsible for adding new records. @Update: Seamless Data Update The @Update annotation is used to signify an update operation. It is beneficial when you want to modify existing records in the database. Eclipse JNoSQL will check if the information to be updated is already present and proceed accordingly. This annotation simplifies the code by explicitly stating its purpose. 
@Delete: Hassle-Free Data Removal When you want to delete data from the database, the @Delete annotation comes into play. It communicates that the method’s primary function is to remove information. Like the other annotations, it enhances code readability by conveying the intended action. @Save: Combining Insert and Update The @Save annotation serves a dual purpose. It behaves like the save method in BasicRepository but with added intelligence. It checks if the information is already in the database, and if so, it updates it; otherwise, it inserts new data. This annotation provides a convenient way to handle insertion and updating without needing separate methods. With these Jakarta Data annotations, you can express database operations in a more intuitive and business-centric language. In the context of a car-centric application, such as managing a garage or car collection, you can utilize these annotations to define operations like parking a car and unparking it: Java @Repository public interface Garage extends DataRepository<Car, String> { @Save Car parking(Car car); @Delete void unpark(Car car); Page<Car> findByTransmission(String transmission, Pageable page); } In this example, the @Save annotation is used for parking a car, indicating that this method handles both inserting new cars into the “garage” and updating existing ones. The @Delete annotation is employed for unparking, making it clear that this method is responsible for removing cars from the “garage.” These annotations simplify database operations and enhance code clarity and maintainability by aligning your code with your business terminology and intentions. Executing Database Operations With Oracle NoSQL Now that our entity and repository are set up let’s create classes to execute the application. These classes will initialize a CDI container to inject the necessary template classes for interacting with the Oracle NoSQL database. Interacting With Document Database As a first step, we’ll interact with the document database. We’ll inject the DocumentTemplate interface to perform various operations: Java public static void main(String[] args) { Faker faker = new Faker(); try (SeContainer container = SeContainerInitializer.newInstance().initialize()) { DocumentTemplate template = container.select(DocumentTemplate.class).get(); // Insert 10 random cars into the database for (int index = 0; index < 10; index++) { Car car = Car.of(faker); template.insert(car); } // Retrieve and print all cars template.select(Car.class).stream().toList().forEach(System.out::println); // Retrieve and print cars with Automatic transmission, ordered by model (descending) template.select(Car.class).where("transmission").eq("Automatic").orderBy("model").desc() .stream().forEach(System.out::println); // Retrieve and print cars with CVT transmission, ordered by make (descending) template.select(Car.class).where("transmission").eq("CVT").orderBy("make").desc() .stream().forEach(System.out::println); } System.exit(0); } In this code, we use the DocumentTemplate to insert random cars into the document database, retrieve and print all cars, and execute specific queries based on transmission type and ordering. 
Interacting With Key-Value Database Oracle NoSQL also supports a key-value database, and we can interact with it as follows: Java public static void main(String[] args) { Faker faker = new Faker(); try (SeContainer container = SeContainerInitializer.newInstance().initialize()) { KeyValueTemplate template = container.select(KeyValueTemplate.class).get(); // Create a random car and put it in the key-value database Car car = Car.of(faker); template.put(car); // Retrieve and print the car based on its unique ID System.out.println("The query result: " + template.get(car.id(), Car.class)); // Delete the car from the key-value database template.delete(car.id()); // Attempt to retrieve the deleted car (will return null) System.out.println("The query result: " + template.get(car.id(), Car.class)); } System.exit(0); } In this code, we utilize the KeyValueTemplate to put a randomly generated car into the key-value database, retrieve it by its unique ID, delete it, and attempt to retrieve it again (resulting in null since it’s been deleted). These examples demonstrate how to execute database operations seamlessly with Oracle NoSQL, whether you’re working with a document-oriented or key-value database model, using Eclipse JNoSQL’s template classes. In this final sampling execution, we will demonstrate how to interact with the repository using our custom repository interface. This approach simplifies database operations and makes them more intuitive, allowing you to work with your custom-defined terminology and actions. Java public static void main(String[] args) { Faker faker = new Faker(); try (SeContainer container = SeContainerInitializer.newInstance().initialize()) { Garage repository = container.select(Garage.class, DatabaseQualifier.ofDocument()).get(); // Parking 10 random cars in the repository for (int index = 0; index < 10; index++) { Car car = Car.of(faker); repository.parking(car); } // Park a car and then unpark it Car car = Car.of(faker); repository.parking(car); repository.unpark(car); // Retrieve the first page of cars with CVT transmission, ordered by model (descending) Pageable page = Pageable.ofPage(1).size(3).sortBy(Sort.desc("model")); Page<Car> page1 = repository.findByTransmission("CVT", page); System.out.println("The first page"); page1.forEach(System.out::println); // Retrieve the second page of cars with CVT transmission System.out.println("The second page"); Pageable secondPage = page.next(); Page<Car> page2 = repository.findByTransmission("CVT", secondPage); page2.forEach(System.out::println); System.out.println("The query result: "); } System.exit(0); } We create a Garage instance through the custom repository interface in this code. We then demonstrate various operations such as parking and unparking cars and querying for cars with specific transmission types, sorted by model and paginated. You can express database operations in a business-centric language by utilizing the repository interface with custom annotations like @Save and @Delete. This approach enhances code clarity and aligns with your domain-specific terminology, providing a more intuitive and developer-friendly way to interact with the database. Conclusion Eclipse JNoSQL 1.1.0, with its support for Oracle NoSQL databases, simplifies and streamlines the interaction between Java applications and NoSQL data stores. 
With the introduction of Jakarta Data annotations and custom repositories, developers can express database operations in a more business-centric language, making code more intuitive and easier to maintain. This article has covered the critical aspects of Eclipse JNoSQL’s interaction with Oracle NoSQL, including setting up configurations, creating data models, and executing various database operations. Whether you are working with document-oriented or key-value databases, Eclipse JNoSQL provides the necessary tools and abstractions to make NoSQL data access a breeze. To dive deeper into the capabilities of Eclipse JNoSQL and explore more code samples, check out the official repository. There, you will find a wealth of information, examples, and resources to help you leverage the power of Eclipse JNoSQL in your Java applications. Eclipse JNoSQL empowers developers to harness the flexibility and scalability of NoSQL databases while adhering to Java enterprise standards, making it a valuable tool for modern application development.
When we think of debugging, we think of breakpoints in IDEs, stepping over, inspecting variables, etc. However, there are instances where stepping outside the conventional confines of an IDE becomes essential to track and resolve complex issues. This is where tools like DTrace come into play, offering a more nuanced and powerful approach to debugging than traditional methods. This blog post delves into the intricacies of DTrace, an innovative tool that has reshaped the landscape of debugging and system analysis.

DTrace Overview

First introduced by Sun Microsystems in 2004, DTrace quickly garnered attention for its groundbreaking approach to dynamic system tracing. Originally developed for Solaris, it has since been ported to various platforms, including MacOS, Windows, and Linux. DTrace stands out as a dynamic tracing framework that enables deep inspection of live systems – from operating systems to running applications. Its capacity to provide real-time insights into system and application behavior without significant performance degradation marks it as a revolutionary tool in the domain of system diagnostics and debugging.

Understanding DTrace’s Capabilities

DTrace, short for Dynamic Tracing, is a comprehensive toolkit for real-time system monitoring and debugging, offering an array of capabilities that span different levels of system operation. Its versatility lies in its ability to provide insights into both high-level system performance and detailed process-level activities.

System Monitoring and Analysis

At its core, DTrace excels in monitoring various system-level operations. It can trace system calls, file system activities, and network operations. This enables developers and system administrators to observe the interactions between the operating system and the applications running on it. For instance, DTrace can identify which files a process accesses, monitor network requests, and even trace system calls to provide a detailed view of what's happening within the system.

Process and Performance Analysis

Beyond system-level monitoring, DTrace is particularly adept at dissecting individual processes. It can provide detailed information about process execution, including CPU and memory usage, helping to pinpoint performance bottlenecks or memory leaks. This granular level of detail is invaluable for performance tuning and debugging complex software issues.

Customizability and Flexibility

One of the most powerful aspects of DTrace is its customizability. With a scripting language based on C syntax, DTrace allows the creation of customized scripts to probe specific aspects of system behavior. This flexibility means that it can be adapted to a wide range of debugging scenarios, making it a versatile tool in a developer’s arsenal.

Real-World Applications

In practical terms, DTrace can be used to diagnose elusive performance issues, track down resource leaks, or understand complex interactions between different system components. For example, it can be used to determine the cause of a slow file operation, analyze the reasons behind a process crash, or understand the system impact of a new software deployment.

Performance and Compatibility of DTrace

A standout feature of DTrace is its ability to operate with remarkable efficiency. Despite its deep system integration, DTrace is designed to have minimal impact on overall system performance.
This efficiency makes it a feasible tool for use in live production environments, where maintaining system stability and performance is crucial. Its non-intrusive nature allows developers and system administrators to conduct thorough debugging and performance analysis without the worry of significantly slowing down or disrupting the normal operation of the system. Cross-Platform Compatibility Originally developed for Solaris, DTrace has evolved into a cross-platform tool, with adaptations available for MacOS, Windows, and various Linux distributions. Each platform presents its own set of features and limitations. For instance, while DTrace is a native component in Solaris and MacOS, its implementation in Linux often requires a specialized build due to kernel support and licensing considerations. Compatibility Challenges on MacOS On MacOS, DTrace's functionality intersects with System Integrity Protection (SIP), a security feature designed to prevent potentially harmful actions. To utilize DTrace effectively, users may need to disable SIP, which should be done with caution. This process involves booting into recovery mode and executing specific commands, a step that highlights the need for a careful approach when working with such powerful system-level tools. We can disable SIP using the command: csrutil disable We can optionally use a more refined approach of enabling SIP without dtrace using the following command: csrutil enable --without dtrace Be extra careful when issuing these commands and when working on machines where dtrace is enabled. Back up your data properly! Customizability and Flexibility of DTrace A key feature that sets DTrace apart in the realm of system monitoring tools is its highly customizable nature. DTrace employs a scripting language that bears similarity to C syntax, offering users the ability to craft detailed and specific diagnostic scripts. This scripting capability allows for the creation of custom probes that can be fine-tuned to target particular aspects of system behavior, providing precise and relevant data. Adaptability to Various Scenarios The flexibility of DTrace's scripting language means it can adapt to a multitude of debugging scenarios. Whether it's tracking down memory leaks, analyzing CPU usage, or monitoring I/O operations, DTrace can be configured to provide insights tailored to the specific needs of the task. This adaptability makes it an invaluable tool for both developers and system administrators who require a dynamic approach to problem-solving. Examples of Customizable Probes Users can define probes to monitor specific system events, track the behavior of certain processes, or gather data on system resource usage. This level of customization ensures that DTrace can be an effective tool in a variety of contexts, from routine maintenance to complex troubleshooting tasks. The following is a simple "Hello, world!" dtrace probe: sudo dtrace -qn 'syscall::write:entry, syscall::sendto:entry /pid == $target/ { printf("(%d) %s %s", pid, probefunc, copyinstr(arg1)); }' -p 9999 The kernel is instrumented with hooks that match various callbacks. dtrace connects to these hooks and can perform interesting tasks when these hooks are triggered. They have a naming convention, specifically provider:module:function:name. In this case, the provider is a system call in both cases. We have no module so we can leave that part blank between the colon (:) symbols. We grab a write operation and sendto entries. 
When an application writes or tries to send a packet, the following code event will trigger. These things happen frequently, which is why we restrict the process ID to the specific target with pid == $target. This means the code will only trigger for the PID passed to us in the command line. The rest of the code should be simple for anyone with basic C experience: it's a printf that would list the processes and the data passed. Real-World Applications of DTrace DTrace's diverse capabilities extend far beyond theoretical use, playing a pivotal role in resolving real-world system complexities. Its ability to provide deep insights into system operations makes it an indispensable tool in a variety of practical applications. To get a sense of how DTrace can be used, we can use the man -k dtrace command whose output on my Mac is below: bitesize.d(1m) - analyse disk I/O size by process. Uses DTrace cpuwalk.d(1m) - Measure which CPUs a process runs on. Uses DTrace creatbyproc.d(1m) - snoop creat()s by process name. Uses DTrace dappprof(1m) - profile user and lib function usage. Uses DTrace dapptrace(1m) - trace user and library function usage. Uses DTrace dispqlen.d(1m) - dispatcher queue length by CPU. Uses DTrace dtrace(1) - dynamic tracing compiler and tracing utility dtruss(1m) - process syscall details. Uses DTrace errinfo(1m) - print errno for syscall fails. Uses DTrace execsnoop(1m) - snoop new process execution. Uses DTrace fddist(1m) - file descriptor usage distributions. Uses DTrace filebyproc.d(1m) - snoop opens by process name. Uses DTrace hotspot.d(1m) - print disk event by location. Uses DTrace iofile.d(1m) - I/O wait time by file and process. Uses DTrace iofileb.d(1m) - I/O bytes by file and process. Uses DTrace iopattern(1m) - print disk I/O pattern. Uses DTrace iopending(1m) - plot number of pending disk events. Uses DTrace iosnoop(1m) - snoop I/O events as they occur. Uses DTrace iotop(1m) - display top disk I/O events by process. Uses DTrace kill.d(1m) - snoop process signals as they occur. Uses DTrace lastwords(1m) - print syscalls before exit. Uses DTrace loads.d(1m) - print load averages. Uses DTrace newproc.d(1m) - snoop new processes. Uses DTrace opensnoop(1m) - snoop file opens as they occur. Uses DTrace pathopens.d(1m) - full pathnames opened ok count. Uses DTrace perldtrace(1) - Perl's support for DTrace pidpersec.d(1m) - print new PIDs per sec. Uses DTrace plockstat(1) - front-end to DTrace to print statistics about POSIX mutexes and read/write locks priclass.d(1m) - priority distribution by scheduling class. Uses DTrace pridist.d(1m) - process priority distribution. Uses DTrace procsystime(1m) - analyse system call times. Uses DTrace rwbypid.d(1m) - read/write calls by PID. Uses DTrace rwbytype.d(1m) - read/write bytes by vnode type. Uses DTrace rwsnoop(1m) - snoop read/write events. Uses DTrace sampleproc(1m) - sample processes on the CPUs. Uses DTrace seeksize.d(1m) - print disk event seek report. Uses DTrace setuids.d(1m) - snoop setuid calls as they occur. Uses DTrace sigdist.d(1m) - signal distribution by process. Uses DTrace syscallbypid.d(1m) - syscalls by process ID. Uses DTrace syscallbyproc.d(1m) - syscalls by process name. Uses DTrace syscallbysysc.d(1m) - syscalls by syscall. Uses DTrace topsyscall(1m) - top syscalls by syscall name. Uses DTrace topsysproc(1m) - top syscalls by process name. 
Uses DTrace Tcl_CommandTraceInfo(3tcl), Tcl_TraceCommand(3tcl), Tcl_UntraceCommand(3tcl) - monitor renames and deletes of a command

There's a lot here; we don't need to read everything. The point is that when you run into a problem, you can just search through this list and find a tool dedicated to debugging that problem. Let’s say you're facing elevated disk write issues that are causing the performance of your application to degrade. But is it your app at fault or some other app? rwbypid.d can help you with that: it can generate a list of processes and the number of calls they have for read/write based on the process ID, as seen in the following screenshot: We can use this information to better understand IO issues in our code or even in third-party applications/libraries.
iosnoop is another tool that helps us track IO operations but with more details: In diagnosing elusive system issues, DTrace shines by enabling detailed observation of system calls, file operations, and network activities. For instance, it can be used to uncover the root cause of unexpected system behaviors or to trace the origin of security breaches, offering a level of detail that is often unattainable with other debugging tools. Performance optimization is the main area where DTrace demonstrates its strengths. It allows administrators and developers to pinpoint performance bottlenecks, whether they lie in application code, system calls, or hardware interactions. By providing real-time data on resource usage, DTrace helps in fine-tuning systems for optimal performance. Final Words In conclusion, DTrace stands as a powerful and versatile tool in the realm of system monitoring and debugging. We've explored its broad capabilities, from in-depth system analysis to individual process tracing, and its remarkable performance efficiency that allows for its use in live environments. Its cross-platform compatibility, coupled with the challenges and solutions specific to MacOS, highlights its widespread applicability. The customizability through scripting provides unmatched flexibility, adapting to a myriad of diagnostic needs. Real-world applications of DTrace in diagnosing system issues and optimizing performance underscore its practical value. DTrace's comprehensive toolkit offers an unparalleled window into the inner workings of systems, making it an invaluable asset for system administrators and developers alike. Whether it's for routine troubleshooting or complex performance tuning, DTrace provides insights and solutions that are essential in the modern computing landscape.
Last year, I wrote a post on OpenTelemetry Tracing to understand more about the subject. I also created a demo around it, which featured the following components: The Apache APISIX API Gateway A Kotlin/Spring Boot service A Python/Flask service And a Rust/Axum service I've recently improved the demo to deepen my understanding and want to share my learning. Using a Regular Database In the initial demo, I didn't bother with a regular database. Instead: The Kotlin service used the embedded Java H2 database The Python service used the embedded SQLite The Rust service used hard-coded data in a hash map I replaced all of them with a regular PostgreSQL database, with a dedicated schema for each. The OpenTelemetry agent added a new span when connecting to the database on the JVM and in Python. For the JVM, it's automatic when one uses the Java agent. One needs to install the relevant package in Python — see next section. OpenTelemetry Integrations in Python Libraries Python requires you to explicitly add the package that instruments a specific library for OpenTelemetry. For example, the demo uses Flask; hence, we should add the Flask integration package. However, it can become a pretty tedious process. Yet, once you've installed opentelemetry-distro, you can "sniff" installed packages and install the relevant integration. Shell pip install opentelemetry-distro opentelemetry-bootstrap -a install For the demo, it installs the following: Plain Text opentelemetry_instrumentation-0.41b0.dist-info opentelemetry_instrumentation_aws_lambda-0.41b0.dist-info opentelemetry_instrumentation_dbapi-0.41b0.dist-info opentelemetry_instrumentation_flask-0.41b0.dist-info opentelemetry_instrumentation_grpc-0.41b0.dist-info opentelemetry_instrumentation_jinja2-0.41b0.dist-info opentelemetry_instrumentation_logging-0.41b0.dist-info opentelemetry_instrumentation_requests-0.41b0.dist-info opentelemetry_instrumentation_sqlalchemy-0.41b0.dist-info opentelemetry_instrumentation_sqlite3-0.41b0.dist-info opentelemetry_instrumentation_urllib-0.41b0.dist-info opentelemetry_instrumentation_urllib3-0.41b0.dist-info opentelemetry_instrumentation_wsgi-0.41b0.dist-info The above setup adds a new automated trace for connections. Gunicorn on Flask Every time I started the Flask service, it showed a warning in red that it shouldn't be used in production. While it's unrelated to OpenTelemetry, and though nobody complained, I was not too fond of it. For this reason, I added a "real" HTTP server. I chose Gunicorn, for no other reason than because my knowledge of the Python ecosystem is still shallow. The server is a runtime concern. We only need to change the Dockerfile slightly: Dockerfile RUN pip install gunicorn ENTRYPOINT ["opentelemetry-instrument", "gunicorn", "-b", "0.0.0.0", "-w", "4", "app:app"] The -b option refers to binding; you can attach to a specific IP. Since I'm running Docker, I don't know the IP, so I bind to any. The -w option specifies the number of workers Finally, the app:app argument sets the module and the application, separated by a colon Gunicorn usage doesn't impact OpenTelemetry integrations. Heredocs for the Win You may benefit from this if you write a lot of Dockerfile. Every Docker layer has a storage cost. Hence, inside a Dockerfile, one tends to avoid unnecessary layers. For example, the two following snippets yield the same results. 
Dockerfile RUN pip install pip-tools RUN pip-compile RUN pip install -r requirements.txt RUN pip install gunicorn RUN opentelemetry-bootstrap -a install RUN pip install pip-tools \ && pip-compile \ && pip install -r requirements.txt \ && pip install gunicorn \ && opentelemetry-bootstrap -a install The first snippet creates five layers, while the second is only one; however, the first is more readable than the second. With heredocs, we can access a more readable syntax that creates a single layer: Dockerfile RUN <<EOF pip install pip-tools pip-compile pip install -r requirements.txt pip install gunicorn opentelemetry-bootstrap -a install EOF Heredocs are a great way to have more readable and more optimized Dockerfiles. Try them! Explicit API Call on the JVM In the initial demo, I showed two approaches: The first uses auto-instrumentation, which requires no additional action The second uses manual instrumentation with Spring annotations I wanted to demo an explicit call with the API in the improved version. The use-case is analytics and uses a message queue: I get the trace data from the HTTP call and create a message with such data so the subscriber can use it as a parent. First, we need to add the OpenTelemetry API dependency to the project. We inherit the version from the Spring Boot Starter parent POM: XML <dependency> <groupId>io.opentelemetry</groupId> <artifactId>opentelemetry-api</artifactId> </dependency> At this point, we can access the API. OpenTelemetry offers a static method to get an instance: Kotlin val otel = GlobalOpenTelemetry.get() At runtime, the agent will work its magic to return the instance. Here's a simplified class diagram focused on tracing: In turn, the flow goes something like this: Kotlin val otel = GlobalOpenTelemetry.get() //1 val tracer = otel.tracerBuilder("ch.frankel.catalog").build() //2 val span = tracer.spanBuilder("AnalyticsFilter.filter") //3 .setParent(Context.current()) //4 .startSpan() //5 // Do something here span.end() //6 Get the underlying OpenTelemetry Get the tracer builder and "build" the tracer Get the span builder Add the span to the whole chain Start the span End the span; after this step, send the data to the OpenTelemetry endpoint configured Adding a Message Queue When I did the talk based on the post, attendees frequently asked whether OpenTelemetry would work with messages such as MQ or Kafka. While I thought it was the case in theory, I wanted to make sure of it: I added a message queue in the demo under the pretense of analytics. The Kotlin service will publish a message to an MQTT topic on each request. A NodeJS service will subscribe to the topic. Attaching OpenTelemetry Data to the Message So far, OpenTelemetry automatically reads the context to find out the trace ID and the parent span ID. Whatever the approach, auto-instrumentation or manual, annotations-based or explicit, the library takes care of it. I didn't find any existing similar automation for messaging; we need to code our way in. The gist of OpenTelemetry is the traceparent HTTP header. We need to read it and send it along with the message. First, let's add MQTT API to the project. XML <dependency> <groupId>org.eclipse.paho</groupId> <artifactId>org.eclipse.paho.mqttv5.client</artifactId> <version>1.2.5</version> </dependency> Interestingly enough, the API doesn't allow access to the traceparent directly. However, we can reconstruct it via the SpanContext class. I'm using MQTT v5 for my message broker. 
Note that v5 allows metadata to be attached to the message; when using v3, the message itself needs to wrap it. Kotlin val spanContext = span.spanContext //1 val message = MqttMessage().apply { properties = MqttProperties().apply { val traceparent = "00-${spanContext.traceId}-${spanContext.spanId}-${spanContext.traceFlags}" //2 userProperties = listOf(UserProperty("traceparent", traceparent)) //3 } qos = options.qos isRetained = options.retained val hostAddress = req.remoteAddress().map { it.address.hostAddress }.getOrNull() payload = Json.encodeToString(Payload(req.path(), hostAddress)).toByteArray() //4 } val client = MqttClient(mqtt.serverUri, mqtt.clientId) //5 client.publish(mqtt.options, message) //6 Get the span context Construct the traceparent from the span context, according to the W3C Trace Context specification Set the message metadata Set the message body Create the client Publish the message

Getting OpenTelemetry Data From the Message

The subscriber is a new component based on NodeJS. First, we configure the app to use the OpenTelemetry trace exporter: JavaScript const sdk = new NodeSDK({ resource: new Resource({[SemanticResourceAttributes.SERVICE_NAME]: 'analytics'}), traceExporter: new OTLPTraceExporter({ url: `${collectorUri}/v1/traces` }) }) sdk.start() The next step is to read the metadata, recreate the context from the traceparent, and create a span. JavaScript client.on('message', (aTopic, payload, packet) => { if (aTopic === topic) { console.log('Received new message') const data = JSON.parse(payload.toString()) const userProperties = {} if (packet.properties['userProperties']) { //1 const props = packet.properties['userProperties'] for (const key of Object.keys(props)) { userProperties[key] = props[key] } } const activeContext = propagation.extract(context.active(), userProperties) //2 const tracer = trace.getTracer('analytics') const span = tracer.startSpan( //3 'Read message', {attributes: {path: data['path'], clientIp: data['clientIp']}}, activeContext ) span.end() //4 } }) Read the metadata Recreate the context from the traceparent Create the span End the span For the record, I tried to migrate to TypeScript, but when I did, I didn't receive the message. Help or hints are very welcome!

Apache APISIX for Messaging

Though it's not common knowledge, Apache APISIX can proxy HTTP calls as well as UDP and TCP messages. It only offers a few plugins at the moment, but it will add more in the future. An OpenTelemetry one will surely be part of it. In the meantime, let's prepare for it. The first step is to configure Apache APISIX to allow both HTTP and TCP: YAML apisix: proxy_mode: http&stream #1 stream_proxy: tcp: - addr: 9100 #2 tls: false Configure APISIX for both modes Set the TCP port The next step is to configure TCP routing: YAML upstreams: - id: 4 nodes: "mosquitto:1883": 1 #1 stream_routes: #2 - id: 1 upstream_id: 4 plugins: mqtt-proxy: #3 protocol_name: MQTT protocol_level: 5 #4 Define the MQTT queue as the upstream Define the "streaming" route. APISIX defines everything that's not HTTP as streaming Use the MQTT proxy. Note APISIX offers a Kafka-based one Set the MQTT protocol level. For MQTT versions above 3, it should be 5 Finally, we can replace the MQTT URLs in the Docker Compose file with APISIX URLs.

Conclusion

I've described several items I added to improve my OpenTelemetry demo in this post. While most are indeed related to OpenTelemetry, some of them aren't. I may add another component in a different stack, such as a front-end.
The complete source code for this post can be found on GitHub.
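As an aside to the messaging part above: the same traceparent extraction that the NodeJS subscriber performs can also be done on the JVM with the OpenTelemetry Java API. The following is a minimal sketch, not part of the demo; the MessageTraceExtractor class and the userProperties map (assumed to already contain the MQTT user properties) are illustrative assumptions.

Java
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.api.trace.propagation.W3CTraceContextPropagator;
import io.opentelemetry.context.Context;
import io.opentelemetry.context.propagation.TextMapGetter;
import java.util.Map;

public class MessageTraceExtractor {

    // Tells the propagator how to read keys from the user-properties map
    private static final TextMapGetter<Map<String, String>> GETTER = new TextMapGetter<>() {
        @Override
        public Iterable<String> keys(Map<String, String> carrier) {
            return carrier.keySet();
        }

        @Override
        public String get(Map<String, String> carrier, String key) {
            return carrier == null ? null : carrier.get(key);
        }
    };

    public static void handle(Map<String, String> userProperties) {
        // Rebuild the parent context from the "traceparent" user property
        Context parent = W3CTraceContextPropagator.getInstance()
                .extract(Context.current(), userProperties, GETTER);

        Tracer tracer = GlobalOpenTelemetry.getTracer("analytics");
        Span span = tracer.spanBuilder("Read message")
                .setParent(parent)
                .startSpan();
        try {
            // Process the message here
        } finally {
            span.end();
        }
    }
}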
AngularAndSpringWithMaps is a Spring Boot project that shows company properties on a Bing map and can be run on the JDK or as a GraalVM native image. ReactAndGo is a Golang project that shows the cheapest gas stations in your postcode area and is compiled to a binary. Both languages are garbage collected, and the AngularAndSpringWithMaps project uses the G1 collector. The complexity of both projects can be compared. Both serve a frontend, provide REST data endpoints for the frontend, and implement services for the logic with repositories for the database access. How to build the GraalVM native image for the AngularAndSpringWithMaps project is explained in this article.

What To Compare

On the performance side, Golang and Java on the JVM or as a native image are fast and efficient enough for the vast majority of use cases. Further performance fine-tuning needs good profiling and specific improvements, and often, the improvements are related to the database. The two interesting aspects are: memory requirements and startup time (which can include warmup). The memory requirements are important because the available memory limit on the Kubernetes node or deployment server is mostly reached earlier than the CPU limit. If you use less memory, you can deploy more Docker images or Spring Boot applications on the resource. The startup time is important if you have periods with little load and periods with high load for your application. The shorter the startup time, the more aggressively you can scale the number of deployed applications/images up or down.

Memory Requirements

420 MB: AngularAndSpringWithMaps, JVM 21
280 MB: AngularAndSpringWithMaps, GraalVM native image
128-260 MB: ReactAndGo binary

The GraalVM native image uses significantly less memory than the JVM jar. That makes the native image more resource-efficient. The native image binary is 240 MB in size, which means roughly 40 MB of the 280 MB is working memory. The ReactAndGo binary is 29 MB in size and uses 128-260 MB of memory, depending on the size of the updates it has to process. That means that if the use case needed only 40 MB of working memory, like the GraalVM native image, around 70 MB would be enough to run it. That makes the Go binary much more resource-efficient.

Startup Time

4300 ms: AngularAndSpringWithMaps, JVM 21
220 ms: AngularAndSpringWithMaps, GraalVM native image
100 ms: ReactAndGo binary

The GraalVM native image startup time is impressive and enables scale-to-zero configurations that start the application on demand and scale down to zero without load. The JVM start time requires one running instance as a minimum. The ReactAndGo binary startup time is the fastest and enables scale to zero.

Conclusion

The GraalVM native image and the Go binary are the most efficient in this comparison. Due to their lower memory requirements, the CPU resources can be used more efficiently. The fast startup times enable scale-to-zero configurations that can save money in on-demand environments. The winner is the Go project. The result is that if efficient use of hardware resources is the most important thing to you, Go is the best choice. If your developers are most familiar with Java, then the use of a GraalVM native image can improve the efficient use of hardware resources. Creating GraalVM native images needs more effort and developer time. Some of that effort can be automated, and some of it would be hard to automate. Then the question becomes: is the extra developer time worth the saved hardware resources?
Slow query times in large datasets are a common headache in database management. MariaDB ColumnStore offers a neat way out of this. It's a columnar storage engine that significantly speeds up data analytics. Typically, you can improve query performance in relational databases by adding appropriate indexes. However, maintaining indexes is hard, especially with ad-hoc queries where you don't know where indexes are going to be needed. ColumnStore eases this pain. It's as if you had an index on each column but without the hassle of creating and updating them. The price to pay? Well, inserts are not as fast as with InnoDB, so this is not the best option for operational/transactional databases but rather for analytical ones. Bulk inserts are very fast though. There's plenty of online documentation about ColumnStore, so I won't go through all the details on how it works or how to deploy it in production. Instead, in this article, I'll show you how to try MariaDB ColumnStore on your computer using Docker.

Prerequisites

You'll need:
The mariadb command-line tool
Docker

Setting up MariaDB ColumnStore

Run a container with MariaDB + ColumnStore: Shell docker run -d -p 3307:3306 -e PM1=mcs1 --hostname=mcs1 --name mcs1 mariadb/columnstore

This command runs a new Docker container using the official ColumnStore image, with several specified options:
docker run: Starts a new Docker container.
-d: Runs the container in detached mode (in the background).
-p 3307:3306: Maps port 3307 on the host (your computer) to port 3306 inside the container. This makes the database accessible on port 3307 on the host machine.
-e PM1=mcs1: The PM1 environment variable specifies the primary database node (mcs1).
--hostname=mcs1: Sets the hostname of the container to mcs1.
--name mcs1: Names the container mcs1.
mariadb/columnstore: Specifies the Docker image to use, in this case, an image for MariaDB with the ColumnStore storage engine.

Provision ColumnStore: Shell docker exec -it mcs1 provision mcs1

The command docker exec is used to interact with a running Docker container. This is what each option does:
docker exec: Executes a command in a running container.
-it: This option ensures the command is run in interactive mode with a terminal.
mcs1 (first occurrence): This is the name of the Docker container in which the command is to be executed.
provision mcs1: This is the specific command being executed inside the container. provision is a script included in the Docker image that initializes and configures the MariaDB ColumnStore environment within the container. The argument mcs1 is passed to the provision command to specify the host for the MariaDB server within the Docker container.

Connect to the MariaDB server using the default credentials defined in the MariaDB ColumnStore Docker image: Shell mariadb -h 127.0.0.1 -P 3307 -u admin -p'C0lumnStore!'
Check that ColumnStore is available as a storage engine by running the following SQL sentence: Shell SHOW ENGINES; Setting up a Demo Database Create the operations database and its InnoDB tables: SQL CREATE DATABASE operations; CREATE TABLE operations.doctors( id SERIAL PRIMARY KEY, name VARCHAR(200) NOT NULL CHECK(TRIM(name) != '') ) ENGINE=InnoDB; CREATE TABLE operations.appointments( id SERIAL PRIMARY KEY, name VARCHAR(200) NOT NULL CHECK(TRIM(name) != ''), phone_number VARCHAR(15) NOT NULL CHECK(phone_number RLIKE '[0-9]+'), email VARCHAR(254) NOT NULL CHECK(TRIM(email) != ''), time DATETIME NOT NULL, reason ENUM('Consultation', 'Follow-up', 'Preventive', 'Chronic') NOT NULL, status ENUM ('Scheduled', 'Canceled', 'Completed', 'No Show'), doctor_id BIGINT UNSIGNED NOT NULL, CONSTRAINT fk_appointments_doctors FOREIGN KEY (doctor_id) REFERENCES doctors(id) ) ENGINE=InnoDB; Create the analytics database and its ColumnStore table: Shell CREATE DATABASE analytics; CREATE TABLE analytics.appointments( id BIGINT UNSIGNED NOT NULL, name VARCHAR(200) NOT NULL, phone_number VARCHAR(15) NOT NULL, email VARCHAR(254) NOT NULL, time DATETIME NOT NULL, reason VARCHAR(15) NOT NULL, status VARCHAR(10) NOT NULL, doctor_id BIGINT UNSIGNED NOT NULL ) ENGINE=ColumnStore; You can use the same database (or schema, they are synonyms in MariaDB) for both the InnoDB and ColumnStore tables if you prefer. Use a different name for the ColumnStore table if you opt for this alternative. Inserting Demo Data Insert a few doctors: SQL INSERT INTO operations.doctors(name) VALUES ("Maria"), ("John"), ("Jane"); Create a new file with the name test_data_insert.py with the following content: SQL import random import os import subprocess from datetime import datetime, timedelta # Function to generate a random date within a given range def random_date(start, end): return start + timedelta(days=random.randint(0, int((end - start).days))) # Function to execute a given SQL command using MariaDB def execute_sql(sql): # Write the SQL command to a temporary file with open("temp.sql", "w") as file: file.write(sql) # Execute the SQL command using the MariaDB client subprocess.run(["mariadb", "-h", "127.0.0.1", "-P", "3307", "-u", "admin", "-pC0lumnStore!", "-e", "source temp.sql"]) # Remove the temporary file os.remove("temp.sql") print("Generating and inserting data...") # Total number of rows to be inserted total_rows = 4000000 # Number of rows to insert in each batch batch_size = 10000 # Possible values for the 'reason' column and their associated weights for random selection reasons = ["Consultation", "Follow-up", "Preventive", "Chronic"] reason_weights = [0.5, 0.15, 0.25, 0.1] # Possible values for the 'status' column and their associated weights for random selection statuses = ["Scheduled", "Canceled", "Completed", "No Show"] status_weights = [0.1, 0.15, 0.7, 0.05] # Possible values for the 'doctor_id' column and their associated weights for random selection doctors = [1, 2, 3] doctors_weights = [0.4, 0.35, 0.25] # List of patient names names = [f"Patient_{i}" for i in range(total_rows)] # Insert data in batches for batch_start in range(0, total_rows, batch_size): batch_values = [] # Generate data for each row in the batch for i in range(batch_start, min(batch_start + batch_size, total_rows)): name = names[i] phone_number = f"{random.randint(100, 999)}-{random.randint(100, 999)}-{random.randint(1000, 9999)}" email = f"patient_{i}@example.com" time = random_date(datetime(2023, 1, 1), datetime(2024, 1, 1)).strftime("%Y-%m-%d 
%H:%M:%S") reason = random.choices(reasons, reason_weights)[0] status = random.choices(statuses, status_weights)[0] doctor_id = random.choices(doctors, doctors_weights)[0] # Append the generated row to the batch batch_values.append(f"('{name}', '{phone_number}', '{email}', '{time}', '{reason}', '{status}', {doctor_id})") # SQL command to insert the batch of data into the 'appointments' table sql = "USE operations;\nINSERT INTO appointments (name, phone_number, email, time, reason, status, doctor_id) VALUES " + ", ".join(batch_values) + ";" # Execute the SQL command execute_sql(sql) # Print progress print(f"Inserted up to row {min(batch_start + batch_size, total_rows)}") print("Data insertion complete.") Insert 4 million appointments by running the Python script: Shell python3 test_data_insert.py Populate the ColumnStore table by connecting to the database and running: SQL INSERT INTO analytics.appointments ( id, name, phone_number, email, time, reason, status, doctor_id ) SELECT appointments.id, appointments.name, appointments.phone_number, appointments.email, appointments.time, appointments.reason, appointments.status, appointments.doctor_id FROM operations.appointments; Run Cross-Engine SQL Queries MariaDB ColumnStore is designed to run in a cluster of multiple servers. That is where you see massive performance gains in analytical queries. However, we can also see this in action with the single-node setup of this article. Run the following query and pay attention to the time it needs to complete (make sure it queries the operations database): SQL SELECT doctors.name, status, COUNT(*) AS count FROM operations.appointments -- use the InnoDB table JOIN doctors ON doctor_id = doctors.id WHERE status IN ( 'Scheduled', 'Canceled', 'Completed', 'No Show' ) GROUP BY doctors.name, status ORDER BY doctors.name, status; On my machine, it took around 3 seconds. Now modify the query to use the ColumnStore table instead (in the analytics database): SQL SELECT doctors.name, status, COUNT(*) AS count FROM analytics.appointments -- use the ColumnStore table JOIN doctors ON doctor_id = doctors.id WHERE status IN ( 'Scheduled', 'Canceled', 'Completed', 'No Show' ) GROUP BY doctors.name, status ORDER BY doctors.name, status; It takes less than a second. Of course, you can speed up the first query by adding an index in this simplistic example, but imagine the situation in which you have hundreds of tables: it would become harder and harder to manage indexes. ColumnStore removes this complexity.
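If you would rather reproduce this comparison from application code than from the mariadb client, a small JDBC sketch like the one below can time both variants. It is only a sketch under the same assumptions as before (container running on port 3307, default credentials, MariaDB Connector/J on the classpath), and the printed timings will of course vary per machine.
Java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ColumnStoreComparison {

    // Same aggregate query as above, run once against the InnoDB table and once against the ColumnStore table
    private static final String QUERY_TEMPLATE =
            "SELECT doctors.name, status, COUNT(*) AS count " +
            "FROM %s JOIN doctors ON doctor_id = doctors.id " +
            "WHERE status IN ('Scheduled', 'Canceled', 'Completed', 'No Show') " +
            "GROUP BY doctors.name, status ORDER BY doctors.name, status";

    public static void main(String[] args) throws Exception {
        // Connect to the operations schema so that the unqualified doctors table resolves correctly
        String url = "jdbc:mariadb://127.0.0.1:3307/operations";
        try (Connection connection = DriverManager.getConnection(url, "admin", "C0lumnStore!");
             Statement statement = connection.createStatement()) {
            for (String table : new String[]{"operations.appointments", "analytics.appointments"}) {
                long start = System.nanoTime();
                try (ResultSet resultSet = statement.executeQuery(String.format(QUERY_TEMPLATE, table))) {
                    int rows = 0;
                    while (resultSet.next()) {
                        rows++;
                    }
                    long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                    System.out.printf("%s -> %d rows in %d ms%n", table, rows, elapsedMs);
                }
            }
        }
    }
}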
In this post, you will learn how you can integrate Large Language Model (LLM) capabilities into your Java application. More specifically, how you can integrate with LocalAI from your Java application. Enjoy! Introduction In a previous post, it was shown how you could run a Large Language Model (LLM) similar to OpenAI by means of LocalAI. The REST API of OpenAI was used to interact with LocalAI. Integrating these capabilities within your Java application can be cumbersome. However, since the introduction of LangChain4j, this has become much easier to do. LangChain4j simplifies the integration with LLMs. It is based on the Python library LangChain. It is therefore also advisable to read the documentation and concepts of LangChain, since the documentation of LangChain4j is rather short. Many examples are provided, though, in the LangChain4j examples repository. In particular, the examples in the other-examples directory have been used as inspiration for this blog. The real trigger for writing this blog was the talk I attended about LangChain4j at Devoxx Belgium. This was the most interesting talk I attended at Devoxx: do watch it if you can make time for it. It takes only 50 minutes. The sources used in this blog can be found on GitHub. Prerequisites The prerequisites for this blog are: Basic knowledge about what a Large Language Model is Basic Java knowledge (Java 21 is used) You need LocalAI if you want to run the examples (see the previous blog linked in the introduction on how you can make use of LocalAI). Version 2.2.0 is used for this blog. LangChain4j Examples In this section, some of the capabilities of LangChain4j are shown by means of examples. Some of the examples used in the previous post are now implemented using LangChain4j instead of using curl. How Are You? As a first simple example, you ask the model how it is feeling. In order to make use of LangChain4j in combination with LocalAI, you add the langchain4j-local-ai dependency to the pom file. XML <dependency> <groupId>dev.langchain4j</groupId> <artifactId>langchain4j-local-ai</artifactId> <version>0.24.0</version> </dependency> In order to integrate with LocalAI, you create a ChatLanguageModel specifying the following items: The URL where the LocalAI instance is accessible The name of the model you want to use in LocalAI The temperature: A high temperature allows the model to respond in a more creative way. Next, you ask the model to generate an answer to your question and you print the answer. Java ChatLanguageModel model = LocalAiChatModel.builder() .baseUrl("http://localhost:8080") .modelName("lunademo") .temperature(0.9) .build(); String answer = model.generate("How are you?"); System.out.println(answer); Start LocalAI and run the example above. The response is as expected. Shell I'm doing well, thank you. How about yourself? Before continuing, note something about the difference between LanguageModel and ChatLanguageModel. Both classes are available in LangChain4j, so which one to choose? A chat model is a variation of a language model. If you need "text in, text out" functionality, you can choose LanguageModel. If you also want to be able to use "chat messages" as input and output, you should use ChatLanguageModel. In the example above, you could just have used LanguageModel and it would behave similarly. Facts About Famous Soccer Player Let’s verify whether it also returns facts about the famous Dutch soccer player Johan Cruijff. 
You use the same code as before, only now you set the temperature to zero because no creative answer is required. Java ChatLanguageModel model = LocalAiChatModel.builder() .baseUrl("http://localhost:8080") .modelName("lunademo") .temperature(0.0) .build(); String answer = model.generate("who is Johan Cruijff?"); System.out.println(answer); Run the example, the response is as expected. Shell Johan Cruyff was a Dutch professional football player and coach. He played as a forward for Ajax, Barcelona, and the Netherlands national team. He is widely regarded as one of the greatest players of all time and was known for his creativity, skill, and ability to score goals from any position on the field. Stream the Response Sometimes, the answer will take some time. In the OpenAPI specification, you can set the stream parameter to true in order to retrieve the response character by character. This way, you can display the response already to the user before awaiting the complete response. This functionality is also available with LangChain4j but requires the use of a StreamingResponseHandler. The onNext method receives every character one by one. The complete response is gathered in the answerBuilder and futureAnswer. Running this example prints every single character one by one, and at the end, the complete response is printed. Java StreamingChatLanguageModel model = LocalAiStreamingChatModel.builder() .baseUrl("http://localhost:8080") .modelName("lunademo") .temperature(0.0) .build(); StringBuilder answerBuilder = new StringBuilder(); CompletableFuture<String> futureAnswer = new CompletableFuture<>(); model.generate("who is Johan Cruijff?", new StreamingResponseHandler<AiMessage>() { @Override public void onNext(String token) { answerBuilder.append(token); System.out.println(token); } @Override public void onComplete(Response<AiMessage> response) { futureAnswer.complete(answerBuilder.toString()); } @Override public void onError(Throwable error) { futureAnswer.completeExceptionally(error); } }); String answer = futureAnswer.get(90, SECONDS); System.out.println(answer); Run the example. The response is as expected. Shell J o h a n ... s t y l e . Johan Cruijff was a Dutch professional football player and coach who played as a forward. ... Other Languages You can instruct the model by means of a system message how it should behave. For example, you can instruct it to answer always in a different language; Dutch, in this case. This example shows clearly the difference between LanguageModel and ChatLanguageModel. You have to use ChatLanguageModel in this case because you need to interact by means of chat messages with the model. Create a SystemMessage to instruct the model. Create a UserMessage for your question. Add them to a list and send the list of messages to the model. Also, note that the response is an AiMessage. The messages are explained as follows: UserMessage: A ChatMessage coming from a human/user AiMessage: A ChatMessage coming from an AI/assistant SystemMessage: A ChatMessage coming from the system Java ChatLanguageModel model = LocalAiChatModel.builder() .baseUrl("http://localhost:8080") .modelName("lunademo") .temperature(0.0) .build(); SystemMessage responseInDutch = new SystemMessage("You are a helpful assistant. 
Antwoord altijd in het Nederlands."); UserMessage question = new UserMessage("who is Johan Cruijff?"); var chatMessages = new ArrayList<ChatMessage>(); chatMessages.add(responseInDutch); chatMessages.add(question); Response<AiMessage> response = model.generate(chatMessages); System.out.println(response.content()); Run the example, the response is as expected. Shell AiMessage { text = "Johan Cruijff was een Nederlands voetballer en trainer. Hij speelde als aanvaller en is vooral bekend van zijn tijd bij Ajax en het Nederlands elftal. Hij overleed in 1996 op 68-jarige leeftijd." toolExecutionRequest = null } Chat With Documents A fantastic use case is to use an LLM in order to chat with your own documents. You can provide the LLM with your documents and ask questions about it. For example, when you ask the LLM for which football clubs Johan Cruijff played ("For which football teams did Johan Cruijff play and also give the periods, answer briefly"), you receive the following answer. Shell Johan Cruijff played for Ajax Amsterdam (1954-1973), Barcelona (1973-1978) and the Netherlands national team (1966-1977). This answer is quite ok, but it is not complete, as not all football clubs are mentioned and the period for Ajax includes also his youth period. The correct answer should be: Years Team 1964-1973 Ajax 1973-1978 Barcelona 1979 Los Angeles Aztecs 1980 Washington Diplomats 1981 Levante 1981 Washington Diplomats 1981-1983 Ajax 1983-1984 Feyenoord Apparently, the LLM does not have all relevant information and that is not a surprise. The LLM has some basic knowledge, it runs locally and has its limitations. But what if you could provide the LLM with extra information in order that it can give an adequate answer? Let’s see how this works. First, you need to add some extra dependencies to the pom file: XML <dependency> <groupId>dev.langchain4j</groupId> <artifactId>langchain4j</artifactId> <version>${langchain4j.version}</version> </dependency> <dependency> <groupId>dev.langchain4j</groupId> <artifactId>langchain4j-embeddings</artifactId> <version>${langchain4j.version}</version> </dependency> <dependency> <groupId>dev.langchain4j</groupId> <artifactId>langchain4j-embeddings-all-minilm-l6-v2</artifactId> <version>${langchain4j.version}</version> </dependency> Save the Wikipedia text of Johan Cruijff to a PDF file and store it in src/main/resources/example-files/Johan_Cruyff.pdf. The source code to add this document to the LLM consists of the following parts: The text needs to be embedded; i.e., the text needs to be converted to numbers. An embedding model is needed for that, for simplicity you use the AllMiniLmL6V2EmbeddingModel. The embeddings need to be stored in an embedding store. Often a vector database is used for this purpose, but in this case, you can use an in-memory embedding store. The document needs to be split into chunks. For simplicity, you split the document into chunks of 500 characters. All of this comes together in the EmbeddingStoreIngestor. Add the PDF to the ingestor. Create the ChatLanguageModel just like you did before. With a ConversationalRetrievalChain, you connect the language model with the embedding store and model. And finally, you execute your question. 
Java EmbeddingModel embeddingModel = new AllMiniLmL6V2EmbeddingModel(); EmbeddingStore<TextSegment> embeddingStore = new InMemoryEmbeddingStore<>(); EmbeddingStoreIngestor ingestor = EmbeddingStoreIngestor.builder() .documentSplitter(DocumentSplitters.recursive(500, 0)) .embeddingModel(embeddingModel) .embeddingStore(embeddingStore) .build(); Document johanCruiffInfo = loadDocument(toPath("example-files/Johan_Cruyff.pdf")); ingestor.ingest(johanCruiffInfo); ChatLanguageModel model = LocalAiChatModel.builder() .baseUrl("http://localhost:8080") .modelName("lunademo") .temperature(0.0) .build(); ConversationalRetrievalChain chain = ConversationalRetrievalChain.builder() .chatLanguageModel(model) .retriever(EmbeddingStoreRetriever.from(embeddingStore, embeddingModel)) .build(); String answer = chain.execute("Give all football teams Johan Cruijff played for in his senior career"); System.out.println(answer); When you execute this code, an exception is thrown. Shell Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: java.io.InterruptedIOException: timeout at dev.langchain4j.internal.RetryUtils.withRetry(RetryUtils.java:29) at dev.langchain4j.model.localai.LocalAiChatModel.generate(LocalAiChatModel.java:98) at dev.langchain4j.model.localai.LocalAiChatModel.generate(LocalAiChatModel.java:65) at dev.langchain4j.chain.ConversationalRetrievalChain.execute(ConversationalRetrievalChain.java:65) at com.mydeveloperplanet.mylangchain4jplanet.ChatWithDocuments.main(ChatWithDocuments.java:55) Caused by: java.lang.RuntimeException: java.io.InterruptedIOException: timeout at dev.ai4j.openai4j.SyncRequestExecutor.execute(SyncRequestExecutor.java:31) at dev.ai4j.openai4j.RequestExecutor.execute(RequestExecutor.java:59) at dev.langchain4j.model.localai.LocalAiChatModel.lambda$generate$0(LocalAiChatModel.java:98) at dev.langchain4j.internal.RetryUtils.withRetry(RetryUtils.java:26) ... 4 more Caused by: java.io.InterruptedIOException: timeout at okhttp3.internal.connection.RealCall.timeoutExit(RealCall.kt:398) at okhttp3.internal.connection.RealCall.callDone(RealCall.kt:360) at okhttp3.internal.connection.RealCall.noMoreExchanges$okhttp(RealCall.kt:325) at okhttp3.internal.connection.RealCall.getResponseWithInterceptorChain$okhttp(RealCall.kt:209) at okhttp3.internal.connection.RealCall.execute(RealCall.kt:154) at retrofit2.OkHttpCall.execute(OkHttpCall.java:204) at dev.ai4j.openai4j.SyncRequestExecutor.execute(SyncRequestExecutor.java:23) ... 
7 more Caused by: java.net.SocketTimeoutException: timeout at okio.SocketAsyncTimeout.newTimeoutException(JvmOkio.kt:147) at okio.AsyncTimeout.access$newTimeoutException(AsyncTimeout.kt:158) at okio.AsyncTimeout$source$1.read(AsyncTimeout.kt:337) at okio.RealBufferedSource.indexOf(RealBufferedSource.kt:427) at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.kt:320) at okhttp3.internal.http1.HeadersReader.readLine(HeadersReader.kt:29) at okhttp3.internal.http1.Http1ExchangeCodec.readResponseHeaders(Http1ExchangeCodec.kt:178) at okhttp3.internal.connection.Exchange.readResponseHeaders(Exchange.kt:106) at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.kt:79) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109) at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.kt:34) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109) at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.kt:95) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109) at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.kt:83) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109) at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.kt:76) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109) at dev.ai4j.openai4j.ResponseLoggingInterceptor.intercept(ResponseLoggingInterceptor.java:21) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109) at dev.ai4j.openai4j.RequestLoggingInterceptor.intercept(RequestLoggingInterceptor.java:31) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109) at dev.ai4j.openai4j.AuthorizationHeaderInjector.intercept(AuthorizationHeaderInjector.java:25) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109) at okhttp3.internal.connection.RealCall.getResponseWithInterceptorChain$okhttp(RealCall.kt:201) ... 10 more Caused by: java.net.SocketException: Socket closed at java.base/sun.nio.ch.NioSocketImpl.endRead(NioSocketImpl.java:243) at java.base/sun.nio.ch.NioSocketImpl.implRead(NioSocketImpl.java:323) at java.base/sun.nio.ch.NioSocketImpl.read(NioSocketImpl.java:346) at java.base/sun.nio.ch.NioSocketImpl$1.read(NioSocketImpl.java:796) at java.base/java.net.Socket$SocketInputStream.read(Socket.java:1099) at okio.InputStreamSource.read(JvmOkio.kt:94) at okio.AsyncTimeout$source$1.read(AsyncTimeout.kt:125) ... 32 more This can be solved by setting the timeout of the language model to a higher value. Java ChatLanguageModel model = LocalAiChatModel.builder() .baseUrl("http://localhost:8080") .modelName("lunademo") .temperature(0.0) .timeout(Duration.ofMinutes(5)) .build(); Run the code again, and the following answer is received, which is correct. Shell Johan Cruijff played for the following football teams in his senior career: - Ajax (1964-1973) - Barcelona (1973-1978) - Los Angeles Aztecs (1979) - Washington Diplomats (1980-1981) - Levante (1981) - Ajax (1981-1983) - Feyenoord (1983-1984) - Netherlands national team (1966-1977) Using a 1.x version of LocalAI gave this response, which was worse. Shell Johan Cruyff played for the following football teams: - Ajax (1964-1973) - Barcelona (1973-1978) - Los Angeles Aztecs (1979) The following steps were used to solve this problem. 
When you take a closer look at the PDF file, you notice that the information about the football teams is listed in a table next to the regular text. Remember that splitting the document was done by creating chunks of 500 characters. So, maybe this splitting does not work well enough for the LLM. Copy the football teams into a separate text document. Plain Text Years Team Apps (Gls) 1964–1973 Ajax 245 (193) 1973–1978 Barcelona 143 (48) 1979 Los Angeles Aztecs 22 (14) 1980 Washington Diplomats 24 (10) 1981 Levante 10 (2) 1981 Washington Diplomats 5 (2) 1981–1983 Ajax 36 (14) 1983–1984 Feyenoord 33 (11) Add both documents to the ingestor. Java Document johanCruiffInfo = loadDocument(toPath("example-files/Johan_Cruyff.pdf")); Document clubs = loadDocument(toPath("example-files/Johan_Cruyff_clubs.txt")); ingestor.ingest(johanCruiffInfo, clubs); Run this code, and this time the answer is correct and complete. Shell Johan Cruijff played for the following football teams in his senior career: - Ajax (1964-1973) - Barcelona (1973-1978) - Los Angeles Aztecs (1979) - Washington Diplomats (1980-1981) - Levante (1981) - Ajax (1981-1983) - Feyenoord (1983-1984) - Netherlands national team (1966-1977) It is therefore important that the sources you provide to an LLM are split wisely (a short sketch of how to tune the splitting follows at the end of this article). Besides that, these technologies are improving rapidly. Even while writing this blog, some problems were solved within a couple of weeks. Updating to a more recent version of LocalAI, for example, solved the problem with parsing the single PDF. Conclusion In this post, you learned how to integrate an LLM from within your Java application using LangChain4j. You also learned how to chat with documents, which is a fantastic use case! It is also important to regularly update to newer versions as the development of these AI technologies improves continuously.
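If you want to experiment with the splitting yourself, the chunking strategy is just a parameter of the ingestor. The snippet below is a variation on the example above, not code from the original project: it reuses the same imports and helper methods and only changes the splitter to larger chunks with some overlap, so that a sentence crossing a chunk boundary still appears intact in at least one chunk.
Java
EmbeddingModel embeddingModel = new AllMiniLmL6V2EmbeddingModel();
EmbeddingStore<TextSegment> embeddingStore = new InMemoryEmbeddingStore<>();

// Larger chunks (1000 characters) with a 100-character overlap between consecutive chunks
EmbeddingStoreIngestor ingestor = EmbeddingStoreIngestor.builder()
        .documentSplitter(DocumentSplitters.recursive(1000, 100))
        .embeddingModel(embeddingModel)
        .embeddingStore(embeddingStore)
        .build();

Document johanCruiffInfo = loadDocument(toPath("example-files/Johan_Cruyff.pdf"));
Document clubs = loadDocument(toPath("example-files/Johan_Cruyff_clubs.txt"));
ingestor.ingest(johanCruiffInfo, clubs);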
Last year, I wrote the article, "Upgrade Guide To Spring Boot 3.0 for Spring Data JPA and Querydsl," for the Spring Boot 3.0.x upgrade. Now, we have Spring Boot 3.2. Let's see two issues you might deal with when upgrading to Spring Boot 3.2.2. The technologies used in the SAT project are: Spring Boot 3.2.2 and Spring Framework 6.1.3 Hibernate + JPA model generator 6.4.1. Final Spring Data JPA 3.2.2 Querydsl 5.0.0. Changes All the changes in Spring Boot 3.2 are described in Spring Boot 3.2 Release Notes and What's New in Version 6.1 for Spring Framework 6.1. The latest changes in Spring Boot 3.2.2 can be found on GitHub. Issues Found A different treatment of Hibernate dependencies due to the changed hibernate-jpamodelgen behavior for annotation processors Unpaged class was redesigned. Let's start with the Hibernate dependencies first. Integrating Static Metamodel Generation The biggest change comes from the hibernate-jpamodelgen dependency, which is generating a static metamodel. In Hibernate 6.3, the treatment of dependencies was changed in order to mitigate transitive dependencies. Spring Boot 3.2.0 bumped up the hibernate-jpamodelgen dependency to the 6.3 version (see Dependency Upgrades). Unfortunately, the new version causes compilation errors (see below). Note: Spring Boot 3.2.2 used here already uses Hibernate 6.4 with the same behavior. Compilation Error With this change, the compilation of our project (Maven build) with Spring Boot 3.2.2 fails on the error like this: Plain Text [INFO] ------------------------------------------------------------------------ [INFO] BUILD FAILURE [INFO] ------------------------------------------------------------------------ [INFO] Total time: 3.049 s [INFO] Finished at: 2024-01-05T08:43:10+01:00 [INFO] ------------------------------------------------------------------------ [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.11.0:compile (default-compile) on project sat-jpa: Compilation failure: Compilation failure: [ERROR] on the class path. A future release of javac may disable annotation processing [ERROR] unless at least one processor is specified by name (-processor), or a search [ERROR] path is specified (--processor-path, --processor-module-path), or annotation [ERROR] processing is enabled explicitly (-proc:only, -proc:full). [ERROR] Use -Xlint:-options to suppress this message. [ERROR] Use -proc:none to disable annotation processing. [ERROR] <SAT_PATH>\sat-jpa\src\main\java\com\github\aha\sat\jpa\city\CityRepository.java:[3,41] error: cannot find symbol [ERROR] symbol: class City_ [ERROR] location: package com.github.aha.sat.jpa.city [ERROR] <SAT_PATH>\sat-jpa\src\main\java\com\github\aha\sat\jpa\city\CityRepository.java:[3] error: static import only from classes and interfaces ... 
[ERROR] <SAT_PATH>\sat-jpa\src\main\java\com\github\aha\sat\jpa\country\CountryCustomRepositoryImpl.java:[4] error: static import only from classes and interfaces [ERROR] java.lang.NoClassDefFoundError: net/bytebuddy/matcher/ElementMatcher [ERROR] at org.hibernate.jpamodelgen.validation.ProcessorSessionFactory.<clinit>(ProcessorSessionFactory.java:69) [ERROR] at org.hibernate.jpamodelgen.annotation.AnnotationMeta.handleNamedQuery(AnnotationMeta.java:104) [ERROR] at org.hibernate.jpamodelgen.annotation.AnnotationMeta.handleNamedQueryRepeatableAnnotation(AnnotationMeta.java:78) [ERROR] at org.hibernate.jpamodelgen.annotation.AnnotationMeta.checkNamedQueries(AnnotationMeta.java:57) [ERROR] at org.hibernate.jpamodelgen.annotation.AnnotationMetaEntity.init(AnnotationMetaEntity.java:297) [ERROR] at org.hibernate.jpamodelgen.annotation.AnnotationMetaEntity.create(AnnotationMetaEntity.java:135) [ERROR] at org.hibernate.jpamodelgen.JPAMetaModelEntityProcessor.handleRootElementAnnotationMirrors(JPAMetaModelEntityProcessor.java:360) [ERROR] at org.hibernate.jpamodelgen.JPAMetaModelEntityProcessor.processClasses(JPAMetaModelEntityProcessor.java:203) [ERROR] at org.hibernate.jpamodelgen.JPAMetaModelEntityProcessor.process(JPAMetaModelEntityProcessor.java:174) [ERROR] at jdk.compiler/com.sun.tools.javac.processing.JavacProcessingEnvironment.callProcessor(JavacProcessingEnvironment.java:1021) [ER... [ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:348) [ERROR] Caused by: java.lang.ClassNotFoundException: net.bytebuddy.matcher.ElementMatcher [ERROR] at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:445) [ERROR] at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:593) [ERROR] at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:526) [ERROR] ... 51 more [ERROR] -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException This is caused by the changed approach in the static metamodel generation announced in the Hibernate migration guide (see Integrating Static Metamodel Generation and the original issue HHH-17362). Their explanation for this change is: "... in previous versions of Hibernate ORM you were leaking dependencies of hibernate-jpamodelgen into your compile classpath unknowingly. With Hibernate ORM 6.3, you may now experience a compilation error during annotation processing related to missing Antlr classes." Dependency Changes A comparison of the resolved Hibernate dependencies in Spring Boot 3.1.6 and Spring Boot 3.2.2 confirms that they really did change. Explanation As stated in the migration guide, we need to change our pom.xml from a simple Maven dependency to the annotation processor paths of the Maven compiler plugin (see documentation). Solution We can remove the Maven dependencies hibernate-jpamodelgen and querydsl-apt (in our case) as recommended in the last article. 
Instead, pom.xml has to define the static metamodel generators via maven-compiler-plugin like this: XML <plugins> <plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-compiler-plugin</artifactId> <configuration> <annotationProcessorPaths> <path> <groupId>org.hibernate.orm</groupId> <artifactId>hibernate-jpamodelgen</artifactId> <version>${hibernate.version}</version> </path> <path> <groupId>com.querydsl</groupId> <artifactId>querydsl-apt</artifactId> <version>${querydsl.version}</version> <classifier>jakarta</classifier> </path> <path> <groupId>org.projectlombok</groupId> <artifactId>lombok</artifactId> <version>${lombok.version}</version> </path> </annotationProcessorPaths> </configuration> </plugin> </plugins> See the related changes in the SAT project on GitHub. As we are forced to use this approach due to hibernate-jpamodelgen, we need to apply it to all dependencies tied to annotation processing (querydsl-apt or lombok). For example, when lombok is not used this way, we get a compilation error like this: Plain Text [INFO] ------------------------------------------------------------- [ERROR] COMPILATION ERROR : [INFO] ------------------------------------------------------------- [ERROR] <SAT_PATH>\sat-jpa\src\main\java\com\github\aha\sat\jpa\city\CityService.java:[15,30] error: variable repository not initialized in the default constructor [INFO] 1 error [INFO] ------------------------------------------------------------- [INFO] ------------------------------------------------------------------------ [INFO] BUILD FAILURE [INFO] ------------------------------------------------------------------------ [INFO] Total time: 4.535 s [INFO] Finished at: 2024-01-08T08:40:29+01:00 [INFO] ------------------------------------------------------------------------ [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.11.0:compile (default-compile) on project sat-jpa: Compilation failure [ERROR] <SAT_PATH>\sat-jpa\src\main\java\com\github\aha\sat\jpa\city\CityService.java:[15,30] error: variable repository not initialized in the default constructor The same applies to querydsl-apt. 
In this case, we can see the compilation error like this: Plain Text [INFO] ------------------------------------------------------------------------ [INFO] BUILD FAILURE [INFO] ------------------------------------------------------------------------ [INFO] Total time: 5.211 s [INFO] Finished at: 2024-01-11T08:39:18+01:00 [INFO] ------------------------------------------------------------------------ [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.11.0:compile (default-compile) on project sat-jpa: Compilation failure: Compilation failure: [ERROR] <SAT_PATH>\sat-jpa\src\main\java\com\github\aha\sat\jpa\country\CountryRepository.java:[3,44] error: cannot find symbol [ERROR] symbol: class QCountry [ERROR] location: package com.github.aha.sat.jpa.country [ERROR] <SAT_PATH>\sat-jpa\src\main\java\com\github\aha\sat\jpa\country\CountryRepository.java:[3] error: static import only from classes and interfaces [ERROR] <SAT_PATH>\sat-jpa\src\main\java\com\github\aha\sat\jpa\country\CountryCustomRepositoryImpl.java:[3,41] error: cannot find symbol [ERROR] symbol: class QCity [ERROR] location: package com.github.aha.sat.jpa.city [ERROR] <SAT_PATH>\sat-jpa\src\main\java\com\github\aha\sat\jpa\country\CountryCustomRepositoryImpl.java:[3] error: static import only from classes and interfaces [ERROR] <SAT_PATH>\sat-jpa\src\main\java\com\github\aha\sat\jpa\country\CountryCustomRepositoryImpl.java:[4,44] error: cannot find symbol [ERROR] symbol: class QCountry [ERROR] location: package com.github.aha.sat.jpa.country [ERROR] <SAT_PATH>\sat-jpa\src\main\java\com\github\aha\sat\jpa\country\CountryCustomRepositoryImpl.java:[4] error: static import only from classes and interfaces [ERROR] -> [Help 1] The reason is obvious. We need to apply all the annotation processors at the same time. Otherwise, some pieces of code can be missing, and we get the compilation error. Unpaged Redesigned The second minor issue is related to a change in Unpaged class. A serialization of PageImpl by the Jackson library was impacted by changing Unpaged from enum to class (see spring-projects/spring-data-commons#2987). Spring Boot 3.1.6: Java public interface Pageable { static Pageable unpaged() { return Unpaged.INSTANCE; } ... } enum Unpaged implements Pageable { INSTANCE; ... } Spring Boot 3.2.2: Java public interface Pageable { static Pageable unpaged() { return unpaged(Sort.unsorted()); } static Pageable unpaged(Sort sort) { return Unpaged.sorted(sort); } ... } final class Unpaged implements Pageable { private static final Pageable UNSORTED = new Unpaged(Sort.unsorted()); ... 
} When new PageImpl<City>(cities) is used (as we were used to doing), this error is thrown: Plain Text 2024-01-11T08:47:56.446+01:00 WARN 5168 --- [sat-elk] [ main] .w.s.m.s.DefaultHandlerExceptionResolver : Resolved [org.springframework.http.converter.HttpMessageNotWritableException: Could not write JSON: (was java.lang.UnsupportedOperationException)] MockHttpServletRequest: HTTP Method = GET Request URI = /api/cities/country/Spain Parameters = {} Headers = [] Body = null Session Attrs = {} Handler: Type = com.github.aha.sat.elk.city.CityController Method = com.github.aha.sat.elk.city.CityController#searchByCountry(String, Pageable) Async: Async started = false Async result = null Resolved Exception: Type = org.springframework.http.converter.HttpMessageNotWritableException The workaround is to use the constructor with all attributes: Java new PageImpl<City>(cities, ofSize(PAGE_SIZE), cities.size()) Instead of: Java new PageImpl<City>(cities) A fuller sketch of the workaround in context follows at the end of this article. Note: It should be fixed in Spring Boot 3.3 (see this issue comment). Conclusion This article has covered both issues found when upgrading to Spring Boot 3.2.2, the latest version at the time of writing. The article started with the handling of the annotation processors due to the changed Hibernate dependency management. Next, the change in the Unpaged class and the workaround for using PageImpl were explained. All of the changes (with some other changes) can be seen in PR #64. The complete source code demonstrated above is available in my GitHub repository.
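For completeness, here is the promised sketch of the workaround in context. The repository method and the PAGE_SIZE constant are hypothetical stand-ins rather than code from the SAT project; the only point being illustrated is the use of the three-argument PageImpl constructor instead of the single-argument one.
Java
import static org.springframework.data.domain.PageRequest.ofSize;

import java.util.List;
import org.springframework.data.domain.Page;
import org.springframework.data.domain.PageImpl;

public class CitySearchService {

    private static final int PAGE_SIZE = 20; // hypothetical page size

    private final CityRepository repository; // assumed Spring Data repository for the City entity

    public CitySearchService(CityRepository repository) {
        this.repository = repository;
    }

    public Page<City> searchByCountry(String country) {
        List<City> cities = repository.findByCountryName(country); // hypothetical query method
        // Passing an explicit Pageable and total count avoids serializing the Unpaged instance
        return new PageImpl<>(cities, ofSize(PAGE_SIZE), cities.size());
    }
}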
It wasn't long ago that I decided to ditch my Ubuntu-based distros for openSUSE, finding LEAP 15 to be a steadier, more rock-solid flavor of Linux for my daily driver. The trouble is, I hadn't yet been introduced to Linux Mint Debian Edition (LMDE), and that sound you hear is my heels clicking with joy. LMDE 6 with the Cinnamon desktop. Allow me to explain. While I've been a long-time fan of Ubuntu, in recent years the addition of snaps (rather than system packages) and other Ubuntu-only features started to wear on me. I wanted straightforward networking, support for older hardware, and a desktop that didn't get in the way of my work. For years, Ubuntu provided that, and I installed it on everything from old netbooks and laptops to towers and IoT devices. More recently, though, I decided to move to Debian, the upstream Linux distro on which Ubuntu (and derivatives like Linux Mint and others) are built. Unlike Ubuntu, Debian holds fast to a truly solid, stable, non-proprietary mindset — and I can still use the apt package manager I've grown accustomed to. That is, every bit of automation I use (Chef and Ansible mostly) works the same on Debian and Ubuntu. I spent some years switching back and forth between the standard Ubuntu long-term releases and Linux Mint, a truly great Ubuntu-derived desktop Linux. Of course, there are many Debian-based distributions, but I stumbled across LMDE version 6, based on Debian GNU/Linux 12 "Bookworm" and known as Faye, and knew I was onto something truly special. As with the Ubuntu version, LMDE comes with different desktop environments, including the robust Cinnamon, which provides a familiar environment for any Linux, Windows, or macOS user. It's intuitive, chock full of great features (like a multi-function taskbar), and it supports a wide range of customizations. However, it includes no snaps or other Ubuntuisms, and it is amazingly stable. That is, I've not had a single freeze-up or odd glitch, even when pushing it hard with Kdenlive video editing, KVM virtual machines, and Docker containers. According to the folks at Linux Mint, "LMDE is also one of our development targets, as such it guarantees the software we develop is compatible outside of Ubuntu." That means if you're a traditional Linux Mint user, you'll find all the familiar capabilities and features in LMDE. After nearly six months of daily use, that's proven true. As someone who likes to hang on to old hardware, LMDE extended its value to me by supporting both 64- and 32-bit systems. I've since installed it on a 2008 MacBook (32-bit), old ThinkPads, old Dell netbooks, and even a Toshiba Chromebook. Though most of these boxes have less than 3 gigabytes of RAM, LMDE performs well. Cinnamon isn't the lightest desktop around, but it runs smoothly on everything I have. The running joke in the Linux world is that "next year" will be the year the Linux desktop becomes a true Windows and macOS replacement. With Debian Bookworm-powered LMDE, I humbly suggest next year is now. To be fair, on some of my oldest hardware, I've opted for Bunsen. It, too, is a Debian derivative with 64- and 32-bit versions, and I'm using the BunsenLabs Linux Boron version, which uses the Openbox window manager and sips resources: about 400 megabytes of RAM and low CPU usage. With Debian at its core, it's stable and glitch-free. Since deploying LMDE, I've also begun to migrate my virtual machines and containers to Debian 12. Bookworm is amazingly robust and works well on IoT devices, LXCs, and more. 
Since it, too, has long-term support, I feel confident about its stability — and security — over time. If you're a fan of Ubuntu and Linux Mint, you owe it to yourself to give LMDE a try. As a daily driver, it's truly hard to beat.
Hibernate Hibernate by itself does not have full-text search support. It has to rely on database engine support or third-party solutions. An extension called Hibernate Search integrates with Apache Lucene or Elasticsearch (there is also integration with OpenSearch). Postgres Postgres has had full-text search functionality since version 7.3. Although it cannot compete with search engines like Elasticsearch or Lucene, it still provides a flexible and robust solution that might be enough to meet application users' expectations, with features like stemming, ranking, and indexing. We will briefly explain how we can do a full-text search in Postgres. For more, please visit the Postgres documentation. As for essential text matching, the most crucial part is the match operator @@. It returns true if the document (an object of type tsvector) matches the query (an object of type tsquery). The operand order is not crucial: it does not matter whether we put the document on the left side of the operator and the query on the right side, or the other way around. For better demonstration, we use a database table called tweet. SQL create table tweet ( id bigint not null, short_content varchar(255), title varchar(255), primary key (id) ) With the following data: SQL INSERT INTO tweet (id, title, short_content) VALUES (1, 'Cats', 'Cats rules the world'); INSERT INTO tweet (id, title, short_content) VALUES (2, 'Rats', 'Rats rules in the sewers'); INSERT INTO tweet (id, title, short_content) VALUES (3, 'Rats vs Cats', 'Rats and Cats hates each other'); INSERT INTO tweet (id, title, short_content) VALUES (4, 'Feature', 'This project is design to wrap already existed functions of Postgres'); INSERT INTO tweet (id, title, short_content) VALUES (5, 'Postgres database', 'Postgres is one of the widly used database on the market'); INSERT INTO tweet (id, title, short_content) VALUES (6, 'Database', 'On the market there is a lot of database that have similar features like Oracle'); Now let's see what the tsvector object looks like for the short_content column for each of the records. SQL SELECT id, to_tsvector('english', short_content) FROM tweet; The output shows how to_tsvector converts the text column to a tsvector object for the 'english' text search configuration. Text Search Configuration The first parameter for the to_tsvector function passed in the above example was the name of the text search configuration. In that case, it was "english". According to the Postgres documentation, the text search configuration is as follows: ... full text search functionality includes the ability to do many more things: skip indexing certain words (stop words), process synonyms, and use sophisticated parsing, e.g., parse based on more than just white space. This functionality is controlled by text search configurations. So, the configuration is a crucial part of the process and vital to our full-text search results. Different configurations can make the Postgres engine return different results, and this is not only a matter of dictionaries for different languages. For example, you can have two configurations for the same language, but one of them ignores names containing digits (for example, some serial numbers). If our query contains a specific serial number that is mandatory, we won't find any record with the configuration that ignores words containing numbers, even if such records exist in the database. Please check the configuration documentation for more information. 
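To see how much the configuration matters, you can compare the vectors produced by two configurations side by side. The sketch below is not from the article's source code; it assumes a local Postgres instance containing the tweet table above, made-up credentials, and the PostgreSQL JDBC driver on the classpath. The 'simple' configuration keeps every token as-is, while 'english' applies stemming and drops stop words, so the two vectors for the same row differ.
Java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class TextSearchConfigComparison {

    public static void main(String[] args) throws Exception {
        String url = "jdbc:postgresql://localhost:5432/postgres"; // adjust to your own database
        try (Connection connection = DriverManager.getConnection(url, "postgres", "postgres")) {
            for (String configuration : new String[]{"english", "simple"}) {
                try (PreparedStatement statement = connection.prepareStatement(
                        "SELECT to_tsvector(?::regconfig, short_content) FROM tweet WHERE id = 1")) {
                    statement.setString(1, configuration);
                    try (ResultSet resultSet = statement.executeQuery()) {
                        while (resultSet.next()) {
                            // 'english' stems words and drops stop words; 'simple' keeps every token
                            System.out.println(configuration + ": " + resultSet.getString(1));
                        }
                    }
                }
            }
        }
    }
}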
Text Query Text query supports such operators as & (AND), | (OR), ! (NOT), and <-> (FOLLOWED BY). The first three operators do not require a deeper explanation. The <-> operator checks whether the words exist and are placed in a specific order. So, for example, for the query "rat <-> cat", we expect the word "rat" to exist, immediately followed by the word "cat". Examples Content that contains rat and cat: SQL SELECT t.id, t.short_content FROM tweet t WHERE to_tsvector('english', t.short_content) @@ to_tsquery('english', 'Rat & cat'); Content that contains database and market, and market is the third word after database: SQL SELECT t.id, t.short_content FROM tweet t WHERE to_tsvector('english', t.short_content) @@ to_tsquery('english', 'database <3> market'); Content that contains database but not Postgres: SQL SELECT t.id, t.short_content FROM tweet t WHERE to_tsvector('english', t.short_content) @@ to_tsquery('english', 'database & !Postgres'); Content that contains Postgres or Oracle: SQL SELECT t.id, t.short_content FROM tweet t WHERE to_tsvector('english', t.short_content) @@ to_tsquery('english', 'Postgres | Oracle'); Wrapper Functions One of the wrapper functions that create text queries, to_tsquery, was already mentioned in this article. There are more such functions, like: plainto_tsquery phraseto_tsquery websearch_to_tsquery plainto_tsquery The plainto_tsquery converts all passed words into a query in which the words are combined with the & (AND) operator. For example, the equivalent of plainto_tsquery('english', 'Rat cat') is to_tsquery('english', 'Rat & cat'). For the following usage: SQL SELECT t.id, t.short_content FROM tweet t WHERE to_tsvector('english', t.short_content) @@ plainto_tsquery('english', 'Rat cat'); we get the rows whose content contains both words. phraseto_tsquery The phraseto_tsquery converts all passed words into a query in which the words are combined with the <-> (FOLLOWED BY) operator. For example, the equivalent of phraseto_tsquery('english', 'cat rule') is to_tsquery('english', 'cat <-> rule'). For the following usage: SQL SELECT t.id, t.short_content FROM tweet t WHERE to_tsvector('english', t.short_content) @@ phraseto_tsquery('english', 'cat rule'); we get the rows where a form of "cat" is immediately followed by a form of "rule". websearch_to_tsquery The websearch_to_tsquery uses an alternative syntax to create a valid text query: Unquoted text: Converted in the same way as by plainto_tsquery Quoted text: Converted in the same way as by phraseto_tsquery OR: Converted to the "|" (OR) operator "-": Same as the "!" (NOT) operator For example, the equivalent of websearch_to_tsquery('english', '"cat rule" or database -Postgres') is to_tsquery('english', 'cat <-> rule | database & !Postgres'). For the following usage: SQL SELECT t.id, t.short_content FROM tweet t WHERE to_tsvector('english', t.short_content) @@ websearch_to_tsquery('english', '"cat rule" or database -Postgres'); we get the rows that match that combined query. Postgres and Hibernate Native Support As mentioned earlier, Hibernate alone does not have full-text search support. It has to rely on database engine support. 
This means that we can execute native SQL queries, as shown in the examples below: plainto_tsquery Java public List<Tweet> findBySinglePlainQueryInDescriptionForConfigurationWithNativeSQL(String textQuery, String configuration) { return entityManager.createNativeQuery(String.format("select * from tweet t1_0 where to_tsvector('%1$s', t1_0.short_content) @@ plainto_tsquery('%1$s', :textQuery)", configuration), Tweet.class).setParameter("textQuery", textQuery).getResultList(); } websearch_to_tsquery Java public List<Tweet> findCorrectTweetsByWebSearchToTSQueryInDescriptionWithNativeSQL(String textQuery, String configuration) { return entityManager.createNativeQuery(String.format("select * from tweet t1_0 where to_tsvector('%1$s', t1_0.short_content) @@ websearch_to_tsquery('%1$s', :textQuery)", configuration), Tweet.class).setParameter("textQuery", textQuery).getResultList(); } Hibernate With posjsonhelper Library The posjsonhelper library is an open-source project that adds support for Hibernate queries for PostgreSQL JSON functions and full-text search. For the Maven project, we need to add the dependencies below: XML <dependency> <groupId>com.github.starnowski.posjsonhelper.text</groupId> <artifactId>hibernate6-text</artifactId> <version>0.3.0</version> </dependency> <dependency> <groupId>org.hibernate</groupId> <artifactId>hibernate-core</artifactId> <version>6.4.0.Final</version> </dependency> To use components that exist in the posjsonhelper library, we need to register them in the Hibernate context. This means that there must be a specified org.hibernate.boot.model.FunctionContributor implementation. The library has an implementation of this interface, which is com.github.starnowski.posjsonhelper.hibernate6.PosjsonhelperFunctionContributor. A file with the name "org.hibernate.boot.model.FunctionContributor" under the "resources/META-INF/services" directory is required to use this implementation. There is another way to register posjsonhelper's components, which can be done programmatically. To see how to do that, check this link. Now, we can use full-text search operators in Hibernate queries. PlainToTSQueryFunction This is a component that wraps the plainto_tsquery function. Java public List<Tweet> findBySinglePlainQueryInDescriptionForConfiguration(String textQuery, String configuration) { CriteriaBuilder cb = entityManager.getCriteriaBuilder(); CriteriaQuery<Tweet> query = cb.createQuery(Tweet.class); Root<Tweet> root = query.from(Tweet.class); query.select(root); query.where(new TextOperatorFunction((NodeBuilder) cb, new TSVectorFunction(root.get("shortContent"), configuration, (NodeBuilder) cb), new PlainToTSQueryFunction((NodeBuilder) cb, configuration, textQuery), hibernateContext)); return entityManager.createQuery(query).getResultList(); } For a configuration with the value 'english', the code is going to generate the statement below: SQL select t1_0.id, t1_0.short_content, t1_0.title from tweet t1_0 where to_tsvector('english', t1_0.short_content) @@ plainto_tsquery('english', ?); PhraseToTSQueryFunction This component wraps the phraseto_tsquery function. 
Java public List<Tweet> findBySinglePhraseInDescriptionForConfiguration(String textQuery, String configuration) { CriteriaBuilder cb = entityManager.getCriteriaBuilder(); CriteriaQuery<Tweet> query = cb.createQuery(Tweet.class); Root<Tweet> root = query.from(Tweet.class); query.select(root); query.where(new TextOperatorFunction((NodeBuilder) cb, new TSVectorFunction(root.get("shortContent"), configuration, (NodeBuilder) cb), new PhraseToTSQueryFunction((NodeBuilder) cb, configuration, textQuery), hibernateContext)); return entityManager.createQuery(query).getResultList(); } For configuration with the value 'english', the code is going to generate the statement below: SQL select t1_0.id, t1_0.short_content, t1_0.title from tweet t1_0 where to_tsvector('english', t1_0.short_content) @@ phraseto_tsquery('english', ?) WebsearchToTSQueryFunction This component wraps the websearch_to_tsquery function. Java public List<Tweet> findCorrectTweetsByWebSearchToTSQueryInDescription(String phrase, String configuration) { CriteriaBuilder cb = entityManager.getCriteriaBuilder(); CriteriaQuery<Tweet> query = cb.createQuery(Tweet.class); Root<Tweet> root = query.from(Tweet.class); query.select(root); query.where(new TextOperatorFunction((NodeBuilder) cb, new TSVectorFunction(root.get("shortContent"), configuration, (NodeBuilder) cb), new WebsearchToTSQueryFunction((NodeBuilder) cb, configuration, phrase), hibernateContext)); return entityManager.createQuery(query).getResultList(); } For configuration with the value 'english', the code is going to generate the statement below: SQL select t1_0.id, t1_0.short_content, t1_0.title from tweet t1_0 where to_tsvector('english', t1_0.short_content) @@ websearch_to_tsquery('english', ?) HQL Queries All mentioned components can be used in HQL queries. To check how it can be done, please click this link. Why Use the posjsonhelper Library When We Can Use the Native Approach With Hibernate? Although dynamically concatenating a string that is supposed to be an HQL or SQL query might be easy, implementing predicates would be better practice, especially when you have to handle search criteria based on dynamic attributes from your API. Conclusion As mentioned in the previous article, Postgres full-text search support can be a good alternative for substantial search engines like Elasticsearch or Lucene, in some cases. This could save us from the decision to add third-party solutions to our technology stack, which could also add more complexity and additional costs.