Welcome back to the series where we are learning how to integrate AI products into web applications: Intro & Setup, Your First AI Prompt, Streaming Responses, How Does AI Work, Prompt Engineering, AI-Generated Images, Security & Reliability, Deploying.

Last time, we got all the boilerplate work out of the way. In this post, we'll learn how to integrate OpenAI's API responses into our Qwik app using fetch. We'll also want to make sure we're not leaking API keys, so we'll execute these HTTP requests from a backend. By the end of this post, we will have a rudimentary, but working, AI application.

Generate OpenAI API Key

Before we start building anything, you'll need to go to platform.openai.com/account/api-keys and generate an API key to use in your application. Make sure to keep a copy of it somewhere because you will only be able to see it once. With your API key, you'll be able to make authenticated HTTP requests to OpenAI, so it's a good idea to get familiar with the API itself. I'd encourage you to take a brief look through the OpenAI documentation and become familiar with some concepts. The models are particularly good to understand because they have varying capabilities. If you would like to familiarize yourself with the API endpoints, expected payloads, and return values, check out the OpenAI API Reference. It also contains helpful examples. You may notice the JavaScript package available on NPM called openai. We will not be using it, as it doesn't quite support some things we'll want to do that fetch can.

Make Your First HTTP Request

The application we're going to build will generate an AI text completion based on the user's input. For that, we'll want to work with the chat endpoint (note that the completions endpoint is deprecated). We need to make a POST request to https://api.openai.com/v1/chat/completions with the 'Content-Type' header set to 'application/json', the 'Authorization' header set to 'Bearer OPENAI_API_KEY' (you'll need to replace OPENAI_API_KEY with your API key), and the body set to a JSON string containing the GPT model to use (we'll use gpt-3.5-turbo) and an array of messages:

JavaScript
fetch('https://api.openai.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer OPENAI_API_KEY'
  },
  body: JSON.stringify({
    'model': 'gpt-3.5-turbo',
    'messages': [
      {
        'role': 'user',
        'content': 'Tell me a funny joke'
      }
    ]
  })
})

You can run this right from your browser console and see the request in the Network tab of your dev tools. The response should be a JSON object with a bunch of properties, but the one we're most interested in is "choices". It will be an array of text completion objects. The first one should contain a "message" object with a "content" property that holds the chat completion.

JSON
{
  "id": "chatcmpl-7q63Hd9pCPxY3H4pW67f1BPSmJs2u",
  "object": "chat.completion",
  "created": 1692650675,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Why don't scientists trust atoms?\n\nBecause they make up everything!"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 13,
    "total_tokens": 25
  }
}

Congrats! Now you can request a mediocre joke whenever you want.

Build the Form

The fetch request above is fine, but it's not quite an application. What we want is something a user can interact with to generate an HTTP request like the one above. For that, we'll probably want to start with an HTML <form> containing a <textarea>.
Below is the minimum markup we need:

HTML
<form>
  <label for="prompt">Prompt</label>
  <textarea id="prompt" name="prompt"></textarea>
  <button>Tell me</button>
</form>

We can copy and paste this form right inside our Qwik component's JSX template. If you've worked with JSX in the past, you may be used to replacing the <label>'s for attribute with htmlFor, but Qwik's compiler doesn't require us to do that, so it's fine as is.

Next, we'll want to replace the default form submission behavior. By default, when an HTML form is submitted, the browser will create an HTTP request by loading the URL provided in the form's action attribute. If none is provided, it will use the current URL. We want to avoid this page load and use JavaScript instead. If you've done this before, you may be familiar with the preventDefault method on the Event interface. As the name suggests, it prevents the default behavior for the event.

There's a challenge here due to how Qwik deals with event handlers. Unlike other frameworks, Qwik does not download all the JavaScript logic for the application upon the first page load. Instead, it has a very thin client that intercepts user interactions and downloads the JavaScript event handlers on demand. This asynchronous nature makes Qwik applications much faster to load but introduces the challenge of dealing with event handlers asynchronously. It makes it impossible to prevent the default behavior the same way as with synchronous event handlers that are downloaded and parsed before the user interacts. Fortunately, Qwik provides a way to prevent the default behavior by adding preventdefault:{eventName} to the HTML tag. A very basic form example may look something like this:

JavaScript
import { component$ } from '@builder.io/qwik';

export default component$(() => {
  return (
    <form
      preventdefault:submit
      onSubmit$={(event) => {
        console.log(event)
      }}
    >
      {/* form contents */}
    </form>
  )
})

Did you notice that little $ at the end of the onSubmit$ handler there? Keep an eye out for those, because they are usually a hint to the developer that Qwik's compiler is going to do something funny and transform the code. In this case, it's due to the lazy-loading event handling system I mentioned above.

Incorporate the Fetch Request

Now we have the tools in place to replace the default form submission with the fetch request we created above. What we want to do next is pull the data from the <textarea> into the body of the fetch request. We can do so with the FormData constructor, which expects a form element as an argument and provides an API to access form control values through each control's name attribute. We can access the form element from the event's target property, use it to create a new FormData object, and use that to get the <textarea> value by referencing its name, "prompt".
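In isolation, that extraction step looks something like this (a minimal sketch; it assumes the submit event from the handler above and the <textarea> named "prompt"):

JavaScript
const form = event.target               // the <form> element that fired the submit event
const formData = new FormData(form)     // snapshot of the form's current values
const prompt = formData.get('prompt')   // value of <textarea name="prompt">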
Plug that into the body of the fetch request we wrote above, and you might get something that looks like this:

JavaScript
export default component$(() => {
  return (
    <form
      preventdefault:submit
      onSubmit$={(event) => {
        const form = event.target
        const formData = new FormData(form)
        const prompt = formData.get('prompt')
        const body = {
          'model': 'gpt-3.5-turbo',
          'messages': [{ 'role': 'user', 'content': prompt }]
        }
        fetch('https://api.openai.com/v1/chat/completions', {
          method: 'POST',
          headers: {
            'Content-Type': 'application/json',
            'Authorization': 'Bearer OPENAI_API_KEY'
          },
          body: JSON.stringify(body)
        })
      }}
    >
      {/* form contents */}
    </form>
  )
})

In theory, you should now have a form on your page that, when submitted, sends the value from the textarea to the OpenAI API.

Protect Your API Keys

Although our HTTP request is working, there's a glaring issue. Because it's being constructed on the client side, anyone can open the browser dev tools and inspect the properties of the request. This includes the Authorization header containing our API keys. That would allow someone to steal our API tokens and make requests on our behalf, which could lead to abuse or higher charges on our account. Not good!!!

The best way to prevent this is to move this API call to a backend server that we control, which would work as a proxy. The frontend can make an unauthenticated request to the backend, and the backend would make the authenticated request to OpenAI and return the response to the frontend. Because users can't inspect backend processes, they would not be able to see the Authorization header.

So how do we move the fetch request to the backend? I'm so glad you asked! We've been mostly focusing on building the front end with Qwik, the framework, but we also have access to Qwik City, the full-stack meta-framework with tooling for file-based routing, route middleware, HTTP endpoints, and more. Of the various options Qwik City offers for running backend logic, my favorite is routeAction$. It allows us to create a backend function triggered by the client over HTTP (essentially an RPC endpoint). The logic goes like this: use routeAction$() to create an action, provide the backend logic as the parameter, and programmatically execute the action's submit() method. A simplified example could be:

JavaScript
import { component$ } from '@builder.io/qwik';
import { routeAction$ } from '@builder.io/qwik-city';

export const useAction = routeAction$((params) => {
  console.log('action on the server', params)
  return { o: 'k' }
})

export default component$(() => {
  const action = useAction()
  return (
    <>
      <form
        preventdefault:submit
        onSubmit$={(event) => {
          action.submit('data')
        }}
      >
        {/* form contents */}
      </form>
      { JSON.stringify(action) }
    </>
  )
})

I included a JSON.stringify(action) at the end of the template because I think you should see what the returned ActionStore looks like. It contains extra information like whether the action is running, what the submission values were, what the response status is, what the returned value is, and more. This is all very useful data that we get out of the box just by using an action, and it allows us to create more robust applications with less work.

Enhance the Experience

Qwik City's actions are cool, but they get even better when combined with Qwik's <Form> component. Under the hood, the component uses a native HTML <form> element, so it will work without JavaScript.
When JS is enabled, the component will intercept the form submission and trigger the action in SPA mode, allowing us to have a full SPA experience.

By replacing the HTML <form> element with Qwik's <Form> component, we no longer have to set up preventdefault:submit, onSubmit$, or call action.submit(). We can just pass the action to the action prop and it'll take care of the work for us. Additionally, it will work if JavaScript is not available for some reason (we could have done this with the HTML version as well, but it would have been more work).

JavaScript
import { component$ } from '@builder.io/qwik';
import { routeAction$, Form } from '@builder.io/qwik-city';

export const useAction = routeAction$(() => {
  console.log('action on the server')
  return { o: 'k' }
});

export default component$(() => {
  const action = useAction()
  return (
    <Form action={action}>
      {/* form contents */}
    </Form>
  )
})

So that's an improvement for the developer experience. Let's also improve the user experience. Within the ActionStore, we have access to the isRunning data, which keeps track of whether the request is pending or not. It's handy information we can use to let the user know when the request is in flight. We can do so by modifying the text of the submit button to say "Tell me" when it's idle, then "One sec..." while it's loading. I also like to assign the aria-disabled attribute to match the isRunning state. This will hint to assistive technology that it's not ready to be clicked (though it technically still can be). It can also be targeted with CSS to provide visual styles suggesting it's not quite ready to be clicked again.

HTML
<button type="submit" aria-disabled={action.isRunning}>
  {action.isRunning ? 'One sec...' : 'Tell me'}
</button>

Show the Results

Ok, we've done way too much work without actually seeing the results on the page. It's time to change that. Let's bring the fetch request we prototyped earlier in the browser into our application. We can copy/paste the fetch code right into the body of our action handler, but to access the user's input data, we'll need access to the form data that is submitted. Fortunately, any data passed to the action.submit() method will be available to the action handler as the first parameter. It will be a serialized object where the keys correspond to the form control names. Note that I'll be using the await keyword in the body of the handler, which means I also have to tag the handler as an async function.

JavaScript
import { component$ } from '@builder.io/qwik';
import { routeAction$, Form } from '@builder.io/qwik-city';

export const useAction = routeAction$(async (formData) => {
  const prompt = formData.prompt // From <textarea name="prompt">

  const body = {
    'model': 'gpt-3.5-turbo',
    'messages': [{ 'role': 'user', 'content': prompt }]
  }

  const response = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': 'Bearer OPENAI_API_KEY'
    },
    body: JSON.stringify(body)
  })

  const data = await response.json()
  return data.choices[0].message.content
})

At the end of the action handler, we also want to return some data for the front end. The OpenAI response comes back as JSON, but I think we might as well just return the text. If you remember from the response object we saw above, that data is located at responseBody.choices[0].message.content. If we set things up correctly, we should be able to access the action handler's response in the ActionStore's value property.
This means we can conditionally render it somewhere in the template like so:

JavaScript
{action.value && (
  <p>{action.value}</p>
)}

Use Environment Variables

Alright, we've moved the OpenAI request to the backend and protected our API keys from prying eyes, we're getting a (mediocre) joke response, and we're displaying it on the front end. The app is working, but there's still one more security issue to deal with. It's generally a bad idea to hardcode API keys into your source code, for several reasons:

It means you can't share the repo publicly without exposing your keys.
You may run up API usage during development, testing, and staging.
Changing API keys requires code changes and re-deploys.
You'll need to regenerate API keys anytime someone leaves the org.

A better system is to use environment variables. With environment variables, you can provide the API keys only to the systems and users that need access to them. For example, you can make an environment variable called OPENAI_API_KEY with the value of your OpenAI key for only the production environment. This way, only developers with direct access to that environment would be able to access it. This greatly reduces the likelihood of the API keys leaking, it makes it easier to share your code openly, and because you are limiting access to the keys to the least number of people, you don't need to replace keys as often when someone leaves the company.

In Node.js, it's common to set environment variables from the command line (ENV_VAR=example npm start) or with the popular dotenv package. Then, in your server-side code, you can access environment variables using process.env.ENV_VAR. Things work slightly differently with Qwik. Qwik can target different JavaScript runtimes (not just Node), and accessing environment variables via process.env is a Node-specific concept. To make things more runtime-agnostic, Qwik provides access to environment variables through a RequestEvent object, which is available as the second parameter to the route action handler function.

JavaScript
import { routeAction$ } from '@builder.io/qwik-city';

export const useAction = routeAction$((param, requestEvent) => {
  const envVariableValue = requestEvent.env.get('ENV_VARIABLE_NAME')
  console.log(envVariableValue)
  return {}
})

So that's how we access environment variables, but how do we set them? Unfortunately, for production environments, setting environment variables will differ depending on the platform. For a standard server VPS, you can still set them in the terminal as you would in Node (ENV_VAR=example npm start). In development, we can alternatively create a local.env file containing our environment variables, and they will be loaded automatically. This is convenient since we start the development environment far more often, and it means we can provide the appropriate API keys only to the people who need them. So after you create a local.env file, you can assign the OPENAI_API_KEY variable to your API key.

Plain Text
OPENAI_API_KEY="your-api-key"

(You may need to restart your dev server.) Then we can access the environment variable through the RequestEvent parameter. With that, we can replace the hard-coded value in our fetch request's Authorization header with the variable using template literals.
JavaScript export const usePromptAction = routeAction$(async (formData, requestEvent) => { const OPENAI_API_KEY = requestEvent.env.get('OPENAI_API_KEY') const prompt = formData.prompt const body = { model: 'gpt-3.5-turbo', messages: [{ role: 'user', content: prompt }] } const response = await fetch('https://api.openai.com/v1/chat/completions', { method: 'post', headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${OPENAI_API_KEY}`, }, body: JSON.stringify(body) }) const data = await response.json() return data.choices[0].message.content }) For more details on environment variables in Qwik, see their documentation. Summary When a user submits the form, the default behavior is intercepted by Qwik’s optimizer which lazy loads the event handler. The event handler uses JavaScript to create an HTTP request containing the form data to send to the server to be handled by the route’s action. The route’s action handler will have access to the form data in the first parameter and can access environment variables from the second parameter (a RequestEvent object). Inside the route’s action handler, we can construct and send the HTTP request to OpenAI using the data we got from the form and the API keys we pulled from the environment variables. With the OpenAI response, we can prepare the data to send back to the client. The client receives the response from the action and can update the page accordingly. Here’s what my final component looks like, including some Tailwind classes and a slightly different template. JavaScript import { component$ } from "@builder.io/qwik"; import { routeAction$, Form } from "@builder.io/qwik-city"; export const usePromptAction = routeAction$(async (formData, requestEvent) => { const OPENAI_API_KEY = requestEvent.env.get('OPENAI_API_KEY') const prompt = formData.prompt const body = { model: 'gpt-3.5-turbo', messages: [{ role: 'user', content: prompt }] } const response = await fetch('https://api.openai.com/v1/chat/completions', { method: 'post', headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${OPENAI_API_KEY}`, }, body: JSON.stringify(body) }) const data = await response.json() return data.choices[0].message.content }) export default component$(() => { const action = usePromptAction() return ( <main class="max-w-4xl mx-auto p-4"> <h1 class="text-4xl">Hi</h1> <Form action={action} class="grid gap-4"> <div> <label for="prompt">Prompt</label> <textarea name="prompt" id="prompt"> Tell me a joke </textarea> </div> <div> <button type="submit" aria-disabled={action.isRunning}> {action.isRunning ? 'One sec...' : 'Tell me'} </button> </div> </Form> {action.value && ( <article class="mt-4 border border-2 rounded-lg p-4 bg-[canvas]"> <p>{action.value}</p> </article> )} </main> ); }); Conclusion All right! We’ve gone from a script that uses AI to get mediocre jokes to a full-blown application that securely makes HTTP requests to a backend that uses AI to get mediocre jokes and sends them back to the front end to put those mediocre jokes on a page. You should feel pretty good about yourself. But not too good, because there’s still room to improve. In our application, we are sending a request and getting an AI response, but we are waiting for the entirety of the body of that response to be generated before showing it to the users. These AI responses can take a while to complete. 
If you’ve used AI chat tools in the past, you may be familiar with the experience where it looks like it’s typing the responses to you, one word at a time, as they’re being generated. This doesn’t speed up the total request time, but it does get some information back to the user much sooner and feels like a faster experience. In the next post, we’ll learn how to build that same feature using HTTP streams, which are fascinating and powerful but also can be kind of confusing. So I’m going to dedicate an entire post just to that. I hope you’re enjoying this series and plan to stick around. In the meantime, have fun generating some mediocre jokes. Thank you so much for reading. If you liked this article, and want to support me, the best ways to do so are to share it and follow me on Twitter.
In this post, you will learn how you can integrate Large Language Model (LLM) capabilities into your Java application. More specifically, how you can integrate with LocalAI from your Java application. Enjoy! Introduction In a previous post, it was shown how you could run a Large Language Model (LLM) similar to OpenAI by means of LocalAI. The Rest API of OpenAI was used in order to interact with LocalAI. Integrating these capabilities within your Java application can be cumbersome. However, since the introduction of LangChain4j, this has become much easier to do. LangChain4j offers you a simplification in order to integrate with LLMs. It is based on the Python library LangChain. It is therefore also advised to read the documentation and concepts of LangChain since the documentation of LangChain4j is rather short. Many examples are provided though in the LangChain4j examples repository. Especially, the examples in the other-examples directory have been used as inspiration for this blog. The real trigger for writing this blog was the talk I attended about LangChain4j at Devoxx Belgium. This was the most interesting talk I attended at Devoxx: do watch it if you can make time for it. It takes only 50 minutes. The sources used in this blog can be found on GitHub. Prerequisites The prerequisites for this blog are: Basic knowledge about what a Large Language Model is Basic Java knowledge (Java 21 is used) You need LocalAI if you want to run the examples (see the previous blog linked in the introduction on how you can make use of LocalAI). Version 2.2.0 is used for this blog. LangChain4j Examples In this section, some of the capabilities of LangChain4j are shown by means of examples. Some of the examples used in the previous post are now implemented using LangChain4j instead of using curl. How Are You? As a first simple example, you ask the model how it is feeling. In order to make use of LangChain4j in combination with LocalAI, you add the langchain4j-local-ai dependency to the pom file. XML <dependency> <groupId>dev.langchain4j</groupId> <artifactId>langchain4j-local-ai</artifactId> <version>0.24.0</version> </dependency> In order to integrate with LocalAI, you create a ChatLanguageModel specifying the following items: The URL where the LocalAI instance is accessible The name of the model you want to use in LocalAI The temperature: A high temperature allows the model to respond in a more creative way. Next, you ask the model to generate an answer to your question and you print the answer. Java ChatLanguageModel model = LocalAiChatModel.builder() .baseUrl("http://localhost:8080") .modelName("lunademo") .temperature(0.9) .build(); String answer = model.generate("How are you?"); System.out.println(answer); Start LocalAI and run the example above. The response is as expected. Shell I'm doing well, thank you. How about yourself? Before continuing, note something about the difference between LanguageModel and ChatLanguageModel. Both classes are available in LangChain4j, so which one to choose? A chat model is a variation of a language model. If you need a "text in, text out" functionality, you can choose LanguageModel. If you also want to be able to use "chat messages" as input and output, you should use ChatLanguageModel. In the example above, you could just have used LanguageModel and it would behave similarly. Facts About Famous Soccer Player Let’s verify whether it also returns facts about the famous Dutch soccer player Johan Cruijff. 
You use the same code as before, only now you set the temperature to zero because no creative answer is required. Java ChatLanguageModel model = LocalAiChatModel.builder() .baseUrl("http://localhost:8080") .modelName("lunademo") .temperature(0.0) .build(); String answer = model.generate("who is Johan Cruijff?"); System.out.println(answer); Run the example, the response is as expected. Shell Johan Cruyff was a Dutch professional football player and coach. He played as a forward for Ajax, Barcelona, and the Netherlands national team. He is widely regarded as one of the greatest players of all time and was known for his creativity, skill, and ability to score goals from any position on the field. Stream the Response Sometimes, the answer will take some time. In the OpenAPI specification, you can set the stream parameter to true in order to retrieve the response character by character. This way, you can display the response already to the user before awaiting the complete response. This functionality is also available with LangChain4j but requires the use of a StreamingResponseHandler. The onNext method receives every character one by one. The complete response is gathered in the answerBuilder and futureAnswer. Running this example prints every single character one by one, and at the end, the complete response is printed. Java StreamingChatLanguageModel model = LocalAiStreamingChatModel.builder() .baseUrl("http://localhost:8080") .modelName("lunademo") .temperature(0.0) .build(); StringBuilder answerBuilder = new StringBuilder(); CompletableFuture<String> futureAnswer = new CompletableFuture<>(); model.generate("who is Johan Cruijff?", new StreamingResponseHandler<AiMessage>() { @Override public void onNext(String token) { answerBuilder.append(token); System.out.println(token); } @Override public void onComplete(Response<AiMessage> response) { futureAnswer.complete(answerBuilder.toString()); } @Override public void onError(Throwable error) { futureAnswer.completeExceptionally(error); } }); String answer = futureAnswer.get(90, SECONDS); System.out.println(answer); Run the example. The response is as expected. Shell J o h a n ... s t y l e . Johan Cruijff was a Dutch professional football player and coach who played as a forward. ... Other Languages You can instruct the model by means of a system message how it should behave. For example, you can instruct it to answer always in a different language; Dutch, in this case. This example shows clearly the difference between LanguageModel and ChatLanguageModel. You have to use ChatLanguageModel in this case because you need to interact by means of chat messages with the model. Create a SystemMessage to instruct the model. Create a UserMessage for your question. Add them to a list and send the list of messages to the model. Also, note that the response is an AiMessage. The messages are explained as follows: UserMessage: A ChatMessage coming from a human/user AiMessage: A ChatMessage coming from an AI/assistant SystemMessage: A ChatMessage coming from the system Java ChatLanguageModel model = LocalAiChatModel.builder() .baseUrl("http://localhost:8080") .modelName("lunademo") .temperature(0.0) .build(); SystemMessage responseInDutch = new SystemMessage("You are a helpful assistant. 
Antwoord altijd in het Nederlands."); UserMessage question = new UserMessage("who is Johan Cruijff?"); var chatMessages = new ArrayList<ChatMessage>(); chatMessages.add(responseInDutch); chatMessages.add(question); Response<AiMessage> response = model.generate(chatMessages); System.out.println(response.content()); Run the example, the response is as expected. Shell AiMessage { text = "Johan Cruijff was een Nederlands voetballer en trainer. Hij speelde als aanvaller en is vooral bekend van zijn tijd bij Ajax en het Nederlands elftal. Hij overleed in 1996 op 68-jarige leeftijd." toolExecutionRequest = null } Chat With Documents A fantastic use case is to use an LLM in order to chat with your own documents. You can provide the LLM with your documents and ask questions about it. For example, when you ask the LLM for which football clubs Johan Cruijff played ("For which football teams did Johan Cruijff play and also give the periods, answer briefly"), you receive the following answer. Shell Johan Cruijff played for Ajax Amsterdam (1954-1973), Barcelona (1973-1978) and the Netherlands national team (1966-1977). This answer is quite ok, but it is not complete, as not all football clubs are mentioned and the period for Ajax includes also his youth period. The correct answer should be: Years Team 1964-1973 Ajax 1973-1978 Barcelona 1979 Los Angeles Aztecs 1980 Washington Diplomats 1981 Levante 1981 Washington Diplomats 1981-1983 Ajax 1983-1984 Feyenoord Apparently, the LLM does not have all relevant information and that is not a surprise. The LLM has some basic knowledge, it runs locally and has its limitations. But what if you could provide the LLM with extra information in order that it can give an adequate answer? Let’s see how this works. First, you need to add some extra dependencies to the pom file: XML <dependency> <groupId>dev.langchain4j</groupId> <artifactId>langchain4j</artifactId> <version>${langchain4j.version}</version> </dependency> <dependency> <groupId>dev.langchain4j</groupId> <artifactId>langchain4j-embeddings</artifactId> <version>${langchain4j.version}</version> </dependency> <dependency> <groupId>dev.langchain4j</groupId> <artifactId>langchain4j-embeddings-all-minilm-l6-v2</artifactId> <version>${langchain4j.version}</version> </dependency> Save the Wikipedia text of Johan Cruijff to a PDF file and store it in src/main/resources/example-files/Johan_Cruyff.pdf. The source code to add this document to the LLM consists of the following parts: The text needs to be embedded; i.e., the text needs to be converted to numbers. An embedding model is needed for that, for simplicity you use the AllMiniLmL6V2EmbeddingModel. The embeddings need to be stored in an embedding store. Often a vector database is used for this purpose, but in this case, you can use an in-memory embedding store. The document needs to be split into chunks. For simplicity, you split the document into chunks of 500 characters. All of this comes together in the EmbeddingStoreIngestor. Add the PDF to the ingestor. Create the ChatLanguageModel just like you did before. With a ConversationalRetrievalChain, you connect the language model with the embedding store and model. And finally, you execute your question. 
Java EmbeddingModel embeddingModel = new AllMiniLmL6V2EmbeddingModel(); EmbeddingStore<TextSegment> embeddingStore = new InMemoryEmbeddingStore<>(); EmbeddingStoreIngestor ingestor = EmbeddingStoreIngestor.builder() .documentSplitter(DocumentSplitters.recursive(500, 0)) .embeddingModel(embeddingModel) .embeddingStore(embeddingStore) .build(); Document johanCruiffInfo = loadDocument(toPath("example-files/Johan_Cruyff.pdf")); ingestor.ingest(johanCruiffInfo); ChatLanguageModel model = LocalAiChatModel.builder() .baseUrl("http://localhost:8080") .modelName("lunademo") .temperature(0.0) .build(); ConversationalRetrievalChain chain = ConversationalRetrievalChain.builder() .chatLanguageModel(model) .retriever(EmbeddingStoreRetriever.from(embeddingStore, embeddingModel)) .build(); String answer = chain.execute("Give all football teams Johan Cruijff played for in his senior career"); System.out.println(answer); When you execute this code, an exception is thrown. Shell Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: java.io.InterruptedIOException: timeout at dev.langchain4j.internal.RetryUtils.withRetry(RetryUtils.java:29) at dev.langchain4j.model.localai.LocalAiChatModel.generate(LocalAiChatModel.java:98) at dev.langchain4j.model.localai.LocalAiChatModel.generate(LocalAiChatModel.java:65) at dev.langchain4j.chain.ConversationalRetrievalChain.execute(ConversationalRetrievalChain.java:65) at com.mydeveloperplanet.mylangchain4jplanet.ChatWithDocuments.main(ChatWithDocuments.java:55) Caused by: java.lang.RuntimeException: java.io.InterruptedIOException: timeout at dev.ai4j.openai4j.SyncRequestExecutor.execute(SyncRequestExecutor.java:31) at dev.ai4j.openai4j.RequestExecutor.execute(RequestExecutor.java:59) at dev.langchain4j.model.localai.LocalAiChatModel.lambda$generate$0(LocalAiChatModel.java:98) at dev.langchain4j.internal.RetryUtils.withRetry(RetryUtils.java:26) ... 4 more Caused by: java.io.InterruptedIOException: timeout at okhttp3.internal.connection.RealCall.timeoutExit(RealCall.kt:398) at okhttp3.internal.connection.RealCall.callDone(RealCall.kt:360) at okhttp3.internal.connection.RealCall.noMoreExchanges$okhttp(RealCall.kt:325) at okhttp3.internal.connection.RealCall.getResponseWithInterceptorChain$okhttp(RealCall.kt:209) at okhttp3.internal.connection.RealCall.execute(RealCall.kt:154) at retrofit2.OkHttpCall.execute(OkHttpCall.java:204) at dev.ai4j.openai4j.SyncRequestExecutor.execute(SyncRequestExecutor.java:23) ... 
7 more Caused by: java.net.SocketTimeoutException: timeout at okio.SocketAsyncTimeout.newTimeoutException(JvmOkio.kt:147) at okio.AsyncTimeout.access$newTimeoutException(AsyncTimeout.kt:158) at okio.AsyncTimeout$source$1.read(AsyncTimeout.kt:337) at okio.RealBufferedSource.indexOf(RealBufferedSource.kt:427) at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.kt:320) at okhttp3.internal.http1.HeadersReader.readLine(HeadersReader.kt:29) at okhttp3.internal.http1.Http1ExchangeCodec.readResponseHeaders(Http1ExchangeCodec.kt:178) at okhttp3.internal.connection.Exchange.readResponseHeaders(Exchange.kt:106) at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.kt:79) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109) at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.kt:34) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109) at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.kt:95) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109) at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.kt:83) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109) at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.kt:76) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109) at dev.ai4j.openai4j.ResponseLoggingInterceptor.intercept(ResponseLoggingInterceptor.java:21) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109) at dev.ai4j.openai4j.RequestLoggingInterceptor.intercept(RequestLoggingInterceptor.java:31) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109) at dev.ai4j.openai4j.AuthorizationHeaderInjector.intercept(AuthorizationHeaderInjector.java:25) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109) at okhttp3.internal.connection.RealCall.getResponseWithInterceptorChain$okhttp(RealCall.kt:201) ... 10 more Caused by: java.net.SocketException: Socket closed at java.base/sun.nio.ch.NioSocketImpl.endRead(NioSocketImpl.java:243) at java.base/sun.nio.ch.NioSocketImpl.implRead(NioSocketImpl.java:323) at java.base/sun.nio.ch.NioSocketImpl.read(NioSocketImpl.java:346) at java.base/sun.nio.ch.NioSocketImpl$1.read(NioSocketImpl.java:796) at java.base/java.net.Socket$SocketInputStream.read(Socket.java:1099) at okio.InputStreamSource.read(JvmOkio.kt:94) at okio.AsyncTimeout$source$1.read(AsyncTimeout.kt:125) ... 32 more This can be solved by setting the timeout of the language model to a higher value. Java ChatLanguageModel model = LocalAiChatModel.builder() .baseUrl("http://localhost:8080") .modelName("lunademo") .temperature(0.0) .timeout(Duration.ofMinutes(5)) .build(); Run the code again, and the following answer is received, which is correct. Shell Johan Cruijff played for the following football teams in his senior career: - Ajax (1964-1973) - Barcelona (1973-1978) - Los Angeles Aztecs (1979) - Washington Diplomats (1980-1981) - Levante (1981) - Ajax (1981-1983) - Feyenoord (1983-1984) - Netherlands national team (1966-1977) Using a 1.x version of LocalAI gave this response, which was worse. Shell Johan Cruyff played for the following football teams: - Ajax (1964-1973) - Barcelona (1973-1978) - Los Angeles Aztecs (1979) The following steps were used to solve this problem. 
When you take a closer look at the PDF file, you notice that the information about the football teams is listed in a table next to the regular text. Remember that splitting the document was done by creating chunks of 500 characters. So, maybe this splitting is not executed well enough for the LLM. Copy the football teams in a separate text document. Plain Text Years Team Apps (Gls) 1964–1973 Ajax 245 (193) 1973–1978 Barcelona 143 (48) 1979 Los Angeles Aztecs 22 (14) 1980 Washington Diplomats 24 (10) 1981 Levante 10 (2) 1981 Washington Diplomats 5 (2) 1981–1983 Ajax 36 (14) 1983–1984 Feyenoord 33 (11) Add both documents to the ingestor. Java Document johanCruiffInfo = loadDocument(toPath("example-files/Johan_Cruyff.pdf")); Document clubs = loadDocument(toPath("example-files/Johan_Cruyff_clubs.txt")); ingestor.ingest(johanCruiffInfo, clubs); Run this code and this time, the answer was correct and complete. Shell Johan Cruijff played for the following football teams in his senior career: - Ajax (1964-1973) - Barcelona (1973-1978) - Los Angeles Aztecs (1979) - Washington Diplomats (1980-1981) - Levante (1981) - Ajax (1981-1983) - Feyenoord (1983-1984) - Netherlands national team (1966-1977) It is therefore important that the sources you provide to an LLM are split wisely. Besides that, the used technologies improve in a rapid way. Even while writing this blog, some problems were solved in a couple of weeks. Updating to a more recent version of LocalAI for example, solved one way or the other the problem with parsing the single PDF. Conclusion In this post, you learned how to integrate an LLM from within your Java application using LangChain4j. You also learned how to chat with documents, which is a fantastic use case! It is also important to regularly update to newer versions as the development of these AI technologies improves continuously.
The Spring AI is a new project of the Spring ecosystem that streamlines the creation of AI applications in Java. By using Spring AI together with PostgreSQL pgvector, you can build generative AI applications that draw insights from your data. First, this article introduces you to the Spring AI ChatClient that uses the OpenAI GPT-4 model to generate recommendations based on user prompts. Next, the article shows how to deploy PostgreSQL with the PGVector extension and perform vector similarity searches using the Spring AI EmbeddingClient and Spring JdbcClient. Adding Spring AI Dependency Spring AI supports many large language model (LLM) providers, with each LLM having its own Spring AI dependency. Let's assume that you prefer working with OpenAI models and APIs. Then, you need to add the following dependency to a project: XML <dependency> <groupId>org.springframework.ai</groupId> <artifactId>spring-ai-openai-spring-boot-starter</artifactId> <version>{latest.version}</version> </dependency> Also, at the time of writing, Spring AI was in active development, with the framework artifacts being released in the Spring Milestone and/or Snapshot repositories. Thus, if you still can't find Spring AI on https://start.spring.io/, then add the repositories to the pom.xml file: XML <repositories> <repository> <id>spring-milestones</id> <name>Spring Milestones</name> <url>https://repo.spring.io/milestone</url> <snapshots> <enabled>false</enabled> </snapshots> </repository> <repository> <id>spring-snapshots</id> <name>Spring Snapshots</name> <url>https://repo.spring.io/snapshot</url> <releases> <enabled>false</enabled> </releases> </repository> </repositories> Setting Up OpenAI Module The OpenAI module comes with several configuration properties, allowing the management of connectivity-related settings and fine-tuning the behavior of OpenAI models. At a minimum, you need to provide your OpenAI API key, which will be used by Spring AI to access GPT and embedding models. Once the key is created, add it to the application.properties file: Properties files spring.ai.openai.api-key=sk-... Then, if necessary, you can select particular GPT and embedding models: Properties files spring.ai.openai.chat.model=gpt-4 spring.ai.openai.embedding.model=text-embedding-ada-002 In the end, you can test that the OpenAI module is configured properly by implementing a simple assistant with Spring AI's ChatClient: Java // Inject the ChatClient bean @Autowired private ChatClient aiClient; // Create a system message for ChatGPT explaining the task private static final SystemMessage SYSTEM_MESSAGE = new SystemMessage( """ You're an assistant who helps to find lodging in San Francisco. Suggest three options. Send back a JSON object in the format below. [{\"name\": \"<hotel name>\", \"description\": \"<hotel description>\", \"price\": <hotel price>}] Don't add any other text to the response. Don't add the new line or any other symbols to the response. Send back the raw JSON. 
"""); public void searchPlaces(String prompt) { // Create a Spring AI prompt with the system message and the user message Prompt chatPrompt = new Prompt(List.of(SYSTEM_MESSAGE, new UserMessage(prompt))); // Send the prompt to ChatGPT and get the response ChatResponse response = aiClient.generate(chatPrompt); // Get the raw JSON from the response and print it String rawJson = response.getGenerations().get(0).getContent(); System.out.println(rawJson); } For the sake of the experiment, if you pass the "I'd like to stay near the Golden Gate Bridge" prompt, then the searchPlaces the method might provide lodging recommendations as follows: JSON [ {"name": "Cavallo Point", "description": "Historic hotel offering refined rooms, some with views of the Golden Gate Bridge, plus a spa & dining.", "price": 450}, {"name": "Argonaut Hotel", "description": "Upscale, nautical-themed hotel offering Golden Gate Bridge views, plus a seafood restaurant.", "price": 300}, {"name": "Hotel Del Sol", "description": "Colorful, retro hotel with a pool, offering complimentary breakfast & an afternoon cookies reception.", "price": 200} ] Starting Postgres With PGVector If you run the previous code snippet with the ChatClient, you'll notice that it usually takes over 10 seconds for the OpenAI GPT model to generate a response. The model has a broad and deep knowledge base, and it takes time to produce a relevant response. Apart from the high latency, the GPT model might not have been trained on data that is relevant to your application workload. Thus, it might generate responses that are far from being satisfactory for the user. However, you can always expedite the search and provide users with accurate responses if you generate embeddings on a subset of your data and then let Postgres work with those embeddings. The pgvector extension allows storing and querying vector embeddings in Postgres. The easiest way to start with PGVector is by starting a Postgres instance with the extension in Docker: Shell mkdir ~/postgres-volume/ docker run --name postgres \ -e POSTGRES_USER=postgres -e POSTGRES_PASSWORD=password \ -p 5432:5432 \ -v ~/postgres-volume/:/var/lib/postgresql/data -d ankane/pgvector:latest Once started, you can connect to the container and enable the extension by executing the CREATE EXTENSION vector statement: Shell docker exec -it postgres psql -U postgres -c 'CREATE EXTENSION vector' Lastly, add the Postgres JDBC driver dependency to the pom.xml file: XML <dependency> <groupId>org.postgresql</groupId> <artifactId>postgresql</artifactId> <version>{latest.version}</version> </dependency> Configure the Spring DataSource by adding the following settings to the application.properties file: Properties files spring.datasource.url = jdbc:postgresql://127.0.0.1:5432/postgres spring.datasource.username = postgres spring.datasource.password = password Performing Vector Similarity Search With Spring AI At a minimum, the vector similarity search is a two-step process. First, you need to use an embedding model to generate a vector/embedding for a provided user prompt or other text. Spring AI supports the EmbeddingClient that connects to OpenAI's or other providers' embedding models and generates a vectorized representation for the text input: Java // Inject the Spring AI Embedding client @Autowired private EmbeddingClient aiClient; public List<Place> searchPlaces(String prompt) { // Use the Embedding client to generate a vector for the user prompt List<Double> promptEmbedding = aiClient.embed(prompt); ... 
} Second, you use the generated embedding to perform a similarity search across vectors stored in the Postgres database. For instance, you can use the Spring JdbcClient for this task: Java @Autowired private JdbcClient jdbcClient; // Inject the Spring AI Embedding client @Autowired private EmbeddingClient aiClient; public List<Place> searchPlaces(String prompt) { // Use the Embedding client to generate a vector for the user prompt List<Double> promptEmbedding = aiClient.embed(prompt); // Perform the vector similarity search StatementSpec query = jdbcClient.sql( "SELECT name, description, price " + "FROM airbnb_listing WHERE 1 - (description_embedding <=> :user_promt::vector) > 0.7 " + "ORDER BY description_embedding <=> :user_promt::vector LIMIT 3") .param("user_promt", promptEmbedding.toString()); // Return the recommended places return query.query(Place.class).list(); } The description_embedding column stores embeddings that were pre-generated for Airbnb listing overviews from the description column. The Airbnb embeddings were produced by the same model that is used by Spring AI's EmbeddingClient for the user prompts. Postgres uses PGVector to calculate the cosine distance (<=>) between the Airbnb and user prompt embeddings (description_embedding <=> :user_prompt::vector) and then returns only those Airbnb listings whose description is > 0.7 similar to the provided user prompt. The similarity is measured as a value in the range from 0 to 1. The closer the similarity to 1, the more related the vectors are. What's Next Spring AI and PostgreSQL PGVector provide all the essential capabilities needed for building generative AI applications in Java. If you're curious to learn more, watch this hands-on tutorial. It guides you through the process of creating a lodging recommendation service in Java from scratch, optimizing similarity searches with specialized indexes, and scaling with distributed Postgres (YugabyteDB):
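Speaking of specialized indexes: pgvector can build one directly on the description_embedding column. Below is a minimal sketch (assuming the airbnb_listing table from the example above; the right index type and tuning parameters depend on your data volume and pgvector version):

SQL
-- Approximate nearest-neighbor index for cosine-distance (<=>) searches.
-- IVFFlat is the long-standing pgvector index type; "lists" is a tuning knob.
CREATE INDEX ON airbnb_listing
  USING ivfflat (description_embedding vector_cosine_ops)
  WITH (lists = 100);

-- On pgvector 0.5.0 and newer, an HNSW index is an alternative:
-- CREATE INDEX ON airbnb_listing USING hnsw (description_embedding vector_cosine_ops);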
The Urgency of Data Privacy in a Connected World Recent years have witnessed a mounting concern about data privacy, and these concerns are not unfounded. In a world where connectivity is ubiquitous, the statistics paint a compelling picture. According to a report by Cisco, the number of connected devices worldwide is projected to reach a staggering 29.3 billion by 2023. This exponential growth for the Internet of Things (IoT) devices underscores the urgent need for robust privacy measures. Furthermore, a survey conducted by the Pew Research Center has revealed that a significant 79% of Americans express concern about the way their data is being utilized by companies. This growing awareness among users regarding their digital privacy signifies a shifting paradigm where individuals are increasingly vigilant about safeguarding their personal information. Edge AI’s Ascension: A Response to Privacy Concerns In tandem with the rising tide of privacy concerns, the adoption of Edge AI has surged. In 2021, the edge AI market was valued at USD 11.98 billion, and forecasts suggest it will reach an impressive USD 107.47 billion by 2029, according to Fortune Business Insights. This rapid expansion underscores the relevance and significance of Edge AI in the ever-evolving landscape of artificial intelligence. Gartner, in its 2023 Hype Cycle for Artificial Intelligence, predicts that Edge AI is poised to enter the mainstream within the next two years. This prediction reaffirms the technology's growing prominence and its pivotal role in addressing the privacy concerns of the digital age. Tech Giants Embrace Privacy-Centric Approaches In response to these escalating concerns and the burgeoning market demands for privacy-centric solutions, tech industry titans like Apple have taken a proactive stance. They have embraced innovative approaches designed to prioritize user privacy without sacrificing advanced functionalities. Apple, for instance, has introduced features such as "on-device processing" and "differential privacy." These features ensure that user data remains exclusively on the device, bolstering privacy safeguards while still offering cutting-edge functionality. Similarly, Google has committed to developing privacy-preserving machine learning techniques. This commitment underscores the tech industry's collective shift towards privacy-conscious AI solutions, heralding a future where technology not only empowers users but also safeguards their personal data. Privacy-Preserving Techniques in Edge AI Some examples of privacy-preserving techniques used in edge AI include: Differential privacy: This technique allows for the anonymization of data to prevent the identification of individual information while still providing valuable insights. Encryption: By encrypting data, unauthorized access can be prevented, enhancing privacy and security. However, it's important to consider the potential overhead and complexity that encryption can add. Anonymization: Anonymizing data involves removing or masking identifying information, thus protecting the privacy of the individuals associated with the data. Federated learning: This approach enables training machine learning models across decentralized edge devices without exchanging the data, thus preserving privacy. Homomorphic encryption: This technique allows for computations to be performed on encrypted data without decrypting it, maintaining the privacy of the information throughout the process. 
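To make the first of these techniques a bit more concrete, here is a minimal, illustrative sketch of the Laplace mechanism that differential privacy implementations are commonly built on (the class name, values, and parameters below are hypothetical and not taken from any particular edge AI framework):

Java
import java.util.Random;

public class NoisyCountExample {

    // Draw Laplace noise with scale = sensitivity / epsilon (inverse-CDF sampling).
    static double laplaceNoise(double sensitivity, double epsilon, Random rng) {
        double scale = sensitivity / epsilon;
        double u = rng.nextDouble() - 0.5; // uniform in (-0.5, 0.5)
        return -scale * Math.signum(u) * Math.log(1 - 2 * Math.abs(u));
    }

    public static void main(String[] args) {
        Random rng = new Random();
        double exactCount = 42; // e.g., a usage count computed locally on the device
        // Report a noisy value instead of the exact one; epsilon controls the
        // privacy/accuracy trade-off (smaller epsilon = more noise = more privacy).
        double noisyCount = exactCount + laplaceNoise(1.0, 0.5, rng);
        System.out.println("Value shared off-device: " + noisyCount);
    }
}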
Homomorphic Encryption Homomorphic Encryption is a revolutionary technology that stands as a guardian of privacy in the realm of Edge AI. It allows computations to be performed on encrypted data without the need for decryption. In an age where data privacy and security are paramount, Homomorphic Encryption emerges as a formidable solution, particularly at the edge. At its core, Homomorphic Encryption enables secure data processing without ever exposing sensitive information. This is a game-changer in scenarios where preserving data privacy is non-negotiable, such as healthcare, finance, and personal devices. Imagine a wearable healthcare device equipped with Edge AI capabilities. With Homomorphic Encryption, it can process your health data locally without the need to reveal the raw information. This ensures that your private health information remains confidential, even during data analysis. The Practical Implications of Homomorphic Encryption The practical implications of Homomorphic Encryption are profound. In the financial sector, sensitive transactions can be securely processed at the edge, guaranteeing confidentiality while harnessing the power of AI. In personal devices like smartphones, your data can be analyzed without leaving the device, preserving your privacy at all times. It's a remarkable blend of AI capabilities and data security that epitomizes the essence of Edge AI. The Future of Edge AI and Data Privacy As we look ahead, Edge AI is poised to redefine the boundaries of technology and privacy. It promises an era where intelligent decision-making is not just swift but also profoundly secure. With data processed locally, users regain control over their information, ensuring that privacy is not just an option but a fundamental right. In this intelligent, efficient, and secure future, sectors like autonomous vehicles, healthcare, and the Internet of Things (IoT) flourish. Energy is optimized, traffic flows smoothly, and emergency responses are swift and precise. Augmented reality transforms education and gaming while healthcare embraces personalized monitoring. Edge AI becomes a beacon of sustainability, conserving energy and driving environmental conservation efforts. Conclusion Edge AI represents a monumental shift in the AI landscape — one that champions both innovation and discretion. It signifies a future where technology is seamlessly integrated into our daily lives, respecting the sanctity of our privacy. In the interconnected world we inhabit, Edge AI stands as a sentinel, ensuring that in the age of AI, privacy remains an unwavering cornerstone. It is not just a technological advancement but a societal assurance — a promise that privacy is not a trade-off for progress but a bedrock upon which innovation thrives. As users become more conscious of their digital footprint, the adoption of Edge AI is not just a technological evolution; it is a societal affirmation that privacy is not an afterthought but a fundamental right. In a world powered by Edge AI, intelligence knows no bounds, and privacy is inviolable.
GenAI is everywhere you look, and organizations across industries are putting pressure on their teams to join the race – 77% of business leaders fear they’re already missing out on the benefits of GenAI. Data teams are scrambling to answer the call. But building a generative AI model that actually drives business value is hard. And in the long run, a quick integration with the OpenAI API won’t cut it. It’s GenAI, but where’s the moat? Why should users pick you over ChatGPT? That quick check of the box feels like a step forward. Still, if you aren’t already thinking about how to connect LLMs with your proprietary data and business context actually to drive differentiated value, you’re behind. That’s not hyperbole. This week, I’ve talked with half a dozen data leaders on this topic alone. It wasn’t lost on any of them that this is a race. At the finish line, there are going to be winners and losers: the Blockbusters and the Netflixes. If you feel like the starter’s gun has gone off, but your team is still at the starting line stretching and chatting about “bubbles” and “hype,” I’ve rounded up five hard truths to help shake off the complacency. 1. Your Generative AI Features Are Not Well Adopted, and You’re Slow to Monetize “Barr, if generative AI is so important, why are the current features we’ve implemented so poorly adopted?” Well, there are a few reasons. One, your AI initiative wasn’t built to respond to an influx of well-defined user problems. For most data teams, that’s because you’re racing, and it’s early, and you want to gain some experience. However, it won’t be long before your users have a problem that GenAI best solves, and when that happens – you will have much better adoption compared to your tiger team brainstorming ways to tie GenAI to a use case. And because it’s early, the generative AI features that have been integrated are just “ChatGPT but over here.” Let me give you an example. Think about a productivity application you might use every day to share organizational knowledge. An app like this might offer a feature to execute commands like “Summarize this,” “Make longer,” or “Change tone” on blocks of unstructured text. One command equals one AI credit. Yes, that’s helpful, but it’s not differentiated. Maybe the team decides to buy some AI credits, or perhaps they just simply click over on the other tab and ask ChatGPT. I don’t want to completely overlook or discount the benefit of not exposing proprietary data to ChatGPT. Still, it’s also a smaller solution and vision than what’s being painted on earnings calls across the country. That pesky middle step from concept to value. So consider: What’s your GenAI differentiator and value add? Let me give you a hint: high-quality proprietary data. That’s why a RAG model (or sometimes, a fine-tuned model) is so important for Gen AI initiatives. It gives the LLM access to that enterprise's proprietary data. (I’ll explain why below.) 2. You’re Scared To Do More With Gen AI It’s true: generative AI is intimidating. Sure, you could integrate your AI model more deeply into your organization’s processes, but that feels risky. Let’s face it: ChatGPT hallucinates and can’t be predicted. There’s a knowledge cutoff that leaves users susceptible to out-of-date output. There are legal repercussions to data mishandling and providing consumers with misinformation, even if accidental. Sounds real enough, right? Llama 2 sure thinks so. Your data mishaps have consequences. 
And that’s why it’s essential to know exactly what you are feeding GenAI and that the data is accurate. In an anonymous survey we sent to data leaders asking how far away their team is from enabling a Gen AI use case, one response was, “I don’t think our infrastructure is the thing holding us back. We’re treading quite cautiously here – with the landscape moving so fast and the risk of reputational damage from a ‘rogue’ chatbot, we’re holding fire and waiting for the hype to die down a bit!” This is a widely shared sentiment across many data leaders I speak to. If the data team has suddenly surfaced customer-facing, secure data, then they’re on the hook. Data governance is a massive consideration and a high bar to clear. These are real risks that need solutions, but you won’t solve them by sitting on the sideline. There is also a real risk of watching your business being fundamentally disrupted by the team that figured it out first. Grounding LLMs in your proprietary data with fine-tuning and RAG is a big piece of this puzzle, but it’s not easy… 3. RAG Is Hard I believe that RAG (retrieval augmented generation) and fine-tuning are the centerpieces of the future of enterprise generative AI. And while RAG is the simpler approach in most cases, developing RAG apps can still be complex. Can’t we all just start RAGing? What’s the big deal? RAG might seem like the obvious solution for customizing your LLM. But RAG development comes with a learning curve, even for your most talented data engineers. They need to know prompt engineering, vector databases and embedding vectors, data modeling, data orchestration, and data pipelines — all for RAG. And, because it’s new (introduced by Meta AI in 2020), many companies just don’t yet have enough experience with it to establish best practices. RAG implementation architecture Here’s an oversimplification of RAG application architecture: RAG architecture combines information retrieval with a text generator model, so it has access to your database while trying to answer a question from the user. The database has to be a trusted source that includes proprietary data, and it allows the model to incorporate up-to-date and reliable information into its responses and reasoning. In the background, a data pipeline ingests various structured and unstructured sources into the database to keep it accurate and up-to-date. The RAG chain takes the user query (text) and retrieves relevant data from the database, then passes that data and the query to the LLM in order to generate a highly accurate and personalized response. There are a lot of complexities in this architecture, but it does have important benefits: It grounds your LLM in accurate proprietary data, thus making it so much more valuable. It brings your models to your data rather than bringing your data to your models, which is a relatively simple, cost-effective approach. We can see this becoming a reality in the Modern Data Stack. The biggest players are working at breakneck speed to make RAG easier by serving LLMs within their environments, where enterprise data is stored. Snowflake Cortex now enables organizations to quickly analyze data and build AI apps directly in Snowflake. Databricks’ new Foundation Model APIs provide instant access to LLMs directly within Databricks. Microsoft released Microsoft Azure OpenAI Service, and Amazon recently launched the Amazon Redshift Query Editor. Snowflake data cloud I believe all of these features have a good chance of driving high adoption. 
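Stripped of vendor specifics, the RAG chain described above comes down to three steps: embed the user query, retrieve the closest proprietary documents, and pass both to the LLM. The sketch below is deliberately minimal and self-contained — the embed and call_llm functions, the documents, and every name in it are stand-ins, not any particular product's API.

Python
# Illustrative RAG loop: embed -> retrieve -> generate. All names are placeholders.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in for a real embedding model (normally an API or model call)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))  # deterministic within a process
    return rng.standard_normal(64)

def call_llm(prompt: str) -> str:
    """Stand-in for the actual LLM call."""
    return "[LLM answer grounded in the supplied context]"

# The "trusted database" of proprietary documents, kept fresh by a data pipeline.
documents = [
    "Refund policy: enterprise customers may cancel within 30 days.",
    "Our churn model flags accounts with a 40% drop in weekly logins.",
]
doc_vectors = np.stack([embed(d) for d in documents])

def answer(question: str, k: int = 1) -> str:
    q = embed(question)
    # Cosine similarity between the query and every stored document.
    sims = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
    context = "\n".join(documents[i] for i in np.argsort(sims)[::-1][:k])
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return call_llm(prompt)

print(answer("What is the refund window for enterprise customers?"))

In a production system, the placeholder pieces are exactly where the hard work lives: the embedding model, the vector database, and the pipelines that keep the document store accurate and current.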
But these vendor features also heighten the focus on data quality in those data stores. If the data feeding your RAG pipeline is anomalous, outdated, or otherwise untrustworthy, what’s the future of your generative AI initiative? 4. Your Data Isn’t Ready Yet Anyway Take a good, hard look at your data infrastructure. Chances are, if you had a perfect RAG pipeline, fine-tuned model, and clear use case ready to go tomorrow (and wouldn’t that be nice?), you still wouldn’t have clean, well-modeled datasets to plug it all into. Let’s say you want your chatbot to interface with a customer. To do anything useful, it needs to know about that organization’s relationship with the customer. If you’re an enterprise organization today, that relationship is likely defined across 150 data sources and five siloed databases…3 of which are still on-prem. If that describes your organization, it’s possible you are a year (or two!) away from your data infrastructure being GenAI-ready. This means if you want the option to do something with GenAI someday soon, you need to be creating useful, highly reliable, consolidated, well-documented datasets in a modern data platform… yesterday. Or the coach will call you into the game, and your pants will be down. Your data engineering team is the backbone for ensuring data health. A modern data stack enables the data engineering team to continuously monitor data quality going forward. It’s 2024 now. Launching a website, application, or any data product without data observability is a risk. Your data is a product, requiring data observability and governance to pinpoint data discrepancies before they move through a RAG pipeline. 5. You’ve Sidelined Critical Gen AI Players Without Knowing It Generative AI is a team sport, especially when it comes to development. Many data teams make the mistake of excluding key players from their Gen AI tiger teams, and that’s costing them in the long run. Who should be on an AI tiger team? Leadership, or a primary business stakeholder, to spearhead the initiative and remind the group of the business value. Software engineers will develop the code, the user-facing application, and the API calls. Data scientists consider new use cases, fine-tune their models, and push the team in new directions. Who’s missing here? Data engineers. Data engineers are critical to Gen AI initiatives. They will be able to understand the proprietary business data that provides the competitive advantage over ChatGPT, and they will build the pipelines that make that data available to the LLM via RAG. If your data engineers aren’t in the room, your tiger team is not at full strength. The most pioneering companies in GenAI are telling me they are already embedding data engineers in all development squads. Winning the GenAI Race If any of these hard truths apply to you, don’t worry. Generative AI is in such nascent stages that there’s still time to start over and, this time, embrace the challenge. Take a step back to understand the customer needs an AI model can solve, bring data engineers into earlier development stages to secure a competitive edge from the start, and take the time to build a RAG pipeline that can supply a steady stream of high-quality, reliable data. And invest in a modern data stack. Tools like data observability will be a core component of data quality best practices – and generative AI without high-quality data is just a whole lot of fluff.
To ensure the safety of rail traffic, non-destructive testing of rails is regularly carried out using various approaches and methods. One of the main approaches to determining the operational condition of railway rails is ultrasonic non-destructive testing. The assessment of the test results depends on the defectoscopist. The need to reduce the workload on humans and improve the efficiency of analyzing ultrasonic testing data makes the task of creating an automated system relevant. The purpose of this work is to evaluate the possibility of creating an effective system for recognizing rail defects from ultrasonic inspection defectograms using ML methods. Domain Analysis The railway track consists of rail sections connected together by bolts and welded joints. When a defectoscope device equipped with generating piezoelectric transducers (PZTs) passes along the railway track, ultrasonic pulses are emitted into the rail at a predetermined frequency. The receiving PZTs then register the reflected waves. The detectability of defects by the ultrasonic method is based on the principle of reflection of waves from inhomogeneities in the metal, since cracks and other inhomogeneities differ in their acoustic resistance from the rest of the metal. Principle of A-Scan Formation The registered signal reflected from a bolt hole with a perpendicular input of the probing pulse to the rail surface is presented in Figure 1. The image of such a signal is called an «Amplitude scan», or «A-scan» for short. Fig. 1: Presentation of the registered ultrasonic inspection signal on an A-scan: a) ultrasound emission and registration process, b) registered signal. The recorded amplitudes of such an echo signal at each coordinate i along the length of the rail can be represented as a vector A_i = [a_1, a_2, a_3, ..., a_j], where a_j is the amplitude of the reflected signal at the j-th depth level of the rail. The depth for each amplitude value a_j is calculated from the registration time and the frequency of the emitted signal. Principle of B-Scan Formation The recorded A-scan echo signals at each inspection point i along the length of the rail can be represented as a two-dimensional array B = [A_1, A_2, A_3, ..., A_i] of size (i × j). Figure 2 schematically shows a fragment of array B with the recorded echo signals reflected from a bolt hole with a perpendicular input of the probing pulse to the surface of the rail. Fig. 2: Fragment of the array with the signals of the bolt hole and the bottom signal. The graphical representation of the two-dimensional array B as an intensity graph is called a «Bright-scan» («B-scan»), Figure 3; the values of the array are displayed on a plane with color serving as the third dimension of the data along the Z-axis. Fig. 3: Fragment of the B-scan of a bolt hole obtained by scanning with a perpendicular input of the probing pulse to the surface of the rail (Avicon-11 equipment). Formation of a Defectogram The different reflective properties of defects, their geometry, and their location in the rail require the use of ultrasonic transducers with different angles of input and registration for their detection. Therefore, modern rail flaw detectors use several transducers that are distributed along the length of the flaw detector search system and form a so-called rail section sounding scheme. 
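These representations map directly onto array code. The numpy sketch below is purely illustrative — the amplitudes are synthetic and the sizes arbitrary — but it mirrors the A-scan and B-scan definitions given above.

Python
# Synthetic illustration of the A-scan / B-scan representations described above.
import numpy as np

depth_levels = 224   # number of depth samples j per A-scan (illustrative)
scan_points = 1024   # number of coordinates i along the rail (illustrative)
rng = np.random.default_rng(0)

def a_scan(i: int) -> np.ndarray:
    """A_i = [a_1, ..., a_j]: reflected amplitude at each depth level for coordinate i."""
    amplitudes = rng.integers(0, 10, size=depth_levels)  # background noise
    if 400 <= i < 440:
        amplitudes[150:156] += 200  # a strong reflector (e.g., a bolt hole) at some depth
    return amplitudes

# B = [A_1, A_2, ..., A_i]: stacking the A-scans gives the (i x j) B-scan array,
# which is plotted as an intensity image. The dataset frames discussed later store
# the transposed (depth x length) orientation, i.e., shape (224, 1024).
b_scan = np.stack([a_scan(i) for i in range(scan_points)])
print(b_scan.shape)  # (1024, 224)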
One of the applied inspection schemes is shown in Figure 4, where the generating and registering transducers of each angle of ultrasonic input are located in the same housing. Fig. 4: Example of a scheme for emitting ultrasonic pulses into a rail using six transducers The formation of B-scan signals for a bolt hole using transducers with central input angles of «+42°» (orange), «−42°» (blue), and «0°» (green) at three characteristic points (positions 1, 2, 3) along the length of the rail is shown schematically in Figure 5a. Fig. 5: Signal formation during scanning: a) general view, b) correction with offset. The information channels of a flaw detector correspond to physical sensors (transducers) that are sequentially arranged on the surface of the rail. The set of B-scans for all channels of a flaw detector for each rail, combined into a data file, is called a defectogram (scan). Often, a channel or a set of channels selected for consideration is also called a defectogram. In most cases, to improve the readability of the defectogram, it is displayed in a mode that reduces everything to a single section, in which the coordinates of the echo signals for channels with an inclined input of ultrasound are corrected by additionally taking into account the distance of the reflector from the point of input of the probing pulse into the metal of the rail (Figure 5b). In addition, for ease of use and to reduce the graphic footprint of the entire defectogram, the data channels are graphically grouped; one such grouping is shown in Figure 6. Fig. 6: An example of a section of a defectogram of a bolted rail joint obtained by scanning with the Avicon-11 ultrasonic equipment. Decoding Defectograms (Information Features) To visually search for defects on the B-scan and A-scan, the cognitive skills of human experts — defectoscopists — are relied upon. During ultrasonic scanning of rails, structural elements and defects produce acoustic responses, which are displayed on the defectogram as characteristic graphic images. Each type of defect on the defectogram is visually distinguishable to experts during the data analysis process. The main goal of defectogram analysis is to reliably find and highlight graphic images of defects against the background of possible interference and images of structural elements. Each measuring channel of the defectogram (0°, ±42°, ±58°, +70°) or their combination is designed to detect a specific group of defects. To simplify the task of searching for defects, we will decompose the problem and consider the capabilities of DL algorithms for searching for individual types of defective areas using the defectogram of the «0°» channel of the Avicon-11 flaw detector. In this case, the types of rail sections can be divided into four classes based on characteristic information features. Some idea of the diversity of the data set obtained by the Avicon-11 flaw detector can be gained from Table 1. Table 1: Examples of instances (B-scan) for selected classes (real data) Selection and Implementation of a Classification Algorithm Although in the operation of the railway track the presence or absence of a defect (binary classification) is what ultimately matters, we will quantitatively assess which defective areas have a high probability of being falsely classified as non-defective, which is the dangerous case in rail diagnostics. In this work, the classification task is reduced to an unambiguous multi-class task with four classes. 
Data Set Generation The data set is collected from defectograms obtained by the Avicon-11 flaw detector on several Railroad Test Tracks (RTT) and conventional tracks under various conditions. Each data instance is represented as a rectangular "depth × length" array with the shape (224, 1024), which is enough to fit images of more than six bolt holes along the length of the rail at a bolted joint. Assembling the data set is difficult due to the shortage of defective areas, so to expand it we used shifts along the length of the rail and scans of the same defect under different conditions and test equipment settings, which yields different images of the same defects (Fig. 7). Fig. 7: Example of dataset expansion As a result of this methodology, the dataset contains 2151, 1043, 1584, and 582 instances for classes 0, 1, 2, and 3, respectively, for a total of 5360 instances. The defect-free class «0» contains 10% (214 instances) of instances without bolt holes, and the remaining 90% (1937 instances) contain from one to six bolt holes. This dataset is named «avicon» and is used only for final testing. This allows us to avoid the problem of class imbalance during training and to obtain a more reliable assessment of the accuracy of the classifier. For the purposes of training and testing classification models in this work, a synthetic, balanced dataset is used, obtained on the basis of mathematical models describing the process of reflection and registration of ultrasonic waves from structural reflectors of rails and defects. The application of such a trained model to the classification of real data obtained by a flaw detector during rail diagnostics is demonstrated in Fig. 8. Fig. 8: Application of a neural network trained on model data Examples of synthetic instances of the selected classes are presented in Table 2. For more information on the generation of synthetic datasets, please see the works [1-4]. The modeling process allows us to obtain a significant number of instances; in this work we limit ourselves to 2048 instances for each of the synthetic sets «train», «valid», and «test». Each data instance and label is written for each set into the corresponding binary files images.bin and labels.bin (data type «uint8») according to Fig. 9. Fig. 9: Distribution of sets by directory Exploratory Data Analysis Information on the amount of data and the class balance for the synthetic sets and the «avicon» set is presented in Fig. 10. Analysis of the graphical representation of frames of real data reveals at least one important property of class 3 defects: their images are the most difficult to distinguish from the images of bolt holes, especially when they lie at the same depth in the rail, which significantly complicates the classification task. Each data instance is 224 × 1024 in size, which is large enough for the application of machine learning (ML) algorithms but also causes difficulties in organizing the training process. Each such instance can be treated as a point in a 224 × 1024 = 229,376-dimensional space, and the data is highly sparse because it contains a large number of zeros. The cumulative explained variance of the «train» set as a function of the number of PCA components (Fig. 11) shows that with 1000 components (roughly 230 times smaller than the original dimensionality) 98.5% of the variance is already explained, which indicates a high level of redundancy in the original data. 
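The binary file layout and the PCA analysis just described can be reproduced with a few lines of numpy and scikit-learn. The sketch below is illustrative only: the train/images.bin path and the (2048, 224, 1024) layout follow the description above, random data stands in for the real frames if the files are absent, and the full-size fit is memory-hungry (subsample for a quick experiment).

Python
# Illustrative sketch: load one synthetic set and plot the cumulative explained variance.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

n_instances, depth, length = 2048, 224, 1024

try:
    frames = np.fromfile("train/images.bin", dtype=np.uint8).reshape(n_instances, depth, length)
except FileNotFoundError:
    # Random stand-in so the sketch runs without the dataset (the curve will differ).
    frames = np.random.randint(0, 255, size=(n_instances, depth, length), dtype=np.uint8)

# Flatten each 224 x 1024 frame into a 229,376-dimensional vector.
X = frames.reshape(n_instances, -1).astype(np.float32)

pca = PCA(n_components=1000)
pca.fit(X)

cumulative = np.cumsum(pca.explained_variance_ratio_)
print(f"Variance explained by 1000 components: {cumulative[-1]:.3f}")

plt.plot(cumulative)
plt.xlabel("Number of PCA components")
plt.ylabel("Cumulative explained variance")
plt.show()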
Such a reduced dataset can be used in ML algorithms, but obtaining it for the entire dataset at once causes difficulties; therefore, further in this work an algorithm based on Deep Learning is considered. Fig. 11: Cumulative explained variance of the data as a function of the number of PCA components for the «train» dataset Neural Network Architecture In this work, a DL model in the form of a linear stack of layers is considered (Fig. 12a: the final version of the network; a rough illustrative Keras sketch of such a stack appears below). Activation function: relu (rectified linear unit); for the output fully connected layer, the normalized exponential (softmax) function, with the sum of the values of all output neurons equal to one. Loss function: a measure of error in the form of the distance between the actual and predicted probability distributions (cross-entropy). Optimizer: stochastic gradient descent in its RMSProp modification. Metric during training: accuracy, the ratio of the number of correctly classified objects to the total number of objects. Network Training The final version of the network was trained for 50 epochs. The «loss» and «accuracy» curves characterizing the training process (Fig. 12b) converge for the training and validation stages, with loss low and accuracy high, which suggests the model is not overfitting. This is also confirmed by the near-equal prediction accuracies of the model on the «train» set (99.61%) and the «test» set (99.02%) (Fig. 12b,c). The memory occupied by the network in H5 format is 30 KB. The full code can be found in the GitHub repository [5]. Fig. 12: NN and the results of its training: a) network architecture; b) change in «loss» and «accuracy» during training; c) classification report; d) confusion matrix The confusion matrix and the classification report are presented in Figure 12d,c. The trained model has high precision and recall scores, above 96% for all class classifiers, which also means that there are sufficient information features in the data for classification. It is important to examine the misclassified samples to understand how the classifier behaves and where it can be improved. According to the confusion matrix, the classifier of class 3 incorrectly recognized four samples of class 0 that have at least one bolt-hole signal similar to the image of a defect of group 3 (an example is shown in Fig. 13a), which may have been the cause of the error. The class 0 classifier incorrectly recognized two samples of class 1. Both incorrectly recognized defects have a characteristic appearance and are located very close to the upper boundary of the data frame. One such frame is shown in Fig. 13b. The class 0 classifier incorrectly recognized one sample of class 2, which is located close to the depth of the bolt holes (Fig. 13c). The class 0 classifier also incorrectly recognized 13 samples of class 3, which are located close to the depth of the bolt holes (Fig. 13d). The results of the network tests indicate the difficulty of distinguishing a class 3 defect from bolt holes. Fig. 13: Characteristic frames of incorrectly classified data Evaluation of Network Efficiency Using Real Data («Avicon») To assess how well the trained neural network recognizes instances of real data obtained by the Avicon-11 flaw detector, it was evaluated on the «avicon» dataset. 
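For reference, a linear stack of layers with the training choices listed above (relu activations, a softmax output, RMSprop, cross-entropy, accuracy) can be assembled in Keras roughly as follows. This is an illustrative sketch only: the layer sizes are assumptions chosen to stay near the reported 30 KB footprint, and the actual architecture is the one shown in Fig. 12a and in the repository [5].

Python
# Illustrative Keras sketch of a small convolutional "linear stack of layers".
from tensorflow import keras
from tensorflow.keras import layers

num_classes = 4

model = keras.Sequential([
    keras.Input(shape=(224, 1024, 1)),                  # one (depth x length) B-scan frame
    layers.Conv2D(4, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=8),
    layers.Conv2D(8, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=8),
    layers.Flatten(),
    layers.Dense(16, activation="relu"),
    layers.Dense(num_classes, activation="softmax"),    # output values sum to one
])

model.compile(
    optimizer=keras.optimizers.RMSprop(),               # SGD in its RMSProp modification
    loss="sparse_categorical_crossentropy",             # cross-entropy with integer class labels
    metrics=["accuracy"],                               # share of correctly classified objects
)

# model.fit(x_train, y_train, validation_data=(x_valid, y_valid), epochs=50)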
On the real «avicon» data, the accuracy of the entire network was 90%, which is 9% lower than the prediction accuracy on synthetic data. The resulting confusion matrix and summary report on the quality of the model are presented in Fig. 14. The time taken to classify the labeled data makes it possible to estimate the time required to classify 100 km of railway line: about 11 s. Fig. 14: Summary report on the quality of the model based on the classification of the «avicon» dataset We will analyze the most important classification errors. According to the confusion matrix (Fig. 14), four transverse cracks with the weakest response recorded by channel «0°» and belonging to class 1 were classified as class 0 (without defect). To improve the recognition of such defects, it is necessary to add information features that can be obtained from the inclined channels of the flaw detector, which are the main channels for detecting these types of defects (Fig. 15). Fig. 15: Typical frames of misclassified data (True = 1, Predict = 0) The incorrect classification of 49 class 2 defects as non-defective is associated with weak signal responses recorded by the measuring channel «0°». One way to improve the classification of such class 2 samples is to consider additional information features from the inclined channels (Ch ±42°), as they are the main channels for detecting the misclassified defects (Fig. 16). Fig. 16: Typical misclassified data frame of class 2 (True = 2, Predict = 0) The misclassification of 112 class 3 samples as class 0 is associated with data frames where the defect image is at the level of the bolt holes, is located close to the edge of the data frame, or has a pattern similar to the bolt holes. The misclassification of 152 class 0 samples as class 3 is due to a similar reason — the similarity of bolt-hole patterns to class 3 defect patterns. One way to improve the classification of samples of classes 0 and 3 is to consider additional information features from the inclined channels (Ch ±42°), since in those channels the bolt holes are well distinguished from a class 3 defect and vice versa (Fig. 17). Fig. 17: Typical misclassified data frame of class 3 (True = 3, Predict = 0) The graphical images of defects of classes 1 and 2 are similar, and the assignment of a defect to class 1 or 2 depends on the depth from which the image of the defect begins to be recorded on the defectogram. Defects of class 1 are located in the head of the rail. Defects of class 2 can be recorded starting from the transition zone between the rail head and the web. The incorrect classification of 165 defects of class 1 as defects of class 2 is most likely associated with weak defect responses recorded in the head of the rail (Fig. 18). Fig. 18: Typical misclassified data frames of class 1 (True = 1, Predict = 2) Binary Classifier An important requirement for the practical use of the obtained classifier is accurate identification of the non-defective class (class 0), so that defective samples are not falsely assigned to the non-defective class. It is possible to reduce the number of false positives for the class 0 classifier by changing the probability cutoff threshold. To evaluate an appropriate cutoff threshold, the multiclass task was binarized by separating the non-defective state from all defective states, which corresponds to the «one versus rest» (One vs Rest) strategy. By default, for binary classification the threshold value is taken to be 0.5 (50%). 
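In code, this binarization and threshold adjustment amount to comparing the predicted class 0 probability against a cutoff. The sketch below is illustrative: the probability and label arrays are random stand-ins for the network's softmax outputs and the true «avicon» labels.

Python
# Illustrative one-vs-rest binarization of the 4-class output and cutoff tuning.
import numpy as np
from sklearn.metrics import confusion_matrix, precision_recall_curve

rng = np.random.default_rng(1)
probabilities = rng.dirichlet(np.ones(4), size=500)  # stand-in for model.predict(x_avicon)
labels = rng.integers(0, 4, size=500)                # stand-in for the true class indices

# One vs Rest: "non-defective" (class 0) is the positive class, all defect classes are the rest.
y_true = (labels == 0).astype(int)

for threshold in (0.5, 0.8, 0.9):
    # Call a frame non-defective only when the class 0 probability clears the threshold.
    y_pred = (probabilities[:, 0] >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    # fp = defective frames wrongly passed as non-defective (the dangerous case);
    # fn = frames unnecessarily sent for manual review.
    print(f"threshold={threshold}: false positives={fp}, false negatives={fn}")

# Precision-recall trade-off for the non-defective class as the threshold varies.
precision, recall, thresholds = precision_recall_curve(y_true, probabilities[:, 0])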
With the default 0.5 threshold, the binary classifier has an accuracy of 92.28% (Fig. 19). Fig. 19: Qualitative indicators of a binary classifier at a cutoff threshold of 0.5 («avicon» set) The changes in precision and recall of the binary classifier as the threshold varies are presented as a precision-recall curve (Fig. 20a). At a threshold of 0.5, there are 161 false positives (Fig. 20b). Increasing the threshold to 0.8 and 0.9 reduces the number of false positives to 70 and 58, respectively, at the cost of increasing the number of false negatives to 344 and 440 (Fig. 20b). In automatic analysis, increasing the threshold thus reduces, on the one hand, the false assignment of defects to the non-defective state and thereby the risk of missing defects; on the other hand, it increases the labor required from a person for manual analysis of the flagged frames. Fig. 20: Influence of the cutoff threshold on the characteristics of a binary classifier: a) precision-recall curve, b) confusion matrix at different cutoff thresholds. Conclusion Based on the analysis of the domain of ultrasonic rail inspection, information features of defects were identified, allowing four classes of rail sections to be defined for classification using machine learning methods. A dataset of ultrasonic rail inspection data was collected and annotated, comprising 5360 instances. Synthetic training, validation, and test datasets were created based on stochastic mathematical models describing the process of reflection and registration of ultrasonic waves from structural reflectors of rails and defects. To solve the unambiguous multi-class classification problem, a neural network based on a convolutional model was trained to an overall accuracy of 99%. The effectiveness of using a neural network trained on model data for the recognition of images of real rail defects has been confirmed: an achievable classification accuracy of 90% was demonstrated using only sections of the defectograms of the zero channel of an ultrasonic flaw detector. An analysis of the causes of the neural network's errors has been conducted, and the need for additional information features from the defectograms of the inclined channels of a flaw detector has been shown. References [1] Kaliuzhnyi A. Application of Model Data for Training the Classifier of Defects in Rail Bolt Holes in Ultrasonic Diagnostics. Artificial Intelligence Evolution [Internet]. 2023 Apr. 14 [cited 2023 Jul. 28];4(1):55-69. [2] Kaliuzhnyi A. Application of Machine Learning Methods To Search for Rail Defects (Part 2). [3] Kaliuzhnyi A. Using Machine Learning To Detect Railway Defects. [4] NVIDIA Blog. What Is Synthetic Data? [5] GitHub Repository
It’s one thing to build powerful machine-learning models and another thing to be able to make them useful. A big part of that is being able to build applications that expose their features to end users. Popular examples include ChatGPT, Midjourney, etc. Streamlit is an open-source Python library that makes it easy to build web applications for machine learning and data science. It has a set of rich APIs for visual components, including several chat elements, making it quite convenient to build conversational agents or chatbots, especially when combined with LLMs (Large Language Models). And that’s the example for this blog post as well — a Streamlit-based chatbot deployed to a Kubernetes cluster on Amazon EKS. But that’s not all! We will use Streamlit with LangChain, which is a framework for developing applications powered by language models. The nice thing about LangChain is that it supports many platforms and LLMs, including Amazon Bedrock (which will be used for our application). A key part of chat applications is the ability to refer to historical conversation(s) — at least within a certain time frame (window). In LangChain, this is referred to as Memory. Just like LLMs, you can plug in different systems to work as the memory component of a LangChain application. This includes Redis, which is a great choice for this use case since it’s a high-performance in-memory database with flexible data structures. Redis is already a preferred choice for real-time applications (including chat), often combined with Pub/Sub and WebSockets. This application will use Amazon ElastiCache Serverless for Redis, an option that simplifies cache management and scales instantly. This was announced at re:Invent 2023, so let’s explore it while it’s still fresh! To be honest, the application could be deployed on other compute options such as Amazon ECS, but since it needs to invoke Amazon Bedrock, I figured it’s a good opportunity to also cover how to use EKS Pod Identity (also announced at re:Invent 2023!). The GitHub repository for the app has the complete code. Here is a simplified, high-level diagram: Let’s go!! Basic Setup Amazon Bedrock: Use the instructions in this blog post to set up and configure Amazon Bedrock. EKS cluster: Start by creating an EKS cluster. Point kubectl to the new cluster using aws eks update-kubeconfig --region <cluster_region> --name <cluster_name> Create an IAM role: Use the trust policy and IAM permissions from the application GitHub repository. EKS Pod Identity Agent configuration: Set up the EKS Pod Identity Agent and associate EKS Pod Identity with the IAM role you created. ElastiCache Serverless for Redis: Create a Serverless Redis cache. Make sure it shares the same subnets as the EKS cluster. Once the cluster creation is complete, update the ElastiCache security group to add an inbound rule (TCP port 6379) to allow the application on the EKS cluster to access the ElastiCache cluster. 
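With the infrastructure in place, the heart of the application wires Streamlit's chat elements to a LangChain ConversationChain whose memory lives in Redis. The snippet below is a simplified sketch of that idea rather than the repository's exact code: import paths and parameter names assume a late-2023 LangChain release, and the connection string is a placeholder for your ElastiCache endpoint.

Python
# Simplified sketch: Streamlit chat UI + LangChain memory in ElastiCache Redis + Claude on Bedrock.
# See the GitHub repository for the complete, working application.
import uuid
import streamlit as st
from langchain.chains import ConversationChain
from langchain.llms import Bedrock
from langchain.memory import ConversationBufferMemory, RedisChatMessageHistory

# Keep one chat/session ID per Streamlit session (the script re-runs on every interaction).
if "chat_id" not in st.session_state:
    st.session_state.chat_id = str(uuid.uuid4())

# Chat history lands in Redis under message_store:<session_id>.
history = RedisChatMessageHistory(
    session_id=st.session_state.chat_id,
    url="rediss://<user>:<password>@<elasticache-endpoint>:6379",  # placeholder connection string
)
memory = ConversationBufferMemory(chat_memory=history)

llm = Bedrock(model_id="anthropic.claude-v2")  # AWS credentials are provided via EKS Pod Identity
chain = ConversationChain(llm=llm, memory=memory)

if prompt := st.chat_input("Say something"):
    st.chat_message("user").write(prompt)
    st.chat_message("assistant").write(chain.predict(input=prompt))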
Push the Docker Image to ECR and Deploy the App to EKS Clone the GitHub repository:

Shell
git clone https://github.com/abhirockzz/streamlit-langchain-chatbot-bedrock-redis-memory
cd streamlit-langchain-chatbot-bedrock-redis-memory

Create an ECR repository:

Shell
export REPO_NAME=streamlit-chat
export REGION=<AWS region>
ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text)
aws ecr get-login-password --region $REGION | docker login --username AWS --password-stdin $ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com
aws ecr create-repository --repository-name $REPO_NAME

Create the Docker image and push it to ECR:

Shell
docker build -t $REPO_NAME .
docker tag $REPO_NAME:latest $ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com/$REPO_NAME:latest
docker push $ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com/$REPO_NAME:latest

Deploy Streamlit Chatbot to EKS Update the app.yaml file: Enter the ECR Docker image info. In the Redis connection string format, enter the ElastiCache username and password along with the endpoint. Deploy the application:

Shell
kubectl apply -f app.yaml

To check logs: kubectl logs -f -l=app=streamlit-chat

Start a Conversation! To access the application:

Shell
kubectl port-forward deployment/streamlit-chat 8080:8501

Navigate to http://localhost:8080 using your browser and start chatting! The application uses the Anthropic Claude model on Amazon Bedrock as the LLM and an ElastiCache Serverless instance to persist the chat messages exchanged during a particular session.

Behind the Scenes in ElastiCache Redis To better understand what’s going on, you can use redis-cli to access the ElastiCache Redis instance from EC2 (or Cloud9) and introspect the data structure used by LangChain for storing chat history: keys * Don’t run keys * in a production Redis instance — this is just for demonstration purposes. You should see a key similar to this — "message_store:d5f8c546-71cd-4c26-bafb-73af13a764a5" (the name will differ in your case). Check its type: type message_store:d5f8c546-71cd-4c26-bafb-73af13a764a5 — you will notice that it's a Redis List. To check the list contents, use the LRANGE command:

Shell
LRANGE message_store:d5f8c546-71cd-4c26-bafb-73af13a764a5 0 10

You should see a similar output:

Shell
1) "{\"type\": \"ai\", \"data\": {\"content\": \" Yes, your name is Abhishek.\", \"additional_kwargs\": {}, \"type\": \"ai\", \"example\": false}"
2) "{\"type\": \"human\", \"data\": {\"content\": \"Thanks! But do you still remember my name?\", \"additional_kwargs\": {}, \"type\": \"human\", \"example\": false}"
3) "{\"type\": \"ai\", \"data\": {\"content\": \" Cloud computing enables convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort.\", \"additional_kwargs\": {}, \"type\": \"ai\", \"example\": false}"
4) "{\"type\": \"human\", \"data\": {\"content\": \"Tell me about Cloud computing in one line\", \"additional_kwargs\": {}, \"type\": \"human\", \"example\": false}"
5) "{\"type\": \"ai\", \"data\": {\"content\": \" Nice to meet you, Abhishek!\", \"additional_kwargs\": {}, \"type\": \"ai\", \"example\": false}"
6) "{\"type\": \"human\", \"data\": {\"content\": \"Nice, my name is Abhishek\", \"additional_kwargs\": {}, \"type\": \"human\", \"example\": false}"
7) "{\"type\": \"ai\", \"data\": {\"content\": \" My name is Claude.\", \"additional_kwargs\": {}, \"type\": \"ai\", \"example\": false}"
8) "{\"type\": \"human\", \"data\": {\"content\": \"Hi what's your name?\", \"additional_kwargs\": {}, \"type\": \"human\", \"example\": false}"

Basically, the Redis memory component for LangChain persists the messages as a List and passes its contents as additional context with every message. Conclusion To be completely honest, I am not a Python developer (I mostly use Go, Java, or sometimes Rust), but I found Streamlit relatively easy to start with, except for some of the session-related nuances. I figured out that for each user interaction, the entire Streamlit app is re-executed (this was a little unexpected coming from a backend dev background). That’s when I moved the chat ID (a kind of unique session ID for each conversation) to the Streamlit session state, and things worked. This is also used as part of the name of the Redis List that stores the conversation (message_store:<session_id>) — each Redis List is mapped to a Streamlit session. I also found the Streamlit component-based approach to be quite intuitive and pretty extensive. I was wondering if there are similar solutions in Go. If you know of something, do let me know. Happy building!
PostgresML is an extension of the PostgreSQL ecosystem that allows the training, fine-tuning, and use of various machine learning and large language models within the database. This extension turns PostgreSQL into a complete MLOps platform, supporting various natural language processing tasks and expanding Postgres's capabilities as a vector database. The extension complements pgvector, another foundational extension for apps wishing to use Postgres as a vector database for AI use cases. With pgvector, applications can easily store and work with embeddings generated by large language models (LLMs). PostgresML takes it further by enabling the training and execution of models within the database. Let's look at the PostgresML extension in action by using PostgreSQL for language translation tasks and user sentiment analysis. Enable PostgresML The easiest way to start with PostgresML is by deploying a database instance with the pre-installed extension in Docker. Use the following command to launch PostgreSQL with PostgresML in a container and open a database session with the psql tool: Shell docker run \ -it \ -v postgresml_data:/var/lib/postgresql \ -p 5432:5432 \ -p 8000:8000 \ ghcr.io/postgresml/postgresml:2.7.12 \ sudo -u postgresml psql -d postgresml Once the container has started and the psql session is open, check that the pgml extension (short for PostgresML) is on the extensions list: SQL select * from pg_extension; oid | extname | extowner | extnamespace | extrelocatable | extversion | extconfig | extcondition -------+---------+----------+--------------+----------------+------------+-----------+-------------- 13540 | plpgsql | 10 | 11 | f | 1.0 | | 16388 | pgml | 16385 | 16387 | f | 2.7.12 | | (2 rows) Finally, if you run the \d command, you'll see a list of database objects used internally by PostgresML. SQL \d List of relations Schema | Name | Type | Owner --------+-----------------------+----------+------------ pgml | deployed_models | view | postgresml pgml | deployments | table | postgresml pgml | deployments_id_seq | sequence | postgresml pgml | files | table | postgresml pgml | files_id_seq | sequence | postgresml pgml | models | table | postgresml pgml | models_id_seq | sequence | postgresml ...truncated Text Translation With PostgresML PostgresML integrates with Hugging Face Transformers to enable the latest natural language processing (NLP) models in PostgreSQL. Hugging Face features thousands of pre-trained models that can be used for tasks like sentiment analysis, text classification, summarization, translation, question answering, and more. For instance, suppose you store a product catalog in PostgreSQL, with all the product descriptions in English. Now, you need to display these descriptions in French for customers visiting your e-commerce website from France. What if someone gets interested in Apple's AirTag? PostgresML can facilitate the translation from English to French using one of the translation transformers: SQL SELECT pgml.transform( 'translation_en_to_fr', inputs => ARRAY[ 'AirTag is a supereasy way to keep track of your stuff. Attach one to your keys, slip another in your backpack. And just like that, they’re on your radar in the Find My app, where you can also track down your Apple devices and keep up with friends and family.' 
] ) AS french; -[ RECORD 1 ]-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- french | [{"translation_text": "AirTag est un moyen super facile de suivre vos objets. Attachez-leur à vos clés, glissez-leur dans votre sac à dos. Et comme ça, ils sont sur votre radar dans l’app Find My, où vous pouvez aussi retrouver vos appareils Apple et suivre vos amis et votre famille."}] translation_en_to_fr - the name of a pre-configured transformer utilizing one of the models from Hugging Face. inputs - an array of text that needs translation. If the e-commerce website also caters to Spanish-speaking countries, then product descriptions can be translated into Spanish using a different model: SQL select pgml.transform( task => '{"task": "translation", "model": "Helsinki-NLP/opus-mt-en-es" }'::JSONB, inputs => ARRAY[ 'AirTag is a supereasy way to keep track of your stuff. Attach one to your keys, slip another in your backpack. And just like that, they’re on your radar in the Find My app, where you can also track down your Apple devices and keep up with friends and family.' ] ) as spanish; -[ RECORD 1 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- spanish | [{"translation_text": "AirTag es una manera superfácil de hacer un seguimiento de tus cosas. Conecta una a tus llaves, desliza otra en tu mochila. Y así mismo, están en tu radar en la aplicación Find My, donde también puedes rastrear tus dispositivos Apple y mantenerte al día con tus amigos y familiares."}] task - a custom task for translation using one of Helsinki-NLP's models. You can choose from thousands of models available on the Hugging Face hub. Overall, PostgresML can improve user experience by returning text that has already been translated back to the application layer. Sentiment Analysis With PostgresML What about engaging in more sophisticated ML and AI-related tasks with PostgresML? One such task is the sentiment analysis of data being inserted or stored in the database. Imagine that customers of the e-commerce website can share their feedback on the products. PostgresML can assist in monitoring customer sentiment about specific products and proactively responding to various concerns and complaints. For example, a customer purchased a headset and shared feedback that PostgresML classified as negative: SQL SELECT pgml.transform( task => 'text-classification', inputs => ARRAY[ 'I regret buying this headset. It does not connect to my laptop over Bluetooth.' ] ) AS positivity; -[ RECORD 1 ]---------------------------------------------------- positivity | [{"label": "NEGATIVE", "score": 0.9996261596679688}] task - a pre-configured transformation for text classification tasks. inputs - the text for sentiment analysis. A company representative reached out to the customer promptly and helped to solve the problem. As a result, the customer shared follow-up feedback that was classified as positive. SQL SELECT pgml.transform( task => 'text-classification', inputs => ARRAY[ 'I regret buying this headset. 
It does not connect to my laptop over Bluetooth.', 'The problem is solved. Jane reached out to me and helped with the setup. Love the product!' ] ) AS positivity; -[ RECORD 1 ]------------------------------------------------------------------------------------------------------- positivity | [{"label": "NEGATIVE", "score": 0.9996261596679688}, {"label": "POSITIVE", "score": 0.999795138835907}] Just like with the translation tasks, you can utilize thousands of other models from Hugging Face for sentiment analysis and other text classification tasks. For instance, here's how you can switch to the RoBERTa model, which was trained on approximately 40,000 English posts on X (Twitter): SQL SELECT pgml.transform( task => '{"task": "text-classification", "model": "finiteautomata/bertweet-base-sentiment-analysis" }'::jsonb, inputs => ARRAY[ 'I regret buying this headset. It does not connect to my laptop over Bluetooth.', 'The problem is solved. Jane reached out to me and helped with the setup. Love the product!' ] ) AS positivity; -[ RECORD 1 ]---------------------------------------------------------------------------------------------- positivity | [{"label": "NEG", "score": 0.9769334197044371}, {"label": "POS", "score": 0.9884902238845824}] The RoBERTa model has also accurately classified the sentiment of the comments, allowing the e-commerce company to address user concerns and complaints promptly as soon as negative feedback gets into PostgreSQL. Summary As a vector database, Postgres isn't limited to storing and querying embeddings. With the PostgresML extension, Postgres can be transformed into a computational platform for various AI and ML tasks. Discover more about PostgresML and PostgreSQL as a vector database in the following hands-on practical guides:
As we approach 2024, the cloud computing landscape is on the cusp of significant changes. In this article, I explore my predictions for the future of cloud computing, highlighting the integration of Generative AI Fabric, its application in enterprises, the advent of quantum computing with specialized chips, the merging of Generative AI with edge computing, and the emergence of sustainable, self-optimizing cloud environments. Generative AI Fabric: The Future of Generative AI Cloud Architecture The Generative AI Fabric is set to become a crucial architectural element in cloud computing, functioning as a middleware layer. This fabric will facilitate the operation of Large Language Models (LLMs) and other AI tools, serving as a bridge between the technological capabilities of AI and the strategic business needs of enterprises. The integration of Generative AI Fabric into cloud platforms will signify a shift towards more adaptable, efficient, and intelligent cloud environments, capable of handling sophisticated AI operations with ease. Generative AI’s Integration in Enterprises Generative AI will play a pivotal role in enterprise operations by 2024. Cloud providers will enable easier integration of these AI models, particularly in coding and proprietary data management. This trend includes the deployment of AI code copilots that directly enhance enterprise code bases, improving development efficiency and accuracy. Apart from enhancing enterprise code bases, another significant trend in the integration of Generative AI in enterprises is the incorporation of proprietary data with Generative AI services. Enterprises are increasingly leveraging their unique datasets in combination with advanced AI services, including those at the edge, to unlock new insights and capabilities. This integration allows for more tailored AI solutions that are finely tuned to the specific needs and challenges of each business. It enables enterprises to gain a competitive edge by leveraging their proprietary data in more innovative and efficient ways. The integration of Generative AI in enterprises will also be mindful of data security and privacy, ensuring a responsible yet revolutionary approach to software development, data management, and analytics. Quantum Computing in the Cloud Quantum computing will emerge as a game-changing addition to cloud computing in 2024. The integration of specialized quantum chips within cloud platforms will provide unparalleled computational power. These chips will enable businesses to perform complex simulations and solve problems across various sectors, such as pharmaceuticals and environmental science. Quantum computing in cloud services will redefine the boundaries of computational capabilities, offering innovative solutions to challenging problems. An exciting development in this area is the potential introduction of Generative AI copilots for quantum computing. These AI copilots could play a crucial role in both educational and practical applications of quantum computing. For educational purposes, they could demystify quantum computing concepts, making them more accessible to students and professionals looking to venture into this field. The AI copilots could break down complex quantum theories into simpler, more digestible content, enhancing learning experiences. In practical applications, Generative AI copilots could assist in the implementation of quantum computing solutions. 
They could provide guidance on best practices, help optimize quantum algorithms, and even suggest innovative approaches to leveraging quantum computing in various industries. This assistance would be invaluable for organizations that are new to quantum computing, helping them integrate this technology into their operations more effectively and efficiently. Generative AI and Edge Computing The integration of Generative AI with edge computing is expected to make significant strides in 2024. This synergy is set to enhance the capabilities of edge computing, especially in the areas of real-time data processing and AI-driven decision-making. By bringing Generative AI capabilities closer to the data source, edge computing will enable faster and more efficient processing, which is crucial for a variety of applications. One of the key benefits of this integration is improved data privacy. By processing data locally on edge devices, rather than transmitting it to centralized cloud servers, the risk of data breaches and unauthorized access is greatly reduced. This localized processing is particularly important for sensitive data in sectors like healthcare, finance, and personal data services. In addition to IoT and real-time analytics, other use cases include smart city management, personalized healthcare monitoring, and enhanced retail experiences. I have covered the future of retail with Generative AI in my earlier blog. Sustainable and Self-Optimizing Cloud Environments Sustainable cloud computing will become a pronounced trend in 2024. Self-optimizing cloud environments focusing on energy efficiency and reduced environmental impact will rise. These systems, leveraging AI and automation, will dynamically manage resources, leading to more eco-friendly and cost-effective cloud solutions. This trend towards sustainable cloud computing reflects a global shift towards environmental responsibility. Conclusion As 2024 approaches, the cloud computing landscape is set to undergo a series of transformative changes. The development of Generative AI Fabric as a middleware layer, its integration into enterprise environments, the emergence of quantum computing with specialized chips, the fusion of Generative AI with edge computing, and the rise of sustainable, self-optimizing cloud infrastructures are trends that I foresee shaping the future of cloud computing. These advancements promise to bring new efficiencies, capabilities, and opportunities, underscoring the importance of staying informed and adaptable in this evolving domain.
If you’re not yet familiar with the open-source pgvector extension for PostgreSQL, now’s the time to do so. The tool is extremely helpful for searching text data fast without needing a specialized database to store embeddings. Embeddings represent word similarity and are stored as vectors (a list of numbers). For example, the words “tree” and “bush” are related more closely than “tree” and “automobile.” The open-source pgvector tool makes it possible to search for closely related vectors and find text with the same semantic meaning. This is a major advance for text-based data, and an especially valuable tool for building Large Language Model (LLM) applications... and who isn’t right now? By turning PostgreSQL into a high-performance vector store with distance-based embedding search capabilities, pgvector allows users to explore vast textual data easily. It also enables exact nearest neighbor search and approximate nearest neighbor search using L2 (or Euclidean) distance, inner product, and cosine distance. Cosine distance is recommended by OpenAI for capturing semantic similarities efficiently. Using Embeddings in Retrieval Augmented Generation (RAG) and LLMs Embeddings can play a valuable role in the Retrieval Augmented Generation (RAG) process, which is used to supply LLMs with new knowledge without retraining them. The process includes retrieving relevant information from an external source, transforming it into an LLM-digestible format, and then feeding it to the LLM to generate text output. Let’s work through an example. Searching documentation for answers to technical problems is something I’d bet anyone here has wasted countless hours on. For the example below, using documentation as the source, you can generate embeddings to store in PostgreSQL. When a user queries that documentation, the embeddings make it possible to represent the words in a query as vector numbers, perform a similarity search, and retrieve relevant pieces of the documentation from the database. The user’s query and retrieved documentation are both passed to the LLM, which accurately delivers relevant documentation and sources that answer the query. We tested out pgvector and embeddings using our own documentation at Instaclustr. Here are some example user search phrases to demonstrate how embeddings will plot them relative to one another: “Configure hard drive failure setting in Apache Cassandra” “Change storage settings in Redis” “Enterprise pricing for a 2-year commitment” “Raise a support ticket” “Connect to PostgreSQL using WebSockets” Embeddings plot the first two phrases nearest each other, even though they include none of the same words. The LLM Context Window Each LLM has a context window: the number of tokens it can process at once. This can be a challenge, in that models with a limited context window can falter with large inputs, while models trained with large context windows (100,000 tokens, or enough to use a full book in a prompt) suffer from latency and must store that full context in memory. The goal is to use the smallest possible context window that generates useful answers. Embeddings help by making it possible to provide the LLM with only data recognized as relevant so that even an LLM with a tight context window isn’t overwhelmed. Feeding the Embedding Model With LangChain The model that generates embeddings — OpenAI’s text-embedding-ada-002 — has a context window of its own. That makes it essential to break documentation into chunks that this embedding model can digest more easily. 
The LangChain Python framework offers a solution. An LLM able to answer documentation queries needs these tasks completed first: Document loading: LangChain makes it simple to scrape documentation pages, with the ability to load diverse document formats from a range of locations. Document transformation: Segmenting large documents into smaller digestible chunks enables retrieval of pertinent document sections. Embedding generation: Calculate embeddings for the chunked documentation using OpenAI’s embedding model. Data storing: Store embeddings and original content in PostgreSQL. This process yields the semantic index of documentation we’re after. An Example User Query Workflow Now consider this sample workflow for a user query (sticking with our documentation as the example tested). First, a user submits the question: “How do I create a Redis cluster using Terraform?” OpenAI’s embeddings API calculates the question’s embeddings. The system then queries the semantic index in PostgreSQL using cosine similarity, asking for the original content closest to the embeddings of the user’s question. Finally, the system grabs the original content returned in the vector search, concatenates it together, and includes it in a specially crafted prompt with the user’s original question. Implementing pgvector and a User Interface Now let’s see how we put pgvector into action. First, we enabled the pgvector extension in our PostgreSQL database and created a table for storing all documents and their embeddings (text-embedding-ada-002 produces 1536-dimensional vectors, so the embedding column is declared accordingly):

SQL
CREATE EXTENSION vector;

CREATE TABLE insta_documentation (id bigserial PRIMARY KEY, title text, content text, url text, embedding vector(1536));

The following Python code scrapes the documentation, uses Beautiful Soup to extract the main text parts such as title and content, and stores them and the URL in the PostgreSQL table:

Python
# Imports assumed for this snippet
import psycopg2
import streamlit as st
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

urls = [...]

def init_connection():
    return psycopg2.connect(**st.secrets["postgres"])

def extract_info(url):
    hdr = {'User-Agent': 'Mozilla/5.0'}
    req = Request(url, headers=hdr)
    response = urlopen(req)
    soup = BeautifulSoup(response, 'html.parser')
    title = soup.find('title').text
    # middle section consists of header, content and instaclustr banner and back and forth links - we want only the first two
    middle_section = soup.find('div', class_='documentation-middle').contents
    content = str(middle_section[0]) + str(middle_section[1])
    return title, content, url

conn = init_connection()
cursor = conn.cursor()

for url in urls:
    page_content = extract_info(url)
    postgres_insert_query = """ INSERT INTO insta_documentation (title, content, url) VALUES (%s, %s, %s)"""
    cursor.execute(postgres_insert_query, page_content)
    conn.commit()

if conn:
    cursor.close()
    conn.close()

Next, we loaded the documentation pages from the database, divided them into chunks, and created and stored the crucial embeddings. 
Python
# Imports assumed for this snippet (LangChain 0.0.x-era module paths; adjust for newer releases)
import psycopg2
import pandas as pd
import streamlit as st
from langchain.text_splitter import RecursiveCharacterTextSplitter, Language
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores.pgvector import PGVector, DistanceStrategy

def init_connection():
    return psycopg2.connect(**st.secrets["postgres"])

conn = init_connection()
cursor = conn.cursor()

# Define and execute query to the insta_documentation table, limiting to 10 results for testing
# (creating embeddings through the OpenAI API can get costly when dealing with a huge amount of data)
postgres_query = """ SELECT title, content, url FROM insta_documentation LIMIT 10"""
cursor.execute(postgres_query)
results = cursor.fetchall()
conn.commit()

# Load results into pandas DataFrame for easier manipulation
df = pd.DataFrame(results, columns=['title', 'content', 'url'])

# Break down content text which exceeds the max input token limit into smaller chunk documents
# Define text splitter
html_splitter = RecursiveCharacterTextSplitter.from_language(language=Language.HTML, chunk_size=1000, chunk_overlap=100)

# We need to initialize our embeddings model
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")

docs = []
for i in range(len(df.index)):
    # Create document with metadata for each content chunk
    docs = docs + html_splitter.create_documents([df['content'][i]], metadatas=[{"title": df['title'][i], "url": df['url'][i]}])

# Create pgvector dataset
db = PGVector.from_documents(
    embedding=embeddings,
    documents=docs,
    collection_name=COLLECTION_NAME,
    connection_string=CONNECTION_STRING,
    distance_strategy=DistanceStrategy.COSINE,
)

Lastly, the retriever found the correct information to answer a given query. In our test example, we searched our documentation to learn how to sign up for an account:

Python
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA

query = st.text_input('Your question', placeholder='How can I sign up for an Instaclustr console account?')

retriever = db.as_retriever(search_kwargs={"k": 3})

qa = RetrievalQA.from_chain_type(
    llm=OpenAI(),
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
    verbose=True,
)

result = qa({"query": query})
source_documents = result["source_documents"]
document_page_content = [document.page_content for document in source_documents]
document_metadata = [document.metadata for document in source_documents]

Using Streamlit, a powerful tool for building interactive Python interfaces, we built an interface to test the system and view the successful query results. Data Retrieval With Transformative Efficiency Harnessing PostgreSQL and the open-source pgvector project empowers users to leverage natural language queries to answer questions immediately, with no need to comb through irrelevant data. The result: super accurate, performant, and efficient LLMs, groundbreaking textual capabilities, and meaningful time saved!
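As a closing technical note: underneath the LangChain helpers, the similarity search is just a pgvector distance query. If you keep embeddings in your own table (such as the insta_documentation table created earlier, with its embedding column populated), the equivalent cosine-distance lookup looks roughly like the sketch below. The connection string is a placeholder, and the snippet assumes the openai>=1.0 Python client with OPENAI_API_KEY set.

Python
# Illustrative: raw cosine-distance search with pgvector's <=> operator.
import psycopg2
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> list[float]:
    response = client.embeddings.create(model="text-embedding-ada-002", input=text)
    return response.data[0].embedding

query_vector = embed("How can I sign up for an Instaclustr console account?")
vector_literal = "[" + ",".join(str(x) for x in query_vector) + "]"

conn = psycopg2.connect("dbname=docs user=postgres")  # placeholder connection details
with conn, conn.cursor() as cur:
    # <=> is pgvector's cosine distance operator; smaller means more similar.
    cur.execute(
        "SELECT title, url FROM insta_documentation ORDER BY embedding <=> %s::vector LIMIT 3",
        (vector_literal,),
    )
    for title, url in cur.fetchall():
        print(title, url)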
Tuhin Chattopadhyay
CEO at Tuhin AI Advisory and Professor of Practice,
JAGSoM
Thomas Jardinet
IT Architect,
Rhapsodies Conseil
Sibanjan Das
Zone Leader,
DZone
Tim Spann
Principal Developer Advocate,
Cloudera