Artificial Intelligence (AI) and Machine Learning (ML) have revolutionized the way we approach problem-solving and data analysis. These technologies are powering a wide range of applications, from recommendation systems and autonomous vehicles to healthcare diagnostics and fraud detection. However, deploying and managing ML models in production environments can be a daunting task. This is where containerization comes into play, offering an efficient solution for packaging and deploying ML models. In this article, we'll explore the challenges of deploying ML models, the fundamentals of containerization, and the benefits of using containers for AI and ML applications. The Challenges of Deploying ML Models Deploying ML models in real-world scenarios presents several challenges. Traditionally, this process has been cumbersome and error-prone due to various factors: Dependency hell: ML models often rely on specific libraries, frameworks, and software versions. Managing these dependencies across different environments can lead to compatibility issues and version conflicts. Scalability: As the demand for AI/ML services grows, scalability becomes a concern. Ensuring that models can handle increased workloads and auto-scaling as needed can be complex. Version control: Tracking and managing different versions of ML models is crucial for reproducibility and debugging. Without proper version control, it's challenging to roll back to a previous version or track the performance of different model iterations. Portability: ML models developed on one developer's machine may not run seamlessly on another's. Ensuring that models can be easily moved between development, testing, and production environments is essential. Containerization Fundamentals Containerization addresses these challenges by encapsulating an application and its dependencies into a single package, known as a container. Containers are lightweight and isolated, making them an ideal solution for deploying AI and ML models consistently across different environments. Key containerization concepts include: Docker: Docker is one of the most popular containerization platforms. It allows you to create, package, and distribute applications as containers. Docker containers can run on any system that supports Docker, ensuring consistency across development, testing, and production. Kubernetes: Kubernetes is an open-source container orchestration platform that simplifies the management and scaling of containers. It automates tasks like load balancing, rolling updates, and self-healing, making it an excellent choice for deploying containerized AI/ML workloads. Benefits of Containerizing ML Models Containerizing ML models offer several benefits: Isolation: Containers isolate applications and their dependencies from the underlying infrastructure. This isolation ensures that ML models run consistently, regardless of the host system. Consistency: Containers package everything needed to run an application, including libraries, dependencies, and configurations. This eliminates the "it works on my machine" problem, making deployments more reliable. Portability: Containers can be easily moved between different environments, such as development, testing, and production. This portability streamlines the deployment process and reduces deployment-related issues. Scalability: Container orchestration tools like Kubernetes enable auto-scaling of ML model deployments, ensuring that applications can handle increased workloads without manual intervention. 
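As an illustration of how little is needed to package a model this way, below is a minimal sketch of a Dockerfile for a hypothetical Python inference service. The file names, the FastAPI/uvicorn serving stack, and the pickled scikit-learn model are assumptions for the example, not a prescribed setup:
Dockerfile
# Hypothetical example: package a pickled model behind a small HTTP API
FROM python:3.11-slim
WORKDIR /app
# requirements.txt is assumed to pin fastapi, uvicorn, scikit-learn, etc.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# app.py is assumed to load model.pkl and expose a /predict endpoint
COPY app.py model.pkl ./
EXPOSE 8000
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
The same image can then run unchanged on a laptop, a test cluster, or a production Kubernetes environment, which is exactly the consistency and portability benefit described above.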
Best Practices for Containerizing AI/ML Models To make the most of containerization for AI and ML, consider these best practices: Version control: Use version control systems like Git to track changes to your ML model code. Include version information in your container images for easy reference. Dependency management: Clearly define and manage dependencies in your ML model's container image. Utilize virtual environments or container images with pre-installed libraries to ensure reproducibility. Monitoring and logging: Implement robust monitoring and logging solutions to gain insights into your containerized AI/ML applications' performance and behavior. Security: Follow security best practices when building and deploying containers. Keep container images up to date with security patches and restrict access to sensitive data and APIs. Case Studies Several organizations have successfully adopted containerization for AI/ML deployment. One notable example is Intuitive, which leverages containers and Kubernetes to manage its machine-learning infrastructure efficiently. By containerizing ML models, Intuitive can seamlessly scale its Annotations engine to millions of users while maintaining high availability. Another example is Netflix, which reported a significant reduction in deployment times and resource overheads after adopting containers for their recommendation engines. Conclusion While containerization offers numerous advantages, challenges such as optimizing resource utilization and minimizing container sprawl persist. Additionally, the integration of AI/ML with serverless computing and edge computing is an emerging trend worth exploring. In conclusion, containerization is a powerful tool for efficiently packaging and deploying ML models. It addresses the challenges associated with dependency management, scalability, version control, and portability. As AI and ML continue to shape the future of technology, containerization will play a pivotal role in ensuring reliable and consistent deployments of AI-powered applications. By embracing containerization, organizations can streamline their AI/ML workflows, reduce deployment complexities, and unlock the full potential of these transformative technologies in today's rapidly evolving digital landscape.
In the rapidly changing world of technology, DevOps is the vehicle that propels software development forward, making it agile, cost-effective, fast, and productive. This article focuses on key DevOps tools and practices, delving into the transformative power of technologies such as Docker and Kubernetes. By investigating them, I hope to shed light on what it takes to streamline processes from conception to deployment and ensure high product quality in a competitive technological race. Understanding DevOps DevOps is a software development methodology that bridges the development (Dev) and operations (Ops) teams in order to increase productivity and shorten development cycles. It is founded on principles such as continuous integration, process automation, and improved team collaboration. Adopting DevOps breaks down silos and accelerates workflows, allowing for faster iterations and faster deployment of new features and fixes. This reduces time to market, increases efficiency in software development and deployment, and improves final product quality. The Role of Automation in DevOps In DevOps, automation is the foundation of software development and delivery process optimization. It involves using tools and technologies to automatically handle a wide range of routine tasks, such as code integration, testing, deployment, and infrastructure management. Through automation, development teams gain the ability to reduce human error, standardize processes, enable faster feedback and correction, improve scalability and efficiency, and bolster testing and quality assurance, eventually enhancing consistency and reliability. Several companies have successfully leveraged automation: Walmart: The retail corporation has embraced automation in order to gain ground on its retail rival, Amazon. WalmartLabs, the company's innovation arm, has implemented OneOps cloud-based technology, which automates and accelerates application deployment. As a result, the company was able to quickly adapt to changing market demands and continuously optimize its operations and customer service. Etsy: The e-commerce platform fully automated its testing and deployment processes, resulting in fewer disruptions and an enhanced user experience. Its pipeline stipulates that Etsy developers first run 4,500 unit tests, spending less than a minute on them, before checking the code in and running around 7,000 automated tests. The whole process takes no more than 11 minutes to complete. These cases demonstrate how automation in DevOps not only accelerates development but also ensures stable and efficient product delivery. Leveraging Docker for Containerization Containerization, or packaging an application's code with all of the files and libraries needed to run quickly and easily on any infrastructure, is one of today's most important software development practices. The leading platform that offers a comprehensive set of tools and services for containerization is Docker. It has several advantages for containerization in the DevOps pipeline: Isolation: Docker containers encapsulate an application and its dependencies, ensuring consistent operation across different computing environments. Efficiency: Containers are lightweight, reducing overhead and improving resource utilization when compared to traditional virtual machines. Portability: Docker containers allow applications to be easily moved between systems and cloud environments. Many prominent corporations leverage Docker tools and services to optimize their development cycles. 
Here are some examples: PayPal: The renowned online payment system embraced Docker for app development, migrating 700+ applications to Docker Enterprise and running over 200,000 containers. As a result, the company's productivity in developing, testing, and deploying applications increased by 50%. Visa: The global digital payment technology company used Docker to accelerate application development and testing by standardizing environments and streamlining operations. The Docker-based platform assisted in the processing of 100,000 transactions per day across multiple global regions six months after its implementation. Orchestrating Containers With Kubernetes Managing complex containerized applications is a difficult task that necessitates the use of a specialized tool. Kubernetes (aka K8S), an open-source container orchestration system, is one of the most popular. It organises the containers that comprise an application into logical units to facilitate management and discovery. It then automates application container distribution and scheduling across a cluster of machines, ensuring resource efficiency and high availability. Kubernetes enables easy and dynamic adjustment of application workloads, accommodating changes in demand without requiring manual intervention. This orchestration system streamlines complex tasks, allowing for more consistent and manageable deployments while optimizing resource utilization. Setting up a Kubernetes cluster entails installing Kubernetes on a set of machines, configuring networking for pods (containers), and deploying applications using Kubernetes manifests or helm charts. This procedure creates a stable environment in which applications can be easily scaled, updated, and maintained. Automating Development Workflows Continuous Integration (CI) and Continuous Deployment (CD) are critical components of DevOps software development. CI is the practice of automating the integration of code changes from multiple contributors into a single software project. It is typically implemented in such a way that it triggers an automated build with testing, with the goals of quickly detecting and fixing bugs, improving software quality, and reducing release time. After the build stage, CD extends CI by automatically deploying all code changes to a testing and/or production environment. This means that, in addition to automated testing, the release process is also automated, allowing for a more efficient and streamlined path to delivering new features and updates to users. Docker and Kubernetes are frequently used to improve efficiency and consistency in CI/CD workflows. The code is first built into a Docker container, which is then pushed to a registry in the CI stage. During the CD stage, Kubernetes retrieves the Docker container from the registry and deploys it to the appropriate environment, whether testing, staging, or production. This procedure automates deployment and ensures that the application runs consistently across all environments. Many businesses use DevOps tools to automate development cycles. Among them are: Siemens: The German multinational technology conglomerate uses GitLab's integration with Kubernetes to set up new machines in minutes. This improves software development and deployment efficiency, resulting in faster time-to-market for their products and cost savings for the company. 
Shopify: The Canadian e-commerce giant chose Buildkite to power its continuous integration (CI) systems due to its flexibility and ability to be used in the company's own infrastructure. Buildkite allows lightweight Buildkite agents to run in a variety of environments and is compatible with all major operating systems. Ensuring Security in DevOps Automation Lack of security in DevOps can lead to serious consequences such as data breaches, where vulnerabilities in software expose sensitive information to hackers. This can not only result in operational disruptions like system outages significantly increasing post-deployment costs but also lead to legal repercussions linked to compliance violations. Integrating security measures into the development process is thus crucial to avoid these risks. The best practices for ensuring security involve: In the case of Docker containers, using official images, scanning for vulnerabilities, implementing least privilege principles, and regularly updating containers are crucial for enhancing security. For Kubernetes clusters, it is essential to configure role-based access controls, enable network policies, and use namespace strategies to isolate resources. Here are some examples of companies handling security issues: Capital One: The American bank holding company uses DevSecOps to automate security in its CI/CD pipelines, ensuring that security checks are integrated into every stage of software development and deployment. Adobe: The American multinational computer software company has integrated security into its DevOps culture. Adobe ensures that its software products meet stringent security standards by using automated tools for security testing and compliance monitoring. Overcoming Challenges and Pitfalls Implementing DevOps and automation frequently encounters common stumbling blocks, such as resistance to change, a lack of expertise, and integration issues with existing systems. To overcome these, clear communication, training, and demonstrating the value of DevOps to all stakeholders are required. Here are some examples of how businesses overcame obstacles on their way to implementing DevOps methodology: HP: As a large established corporation, HP encountered a number of challenges in transitioning to DevOps, including organizational resistance to new development culture and tools. It relied on a "trust-based culture and a strong set of tools and processes" while taking a gradual transition approach. It started with small projects and scaled up, eventually demonstrating success in overcoming skepticism. Target: While integrating new DevOps practices, the US's seventh-largest retailer had to deal with organizational silos and technology debt accumulated over 50 years in business. It introduced a set of integration APIs that broke down departmental silos while fostering a learning and experimentation culture. They gradually improved their processes over time, resulting in successful DevOps implementation. The Future of DevOps and Automation With AI and ML taking the world by storm, these new technologies are rapidly reshaping DevOps practices. In particular, they enable the adoption of more efficient decision-making and predictive analytics, significantly optimizing the development pipeline. They also automate tasks such as code reviews, testing, and anomaly detection, which increases the speed and reliability of continuous integration and deployment processes. 
To prepare for the next evolution in DevOps, it's crucial to embrace trending technologies such as AI and machine learning and integrate them into your processes for enhanced automation and efficiency. This involves investing in training and upskilling teams to adapt to these new tools and methodologies. Adopting flexible architectures like microservices and leveraging data analytics for predictive insights will be key. Conclusion In this article, we have delved into the evolution of the approaches toward software development, with the DevOps methodology taking center stage in this process. DevOps is created for streamlining and optimizing development cycles through automation, containerization, and orchestration. To reach its objectives, DevOps uses powerful technologies like Docker and Kubernetes, which not only reshape traditional workflows but also ensure enhanced security and compliance. As we look towards the future, the integration of AI and ML within this realm promises further advancements, ensuring that DevOps continues to evolve, adapting to the ever-changing landscape of software development and deployment. Additional Resources Read on to learn more about this topic: The official Docker documentation; The official Kubernetes documentation; "DevOps with Kubernetes"; "DevOps: Puppet, Docker, and Kubernetes"; "Introduction to DevOps with Kubernetes"; "Docker in Action".
The most popular use case in current IT architecture is moving from Serverful to Serverless design. There are still cases, however, where we might need to design a service in a Serverful manner, or move to a Serverful design to manage operational cost. In this article, we will show how to run a Kumologica flow as a Docker container. Usually, the applications built on Kumologica are focused on serverless computing like AWS Lambda, Azure Functions, or Google Cloud Functions, but here we will be building the service very similar to a NodeJS Express app running inside a container. The Plan We will build a simple hello world API service using low-code integration tooling and wrap it as a Docker image. We will then run the Docker container using the image on our local machine, and finally test the API using an external client. Prerequisites To start the development, we need to have the following utilities and access ready: NodeJS installed, Kumologica Designer, and Docker installed. Implementation Building the Service First, let's start the development of the Hello World service by opening the designer. To open the designer, use the following command: kl open. Once the designer is opened, drag and drop an EventListener node onto the canvas. Open its configuration and provide the details below.
Plain Text
Provider : NodeJS
Verb : GET
Path : /hello
Display Name : [GET] /hello
Now drag and drop a Logger node from the palette to the canvas and wire it after the EventListener node.
Plain Text
Display name : Log_Entry
level : INFO
Message : Inside the service
Log Format : String
Drag and drop the EventListenerEnd node to the canvas, wire it to the Logger node, and provide the following configuration.
Plain Text
Display Name : Success
Payload : {"status" : "HelloWorld"}
ContentType : application/json
The flow is now complete. Let's dockerize it. Dockerizing the Flow To dockerize the flow, open the project folder and place the following Dockerfile in the root of the project (at the same level as package.json).
Dockerfile
FROM node:16-alpine
WORKDIR /app
COPY package*.json ./
RUN npm install
ENV PATH /app/node_modules/.bin:$PATH
COPY . .
EXPOSE 1880
CMD ["node","index.js"]
Note: The above Dockerfile is very basic and can be modified according to your needs. Now we need to add another file that allows the Kumologica flow to run as a NodeJS Express app. Create an index.js file with the following JavaScript content. Replace "your-flow.json" with the name of the flow JSON file in your project folder.
JavaScript
const { NodeJsFlowBuilder } = require('@kumologica/runtime');
new NodeJsFlowBuilder('your-flow.json').listen();
Now let's test the flow locally by invoking the endpoint from Postman or any REST client of your choice.
curl http://localhost:1880/hello
You will get the following response:
JSON
{"status" : "HelloWorld"}
As we are done with our local testing, we will now build an image based on our Dockerfile. To build the image, go to the root of the project folder and run the following command from a command line in Windows or a terminal on Mac.
Plain Text
docker build . -t hello-kl-docker-app
Now the Docker image is built. Let's check the image locally by running the following command.
Plain Text
docker images
Let's test the image by running it locally with the following command.
Plain Text
docker run -p 1880:1880 hello-kl-docker-app
Check the container by running the following command:
Plain Text
docker ps -a
You should now see the container name and ID listed. Now we are ready to push the image to any registry of your choice.
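For example, pushing to a registry typically involves tagging the image with the registry's address and then pushing it; the registry host and version tag below are placeholders, so substitute your own:
Shell
# Tag the local image for your registry (placeholder address and version tag)
docker tag hello-kl-docker-app registry.example.com/hello-kl-docker-app:1.0.0
# Authenticate against the registry and push the tagged image
docker login registry.example.com
docker push registry.example.com/hello-kl-docker-app:1.0.0
From there, the image can be pulled and run on any Docker-compatible host or orchestrator.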
Kubernetes, an open-source platform designed for automating deployment, scaling, and operations of application containers across clusters of hosts, has revolutionized how we manage applications in containers. A crucial feature of Kubernetes is its persistent volume (PV) system, which offers a way to manage storage resources. Persistent volumes provide a method for storing data generated and used by applications, ensuring data persists beyond the life of individual pods. This feature is vital for stateful applications, where data integrity and persistence are critical. Kubernetes and AWS: A Synergy in Data Management Kubernetes, when integrated with Amazon Web Services (AWS), offers robust solutions for data management. AWS provides a range of volume types like Elastic Block Store (EBS), Elastic File System (EFS), and more. Among these, EBS volumes are commonly used with Kubernetes and support dynamic resizing, making them ideal for applications that require flexibility in storage management. Step-by-Step Guide on Resizing Persistent Volumes Prerequisites Basic understanding of Kubernetes concepts, such as pods, nodes, and PVs Kubernetes cluster with a storage class that supports volume expansion Access to the Kubernetes command-line tool, kubectl Steps 1. Verify Volume Expansion Support Ensure your storage class supports volume expansion. You can check this by confirming that the storage class definition contains the allowVolumeExpansion: true field. 2. Edit the PersistentVolumeClaim (PVC) PVCs are requests for storage by users. To resize a volume, edit the PVC associated with it. Use kubectl edit pvc <pvc-name> and modify the spec.resources.requests.storage field to the desired size.
YAML
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi # Update this value to the desired size
  storageClassName: gp3 # Ensure this matches your AWS EBS storage class
3. Wait for the Volume to Resize Once the PVC is updated, Kubernetes will automatically initiate the resizing process. This is done without disrupting the associated pod. 4. Verify the Resizing After the resizing process, verify the new size by checking the PVC status using kubectl get pvc <pvc-name>. Common Challenges and Best Practices Downtime Considerations While resizing can be a non-disruptive process, some older storage systems might require pod restarts. Plan for potential downtime in such scenarios. Data Backup Always back up data before attempting a resize to prevent data loss. Monitoring and Alerts Implement monitoring to track PVC sizes and alert when they approach their limits. Automation Use automation tools to manage PVC resizing more efficiently in large-scale environments. An example CronJob YAML snippet is shown below. This CronJob can be customized with scripts to assess and resize volumes as needed.
YAML
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: volume-resizer
spec:
  schedule: "0 0 * * *" # This cron schedule runs daily
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: resizer
              image: volume-resizer-image # Your custom image with resizing logic
              args:
                - /bin/sh
                - -c
                - resize-script.sh # Script to check and resize volumes
          restartPolicy: OnFailure
Real-World Scenarios and Benefits Scaling Databases For a growing application, database storage needs can increase unpredictably. Dynamic resizing allows for seamless scaling without service interruption.
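For example, a growing database volume can be expanded in place with a single command. The sketch below assumes a hypothetical PVC named postgres-data and a storage class that allows expansion:
Shell
# Request a larger size on the existing claim (hypothetical PVC name and target size)
kubectl patch pvc postgres-data -p '{"spec":{"resources":{"requests":{"storage":"20Gi"}}}}'
# Confirm the new capacity once the resize completes
kubectl get pvc postgres-data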
CI/CD Pipelines In CI/CD pipelines, dynamic volume resizing can be particularly beneficial. For instance, during a heavy build process or testing phase, additional storage might be necessary. Post-completion, the storage can be scaled down to optimize costs. Implementing automatic resizing in CI/CD pipelines ensures efficient resource utilization and cost savings, especially in dynamic development environments. Data Analysis and Big Data Resizing is crucial in data analysis scenarios, where data volume can fluctuate significantly. Conclusion Incorporating dynamic resizing of persistent volumes in Kubernetes, especially when integrated with AWS services, enhances flexibility and efficiency in managing storage resources. The addition of automation, particularly through Kubernetes CronJobs, elevates this process, ensuring optimal resource utilization. This capability is especially impactful in scenarios like CI/CD pipelines, where storage needs can fluctuate rapidly. The synergy between Kubernetes and AWS in managing data storage is a powerful tool in any developer's arsenal, combining flexibility, scalability, and automation. This guide aims to demystify the process of persistent volume resizing in Kubernetes, making it accessible to those with basic Kubernetes knowledge while providing insights beneficial for experienced users. As with any technology, continuous learning and adaptation are key to leveraging these features effectively.
MoneySuite is a Fintech SaaS platform offering payments and financial automation solutions. As a regulated financial service provider, our applications are bank-grade Java microservices running in Docker container images. As an integral part of our service operations readiness, we conduct thorough performance analysis and troubleshooting for our services running in the Docker infrastructure for the following purposes: Analyzing and tuning service performance: Docker containers operate within a confined subset of system resources. Applications running inside Docker undergo meticulous performance analysis and tuning under stress to ensure optimal functionality within the restricted environment. Debugging production issues: The Java service Docker images require vigilant monitoring and troubleshooting in a live production environment, presenting distinct challenges compared to managing them on the host machine. This blog comprehensively discusses and shares the methodologies we’ve implemented to achieve performance analysis and effective troubleshooting for Java Docker images running within Docker containers. Container Monitoring Tools Container services are comparatively new to enterprises, and there are only very few tools available for container management. In this blog, we will be using yCrash, a non-intrusive comprehensive tool with 360° data capture and analysis for troubleshooting container environments and the services running in it. yCrash service managing container application Set Up yCrash Setting up yCrash involves three simple steps: Register with yCrash. Install the yCrash Agent. Set up the yCrash Server. Register With yCrash yCrash offers a free tier. You can register on their website. Install yCrash Server The yCrash server is a web application that provides an incident management report and comes in two variants: Cloud Service — A secured managed service with no installation required Enterprise Edition — An on-premises instance that offers greater control and adheres to enterprise compliance requirements Note: For this tutorial, we will be using the cloud service. Install yCrash Agent The yCrash agent captures a 360° view of the service and underlying infrastructure. It can be deployed with just a few simple steps. The yCrash agent can be installed in the container along with the application or outside the container. In this example, we will be using the in-container option. Please refer to the yCrash agent installation guide at the following link here. Sample Application Troubleshooting can be done to any Docker container running Java service. For this blog, we utilized the Spring Boot container application, a comprehensive Spring Boot container API service capable of simulating various performance problems. You can check out and run the Spring Boot Buggy API service from this GitHub repo. yCrash is non-intrusive, incurs almost zero overhead, and exhibits in-depth analysis capabilities. YCrash summary of the incident reported on a Docker image Java App: RCA and Performance Tuning The Java process running in the containers needs performance tuning and continuous monitoring to check if it runs healthily under the container environment. The analysis tools, such as YCrash, will show in detail the health and performance stats of the Java applications. The Java app performance parameters include: Garbage Collection analysis: A wrongly configured GC could result in application performance degradation and increased resource usage. 
It is essential to analyze and tune GC configuration settings carefully. yCrash showing GC stats for tuning Thread analysis: Design and coding errors could lead to severe thread problems, such as deadlocks and blocked threads. An in-depth analysis of thread status is required to analyze and fix performance-related thread issues. Below is the thread dump analysis report from the yCrash tool highlighting the two threads and their stack traces that were causing the deadlock. yCrash highlighting the deadlock issue in detail You can clearly see yCrash reporting Thread-4 and Thread-5 suffering from a deadlock. yCrash also reports the stack traces of Thread-4 and Thread-5. From the stack traces, you can notice that Thread-5 acquired the lock of CoolObject, and it is waiting for the HotObject lock. On the other hand, Thread-4 acquired the lock of HotObject, and it is waiting for the CoolObject lock. Based on these stack traces, we know the exact line of code that is causing the problem. Heap and memory analysis: Memory analysis is a critical part of Java applications to avoid any critical outages related to memory, such as heap space unavailability and stack overflow. Log analysis: Most applications log exceptions to application log files, which helps in analyzing Java errors at runtime. Resource Monitoring The service running inside the container environment requires regular monitoring for health and maintenance purposes. The common things that are monitored include: Process management: Service maintenance involves monitoring the status of processes running on the container machine. Network health: Network monitoring helps to understand slowness or outages resulting from network connection issues. Resource management: The complete set of container resources, such as CPU utilization, I/O, etc., needs to be monitored, as they can affect the performance of the service running in the container environment. Disk management: Keep monitoring the disk space; running out of disk space will result in degraded application performance or even crashes. yCrash showing a 360-degree resource view of the Docker container Continuous Log Monitoring and Notifications To continuously monitor for errors or performance degradations, the logs inside the container environment need to be monitored for any exceptions. The environment should be thoroughly monitored, and any incidents should be reported through notifications for immediate service management actions. yCrash showing log status highlighting exceptions Conclusion We have seen the importance of monitoring and managing services running in a Docker container environment with a real example, using a monitoring tool like yCrash, which displays all the details of the Docker container for performance tuning, root cause analysis, and continuous monitoring.
The official Ubuntu Docker image is the most downloaded image from Docker Hub. With over one billion downloads, Ubuntu has proven itself to be a popular and reliable base image on which to build your own custom Docker images. In this post, I show you how to make the most of the base Ubuntu images while building your own Docker images. An Example Dockerfile This is an example Dockerfile that includes the tweaks discussed in this post. I go through each of the settings to explain what value they add: Dockerfile FROM ubuntu:22.04 RUN echo 'APT::Install-Suggests "0";' >> /etc/apt/apt.conf.d/00-docker RUN echo 'APT::Install-Recommends "0";' >> /etc/apt/apt.conf.d/00-docker RUN DEBIAN_FRONTEND=noninteractive \ apt-get update \ && apt-get install -y python3 \ && rm -rf /var/lib/apt/lists/* RUN useradd -ms /bin/bash apprunner USER apprunner Build the image with the command: Shell docker build . -t myubuntu Now that you've seen how to build a custom image from the Ubuntu base image, let's go through each of the settings to understand why they were added. Selecting a Base Image Docker images are provided for all versions of Ubuntu, including Long Term Support (LTS) releases such as 20.04 and 22.04, and normal releases like 19.04, 19.10, 21.04, and 21.10. LTS releases are supported for 5 years, and the associated Docker images are also maintained by Canonical during this period, as described on the Ubuntu release cycle page: These images are also kept up to date, with the publication of rolled up security updated images on a regular cadence, and you should automate your use of the latest images to ensure consistent security coverage for your users. When creating Docker images hosting production software, it makes sense to base your images on the latest LTS release. This allows DevOps teams to rebuild their custom images on top of the latest LTS base image, which automatically includes all updates but is also unlikely to include the kind of breaking changes that can be introduced between major operating system versions. I used the Ubuntu 22.04 LTS Docker image as the base for this image: Shell FROM ubuntu:22.04 Not Installing Suggested or Recommended Dependencies Some packages have a list of suggested or recommended dependencies that aren't required but are installed by default. These additional dependencies can add to the size of the final Docker image unnecessarily, as Ubuntu notes in their blog post about reducing Docker image sizes. To disable the installation of these optional dependencies for all invocations of apt-get, the configuration file at /etc/apt/apt.conf.d/00-docker is created with the following settings: Shell RUN echo 'APT::Install-Suggests "0";' >> /etc/apt/apt.conf.d/00-docker RUN echo 'APT::Install-Recommends "0";' >> /etc/apt/apt.conf.d/00-docker Installing Additional Packages Most custom images based on Ubuntu require you to install additional packages. For example, to run custom applications written in Python, PHP, Java, Node.js, or DotNET, your custom image must have the packages associated with those languages installed. On a typical workstation or server, packages are installed with a simple command like: Shell apt-get install python3 The process of installing new software in a Docker image is non-interactive, which means you don't have an opportunity to respond to prompts. 
This means you must add the -y argument to automatically answer "yes" to the prompt asking to continue with the package installation: Shell RUN apt-get install -y python3 Preventing Prompt Errors During Package Installation The installation of some packages attempts to open additional prompts to further customize installation options. In a non-interactive environment, such as during the construction of a Docker image, attempts to open these dialogs result in errors like: Shell unable to initialize frontend: Dialog These errors can be ignored as they don't prevent the packages from being installed. But the errors can be prevented by setting the DEBIAN_FRONTEND environment variable to noninteractive: Shell RUN DEBIAN_FRONTEND=noninteractive apt-get install -y python3 The Docker website provides official guidance on the use of the DEBIAN_FRONTEND environment variable. They consider it a cosmetic change and recommend against permanently setting the environment variable. The command above sets the environment variable for the duration of the single apt-get command, meaning any subsequent calls to apt-get will not have the DEBIAN_FRONTEND defined. Cleaning Up Package Lists Before any packages can be installed, you need to update the package list by calling: Shell RUN apt-get update However, the package list is of little value after the required packages have been installed. It's best practice to remove any unnecessary files from a Docker image to ensure the resulting image is as small as it can be. To clean up the package list after the required packages have been installed, the files under /var/lib/apt/lists/ are deleted. Here you update the package list, install the required packages, and clean up the package list as part of a single command, broken up over multiple lines with a backslash at the end of each line: Shell RUN DEBIAN_FRONTEND=noninteractive \ apt-get update \ && apt-get install -y python3 \ && rm -rf /var/lib/apt/lists/* Run as a Non-Root User By default, the root user is run in a Docker container. The root user typically has far more privileges than are required when running a custom application, so creating a new user without root privileges provides better security. The useradd command provides a non-interactive way to create new users. This isn't to be confused with the adduser command, which is a higher-level wrapper over useradd. After all configuration files have been edited and packages have been installed, you create a new user called apprunner: Shell RUN useradd -ms /bin/bash apprunner This user is then set as the default user for any further operations: Shell USER apprunner Conclusion It's possible to use the base Ubuntu Docker images with little customization beyond installing any required additional packages. But with a few tweaks to limit optional packages from being installed, cleaning up package lists after the packages are installed, and creating new users with limited permissions to run custom applications, you can create smaller and more secure images for your custom applications. Learn how to use other popular container images: Using the NGINX Docker image Using the Alpine Docker image Resources Ubuntu Docker image Dockerfile reference Happy deployments!
1. Start With a Minimal Base Image Starting with a minimal base image is essential when creating Docker images. This approach minimizes security concerns while shrinking the image size. For minimal base images, Alpine Linux and scratch (an empty base image) are common options. Avoid heavyweight base images unless essential, and select a base image that matches the needs of your application. There are several benefits to starting with a minimal base image. First, because fewer packages and libraries are included, it decreases the attack surface of your container, making security flaws less likely to occur. Second, it results in smaller images, which are simpler to share and deploy. 2. Use Multi-Stage Builds For good reason, multi-stage builds have grown in popularity. They let you utilize multiple Docker images throughout the build process, which helps to reduce the size of the final image by removing unneeded build artifacts. The idea behind multi-stage builds is to have one stage for building and compiling your software and another for the final runtime image. This separation guarantees that the final image has just the files required for your program to run. Unneeded build tools, libraries, and intermediate files are left behind in the build stage. For example, in a Go application, you can have one stage for building the binary and another for the runtime environment. This approach significantly reduces the image size and ensures that only the compiled binary and necessary runtime components are included. Multi-stage builds are particularly valuable for compiled languages like Go or Java, where intermediate build artifacts can be significant. They allow you to achieve a small image size while retaining the necessary components for running your application. 3. Optimize Layering Docker images are constructed from multiple layers, and optimizing these layers can have a significant impact on image size and build speed. Proper layering can be achieved through several practices: Combine Related Operations One key principle is to combine related operations into a single RUN instruction. Each RUN instruction creates a new layer in the image. By grouping related commands, you reduce the number of layers, which results in a smaller image. For example, instead of having separate RUN instructions for installing packages like this:
Shell
RUN apt-get update
RUN apt-get install -y package1
RUN apt-get install -y package2
You can combine them into a single RUN instruction:
Shell
RUN apt-get update && apt-get install -y package1 package2
This simple change can lead to a significant reduction in the number of layers and, consequently, a smaller image. Order Instructions Carefully The order of instructions in your Dockerfile can also impact image size and build time. Docker caches each layer, and when it encounters a change in an instruction's arguments, it invalidates the cache from that point onward. To maximize caching benefits, place instructions that change infrequently or not at all near the top of your Dockerfile. For instance, package installations, which tend to change less frequently during development, belong near the top, while steps that copy in frequently changing application code should be placed toward the bottom of your Dockerfile. Use .dockerignore The .dockerignore file is often overlooked but can significantly impact the size of your Docker image. Just as .gitignore excludes files from version control, .dockerignore specifies which files and directories should be excluded from being added to the image.
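A minimal sketch of a .dockerignore is shown below; the entries are typical examples only and should be adjusted to your project:
Plain Text
# Version control metadata and local dependencies
.git
node_modules
# Build output, logs, and temporary files
build/
*.log
tmp/
# The Docker files themselves rarely need to be inside the image
Dockerfile
.dockerignore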
By defining what to ignore, you can prevent unnecessary files and directories from being included in the image, further reducing its size. Typical entries in a .dockerignore file might include build artifacts, log files, and temporary files. 4. Minimize the Number of Run Instructions Each RUN instruction in a Dockerfile creates a new image layer. This means that minimizing the number of RUN instructions can make your Docker image smaller and more efficient. While it’s essential to break down your application setup into manageable steps, you should also aim to strike a balance between modularity and layer efficiency. Combining multiple commands into a single RUN instruction, as mentioned in the previous section, is a helpful strategy. However, keep in mind that a Dockerfile should also be readable and maintainable. While reducing the number of RUN instructions is beneficial for image size, it should not come at the cost of code clarity. 5. Leverage Caching Docker’s built-in caching mechanism can significantly speed up image builds by reusing previously cached layers. Understanding how caching works can help you optimize your Dockerfile. The caching mechanism works by comparing the arguments of an instruction with the previous build. If the arguments haven’t changed, Docker reuses the cached layer, saving time and resources. Here are some practices to make the most of Docker’s caching: Place Stable Instructions at the Top Instructions that change infrequently or not at all should be placed near the top of your Dockerfile. These can include base image pulls, package installations, and other setup steps that remain consistent. For example, if you’re using a package manager like apt, you can start by updating the package list and installing common dependencies. These commands are unlikely to change often during development. Define Variables Before the RUN Instruction If you’re using environment variables in your Dockerfile, define them before the RUN instruction that uses them. Docker caches layers based on instruction arguments, and environment variables are part of these arguments. By defining variables early, you ensure that changes in those variables don’t invalidate the entire cache for subsequent RUN instructions. 6. Clean Up Unnecessary Files After each RUN instruction, it’s a good practice to clean up any temporary or unnecessary files that were created during the build process. While these files may be required for specific build steps, they are often unnecessary in the final image. Use the RUN instruction to remove packages, source code, or build artifacts that were required during the build but are no longer needed in the final image. This not only reduces the image size but also minimizes potential security risks by removing unnecessary components. For instance, if you’ve compiled a binary from source code during the build, you can delete the source code and any intermediate build artifacts. Similarly, if you’ve installed development packages, you can remove them once the application is built. Here’s an example: Shell 1. # Install build dependencies 2. RUN apt-get update && apt-get install -y build-essential 3. # Build the application from source 4. RUN make 5. # Remove build dependencies 6. RUN apt-get purge -y build-essential 7. # Clean up package cache to reduce image size 8. RUN apt-get clean 9. # Remove any temporary files or build artifacts 10. RUN rm -rf /tmp/* This approach ensures that your final image only includes the necessary files and is as minimal as possible. 7. 
Set Environment Variables Wisely Environment variables are an essential part of configuring your application, but they should be used wisely in your Docker image. Avoid hardcoding sensitive information, such as passwords or API keys, directly into the image, as it poses security risks. Instead, consider the following practices for setting environment variables: Use Environment Files One common approach is to use environment files (.env) to store sensitive information. These files are not included in the image, making it easier to manage and secure sensitive data. For example, you can have an .env file that defines environment variables: YAML 1. DATABASE_URL=postgres://user:password@database:5432/mydb 2.API_KEY=your_api_key In your Dockerfile, you can then use these environment variables without exposing the actual values: Shell 1. # Copy the environment file into the container 2. COPY .env /app/.env 3. # Use environment variables in your application 4.CMD ["./start.sh"] This approach enhances security and allows you to change configuration without modifying the Docker image. Secrets Management In addition to environment files, you can use secrets management tools and features provided by Docker or container orchestration platforms. Docker Swarm and Kubernetes, for example, offer mechanisms for storing and injecting secrets into containers. These tools securely manage sensitive data and provide a way to pass secrets as environment variables to your containers without exposing them in the Dockerfile or image. 8. Use Labels for Metadata Docker allows you to add metadata to your images using labels. These labels can provide essential information about the image, such as the version, maintainer, or licensing information. Using labels helps with image organization and provides valuable documentation for your images. You can add labels to your Docker image using the LABEL instruction in your Dockerfile. Here’s an example: Shell 1. LABEL version="1.0" 2. LABEL maintainer="Your Name <your.email@example.com>" 3. LABEL description="This image contains the application XYZ." Labels are valuable for various purposes, including: Image Identification Labels help identify and categorize your images. You can use labels to specify the version of the application, its purpose, or any other relevant information. For example, you can add labels indicating whether an image is for development, testing, or production, making it easier to manage images in different environments. Documentation Labels also serve as documentation for your images. When someone else or a different team works with your Docker image, they can quickly find information about the image, its purpose, and contact details for the maintainer. Organization In large projects with multiple Docker images, labels can help organize and group images based on their functionality or role within the project. Labels are a simple yet effective way to enhance the clarity and manageability of your Docker images. 9. Security Considerations Security should be a top priority when building Docker images. Ensuring that your images are free from vulnerabilities and adhere to security best practices is essential for protecting your applications and data. Here are some security considerations to keep in mind: Regularly Update Your Base Image Your base image serves as the foundation for your Docker image. It’s important to keep it up to date to patch known vulnerabilities. 
Popular base images like Alpine Linux and official images from Docker Hub are frequently updated to address security issues. Set up a process to regularly check for updates to your base image and rebuild your Docker images to incorporate the latest security patches. Only Install Necessary Dependencies When creating your Docker image, only install the dependencies and packages that are necessary for your application to run. Unnecessary dependencies increase the attack surface and potential security risks. Review the packages in your image and remove any that are not required. Scan Your Images for Vulnerabilities Numerous tools and services are available for scanning Docker images for known vulnerabilities. Tools like Clair, Trivy, and Anchore can automatically check your images against known security databases and provide reports on any vulnerabilities detected. Incorporate regular image scanning into your CI/CD pipeline to catch and address vulnerabilities early in the development process. Principle of Least Privilege Adhere to the principle of least privilege when configuring your containers. Grant only the necessary permissions to your containers and applications. Avoid running containers as the root user, as this can lead to increased security risks. Use user namespaces and other security features to restrict the privileges of your containers, ensuring that they have the minimum access required to function. Secure Secrets Management Securely manage sensitive information such as passwords, API keys, and tokens. Avoid storing secrets directly in your Docker image or environment variables. Instead, use secrets management tools provided by your container orchestration platform or consider third-party solutions. Secrets management tools, like Docker’s own secret management or Kubernetes’ Secrets, can help protect sensitive data and control access to it. Monitoring and Auditing Implement monitoring and auditing mechanisms to track and detect any suspicious activities within your containers. Use container-specific security solutions to gain visibility into your containerized applications and monitor for security breaches. Regularly review and analyze logs and events generated by your containers to identify and respond to potential security threats. 10. Conclusion Building efficient and secure Docker images is critical for containerized application success. You can generate pictures that are smaller, quicker, and more secure by following best practices such as beginning with a minimum base image, employing multi-stage builds, optimizing layering, and addressing security. Your Docker image construction process may be simplified and more robust with careful preparation and attention to detail, resulting in a smoother development and deployment experience. Following these best practices enhances not just your containerized applications, but also leads to improved resource management and lower operational overhead. In conclusion, recommended practices for Docker image construction are critical for optimizing your containerized applications and assuring their efficiency and security. By following these suggestions, you may improve your Docker image-building process and generate more manageable and stable containers.
WebAssembly, or Wasm, is increasingly relevant in software development. It's a portable binary code format designed for efficient and fast execution on any platform, including web browsers. Watch the hands-on tutorial video for a guided walkthrough of the steps in this post. WebAssembly is so essential for the web that Solomon Hykes, the founder of Docker, remarked that if Wasm and WASI had been available in 2008, Docker might not have been developed. Despite its advantages, WebAssembly has yet to reach the same level of adoption as Docker. Part of the challenge lies in the complexity of the tooling for Wasm, particularly in building, running, and debugging across different languages. For example, creating a Wasm binary involves installing a language-specific compiler toolchain. Docker offers a solution here, providing a reproducible and isolated build environment. Let's see how it works. Docker and WebAssembly In July 2023, Docker introduced experimental support for WebAssembly, adding a new dimension to running Wasm applications. This integration brings several advantages: Simplified process: Using Docker for building and running Wasm applications reduces the learning curve by minimizing the required tools. Enhanced portability: Wasm containers don't require different builds for different machine architectures, simplifying deployment. Consistent builds: Docker's isolated environment ensures consistent builds across various platforms. Integration with existing tools: Docker's compatibility with Docker Compose and Kubernetes facilitates complex deployments and scaling. Building a Wasm Container Let's start by creating a new project:
$ cargo new docker-wasm-demo-rust
The default project is a "Hello, World" application printing to the console, perfect as a first step. To build and run it, we use cargo:
$ cargo build --release
$ cargo run
Hello World!
But this is a native binary, not WebAssembly. To compile a Wasm target, let's install the WebAssembly toolchain. This command requires that you have rustup installed:
$ rustup target add wasm32-wasi
Next, we can build a Wasm target with:
$ cargo build --target wasm32-wasi --release
We can't run this directly from the command line unless we install a runtime like wasmtime:
$ wasmtime target/wasm32-wasi/release/docker-wasm-demo-rust.wasm
Hello World!
This works, but there's a lot of tooling involved. We need the Rust compilers, the Wasm toolchain, and some runtime to test it. We can streamline all this with Docker. Running Wasm in Docker Before using Docker with WebAssembly, we need to ensure we have the latest version of Docker Desktop installed and then enable containerd and Wasm support in the options. Then we need to create a simple Dockerfile that copies the Wasm binary inside the container:
# Dockerfile
FROM scratch
COPY target/wasm32-wasi/release/docker-wasm-demo-rust.wasm /hello.wasm
ENTRYPOINT [ "/hello.wasm" ]
We need to build the image, but in this case we need to target the wasi/wasm platform:
docker build --platform wasi/wasm -t hello-wasm .
If we check the Docker Desktop image tab, we should see the new image with the WASM badge. This means the image is actually WebAssembly instead of a regular container. To run the image, we use the familiar docker run with two extra flags: --runtime=io.containerd.wasmedge.v1, one of the possible runtimes supported by Docker Desktop, and --platform=wasi/wasm, to tell Docker we want to run the Wasm image. Otherwise, Docker will fail to find the image.
docker run --runtime=io.containerd.wasmedge.v1 --platform=wasi/wasm hello-wasm Building Wasm With Docker We can take the process a step further by using Docker to build the image itself. This allows us to run the build in a clean, shareable environment, making it easier to run the build stage in CI/CD. The following two-stage Dockerfile builds and creates the Wasm image: The first stage uses the Rust base image to build the Wasm binary inside the container. The second stage copies the Wasm binary from the first stage and creates the Wasm image. # Dockerfile FROM --platform=$BUILDPLATFORM rust:1.74 AS build RUN rustup target add wasm32-wasi RUN mkdir -p /build WORKDIR /build COPY Cargo.toml . COPY src ./src RUN cargo build --target wasm32-wasi --release RUN chmod a+x /build/target/wasm32-wasi/release/docker-wasm-demo-rust.wasm FROM scratch COPY --link --from=build /build/target/wasm32-wasi/release/docker-wasm-demo-rust.wasm /hello.wasm ENTRYPOINT [ "/hello.wasm" ] Let's build the image again: $ docker build --platform wasi/wasm -t hello-wasm . And then run it: $ docker run --runtime=io.containerd.wasmedge.v1 --platform=wasi/wasm hello-wasm As you can see, this new Docker feature lets us work with Wasm without needing anything other than Docker Desktop on our machines. Neat! Conclusion WebAssembly and Docker make an interesting combination, and at the end of the day, having more options as developers is always good news. Be sure to check the accompanying video tutorial for a more in-depth explanation. Thanks for reading, and happy building!
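As a brief addendum on the Docker Compose integration mentioned earlier, here is a minimal compose.yaml sketch for the hello-wasm image built above; treat it as an illustration under the assumption that your Docker Desktop ships the WasmEdge runtime used in the examples:
services:
  hello:
    image: hello-wasm
    platform: wasi/wasm
    runtime: io.containerd.wasmedge.v1
With this file in place, docker compose up runs the Wasm workload like any other Compose service.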
This blog post focuses on optimizing the size of JVM Docker images. It explores various techniques such as multi-stage builds, jlink, jdeps, and experimenting with base images. By implementing these optimizations, deployments can be faster, and resource usage can be optimized. The Problem Since Java 11, there is no pre-bundled JRE provided. As a result, basic Dockerfiles without any optimization can result in large image sizes. In the absence of a provided JRE, it becomes necessary to explore techniques and optimizations to reduce the size of JVM Docker images. Now, let's take a look at the simplest version of the Dockerfile for our application and see what's wrong with it. The project we will use in all the examples is Spring Petclinic. The simplest Dockerfile for our project looks like this: NOTE: Do not forget to build your JAR file. Dockerfile FROM eclipse-temurin:17 VOLUME /tmp COPY target/spring-petclinic-3.1.0-SNAPSHOT.jar app.jar ENTRYPOINT ["java", "-jar", "/app.jar"] After we have built the JAR file of our project, let's build our Docker image and compare the sizes of our JAR file and the created Docker image. Shell docker build -t spring-pet-clinic/jdk -f Dockerfile . docker image ls spring-pet-clinic/jdk # REPOSITORY TAG IMAGE ID CREATED SIZE # spring-pet-clinic/jdk latest 3dcd0ab89c3d 23 minutes ago 465MB If we look at the SIZE column, we can see that the size of our Docker image is 465MB! That's a lot, you might think, but maybe it's because our JAR is pretty big? To verify this, let's take a look at the size of our JAR file using the following command: Shell ls -lh target/spring-petclinic-3.1.0-SNAPSHOT.jar | awk '{print $9, $5}' # target/spring-petclinic-3.1.0-SNAPSHOT.jar 55M According to the output of our command, the size of our JAR file is only 55MB. If we compare it to the size of the built Docker image, our JAR file is almost nine times smaller! Let's move on to analyze the reasons and how to make the image smaller. What Are the Reasons for Big Docker Images, and How To Reduce Them? Before we move on to the optimization of our Docker image, we need to find out what exactly is causing it to be so large. To do this, we will use a tool called Dive, which is used for exploring a Docker image and its layer contents and for discovering ways to shrink the size of your Docker/OCI image. To install Dive, follow the guide in its README. Now, let's find out why our Docker image is this size by exploring its layers with the command dive spring-pet-clinic/jdk (instead of spring-pet-clinic/jdk, use your own Docker image name). Its output may feel a little overwhelming, but don't worry, we will walk through it together. For our purposes, we are mostly interested in the top-left part, which shows the layers of our Docker image. We can navigate between layers using the arrow keys. Now, let's find out which layers our Docker image consists of. Remember, these are the layers of the Docker image built from our basic Dockerfile. The first layer is our operating system; by default, it is Ubuntu. The next one installs tzdata, curl, wget, locales, and various other utilities, which takes 50MB! The third layer is our entire Eclipse Temurin 17 JDK, and it takes 279MB, which is pretty big. And the last one is our built JAR, which takes 58MB.
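If you want a quick look at layer sizes without installing anything extra, the built-in docker history command gives a rough version of the same picture (shown here for the image built above):
# List each layer of the image with the instruction that created it and its size
docker history spring-pet-clinic/jdk
# Add --no-trunc to see the full instruction behind each layer
docker history --no-trunc spring-pet-clinic/jdk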
Now that we understand what our Docker image consists of, we can see that a big part of it is the entire JDK plus things such as timezones, locales, and various utilities, which we don't actually need. The first optimization for our Docker images is to use the jlink tool, introduced in Java 9 along with the module system. With jlink, we can create a custom Java runtime that includes only the necessary components, resulting in a smaller final image. Now, let's take a look at our new Dockerfile incorporating the jlink tool, which, in theory, should produce a smaller image than the previous one. Dockerfile # Example of custom Java runtime using jlink in a multi-stage container build FROM eclipse-temurin:17 as jre-build # Create a custom Java runtime RUN $JAVA_HOME/bin/jlink \ --add-modules ALL-MODULE-PATH \ --strip-debug \ --no-man-pages \ --no-header-files \ --compress=2 \ --output /javaruntime # Define your base image FROM debian:buster-slim ENV JAVA_HOME=/opt/java/openjdk ENV PATH "${JAVA_HOME}/bin:${PATH}" COPY --from=jre-build /javaruntime $JAVA_HOME # Continue with your application deployment RUN mkdir /opt/app COPY target/spring-petclinic-3.1.0-SNAPSHOT.jar /opt/app/app.jar CMD ["java", "-jar", "/opt/app/app.jar"] To understand how our new Dockerfile works, let's walk through it: This Dockerfile uses a multi-stage Docker build consisting of two stages. For the first stage, we use the same base image as in the previous Dockerfile, and we employ the jlink tool to create a custom JRE that includes all Java modules via --add-modules ALL-MODULE-PATH. The second stage uses the debian:buster-slim base image and sets the environment variables for JAVA_HOME and PATH. It copies the custom JRE created in the first stage into the image. The Dockerfile then creates a directory for the application, copies the application JAR file into it, and specifies a command to run the Java application when the container starts. Let's now build our container image and find out how much smaller it has become. Shell docker build -t spring-pet-clinic/jlink -f Dockerfile_jlink . docker image ls spring-pet-clinic/jlink # REPOSITORY TAG IMAGE ID CREATED SIZE # spring-pet-clinic/jlink latest e7728584dea5 1 hours ago 217MB Our new container image is 217MB, less than half the size of our previous one. Stripping Container Image Size Even More Using the Java Dependency Analysis Tool (jdeps) What if I told you that the size of our container image can be made even smaller? Alongside jlink, you can also use the Java Dependency Analysis Tool (jdeps), first introduced in Java 8, to understand the static dependencies of your applications and libraries. In our previous example, we set the jlink --add-modules parameter to ALL-MODULE-PATH, which adds every existing Java module to our custom JRE, and we obviously don't need to include every module. Instead, we can use jdeps to analyze the project's dependencies and drop any unused modules, further reducing the image size.
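As a quick sanity check (a small sketch, not part of the original walkthrough), you can confirm that the custom runtime works and see exactly which modules jlink baked into it by overriding the image's default command:
# Print the Java version bundled into the custom runtime
docker run --rm spring-pet-clinic/jlink java -version
# List the modules included in the custom runtime
docker run --rm spring-pet-clinic/jlink java --list-modules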
Let's take a look at how to use jdeps in our Dockerfile: Dockerfile # Example of custom Java runtime using jlink in a multi-stage container build FROM eclipse-temurin:17 as jre-build COPY target/spring-petclinic-3.1.0-SNAPSHOT.jar /app/app.jar WORKDIR /app # Unpack the JAR and list the modules it needs RUN jar xf app.jar RUN jdeps \ --ignore-missing-deps \ --print-module-deps \ --multi-release 17 \ --recursive \ --class-path 'BOOT-INF/lib/*' \ app.jar > modules.txt # Create a custom Java runtime RUN $JAVA_HOME/bin/jlink \ --add-modules $(cat modules.txt) \ --strip-debug \ --no-man-pages \ --no-header-files \ --compress=2 \ --output /javaruntime # Define your base image FROM debian:buster-slim ENV JAVA_HOME=/opt/java/openjdk ENV PATH "${JAVA_HOME}/bin:${PATH}" COPY --from=jre-build /javaruntime $JAVA_HOME # Continue with your application deployment RUN mkdir /opt/server COPY --from=jre-build /app/app.jar /opt/server/ CMD ["java", "-jar", "/opt/server/app.jar"] Even without going into details, you can see that our Dockerfile has become much larger. Now let's analyze each piece and what it is responsible for: We still use a multi-stage Docker build. We copy our built Java app and set WORKDIR to /app. The first RUN instruction unpacks the JAR file, making its contents accessible to the jdeps tool. The second RUN instruction runs the jdeps tool on the extracted JAR file to analyze its dependencies and create a list of required Java modules. Here's what each option does: --ignore-missing-deps: Ignores any missing dependencies, allowing the analysis to continue. --print-module-deps: Specifies that the analysis should print the module dependencies. --multi-release 17: Indicates that the application JAR is compatible with multiple Java versions, in our case, Java 17. --recursive: Performs a recursive analysis to identify dependencies at all levels. --class-path 'BOOT-INF/lib/*': Defines the classpath for the analysis, instructing jdeps to look in the BOOT-INF/lib directory within the JAR file. app.jar > modules.txt: Redirects the output of the jdeps command to a file named modules.txt, which will contain the list of Java modules required by the application. Then, we replace the ALL-MODULE-PATH value of the jlink --add-modules parameter with $(cat modules.txt) to include only the necessary modules. The "# Define your base image" section stays the same as in the previous Dockerfile. The "# Continue with your application deployment" section was modified to COPY our JAR file from the previous stage. The only thing left to do is to see how much the container image has shrunk using our latest Dockerfile: Shell docker build -t spring-pet-clinic/jlink_jdeps -f Dockerfile_jdeps . docker image ls spring-pet-clinic/jlink_jdeps # REPOSITORY TAG IMAGE ID CREATED SIZE # spring-pet-clinic/jlink_jdeps latest d24240594f1e 3 hours ago 184MB So, by using only the modules we need to run our application, we reduced the size of our container image by another 33MB; not a huge amount, but still nice. Conclusion Let's take another look, using Dive, at how our Docker images have shrunk after our optimizations. In the image built from Dockerfile_jlink, instead of shipping the entire JDK, we built a custom JRE with jlink on top of the debian:buster-slim base image, which significantly reduced the image size: there is no more unnecessary baggage such as timezones, locales, a large OS layer, or the full JDK; we include only what we actually use and need. In the image built from Dockerfile_jdeps, we went even further and passed only the Java modules the application uses to our JRE, making the built JRE even smaller and thus reducing the size of the entire final image.
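To see all three variants side by side on your own machine, a simple listing is enough (the image names match the ones built above):
# Compare the baseline, jlink, and jlink + jdeps images
docker image ls | grep spring-pet-clinic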
In conclusion, reducing the size of JVM Docker images can significantly optimize resource usage and speed up deployments. Employing techniques like multi-stage builds, jlink, jdeps, and experimenting with base images can make a substantial difference. While the size reduction might seem minimal in some cases, the cumulative effect can be significant, especially in environments where multiple containers are running. Thus, optimizing Docker images should be a key consideration in any application development and deployment process.
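On the point about experimenting with base images, one further experiment (not part of the walkthrough above, so treat it as a hedged sketch) is to swap the runtime stage's debian:buster-slim base for a distroless image intended for jlink-built runtimes; since both are Debian/glibc based, the custom JRE from the build stage should still run. Check the distroless project for the current tag; gcr.io/distroless/java-base-debian12 is one such image:
# Alternative final stage for Dockerfile_jdeps (the build stage stays unchanged)
FROM gcr.io/distroless/java-base-debian12
ENV JAVA_HOME=/opt/java/openjdk
COPY --from=jre-build /javaruntime $JAVA_HOME
COPY --from=jre-build /app/app.jar /opt/server/app.jar
CMD ["/opt/java/openjdk/bin/java", "-jar", "/opt/server/app.jar"]
Rebuilding and running docker image ls again shows whether the swap pays off for your application.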
Cluster management systems are critical software solutions that enable the efficient allocation and utilization of computing resources in a network of interconnected machines. No doubt, they play a vital role in modern computing by ensuring scalability, high availability, and effective resource management, making them essential for running complex applications, managing data centers, and extending the reach of distributed computing. As reported by National Grid ESO, data centers, despite all advancements, still account for a significant 1% of global electricity consumption, and that is where cluster management systems can play a crucial role in improving energy efficiency. Before we dive into the details, it's important to note that this article is not about declaring one system the "better" choice. Instead, we're setting out to compare and contrast two prominent open-source cluster management systems, Kubernetes and Apache Mesos, because they take quite different approaches. We'll shine a light on their unique features, strengths, and weaknesses, helping you make informed decisions based on your specific needs. So, whether you're a seasoned IT professional looking to fine-tune your cluster management strategy or someone new to the world of distributed systems, join us as we dissect and explore the fascinating realm of Kubernetes and Apache Mesos. It's all about understanding the nuances and making the right choice for your next big project. Why Them Specifically? The Reason Behind the Comparison Comparing Kubernetes and Mesos is a strategic choice born out of their prominence in the world of cluster management systems. These two open-source solutions have earned substantial attention, boasting large user communities, diverse use cases, and robust ecosystems of tools and extensions. While other cluster management tools are available, such as Docker Swarm and Nomad, Kubernetes and Mesos often appear as top contenders in discussions about large-scale orchestration and resource management. This comparison is a starting point for understanding the fundamental approaches and philosophies behind different cluster management systems. Background Information: Kubernetes Kubernetes was born at Google. It evolved from their internal Borg system and its offspring, the experimental cluster manager Omega. Google open-sourced Kubernetes in 2014, and since then, it has grown into a dominant force with a thriving open-source community. As reported in the Kubernetes Companies table dashboard, its contributors include such eminent tech companies as Google itself (128 contributions in the last six months), Red Hat (109), Microsoft (55), and others. Key Features and Concepts Kubernetes can be viewed as a core that provides state storage and a suite of APIs for constructing distributed systems, complemented by a robust, "batteries-included" set of built-in objects and controllers. Some of its prominent features include: Pods: Pods are the smallest units of work in Kubernetes, grouping one or more containers together. Services: They let applications communicate with each other, whether they're in the same pod or scattered across the cluster. Replication Controllers: These keep applications running smoothly by making sure the right number of copies (replicas) is running. Load Balancing: Kubernetes can distribute traffic evenly across application replicas, ensuring users get a smooth experience.
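To make the replication and load-balancing concepts above a bit more tangible, here is a minimal kubectl sketch; the deployment name and image are illustrative placeholders, not taken from the article:
# Create a deployment with three replicas of a stateless web server
kubectl create deployment demo --image=nginx --replicas=3
# Expose it behind a load-balanced Service
kubectl expose deployment demo --port=80 --type=LoadBalancer
# Scale out when the workload grows
kubectl scale deployment demo --replicas=5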
Introduction to Apache Mesos The journey of Apache Mesos started at the University of California, Berkeley, and it was open-sourced in 2010. Initially, it was a research project conducted by PhD student Benjamin Hindman. Subsequently, Hindman collaborated closely with John Wilkes, one of the authors of the Omega system mentioned above: they cooperated extensively on the designs of Apache Mesos and Omega, though their respective approaches eventually took distinct paths in the realm of cluster management. Today, Apache Mesos is a robust framework used by companies like Twitter (now X) and Airbnb. Key Features and Concepts Mesos is not just about containers but rather about managing resources like CPU and memory across your entire data center. As stated in the whitepaper by its creators, Mesos allocates resources in a fine-grained manner, letting frameworks achieve data locality by taking turns reading data stored on each machine. Some of its features are: Resource Allocation: As mentioned above, Mesos can divide the data center's resources and allocate them to applications dynamically. Frameworks: Think of these as specialized managers for different types of workloads, like running Spark for big data or a web server for one's website. Fault Tolerance: Mesos is known for its resilience, handling hardware failures by relying on ZooKeeper for fault-tolerant leader election and coordination and by reconstructing state from host agents after a leader failure. Multi-Tenancy: It is able to run different workloads on the same cluster without their interfering with each other. In summary, we have Kubernetes, the sophisticated container orchestrator, and Mesos, the master of resource allocation. These introductions set the stage for a deeper examination of their worlds. Comparison Architecture and Design Kubernetes and Mesos come at cluster management from different angles. On the one hand, the Mesos master extends resource offers to application schedulers (known as "frameworks"), which they can choose to accept or decline; on the other hand, Kubernetes enables clients (whether controllers or users via the CLI) to submit a resource request (in the form of a Pod) to a scheduler that satisfies those requests. Scalability and Performance Kubernetes excels at scaling applications up or down. Its auto-scaling features allow it to adapt to changing workloads seamlessly. Kubernetes also has built-in load balancing, which helps distribute traffic smoothly to keep applications responsive. Mesos, with its fine-grained resource allocation, delivers strong performance. It can allocate resources with great precision, making it suitable for diverse workloads and well suited to ensuring efficient use of the cluster's resources. Ecosystem and Community Kubernetes has a massive and vibrant community. The ecosystem is vast, with tools like Helm for packaging apps, Prometheus for monitoring, and Grafana for visualization. Beyond that, Kubernetes has gained extensive support from major cloud providers through managed services such as Google Cloud's GKE, Amazon's EKS, and Microsoft Azure's AKS. Mesos has a smaller community but still has its share of frameworks and libraries. Apache Spark and Hadoop are some famous frameworks that call Mesos home. While Kubernetes sees broader managed service support, Mesos also receives backing from various cloud providers, including Microsoft and Oracle, which announced support for it on their cloud platforms, Azure and Oracle Container Cloud Service, respectively.
Ease of Use and Learning Curve Kubernetes has significantly advanced in usability, yet it may present complexities for those unfamiliar with its ecosystem. It requires some learning, particularly around YAML manifests and its distinct terminology. Mesos, on the other hand, offers a more straightforward start for those acquainted with Linux. Nonetheless, constructing custom frameworks presents its own set of challenges and demands diligence. Fault Tolerance and High Availability Kubernetes has robust fault tolerance, as it is built on top of etcd, a distributed, reliable key-value store. If a pod goes down, Kubernetes resurrects it. Mesos handles failures in a similar way, relying on ZooKeeper for much the same purposes, but fault tolerance often depends on the frameworks you're using. Security Kubernetes offers strong security features with role-based access control, network policies, and pod security policies. Mesos has security measures like framework isolation and authentication, ensuring that your frameworks can't trample over each other. What Might Be the Choice? The choice between Kubernetes and Apache Mesos depends on various factors, including the specific use case, requirements, and organizational context. There is no one universal answer, as both cluster management systems have their strengths and weaknesses. Here are some considerations to help you make an informed decision: Choose Kubernetes if what matters most to you is: Container orchestration Community and ecosystem Ease of use Standardization Choose Apache Mesos if you value: Resource flexibility Multi-tenancy Advanced use cases Customization Legacy integration Ultimately, the choice depends on the specific requirements, existing infrastructure, and the expertise of the team. In some cases, organizations may even choose to use both Kubernetes and Mesos within their environments, each serving a distinct purpose. It's crucial to evaluate both systems thoroughly and consider how well they align with your long-term goals and technical constraints before making a decision; hopefully, this article has helped you do just that.
Yitaek Hwang, Software Engineer, NYDIG
Abhishek Gupta, Principal Developer Advocate, AWS
Marija Naumovska, Product Manager, Microtica