Development and programming tools are used to build frameworks, and they can be used for creating, debugging, and maintaining programs — and much more. The resources in this Zone cover topics such as compilers, database management systems, code editors, and other software tools and can help ensure engineers are writing clean code.
AWS Gateway is a powerful tool for building APIs that scale to meet the demands of modern web and mobile applications. With AWS Gateway, you can create RESTful APIs that expose your data and business logic to developers, who can then build rich, interactive applications that consume your API.

REST is an industry standard for building scalable, distributed web applications. With AWS Gateway, you can easily build a REST API that supports both GET and POST methods, as well as complex query parameters. You can also add support for other HTTP methods, such as PUT, DELETE, and HEAD.

Using AWS Gateway, you can quickly create APIs that are secure and robust. You can also use it to deploy your code to a production environment with minimal effort. Additionally, AWS Gateway allows for seamless integration with other AWS services, such as S3 and DynamoDB, enabling you to easily add complex functionality to your APIs.

Prerequisites

Before building a RESTful API with AWS Gateway, you should have the following in place:

Create an AWS account if you don't have one already.
Log in to the AWS Management Console and navigate to the Amazon API Gateway service.
Click on "Create API" and select "REST API".
Click on "Actions" to define the resource and click "Create Method".
Choose the HTTP verb (e.g., GET, POST, PUT, etc.) and click on the checkmark to create the method.
In the "Integration type" section, select "Lambda Function" and enter the name of the Lambda function you want to use to handle the API requests.
Click on "Save" to create the API.
Select Python from the Runtime dropdown, since the example handler below is written in Python.

Code Example

Python

import json

# Example data
data = {
    "items": [
        {"id": 1, "name": "Item 1", "price": 10.99},
        {"id": 2, "name": "Item 2", "price": 15.99},
        {"id": 3, "name": "Item 3", "price": 20.99},
    ]
}

def lambda_handler(event, context):
    # Determine the HTTP method of the request
    http_method = event["httpMethod"]

    # Handle GET request
    if http_method == "GET":
        # Return the data in the response
        response = {
            "statusCode": 200,
            "body": json.dumps(data)
        }
        return response

    # Handle POST request
    elif http_method == "POST":
        # Retrieve the request's body and parse it as JSON
        body = json.loads(event["body"])

        # Add the received data to the example data
        data["items"].append(body)

        # Return the updated data in the response
        response = {
            "statusCode": 200,
            "body": json.dumps(data)
        }
        return response

    # Handle PUT request
    elif http_method == "PUT":
        # Retrieve the request's body and parse it as JSON
        body = json.loads(event["body"])

        # Update the example data with the received data
        for item in data["items"]:
            if item["id"] == body["id"]:
                item.update(body)
                break

        # Return the updated data in the response
        response = {
            "statusCode": 200,
            "body": json.dumps(data)
        }
        return response

    # Handle DELETE request
    elif http_method == "DELETE":
        # Retrieve the request's body and parse it as JSON
        body = json.loads(event["body"])

        # Find the item with the specified id in the example data
        for i, item in enumerate(data["items"]):
            if item["id"] == body["id"]:
                # Remove the item from the example data
                del data["items"][i]
                break

        # Return the updated data in the response
        response = {
            "statusCode": 200,
            "body": json.dumps(data)
        }
        return response

    else:
        # Return an error message for unsupported methods
        response = {
            "statusCode": 405,
            "body": json.dumps({"error": "Method not allowed"})
        }
        return response

This code defines a Lambda function, lambda_handler, that handles different types of HTTP requests (GET, POST, PUT, DELETE) on some data.
The data is an object containing an array of items; each item has an id, name, and price. When the function is called, it first determines the HTTP method of the request from the event object and then handles the request accordingly:

GET: returns the data in the response with a status code of 200.
POST: retrieves the request's body, parses it as JSON, adds the received data to the example data, and returns the updated data with a status code of 200.
PUT: retrieves the request's body, parses it as JSON, updates the example data with the received data, and returns the updated data with a status code of 200.
DELETE: retrieves the request's body, parses it as JSON, finds the item with the specified id in the example data, removes it, and returns the updated data with a status code of 200.

If the method is not supported, the function returns an error message with a status code of 405.

Deploy the API by clicking on "Actions" and selecting "Deploy API". Select a deployment stage (e.g., "prod" or "test") and click on "Deploy". Use the generated API endpoint to make requests to your API.

Running and Testing the Code in Postman

Now, our API is up and running. You can send a test HTTP request through Postman. By sending a request to your invoke URL, you should see a 200 OK status code. For this test, no request body is needed for the incoming request.
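If you prefer a scripted check over Postman, the same smoke test can be run with the Python requests library. This is a minimal sketch; the invoke URL below is a placeholder, so substitute the endpoint API Gateway generated for your stage, and the POST body simply mirrors the example items used by the Lambda handler above.

Python

import requests

# Placeholder invoke URL -- replace with the endpoint generated for your deployment stage
BASE_URL = "https://abc123.execute-api.us-east-1.amazonaws.com/prod"

# GET: fetch the current items; expect a 200 status and the "items" list
resp = requests.get(BASE_URL)
print(resp.status_code)
print(resp.json())

# POST: append a new item; the handler returns the updated data
new_item = {"id": 4, "name": "Item 4", "price": 5.49}
resp = requests.post(BASE_URL, json=new_item)
print(resp.status_code, resp.json())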
Apache Kafka is an event streaming platform that was developed by LinkedIn and later made open-source under the Apache Software Foundation. Its primary function is to handle high-volume real-time data streams and provide a scalable and fault-tolerant architecture for creating data pipelines, streaming applications, and microservices. Kafka employs a publish-subscribe messaging model, in which data is sorted into topics, and publishers send messages to those topics. Subscribers can then receive those messages in real time. The platform offers a scalable and fault-tolerant architecture by spreading data across multiple nodes and replicating data across multiple brokers. This guarantees that data is consistently available, even if a node fails. Kafka's architecture is based on several essential components, including brokers, producers, consumers, and topics. Brokers manage the message queues and handle message persistence, while producers and consumers are responsible for publishing and subscribing to Kafka topics, respectively. Topics function as the communication channels through which messages are sent and received. Kafka also provides an extensive range of APIs and tools to manage data streams and build real-time applications. Kafka Connect, one of its most popular tools and APIs, enables the creation of data pipelines that integrate with other systems. Kafka Streams, on the other hand, allows developers to build streaming applications using a high-level API. In summary, Kafka is a robust and adaptable platform that can be used to construct real-time data pipelines and streaming applications. It has been widely adopted in various sectors, including finance, healthcare, e-commerce, and more. To create a Kafka data stream using Camel, you can use the Camel-Kafka component, which is already included in Apache Camel. Below are the steps to follow for creating a Kafka data stream using Camel: Prepare a Kafka broker and create a topic for the data stream. Set up a new Camel project on your IDE and include the required Camel dependencies, including the Camel-Kafka component. Create a new Camel route within your project that defines the data stream. The route should use the Kafka component and specify the topic to which the data should be sent or received. Select the appropriate data format for the data stream. For instance, if you want to send JSON data, use the Jackson data format to serialize and deserialize the data. Launch the Camel context and the Kafka producer or consumer to start sending or receiving data. Overall, using the Camel-Kafka component with Apache Camel is a simple way to create data streams between applications and a Kafka cluster. 
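Before moving on to the Camel routes, the publish-subscribe model described above is easy to see in a few lines of code. The sketch below uses the kafka-python client (an assumed choice; any Kafka client would do) and assumes a broker running at localhost:9092 with a topic named my-topic, matching the endpoints used in the Camel examples that follow.

Python

from kafka import KafkaProducer, KafkaConsumer

# Publisher: send a message to the "my-topic" topic
producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("my-topic", value=b'{"id": 1, "name": "item-1"}')
producer.flush()

# Subscriber: receive messages from the same topic
consumer = KafkaConsumer(
    "my-topic",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,  # stop iterating after 5 seconds of inactivity (demo only)
)
for message in consumer:
    print(message.topic, message.partition, message.offset, message.value)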
Here is the code for reading a table from the database and writing to the Kafka cluster.

Apache Camel Producer Application:

Java

import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.component.kafka.KafkaConstants;
import org.springframework.stereotype.Component;

@Component
public class OracleDBToKafkaRouteBuilder extends RouteBuilder {

    @Override
    public void configure() throws Exception {

        // Oracle DB connection details; the route assumes a DataSource bean named
        // "oracleDataSource" built from these values is registered in the application context
        String oracleDBUrl = "jdbc:oracle:thin:@localhost:1521:orcl";
        String oracleDBUser = "username";
        String oracleDBPassword = "password";
        String oracleDBTable = "mytable";
        String selectQuery = "SELECT * FROM " + oracleDBTable;

        // Configure Kafka endpoint (the Kafka component serializes keys and values as Strings by default)
        String kafkaEndpoint = "kafka:my-topic?brokers=localhost:9092";

        from("timer:oracleDBPoller?period=5000")
            // Set the SELECT statement as the message body and run it against Oracle DB
            .setBody(constant(selectQuery))
            .to("jdbc:oracleDataSource")
            // The JDBC component returns a list of rows; publish each row as its own message
            .split(body())
            .setHeader(KafkaConstants.KEY, simple("${body[ID]}"))
            .convertBodyTo(String.class)
            .to(kafkaEndpoint);
    }
}

Here is the code for reading the Kafka topic and writing to the Oracle DB table.

Apache Camel Consumer Application:

Java

import org.apache.camel.builder.RouteBuilder;
import org.springframework.stereotype.Component;

@Component
public class KafkaToOracleDBRouteBuilder extends RouteBuilder {

    @Override
    public void configure() throws Exception {

        // Configure Kafka endpoint (values are read as Strings by default)
        String kafkaEndpoint = "kafka:my-topic?brokers=localhost:9092";

        // Oracle DB table; as above, a DataSource bean named "oracleDataSource" is assumed
        String oracleDBTable = "mytable";

        from(kafkaEndpoint)
            // Ensure the incoming payload is a String
            .convertBodyTo(String.class)
            // Handle batched payloads: one record per line
            .split(body().tokenize("\n"))
            // Build the INSERT statement and execute it against Oracle DB
            .setBody(simple("INSERT INTO " + oracleDBTable + " VALUES(${body})"))
            .to("jdbc:oracleDataSource");
    }
}
Docker Swarm: Simplifying Container Orchestration

In recent years, containers have become an increasingly popular way to package, distribute, and deploy software applications. They offer several advantages over traditional virtual machines, including faster start-up times, improved resource utilization, and greater flexibility. However, managing containers at scale can be challenging, especially when running large, distributed applications. This is where container orchestration tools come into play, and Docker Swarm is one of the most popular options available.

What Is Docker Swarm?

Docker Swarm is a container orchestration tool that allows you to deploy and manage a cluster of Docker nodes. Each node is a machine that hosts one or more Docker containers, and together, they form a swarm. Docker Swarm provides a simple and intuitive interface for managing and monitoring your containers, making it an ideal tool for large-scale container deployments.

Docker Swarm makes it easy to deploy and manage containerized applications across multiple hosts. It provides features such as load balancing, automatic service discovery, and fault tolerance. With Docker Swarm, you can easily scale your applications up or down by adding or removing Docker nodes from the cluster, making it easy to handle changes in traffic or resource usage.

How Does Docker Swarm Work?

Docker Swarm allows you to deploy and manage a cluster of Docker nodes. The nodes are machines that host one or more Docker containers, and they work together to form a swarm. When you deploy an application to Docker Swarm, you define a set of services that make up the application. Each service consists of one or more containers that perform a specific function. For example, you might have a service that runs a web server and another service that runs a database.

Docker Swarm automatically distributes the containers across the nodes in the swarm, ensuring that each service is running on the appropriate nodes. It also provides load balancing and service discovery, making it easy to access your applications from outside the swarm.

Docker Swarm uses a leader-follower model to manage the nodes in the swarm. The leader node is responsible for managing the overall state of the swarm and coordinating the activities of the follower nodes. The follower nodes are responsible for running the containers and executing the tasks assigned to them by the leader node.

Docker Swarm is built on top of the Docker Engine, which is the core component of the Docker platform. The Docker Engine runs on each node in the swarm and manages the lifecycle of the containers running on that node.

Docker Swarm provides several features that make it easy to manage containers at scale, including:

Load Balancing

Docker Swarm automatically distributes incoming traffic across the nodes running the containers in the swarm, ensuring that each container receives a fair share of the traffic.
Docker Swarm provides built-in load balancing to distribute traffic evenly across containers in a cluster. This helps to ensure that each container receives an equal share of the workload and prevents any single container from becoming overloaded. Automatic Service Discovery Docker Swarm automatically updates a DNS server with the IP addresses of containers running in the swarm. This makes it easy to access your containers using a simple domain name, even as the containers move around the swarm. Docker Swarm automatically assigns unique DNS names to containers, making it easy to discover and connect to services running within the swarm. This feature simplifies the management of large, complex, containerized applications. Fault Tolerance Docker Swarm automatically detects when a container fails and automatically restarts it on another node in the swarm. This ensures that your applications remain available even if individual containers or nodes fail. Scaling Docker Swarm makes it easy to scale your applications up or down by adding or removing nodes from the swarm. This makes it easy to handle changes in traffic or resource usage. Docker Swarm enables easy scaling of containerized applications. As your application traffic grows, you can add more nodes to the cluster, and Docker Swarm automatically distributes the containers across the new nodes. Rolling Updates Docker Swarm allows for rolling updates, where you can update containers without disrupting the application’s availability. This is achieved by updating containers one at a time while other containers continue to handle the traffic. Security Docker Swarm provides built-in security features to help protect your containerized applications. For example, it supports mutual TLS encryption for securing communication between nodes in the cluster. Ease of Use Docker Swarm is designed to be easy to use, with a simple API and command-line interface that makes it easy to deploy and manage containerized applications. High Availability Docker Swarm is designed to provide high availability for containerized applications. It automatically distributes containers across multiple nodes in a cluster and provides fault tolerance so that even if a node or container fails, the application remains available. Overall, Docker Swarm provides a range of powerful features that make it an ideal choice for managing containers at scale. With its support for high availability, scalability, load balancing, service discovery, rolling updates, security, and ease of use, Docker Swarm simplifies the management of containerized applications, allowing you to focus on delivering value to your customers. Benefits of Docker Swarm Docker Swarm offers several benefits for organizations that are deploying containerized applications at scale. These include: Simplified Management Docker Swarm provides a simple and intuitive interface for managing containers at scale. This makes it easy to deploy, monitor, and scale your applications. High Availability Docker Swarm provides built-in fault tolerance, ensuring that your applications remain available even if individual containers or nodes fail. Scalability Docker Swarm makes it easy to scale your applications up or down by adding or removing nodes from the swarm. This makes it easy to handle changes in traffic or resource usage. Compatibility Docker Swarm is fully compatible with the Docker platform, making it easy to use alongside other Docker tools and services. 
Portability Docker Swarm allows you to easily deploy and manage containerized applications across different environments, including on-premises and in the cloud. This helps to ensure that your applications can be easily moved and scaled as needed, providing flexibility and agility for your business. Conclusion Docker Swarm is a powerful tool for managing containers at scale. It provides a simple and intuitive interface for deploying and managing containerized applications across multiple hosts while also providing features such as load balancing, automatic service discovery, and fault tolerance. Docker Swarm is a very powerful tool for anyone looking to deploy and manage containerized applications at scale. It provides a simple and intuitive interface for managing a cluster of Docker nodes, allowing you to easily deploy and manage services across multiple hosts. With features such as load balancing, service discovery, and fault tolerance, Docker Swarm makes it easy to run containerized applications in production environments. If you’re using Docker for containerization, Docker Swarm is definitely worth checking out.
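To ground the concepts above in something runnable, here is a small, hedged sketch using the Docker SDK for Python (pip install docker). It initializes a single-node swarm, creates a replicated nginx service, and scales it, mirroring the scaling and load-balancing behavior described earlier. The image name, service name, port mapping, and replica counts are arbitrary examples, and the sketch assumes Docker is running locally and is not already part of a swarm.

Python

import docker

client = docker.from_env()

# Initialize a single-node swarm (skip this if the machine already belongs to a swarm)
client.swarm.init(advertise_addr="127.0.0.1")

# Create a replicated service: two nginx containers, published on port 8080
service = client.services.create(
    "nginx:alpine",
    name="web",
    mode=docker.types.ServiceMode("replicated", replicas=2),
    endpoint_spec=docker.types.EndpointSpec(ports={8080: 80}),
)

# Scale the service up, for example to handle more traffic
service.scale(5)

# List the services running in the swarm
for svc in client.services.list():
    print(svc.name, svc.attrs["Spec"]["Mode"])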
What Is SIEM?

SIEM stands for Security Information and Event Management. It is a software solution that provides real-time analysis of security alerts generated by network hardware and applications. SIEM collects log data from multiple sources, such as network devices, servers, and applications, then correlates and analyzes this data to identify security threats.

SIEM can help organizations improve their security posture by providing a centralized view of security events across the entire IT infrastructure. It allows security analysts to quickly identify and respond to security incidents and provides detailed reports for compliance purposes.

Some of the key features of SIEM solutions include:

Log collection and analysis
Real-time event correlation and alerting
User and entity behavior analytics
Threat intelligence integration
Compliance reporting

SIEM is often used in conjunction with other security solutions, such as firewalls, intrusion detection systems, and antivirus software, to provide comprehensive security monitoring and incident response capabilities.

What Is ELK?

ELK is an acronym for a set of open-source software tools used for log management and analysis: Elasticsearch, Logstash, and Kibana.

Elasticsearch is a distributed search and analytics engine that provides fast search and efficient storage of large volumes of data. It is designed to be scalable and can handle a large number of queries and indexing operations in real time.

Logstash is a data collection and processing tool that allows you to collect logs and other data from multiple sources, such as log files, syslog, and other data sources, and transform and enrich the data before sending it to Elasticsearch.

Kibana is a web-based user interface that allows you to visualize and analyze data stored in Elasticsearch. It provides a range of interactive visualizations, such as line graphs, bar charts, and heatmaps, as well as features such as dashboards and alerts.

Together, these three tools form a powerful platform for managing and analyzing logs and other types of data, commonly referred to as the ELK stack or Elastic Stack. The ELK stack is widely used in IT operations, security monitoring, and business analytics to gain insights from large amounts of data.

Ingesting SIEM Data to ELK

Ingesting SIEM data into the ELK stack can be useful for organizations that want to combine the security event management capabilities of SIEM with the log management and analysis features of ELK. Here are the high-level steps to ingest SIEM data into ELK:

Configure the SIEM to send log data to Logstash, which is part of the ELK stack.
Create a Logstash configuration file that defines the input, filters, and output for the SIEM data.
Start Logstash and verify that it is receiving and processing SIEM data correctly.
Configure Elasticsearch to receive and store the SIEM data.
Create Kibana visualizations and dashboards to display the SIEM data.
Here is an example of a Logstash configuration file that receives Syslog messages from a SIEM and sends them to Elasticsearch:

input {
  syslog {
    type => "syslog"
    port => 5514
  }
}

filter {
  if [type] == "syslog" {
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
      add_field => [ "received_at", "%{@timestamp}" ]
      add_field => [ "received_from", "%{host}" ]
    }
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "siem"
  }
}

Once Logstash is configured and running, SIEM data will be ingested into Elasticsearch and can be visualized and analyzed in Kibana. It's important to ensure that the appropriate security measures are in place to protect the SIEM and ELK environments, and to monitor and alert on any security events.

Detecting Host Hack Attempts

Detecting host hack attempts using SIEM in ELK involves monitoring and analyzing system logs and network traffic to identify suspicious activity that may indicate a hack attempt. Here are the high-level steps to set up host hack attempt detection using SIEM in ELK:

Configure the hosts to send system logs and network traffic to a centralized log collection system.
Set up Logstash to receive and parse the logs and network traffic data from the hosts.
Configure Elasticsearch to store the parsed log data.
Use Kibana to analyze the log data and create dashboards and alerts to identify potential hack attempts.

Here are some specific techniques that can be used to detect host hack attempts:

Monitor for failed login attempts: Look for repeated failed login attempts from a single IP address, which may indicate a brute-force attack. Use Logstash to parse the system logs for failed login events and create a Kibana dashboard or alert to monitor for excessive failed login attempts.
Monitor for suspicious network traffic: Look for network traffic to or from known malicious IP addresses or domains. Use Logstash to parse network traffic data and create a Kibana dashboard or alert to monitor for suspicious traffic patterns.
Monitor for file system changes: Look for unauthorized changes to system files or settings. Use Logstash to parse file system change events and create a Kibana dashboard or alert to monitor for unauthorized changes.
Monitor for suspicious process activity: Look for processes that are running with elevated privileges or that are performing unusual actions. Use Logstash to parse process events and create a Kibana dashboard or alert to monitor for suspicious process activity.

By implementing these techniques and regularly monitoring the logs and network traffic, organizations can improve their ability to detect and respond to host hack attempts using SIEM in ELK.

Configure an Alert in ELK to Detect a Host Hack Attempt

To configure an alert in ELK to detect a host hack attempt, you can follow these general steps:

Create a search query in Kibana that filters logs for host hack attempt events. For example, you can use the following Python snippet with the Elasticsearch client to test a query that detects failed login attempts:

Python

from elasticsearch import Elasticsearch

es = Elasticsearch()

search_query = {
    "query": {
        "bool": {
            "must": [
                { "match": { "event.dataset": "auth" } },
                { "match": { "event.action": "failed_login" } }
            ]
        }
    }
}

res = es.search(index="siem", body=search_query)

for hit in res['hits']['hits']:
    print(hit['_source'])

Once you have created your search query, save it as a Kibana saved search.
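The same search can also be run programmatically with an explicit time window, which mirrors the threshold the alert configured in the next steps will use. This is a hedged sketch with the official Elasticsearch Python client, reusing the assumed siem index and the event.dataset/event.action field values from the example above.

Python

from datetime import datetime, timedelta, timezone

from elasticsearch import Elasticsearch

es = Elasticsearch()

# Count failed logins within the last 5 minutes
five_minutes_ago = (datetime.now(timezone.utc) - timedelta(minutes=5)).isoformat()
window_query = {
    "query": {
        "bool": {
            "must": [
                {"match": {"event.dataset": "auth"}},
                {"match": {"event.action": "failed_login"}},
                {"range": {"@timestamp": {"gte": five_minutes_ago}}},
            ]
        }
    }
}

result = es.count(index="siem", body=window_query)
if result["count"] > 5:
    print(f"Possible brute-force attempt: {result['count']} failed logins in the last 5 minutes")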
Go to the Kibana Alerts and Actions interface and create a new alert. Choose the saved search you created in step 2 as the basis for the alert. Configure the alert to trigger when a certain threshold is met. For example, you can configure the alert to trigger when there are more than 5 failed login attempts within a 5-minute window. Configure the alert to send a notification, such as an email or Slack message, when it triggers. Test the alert to ensure that it is working as expected. Once the alert is configured, it will automatically trigger when it detects a Host Hack Attempt event, such as a failed login attempt. This can help organizations detect and respond to security threats efficiently and effectively. It is important to regularly review and update your alerts to ensure they are detecting the most relevant and important security events. Conclusion Using ELK to detect host hack attempts is an effective approach to enhance the security posture of an organization. ELK provides a powerful combination of log collection, parsing, storage, analysis, and alerting capabilities, which enable organizations to detect and respond to host hack attempts in real-time. By monitoring system logs and network traffic, and using advanced search queries and alerting mechanisms, ELK can help organizations detect a wide range of host hack attempts, including failed login attempts, suspicious network traffic, file system changes, and suspicious process activity. Implementing a robust host hack attempt detection strategy using ELK requires careful planning, configuration, and testing. However, with the right expertise and tools, organizations can create a comprehensive security monitoring system that provides real-time visibility into their network, improves incident response times, and helps prevent security breaches before they occur.
To brief you about me, I lead the Big Data team at NIO, an electric vehicle manufacturer. I have tried a fair share of the OLAP tools available on the market, and here is what I think you need to know.

Apache Druid

Back in 2017, looking for an OLAP tool on the market was like seeking a tree on an African prairie: there were only a few of them. As we looked up and scanned the horizon, our eyes lingered on Apache Druid and Apache Kylin. We landed on Druid because we were already familiar with it, while Kylin, despite its impressively high query efficiency in pre-computation, had a few shortcomings:

The best storage engine for Kylin would be HBase, but introducing HBase would bring in a whole new bunch of operation and maintenance burdens.
Kylin pre-computes the dimensions and metrics, but the dimensional explosion that comes with it puts great pressure on storage.

As for Druid, it used columnar storage, supported real-time and offline data ingestion, and delivered fast queries. On the flip side, it:

Used no standard protocols, such as JDBC, and thus was beginner-unfriendly.
Had weak support for Join.
Could be slow in exact deduplication, which lowered performance.
Required huge maintenance efforts due to all the components with various installation methods and dependencies.
Required changes in Hadoop integration and dependency of JAR packages when it came to data ingestion.

TiDB

We tried TiDB in 2019. Long story short, here are its pros and cons:

Pros:
It was an OLTP + OLAP database that supported easy updates.
It had the features we needed, including aggregate and breakdown queries, metric computation, and dashboarding.
It supported standard SQL, so it was easy to grasp.
It didn't require too much maintenance.

Cons:
The fact that TiFlash relied on OLTP could put more pressure on storage.
As a non-independent OLAP, its analytical processing capability was less than ideal.
Its performance varied among scenarios.

ClickHouse vs. Apache Doris

We did our research into ClickHouse and Apache Doris. We were impressed by ClickHouse's awesome standalone performance but stopped looking further into it when we found that:

It did not give us what we wanted when it came to multi-table Join, which was an important usage for us.
It had relatively low concurrency.
It could bring high operation and maintenance costs.

Apache Doris, on the other hand, ticked a lot of the boxes on our requirement list:

It supported high-concurrency queries, which was our biggest concern.
It was capable of real-time and offline data processing.
It supported aggregate and breakdown queries.
Its Unique model (a type of data model in Doris that ensures unique keys) supported updates.
It could largely speed up queries via materialized views.
It was compatible with the MySQL protocol, so there was little trouble in development and adoption.
Its query performance fit the bill.
It only required simple operations and maintenance.

To sum up, Apache Doris appeared to be an ideal substitute for Apache Druid + TiDB.

Our Hands-On OLAP Experience

Here is a diagram to show you how data flows through our OLAP system:

Data Sources

We pool data from our business system, event tracking, devices, and vehicles into our big data platform.

Data Import

We enable CDC for our business data. Any changes in such data will be converted into a data stream and stored in Kafka, ready for stream computing. As for data that can only be imported in batches, it will go directly into our distributed storage.
Data Processing

Instead of integrating streaming and batch processing, we adopted the Lambda architecture. Our business status quo determines that our real-time and offline data come from different links. In particular:

Some data comes in the form of streams.
Some data can be stored in streams, while some historical data will not be stored in Kafka.
Some scenarios require high data precision. To realize that, we have an offline pipeline that re-computes and refreshes all relevant data.

Data Warehouse

Instead of using the Flink/Spark-Doris Connector, we use the routine load method to transfer data from Flink to Doris, and broker load from Spark to Doris. Data produced in batches by Flink and Spark will be backed up to Hive for usage in other scenarios. This is our way to increase data efficiency.

Data Services

In terms of data services, we enable auto-generation of APIs through data source registration and flexible configuration so we can manage traffic and authority via APIs. In combination with the K8s serverless solution, the whole thing works great.

Data Application

In the data application layer, we have two types of scenarios:

User-facing scenarios such as dashboards and metrics.
Vehicle-oriented scenarios, where vehicle data is collected into Apache Doris for further processing. Even after aggregation, we still have a data size measured in the billions, but the overall computing performance is up to scratch.

Our CDP Practice

Like most companies, we build our own Customer Data Platform (CDP). Usually, a CDP is made up of a few modules:

Tags: the building block, obviously. We have basic tags and customer behavior tags. We can also define other tags as we want.
Groups: dividing customers into groups based on the tags.
Insights: characteristics of each customer group.
Reach: ways to reach customers, including text messages, phone calls, app notifications, and IM.
Effect analysis: feedback about how the CDP runs.

We wanted to achieve real-time + offline integration, fast grouping, quick aggregation, multi-table Join, and federated queries in our CDP. Here is how it is done:

Real-Time + Offline

We have real-time tags and offline tags and need them to be placed together. Plus, columns on the same data might be updated at different frequencies. Some basic tags (regarding the identity of customers) should be updated in real time, while other tags (age, gender) can be updated daily. We want to put all the atomic tags of customers in one table because that brings the least maintenance costs and can largely reduce the number of required tables when we add self-defined tags. So how do we achieve this? We use the routine load method of Apache Doris to update real-time data and the broker load method to batch-import offline data. We also use these two methods to update different columns in the same table, respectively.

Fast Grouping

Basically, grouping is combining a certain set of tags and finding the overlapping data. This can be complicated. Doris helped speed up this process through SIMD optimization.

Quick Aggregation

We need to update all the tags, re-compute the distribution of customer groups, and analyze effects on a daily basis. Such processing needs to be quick and neat. So we divide data into tablets based on time so there will be less data transfer and faster computation. When calculating the distribution of customer groups, we pre-aggregate data at each node and then collect it for further aggregation. In addition, the vectorized execution engine of Doris is a real performance accelerator.
Multi-Table Join

Since our basic data is stored in multiple data tables, when CDP users customize the tags they need, they need to conduct multi-table Joins. An important factor that attracted us to Apache Doris was its promising multi-table Join capability.

Federated Queries

Currently, we use Apache Doris in combination with TiDB. Records about customer reach are put in TiDB, and data regarding credit points and vouchers is processed in TiDB, too, since it is a better OLTP tool. As for more complicated analysis, such as monitoring the effectiveness of customer operations, we need to integrate information about task execution and target groups. This is when we conduct federated queries across Doris and TiDB.

Conclusion

This is our journey from Apache Druid and TiDB to Apache Doris (with a short peek into ClickHouse in the middle). We looked into the performance, SQL semantics, system compatibility, and maintenance costs of each of them and ended up with the OLAP architecture we have now. If you share the same concerns as us, this might be a reference for you.
In a microservices architecture, it’s common to have multiple services that need access to sensitive information, such as API keys, passwords, or certificates. Storing this sensitive information in code or configuration files is not secure because it’s easy for attackers to gain access to this information if they can access your source code or configuration files. To protect sensitive information, microservices often use a secrets management system, such as Amazon Secrets Manager, to securely store and manage this information. Secrets management systems provide a secure and centralized way to store and manage secrets, and they typically provide features such as encryption, access control, and auditing. Amazon Secrets Manager is a fully managed service that makes it easy to store and retrieve secrets, such as database credentials, API keys, and other sensitive information. It provides a secure and scalable way to store secrets, and integrates with other AWS services to enable secure access to these secrets from your applications and services. Some benefits of using Amazon Secrets Manager in your microservices include: Centralized management: You can store all your secrets in a central location, which makes it easier to manage and rotate them. Fine-grained access control: You can control who has access to your secrets, and use AWS Identity and Access Management (IAM) policies to grant or revoke access as needed. Automatic rotation: You can configure Amazon Secrets Manager to automatically rotate your secrets on a schedule, which reduces the risk of compromised secrets. Integration with other AWS services: You can use Amazon Secrets Manager to securely access secrets from other AWS services, such as Amazon RDS or AWS Lambda. Overall, using a secrets management system, like Amazon Secrets Manager, can help improve the security of your microservices by reducing the risk of sensitive information being exposed or compromised. In this article, we will discuss how you can define a secret in Amazon Secrets Manager and later pull it using the Spring Boot microservice. Creating the Secret To create a new secret in Amazon Secrets Manager, you can follow these steps: Open the Amazon Secrets Manager console by navigating to the “AWS Management Console,” selecting “Secrets Manager” from the list of services, and then clicking “Create secret” on the main page. Choose the type of secret you want to create: You can choose between “Credentials for RDS database” or “Other type of secrets.” If you select “Other type of secrets,” you will need to enter a custom name for your secret. Enter the secret details: The information you need to enter will depend on the type of secret you are creating. For example, if you are creating a database credential, you will need to enter the username and password for the database. Configure the encryption settings: By default, Amazon Secrets Manager uses AWS KMS to encrypt your secrets. You can choose to use the default KMS key or select a custom key. Define the secret permissions: You can define who can access the secret by adding one or more AWS Identity and Access Management (IAM) policies. Review and create the secret: Once you have entered all the required information, review your settings and click “Create secret” to create the secret. Alternatively, you can also create secrets programmatically using AWS SDK or CLI. 
Here's an example of how you can create a new secret using the AWS CLI:

Shell

aws secretsmanager create-secret --name my-secret --secret-string '{"username": "myuser", "password": "mypassword"}'

This command creates a new secret called "my-secret" with a JSON-formatted secret string containing a username and password. You can replace the secret string with any other JSON-formatted data you want to store as a secret.

You can also create these secrets from your microservice:

Add the AWS SDK for Java dependency to your project: You can do this by adding the following dependency to your pom.xml file:

XML

<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk-secretsmanager</artifactId>
    <version>1.12.83</version>
</dependency>

Initialize the AWS Secrets Manager client: You can do this by adding the following code to your Spring Boot application's configuration class:

Java

@Configuration
public class AwsConfig {

    @Value("${aws.region}")
    private String awsRegion;

    @Bean
    public AWSSecretsManager awsSecretsManager() {
        return AWSSecretsManagerClientBuilder.standard()
                .withRegion(awsRegion)
                .build();
    }
}

This code creates a new bean for the AWS Secrets Manager client and injects the AWS region from the application.properties file.

Create a new secret: You can do this by adding the following code to your Spring Boot service class:

Java

@Autowired
private AWSSecretsManager awsSecretsManager;

public void createSecret(String secretName, String secretValue) {
    CreateSecretRequest request = new CreateSecretRequest()
            .withName(secretName)
            .withSecretString(secretValue);

    CreateSecretResult result = awsSecretsManager.createSecret(request);
    String arn = result.getARN();
    System.out.println("Created secret with ARN: " + arn);
}

This code creates a new secret with the specified name and value. It uses the CreateSecretRequest class to specify the name and value of the secret and then calls the createSecret method of the AWS Secrets Manager client to create the secret. The method returns a CreateSecretResult object, which contains the ARN (Amazon Resource Name) of the newly created secret.

These are just some basic steps to create secrets in Amazon Secrets Manager. Depending on your use case and requirements, there may be additional configuration or setup needed.
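The same operations are available outside the Java SDK. For teams writing Python services, here is a hedged sketch using boto3 that creates the secret from the CLI example above and reads it back; it assumes AWS credentials and a default region are already configured (environment variables, a profile, or an IAM role).

Python

import json

import boto3

# Assumes credentials and a default region are already configured
client = boto3.client("secretsmanager")

# Create a secret equivalent to the CLI example above
client.create_secret(
    Name="my-secret",
    SecretString=json.dumps({"username": "myuser", "password": "mypassword"}),
)

# Retrieve it at runtime
response = client.get_secret_value(SecretId="my-secret")
secret = json.loads(response["SecretString"])
print(secret["username"])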
Pulling the Secret Using Microservices

Here are the complete steps for pulling a secret from the Amazon Secrets Manager using Spring Boot.

First, you need to add the following dependencies to your Spring Boot project:

XML

<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk-secretsmanager</artifactId>
    <version>1.12.37</version>
</dependency>
<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk-core</artifactId>
    <version>1.12.37</version>
</dependency>
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-aws</artifactId>
    <version>2.3.2.RELEASE</version>
</dependency>

Next, you need to configure the AWS credentials and region in your application.yml file:

YAML

aws:
  accessKey: <your-access-key>
  secretKey: <your-secret-key>
  region: <your-region>

Create a configuration class for pulling the secret:

Java

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.cloud.aws.secretsmanager.AwsSecretsManagerPropertySource;
import org.springframework.context.annotation.Configuration;

import com.amazonaws.services.secretsmanager.AWSSecretsManager;
import com.amazonaws.services.secretsmanager.AWSSecretsManagerClientBuilder;
import com.amazonaws.services.secretsmanager.model.GetSecretValueRequest;
import com.amazonaws.services.secretsmanager.model.GetSecretValueResult;
import com.fasterxml.jackson.databind.ObjectMapper;

@Configuration
public class SecretsManagerPullConfig {

    @Autowired
    private AwsSecretsManagerPropertySource awsSecretsManagerPropertySource;

    public <T> T getSecret(String secretName, Class<T> valueType) throws Exception {
        AWSSecretsManager client = AWSSecretsManagerClientBuilder.defaultClient();

        String secretId = awsSecretsManagerPropertySource.getProperty(secretName);

        GetSecretValueRequest getSecretValueRequest = new GetSecretValueRequest()
                .withSecretId(secretId);
        GetSecretValueResult getSecretValueResult = client.getSecretValue(getSecretValueRequest);

        String secretString = getSecretValueResult.getSecretString();

        ObjectMapper objectMapper = new ObjectMapper();
        return objectMapper.readValue(secretString, valueType);
    }
}

In your Spring Boot service, you can inject the SecretsManagerPullConfig class and call the getSecret method to retrieve the secret:

Java

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class MyService {

    @Autowired
    private SecretsManagerPullConfig secretsManagerPullConfig;

    public void myMethod() throws Exception {
        MySecrets mySecrets = secretsManagerPullConfig.getSecret("mySecrets", MySecrets.class);

        System.out.println(mySecrets.getUsername());
        System.out.println(mySecrets.getPassword());
    }
}

In the above example, MySecrets is a Java class that represents the structure of the secret in the Amazon Secrets Manager. The getSecret method returns an instance of MySecrets that contains the values of the secret.

Note: The above code assumes the Spring Boot application is running on an EC2 instance with an IAM role that has permission to read the secret from the Amazon Secrets Manager. If you are running the application locally or in a different environment, you will need to provide AWS credentials with the necessary permissions to read the secret.

Conclusion

Amazon Secrets Manager is a secure and convenient way to store and manage secrets such as API keys, database credentials, and other sensitive information in the cloud.
By using Amazon Secrets Manager, you can avoid hardcoding secrets in your Spring Boot application and, instead, retrieve them securely at runtime. This reduces the risk of exposing sensitive data in your code and makes it easier to manage secrets across different environments. Integrating Amazon Secrets Manager with Spring Boot is a straightforward process thanks to AWS SDK for Java. With just a few lines of code, you can create and retrieve secrets from Amazon Secrets Manager in your Spring Boot application. This allows you to build more secure and scalable applications that can be easily deployed to the cloud. Overall, Amazon Secrets Manager is a powerful tool that can help you manage your application secrets in a more secure and efficient way. By integrating it with Spring Boot, you can take advantage of its features and benefits without compromising on the performance or functionality of your application.
MicrostarterCLI is a rapid development tool. It helps you as a developer generate standard reusable code, configurations, or patterns you need in your application. In a previous article, I went through a basic example of creating REST and GraphQL endpoints in a Micronaut application. This article demonstrates an example of bootstrapping a Micronaut microservices application using MicrostarterCLI. The application's architecture consists of the following:

Fruit Service: a simple CRUD service for Fruit objects.
Vegetable Service: a simple CRUD service for Vegetable objects.
Eureka Service Discovery Server
Consul Configuration Server
Spring Cloud Gateway

Set Up the Environment

Download the MicrostarterCLI 2.5.0 or higher zip file. Then, unzip it and add its folder to the PATH environment variable. You will then be able to access mc.jar, mc.bat, and mc from any folder in the operating system. To verify the configuration, check the MicrostarterCLI by running the below command from the command prompt:

PowerShell

mc --version

My environment details are as follows:

Operating System: Windows 11
Java Version: 11
IDE: IntelliJ

Let's Start Development

Step 0: Create a Workspace Directory

Create a folder in your system in which you will save the projects. This step is optional, but it's helpful to organize the work. In this article, the workspace is c:\workspace.

Step 1: Create the Fruit Service

The FruitService is a simple service with all CRUD operations to handle Fruit objects. For simplicity, we will use the H2 database. We will use the MicrostarterCLI to generate all CRUD operation code and configuration in the following steps:

First, generate a Micronaut application from Micronaut Launch and extract the project zip file into the c:\workspace folder. Alternatively, we can use the init command of the MicrostarterCLI to generate the project directly from Micronaut Launch as follows:

PowerShell

mc init --name FruitService --package io.hashimati

After running the init command, go to the FruitService directory. Then, run the entity command to add the required dependencies, configure the service, and generate the necessary CRUD services code:

PowerShell

cd fruit
mc entity -e Fruit

Once you run the command, the MicrostarterCLI will start with the configuration questions. Answer them as follows:

Option | Answer | Description
Is the application monolithic? | no | Specifies whether the service is monolithic or a microservice.
Enter the server port number between 0 - 65535 | -1 | Sets the server port. This service will use a random port number.
Enter the service id: | fruit-service | Sets the service ID.
Select Reactive Framework | reactor | Uses the Reactor framework in case the developer wants to use reactive data access.
Do you want to use Lombok? | yes | Uses the Lombok library to generate the entity class.
Select Annotation: | Micronaut | Micronaut enables the developer to use Micronaut, JAX-RS, or Spring annotations. This step instructs the MicrostarterCLI to use Micronaut annotations in the generated code.
Enter the database name: | fruits | Sets the database name.
Select Database Type: | H2 | Specifies the database management engine.
Select Database Backend: | JDBC | Specifies the database access.
Select Data Migration: | liquibase | Uses Liquibase as a data migration tool to create the database schema.
Select Messaging type: | none |
Do you want to add cache-caffeine support? | yes | Enables caching with Caffeine.
Do you want to add Micrometer feature? | yes | Collects metrics on the endpoints and service calls.
Select Distributed Tracing: | Jaeger | Uses Jaeger for distributed tracing.
Do you want to add GraphQL-Java-Tools support? | yes | Adds the GraphQL dependency to the project.
Do you want to add GRPC support? | no | If the answer is "yes," MicrostarterCLI prepares the project for gRPC services. This example does not use gRPC.
Do you want to use File Services? | no | This article will not use storage services.
Do you want to configure VIEWS? | yes | Confirms that you want to add a "views" dependency.
Select the views configuration | views-thymeleaf | Adds the Thymeleaf dependency.

Once you complete the configuration, the MicrostarterCLI will ask you to enter the name of the collection or table. Then, it will prompt you to enter the attributes. Enter the attributes as follows:

Attribute | Type | Validation | FindBy() Method | FindAllBy() Method | UpdateBy() Method
Name | String | - | Yes | Yes | No
quantity | Int | - | No | Yes | No

By the end of this step, the MicrostarterCLI will have generated the entity, service, controller, client, and test controller classes, the Liquibase XML files, and the configurations.

Step 2: Create the Vegetable Service

The Vegetable Service will host the CRUD service for the Vegetable objects. To define it, we repeat the same steps as in Step 1.

Step 3: Create the Eureka Server

In this step, we will create the Eureka service discovery server. The service will listen on port 8761. The Fruit and Vegetable services will register with the Eureka server as we point their Eureka clients to localhost and port 8761. To create the Eureka server project using the MicrostarterCLI, run the below command from c:\workspace:

Shell

mc eureka -version 2.7.8 --javaVersion 11

Step 4: Create a Gateway Service

The last component we will create is the Spring Cloud Gateway. The gateway service will listen on port 8080. To generate the gateway project using the MicrostarterCLI, run the gateway command below:

Shell

mc gateway -version 2.7.8 --javaVersion 11

Step 5: Gateway/Microservice Route Configurations

In this step, we will configure the root routes for both the Fruit and Vegetable APIs. As generated by the MicrostarterCLI, the root route for the Fruit API is /api/v1/fruit, as specified in the @Controller annotation of the io.hashimati.controllers.FruitController class of the Fruit Service. The Vegetable API's root route is /api/v1/vegetable, as specified in the io.hashimati.controllers.VegetableController class of the Vegetable Service.

To register the routes, we will use the register subcommand of the gateway command. Go to c:\workspace\gateway and run the below command:

Shell

mc gateway register

When the command runs, the MicrostarterCLI will prompt you to enter the service ID, the service name, and the routes. Run the register subcommand twice to configure the root routes for the Fruit and Vegetable APIs. By completing this step, you have configured the CRUD endpoints of the Fruit and Vegetable services behind the Gateway.

Run and Try

To run the services, we will create a run.bat file that launches all of them, as below:

Shell

cd c:\workspace\eureka\
start gradlew bootRun -x test&
cd c:\workspace\FruitService
start gradlew run -x test&
cd c:\workspace\VegetableService
start gradlew run -x test&
cd c:\workspace\gateway
start gradlew bootRun -x test&

After running the run.bat file, all services start. Wait until all the services complete their start-up process, then open the Eureka dashboard at http://localhost:8761. You should see all services registered on the Eureka server. To try the CRUD services, you can use the .http file support of IntelliJ.
We will create a test.http file as follows:

HTTP

POST http://localhost:8080/api/v1/fruit/save
Content-Type: application/json

{
  "name": "Apple",
  "quantity": 100
}

###

GET http://localhost:8080/api/v1/fruit/findAll

###

POST http://localhost:8080/api/v1/vegetable/save
Content-Type: application/json

{
  "name": "Onion",
  "quantity": 100
}

###

GET http://localhost:8080/api/v1/vegetable/findAll

By running the requests from IntelliJ, it works!

Conclusion

Using MicrostarterCLI, you can generate the configuration of the components needed for a microservice architecture, like JWT security, distributed tracing, messaging, and observability. MicrostarterCLI also supports the Groovy and Kotlin programming languages. Please visit the project repository for more information. Check out the article example here. Happy coding!
This article explains what Eclipse JKube Remote Development is and how it helps developers build Kubernetes-native applications with Quarkus. Introduction As mentioned in my previous article, microservices don’t exist in a vacuum. They typically communicate with other services, such as databases, message brokers, or other microservices. Because of this distributed nature, developers often struggle to develop (and test) individual microservices that are part of a larger system. The previous article examines some common inner-loop development cycle challenges and shows how Quarkus, combined with other technologies, can help solve some of the challenges. Eclipse JKube Remote Development was not one of the technologies mentioned because it did not exist when the article was written. Now that it does exist, it certainly deserves to be mentioned. What Is Eclipse JKube Remote Development? Eclipse JKube provides tools that help bring Java applications to Kubernetes and OpenShift. It is a collection of plugins and libraries for building container images and generating and deploying Kubernetes or OpenShift manifests. Eclipse JKube Remote Development is a preview feature first released as part of Eclipse JKube 1.10. This new feature is centered around Kubernetes, allowing developers the ability to run and debug Java applications from a local machine while connected to a Kubernetes cluster. It is logically similar to placing a local development machine inside a Kubernetes cluster. Requests from the cluster can flow into a local development machine, while outgoing requests can flow back onto the cluster. Remember this diagram from the first article using the Quarkus Superheroes? Figure 1: Local development environment logically inserted into a Kubernetes cluster. We previously used Skupper as a proxy to connect a Kubernetes cluster to a local machine. As part of the 1.10 release, Eclipse JKube removes the need to use Skupper or install any of its components on the Kubernetes cluster or your local machine. Eclipse JKube handles all the underlying communication to and from the Kubernetes cluster by mapping Kubernetes Service ports to and from the local machine. Eclipse JKube Remote Development and Quarkus The new Eclipse JKube Remote Development feature can make the Quarkus superheroes example very interesting. If we wanted to reproduce the scenario shown in Figure 1, all we’d have to do is re-configure the rest-fights application locally a little bit and then run it in Quarkus dev mode. First, deploy the Quarkus Superheroes to Kubernetes. 
Then, add the Eclipse JKube configuration into the <plugins> section in the rest-fights/pom.xml file:

XML

<plugin>
  <groupId>org.eclipse.jkube</groupId>
  <artifactId>openshift-maven-plugin</artifactId>
  <version>1.11.0</version>
  <configuration>
    <remoteDevelopment>
      <localServices>
        <localService>
          <serviceName>rest-fights</serviceName>
          <port>8082</port>
        </localService>
      </localServices>
      <remoteServices>
        <remoteService>
          <hostname>rest-heroes</hostname>
          <port>80</port>
          <localPort>8083</localPort>
        </remoteService>
        <remoteService>
          <hostname>rest-villains</hostname>
          <port>80</port>
          <localPort>8084</localPort>
        </remoteService>
        <remoteService>
          <hostname>apicurio</hostname>
          <port>8080</port>
          <localPort>8086</localPort>
        </remoteService>
        <remoteService>
          <hostname>fights-kafka</hostname>
          <port>9092</port>
        </remoteService>
        <remoteService>
          <hostname>otel-collector</hostname>
          <port>4317</port>
        </remoteService>
      </remoteServices>
    </remoteDevelopment>
  </configuration>
</plugin>

Version 1.11.0 of the openshift-maven-plugin was the latest version as of the writing of this article. You may want to check if there is a newer version available.

This configuration tells OpenShift (or Kubernetes) to proxy requests going to the OpenShift Service named rest-fights on port 8082 to the local machine on the same port. Additionally, it forwards the local machine ports 8083, 8084, 8086, 9092, and 4317 back to the OpenShift cluster and binds them to various OpenShift Services.

The code listing above uses the JKube OpenShift Maven Plugin. If you are using other Kubernetes variants, you could use the JKube Kubernetes Maven Plugin with the same configuration. If you are using Gradle, there is also a JKube OpenShift Gradle Plugin and JKube Kubernetes Gradle Plugin available.

Now that the configuration is in place, you need to open two terminals in the rest-fights directory. In the first terminal, run ./mvnw oc:remote-dev to start the remote dev proxy service. Once that starts, move to the second terminal and run:

Shell

./mvnw quarkus:dev \
  -Dkafka.bootstrap.servers=PLAINTEXT://localhost:9092 \
  -Dmp.messaging.connector.smallrye-kafka.apicurio.registry.url=http://localhost:8086

This command starts up a local instance of the rest-fights application in Quarkus dev mode. Requests from the cluster will come into your local machine. The local application will connect to other services back on the cluster, such as the rest-villains and rest-heroes applications, the Kafka broker, the Apicurio Registry instance, and the OpenTelemetry collector.

With this configuration, Quarkus Dev Services will spin up a local MongoDB instance for the locally-running application, illustrating how you could combine local services with other services available on the remote cluster. You can do live code changes to the local application while requests flow through the Kubernetes cluster, down to your local machine, and back to the cluster. You could even enable continuous testing while you make local changes to ensure your changes do not break anything.

The main difference between Quarkus Remote Development and Eclipse JKube Remote Development is that, with Quarkus Remote Development, the application is running in the remote Kubernetes cluster. Local changes are synchronized between the local machine and the remote environment. With JKube Remote Development, the application runs on the local machine, and traffic flows from the cluster into the local machine and back out to the cluster.
Wrap-Up As you can see, Eclipse JKube Remote Development complements the Quarkus Developer Joy story quite well. It allows you to easily combine the power of Quarkus with Kubernetes to help create a better developer experience, whether local, distributed, or somewhere in between.
With so many tools available on the market, it can be hard to decide which one is appropriate for testing your web application. Testing is important to ensure the application functions as users expect and delivers a high-quality user experience. End-to-end testing is an approach designed to verify application functionality by automating the browser to run through the scenarios an end user would perform. Cypress and Puppeteer are two commonly used tools for this, and their detailed comparison is the main focus of this blog. Cypress adoption has grown in recent years for web automation testing because it addresses issues faced by modern web applications, and Puppeteer is now also widely used, which has triggered the Cypress vs. Puppeteer debate. A solid understanding of both tools and a detailed comparison between them is therefore crucial. Let’s get started with an overview of Cypress and Puppeteer. What Is Cypress? Cypress is an open-source, JavaScript-based automation testing tool mainly used for modern web automation. This front-end testing framework helps us write test cases for web applications in the de-facto language of the web. It supports unit and integration testing and offers capabilities such as easy reporting and test configuration. It also supports the Mocha test framework. Cypress works differently from other testing tools. For example, scripts that run inside the browser execute in the same event loop as your application, while operations that must happen outside the browser are handled by a Node.js server process. Features of Cypress Some of the notable features of Cypress are as follows: · It takes snapshots while the tests run. · It allows fast, real-time debugging using tools like Developer Tools. · It waits automatically, so you do not have to add waits or sleeps to the running tests. · You can verify and control the behavior of functions, timers, and server responses. · You can control, test, and stub edge cases without involving the servers. What Is Puppeteer? Puppeteer is an open-source Node.js library used for automation testing and web scraping. It provides a high-level API to control Chrome and Chromium, which run headless by default. Puppeteer is easy for testers to use because it is based on the DevTools Protocol, the same protocol used by the Chrome Developer Tools, so familiarity with those tools helps you get up and running with Puppeteer quickly. Cypress vs. Puppeteer The comparison between Cypress and Puppeteer below highlights the aspects that will help you get a clear picture. Brief Puppeteer is a tool developed by Google for automating Chrome using the DevTools protocol, while Cypress is an open-source test runner developed by Cypress.io. The main difference between Puppeteer and Cypress lies in what they do: Puppeteer is a Node library that enables browser automation, whereas Cypress is purely a test automation framework supporting end-to-end testing, integration testing, and unit testing.
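To make the distinction concrete, here is a minimal sketch of the same check written both ways; the URL and selector are illustrative rather than taken from a real project.

JavaScript
// Cypress: a test file (e.g., cypress/e2e/home.cy.js) executed by the Cypress test runner.
describe('home page', () => {
  it('shows the page heading', () => {
    cy.visit('https://example.com');            // illustrative URL
    cy.get('h1').should('contain', 'Example');  // built-in assertion with automatic waiting
  });
});

// Puppeteer: a plain Node script that drives the browser directly.
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();     // headless by default
  const page = await browser.newPage();
  await page.goto('https://example.com');       // illustrative URL
  const heading = await page.$eval('h1', (el) => el.textContent);
  console.log(heading);                         // assertions would come from Mocha, Jest, etc.
  await browser.close();
})();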
To put it another way, Puppeteer is not a framework but a Node module that provides browser automation for Chrome and Chromium. It runs headless by default but can be configured to run full (non-headless) Chrome or Chromium. In addition, Puppeteer provides a high-level API for controlling Chrome and Chromium over the DevTools protocol. Cypress, by contrast, is mainly a front-end testing tool built for the modern web. Lastly, Puppeteer is free to use, whereas Cypress comes in both free and paid versions. Language Both Cypress and Puppeteer are based on JavaScript, which makes it easy to work with either tool. Types of Testing When comparing the kinds of testing each tool supports, Cypress gives you wider options. For example, if you want to test an entire application, Puppeteer is not the best option; it is great for web scraping and crawling single-page applications. Cypress, however, lets you write end-to-end tests, unit tests, and integration tests, and it can test anything that runs in a browser. Uses Puppeteer is mainly used for automating UI testing, mouse and keyboard interaction, and similar tasks, and it is often used to test applications developed in AngularJS and Angular. Unlike Cypress, it is not considered a test automation framework; rather, it manages the internals of the Chromium browser. It is a development tool that can perform the same tasks developers do, such as locating elements and handling requests and responses. Architecture Cypress and Puppeteer differ in their architecture. Generally, most testing tools work by running outside the browser and executing remote commands across the network. Cypress, by contrast, operates inside the browser that executes your test code. This allows Cypress to listen to and verify browser behavior at run time by modifying the DOM and altering network requests and responses on the fly, and it does not require any driver binaries. Behind the scenes, a Node.js server process communicates with the test runner, and the application and test code run in iframes within the same event loop. The browsers supported by Cypress include Chrome Canary, Chromium, Microsoft Edge, Mozilla Firefox, and Electron. Puppeteer’s architecture, as mentioned above, follows the DevTools protocol. It manages Chrome and Chromium through the high-level API provided by the Node library. The browser platform executes actions on the browser engine with or without headless mode, and all test execution is done in a real Chromium instance. Other browsers, like Microsoft Edge, use Chromium as their browser engine. Puppeteer itself is packaged as a Node module, and end users write their automation code in JavaScript. Testing Speed Comparing the testing speed of Puppeteer and Cypress, Puppeteer is generally regarded as much faster than Cypress. In Cypress, test scripts are executed inside the browser: to click a particular button, Cypress does not send the command through a separate driver but instead uses DOM events to send the click to the button. Puppeteer, on the other hand, has tight control over the browser through its high-level API for Chrome and Chromium.
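One concrete illustration of that control, and of Cypress’s in-browser approach, is network interception, which both tools support from different vantage points. The routes, fixture name, and URL below are illustrative.

JavaScript
// Cypress: interception is declared from inside the browser-side test runner.
it('stubs an API response', () => {
  cy.intercept('GET', '/api/items*', { fixture: 'items.json' }); // hypothetical fixture file
  cy.visit('/items');                                            // illustrative route
});

// Puppeteer: interception goes through the DevTools protocol from the Node side.
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.setRequestInterception(true);
  page.on('request', (request) => {
    if (request.url().includes('/api/items')) {
      request.respond({ status: 200, contentType: 'application/json', body: '[]' });
    } else {
      request.continue();
    }
  });
  await page.goto('https://example.com/items'); // illustrative URL
  await browser.close();
})();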
Further, Puppeteer works with minimal setup, avoids extras, and has a smaller footprint than Cypress, so it consumes less memory and starts faster. Cypress is slower when running tests against a larger application, mainly because it takes snapshots of the application state at different points during the tests, which takes more time. Puppeteer has no such overhead, which makes it faster than Cypress. Reliability and Flexibility For testing a web application, Cypress is a more user-friendly and reliable JavaScript framework for end-to-end testing than Puppeteer, because Puppeteer is not a framework at all but a Node module for driving Chromium. Puppeteer can be a great option for quick testing, but when you want to test the performance and functionality of an entire application, it is better to use a more complete tool like Cypress. One reason is that Cypress ships with its own assertions, while Puppeteer does not and instead relies on frameworks such as Mocha, Jasmine, or Jest. Cypress also has its own test runner interface, whereas with Puppeteer you depend on editors such as VS Code or WebStorm. In a nutshell, Puppeteer only supports Chromium-engine browsers, whereas Cypress supports many different browsers, making it more reliable and flexible. Testing Code Execution on the Client Side, Like the Web Browser Both Puppeteer and Cypress let you test code that executes on the client side, i.e., in the web browser. With Puppeteer, you can operate the browser directly and easily create a test environment in which tests run straight against the browser; this lets you test front-end functionality and the UI. Cypress, for its part, aims to test anything that runs in a browser and to help deliver a high-quality user experience. It tests the flow of the application from start to finish from the user’s point of view, and it works equally well with older server-rendered pages and applications. Testing Behavior on the Server Side Another major difference between Puppeteer and Cypress is the ability to test server-side behavior: Puppeteer has no support for this, while Cypress can test back-end behavior, for example with the cy.task() command, which provides a way to run Node code so users can take actions crucial for their tests that would otherwise be beyond the scope of Cypress. Test Recording Cypress comes with a dashboard where you can see recorded tests and details of the events that happened during execution. Puppeteer has no such dashboard and cannot record tests, so that level of transparency into test execution is missing. Fixtures Fixtures are fixed, test-local states of data that guarantee a particular environment for a single test. Cypress has built-in fixture support: with the cy.fixture(filePath) command, you can easily load a fixed set of data from a file. Puppeteer has no such fixtures. Group Fixtures Group fixtures let you define fixed states of data for a group of tests, ensuring a consistent environment across that group. Here, too, Puppeteer offers nothing comparable, while Cypress can create group fixtures with the same cy.fixture command; a short sketch of cy.fixture and cy.task usage follows.
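Both commands are shown below in a minimal, hypothetical form; the fixture file, task name, and configuration snippet are invented for illustration.

JavaScript
// cypress/fixtures/items.json would hold the fixed test data (hypothetical file).
describe('items page', () => {
  it('loads fixture data and runs Node code via a task', () => {
    // Load a fixed set of data from a file and use it in the test.
    cy.fixture('items.json').then((items) => {
      expect(items.length).to.be.greaterThan(0);
    });

    // Run code in Node (outside the browser), e.g., to seed a database.
    // 'seedDatabase' is a hypothetical task registered in cypress.config.js:
    //   on('task', { seedDatabase: (opts) => { /* runs in Node */ return null; } });
    cy.task('seedDatabase', { rows: 10 });
  });
});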
Conclusion This blog has presented a detailed comparison of Puppeteer and Cypress, which should give you enough information to decide which tool will work best for your test requirements. LambdaTest is a cloud-based automation testing platform with an online Cypress automation tool that lets you write simple Cypress automation tests and watch them in action. Using LambdaTest, you can also run your Puppeteer test scripts online. Both Cypress and Puppeteer come with their own advantages and limitations, so it is up to you to decide which one best suits your tests.
Developers have many options to build a cloud environment using available tools, but today’s complex infrastructure involves numerous interconnected services and processes. To ensure smooth operation amidst daily changes in your production environment, it is crucial to use advanced tools to design and build an elastic cloud environment. These tools can simplify coding and prevent repetitive tasks for you and your team. Here are some tips that will simplify your code and prevent repetition for you and the rest of the team: Get to Know the Terraform Tool Terraform is a tool that enables developers to define the architecture of their infrastructure through a clear and readable file format known as HCL. This file format describes the elastic components of the infrastructure, such as VPCs, Security Groups, Load Balancers, and more. It provides a simple and concise way to define the topology of the development environment. Two Major Challenges When Designing Automated Cloud Environments The first challenge arises during the initial run, when there is little or no interdependence between the different resources. This is relatively easy to handle. The second challenge is more complex and occurs when new resources need to be developed, updated, or deleted in an existing cloud environment that is constantly changing. In a cloud environment, changes should be carefully planned and executed to minimize errors between teams and ensure a smooth and efficient implementation. To achieve this, simplification of the cloud environment is crucial to prevent code duplication among developers working on the same source code. How Terragrunt Solves This Problem: An In-Depth Look Terragrunt is a tool that enhances Terraform’s functionality by providing additional infrastructure management tools that help maintain a DRY (Don’t Repeat Yourself) code base, wrapping existing Terraform modules and managing the remote state of the cloud environment. An example where Terragrunt can be particularly useful is in a distributed cloud environment where multiple resources share the same values, such as subnets and security groups. Without proper management, developers may inadvertently duplicate these values, leading to errors and inconsistencies. Terragrunt helps prevent code duplication by allowing the parameters to be defined only once, ensuring there is no confusion about the shared parameters in the environment. To optimize performance, Terragrunt enforces specific development principles and restrictions on the organization of Terraform code: Terragrunt mandates the use of a hierarchical folder structure to maintain consistency and prevent errors in the Terraform code. Terragrunt promotes centralized file management for shared common variables, enabling the organization of code and changes in a single location. Optimizing Cloud Environments With Logical Organization Using Terragrunt Before using Terragrunt, it is essential to organize the logical environment by dividing it into smaller scopes through folders. This approach enables the reuse of resources and modules without the need to rewrite them, promoting efficiency and reducing the risk of errors. By organizing the workspace in this manner, Terragrunt allows for the import of entries from Terragrunt.hcl files located in other hierarchies within the cloud environment.
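As a rough sketch of what such a child configuration might look like (paths, names, and outputs here are illustrative, not taken from a real repository):

HCL
# Sketch of a child Terragrunt.hcl, e.g., Infra/Backend/Service-01/Terragrunt.hcl.

# Include Block: import configuration from a parent Terragrunt.hcl higher in the hierarchy.
include "root" {
  path = find_in_parent_folders()
}

# Dependency Block: reuse outputs (e.g., a VPC id) defined in another scope.
dependency "vpc" {
  config_path = "../../VPC"
}

terraform {
  source = "../../../Modules/Service"   # hypothetical module location
}

inputs = {
  vpc_id = dependency.vpc.outputs.vpc_id
}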
This process avoids duplicating values required for different resources by using the “Include Block” or “Dependency Block” to import previously written values from other hierarchies in the environment. Efficient File Management in Shared Spaces for Improved Collaboration Terragrunt offers a powerful capability for sharing configuration files with ease. Much like Terraform, Terragrunt receives parameters to launch resources. However, unlike Terraform, Terragrunt allows you to define these parameters at a higher level, so different resources within the environment can make use of the same parameters. For instance, if an environment is running in the us-east-1 region, defining the region value in the root directory allows any resource within the environment to inherit the value for its own use. This approach minimizes redundancy and ensures consistency throughout the environment. Terragrunt’s ability to define parameters at a higher level streamlines the configuration process and makes it easier to manage resources. For instance, consider this use case:

Infra
  VPC
    Terragrunt.hcl
  Backend
    Service-01
      Terragrunt.hcl
    Service-02
      Terragrunt.hcl
    Terragrunt.hcl
Modules
  VPC
    Terragrunt.hcl
Terragrunt.hcl

Referring to the hierarchy shown above: Infra: This concept organizes our environment and its structure by arranging the infrastructure in a specific order. This order starts with the VPC and everything related to it, followed by the backend and its various service definitions, and so on. Modules: This concept connects us to a group of resources that we intend to utilize. For instance, if we have decided to use a VPC in our infrastructure, we would define its source artifact repository and initialization parameters within the module scope. Similarly, if our backend includes a service like Kubernetes-dashboard, we would also define its source artifact repository within the module scope, and so on. Terragrunt.hcl: This file serves as the Terragrunt configuration file, as previously explained. However, we are also using it to define common values for the environment. For instance, if Service-01 and Service-02 share some common parameters, then we would define these parameters at a higher level in the Terragrunt.hcl file under the backend scope, which is located in the root folder of both services. Furthermore, we have created a Terragrunt.hcl file within the root directory. By doing so, we are consolidating common values that pertain to the entire environment and that other parts of the hierarchy do not need to be aware of. This approach allows us to propagate shared parameters downward in the hierarchy, enabling us to customize our environment without duplicating values. Conclusion Numerous organizations prioritize quality as a fundamental objective that underpins the entire process. As such, it is crucial to ask the right questions in order to achieve this goal: How do we envision the design of our cloud environment, taking into account both the production and development environments? How can we model the environment in a way that minimizes the risk of errors in shared variables? How flexible is the environment when it comes to accommodating rapid changes? Based on our experience, we take the following approach: Begin by outlining the problem on paper. Break down the problem into smaller, manageable modules. Develop each module separately, using elastic thinking that enables the module to be reused in additional use cases, either by ourselves or other developers.
Abstract the implementation of the solution while avoiding code duplication, i.e., by propagating shared variables to resources that have common values. Sharing our acquired knowledge with the community is a priority for us, and we are excited to offer our solutions to anyone interested in learning from them. You can easily access and gain insights into our methods, and even implement them yourself. As an additional resource, we have a GitHub repository that contains a multitude of examples for creating efficient and effective applications in a cloud environment using Terragrunt. You’re welcome to use this as a reference to help you develop your own solutions.
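To give a flavor of the kind of configuration those examples cover, a root-level Terragrunt.hcl that consolidates shared values might look roughly like the following sketch; the region, bucket name, and state key are illustrative.

HCL
# Sketch of a root-level Terragrunt.hcl; all values are illustrative.

# Common inputs inherited by child configurations that include this file.
inputs = {
  region      = "us-east-1"
  environment = "dev"
}

# Centralized remote state so every scope stores its state consistently.
remote_state {
  backend = "s3"
  config = {
    bucket = "my-terraform-state-bucket"                       # hypothetical bucket
    key    = "${path_relative_to_include()}/terraform.tfstate"
    region = "us-east-1"
  }
}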
Bartłomiej Żyliński
Software Engineer,
SoftwareMill
Vishnu Vasudevan
Head of Product Engineering & Management,
Opsera
Abhishek Gupta
Principal Developer Advocate,
AWS
Yitaek Hwang
Software Engineer,
NYDIG