
  • ELT VS ETL: UNDERSTANDING THE KEY DIFFERENCES IN DATA MANAGEMENT ELT vs ETL: What is the difference?

    In the data world, extract, transform, load (ETL) and extract, load, transform (ELT) are two of the most common methods for preparing data for analysis. Each approach has its own advantages and disadvantages, and the one that fits best depends on the organization's specific requirements. The terms ELT and ETL are fundamental, yet they are often confused. Although both refer to processes for moving and manipulating data, the differences in their approaches can have a significant impact on how organizations store, process, and use their data. In this blog, we explain the key differences between ELT and ETL. What is ETL? ETL stands for Extract, Transform, Load. This process has long been the standard approach to data integration. It consists of: extracting data from various sources; transforming that data, often in an intermediate staging area, by applying a series of rules or functions to convert it into the format required for later analysis; and loading the transformed data into a target system, such as a data warehouse. Advantages of ETL Control and data quality: Because data is transformed before it is loaded, ETL allows for more thorough cleansing and quality control, which helps ensure the data is transformed correctly and consistently. Performance: By transforming before loading, ETL can reduce the load on the target system and can deliver better performance for bulk loads, since transformation operations can run in parallel with load operations. Security: Processing data before loading it minimizes security risks, which is crucial when handling sensitive data. Challenges of ETL Flexibility: It can be less adaptable to changes in data sources or schemas, because transformations must be performed before the data reaches the data warehouse or analytics system. Speed: The process can be slower, since data must be transformed before it is loaded. Higher cost: ETL can be more expensive than ELT, because more hardware and software is required to run the transformations. What is ELT? ELT, on the other hand, loads data directly into the target system and then transforms it there. This approach takes advantage of the computing power of modern storage systems and is effective for large data sets, especially in cloud-based environments. Advantages of ELT Efficiency and scalability: ELT handles large data volumes more efficiently, offering greater scalability and speed thanks to processing in modern storage systems, such as those based in the cloud. Flexibility: It adapts more easily to different data types and formats, which is essential in environments where data changes quickly or comes from many sources. Challenges of ELT Data quality management: Because transformation happens after loading, data quality can be harder to manage. Technology dependency: It requires advanced storage systems with high processing capacity. Key differences ETL follows the more traditional approach. Data is first extracted from its original sources and then, before being loaded into the data warehouse, transformed in an intermediate system. This transformation can include cleansing, normalization, aggregation, and other operations needed to guarantee that the data is consistent and of high quality. This method is particularly valuable in environments where data quality and accuracy are critical, such as the financial sector or regulated industries that demand a high degree of compliance and data security. ELT, on the other hand, represents a paradigm shift driven by modern cloud storage technology. Here, data is extracted and loaded directly into the target system, and the transformation happens inside that system, taking advantage of its robust processing capacity. This approach is ideal for scenarios that handle large data volumes, such as big data and real-time analytics, because it allows greater speed and flexibility in processing and analyzing the data. Which one is best? The best method for you depends on your specific requirements. If you need greater control over the transformations applied to the data, or if you need complex or custom transformations, ETL may be the better choice. If you need to simplify the process, reduce cost, or improve speed for bulk data loads, ELT may be the better option. Practical examples ETL in healthcare: For a hospital integrating patient data from multiple sources, ETL is essential to guarantee data accuracy and privacy before the data is stored in a centralized system. ETL in finance: Used to integrate and transform financial data, ensuring accuracy and regulatory compliance. ELT in social media analytics: A digital marketing company uses ELT to quickly process and analyze large volumes of user behavior data from social networks, allowing it to identify trends in real time. Conclusion The choice between ELT and ETL is not simply a matter of preference; it should be based on factors such as data volume, specific processing requirements, the available technology infrastructure, and the needs of the business. Understanding these differences and selecting the right approach is crucial to maximizing the efficiency and effectiveness of data management in your organization. While ETL focuses on data quality and control before loading, ELT leverages the processing power of modern systems to speed up the integration and transformation of large data volumes.
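To make the difference concrete, here is a minimal Python sketch, assuming a hypothetical orders table, invented SQLAlchemy connection URLs, and pandas: the ETL function aggregates in the pipeline before loading, while the ELT function loads the raw rows first and runs the same aggregation inside the warehouse with SQL.

```python
import pandas as pd
from sqlalchemy import create_engine, text

# Hypothetical connections; replace the URLs with your own source and warehouse.
source = create_engine("postgresql://user:pass@source-host/sales")
warehouse = create_engine("postgresql://user:pass@warehouse-host/analytics")

def run_etl():
    """ETL: extract, transform in the pipeline, then load the finished table."""
    orders = pd.read_sql("SELECT order_date, amount FROM orders", source)      # Extract
    orders = orders.dropna(subset=["amount"])                                  # Transform
    orders["order_day"] = pd.to_datetime(orders["order_date"]).dt.date
    daily = (orders.groupby("order_day", as_index=False)["amount"].sum()
                   .rename(columns={"amount": "daily_revenue"}))
    daily.to_sql("daily_revenue", warehouse, if_exists="replace", index=False) # Load

def run_elt():
    """ELT: extract, load the raw rows, then transform inside the warehouse."""
    orders = pd.read_sql("SELECT order_date, amount FROM orders", source)      # Extract
    orders.to_sql("raw_orders", warehouse, if_exists="replace", index=False)   # Load
    with warehouse.begin() as conn:                                            # Transform
        conn.execute(text("DROP TABLE IF EXISTS daily_revenue"))
        conn.execute(text("""
            CREATE TABLE daily_revenue AS
            SELECT CAST(order_date AS DATE) AS order_day,
                   SUM(amount)              AS daily_revenue
            FROM raw_orders
            WHERE amount IS NOT NULL
            GROUP BY CAST(order_date AS DATE)
        """))
```

The trade-off described above shows up directly: the ETL path does the heavy lifting in the pipeline and delivers a finished table, while the ELT path lands raw data quickly and lets the warehouse's compute handle the transformation.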

  • CLOUD SECURE AGENT INSTALLATION

    The Cloud Secure Agent is an application used for data processing. Secure Agents allow secure communication through the firewall between Informatica Cloud and the organization. HOW TO DOWNLOAD AND INSTALL THE CLOUD SECURE AGENT FOR WINDOWS Here are the steps to download and install the Cloud Secure Agent. 1. When you log in to Informatica Cloud, you will see a window like the image below; select Administrator. 2. Once you select the Administrator option, from the left-side menu choose Runtime Environments -> Download Secure Agent. 3. After clicking the Download Secure Agent button, select which type of operating system you will work with. In this case, we will install it on a Windows machine. Click on "Copy Install Token" and paste it into Notepad or any text editor; it will be used later. 4. Open the folder where the Secure Agent installer (.exe) was saved. Right-click it and select Run as Administrator. 5. Click Next. 6. Click Install. 7. Once the Secure Agent has been installed, a new window will open requesting the username and the installation token. Enter the username you used to access Informatica Cloud, paste the installation token, and click Register. 8. After clicking Register, the Secure Agent will display a new window with the status; starting all the services takes a few minutes. 9. To review whether all services are running from Informatica Cloud, click Administrator -> Runtime Environments -> Secure Agent name (machine). 10. On the Windows server, you can check whether all services for Informatica Cloud are running. 11. If Administrator permissions and privileges are required, right-click the Informatica Cloud service in the Windows Services console and enter the username and password. 12. Click Apply, then OK. Restart the Secure Agent to apply the changes.
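As a small companion to steps 9 and 10, the hedged Python sketch below checks the agent's Windows service from the server itself; the service display name used here is an assumption that can vary by agent version, so confirm the exact name in services.msc and adjust it.

```python
import subprocess

# Assumed display name of the agent's Windows service; verify it in services.msc,
# since the exact name can differ between Secure Agent versions.
SERVICE_NAME = "Informatica Cloud Secure Agent"

def secure_agent_running(service_name: str = SERVICE_NAME) -> bool:
    """Return True if the Windows service reports a RUNNING state."""
    result = subprocess.run(
        ["sc", "query", service_name],
        capture_output=True, text=True, check=False,
    )
    return "RUNNING" in result.stdout

if __name__ == "__main__":
    state = "running" if secure_agent_running() else "not running (or not found)"
    print(f"Secure Agent service appears to be {state}")
```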

  • AI-Based Testing for Data Quality

    Role of AI in Data Quality Data quality is a crucial factor for any data-driven project, especially one involving Machine Learning (ML) and Artificial Intelligence (AI). Data quality refers to the degree to which the data meets expectations. Poor data quality affects the performance, accuracy, and reliability of AI systems, which can lead to inaccurate, unreliable, and biased results that undermine their trustworthiness and value. Traditional data quality practices are manual, time-consuming, and error-prone. They cannot handle the increasing volume, variety, and velocity of data. Testing data quality is also a complex process. It involves aspects such as data validation, data cleaning, and data profiling, which require a lot of human effort and expertise. Therefore, testing data quality is a key challenge for data professionals. This is where AI can help us in testing data quality. Using AI and ML algorithms, it can automate and optimize various aspects of data quality assessment, making the testing process smarter, faster, and more efficient. Problems that can be solved Some of the common problems that can be solved using AI-based testing for data quality are: Data validation Data validation is the process of checking whether the data conforms to predefined rules, standards, and formats, such as checking whether the data types, formats, ranges, and values are correct and consistent. AI-based testing can automate data validation by using ML models to learn the rules and patterns from the data and apply them to new or updated data. For example, an AI-based testing tool can automatically detect and flag missing values, duplicates, or invalid values in the data. Data profiling Data profiling is the process of analyzing the structure, content, and quality of the data. Data profiling helps us to understand the characteristics and behavior of the data, as well as identify potential issues or opportunities for improvement. For example, calculating the statistics, distributions, correlations, and dependencies of the data attributes. AI-based testing can automate data profiling by using ML models to extract and summarize relevant information from the data. For example, an AI-based testing tool can automatically generate descriptive statistics, visualizations, or reports on the data quality metrics. Data cleansing Data cleansing is the process of improving the quality of the data by removing or correcting errors, inconsistencies, anomalies, or duplicates in the data. Data cleansing helps us to enhance the accuracy, consistency, reliability, and completeness of the data. AI-based testing can automate data cleansing by using ML models to learn from existing or external data sources and apply appropriate transformations or corrections to the data. For example, an AI-based testing tool can automatically replace missing values based on predefined rules or learned patterns. Data Enrichment Data enrichment is the process of adding value to the data by augmenting or supplementing it with additional or relevant information from other sources. Data enrichment can help increase the richness, relevance, and usefulness of the data. For example, adding geolocation information based on postal codes or product recommendations based on purchase history. AI-based testing can automate data enrichment by using ML models to learn from existing or external data sources to generate or retrieve additional information for the data.
For example, an AI-based testing tool can automatically add geolocation information based on postal codes by using a geocoding API or recommend products based on purchase history by using a collaborative filtering algorithm. Advantages of AI-based testing Some advantages of using AI for testing are: Automation AI can help in automating various tasks or processes related to data quality assessment or improvement. AI can help in validating, cleansing, profiling, or enriching the data by using ML models to learn from existing or external data sources and by applying appropriate actions or transformations. Optimization AI can help in optimizing various parameters or aspects related to data quality. AI can help in finding the optimal rules, formats, standards, or constraints by using ML models to learn from the existing or external data sources and apply the most suitable solutions for the data. This can improve the effectiveness, accuracy, and efficiency and enhance the quality of data. Insight AI can help in providing insights and feedback for data quality improvement. AI can help in generating descriptive statistics and visualizations to profile the structure, content, and quality of the data and provide insights on correlations, missing values, duplicates, etc. It can also help in identifying potential issues or scope for improvement in the data quality by providing recommendations for resolving or enhancing them. Drawbacks or Limitations of using AI Despite having its advantages, there are also some drawbacks or limitations that need to be considered. Some of the drawbacks are: Complexity Using AI requires a lot of technical knowledge to design, implement, and maintain the AI and ML models used for testing the data. It also requires a lot of computational resources and infrastructure to run and store the models and the data. Moreover, it may involve various issues such as privacy, security, accountability, and transparency for using AI and ML for testing. It can be a complex and challenging process that requires careful planning, execution, and management. Uncertainty The recommendations, assumptions, or predictions made by the AI and ML models may not always be accurate, reliable, or consistent in their outcomes. They may also not always be able to capture the dynamic or evolving nature of the data or the project requirements. Therefore, using AI for testing can bring some uncertainty or risk in the testing process that needs to be monitored and controlled. Dependency The quality, availability, and accessibility of the existing or external data sources used by the AI and ML models for learning plays a crucial role in testing. However, these data sources may not always be relevant, fair, or representative of the data or the project objectives. Moreover, they may not always be compatible or interoperable with the formats or standards used by the AI and ML models or the tools or platforms used for testing the data. Future of AI Testing Using AI for testing is a promising technique to overcome the challenges and limitations of traditional testing methods. It can automate and optimize various aspects of data quality by using AI and ML algorithms and applying appropriate actions or transformations to the data. It can also provide insights and feedback for data quality improvement by using descriptive statistics and visualizations. When it comes to testing the quality of data using AI, there are different methods and tools available. 
These include platforms that use AI to offer complete solutions and specific tools that use AI to address specific issues. Depending on the goals and requirements of the project, users can select the most appropriate approach or tool for their testing needs. The use of AI in testing presents a host of challenges and limitations that require careful implementation, evaluation, and maintenance of the AI and ML models. To ensure optimal performance, accuracy, reliability, and fairness, it is crucial to continually monitor and update these models. It should be noted, however, that AI cannot fully replace human judgment and intervention in guaranteeing data quality. Rather, it serves as a valuable tool to augment human efforts through automated assistance and guidance. AI-powered testing for data quality is a rapidly growing field with great potential for innovation. As technology continues to progress, so will the methods and tools for improving data quality through AI. The future of using AI for testing data quality is promising and full of possibilities.
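To make the validation and profiling ideas above concrete, here is a minimal Python sketch that combines a pandas profile (missing values, duplicates) with scikit-learn's IsolationForest to flag anomalous rows; the column names, toy data, and contamination setting are invented for the example, and a real AI-based testing tool would learn its rules from far richer signals.

```python
import pandas as pd
from sklearn.ensemble import IsolationForest

def data_quality_report(df: pd.DataFrame, numeric_cols: list) -> dict:
    """Profile missing values and duplicates, then flag outlier rows with an ML model."""
    report = {
        "missing_pct": df.isna().mean().round(3).to_dict(),   # profiling
        "duplicate_rows": int(df.duplicated().sum()),         # validation
    }
    numeric = df[numeric_cols].dropna()
    model = IsolationForest(contamination=0.2, random_state=0)  # tuned for this tiny example
    labels = model.fit_predict(numeric)                         # -1 marks an outlier
    report["anomalous_row_index"] = numeric.index[labels == -1].tolist()
    return report

if __name__ == "__main__":
    # Hypothetical order data with a missing value, a duplicate row, and an outlier amount.
    orders = pd.DataFrame({
        "order_id": [1, 2, 2, 3, 4, 5],
        "amount":   [25.0, 30.0, 30.0, None, 27.5, 9999.0],
    })
    print(data_quality_report(orders, numeric_cols=["amount"]))
```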

  • Power BI vs Tableau: Who is the leader in 2023?

    Power BI (Microsoft) and Tableau (Salesforce) are both popular business intelligence (BI) tools used for data visualization and analysis. Every year, they are both positioned as leaders in the market by Gartner because of their significant adoption and widespread use across various industries. However, they have some differences in terms of features, functionality, and target user base. Here are some key distinctions between Power BI and Tableau: Ease of use: Power BI is generally considered to be more user-friendly, especially for beginners. It has a simpler interface, it's easier to navigate, and since it's a Microsoft product, it integrates with many popular tools that are used in most companies, like Teams, Excel, and PowerPoint. Tableau, on the other hand, has a steeper learning curve and can be more complex to use. Data connectors: Power BI has a wider range of data connectors and can connect to more data sources than Tableau. For example, it's a lot easier to connect Power BI to Microsoft Dynamics 365 Business Central, a popular ERP software. On the other hand, since Tableau is part of the Salesforce group, it can access Salesforce data and reports more efficiently because it is not limited by the amount of data it can import from Salesforce. Pricing: Power BI has a lower entry-level price point, with a free version and a more affordable Pro version at $10 per month per user or developer. Tableau, on the other hand, is more expensive and has a higher entry-level price point. The monthly Tableau subscription for developers is $70 per month per developer, and the viewer license is $15 per month per viewer. Integration: Both Power BI and Tableau offer integration capabilities with various data sources and other platforms. Power BI can be integrated and embedded in a wide range of applications, including web apps. It can also fully integrate with the Microsoft suite, such as Teams, PowerPoint, Excel and, soon, Outlook. Tableau also allows users to embed dashboards on the web and connect to a wide range of data sources, including databases, cloud storage platforms, spreadsheets, and web applications. Tableau and Salesforce seamlessly integrate with each other since Tableau is part of the Salesforce group. That enables productive data analytics that benefits users of the popular CRM. Tableau also offers many marketing and social media connectors that Power BI doesn't, like Facebook Ads. Customization: Tableau is generally considered to be more customizable and flexible than Power BI, because it has more visuals and advanced features for data analysis and visualization. However, Power BI allows you to download custom visuals created by the community of developers on the Power BI visuals marketplace. Collaboration: Both tools offer collaboration features, but now that Power BI has released its developer mode, it is considered to be more robust in terms of co-development, source control, and Continuous Integration and Continuous Delivery (CI/CD). Ultimately, the choice between Power BI and Tableau depends on your specific needs, preferences, and the other tools and software your company is already using. Power BI may be a better choice for businesses with limited budgets and less complex data analysis needs. Tableau may be a better choice for organizations with more complex data needs and a larger budget.

  • Data Observability and its Eminence

    As the importance of data takes center stage, more and more businesses claim to be data-driven. As companies increase their sources of data, their data storage, pipelines, and usage tend to grow at an enormous speed. With the growth of data, the chances of inaccuracy, errors, and data downtime grow as well. A company's decision-making springs from its data, and unreliable data is a pain point for every industry today. It is difficult to make decisions based on capricious data, which is why eliminating instances of downtime, bad data, missing data, and the like is best achieved by prioritizing data observability. What is Data Observability? For data engineers, the next crucial step to effectively manage incident detection within their data pipelines is to establish data observability. In their organizations, data engineers devote half of their time to maintaining these pipelines due to frequent disruptions and breakdowns, which hinder them from effectively constructing data-driven products. This is where data observability comes into the picture. Data observability refers to an organization's comprehensive awareness of the well-being and condition of the data present in its systems. Ultimately, it boils down to the ability to closely track and oversee a data pipeline. Let's walk through the problems that data engineers face: Process quality Data quality or data integrity Data lineage Process Quality The first concern is whether the data is moving and the pipeline is operational. Speed in data processing could be core to the business. Data Integrity Once the functionality of the pipeline has been confirmed, the next step is to examine the activities occurring at the level of the data set. Imagine if data becomes vulnerable, misplaced, or corrupted. As an example, there may be a modification in the schema where we anticipate having 10 columns, but the new schema only has 9 columns. This poses an issue, as it will have consequences for any downstream process that relies on the data set. Likewise, any unexpected modification to the data will ultimately corrupt the data derived from it. Data Lineage This is about how things are connected to dependent pipelines and data sets downstream. The essence of data observability is captured in this statement! To put it simply, data observability refers to the process of taking action to identify incidents in the original data source, the data warehouse, or downstream at the product level. This allows the data engineering team to be promptly notified whenever there is a problem. The team would have the capability to rectify and proactively address the issue, thereby ensuring that it does not affect customers further down the line and, ultimately, avoiding significant and expensive consequences for the business. The principles of data observability involve promptly identifying anomalies at their origin, resolving them quickly, understanding their exact location, and predicting their impact on subsequent individuals or processes. To proactively identify, resolve, and prevent irregularities in data, data observability tools utilize automated monitoring, root cause analysis, data lineage, and data health insights. Using this method leads to improved data pipelines, heightened team efficiency, strengthened data management strategies, and ultimately, increased customer contentment.
Salient Features of Data Observability The purpose is to understand the organizational and technological changes required to establish a data observability system that enables flexible data operations. To keep data observability practical, it is vital to build the following capabilities into its configuration. Monitoring: a dashboard that provides a practical view of your pipeline or system. Alerting: notifications about expected incidents and anomalies. Alerting permits you to detect complex conditions defined by a rule within the Logs, Infrastructure, Uptime, and APM apps. When a condition is met, the rule tracks it as an alert and responds by triggering one or more actions. Tracking: the ability to define and monitor specific occurrences. Comparison: observations made at different intervals are compared, and any abnormal alterations are identified through alerts. Analysis: automated issue detection that takes account of your pipeline and data state. Logging: keeping track of an occurrence using a standardized method to enable more rapid resolution. SLA tracking: measuring data quality and pipeline metadata against established standards. Data Observability - a future must-have The ability of data teams to be agile and make improvements to their products largely depends on their data observability. Without such a system, a team's infrastructure or tools cannot be dependable, as the identification of errors would take too long. If you do not invest in this important component of the DataOps framework, you will have reduced flexibility in creating new features and enhancements for your customers, resulting in wasted money. Once data observability is in place, data teams spend far less time debugging and fixing errors, and more businesses can genuinely strive to be data-driven.
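A data observability platform does far more than this, but as a minimal sketch of the monitoring, alerting, and SLA-tracking ideas above, the following Python checks a table's freshness, row count, and null rate against thresholds that are purely illustrative; the table expectations, metric values, and the alert hook are all assumptions.

```python
from datetime import datetime, timedelta, timezone

# Illustrative expectations for a hypothetical "orders" table.
EXPECTATIONS = {
    "max_staleness": timedelta(hours=2),   # freshness SLA
    "min_row_count": 1_000,                # volume check
    "max_null_rate": 0.05,                 # quality check on a key column
}

def check_pipeline_health(last_loaded_at: datetime, row_count: int, null_rate: float) -> list:
    """Return a list of alert messages; an empty list means all checks passed."""
    alerts = []
    staleness = datetime.now(timezone.utc) - last_loaded_at
    if staleness > EXPECTATIONS["max_staleness"]:
        alerts.append(f"Freshness breach: data is {staleness} old")
    if row_count < EXPECTATIONS["min_row_count"]:
        alerts.append(f"Volume anomaly: only {row_count} rows loaded")
    if null_rate > EXPECTATIONS["max_null_rate"]:
        alerts.append(f"Quality anomaly: {null_rate:.1%} nulls in a key column")
    return alerts

if __name__ == "__main__":
    # In practice these metrics would come from warehouse queries or pipeline metadata.
    alerts = check_pipeline_health(
        last_loaded_at=datetime.now(timezone.utc) - timedelta(hours=5),
        row_count=250,
        null_rate=0.12,
    )
    for message in alerts:
        print("ALERT:", message)   # a real system would page the team or post to a channel
```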

  • Digital Marketers Juggling Act (Data, Solutions & More)

    In today's digital landscape, data has become the driving force behind successful marketing strategies. Digital marketers rely on a wealth of information about their ad campaigns, customers, and more to effectively target new and existing customers. Additionally, the need for advanced solutions and technology has become paramount to capitalize on this data-driven approach. In this blog post, we will delve into why digital marketers require more data and better solutions to achieve superior customer targeting and optimize their campaigns. Understanding Customers: In order to effectively target customers, digital marketers need comprehensive data to understand their target audience better. By gathering and analyzing data on customer demographics, behaviors, interests, and preferences, marketers can gain valuable insights into what motivates their customers. This enables them to tailor their campaigns with precision, delivering relevant and personalized messages that resonate with the audience. The more data available, the more refined and accurate the targeting becomes, leading to higher conversion rates and customer engagement. Refining Customer Segmentation: Data empowers digital marketers to segment their customer base into distinct groups based on various characteristics. With more data points at their disposal, marketers can create more granular and refined segments, resulting in more effective targeting. By identifying different customer personas and understanding their unique needs, preferences, and pain points, marketers can develop highly targeted campaigns that cater to each segment's specific requirements. The result is an increased likelihood of attracting and retaining customers within each segment. Optimizing Ad Campaigns: Accurate data is invaluable in optimizing digital advertising campaigns. By monitoring and analyzing campaign performance metrics such as click-through rates (CTRs), conversion rates, bounce rates, and return on ad spend (ROAS), marketers can gain insights into what works and what doesn't. Armed with this information, they can make data-driven decisions to refine their ad creatives, targeting parameters, and campaign strategies. This iterative process ensures that the marketing efforts are continuously optimized for maximum effectiveness and efficiency. Personalized Customer Experiences: Today's customers crave personalized experiences, and data plays a crucial role in delivering them. By collecting data on customer preferences, purchase history, browsing behavior, and interactions with the brand, digital marketers can create tailored experiences across various touchpoints. From personalized product recommendations to customized email campaigns and dynamic website content, data-driven personalization enhances customer satisfaction, engagement, and loyalty. Enhanced Customer Retention: Data-driven marketing goes beyond acquiring new customers; it also focuses on retaining existing ones. By analyzing customer data, marketers can identify patterns and signals that indicate potential churn or dissatisfaction. With this knowledge, they can implement targeted retention strategies such as personalized offers, loyalty programs, and proactive customer support. By leveraging data to proactively address customer needs and concerns, marketers can increase customer loyalty and lifetime value. Better Solutions and Technology: To harness the power of data effectively, digital marketers require robust solutions and technology. 
Advanced analytics tools, customer relationship management (CRM) systems, data management platforms (DMPs), and artificial intelligence (AI) technologies enable marketers to collect, analyze, and leverage data at scale. These solutions provide actionable insights, automate processes, and facilitate personalized interactions, empowering marketers to make data-driven decisions with agility and precision. Data has become an indispensable asset for digital marketers seeking to target new customers and optimize their campaigns. By collecting and analyzing data about their customers and ad campaigns, marketers gain valuable insights that inform their strategies and drive superior results. Additionally, access to better solutions and technology allows marketers to harness the full potential of data, delivering personalized experiences and enhancing customer retention. Embracing data-driven approaches and investing in advanced solutions is key to unlocking the true power of digital marketing in today's fast-paced and competitive landscape. In addition, partnering with companies such as Pingahla provides the additional support system a digital marketer now needs to be successful.
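For the campaign metrics mentioned above (CTR, conversion rate, ROAS), a small worked example shows what data-driven optimization actually computes; the campaign numbers below are invented for illustration, and the ratios themselves are the standard paid-media definitions.

```python
def campaign_metrics(impressions: int, clicks: int, conversions: int,
                     revenue: float, spend: float) -> dict:
    """Standard paid-media ratios: CTR, conversion rate, return on ad spend, cost per acquisition."""
    return {
        "ctr": clicks / impressions,            # click-through rate
        "conversion_rate": conversions / clicks,
        "roas": revenue / spend,                # revenue earned per dollar spent
        "cpa": spend / conversions,             # cost per acquisition
    }

# Hypothetical campaign: 100,000 impressions, 2,300 clicks, 115 sales.
print(campaign_metrics(impressions=100_000, clicks=2_300,
                       conversions=115, revenue=8_050.0, spend=2_300.0))
# -> CTR 2.3%, conversion rate 5%, ROAS 3.5, CPA $20
```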

  • Harnessing the Power & Benefits of a Multi-Cloud Strategy

    In today's digital landscape, cloud computing has emerged as a game-changer, empowering organizations to scale their operations, improve efficiency, and drive innovation. However, choosing a single cloud platform can be limiting, as each cloud vendor brings its own strengths and weaknesses. This is where a multi-cloud strategy enters the picture, enabling organizations to leverage the best of multiple cloud providers, such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). In this blog post, we'll explore the importance of a multi-cloud strategy and its key benefits for organizations. Ensuring High Availability and Failover: One of the primary advantages of a multi-cloud strategy is the ability to achieve high availability and failover capabilities. Organizations can mitigate the risk of service disruptions and downtime by distributing workloads across multiple cloud platforms. Should one cloud provider experience an outage or performance issues, the workload seamlessly transitions to another provider, ensuring uninterrupted service delivery to end-users. AWS, Azure, and GCP all offer robust infrastructure and redundancy features, making them ideal for building a resilient multi-cloud architecture. Optimizing Data Load Latency: Geographical proximity plays a crucial role in determining data load latency. With a multi-cloud approach, organizations can strategically distribute their data across cloud providers' global data centers, placing it closer to end-users or specific regions. This reduces latency and enhances user experience, particularly for latency-sensitive applications. AWS, Azure, and GCP have extensive global footprints, allowing organizations to select data center locations that align with their target audience or compliance requirements. Mitigating Vendor Lock-In Risks: By adopting a multi-cloud strategy, organizations can reduce their dependence on a single cloud vendor, mitigating the risks associated with vendor lock-in. This freedom empowers organizations to negotiate better pricing, leverage competitive advantages, and retain technological flexibility. For instance, AWS offers a wide range of services and strong integration with other Amazon offerings, Azure brings seamless integration with Microsoft technologies, and GCP offers cutting-edge machine-learning capabilities. By leveraging each vendor's strengths, organizations can design a best-of-breed architecture tailored to their specific needs. Enhancing Disaster Recovery Capabilities: Disaster recovery is a critical aspect of business continuity planning. A multi-cloud strategy allows organizations to design robust disaster recovery architectures by utilizing different cloud providers. In the event of a disaster, data, and applications can be replicated and stored across multiple cloud platforms, ensuring rapid recovery and minimal data loss. AWS, Azure, and GCP offer comprehensive disaster recovery services, including backup, replication, and failover mechanisms, making them ideal multi-cloud disaster recovery strategy components. Leveraging Specialized Services and Innovation: Each cloud provider brings unique services, capabilities, and innovation to the table. By adopting a multi-cloud strategy, organizations can tap into a wide array of specialized offerings from AWS, Azure, and GCP. 
For example, AWS excels in scalable computing and storage services, Azure offers seamless integration with Microsoft's extensive ecosystem, and GCP stands out with its data analytics and machine learning tools. Leveraging these strengths empowers organizations to drive innovation, meet specific business requirements, and gain a competitive edge in their respective industries. In a rapidly evolving digital landscape, a multi-cloud strategy has become essential for organizations seeking to optimize their cloud investments, enhance resilience, and unlock the full potential of cloud computing. By harnessing the strengths of AWS, Azure, and GCP, organizations can ensure high availability, reduce latency, mitigate vendor lock-in risks, enhance disaster recovery capabilities, and leverage specialized services for innovation.
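As a toy illustration of the failover idea described above, the sketch below probes health endpoints for the same service deployed on two clouds and returns the first healthy one; the URLs are placeholders, and real deployments would typically rely on DNS-based traffic management or load balancers rather than application code.

```python
import urllib.request
import urllib.error

# Placeholder health endpoints for the same application deployed on two clouds.
ENDPOINTS = {
    "aws": "https://app-aws.example.com/health",
    "azure": "https://app-azure.example.com/health",
}

def pick_healthy_endpoint(endpoints, timeout=2.0):
    """Return the name of the first provider whose health check answers HTTP 200."""
    for provider, url in endpoints.items():
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                if response.status == 200:
                    return provider
        except (urllib.error.URLError, TimeoutError):
            continue  # treat any failure as unhealthy and try the next cloud
    return None

if __name__ == "__main__":
    active = pick_healthy_endpoint(ENDPOINTS)
    print(f"Routing traffic to: {active or 'no healthy provider found'}")
```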

  • Unlocking the Future of Data Management: Migrating from Legacy Informatica to IDMC

    In today's data-driven world, businesses rely heavily on robust data management solutions to extract actionable insights and drive strategic decision-making. Legacy systems, such as Informatica's PowerCenter, Metadata Manager, and others, have served organizations well over the years. However, the time has come to embrace the next generation of data management platforms, such as Informatica's Intelligent Data Management Cloud (IDMC). This blog post explores why current legacy Informatica customers should consider the transition and highlights the benefits of adopting IDMC. Comprehensive Solution Suite: Informatica's IDMC combines a comprehensive suite of data management products under one unified platform. With IDMC, customers gain access to a range of powerful tools, including PowerExchange (PWX), Data Quality (DQ), Data Transformation (B2B DT), Big Data Management (BDM), Data Integration HUB (DIH), and B2B Data Exchange (DX). This integration eliminates the need to purchase these products separately, streamlining the overall data management ecosystem. Enhanced Scalability and Flexibility: As organizations grow and their data management needs evolve, scalability becomes paramount. IDMC offers the scalability required to handle the ever-increasing volumes of data generated by modern businesses. Moreover, its cloud-based architecture allows for elastic scaling, enabling organizations to scale up or down based on demand. This flexibility ensures that businesses can adapt to changing requirements efficiently and cost-effectively. Next-Level Data Quality and Governance: Maintaining data quality and ensuring regulatory compliance is critical for any organization. IDMC leverages Informatica's advanced data quality and governance capabilities, empowering businesses to cleanse, standardize, and enrich their data easily. The platform's comprehensive data governance features, built-in metadata management, and business glossary capabilities enable organizations to establish a solid foundation for accurate and trusted data. Streamlined Operations and Simplified Management: One of the primary advantages of IDMC is its unified and user-friendly interface, allowing for simplified data management and operations. Organizations can leverage a centralized platform to design, develop, deploy, and monitor their data integration workflows and transformations. IDMC's streamlined management and monitoring capabilities lead to increased productivity and operational efficiency. Harnessing the Power of Big Data: In the era of big data, organizations must effectively handle vast amounts of diverse data sources. IDMC's Big Data Management (BDM) capabilities empower businesses to leverage the full potential of big data by seamlessly integrating and processing data from various sources. With IDMC, organizations can unlock insights from structured, semi-structured, and unstructured data, enabling them to make data-driven decisions confidently. Conclusion: In today's rapidly evolving data landscape, legacy data management systems can become a bottleneck for organizations striving to harness the power of their data. By migrating to Informatica's Intelligent Data Management Cloud (IDMC), current legacy Informatica customers can unlock many benefits. 
With its comprehensive suite of data management products, enhanced scalability, robust data quality and governance features, simplified management interface, and the ability to harness the power of big data, IDMC is poised to drive organizations toward a future of optimized data management and improved business outcomes. So, if you are still relying on legacy Informatica solutions, it's time to consider making the leap and embark on a transformative journey toward intelligent and comprehensive data management with IDMC.

  • How Pingahla implemented Informatica B2B Gateway for EDI at a client

    Implementing Informatica B2B Gateway for EDI requires several steps and specialized skills and expertise with the technology. Here is a general high-level overview of the process: Define the requirements: Determine the business requirements for your EDI implementation. This includes identifying the types of documents you will exchange, the trading partners you will work with, and the specific data elements that must be included in each document. Configure the B2B Gateway: Install and configure the Informatica B2B Gateway software according to the specifications of your environment. This includes configuring the communication protocols, security settings, and other system parameters. Create and map EDI documents: This is where the majority of heavy-lifting in development is done. Use the B2B Gateway tools to create the EDI documents you will be exchanging with your trading partners. You will need to build mappings to map the data elements from your internal systems to the EDI document format and vice versa. Test the EDI transactions: Work with your trading partners to test the EDI transactions. This includes sending and receiving test documents, validating the data, and resolving any issues that arise. Deploy the system: Once testing is complete, you can deploy the B2B Gateway system for production use. Monitor and maintain the system: Monitor the system to ensure that it is operating correctly and resolve any issues that arise. You may need to update the configuration or make changes to the mapping as your business requirements change.
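To give a feel for the mapping step, here is a simplified Python sketch that turns an internal order record into a few X12-850-style purchase order segments; it is a toy illustration of field-to-segment mapping under invented field names, not Informatica B2B Gateway functionality and not a complete or validated X12 document.

```python
from datetime import date

def to_x12_850_segments(order: dict) -> str:
    """Map internal order fields onto simplified X12 850 (purchase order) segments."""
    segments = [
        # BEG: purchase order header -- new order (00), original (SA), PO number, date
        f"BEG*00*SA*{order['po_number']}**{order['po_date']:%Y%m%d}",
    ]
    for line_no, item in enumerate(order["lines"], start=1):
        # PO1: line item -- line number, quantity, unit, unit price, buyer part number
        segments.append(
            f"PO1*{line_no}*{item['qty']}*EA*{item['unit_price']}**BP*{item['sku']}"
        )
    # CTT: transaction totals -- number of line items
    segments.append(f"CTT*{len(order['lines'])}")
    return "~\n".join(segments) + "~"

order = {                                   # hypothetical internal representation
    "po_number": "4500012345",
    "po_date": date(2023, 5, 1),
    "lines": [{"sku": "ABC-100", "qty": 10, "unit_price": 12.50}],
}
print(to_x12_850_segments(order))
```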

  • What is EDI and how can it benefit your organization?

    Electronic Data Interchange, or EDI, is the transfer of business documents electronically between two trading partners in a standardized format. This process eliminates the need for paper-based documents and manual entry of data, reducing errors and improving efficiency. EDI has become more widespread with the growth of the internet and the need for companies to streamline their supply chains. EDI can be used for a variety of business documents, such as purchase orders, invoices, and shipping notices. The benefits of using EDI include faster processing times, improved accuracy, and reduced costs. By eliminating paper-based documents and manual data entry, companies can process transactions more quickly and with fewer errors. This can lead to faster turnaround times, improved customer service, and increased productivity. In addition to these benefits, EDI can also help companies improve their relationships with trading partners. By using a standardized format, companies can ensure that their trading partners receive accurate and complete information, which can help to build trust and reduce the risk of disputes. EDI is not without its challenges, however. Setting up an EDI system can be complex and time-consuming, and there can be issues with compatibility between different systems. In addition, there may be resistance to change from employees who are used to working with paper-based documents. Despite these challenges, many companies have found that the benefits of using EDI outweigh the costs. If you're interested in learning more about EDI, please reach out, as we have extensive experience implementing EDI solutions in various industries such as healthcare, manufacturing, retail, and more, using partner technology products such as Informatica.
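Because the "standardized format" is really just delimited segments and elements, a short sketch can demystify it; the Python below splits a simplified X12-style string (invented for the example, not a complete interchange) into segments and elements, which is essentially the first thing any EDI translator does before mapping the values into business systems.

```python
def parse_edi(raw: str, segment_terminator: str = "~", element_separator: str = "*") -> list:
    """Split a delimited EDI string into segments, each a list of elements."""
    segments = [s.strip() for s in raw.split(segment_terminator) if s.strip()]
    return [segment.split(element_separator) for segment in segments]

# Simplified invoice-style content for illustration only.
sample = "ST*810*0001~BIG*20230501*INV-991**4500012345~TDS*125000~SE*4*0001~"
for segment in parse_edi(sample):
    print(segment[0], segment[1:])   # segment ID followed by its data elements
```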

  • What is Talend CDC? - Standard Replication Demonstration

    Talend CDC (Change Data Capture) is a tool that performs a data integration process, replicating and synchronizing data from a source to different targets in real time. We often need to keep our databases synchronized, and there are many use cases that CDC can solve, for example: keeping our DWH aligned with the latest data from our transactional sources; a hybrid architecture that needs to keep the on-premises environment aligned with the cloud environment; a replication system that consumes a lot of resources because it must run complex queries on specific columns of our data to determine whether a record is new, changed, or deleted; using our transactional database as a reporting source and therefore hurting its performance; wanting an online backup; wanting a distributed database; among others. CDC can be used in different environments: on-premises, hybrid, or cloud. It has connectors to the different cloud providers such as Snowflake, AWS, or Azure, and to managed services such as Kafka, AWS Kinesis, or Azure Event Hubs; it also has on-premises connectors and support, such as IBM (AS400), which is frequently used by financial institutions and makes it unique in the market. How does it work? CDC identifies the changes that occur in the source and replicates them to the targets. The replicated data comes from the different operations performed: INSERT, DELETE, UPDATE. The data sources producing changes can store logs with the transaction events performed on the data; these logs can be in different formats, since they vary by vendor. The CDC agent monitors and collects those transactional events once and stores them in its own log files (the Journal), in case they need to be reused in different replication jobs or in some other way; this process is separate from the system's resources and does not interfere with them. In cases where we already have a database created and populated as a source, we only need to do one full replication; from that point forward, Talend CDC will capture the changes made. It is possible to apply scripts to the data at the source to select it, transform it, or perform aggregations, as well as to apply changes at the target prior to integration. Its configuration is done in a graphical console in which we select our source and target, and it operates with different roles: Administrator, Operator, Viewer. Replicas are created in an environment, where they are called models; a model contains the sources and targets. Once configured, we can add additional options for them, such as scheduling executions, changing the framework used for the integration, or supervising the job through logs inside the tool or in external log files. There are different frameworks with which CDC can replicate data, and each of them has additional capabilities, such as adding columns at the target, providing transformations, adding sequence identifiers, or even working with big data targets.
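Before the demo, a conceptual sketch may help: the Python below is not Talend code, just an illustration with an invented event format, and it replays journaled INSERT/UPDATE/DELETE events against a target keyed by primary key, which is essentially what a replication distribution does continuously.

```python
# Journaled change events, roughly as a capture engine might record them (invented format).
events = [
    {"op": "INSERT", "key": 1, "row": {"id": 1, "name": "John Doe", "city": "New York"}},
    {"op": "UPDATE", "key": 1, "row": {"id": 1, "name": "John Doe", "city": "Miami"}},
    {"op": "INSERT", "key": 2, "row": {"id": 2, "name": "Jane Roe", "city": "Boston"}},
    {"op": "DELETE", "key": 2, "row": None},
]

def apply_events(target: dict, journal: list) -> dict:
    """Replay change events onto a target table represented as a dict keyed by primary key."""
    for event in journal:
        if event["op"] == "DELETE":
            target.pop(event["key"], None)
        else:                      # INSERT and UPDATE both upsert the latest row image
            target[event["key"]] = event["row"]
    return target

target_table = {}
print(apply_events(target_table, events))
# -> {1: {'id': 1, 'name': 'John Doe', 'city': 'Miami'}}
```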
Demo of a Standard Replication Initial Settings: In this blog I will show a standard replication, so we will select "Free" as the framework, which means that no columns or data will be added beyond what is already in the source. I will be using Talend Change Data Capture version 7.15.0. For this example I have already created the source and target databases in SQL Server, which I named "training_source" and "training_target"; in the source I have a table called "students", while in the target I have not created anything yet. Likewise, the Talend CDC configuration toward the source and the different components necessary for the tool to work correctly are already installed. These are: Capture Engine, Source Engine, Target Engine, Talend CDC Manager, and the 32-bit ODBC connector for SQL Server in the DSN (ODBC Data Source Administrator). Let's start: We begin by configuring the source in the File -> Source subscribe menu and fill in the data according to our source configuration: Name (Alias): any name you want. IP address: the location of the CDC source engine where the database instance connection was defined; this setup was done during the CDC installation and configuration. Source type: the engine of the source database. Instance or Server name: the one configured on your server. We will then notice that the source shows "TrainingSource" with a green button that we can use to connect to it. Once we connect, we will see the different configured environments where we can create the models; for this example we will use Training as our environment. Now we configure the journal, so we go to the Source -> Journal Management menu. Right-click on the created database training_source and choose the option "Start DB logging process". We can then select a directory in which to store the journal and press Ok. Still in Journal Management, select the table we want to monitor, right-click on it, and choose the option "Start table logging process for". Note: a primary key must be configured on the table to start the monitoring process on it. After starting the monitoring process on the "students" table, we need to verify that the journal is working; to do this we make a small change to one of the records in this table. Note: we have changed John Doe's city from New York to Miami. Now, again from Journal Management, right-click on "training_source" and then "Display Journal". We will see the only receiver available, produced by the change we just made. We select it and go to the "Posts" tab, and from there we will see the changes that were made. Note: the city change is shown; the first record is "New York" and the second is "Miami", which confirms that the journal is working correctly. Creating the Replication: The first step is to create a configuration space within an environment; every replication within the same environment shares the same parameters as the others, and they can be managed together. In the "Environment" tab, click on "Add" and fill in the fields as in the following example: Environment: unique code of the environment to be created. Source code: unique code for the source; it must be 3 characters. Environment type: for this example it is "Training". Description: adding a description is recommended. Click "Add" and the environment will be created. Note: the environment contains one tab for the source, where the model is created, and another tab for the target, plus three more tabs that display the replication properties and graphic elements of the replication for visualizing the process. Making sure I am on the "Source" tab, I right-click on "Models" and then on "Add", and configure the model as follows. Note: two additional tabs, "Table Options" and "Script", are shown; we leave them at their defaults. Model: a unique code for the model. Description: description of the model. Type: for this replication we use JOURNAL; however, there is also EXTRACT, which allows a full extraction. Click "Add" and the model will be available inside the "Models" folder in the "Source" tab of the "Environment". The next step is to add tables to the model by expanding the model with the [+] button, then right-clicking on "Tables" and finally clicking "Add". A window will appear to select the tables we will add to the model. Note: as you can see, we are using the "Free" framework, which does not make additional changes to our data. With the "Query" button we can list all the available tables. Select the tables, in this case "students", move the selected table to the model, and finish. The table will appear inside our model. The next step is to create the target. From the "Target" tab of our environment, right-click on "Targets" and then "Add". The window for the new target will open: Target: unique code for the target. Description: description of the target. Target type: default value. Name: name of the instance of our target database. Address: address of the server where the target database is located. Exclude: filled automatically as soon as we enter the instance name. Then click "Add" and we will see our target created. The next step is to create a distribution, which means associating our model with a target; we can do this from either the "Source" or the "Target" tab. Expand the target we created and click on "Distribute", then move the model from the left panel to the right and click "Ok". A window opens automatically to finish configuring the distribution: select the type of database, double-click on the connector, a window will open to enter the details of our target database, and click "Ok" to add it. If the connection is correct, a notification window will show "Connection Successful" and list the databases available there; finally, click "Add". We can now see from the "Map" tab that the target T01 is associated with the model M01. The next step is to create the target table. From the "Target" tab, click on the "M01" model, select the table to replicate, right-click on it, and finally choose "Create Target Table". After this, a new window appears showing the DDL of the new table using the target database instance. Click "Execute" and then check the target database to confirm that the table was created there, but is still empty. Run the Replication: Now that the target, the model, and the distribution are created, we must run the replication. From the "Map" tab, right-click on the model M01 and then click "Properties". From the "Recovery" tab, select the "Load" check box. Then go to the "Activity" tab; a confirmation window will ask whether we want to reload all the tables in that distribution, to which we answer "Yes". Now, from the "Activity" tab, click the "Start" button. After starting the replication, the fields in the "Counters" and "Last operation" sections will show the results, and the distribution will remain active, waiting for any change in the source to replicate it automatically unless we decide to stop it. Note: as shown, there were 3 records in the source and those same records were selected, sent, and added. When the model is active it is shown in green; otherwise it is yellow, or red if errors are found during execution. Now check the target table to confirm that the records were replicated. Finally, we stop the replication by right-clicking on the model shown in green and clicking "Stop", and with that we finish the standard replication. In summary: Talend CDC supports different modes and offers many more features for real-time replication. In the example above we performed a simple replication with the same fields in source and target within the same server, as well as a manual execution, although replications can also be scheduled. In upcoming posts I will show other features and functions that Talend CDC offers, as well as the differences between the other frameworks available there. Fredy Antonio Espitia Castillo, Talend Developer Certified, https://www.linkedin.com/in/fredy199601/

  • What is Talend CDC? - Demonstration of a Standard Replication

    Talend CDC (Change Data Capture) is a tool that performs a data integration process that replicates and syncs data from one source to different targets in real time. In a lot of business scenarios, it is required to keep databases synchronized, and there are many of those use cases that can be solved with Talend CDC, such as: Keeping the DWH aligned with the latest data from transactional sources A hybrid architecture that needs to keep the cloud environment aligned with the on-premises data environment A resource-intensive replication system that needs to perform complex queries on specific columns of our data to find out whether a record is new, changed, or deleted A transactional database that is used as a reporting source, thus affecting its performance Having an online backup Having a distributed database CDC can be used in different environments: on-premises, hybrid, or cloud, and it has connectors to different cloud providers such as Snowflake, AWS, or Azure, and managed services such as Kafka, AWS Kinesis, or Azure Event Hubs. It also has connectors and support for on-premises servers such as IBM AS400, which is frequently used by financial entities and makes it unique in the market. How does Talend CDC work? CDC identifies the different changes that occur in the source and replicates them in the targets. The replicated data comes from the different operations carried out, such as: INSERT DELETE UPDATE The data sources that are producing changes can store logs with the transaction events that have been carried out on the data. These logs can be in different formats, varying according to the manufacturer. The CDC agent works by monitoring and collecting those transactional events once, to store them in its own log files (the Journal), in case it's necessary to reuse them in different replication jobs. However, this process is separate from the system's resources and does not interfere with them. In cases where we already have a database created and populated as a source, we will only need to do one full replication. From that point forward, Talend CDC will capture the changes made. It's possible to apply scripts to the data in the source to select, transform, or perform aggregations, as well as apply changes to the target prior to integration. Its configuration is done in a graphical console in which we select our source and target. It operates with different roles, such as: Administrator Operator Viewer The replicas are created in an environment, where they are called models; a model contains the sources and targets. Once configured, we can add additional options for these, such as scheduling the executions, changing the framework with which we integrate, or supervising the job through logs in the tool or in log files. There are different frameworks with which CDC can do data replications, and each of them has additional capabilities, such as adding columns in the target, providing transformations, adding sequence identifiers, or even working with big data targets. Demo of a Standard Replication Initial Settings In this blog, I will show a standard replication, so we will select the framework as "Free", which means that no additional columns or data will be added to what is already in the source; I will be using Talend Change Data Capture version 7.15.0. For this example, I have already created the source and target databases in SQL Server, which I named "training_source" and "training_target" respectively; in the source, I have a table called "students" while in the target, I have not created anything yet.
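The demo assumes those two databases and the students table already exist; if you want to recreate that starting point yourself, a sketch like the following works, with the ODBC driver name, server, and credentials as assumptions you will need to adjust, and the students columns invented apart from the primary key that the journal requires.

```python
import pyodbc

# Connection details are assumptions -- adjust the driver, server, and credentials.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
    "UID=sa;PWD=YourPassword123",
    autocommit=True,   # CREATE DATABASE cannot run inside a transaction
)
cursor = conn.cursor()

# Source and target databases used in the walkthrough.
cursor.execute("IF DB_ID('training_source') IS NULL CREATE DATABASE training_source")
cursor.execute("IF DB_ID('training_target') IS NULL CREATE DATABASE training_target")

# A students table with a primary key, which Talend CDC needs for table logging.
cursor.execute("""
    IF OBJECT_ID('training_source.dbo.students') IS NULL
    CREATE TABLE training_source.dbo.students (
        id   INT PRIMARY KEY,
        name VARCHAR(100),
        city VARCHAR(100)
    )
""")
cursor.execute(
    "INSERT INTO training_source.dbo.students (id, name, city) VALUES (?, ?, ?)",
    1, "John Doe", "New York",
)
conn.close()
```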
Likewise, Talend CDC has already been configured against the source, and the components required for the tool to work correctly are already installed:

- Capture Engine
- Source Engine
- Target Engine
- Talend CDC Manager
- 32-bit ODBC connector for SQL Server in the DSN (Data Source Administrator)
- SQL Server

Let's start:

We begin by configuring the source in the File -> Source subscribe menu and fill in the data according to our source configuration:

- Name (Alias): any name you prefer.
- IP address: the location of the CDC source engine where the DB instance connection has been defined (this setup was done during the CDC installation and configuration).
- Source type: the engine of the source DB.
- Instance or Server name: the one configured on your server.

The source list will then show "Training Source" with a green button that lets us connect to it. When we connect to that source, the configured environments where we can create models are displayed; for this example, we will use Training as our environment.

Now we configure the journal:

- Go to Source -> Journal Management.
- Right-click on the created DB "training_source" and choose the option "Start DB logging process".
- Select the directory in which we want to store the journal and press Ok.
- Still in Journal Management, select the table we want to monitor, right-click on it, and press the option "Start table logging process for".

Note: A primary key must be configured in the table before the monitoring process can be started on it.

After starting the monitoring process on the "students" table, we need to verify that the journal is working. To do so, we make a small change to one of the records in the table.

Note: We have changed John Doe's city from New York to Miami.

Now, again in Journal Management, right-click on "training_source" and choose "Display Journal". We will see the only receiver available, which is the one written for the change we just made. We select it, go to the "Posts" tab, and from there we can see the changes that were made.

Note: The change of city is shown; the first record is "New York" and the second is "Miami", which confirms that the journal is working correctly.

Creating the Replication

The first step is to create a configuration space within an environment; each replica within the same environment shares the same parameters and can be managed together. In the "Environment" tab, click on "Add" and fill in the fields as in the following example:

- Environment: unique code of the environment to be created.
- Source code: unique code for the source; it must be exactly 3 characters.
- Environment type: for this example it is "Training".
- Description: it is recommended to add a description.

Click on "Add" and the environment will be created.

Note: The environment contains a tab for the source, where the model is created, and another tab for the target, plus three tabs where the replication properties and the graphic elements of the replica are displayed to visualize the process.

Now, from the "Source" tab, right-click on "Models" and then on "Add", and configure the model as follows:

Note: Two additional tabs, "Table Options" and "Script", are shown; for these we will keep the default configuration.
- Model: a unique code for the model.
- Description: description of the model.
- Type: for this replication we use JOURNAL; there is also EXTRACT, which allows a full extraction.

Click on "Add" and the model will be available in the "Models" folder in the "Source" tab of the environment.

The next step is to add tables to the model: expand the model with the [+] button, right-click on "Tables", and click on "Add". A window will appear to select the tables that we will add to the model.

Note: As you can see, we are using the "Free" framework, which does not generate additional changes to our data.

- With the "Query" button we can list all the available tables.
- Select the tables, in this case "students".
- Move the selected table to the model.
- Finish.

The table will now appear inside our model.

The next step is to create the target. From the "Target" tab of our environment, right-click on "Targets" and then on "Add". The window for the new target will open:

- Target: unique code for the target.
- Description: description of the target.
- Target type: default value.
- Name: name of the instance of our target database.
- Address: address of the server where the target database is located.
- Exclude: filled in automatically as soon as we enter the name of the instance.

Then click on "Add", and we will see our created target.

The next step is to create a distribution, which means associating our model with a target. We can do it from the "Source" or "Target" tab: expand the created target or source and click on "Distribute", then move the model from the left panel to the right and click on "Ok". A window opens automatically to finish the configuration of the distribution:

- Select the type of database.
- Double-click on the connector; a window will open to enter the data of our target database.
- Click on "Ok" to add it.
- If the connection is correct, a notification window will show "Connection Successful" together with the databases available there; finally, click on "Add".

We can now see from the "Map" tab that the target T01 is associated with the model M01.

The next step is to create the target table. From the "Target" tab, click on the "M01" model, select the table to replicate, right-click on it, and choose "Create Target Table". A new window will appear showing the DDL of the new table for the target database (a rough sketch of what such a DDL can look like is included after this section). Click on "Execute" and then check in the target database that the table was created there, but is still empty.

Run Replication

Now that the target, the model, and the distribution are created, we must execute the replication. From the "Map" tab, right-click on model M01 and then click on "Properties":

- From the "Recovery" tab, select the "Load" check box.
- Go to the "Activity" tab; a confirmation window will ask if we want to reload all the tables in that distribution, to which we answer "Yes".
- From the "Activity" tab, click on the "Start" button.

After starting the replication, the "Counters" and "Last operation" fields will show the results, and the distribution will remain active, waiting for any change in the source to replicate it automatically.

Note: As shown, we had 3 records in the source and those same records were selected, sent, and added to the target.
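As mentioned in the "Create Target Table" step, the tool generates the DDL for the target table itself. For orientation only, it is conceptually similar to the sketch below; the exact statement produced by Talend CDC will depend on the source schema and the target database, and the column layout here simply reuses the assumed schema from the setup sketch.

```sql
-- Hypothetical approximation of the generated target-table DDL,
-- assuming the column layout used in the setup sketch above.
USE training_target;
GO

CREATE TABLE dbo.students (
    id   INT          NOT NULL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    city VARCHAR(100) NOT NULL
);
```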
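To double-check the result from SQL Server itself, a quick comparison such as the one below (again using the assumed column names) can confirm that the three rows arrived and that the journaled change to John Doe's city was applied:

```sql
-- Compare row counts between source and target.
SELECT
    (SELECT COUNT(*) FROM training_source.dbo.students) AS source_rows,
    (SELECT COUNT(*) FROM training_target.dbo.students) AS target_rows;

-- Check that the change captured by the journal (city updated to 'Miami') was replicated.
SELECT name, city
FROM training_target.dbo.students
WHERE name = 'John Doe';
```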
When the model is active, it is shown in green; otherwise it is yellow, or red if errors are encountered during execution. Checking the target table confirms that the records were replicated (the comparison query sketched above is one way to do this). Finally, we stop the replication by right-clicking on the model shown in green and clicking on "Stop".

Talend CDC offers many more features for real-time replication. In this example we performed a simple replication with the same source and target fields, executed manually, although replications can also be scheduled. In upcoming posts, I will show other features that Talend CDC offers and highlight the differences between the frameworks available in the tool.

Fredy Antonio Espitia Castillo
Talend Developer Certified
https://www.linkedin.com/in/fredy199601/
