Self-Service with Talend Solutions (Stitch, Pipeline Designer, and Data Preparation)

gregolsen
Nov 19, 2019

More and more of Pingahla's Talend customers are looking for self-service options for data extraction, transformation, and insights so they can compete in a global market. At Pingahla, we are constantly educating our Talend customers on current and new self-service Talend solutions. In this blog post, I will discuss the high-level benefits of Talend's self-service solutions: Stitch, Pipeline Designer, and Data Preparation.

Let me start with Stitch. In November 2018, Talend acquired Stitch to complement its unified platform, giving non-technical users a self-service way to extract data. Stitch is a web-based, cloud EL (Extract and Load) solution that lets users quickly extract data from a wide variety of supported sources and load it into a supported target. This can be done in as few as 7 major steps.

1. Simply go to www.stitchdata.com.

2. Sign in if you already have a Stitch account; otherwise, sign up for one.

Note: In the steps below, I am creating a new account.

2.1. Enter the requested information to create your Stitch account.

2.2. Stitch will send you an email to confirm your email address. Click the link in the confirmation email to verify your email address and activate your Stitch account.

3. Once you have logged in or created your account, select the data source from which you would like to extract data. Stitch has around 108 data source connectors; find your source and select its connector. In this example, we have selected the "Microsoft SQL Server Integration" source connector.

Note: If you cannot find your source connector, you can suggest one to the Stitch-Talend team. If you are a paid customer, your suggestion will most likely move higher in the Stitch-Talend development queue.

4. Next, enter the connection details for your data source and select the specific tables you want Stitch to connect to and extract.

Note: If you do not have the connection details, you can invite a member of your team to enter them.

5. Once you have entered your source details, select your target destination. Stitch has 8 destination connectors to choose from. In this example, we are selecting Amazon S3.

6. Once you have selected your target destination, you will then need to enter your target connection details.

7. Once you have entered your destination details, set up your schedule and, presto, you have created your very first self-service data pipeline with Stitch.
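To make the EL idea concrete, here is a minimal Python sketch of what one scheduled run conceptually does. It is not Stitch's API or implementation (Stitch manages all of this for you); the `Bookmark` class, the `updated_at` replication key, and the in-memory `warehouse` list are hypothetical. It only shows that EL copies rows as-is and that a scheduled job can remember where it left off.

```python
from typing import Dict, List, Optional

Row = Dict[str, object]


class Bookmark:
    """Hypothetical: remembers the highest replication-key value loaded so far."""

    def __init__(self) -> None:
        self.last_value: Optional[str] = None


def extract(source_rows: List[Row], bookmark: Bookmark, key: str = "updated_at") -> List[Row]:
    # Pull only rows newer than the bookmark, ordered by the replication key.
    new_rows = [
        r for r in source_rows
        if bookmark.last_value is None or str(r[key]) > bookmark.last_value
    ]
    return sorted(new_rows, key=lambda r: str(r[key]))


def load(target: List[Row], rows: List[Row]) -> None:
    # EL: copy rows into the target unchanged; no transformation happens here.
    target.extend(dict(r) for r in rows)


def run_once(source_rows: List[Row], target: List[Row], bookmark: Bookmark,
             key: str = "updated_at") -> int:
    """One scheduled run: extract new rows, load them, then advance the bookmark."""
    rows = extract(source_rows, bookmark, key)
    load(target, rows)
    if rows:
        bookmark.last_value = str(rows[-1][key])
    return len(rows)


# Usage: two scheduled runs against a growing source table.
source = [
    {"id": 1, "name": "Acme", "updated_at": "2019-11-01"},
    {"id": 2, "name": "Globex", "updated_at": "2019-11-05"},
]
warehouse: List[Row] = []
bookmark = Bookmark()
print(run_once(source, warehouse, bookmark))  # 2 (first run loads everything)
source.append({"id": 3, "name": "Initech", "updated_at": "2019-11-10"})
print(run_once(source, warehouse, bookmark))  # 1 (only the new row)
```

The bookmark logic is deliberately simplified: a row that later arrives with the same `updated_at` as the bookmark, or a row deleted at the source, would not be picked up by this sketch.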

Another truly self-service ETL (Extract, Transform, and Load) solution from Talend is Pipeline Designer, a cloud-based ETL solution released to the public in April 2019. Not only does it let users build ETL data pipelines with drag-and-drop functionality, but it also lets them incorporate Python code! A huge yay for data scientists. There are a few restrictions on the Python component, so I recommend reviewing the Talend documentation. Here are a few steps to get you started with Talend's Pipeline Designer.

1. To get started with Pipeline Designer you will need to first have access to Talend Cloud.

Note: Talend recently announced Talend Cloud on Azure, so make sure you select the correct cloud provider and zone before logging in. In this example, I am logging in to Talend Cloud on AWS, East zone.

2. Once you have logged in to Talend Cloud, you will be directed to the Talend Cloud portal's "Welcome" page. From here you can launch the different Talend Cloud applications or watch tutorials on them. For the purpose of this blog post, you will need to launch Pipeline Designer. There are two ways to do this: find the application on the Welcome page and click its Launch button, or open the "Select an App" drop-down in the left-hand corner and select Pipeline Designer.

3. Once you launch Pipeline Designer, you will be directed either to Pipeline Designer itself or to a pop-up asking you to set up a Talend Pipeline Designer remote or cloud engine. I will not go through setting up a remote/cloud engine for Pipeline Designer here; I will revisit this in a future post.

4. After launching Pipeline Designer, you will land on its "Datasets" page. You can create a data pipeline from either the "Datasets" page or the "Pipelines" page. In this post, I will create my data pipeline from the "Datasets" page by right-clicking the Customer data source and selecting Add Pipeline.

5. Once you add your Dataset to a Pipeline or start creating a data pipeline from scratch, you will be directed to the Talend Pipeline Designer Canvas. Here you will be able to create your ETL or ELT data pipeline process. In this example, you will see we already have our Customer data source as the first component in our data pipeline.

6. Once you have established a pipeline and data source, you can add Talend processor components to your data pipeline. To add a processor, click the "+" button within the data pipeline flow. Processors are what transform your data. Talend offers 9 major processor components in Pipeline Designer; for the full list, see the Talend Cloud Pipeline Designer Processors Guide.

7. After adding your processors, you will need to add the target destination. To do so, click the "Add Destination" button within the data pipeline flow.

Note: You have the ability to preview the data as it flows through the different Talend processors by clicking on the "Preview" button within the data pipeline flow.

8. Now that you have finished building your ETL/ELT data pipeline in Pipeline Designer, you will want to run it. To run your data pipeline, click the "Play" button near the top right of the screen.

9. When the run finishes, a pop-up notification tells you whether your data pipeline completed and loaded successfully; if everything is correct, the notification is green.
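Conceptually, the canvas you just built is a chain: records flow from the source through each processor in order and then into the destination. The minimal Python sketch below illustrates that flow under my own assumptions: every processor is a hypothetical plain function that takes one record (a dict) and returns a record, or `None` to drop it. It is not Pipeline Designer's API, nor the exact contract of its Python processor, so check the Talend documentation for that.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List, Optional

Record = Dict[str, object]
Processor = Callable[[Record], Optional[Record]]


def normalize_email(rec: Record) -> Optional[Record]:
    # Hypothetical field transformation: trim and lower-case the email on a copy.
    out = dict(rec)
    out["email"] = str(out.get("email", "")).strip().lower()
    return out


def drop_missing_email(rec: Record) -> Optional[Record]:
    # Hypothetical filter: returning None drops the record from the flow.
    return rec if rec.get("email") else None


def run_pipeline(source: Iterable[Record], processors: List[Processor]) -> Iterator[Record]:
    """Push each record through the processors in order, like the canvas flow."""
    for rec in source:
        current: Optional[Record] = rec
        for proc in processors:
            current = proc(current)
            if current is None:
                break
        if current is not None:
            yield current


def preview(source: Iterable[Record], processors: List[Processor], limit: int = 3) -> List[Record]:
    # Similar in spirit to the Preview button: inspect only the first few outputs.
    return list(islice(run_pipeline(source, processors), limit))


customers = [
    {"id": 1, "email": "  Ann@Example.com "},
    {"id": 2, "email": ""},
    {"id": 3, "email": "bob@example.com"},
]
steps = [normalize_email, drop_missing_email]
print(preview(customers, steps, limit=1))  # [{'id': 1, 'email': 'ann@example.com'}]
destination = list(run_pipeline(customers, steps))  # stands in for the destination
print(len(destination))  # 2
```

Keeping each processor small and single-purpose mirrors how you add one processor at a time with the "+" button, and makes each step easy to preview on its own.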

One of my favorite Talend self-service solutions is Data Preparation, and many of my customers would say the same. Talend Data Preparation (aka Data Prep) is an on-prem or cloud solution that is truly simple to use. It enables users to quickly identify data anomalies and speeds up the delivery of cleansed or enriched data, either by letting non-technical users collaborate with IT on pipeline solutions or by letting them enrich datasets for BI reporting without involving IT.

Some of the major benefits that make Talend Data Preparation my favorite solution are the following:

* Collaboration in development with non-technical users

* Replacement of STTMs (Source To Target Mappings)

* Self-service to profile, transform, and enrich data sets

These are the three major reasons why I enjoy using Talend Data Preparation. Its simple, easy-to-use interface looks like Microsoft Excel, so right off the bat there is nothing technical or intimidating about it. But let me break it down.

1. Talend Data Preparation is available both on-prem and in the cloud. Here I will focus on the cloud version. The on-prem version works very similarly, except for how it connects to data sources and how it collaborates with a developer's Talend Studio data pipelines. I will get into those differences in a later post.

2. As in steps 1-3 of the Pipeline Designer walkthrough, you will need to log in, this time to Talend Data Preparation. If you are using the on-prem solution, you will need to have it installed with a valid license.

3. Once you are logged in, add the datasets you want to profile, transform, or enrich. Talend lets users add different types of datasets, including local flat files (TXT, Excel, CSV, etc.), Talend jobs, databases, Amazon S3, and Salesforce.

4. After you have imported your datasets, you can create a "Preparation" in Talend's easy-to-use, grid-like graphical interface.

5. To create a "Preparation," simply click on the "Add Preparation" button. The "Add Preparation" button will allow you to profile, enrich and transform your source data.

6. Now that you have imported your datasets and created your first preparation, you will notice your data has been loaded into Talend's grid-like interface. What you may not notice right away is that Talend Data Preparation has already profiled the data for you and highlighted its analysis. What do I mean by this? Take a look at the following data highlights.

7. Column Discovery: Talend analyzes the data and does its best to determine each column's metadata. You can update this manually if the analysis is incorrect.

8. Quality Bar: Talend Data Preparation highlights each column in different colors: green for valid records, black for empty records, and orange for invalid records. You can also create your own rules for the analysis.

9. Functions: Talend Data Preparation lets users enrich and transform a dataset without changing the data at the source. Instead, you apply "FUNCTIONS" that add enrichment or transformation rules to the dataset.

10. Data Profiling: As mentioned earlier, another huge benefit of Talend Data Preparation is data profiling. Users can quickly profile individual columns to get a better understanding of the data.

11. Data Lookup: A self-service solution needs a way to look up values for validation or enrichment, and Talend Data Preparation makes this easy with its lookup feature.

12. Filters: In addition to lookups, you can add filters to the dataset. To do so, select the Filters option in the top middle-right area of Talend Data Preparation.

13. Recipe: Now here comes the best feature, the "Recipe." As you enrich and transform your data, Talend Data Preparation documents each transformation step. This means you no longer need to create lengthy STTMs, and you can see how the data will change when you apply a function. In addition, the recipe can be shared with a developer's ETL or ELT job in Talend Studio, allowing business users to collaborate with IT.

14. Exporting: The last beneficial feature is exporting. Talend Data Preparation lets users export data to multiple formats and destinations, including flat files, Tableau, and Amazon S3.
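To illustrate two of these ideas, the quality bar and the recipe, here is a minimal Python sketch. It is not Talend's API: the functions `quality`, `uppercase`, `lookup`, `keep_if`, and `apply_recipe`, the ZIP-code validation rule, and the state lookup table are all hypothetical. The point is that a recipe is an ordered, named list of steps replayed on a copy of the data, so the source stays untouched and the step descriptions read like a source-to-target mapping.

```python
import re
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]
Step = Callable[[List[Row]], List[Row]]


def quality(rows: List[Row], column: str, is_valid: Callable[[str], bool]) -> Dict[str, int]:
    """Quality-bar style counts for one column: valid, empty, and invalid values."""
    counts = {"valid": 0, "empty": 0, "invalid": 0}
    for row in rows:
        value = row.get(column, "").strip()
        if not value:
            counts["empty"] += 1
        elif is_valid(value):
            counts["valid"] += 1
        else:
            counts["invalid"] += 1
    return counts


# Hypothetical step builders standing in for Data Prep functions, lookups, and filters.
def uppercase(column: str) -> Step:
    return lambda rows: [{**r, column: r.get(column, "").upper()} for r in rows]


def lookup(column: str, table: Dict[str, str], new_column: str) -> Step:
    # Enrich each row with a value from a reference table ("" when not found).
    return lambda rows: [{**r, new_column: table.get(r.get(column, ""), "")} for r in rows]


def keep_if(column: str, predicate: Callable[[str], bool]) -> Step:
    return lambda rows: [r for r in rows if predicate(r.get(column, ""))]


def apply_recipe(rows: List[Row], recipe: List[Tuple[str, Step]]) -> List[Row]:
    """Replay every recipe step, in order, on a copy; the source rows are never modified."""
    result = [dict(r) for r in rows]
    for _description, step in recipe:
        result = step(result)
    return result


def is_us_zip(value: str) -> bool:
    return re.fullmatch(r"\d{5}", value) is not None


customers = [
    {"name": "ann", "state": "ny", "zip": "10001"},
    {"name": "bob", "state": "ca", "zip": ""},
    {"name": "cy", "state": "tx", "zip": "ABCDE"},
]
print(quality(customers, "zip", is_us_zip))  # {'valid': 1, 'empty': 1, 'invalid': 1}

recipe = [
    ("Change 'state' to upper case", uppercase("state")),
    ("Look up 'state_name' from 'state'",
     lookup("state", {"NY": "New York", "CA": "California"}, "state_name")),
    ("Keep rows where 'state_name' is not empty", keep_if("state_name", bool)),
]
prepared = apply_recipe(customers, recipe)
for description, _ in recipe:
    print(description)  # the recipe doubles as readable mapping documentation
print([r["name"] for r in prepared])  # ['ann', 'bob']
print(customers[0]["state"])  # ny (source unchanged)
```

Because each step is just a named function, the same recipe can be replayed on tomorrow's data, which is the property that makes sharing a recipe with a developer useful.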

Hopefully, you enjoyed this blog post. Please leave a comment with any feedback or questions. If you are interested in learning more about Pingahla's Talend services, please contact sales@pingahla.com.
