How To Upload Files to an Amazon S3 Bucket Using Talend Studio
This step-by-step instructional video by #Pingahla's #Talend-certified Kuldeep Singh shows you how to create an ETL data pipeline using Talend Studio and its #Amazon #S3 bucket connector.
The steps below show you how to build such a data pipeline. Alternatively, scroll to the bottom of this blog post to watch our YouTube instructional video on the same topic.
Open Talend Studio and either use the Local connection or select another connection. In this example, we select Local and an existing project, "Local_Project." You can do the same or select any other project in which you want to create the data pipeline.
Once Talend Studio has started, make sure you are in the Integration perspective. If it is not already selected, switch to it by going to Window > Perspective > Integration, or by clicking Integration on the taskbar, as shown in the picture below:
Create a new Job by right-clicking on Repository > Job Designs, then selecting Create Standard Job.
Enter the Job details and click Finish when done. Always add a description for the Job; it helps others understand the purpose of the Job in a collaborative environment.
Now we will start adding the components from the Palette to the Job.
For the purpose of this guide, we will generate data using the tRowGenerator component available in the Talend Studio Palette. The generated data will be written to a delimited file, and that file will then be uploaded to an S3 bucket.
First, we will start creating a file with dummy data using the tRowGenerator component.
In the Palette view, begin typing the name of the component, in this case, tRowGenerator, into the Find Component box and then click the Search button. You can also press the Enter key.
You can also click anywhere in the Designer and start typing the name of the component to add it.
Once the component is added to the Designer, double-click it to open it. Here we will define the structure of the data to be generated.
· Add as many columns to your schema as needed, using the plus (+) button.
· Type the names of the columns to be created in the Columns area, and select the Key check box if you wish to define a key for the generated data.
· Define the nature of the data contained in each column by selecting its Type from the list. The list of Functions offered depends on the type you select, so this setting is required.
Once done, click OK to close the dialog box.
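Conceptually, tRowGenerator produces rows that follow the schema you defined, with each column filled in by a function that matches its type. The following is a minimal Java sketch of that idea, not the code Talend generates. The column names (id, name, amount), the fixed random seed, and the class name RowGeneratorSketch are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative only: mimics the idea behind tRowGenerator, not Talend's generated code.
class RowGeneratorSketch {

    // Hypothetical schema: id (Integer, key), name (String), amount (Double).
    static final class Row {
        final int id;
        final String name;
        final double amount;

        Row(int id, String name, double amount) {
            this.id = id;
            this.name = name;
            this.amount = amount;
        }
    }

    // Generates `count` rows; a fixed seed keeps the dummy data reproducible.
    static List<Row> generate(int count, long seed) {
        Random random = new Random(seed);
        List<Row> rows = new ArrayList<>();
        for (int i = 1; i <= count; i++) {
            String name = "name_" + random.nextInt(1000);     // random String column
            double amount = random.nextInt(100_000) / 100.0;  // random Double column
            rows.add(new Row(i, name, amount));               // sequential key column
        }
        return rows;
    }

    public static void main(String[] args) {
        // Print five dummy rows, one per line, with semicolon-separated fields.
        for (Row row : generate(5, 42L)) {
            System.out.println(row.id + ";" + row.name + ";" + row.amount);
        }
    }
}
```

In the Studio itself, the equivalent choices are made in the tRowGenerator editor: the column type restricts which generator functions you can pick for that column.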
To write the data generated by tRowGenerator to a flat file on your local system, we will use the tFileOutputDelimited component.
Just follow the same steps that we used to add the tRowGenerator component.
Once added, your Job should look like this.
Now we will connect the tRowGenerator component to the tFileOutputDelimited component. To do that, right-click the tRowGenerator_1 component, then select Row > Main. Move the mouse over the tFileOutputDelimited_1 component and click on it.
Once done, your Job should look like this.
Now let’s configure the tFileOutputDelimited component. To do that, double-click the tFileOutputDelimited_1 component; you will notice that the Component view opens at the bottom of the screen.
Provide the absolute path of the file to which you want to write the generated data.
In this example, it is "C:/Users/PINGAHLA/Desktop/Talend_Demo/demo.txt"
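What this step does can be sketched in plain Java: each row becomes one line, with its fields joined by a separator. This is a rough illustration under stated assumptions, not Talend's generated code. The class name DelimitedFileSketch, the sample rows, and the semicolon separator are assumptions; use whatever separator your component is configured with.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: writes rows to a delimited text file, one row per line.
class DelimitedFileSketch {

    // Joins the fields of one row with the given separator.
    static String toLine(List<String> fields, String separator) {
        return String.join(separator, fields);
    }

    // Writes all rows to `target`, creating parent directories if needed.
    static void write(Path target, List<List<String>> rows, String separator) throws IOException {
        Path parent = target.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        List<String> lines = new ArrayList<>();
        for (List<String> row : rows) {
            lines.add(toLine(row, separator));
        }
        Files.write(target, lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for the absolute path set in the component.
        Path target = Files.createTempFile("demo", ".txt");
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("1", "name_17", "12.5"),
                Arrays.asList("2", "name_230", "99.0"));
        write(target, rows, ";");
        System.out.println("Wrote " + rows.size() + " rows to " + target);
    }
}
```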
Now let’s add the tS3Connection and tS3Put components to the Job.
Next, let’s connect the initial Subjob, which creates the file, to the tS3Connection component.
Right-click the tRowGenerator_1 component, then select Trigger > On Subjob Ok. Move the mouse over the tS3Connection_1 component and click on it.
Similarly, right-click the tS3Connection_1 component, then select Trigger > On Subjob Ok. Move the mouse over the tS3Put_1 component and click on it. Your final Job should look like below.
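The On Subjob Ok triggers chain the three parts of the Job so that each one starts only after the previous one has finished without error. A minimal Java sketch of that control flow follows. The class name SubjobChainSketch and the use of Runnable to stand in for a Subjob are assumptions for illustration; Talend's actual runtime is more involved.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: models "run the next Subjob only if the previous one succeeded".
class SubjobChainSketch {

    // Runs the subjobs in insertion order and stops at the first failure.
    // Returns the names of the subjobs that completed successfully.
    static List<String> runChain(Map<String, Runnable> subjobs) {
        List<String> completed = new ArrayList<>();
        for (Map.Entry<String, Runnable> entry : subjobs.entrySet()) {
            try {
                entry.getValue().run();
            } catch (RuntimeException failure) {
                // A failed subjob breaks the chain, so later subjobs never start.
                break;
            }
            completed.add(entry.getKey());
        }
        return completed;
    }

    public static void main(String[] args) {
        // LinkedHashMap preserves the order in which the subjobs are linked.
        Map<String, Runnable> job = new LinkedHashMap<>();
        job.put("tRowGenerator_1", () -> System.out.println("generate rows and write the file"));
        job.put("tS3Connection_1", () -> System.out.println("open the S3 connection"));
        job.put("tS3Put_1", () -> System.out.println("upload the file"));
        System.out.println("Completed: " + runChain(job));
    }
}
```

The practical consequence is that if writing the local file fails, the connection and upload steps are never attempted.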
Now let’s configure the tS3Connection and tS3Put components.
Double-click tS3Connection_1 to open its Basic settings view on the Component tab.
In the Access Key and Secret Key fields, enter the authentication credentials required to access Amazon S3. Ensure that the values are enclosed in double-quotes.
The tS3Connection component should have all the details as above.
Double-click the tS3Put component to open its Basic settings view on the Component tab.
Select the Use an existing connection check box to reuse the Amazon S3 connection information you have defined in the tS3Connection component.
In the Bucket field, enter the name of the S3 bucket where the object needs to be uploaded. In this example, it is talend-data and the bucket is already present in Amazon S3.
Note: Ensure that the bucket already exists in your Amazon S3 account.
In the Key field, enter the key for the object to be uploaded. In this example, it is demo.
Note: This field determines the name under which the uploaded file is stored in the S3 bucket.
In the File field, browse to or enter the path to the object to be uploaded. In this example, it is "C:/Users/PINGAHLA/Desktop/Talend_Demo/demo.txt"
Press Ctrl + S to save the Job. Press F6 to run the Job.
On a successful run, the run details should look like below:
Log in to your S3 bucket and you will find that the file has been uploaded. If you want to upload the file to a particular folder in your S3 bucket, enter the entire path along with the filename in the Key field. In this example, it is “demo/demo.txt”.
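S3 stores objects in a flat namespace, so a "folder" is simply a prefix in the object key separated by forward slashes. The sketch below shows one way to assemble such a key from a folder and a filename. The helper buildKey and the class S3KeySketch are hypothetical names introduced here for illustration; they are not part of Talend or of any AWS library.

```java
// Illustrative only: builds an S3 object key from an optional folder prefix and a filename.
class S3KeySketch {

    // Returns "folder/fileName", or just fileName when no folder is given.
    // Leading and trailing slashes on the folder are dropped to avoid empty path segments.
    static String buildKey(String folder, String fileName) {
        if (folder == null || folder.isEmpty()) {
            return fileName;
        }
        String prefix = folder.replaceAll("^/+|/+$", "");
        return prefix.isEmpty() ? fileName : prefix + "/" + fileName;
    }

    public static void main(String[] args) {
        System.out.println(buildKey("demo", "demo.txt"));
        System.out.println(buildKey("", "demo.txt"));
    }
}
```

In the Job, the resulting string is what you would type into the Key field of tS3Put, enclosed in double quotes.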
Press Ctrl + S to save the Job. Press F6 to run the Job.
Log in to your S3 bucket again and you will find that the file has been uploaded to the folder given in the Key field.
Check out our instructional video on how to create an ETL data pipeline using Talend Studio and its #Amazon #S3 bucket connector.