Combine multiple files into one
This tutorial will outline the steps
required to build a Talend job to append multiple csv files of the same format
into a single combined master csv file.
Sample csv files used in this
tutorial can be downloaded from these links:
Lets get started by assuming we are
trying to load data from CSV files (like above) that contain the following
fields:
• ItemId
• ItemName
• Qty
• DateofPurchase
• Price
The file would look something like
this:
We want to load this file into a CSV
file with a schema as follows:
Next we take the following
steps to build our Talend Open Studio solution:
Step 1: Open Talend
Open Talend and create or open an existing project
Open Talend and create or open an existing project
Step 2: Create a new job
Right click on Job Designs in the Repository window and select “Create job”
Right click on Job Designs in the Repository window and select “Create job”
Name the job “item_load”
Step 3: Create a File Delimited
repository element
Now we need to create a repository item for our CSV file. To do this click the arrow next to “Metadata” in the Repository window and right click on “File delimited” and select “Create file delimited”.
Now we need to create a repository item for our CSV file. To do this click the arrow next to “Metadata” in the Repository window and right click on “File delimited” and select “Create file delimited”.
Enter a name for your example file
schema and click next:
Select your example file that we saw
earlier (or use your own) by clicking the “Browse” button.
Click Next
On the next screen select field
separator as comma (as per your file data) and select the checkbox for “Set
heading row as column names”, then click “Refresh Preview”. Click Next.
Talend has now generated an
estimated schema, review this schema and make any changes as you would like,
then click Finish.
Step 5: Design your job
In this step we are going to design
our job to connect our CSV file to our Master CSV file.
Open the job we created in Step 2 by double clicking the name of the job under Job Designs in the Repository window.
In the palette window on the right hand side type “fileinputDelimited” into the search box, then drag and drop the component “tFileInputDelimited” into the job window in the center of the screen.
Select the tFileInputDelimited
Component in the job window and then select the “Component” tab near the bottom
middle of the screen.
Click the drop down box next to “Property type” and select “Repository”
Then click the button with three dots that appears and select the delimited file from the repository that we created in Step 3.
Select a tMap component from the
palette on the right hand side of the screen and drag it into the job design
window
Right click on the tFileinputDelimited component, select Row->Main and connect a row to the tMap component.
Select a tFileoutputDelimited
component from the Palette and drag it over to the job design window.
Click on the tFileOutputDelimited
component to select it, then navigate to the Component tab in the lower middle
of the screen
Under File Name, set the file location where you want to store the master file
Set the field separator to set your required delimiter.
Set append check box to append the data to existing file instead of creating new file every time.
Note: while running job for multiple files, All input files data will be appended to master file with this append check box
Check include header to write master file with header column.
Next right click the tMap component,
Select “Row” then “New Output”.
Connect this new output to the tFileOutputDelimited component
Name the output row- “output1″
Connect this new output to the tFileOutputDelimited component
Name the output row- “output1″
Now your tMap is connected to the tFileOutputDelimited component, double click the tMap component to open up the tMap editor.
Click and drag the data fields from the “row1″ panel to the left of the screen to the corresponding “output1″ data fields. This tMap editor enables you to map to fields from the CSV input to the correct output fields in Master CSV files.
Click “Apply” then “OK” to save the
changes and return to the Job Design window
Now your job design is complete and we just need to run the job to load the file into Master CSV File
Step 6: Run your Job
In the group of tabs in the lower
middle of the screen, select the “Run” tab
Under Execution, click the “Run”
button
Your job will load and then run and you will see how many rows were processed in the job design window.
Congratulations! You have now loaded
your one file into your Master File.
This job is great but it is only set
up to process a specific CSV file which we designate. Our next step is to
extend this job so that it will process multiple CSV files of the same format
from a specified directory.
Step 7: Set up the File Directory for
Iteration
Select the component tFileList from
the Palette and drag it into the job design window
Click on the tFileList component and then click on the “Component” tab near the bottom center of the screen
Under “Directory”, select the directory that contains the excel files by clicking on the “…” button.
Step 8: Connect tFileList
Right click on the tFileList component, select Run -> Iterate
Connect this “Iterate” link to the tFileInputDelimited component
Right click on the tFileList component, select Run -> Iterate
Connect this “Iterate” link to the tFileInputDelimited component
Step 9: Edit the tFileInputDelimited
field
Select the tFileInputDelimited component and then navigate to the Component tab near the bottom middle of the screen
Select the tFileInputDelimited component and then navigate to the Component tab near the bottom middle of the screen
Change the Property type from “Repository” to “Built In”
Under “File name / Stream” delete the filepath string
Type in tfilelist then press cntrl + space
Select tFileList_1.CURRENT_FILEPATH
Step 10: Run the job
Congratulations! You are now ready to run your job.
Congratulations! You are now ready to run your job.
Make sure you have multiple CSV files in your directory
Select the “Run” tab near the bottom middle of the screen and click the “Run” button
Now you will see all your files are processed one by one and loaded into the Master File
No comments:
Post a Comment