Data profiling is the process of reviewing source data, understanding its structure, content and interrelationships, and identifying its potential for data projects.
Data Profiling in Talend Data Quality (TDQ):
In TDQ we can perform data profiling by using the following methods.
1. Structural Analysis
2. Cross Table Analysis
3. Table Analysis
4. Column Analysis
5. Correlation Analysis
Data Profiling using Data Quality (for example, using…File Delimited):
First, save the data in CSV format (REVENUE_INSIGHTS_REPORTS.CSV), as shown below.
Open the Data Quality tool.
Select “File Delimited” and click the “Create file delimited connection” option.
Then enter a name for the file connection and click “NEXT”.
Enter the file path “C:/Users/SSS2019079/Desktop/Revenue Insights Reports”, set the format to “WINDOWS” and click “NEXT”.
Set the Field Separator to “,”, the Header to “1”, the Format to “CSV” and the Text Enclosure to “\"”.
Check “Set heading row as column names” and click “Refresh Preview”.
Then click “NEXT”.
Enter the schema name “Revenue insights”. The structure of the table is shown in the description of the schema.
Click “Finish”.
The schema, table and columns are now available in the DQ Repository, under Metadata.
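To sanity-check the same file outside the tool, the connection settings above (field separator “,”, one header row, text enclosure “"”) correspond roughly to the Python sketch below. This is not what TDQ runs internally. The sample rows and column names are invented for illustration, since the post does not list the real columns, and `read_delimited` is a hypothetical helper name.

```python
import csv
import io

# Hypothetical sample standing in for REVENUE_INSIGHTS_REPORTS.CSV;
# the real column names are not given in this post.
SAMPLE = 'region,product,revenue\n"North, East",Widget,100\nSouth,Gadget,\n'

def read_delimited(text, separator=",", enclosure='"'):
    """Return (column_names, rows), using the first row as the header."""
    reader = csv.reader(io.StringIO(text, newline=""),
                        delimiter=separator, quotechar=enclosure)
    header = next(reader)  # Header = 1: the first row holds the column names
    rows = [dict(zip(header, record)) for record in reader]
    return header, rows

columns, rows = read_delimited(SAMPLE)
print(columns)            # column names taken from the heading row
print(rows[0]["region"])  # the enclosure keeps the embedded comma in one field
```

The text enclosure matters whenever a value contains the separator itself: without it, “North, East” would be split into two fields and the columns would shift.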
Create a new analysis under “Data profiling”.
Select “Column Analysis”, then “Basic Column Analysis”, and click “NEXT”.
Enter the analysis name “Revenue_insights_CA” and click “FINISH”.
Set the connection to “Revenue_insights_report File delimited”.
Choose the columns to analyze using the “SELECT COLUMN” option, as shown in the screenshots below, and click “OK”.
Choose the indicators using the “SELECT INDICATORS” option.
Under Simple statistics, select the indicators to apply to each column, as shown below:
Row count
Null count
Distinct count
Unique count
Duplicate count
Blank count
Default value count
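As a rough illustration of what these indicators measure, here is a minimal Python sketch that computes them for a single column. It rests on assumptions, not on the tool's implementation: `None` stands for a null, a blank is a non-null value that is empty or only whitespace, and nulls are left out of the distinct, unique and duplicate counts. Default value count is omitted because it depends on a column default defined in the data source. The function name `simple_statistics` and the sample values are hypothetical.

```python
from collections import Counter

def simple_statistics(values):
    """Compute simple-statistics style counts for one column of values."""
    non_null = [v for v in values if v is not None]
    frequencies = Counter(non_null)  # value -> number of occurrences
    return {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        # Number of different values present in the column.
        "distinct_count": len(frequencies),
        # Values that occur exactly once.
        "unique_count": sum(1 for n in frequencies.values() if n == 1),
        # Values that occur more than once (each counted once).
        "duplicate_count": sum(1 for n in frequencies.values() if n > 1),
        # Non-null entries that are empty or whitespace only.
        "blank_count": sum(1 for v in non_null if str(v).strip() == ""),
    }

column = ["A", "B", "A", None, "", "C", "B", "  "]
stats = simple_statistics(column)
for name, count in stats.items():
    print(f"{name}: {count}")
```

Under these definitions the unique count and the duplicate count always add up to the distinct count, which is a quick consistency check on any analysis result.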
ROW COUNT:
For example, tick Row count for all columns and click “OK”.
ANALYSIS RESULTS:
NULL COUNT:
For example, tick Null count for all columns and click “OK”.
ANALYSIS RESULTS:
DISTINCT COUNT:
For example, tick Distinct count for all columns and click “OK”.
ANALYSIS RESULTS:
UNIQUE COUNT:
For example, tick Unique count for all columns and click “OK”.
ANALYSIS RESULTS:
VIEW VALUES:
VIEW ROWS:
DUPLICATE COUNT:
For example, tick Duplicate count for all columns and click “OK”.
ANALYSIS RESULTS:
VIEW VALUES:
VIEW ROWS:
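The value and row drill-downs for the duplicate count can be pictured with a short Python sketch: one function lists the values that occur more than once, the other returns the rows that contain them. This only approximates what the tool displays. The sample rows, the column names `id` and `region`, and both function names are hypothetical.

```python
from collections import Counter

def duplicate_values(rows, column):
    """Values of `column` that occur in more than one row (nulls ignored)."""
    counts = Counter(row[column] for row in rows if row[column] is not None)
    return sorted(value for value, n in counts.items() if n > 1)

def rows_with_duplicates(rows, column):
    """Rows whose `column` value is one of the duplicated values."""
    duplicated = set(duplicate_values(rows, column))
    return [row for row in rows if row[column] in duplicated]

rows = [
    {"id": 1, "region": "North"},
    {"id": 2, "region": "South"},
    {"id": 3, "region": "North"},
    {"id": 4, "region": None},
]

dup_values = duplicate_values(rows, "region")
dup_rows = rows_with_duplicates(rows, "region")
print(dup_values)                     # the duplicated values
print([row["id"] for row in dup_rows])  # ids of the rows holding them
```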
BLANK COUNT:
For example, tick Blank count for all columns and click “OK”.
ANALYSIS RESULTS: