Friday, September 18, 2020

Data Profiling

Data profiling is the process of reviewing source data to understand its structure, content, and interrelationships, and to identify its potential for data projects.

Data Profiling in Talend Data Quality (TDQ):

In TDQ, data profiling can be performed using the following types of analysis:

 

1. Structural Analysis
2. Cross Table Analysis
3. Table Analysis
4. Column Analysis
5. Correlation Analysis

 

Data Profiling Using Data Quality (Example: a Delimited File):

First, save the data in .CSV format (REVENUE_INSIGHTS_REPORTS.CSV), as shown below.

 

Open the Data Quality tool.

Select “File Delimited” and click the “Create file delimited connection” option.

 

 

Then enter a name for the file connection and click “Next”.

Enter the location path of the file, “C:/Users/SSS2019079/Desktop/Revenue Insights Reports”, set the format to “WINDOWS”, and click “Next”.

Set the following options:

Field Separator: “,”
Header: “1”
Format: “CSV”
Text Enclosure: “\””

Then check “Set heading row as column names”, click “Refresh Preview”, and click “Next”.
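The wizard settings above (comma field separator, one header row, double-quote text enclosure, heading row used as column names) correspond roughly to how Python's standard csv module reads a file. The sketch below only illustrates what those settings mean; it is not how Talend parses the file. The sample rows and column names are hypothetical, since the real contents of REVENUE_INSIGHTS_REPORTS.CSV are not shown in this post.

```python
import csv
import io

# Hypothetical sample standing in for REVENUE_INSIGHTS_REPORTS.CSV;
# the real file's columns and rows are not shown in this post.
SAMPLE = (
    "region,product,revenue\r\n"
    '"North, East",Widget,1200\r\n'
    "South,Gadget,800\r\n"
)


def read_delimited(text, separator=",", enclosure='"'):
    """Parse delimited text whose first row is a header (Header = 1).

    The header row supplies the column names, mirroring the
    "Set heading row as column names" check box.
    """
    reader = csv.reader(
        io.StringIO(text, newline=""),  # keep CRLF intact for the csv module
        delimiter=separator,
        quotechar=enclosure,
    )
    rows = list(reader)
    columns = rows[0]
    records = [dict(zip(columns, row)) for row in rows[1:]]
    return columns, records


columns, records = read_delimited(SAMPLE)
print(columns)               # ['region', 'product', 'revenue']
print(records[0]["region"])  # North, East
```

The text enclosure is what keeps “North, East” as a single field even though it contains the separator. The “WINDOWS” format presumably refers to CRLF line endings, which is why the sample uses them; Python's csv reader accepts both CRLF and LF.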

 

 

 

Enter the schema name “Revenue insights” and check the structure of the table in the schema description.

Then click “Finish”.

 

 

 

The schema, table, and columns are now available in the DQ Repository under Metadata.

 

 

 

Create a new analysis in “Data Profiling”.

 


 

Select “Column Analysis”, then “Basic Column Analysis”, and click “Next”.

 

 

Name the analysis “Revenue_insights_CA” and click “Finish”.

   

Select the connection “Revenue_insights_report” (File Delimited).

 

Select the columns with the “SELECT COLUMN” option, as shown in the screenshots below, and click “OK”.

 

 

 

Select indicators with the “SELECT INDICATORS” option.

 

Select the following Simple Statistics indicators for the columns, as shown below:

Row Count
Null Count
Distinct Count
Unique Count
Duplicate Count
Blank Count
Default Value Count

 

ROW COUNT:

For example, select Row Count for all columns and click “OK”.
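Row Count is simply the number of rows in a column, nulls and blanks included. A minimal Python sketch, using a hypothetical REGION column in which None stands in for a null:

```python
# Hypothetical column values; None stands in for a null.
REGION = ["North", "South", None, "North", "  ", "East"]


def row_count(values):
    """Row Count: every row, including nulls and blank strings."""
    return len(values)


print(row_count(REGION))  # 6
```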

 

 

 

ANALYSIS RESULTS:

 

 

NULL COUNT:

For example, select Null Count for all columns and click “OK”.
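Null Count counts rows that hold no value at all, which is different from a blank string. The sketch below uses the same hypothetical REGION column, with None as the null marker. Note that Python's csv module returns an empty field as "" rather than None, so whether an empty CSV field counts as null depends on how the tool reads the file; that mapping is an assumption here.

```python
# Hypothetical column values; None stands in for a null.
REGION = ["North", "South", None, "North", "  ", "East"]


def null_count(values):
    """Null Count: rows holding a null (None); blank strings are not nulls."""
    return sum(1 for v in values if v is None)


print(null_count(REGION))  # 1
```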

 

ANALYSIS RESULTS:

 

 

 

DISTINCT COUNT:

For example, select Distinct Count for all columns and click “OK”.
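Distinct Count is the number of different values in a column, with each value counted once however often it repeats. In this sketch nulls are left out of the count; that is an assumption, and TDQ's own handling of nulls may differ. The REGION column is hypothetical.

```python
# Hypothetical column values; None stands in for a null.
REGION = ["North", "South", None, "North", "  ", "East"]


def distinct_count(values):
    """Distinct Count: number of different values (nulls excluded, an assumption)."""
    return len({v for v in values if v is not None})


print(distinct_count(REGION))  # 4
```

“North” appears twice but is counted once, and the whitespace-only value “  ” counts as a value of its own.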

 

 

ANALYSIS RESULTS:

 

 

 

 

 

UNIQUE COUNT:

For example, select Unique Count for all columns and click “OK”.
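Unique Count is easy to confuse with Distinct Count: distinct counts every different value once, while unique counts only the values that occur exactly once. A minimal sketch on the same hypothetical REGION column, again ignoring nulls as an assumption:

```python
from collections import Counter

# Hypothetical column values; None stands in for a null.
REGION = ["North", "South", None, "North", "  ", "East"]


def unique_count(values):
    """Unique Count: values that occur exactly once (nulls ignored, an assumption)."""
    counts = Counter(v for v in values if v is not None)
    return sum(1 for n in counts.values() if n == 1)


print(unique_count(REGION))  # 3
```

Here “North” is distinct but not unique, because it appears twice, so the unique count (3) is lower than the distinct count (4).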

 

 

ANALYSIS RESULTS:

 

 

VIEW VALUES:

 

 

VIEW ROWS:
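The VIEW VALUES and VIEW ROWS drill-downs differ in what they return: the first lists the matching values themselves, the second lists the complete rows that contain them. The Python sketch below only mimics that idea on hypothetical in-memory rows for the unique-value case; it is not how TDQ implements the drill-down, and the helper names are illustrative.

```python
from collections import Counter

# Hypothetical rows; "region" is the profiled column, "id" is another column.
ROWS = [
    {"id": 1, "region": "North"},
    {"id": 2, "region": "South"},
    {"id": 3, "region": None},
    {"id": 4, "region": "North"},
    {"id": 5, "region": "East"},
]


def unique_values(rows, column):
    """Like VIEW VALUES: values of `column` appearing exactly once (nulls ignored)."""
    counts = Counter(r[column] for r in rows if r[column] is not None)
    return sorted(v for v, n in counts.items() if n == 1)


def rows_with_unique_values(rows, column):
    """Like VIEW ROWS: whole rows whose `column` value is unique."""
    keep = set(unique_values(rows, column))
    return [r for r in rows if r[column] in keep]


print(unique_values(ROWS, "region"))                               # ['East', 'South']
print([r["id"] for r in rows_with_unique_values(ROWS, "region")])  # [2, 5]
```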

 

 

 

 

DUPLICATE COUNT:

For example, select Duplicate Count for all columns and click “OK”.
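Duplicate Count is the number of values that occur more than once. In this sketch it counts values, not extra rows: a value repeated three times adds 1, not 2. With that definition, and nulls ignored as an assumption, every distinct value is either unique or duplicated, so Distinct Count = Unique Count + Duplicate Count. The REGION column is hypothetical.

```python
from collections import Counter

# Hypothetical column values; None stands in for a null.
REGION = ["North", "South", None, "North", "  ", "East", "South"]


def _value_counts(values):
    """Occurrences of each non-null value (nulls ignored, an assumption)."""
    return Counter(v for v in values if v is not None)


def duplicate_count(values):
    """Duplicate Count: number of values that occur more than once."""
    return sum(1 for n in _value_counts(values).values() if n > 1)


def unique_count(values):
    """Unique Count: number of values that occur exactly once."""
    return sum(1 for n in _value_counts(values).values() if n == 1)


def distinct_count(values):
    """Distinct Count: number of different values."""
    return len(_value_counts(values))


print(duplicate_count(REGION))  # 2
# Every distinct value is either unique or duplicated.
assert distinct_count(REGION) == unique_count(REGION) + duplicate_count(REGION)
```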

  

 

 

ANALYSIS RESULTS:

 

VIEW VALUES:

 

 

 

VIEW ROWS:

 

 

 

 

 

BLANK COUNT:

For example, select Blank Count for all columns and click “OK”.
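Blank Count targets values that are present but carry no content. The sketch below treats a value as blank when it is not null and is empty or contains only whitespace; counting the empty string as blank is an assumption. The REGION column is hypothetical, with None as the null marker.

```python
# Hypothetical column values; None stands in for a null.
REGION = ["North", "", None, "North", "  ", "East"]


def blank_count(values):
    """Blank Count: non-null text that is empty or contains only whitespace."""
    return sum(1 for v in values if v is not None and v.strip() == "")


print(blank_count(REGION))  # 2
```

The null in the third row is not counted, which keeps Blank Count and Null Count from overlapping.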

 

 

 

 

ANALYSIS RESULTS:

 

 

 

 

