
BODS Table Comparison: Generated Key Column


To configure the transform: open the Table Comparison, pull all of the composite key columns from the source into the input primary key list, and on the right side add all of the columns you want to compare. Check 'Input contains duplicate keys' if the source can deliver duplicates. In the target table's options (double-click the target), enable 'Use input keys'. If the record counts are large, consider 'Sorted input'. For the Generated key column, specify a column of the comparison table that has unique values, i.e., by design contains no duplicate keys. A generated key column indicates which row of a set containing identical primary keys is to be used in the comparison. This provides a method of handling duplicate keys in the comparison table.
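The role of the generated key column can be sketched in a few lines of Python. This is an illustrative model, not BODS internals: among comparison-table rows sharing the same primary key, the row with the largest generated key value is the one used for comparison. The column and table names here are invented for the example.

```python
# Hypothetical sketch: how a generated key column disambiguates duplicate
# primary keys. Among rows with the same EMP_ID, the row with the largest
# GEN_KEY (the "latest" version) is used for the comparison.

def pick_comparison_row(comparison_table, primary_key, key_col="GEN_KEY"):
    """Return the row with the highest generated key for a primary key."""
    candidates = [r for r in comparison_table if r["EMP_ID"] == primary_key]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r[key_col])

comparison_table = [
    {"EMP_ID": 1010, "GEN_KEY": 1, "EMPNAME": "RAJ"},
    {"EMP_ID": 1010, "GEN_KEY": 2, "EMPNAME": "RAJ WRITER"},  # later version
]

print(pick_comparison_row(comparison_table, 1010)["EMPNAME"])  # RAJ WRITER
```

Without a generated key column, the transform has no deterministic way to choose between the two 1010 rows above.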

Here I am going to explain loading from a text file and the usage of the Query, Table Comparison, Map Operation, and Key Generation transforms in a single job.

We have the below employee details in a text file. The information that we have is Employee ID, Employee Name, Department, Salary and Age.

We need to design a job to load this information into a table named Employee_learning with the structure shown below. Once this data is loaded into the table, every update to the Name, Department, Salary, or Age of an employee should be treated as a new insert into the table. The way we can accomplish this scenario is explained further below.

Target Table structure.

*) Create a new job with a workflow and data flow in it.

*) Put the source file in a location.

*) Right-click on Flat Files and click New. The window below will pop up.

*) In the Data File(s) section of the new window, select 'Job Server' from the Location dropdown and specify the root directory on the job server where the text file is placed.

Once the source file is specified, BODS automatically populates the flat file window with the data from the source file. Remember to set the column delimiter to match the data; here I have chosen Tab. You can edit the field names, data types, lengths, etc. as needed for the data available. Click Save and Close.

*) Drag the newly created flat file into your data flow and make it the source. Drag in a Query transform and map the columns from the source file.

*) Drag a table comparison transform and connect the output of the query transform to the Table Comparison transform.

Select the target table, specify the input primary key column(s), and list the compare columns as shown below.

Note: Differences between the comparison methods.

  1. In row-by-row select mode, the Table Comparison transform executes a SELECT statement for every single input row to look up the value in the comparison table.
  2. In sorted input mode, we guarantee that the data arrives sorted ascending by the columns listed as input primary key columns. DI then executes one SELECT statement for the entire comparison table, ordered by the input primary key columns ascending plus the generated key column descending, if specified. The advantage: just one SQL statement is executed and no memory is required in the engine.
  3. In cached comparison mode, the transform collects the comparison table data and indexes it in memory, then looks up rows inside the cache.
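The trade-off between the modes can be illustrated with a toy model (an assumption-laden sketch, not BODS internals): row-by-row mode pays one database lookup per input row, while cached mode builds an in-memory index of the comparison table once and then answers each input row with a cheap probe.

```python
# Illustrative contrast between comparison modes. In cached comparison
# mode, the comparison table is indexed in memory once...
comparison_rows = [
    {"EMP_ID": 1, "SALARY": 1000},
    {"EMP_ID": 2, "SALARY": 2000},
]
cache = {row["EMP_ID"]: row for row in comparison_rows}

def lookup_cached(emp_id):
    # ...so every input row is a dictionary probe, not a SQL round-trip
    # (row-by-row mode would issue one SELECT per input row instead).
    return cache.get(emp_id)

print(lookup_cached(2)["SALARY"])  # 2000
print(lookup_cached(99))           # None -> this row would become an Insert
```

The cost of cached mode is the memory needed to hold the index, which is why it suits comparison tables that fit comfortably in the engine's memory.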

Here I have selected ‘Cached Comparison Table’.

The Map Operation transform allows conversion between data manipulation operations (opcodes). Connect the Table Comparison output to a Map_Operation transform and map Update to Insert as shown below.
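Conceptually, what a Map_Operation configured with "Update -> Insert" does can be sketched as a simple opcode remapping (a hedged model, not the actual transform code): rows tagged with an Update opcode are re-tagged as Insert so they load as new rows in the target.

```python
# Sketch of a Map_Operation with Update mapped to Insert.
# Opcodes: 'I' = Insert, 'U' = Update, 'D' = Delete.
OPCODE_MAP = {"I": "I", "U": "I", "D": "D"}  # Update re-tagged as Insert

def map_operation(tagged_rows):
    """Re-tag each (opcode, row) pair according to OPCODE_MAP."""
    return [(OPCODE_MAP[op], row) for op, row in tagged_rows]

rows = [("I", {"EMP_ID": 1020}), ("U", {"EMP_ID": 1010})]
print(map_operation(rows))
# [('I', {'EMP_ID': 1020}), ('I', {'EMP_ID': 1010})]
```

This is exactly why every change to an existing employee ends up as a new row in Employee_learning rather than overwriting the old one.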

Map the output of the Map Operation to a Key Generation transform and select the target table, generated key column, and increment value as shown below.
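The Key Generation transform's behavior amounts to: read the current maximum of the generated key column from the target table, then assign max + increment to each incoming row. A rough Python sketch (names are illustrative, not BODS API):

```python
# Rough model of Key_Generation: continue numbering from the current
# maximum of the generated key column, stepping by the increment value.

def generate_keys(existing_keys, incoming_rows, increment=1):
    """Assign a fresh surrogate key to each incoming row."""
    next_key = max(existing_keys) if existing_keys else 0
    out = []
    for row in incoming_rows:
        next_key += increment
        out.append({**row, "GEN_KEY": next_key})
    return out

rows = generate_keys([1, 2, 3], [{"EMP_ID": 1010}, {"EMP_ID": 1011}])
print([r["GEN_KEY"] for r in rows])  # [4, 5]
```

Because every Update was re-mapped to Insert upstream, each changed employee record receives a brand-new surrogate key here.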

Now our job will look like this.

Now let’s execute the job and see the data loaded into the target table.

Let’s see what happens to the data by adding a new row to the source file as well as updating an existing record in it.

Here I have added a new row and also updated the salary of row number 3.

Let’s see what happens to the data in the table by executing the job once again.

We will have the updated row and the newly inserted row added to the table as shown below.

Hello Experts,

I was trying to understand Table Comparison’s ‘Input contains duplicate keys’ option, and after my experiments I found the results below.

TABLE COMPARISON: – Input Contain Duplicate Keys

When records coming from the source have multiple rows with the same primary key value, we use the ‘Input contains duplicate keys’ option of Table Comparison to process/handle them.

In other words, if the column(s) specified under Input primary key columns do not uniquely identify every incoming row, we use this option to handle the duplicate keys.

How does Table Comparison process the records when this option is checked?

Initially, before job execution, both the before image and the after image are empty. When the job is executed, data from the comparison table is loaded into the before image of the Table Comparison, and from the before image the correct record(s), as per the generated opcodes (Insert, Update, or Delete), are sent to the after image, which is nothing but our target table. On execution, the transform fires a SELECT statement based on the columns present in Input primary key columns, and all the records are brought into the before image buffer.

If more than one column is specified in the Input primary key columns section, the SQL looks like:

select col1, col2, … from target_table where (input primary key columns) in ((…), (…), …);

But if only one column is present in Input primary key columns, the resulting SQL is:

select col1, col2, … from target_table where input primary key column = ID;

Now, the before image contains the initial target data before job execution. When the job runs, every incoming source row is compared with the records present in the before image of the comparison table, and according to the opcode generated (I, U, or D), the result is sent to the after image, i.e., to the final target.
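The compare-and-emit-opcode step just described can be modeled in a few lines (a simplified sketch under my own assumptions, not BODS source): Insert if the key is absent from the before image, Update if any compare column differs, and no opcode if the row is unchanged.

```python
# Simplified model of comparing an input row against the before image.
before_image = {1010: {"EMPNAME": "RAJ"}, 1011: {"EMPNAME": "RAJ EY"}}

def compare(emp_id, new_row, compare_cols=("EMPNAME",)):
    """Return the opcode Table Comparison would emit for this row."""
    old = before_image.get(emp_id)
    if old is None:
        return "I"                                   # key not in target
    if any(old[c] != new_row[c] for c in compare_cols):
        return "U"                                   # a compare column changed
    return None                                      # unchanged: no opcode

print(compare(1010, {"EMPNAME": "RAJ WRITER"}))  # U
print(compare(1011, {"EMPNAME": "RAJ EY"}))      # None
print(compare(1020, {"EMPNAME": "NEW"}))         # I
```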

When the ‘Input contains duplicate keys’ option is checked, the transform knows that duplicate data may come from the source. If any compare-column value changes for a given key (based on the columns present in Input primary key columns), Table Comparison will generate a ‘U’ opcode for all incoming rows having the same ID (ID meaning the column present in Input primary key columns).

A little confused? Let’s dig deeper with an example.

Initial target: – Data present in target before job gets executed.

Now I’ve inserted two new records with EMP_ID = 1010, 1011 as highlighted.



Updated Source: –

Input Contain Duplicate Keys option is checked in Table Comparison: –

After job execution:-

Records with EMP_ID 1010 (EMPNAME RAJ) and 1011 (EMPNAME RAJ EY) are present in the target before the job is executed. Suppose two new records with the same EMP_IDs (1010, 1011) arrive from the source, while 1010 and 1011 are also still present in the source as-is.

Existing records: 1010 RAJ…, 1011 RAJ EY….

New Records: 1010 RAJ WRITER,… 1011 RAJ MCKINSEY….

The first record with EMP_ID 1010 (1010, RAJ WRITER, …) is compared with the existing EMP_ID 1010 in the comparison table’s before image buffer (1010, RAJ, …). Table Comparison finds that EMPNAME was initially RAJ and the new incoming name for the same column is RAJ WRITER, so it sends an update (‘U’) opcode.

The next record with EMP_ID 1011 (1011, RAJ MCKINSEY, …) is compared with the existing EMP_ID 1011 in the comparison table’s before image buffer (1011, RAJ EY, …). Table Comparison finds that EMPNAME was initially RAJ EY and the new incoming name for the same column is RAJ MCKINSEY, so it sends an update (‘U’) opcode.

Then comes the 3rd source row, which has EMP_ID 1010 (1010, RAJ, …). This row is compared with the existing record in the target, and no column value has changed. Ideally it should not send an update, but since ‘Input contains duplicate keys’ is checked, this row is also sent with a ‘U’ opcode, because an update was already generated for this EMP_ID.

The case is similar for the 4th row, which has EMP_ID 1011 (1011, RAJ EY, …).

Now you might be wondering why it has not sent an update for the other records.

It will not send an update because no column values for EMP_ID 1012, 1013, 1014, and so on have changed. Hence there is no update for them.

Now comes the question: out of all the updates, which one will be reflected in the target?

Of all the updates, the last (latest) record is always the one reflected in the target.
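The whole example above can be condensed into a toy model (my own sketch of the observed behavior, not BODS internals): once a key has produced an Update, later source rows with the same key are also sent as ‘U’, and the last one wins in the target.

```python
# Toy model of 'Input contains duplicate keys' behavior from the example.
before = {1010: "RAJ", 1011: "RAJ EY"}               # target before the job
source = [(1010, "RAJ WRITER"), (1011, "RAJ MCKINSEY"),
          (1010, "RAJ"), (1011, "RAJ EY")]           # duplicates from source

updated_keys, target, opcodes = set(), dict(before), []
for emp_id, name in source:
    if before[emp_id] != name or emp_id in updated_keys:
        opcodes.append("U")       # changed, or key already updated earlier
        updated_keys.add(emp_id)
        target[emp_id] = name     # each later 'U' overwrites: last one wins
    else:
        opcodes.append(None)      # unchanged key with no prior update

print(opcodes)       # ['U', 'U', 'U', 'U']
print(target[1010])  # RAJ  (the last source record for 1010 wins)
```

Note the net effect in this particular example: because the last duplicates carry the original values, the target ends up looking like it did before, even though four ‘U’ opcodes were generated.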

So Final Target: –

Now let’s suppose that I insert a new record with EMP_ID 1020: –

Target before job execution: –

So now Table Comparison will see that, of all the IDs, EMP_ID 1020 is not present in the target. It will therefore generate an Insert opcode and insert the record.

Hence after execution result is: –

Final Target: –

To understand how records get inserted and how their processing happens, visit my blog:

https://blogs.sap.com/2017/06/01/table-comparison-row-by-row-select-processing-of-records/

This option slows down performance, as additional memory is required to keep track of duplicate records.

Hope it helps!

Please let me know if I have missed anything or if anything needs to be added or deleted.


Thanks:)