CREATE EXTERNAL TABLE IF NOT EXISTS Table1 ( UID BIGINT, ITEMS_PURCHASED ARRAY<STRUCT<PRODUCT_ID: BIGINT,TIMESTAMPS:STRING>> )
And this is the data in the above table:
This is the second table in Hive. It also contains information about the items being purchased:
CREATE EXTERNAL TABLE IF NOT EXISTS Table2 ( ITEM_ID BIGINT, CREATED_TIME STRING, BUYER_ID BIGINT )
And this is the data in the second table:
| ITEM_ID | CREATED_TIME | BUYER_ID |
|---|---|---|
| 220003038067 | 2012-06-21 | 1015826235 |
| 300003861266 | 2012-06-21 | 1015826235 |
| 140002997245 | 2012-06-14 | 1015826235 |
| 200002448035 | 2012-06-08 | 1015826235 |
| 260003553381 | 2012-06-07 | 1015826235 |
**We need to compare the two tables on UID (in Table1) and BUYER_ID (in Table2), which identify the same buyer. When UID and BUYER_ID match, the ITEMS_PURCHASED entries in Table1 should be the same as the ITEM_ID and CREATED_TIME values in Table2; if they are not the same, I need to act on it. Basically, I need to generate a data accuracy report showing whether the rows match or not, i.e. what percentage of the data is accurate and what percentage is not, as a kind of statistical analysis.**
To make it clearer:
**ITEMS_PURCHASED in Table1 is an array of structs, and each struct contains two fields, PRODUCT_ID and TIMESTAMPS.
If UID and BUYER_ID match, then PRODUCT_ID in Table1 should match ITEM_ID in Table2, and TIMESTAMPS in Table1 should match CREATED_TIME in Table2.**
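A minimal HiveQL sketch of that comparison, assuming the two tables exactly as declared above, that the ID columns are never NULL, and that TIMESTAMPS uses the same string format as CREATED_TIME (for example `2012-06-21`); the Table1 sample data is not shown, so that format is an assumption. `LATERAL VIEW explode(...)` turns each element of ITEMS_PURCHASED into its own row, and a FULL OUTER JOIN on buyer, item, and date then classifies every row as matched, present only in Table1, or present only in Table2. The aliases `t1`, `t2`, `exploded`, `p`, and `counts` are illustrative.

```sql
-- Sketch: data accuracy report comparing Table1 (array of structs) with Table2.
-- Assumes TIMESTAMPS and CREATED_TIME share the same string format; if TIMESTAMPS
-- also carries a time of day, compare to_date(p.TIMESTAMPS) instead.
SELECT
  total_rows,
  matched,
  only_in_table1,
  only_in_table2,
  100.0 * matched / total_rows AS pct_matched
FROM (
  SELECT
    COUNT(*) AS total_rows,
    -- both sides present: buyer, product/item and date all agree
    SUM(IF(t1.uid IS NOT NULL AND t2.BUYER_ID IS NOT NULL, 1, 0)) AS matched,
    -- purchase in Table1 with no identical row in Table2
    SUM(IF(t2.BUYER_ID IS NULL, 1, 0)) AS only_in_table1,
    -- purchase in Table2 with no identical entry in Table1
    SUM(IF(t1.uid IS NULL, 1, 0)) AS only_in_table2
  FROM (
    -- one output row per element of the ITEMS_PURCHASED array
    SELECT a.UID AS uid, p.PRODUCT_ID AS product_id, p.TIMESTAMPS AS ts
    FROM Table1 a
    LATERAL VIEW explode(a.ITEMS_PURCHASED) exploded AS p
  ) t1
  FULL OUTER JOIN Table2 t2
    ON (t1.uid = t2.BUYER_ID
        AND t1.product_id = t2.ITEM_ID
        AND t1.ts = t2.CREATED_TIME)
) counts;
```

Buyers that appear in only one of the tables also count as mismatches here; if the report should cover only buyers present in both tables, both sides would need to be pre-filtered to the common IDs first. Duplicate rows on either side are multiplied by the join, so deduplicating first may be necessary if duplicates can occur.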
One more thing: these tables contain millions of rows. I have reduced the example to a single buyer's records to simplify the problem, so how can I do this comparison efficiently?
I think I need to write a MapReduce job for this. This is my first time working with Hive, Hadoop, and MapReduce, which is why I am running into so many problems.
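Hive compiles each HiveQL query into one or more MapReduce jobs, so a join query over Table1 and Table2 may be enough without hand-written MapReduce code. A hedged way to check this is `EXPLAIN`, which prints the plan instead of running the query. The query below (a per-buyer count of matching items) is only an illustrative example, and the aliases in it are hypothetical.

```sql
-- Print the execution plan instead of running the query. With Hive on
-- MapReduce, the STAGE PLANS section should list the Map Reduce stages
-- that Hive generates for the join and the GROUP BY.
EXPLAIN
SELECT t2.BUYER_ID, COUNT(*) AS matched_items
FROM (
  -- flatten the array so each purchased item is one row
  SELECT a.UID AS uid, p.PRODUCT_ID AS product_id, p.TIMESTAMPS AS ts
  FROM Table1 a
  LATERAL VIEW explode(a.ITEMS_PURCHASED) exploded AS p
) t1
JOIN Table2 t2
  ON (t1.uid = t2.BUYER_ID
      AND t1.product_id = t2.ITEM_ID
      AND t1.ts = t2.CREATED_TIME)
GROUP BY t2.BUYER_ID;
```

Running the same statement without `EXPLAIN`, for example from a script file passed to `hive -f`, executes those jobs on the cluster.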
I was considering these approaches:
1) Check all of the millions of rows by comparing UID and BUYER_ID.
2) Sample some UID and BUYER_ID values and compare only that data.
3) Is there any other approach?
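For option 2, a hedged sketch of a consistent sample of roughly 1% of buyers: filtering both tables with the same `pmod(hash(id), 100) = 0` predicate should keep the same buyers on both sides, since UID and BUYER_ID are both BIGINT and `hash()` maps equal values to equal hashes. The 100 buckets and the choice of bucket 0 are arbitrary assumptions. Hive still reads every row to evaluate the filter; the saving is in the join and shuffle work.

```sql
-- Sketch: accuracy of a ~1% buyer sample, measured from Table1's side.
-- The same hash filter is applied to UID and BUYER_ID so that the
-- sampled buyers line up across both tables.
SELECT
  COUNT(*) AS sampled_items,
  SUM(IF(t2.ITEM_ID IS NOT NULL, 1, 0)) AS matched,
  100.0 * SUM(IF(t2.ITEM_ID IS NOT NULL, 1, 0)) / COUNT(*) AS pct_matched
FROM (
  SELECT a.UID AS uid, p.PRODUCT_ID AS product_id, p.TIMESTAMPS AS ts
  FROM Table1 a
  LATERAL VIEW explode(a.ITEMS_PURCHASED) exploded AS p
  WHERE pmod(hash(a.UID), 100) = 0
) t1
LEFT OUTER JOIN (
  -- filter inside the subquery so the outer join keeps unmatched Table1 rows
  SELECT ITEM_ID, CREATED_TIME, BUYER_ID
  FROM Table2
  WHERE pmod(hash(BUYER_ID), 100) = 0
) t2
  ON (t1.uid = t2.BUYER_ID
      AND t1.product_id = t2.ITEM_ID
      AND t1.ts = t2.CREATED_TIME);
```

Because the LEFT OUTER JOIN starts from Table1, this measures how many sampled Table1 purchases have an identical row in Table2; rows that exist only in Table2 are not counted. Putting the Table2 filter in a WHERE after the join instead would discard unmatched rows and turn the outer join into an inner one.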
Any suggestions would be appreciated.
This post has been edited by macosxnerd101: 01 July 2012 - 12:02 PM
Reason for edit: Renamed title to be more descriptive