September 6, 2016 at 10:06 pm #5886
Hello. I’m dealing with a very large file in power query and I would like to identity the two rows that are duplicates and delete one of the rows (based on results). The reason that I want to do this is that because it has a duplicate it won’t let me use it as the lookup column. I’m trying to do “keep duplicates” but it is taking FOREVER (the file is 400mb). I thought power query could handle large data sets like this.
Also, if I’m doing a one to one lookup, does it matter if the non-lookup column has more instances (that is, occurrences that don’t exist in the lookup table?) Thanks.September 6, 2016 at 10:31 pm #5887
A lot depends on what you mean by “does it matter”. If you mean, “will I get an error?” the answer is no, but in a pivot you will get an extra row added to the lookup table to handle the broken relationship, and depending on how important record counts are to your model, this may or may not matter.
One way to more quickly identify the offending row is to remove all columns except your candidate column for the unique row identifier, and then add a grouping column with that contains only one value, and then do a count for the values in the candidate column and filter out columns that only have a count of 1. Once you know the offending value(s), you will probably be more able to remove the duplicates with a specific task than a generic “remove duplicates”.September 6, 2016 at 10:40 pm #5888
Hi Tom, thanks for the quick response. Let me be a little more specific. I have a table that is the lookup table and it has 4 million rows AND 1 duplicate that I cannot figure out how to get rid of. The other table that I want to link has about 3 million rows and I’m not sure if there are any duplicates (and not sure how to check this in powerpivot). I’m also not sure if all of the rows in the non-lookup table column have a match in the lookup table (but the great majority should…and if there is a way to check that, that would be useful). Anyways, when I try to create the connection it says: “The relationship cannot be created because each column contains duplicate values. Select at least one column that contains only unique values”.
I didn’t quite understand your second paragraph. I think I tried doing that in power query (AND IT TOOK FOREVER because my file is so big, probably took 10 minutes) but it didn’t show any duplicates. However, in powerpivot, I do a count and then a distinctcount and the numbers are off by 1. Thanks.September 6, 2016 at 11:09 pm #5890
Once a relationship exists between a fact table and its lookup table, this is a way that I have used to check for fact table columns without a matching value in the lookup table: In the fact table, create a temporary calculated column with a formula like
= RELATED ( LookupTableNameGoesHere[LookupTableUniqueRowIdentifierColumnNameGoesHere] )
then click on the filter drop down button in the column header, uncheck Select All and then scroll down to the bottom of the list of values to re-check for (Blanks). If there is not a (Blanks) to check off, then all fact table rows have a matching value in the lookup table. If you are able to check off a listing for (Blanks), click the OK button and you will see the rows without a matching value in the lookup table.
Once finished, remove the temporary calculated column.September 6, 2016 at 11:13 pm #5891
Ok, great thanks. But the main problem is that I can’t create that initial connection 🙁September 7, 2016 at 12:44 am #5894
You could do something similar with LOOKUPVALUE, which does not require a relationship.September 7, 2016 at 6:23 pm #5904
Ok, thank you.
You must be logged in to reply to this topic.