Home Forums Power Pivot Dealing with very large files

This topic contains 6 replies, has 2 voices, and was last updated by  amartino 2 years ago.

Viewing 7 posts - 1 through 7 (of 7 total)
  • Author
    Posts
  • #5886

    amartino
    Participant
    • Started: 15
    • Replies: 9
    • Total: 24

    Hello. I’m dealing with a very large file in power query and I would like to identity the two rows that are duplicates and delete one of the rows (based on results). The reason that I want to do this is that because it has a duplicate it won’t let me use it as the lookup column. I’m trying to do “keep duplicates” but it is taking FOREVER (the file is 400mb). I thought power query could handle large data sets like this.

    Also, if I’m doing a one to one lookup, does it matter if the non-lookup column has more instances (that is, occurrences that don’t exist in the lookup table?) Thanks.

    #5887

    tomallan
    Keymaster
    • Started: 7
    • Replies: 2461
    • Total: 2468

    A lot depends on what you mean by “does it matter”. If you mean, “will I get an error?” the answer is no, but in a pivot you will get an extra row added to the lookup table to handle the broken relationship, and depending on how important record counts are to your model, this may or may not matter.

    One way to more quickly identify the offending row is to remove all columns except your candidate column for the unique row identifier, and then add a grouping column with that contains only one value, and then do a count for the values in the candidate column and filter out columns that only have a count of 1. Once you know the offending value(s), you will probably be more able to remove the duplicates with a specific task than a generic “remove duplicates”.

    #5888

    amartino
    Participant
    • Started: 15
    • Replies: 9
    • Total: 24

    Hi Tom, thanks for the quick response. Let me be a little more specific. I have a table that is the lookup table and it has 4 million rows AND 1 duplicate that I cannot figure out how to get rid of. The other table that I want to link has about 3 million rows and I’m not sure if there are any duplicates (and not sure how to check this in powerpivot). I’m also not sure if all of the rows in the non-lookup table column have a match in the lookup table (but the great majority should…and if there is a way to check that, that would be useful). Anyways, when I try to create the connection it says: “The relationship cannot be created because each column contains duplicate values. Select at least one column that contains only unique values”.

    I didn’t quite understand your second paragraph. I think I tried doing that in power query (AND IT TOOK FOREVER because my file is so big, probably took 10 minutes) but it didn’t show any duplicates. However, in powerpivot, I do a count and then a distinctcount and the numbers are off by 1. Thanks.

    #5890

    tomallan
    Keymaster
    • Started: 7
    • Replies: 2461
    • Total: 2468

    Once a relationship exists between a fact table and its lookup table, this is a way that I have used to check for fact table columns without a matching value in the lookup table: In the fact table, create a temporary calculated column with a formula like

    = RELATED ( LookupTableNameGoesHere[LookupTableUniqueRowIdentifierColumnNameGoesHere] )

    then click on the filter drop down button in the column header, uncheck Select All and then scroll down to the bottom of the list of values to re-check for (Blanks). If there is not a (Blanks) to check off, then all fact table rows have a matching value in the lookup table. If you are able to check off a listing for (Blanks), click the OK button and you will see the rows without a matching value in the lookup table.

    Once finished, remove the temporary calculated column.

    #5891

    amartino
    Participant
    • Started: 15
    • Replies: 9
    • Total: 24

    Ok, great thanks. But the main problem is that I can’t create that initial connection 🙁

    #5894

    tomallan
    Keymaster
    • Started: 7
    • Replies: 2461
    • Total: 2468

    You could do something similar with LOOKUPVALUE, which does not require a relationship.

    #5904

    amartino
    Participant
    • Started: 15
    • Replies: 9
    • Total: 24

    Ok, thank you.

Viewing 7 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic.