July 26, 2019 at 4:24 pm #18096
I have what may be a bit of a tall order issue, and not sure whether it goes beyond the scope of this thread, but thought I would give it a shot.
I’m currently working that includes respondent ID (of which there are 972), Education Level, Age Group, Region, Race, and Gender.
I am looking for a way to bundle each respondent in either “Study 1” or “Study 2”, however, the part I’m having some difficulty with is that this grouping rule also needs to take stratification into account.
So for example, if I create that “Study 1” and “Study 2” grouping, and I filter only to White males from the Midwest who are age 4 to 6 (let’s say there are a total of 20 that show up using those filter criteria), the sample needs to be split evenly (or evenly-ish). So half of those respondents would need to be in Study 1, and the res in Study 2 (with no overlap, e.g., participant 1 can’t show up in both studies).
This stratification rule needs to hold true if I use different combinations of the other filters (so let’s say of those 972 respondents there are 13 Hispanic females who are from the South that have an education level of 4 and are age 7 to 9), I would need to split up that sample so that 7 of those respondents are in Study 1 and the remaining 6 are in Study 2.
I’m not sure if this is the most easily done with Power Query/Pivot, but I’ve attached a sample excel spreadsheet to (hopefully) help better visualize the set-up of the data
Again, I’m not sure if this is outside of the scope of this forum, but thought I’d check in with some experts.
DarkoJuly 26, 2019 at 7:15 pm #18098
Definitely a can-do and relatively easy.
See the attached workbook for one example.
Make your selections, choose Refresh All, and Study Group 1 and Study Group 2 are generated for you. Also includes an example of a single table “Study Groups 1 and 2” where records are differentiated by a Study Group column.July 29, 2019 at 2:20 pm #18103
This worked very well tomallan.
Much appreciated!July 29, 2019 at 2:58 pm #18104
I know I’m probably asking too much, but when I select all groups (in the X to select tables), the query will assign Study 1 or Study 2 to participants, but it won’t stratify the groups based on demographics. I can only get this when I select each group individually, is that correct?
In other words, when I place an X next to all variables in the blue “select” tables, then copy the resulting “Study Groups 1 and 2 Combined into One Table” into another workbook/sheet and apply filters on THAT to confirm participants are divided equally or as equally as possible among groups, it won’t evenly distribute participants into Study 1 and 2. I would need to select Gender = 1, Race = White, Region = Midwest, EL = 1, and Age = 7 to 9 (while ensuring everything else remains deselected) and repeat the process for the other variables to get an equal divide of Study 1 and 2 groupings yes?
Just wanted to confirm that that’s how this query worksJuly 29, 2019 at 4:31 pm #18105
The logic is that, upon Refresh All, it filters participants on all selected criteria and then separates them into two equal-ish groups (by participant count).
For this example, if “Gender = 1, Race = White, Region = Midwest, EL = 1, and Age = 7 to 9” is one variable (and not five), you would also need to repeat the process for other variables.
If the above paragraph contains a single variable, does each variable only contain 1 criterion from each of the five groups (gender, race, region, EL, and age)?July 29, 2019 at 5:33 pm #18106
Yes, it does 🙂
So by selecting individual variables, I get an equal-ish distribution of participants between both Study 1 and Study 2.
I was just wondering whether I could select all variables and have that even(ish) logic of distribution apply.July 29, 2019 at 9:28 pm #18109
The answer to you “wondering whether” is most likely a yes.
What would be the maximum number of variables that you would use to filter a data set?July 30, 2019 at 4:57 pm #18110
So I’m looking to filter on four variables (Gender, Region, Race, Age Group).
I’ve attached a workbook with the power query solution you provided, and it gets me most of the way there.
That is, it catogorizes perfectly when you select each individual category within each variable (on the “Data” worksheet”).
However, when everything is selected, that categorization gets lost.
The “ManualFilter” sheet is just a straight copy of the table in the “Data” worksheet. The blue tables show that I’ve selected all categories in each variable, copied the table from “Data” into a separate worksheet (ManualFilter) and applied manual filters to confirm that the categorization worked.
You can see that when I select Gender as “1”, Race as “White”, Region as “Midwest” and Age “10 to 12”, it doesn’t split that group up evenly anymore.
So ideally, I’d like to batch categorize, so that I wouldn’t need to select each individual category for each variable, but still holding true to that stratification rule.
You must be logged in to reply to this topic.