A client contacted me today asking how to merge files with different column headers in Power Query. The issue she's facing is that some of the files in her folder have a column called 'customer', where others have a column called 'ship to/customer'. Plainly there has been a specification change somewhere down the line, but it's causing issues in the combination - an issue that would affect either Excel or Power BI.
Combine Multiple Worksheets of a Workbook using Power Query in Excel In this blog post, I am explaining the Power Query Method to combine data from multiple worksheets. If you are a person who deals with data in Multiple Worksheets, You should definitely try this one.
- Open the Get Data pull down menu. Select From File. Select From Folder. Browse the folder path where the data files reside (or copy the path there directly) Click OK. Click “ Transform Data ” (or “Edit” in older versions of Power Query).
- In this blog post, we’ll show you how, with the newest version of the Power BI Desktop and Power Query for Excel, you can combine multiple data from Excel files into one big tall table. In this specific case, we have an Office 365 group that we’ve created where we store some external sales data that do not come from our system.
- The solution I have is the create multiple files using Office 365 that the business update themselves. Then combine the data using Power Query. The Problem: I am struggling to get the updated data from the source sheets to update into the master sheet via Refresh. One problem, is if any of the source files are Open, the query will not run.
In order to replicate this issue, I created two very simple CSV files as shown here:
I dropped these into a folder called 'Test' and then
- Created a new query From File --> From Folder
- Renamed the query to FilesList (making a query that I can use to easily sort/filter the list of files later)
- Right clicked the query in the Queries pane --> Reference
- Renamed this query to Transactions
- Clicked the Combine Binaries button
At this point I was presented with the following window:
The only thing I really want to point out here is that I choose the Example file which has the column name that I do NOT want. (I want to rename 'ship to/customer', so it's important that it show up here.)
I then clicked OK, and was presented with this:
Err.. wait… what happened to my customer column?
To understand this, we need to look at the steps in the Transaction query:
If you were to click on the 'Invoked Custom Function1' step, you'd see that it adds a new column to the Transaction query. The first table shows 3 columns where the first column is 'ship to/customer'. The second table also shows 3 columns, but in this the first is 'customer'. So all is working so far.
But then, when you get to the the 'Expanded Table Column1' step of the Transactions query, it expands to show only the 'ship to/customer' column. Why? It's because of the following M code generated by Power Query:
= Table.ExpandTableColumn(#'Removed Other Columns1', 'Transform File from Transactions', Table.ColumnNames(#'Transform File from Transactions'(#'Sample File')))
What this means in English is that it reads the columns from the table in the first sample file. That's not super helpful.
Now we could work on trying to enumerate all headers, but that would be a pain, as the code is complicated and still leaves us in a place where we would need to combine both columns anyway. Let's fix this by dealing with it at the source.
Step 1: Prepare the Transactions query:
Delete the Changed Type step at the end of Transactions query. This is because it is setting the 'ship to/customer' column to text, and by the time we're done, that column will be called 'customer'. If we leave the step as is, it will cause an error.
Step 2: Modify the Transform Sample query:
Next we need to select the Transform Sample query:
Now, what we want to do is rename that 'ship to/customer' column to make it 'customer'. So let's do that:
- Right click 'ship to/customer' --> Rename --> 'customer'
The problem here though, is that when we apply this to our other files, THIS will cause an error. Why? They don't have a 'ship to/customer' column to rename. So we need to wrap this in an error handler.
To do this, we need to adjust the formula that was just created to wrap it in a 'try/otherwise' clause. This is essentially equivalent to Excel's IFERROR() formula. If it works, it will return the result. If not, it returns an alternate item, which we will set to be the previous step in the query. In other words 'Try to rename this column. But if it fails, give me the original table.
The keys here are to
- Insert the try and otherwise in the correct location (remember they are case sensitive)
- Get the right syntax for the previous step name (remember to wrap it in #' ' if the step name has a space in it.)
In this case, it should look like this:
Step 3: Revel in your success:
You got it. At this point, returning to your Transactions query should leave you pretty pleased, as we've plainly been able to successfully merge files with different column headers into the table that we actually want:
The only thing left to do is set the data types, and we're done.-->
With Power Query, you can combine multiple files that have the same schema into a single logical table.
This feature is useful when you want to combine all the files you have in the same folder. For example, if you have a folder that contains monthly files with all the purchase orders for your company, you can combine these files to consolidate the orders into a single view.
Files can come from a variety of sources, such as (but not limited to):
- Local folders
- SharePoint sites
- Azure Blob storage
- Azure Data Lake Storage (Gen1 and Gen2)
Excel Power Query Multiple Files
When working with these sources, you'll notice that they share the same table schema, commonly referred to as the file system view. The following screenshot shows an example of the file system view.
In the file system view, the Content column contains the binary representation of each file.
You can filter the list of files in the file system view by using any of the available fields. It's good practice to filter this view to show only the files you need to combine, for example by filtering fields such as Extension or Folder Path. More information: Folder
Selecting any of the [Binary] values in the Content column automatically creates a series of navigation steps to that specific file. Power Query will try to interpret the binary by using one of the available connectors, such as Text/CSV, Excel, JSON, or XML.
Combining files takes place in the following stages:
When you connect to a data source by using any of the previously mentioned connectors, a table preview opens. If you're certain that you want to combine all the files in the folder, select Combine in the lower-right corner of the screen.
Alternatively, you can select Transform data to access the Power Query Editor and create a subset of the list of files (for example, by using filters on the folder path column to only include files from a specific subfolder). Then combine files by selecting the column that contains the binaries in the Content column and then selecting either:
The Combine files command in the Combine group on the Home tab.
The Combine files icon in the column header of the column that contains [Binary] values.
Combine files dialog box
Combine Multiple Excel Files Into One Using Power Query
After you select the Combine or Combine files command, the Combine files dialog box opens and the following occurs:
- Power Query analyzes the example file (by default, the first file in the list) and determines the correct file connector to use to open that file.
- The dialog box provides the file connector experience exactly as if you were to connect directly to that example file.
- If you want to use a different file for the example file, you can choose it from the Example file drop-down menu.
- Optional: You can select Skip files with errors to exclude from the final output any files that result in errors.
In the following image, Power Query has detected that the first file has a .csv file name extension, so it uses the Text/CSV connector to interpret the file.
Combined files output
After the Combine files process is finished, Power Query automatically performs the following actions:
Creates an example query that performs all the required extraction steps for a single file. It uses the file that was selected as the example file in the Combine files dialog box.
This example query has the name Transform Sample file in the Queries pane.
Creates a function query that parameterizes the file/binary input to the example query. The example query and the function query are linked, so that changes to the example query are reflected in the function query.
These queries are listed in the Helper queries group.
Applies the function query to the original query with input binaries (for example, the folder query) so it applies the function query for binary inputs on each row, and then expands the resulting data extraction as top-level columns.
Creates a new group with the prefix Transform file from and the initial query as the suffix, and organizes all the components used to create these combined files in that group.
You can easily combine all files within a given folder, as long as they have the same file type and structure (including the same columns). You can also apply additional transformation or extraction steps by modifying the automatically generated example query, without having to worry about modifying or creating additional function query steps.
You can modify the steps inside the example query to change the function applied to each binary in your query. The example query is linked to the function, so any changes made to the example query will be reflected in the function query.
If any of the changes affect column names or column data types, be sure to check the last step of your output query. Adding a Change column type step can introduce a step-level error that prevents you from visualizing your table. More information: Dealing with errors