Merge Two Datasets In Stata

I have tried Data > Combine datasets > Merge two datasets, but this seems to work only for two datasets. I was hoping that if I saved the two as a new merged file, I could repeat the step another four times to capture all of the data, but Stata tells me 'variable _merge already defined' and will not let me add new datasets to the combined one.

Learn how to download, import, and merge multiple datasets from the NHANES website using Stata.

Combining Datasets in Stata (Thomas Elliott, January 31, 2013). Often, you will find yourself with two or more datasets, or data files, that you wish to combine into one data file. Stata provides a couple of ways to combine datasets. Appending data means you have two files of the same data, just with different cases.

merge 1:1 caseid using name-of-second-dataset

Here, 'name-of-second-dataset' (called the 'using dataset' by the Stata people) is merged to the data in memory (called the 'master dataset'), assuming that each value of the variable 'caseid' is present only once in each of the datasets.

Use Stata/MP or Stata/SE. If you do not have Stata/MP or Stata/SE, please continue with this FAQ. When the number of variables in a dataset to be analyzed with Stata is larger than 2,047 (likely with large surveys), the dataset is divided into several segments, each saved as a Stata dataset (.dta file).
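
The 'already defined' error quoted in the question at the top arises because every merge creates a bookkeeping variable named _merge, and a second merge refuses to overwrite it. The following is a minimal sketch using three toy files; the variable names (caseid, age, sbp, glucose) and values are made up for illustration. Run it as a single do-file, since the temporary file names live in local macros.

```stata
* Toy data: three files keyed on a hypothetical identifier, caseid
clear
input caseid age
1 34
2 51
3 27
end
tempfile demo
save "`demo'"

clear
input caseid sbp
1 118
2 135
3 122
end
tempfile exam
save "`exam'"

clear
input caseid glucose
1 92
2 110
3 88
end
tempfile lab
save "`lab'"

* The first merge creates the bookkeeping variable _merge
use "`demo'", clear
merge 1:1 caseid using "`exam'"
tab _merge

* Drop (or rename) _merge before merging again; otherwise Stata stops
* with "variable _merge already defined"
drop _merge
merge 1:1 caseid using "`lab'"
```

If the match report is not needed, adding the nogenerate option to merge avoids creating _merge in the first place, and generate(newvar) stores it under a different name.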

    • Apr 2016
    • 2

    Merging more than two datasets in Stata

    Hello: I am a beginner in Stata and am currently working with NHANES data. My question is: can you combine more than two datasets in Stata? I tried the merge command and the 'combine data' menu, but each seems to merge only two datasets.
    Any help would be appreciated.
    • Feb 2016
    • 99
    Hi Hadeel,
    The 'merge' command should be used when the two databases have the same variables. If this is the case for your three or more databases, you can use merge more than once. First you combine databases 'A' and 'B', generating a database 'C'. Then you combine database 'C' with the third database ('D'), generating a new database, and so on.
    If your databases do not have the same variables, you should use the 'append' command, following the same reasoning as above to unite your three or more databases.
    kind regards
    Girlan Oliveira


    • Feb 2016
    • 99
    Let me make a small correction: 'append' is the command used when the databases have the same variables, not 'merge'.
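
To illustrate the correction above, here is a small sketch of append stacking two files that share the same variables but hold different cases. The file contents and the names person, age, wave1, and source are hypothetical.

```stata
* First file: two people
clear
input person age
1 30
2 45
end
tempfile wave1
save "`wave1'"

* Second file: two different people, same variables
clear
input person age
3 52
4 28
end

* append adds the observations of the using file below those in memory;
* generate() records which file each observation came from (0 = in memory, 1 = using)
append using "`wave1'", generate(source)
list
```

The result has four observations and the same two substantive variables, which is the situation append is meant for; merge, by contrast, adds variables for matching cases.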


    • Apr 2016
    • 2
    Hi Girlan:
    Thank you for your response. Yes, I think merge is the command you use when the datasets have different variables, since you merge on a certain key variable. Still, I can't seem to merge more than two datasets.
    In the NHANES tutorial the command is simply 'merge varlist using filename [, options]', and this is supposed to merge multiple datasets. However, whenever I enter it I get an error message saying that this is old syntax.
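
One way to get the effect of merging several files with the current syntax is to loop, merging one using file per command. The sketch below builds three toy files so that it runs on its own; it assumes the key is named seqn (the NHANES respondent sequence number, whose case may differ in your files), and the variables x1 to x3 are placeholders for real NHANES components.

```stata
* Build three toy "component" files keyed on seqn (contents are made up)
forvalues i = 1/3 {
    clear
    set obs 5
    generate seqn = _n
    generate x`i' = runiform()
    tempfile part`i'
    save "`part`i''"
}

* Start from the first file and merge the others in turn;
* nogenerate suppresses _merge so the next merge does not collide with it
use "`part1'", clear
forvalues i = 2/3 {
    merge 1:1 seqn using "`part`i''", nogenerate
}
describe
```

With real data, replace the temporary files with your saved .dta file names and confirm that the key uniquely identifies observations in each file before choosing 1:1.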


    • Apr 2014
    • 20458
    It is old syntax. Perhaps the NHANES tutorial goes back some years? I'm not familiar with it.
    Anyway, before you do any merges you need to know what the merge key variable(s) is (are), and whether they uniquely identify observations in the data in memory, and also in the using data set. So if the key variables (varlist) uniquely identify the observations in both data sets, it's -merge 1:1 varlist using filename-.
    If, say, the varlist variables uniquely identify the observations in the data in memory, but not in the using data set, then it's -merge 1:m varlist using filename-.
    Similarly, if varlist uniquely identifies observations in the using data set, but not the data in memory, it's -merge m:1 varlist using filename-.
    If the variables in varlist don't uniquely identify the observations in either data set, then you probably shouldn't be using -merge- at all. It is greater than 99.99999% likely, in that case, that there is an error in the data, or you are misunderstanding what you are trying to do with the data sets, or you should be using some other command. There is such a thing as -merge m:m- but it is almost never the correct thing to do.
    Have a look at the manual section on -merge-. There are a lot of options that have been added since the 1:1/1:m/m:1 syntax was added, and they can be very useful--some of them might be helpful to you, too.
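
As a sketch of the advice above, the example below checks key uniqueness with -isid- and then performs a many-to-one merge. The person-level and household-level data, and the names person, hhid, region, and age, are hypothetical.

```stata
* Household-level file: hhid is unique here
clear
input hhid region
1 1
2 2
end
isid hhid
tempfile hh
save "`hh'"

* Person-level file: person is unique, hhid repeats
clear
input person hhid age
1 1 30
2 1 28
3 2 45
end

* isid exits with an error if the key does not uniquely identify observations
isid person

* hhid repeats in the data in memory but is unique in the using file, so m:1
merge m:1 hhid using "`hh'", keep(match master) nogenerate
list
```

The keep() and nogenerate options are two of the newer options mentioned in the post; assert() is another, and it stops the merge if the match results are not what you declared.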


    • Apr 2019
    • 1
    Hello, I am also trying to merge 3>
    • Apr 2014
    • 20458
    Nobody can possibly help you without example data from the three data sets. Use the -dataex- command to do this. If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
    -merge m:m- just produces data salad. Don't use it. If it appears to be the only possibility for the -merge- it means either that your data sets are not -merge-able or you don't understand the structure of your data and are overlooking the right key for using -merge 1:m- or -merge m:1-.
    It is quite difficult for me to imagine how -joinby- could result in no observations. So in addition to showing example data, please show the code you tried.
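
For readers who have not used -dataex-, a minimal sketch with the auto dataset that ships with Stata is shown below; the choice of variables and observations is arbitrary.

```stata
* Load a shipped example dataset and export a small, pasteable excerpt
sysuse auto, clear
dataex make price mpg in 1/5
```

The output in the Results window is a block of Stata code that others can paste into their own session to recreate the excerpt exactly, including storage types and formats.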


Here's what you must know about the two datasets you are about to merge.

  • What is the identifier variable on which the files should be combined?
  • Is each observation (row) of the identifier variable unique? In other words, does each row value for the identifier variable occur only once? The answer to this question matters for how you would merge the two datasets, as you will see.

Let's evaluate the two items above in turn.

  • Since we wish to combine data on a person's age and data on a person's sex, the identifier variable is person.
  • In Dataset 1, each person appears only once, so person uniquely identifies each person in the dataset. Likewise for Dataset 2. This means that we should perform a one-to-one merge of the two datasets based on person.
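
The one-to-one merge described above can be sketched as follows. The values are invented for illustration; only the variable names person, age, and sex come from the text.

```stata
* Dataset 1: age by person
clear
input person age
1 25
2 40
3 33
end
tempfile ages
save "`ages'"

* Dataset 2: sex by person
clear
input person str6 sex
1 "female"
2 "male"
3 "female"
end

* person is unique in both files, so a 1:1 merge is appropriate
merge 1:1 person using "`ages'"
tab _merge
list person age sex
```

The tabulation of _merge shows how many people were found in both files, only in the data in memory, or only in the using file.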

Before merging, it is good practice to verify that your identifier variable(s) are unique across observations with the duplicates report command. Here you would type duplicates report person.
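
A short sketch of that check, using made-up data in which person 2 deliberately appears twice:

```stata
clear
input person age
1 25
2 40
2 41
3 33
end

* Summarize how many observations share each value of person
duplicates report person

* Show the offending observations
duplicates list person

* isid would stop a do-file with an error here; capture lets us inspect the return code
capture isid person
display _rc
```

A nonzero return code from isid means person does not uniquely identify the observations, so a 1:1 merge on person would fail until the duplicates are resolved.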

Here's what we want to do:

Dataset 1
Dataset 2
Merged data