(v2011.3.3)

Notes and data layout for:

lwc_72_107_sic_20110303.dta
lwc_89_107_naics_20110303.dta

These files record, for each SIC (or NAICS) industry, the share of U.S. manufacturing imports that originate in countries whose per capita GDP remains below 5% of U.S. per capita GDP between 1972 and 2005 (variable "lwc5"). They also record the share of U.S. manufacturing imports in each industry that originate in China (variable "chn").

Some things to keep in mind:

1. The SIC data extend from 1972 to 2007. They are constructed using the TSUSA-SIC and HS-SIC mappings noted in Feenstra, Romalis and Schott (NBER 9387, 2002) and Pierce and Schott (NBER 15548, 2009), respectively. As described in the former, some SIC codes may not be available. As discussed in both, the SIC codes follow the 1987 revision.

2. There are three issues related to the 1972 to 2007 SIC series. The first is that the product categories used in the trade data change in 1989 from TSUSA to HS. The second is that the country codes used to record imports' origin changed from relatively broad six-digit UN country codes to relatively narrow four-digit US country codes. Both of these switches can affect how well lwc5 and chn fit together between 1988 and 1989. The third is that the procedure for mapping trade codes to SIC changed; for more on that, see the discussions in the two papers cited in item 1 above. To see how these issues affect the data series, you can download the lwc_graphs_20110303.zip file located near this document and browse through the graphs of each variable for each industry (you can also check how well the import and export series line up). Note that the series I report below splice the data together in 1989. Note also that the NAICS series is only available after 1989.

3. The data on countries' per capita GDP are from the World Bank's World Development Indicators (WDI) database.

4. The definitions of each variable are as follows.
Note that a variable with the suffix _7294 (_89107) records data from the TSUSA (HS) era referenced above. A variable without such a suffix contains TSUSA data until 1988 and HS data thereafter.

m5:     level of US imports from low-wage countries in millions of nominal USD
m:      total US imports in millions of nominal USD
mCHN:   level of US imports from China in millions of nominal USD
x:      level of US exports from low-wage countries in millions of nominal USD
vship:  total value of U.S. shipments in nominal USD from Bartelsman et al., taken from the NBER website (May 2009 revision)
lwc5:   m5/m
chn:    mCHN/m
pen5:   m5/(vship+m-x)
penCHN: mCHN/(vship+m-x)

5. The variables in each dataset are as follows:

6. The data here, when combined with production data from Bartelsman et al. available on the NBER website, can be used to compute import penetration by low-wage countries and China.

7. I would like to thank Chris Kurtz at the Board of Governors of the U.S. Fed as well as Justin Pierce at the U.S. Census Bureau for comments and feedback that helped me put these data series together.
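
As an illustration of the definitions in item 4, the following is a minimal Python sketch of how the ratio variables and the 1989 splice could be computed. It is not the code used to build the files. The function names (share, import_measures, splice) and all numbers are hypothetical, and the sketch assumes that m5, mCHN, m, x and vship have already been converted to the same units.

```python
from typing import Dict, Optional


def share(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator / denominator, or None if the denominator is not positive."""
    if denominator <= 0:
        return None
    return numerator / denominator


def import_measures(m5: float, mCHN: float, m: float,
                    x: float, vship: float) -> Dict[str, Optional[float]]:
    """Compute lwc5, chn, pen5 and penCHN as defined in item 4.

    Assumption: all inputs are expressed in the same units.
    """
    denominator = vship + m - x  # the denominator used for pen5 and penCHN
    return {
        "lwc5": share(m5, m),
        "chn": share(mCHN, m),
        "pen5": share(m5, denominator),
        "penCHN": share(mCHN, denominator),
    }


def splice(value_7294: float, value_89107: float, year: int) -> float:
    """Pick the TSUSA-era value until 1988 and the HS-era value thereafter."""
    return value_7294 if year <= 1988 else value_89107


# Hypothetical industry-year values, for illustration only.
example = import_measures(m5=20.0, mCHN=10.0, m=200.0, x=100.0, vship=900.0)
print(example)
```

The splice helper mirrors the rule stated above for variables without a suffix: the _7294 value is used through 1988 and the _89107 value from 1989 onward.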