A Very Few Useful Programming and Data Tips.
Translating Financial Statement Variables Into COMPUSTAT Data Items |
|
Financial Statement Variable | COMPUSTAT Data Number |
Long Term Debt | DATA9 |
Property Plant and Equipment (PPE): Wrong Sign! | DATA30 |
Current Portion of Long Term Debt | DATA44 |
Sale of Property Plant and Equipment. (Does not appear to match values in the 10-K. If you can figure out how to calculate the sale of assets from the available COMPUSTAT data items please let me know.) | DATA107 |
Sales (Total Revenue) | DATA117 |
Net Income Before Taxes | DATA122 |
Depreciation and Amortization | DATA125 |
Cost of Goods Sold minus Depreciation and Amortization | DATA131 |
Selling, General and Administrative Expenses (SG&A) | DATA132 |
Provision for Income Taxes | DATA135 |
Notes Payable | DATA206 |
Standard and Poors (S&P) Bond Rating: Translated from their letter code to a numerical value. See the WRDS online documentation to translate back. | DATA280 |
Change in Accounts Receivable | DATA302 |
Change in Inventories | DATA303 |
Change in Accounts Payable | DATA304 |
Cash Interest Paid | DATA315 |
Cash Taxes Paid | DATA317 |
R Programming Package Hints, Tips and Manual Clarifications |
|
Flexclust Package |
|
Flexclust is an R package for creating clusters of similar objects based on their characteristics. It has the ability to call a user supplied distance function and use it to calculate the “distance” between objects in a cluster and the cluster center. I put distance in quotes since it does not need to be a physical distance. It could be, for example, the difference in per capita GDP between a country and the rest of the countries in a cluster.You call the distance function through a command that looks like:
kmFam <- kccaFamily(dist=mydist, cent=colMeans, groupFun = “minSumClusters”).In this example the distance function is mydist. It is not clear from the manual, but this function needs to take as inputs a matrix of values x and matrix of centers c. The x matrix should contain one row per observation. Each row of the centers matrix c is a row vector of centers from candidate clusters. The user supplied distance function should take the matrices x and c and then calculate a distance from each candidate center for each observation in x and then return the result. For example, if x is a nxk matrix and centers matrix is 2xk then the returned matrix has to be nx2. Since flexclust determines how many rows will be passed in the centers matrix the distance function must be able to handle an arbitrary number of them. Each row in the centers matrix. |
|
Matching Data Across Data Frames: Find the First Date in One After Dates in a Column of Another |
|
#You have two data frames. You need to find the largest value of some column #in the second data frame less than or equal to each value in the first data frame #and do this ID by ID of some sort. For example, the last rating date prior to a #trade date bond Cusip by bond Cusip. You then want to record some information from the #second data frame. In the example, the bond’s rating. After the code is finished #the resulting data frame will NOT be sorted. If you needed it back in original #sort order you need to add a line to do that. #You need the dplyr library for this to work. In this sample code uses an inner_join #so data is only kept if it is in both data frames. library(dplyr) #Some sample data, say bond trades. #More sample data, say bond ratings. #Create the new data with the columns containing the value in table 2 from the column rate in r that corresponds with the #Turn the output back into a data frame and sort by cusip and td. |