{"id":14,"date":"2018-12-30T20:47:12","date_gmt":"2018-12-30T20:47:12","guid":{"rendered":"https:\/\/faculty.som.yale.edu\/matthewspiegel\/?page_id=14"},"modified":"2019-11-04T22:13:30","modified_gmt":"2019-11-04T22:13:30","slug":"programming-and-data","status":"publish","type":"page","link":"https:\/\/faculty.som.yale.edu\/matthewspiegel\/programming-and-data\/","title":{"rendered":"Programming and Data"},"content":{"rendered":"<div class=\"wpb-content-wrapper\"><p>[vc_row][vc_column][vc_column_text]<\/p>\n<h2 style=\"text-align: left\">A Very Few Useful Programming and Data Tips.<\/h2>\n<p>&nbsp;<\/p>\n<table border=\"0\" width=\"90%\" cellspacing=\"0\" cellpadding=\"2\" align=\"center\">\n<colgroup>\n<col width=\"50%\" \/>\n<col width=\"50%\" \/> <\/colgroup>\n<tbody>\n<tr>\n<td colspan=\"2\">\n<h3>Translating Financial Statement Variables Into COMPUSTAT Data Items<\/h3>\n<\/td>\n<\/tr>\n<tr>\n<td>Financial Statement Variable<\/td>\n<td>COMPUSTAT Data Number<\/td>\n<\/tr>\n<tr align=\"left\" valign=\"middle\">\n<td>Long Term Debt<\/td>\n<td>DATA9<\/td>\n<\/tr>\n<tr align=\"left\" valign=\"middle\" bgcolor=\"#D3D3D3\">\n<td>Property Plant and Equipment (PPE): Wrong Sign!<\/td>\n<td>DATA30<\/td>\n<\/tr>\n<tr align=\"left\" valign=\"middle\">\n<td>Current Portion of Long Term Debt<\/td>\n<td>DATA44<\/td>\n<\/tr>\n<tr align=\"left\" valign=\"middle\" bgcolor=\"#D3D3D3\">\n<td>Sale of Property Plant and Equipment. (Does not appear to match values in the 10-K. If you can figure out how to calculate the sale of assets from the available COMPUSTAT data items please let me know.)<\/td>\n<td>DATA107<\/td>\n<\/tr>\n<tr align=\"left\" valign=\"middle\">\n<td>Sales (Total Revenue)<\/td>\n<td>DATA117<\/td>\n<\/tr>\n<tr align=\"left\" valign=\"middle\" bgcolor=\"#D3D3D3\">\n<td>Net Income Before Taxes<\/td>\n<td>DATA122<\/td>\n<\/tr>\n<tr align=\"left\" valign=\"middle\">\n<td>Depreciation <b>and<\/b> Amortization<\/td>\n<td>DATA125<\/td>\n<\/tr>\n<tr align=\"left\" valign=\"middle\" bgcolor=\"#D3D3D3\">\n<td>Cost of Goods Sold <b>minus<\/b> Depreciation and Amortization<\/td>\n<td>DATA131<\/td>\n<\/tr>\n<tr align=\"left\" valign=\"middle\">\n<td>Selling, General and Administrative Expenses (SG&amp;A)<\/td>\n<td>DATA132<\/td>\n<\/tr>\n<tr align=\"left\" valign=\"middle\" bgcolor=\"#D3D3D3\">\n<td>Provision for Income Taxes<\/td>\n<td>DATA135<\/td>\n<\/tr>\n<tr align=\"left\" valign=\"middle\">\n<td>Notes Payable<\/td>\n<td>DATA206<\/td>\n<\/tr>\n<tr align=\"left\" valign=\"middle\" bgcolor=\"#D3D3D3\">\n<td>Standard and Poors (S&amp;P) Bond Rating: Translated from their letter code to a numerical value. See the WRDS online documentation to translate back.<\/td>\n<td>DATA280<\/td>\n<\/tr>\n<tr align=\"left\" valign=\"middle\">\n<td>Change in Accounts Receivable<\/td>\n<td>DATA302<\/td>\n<\/tr>\n<tr align=\"left\" valign=\"middle\" bgcolor=\"#D3D3D3\">\n<td>Change in Inventories<\/td>\n<td>DATA303<\/td>\n<\/tr>\n<tr align=\"left\" valign=\"middle\">\n<td>Change in Accounts Payable<\/td>\n<td>DATA304<\/td>\n<\/tr>\n<tr align=\"left\" valign=\"middle\" bgcolor=\"#D3D3D3\">\n<td>Cash Interest Paid<\/td>\n<td>DATA315<\/td>\n<\/tr>\n<tr align=\"left\" valign=\"middle\">\n<td>Cash Taxes Paid<\/td>\n<td>DATA317<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<hr \/>\n<table style=\"width: 90%\" border=\"0\" width=\"90%\" cellspacing=\"0\" cellpadding=\"2\" align=\"center\">\n<colgroup>\n<col width=\"50%\" \/>\n<col width=\"50%\" \/> <\/colgroup>\n<tbody>\n<tr>\n<td style=\"width: 7.24638%\" colspan=\"2\">\n<h3>R Programming Package Hints, Tips and Manual Clarifications<\/h3>\n<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 7.24638%\" colspan=\"2\" align=\"left\">\n<h4>Flexclust Package<\/h4>\n<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 7.24638%\" colspan=\"2\" align=\"left\">Flexclust is an R package for creating clusters of similar objects based on their characteristics. It has the ability to call a user supplied distance function and use it to calculate the &#8220;distance&#8221; between objects in a cluster and the cluster center. I put distance in quotes since it does not need to be a physical distance. It could be, for example, the difference in per capita GDP between a country and the rest of the countries in a cluster.You call the distance function through a command that looks like:<\/p>\n<p>kmFam &lt;- kccaFamily(dist=mydist, cent=colMeans, groupFun = &#8220;minSumClusters&#8221;).In this example the distance function is mydist. It is not clear from the manual, but this function needs to take as inputs a matrix of values <i>x<\/i> and matrix of centers <i>c<\/i>. The <i>x<\/i> matrix should contain one row per observation. Each row of the centers matrix <i>c<\/i> is a row vector of centers from candidate clusters. The user supplied distance function should take the matrices <i>x<\/i> and <i>c<\/i> and then calculate a distance from each candidate center for each observation in <i>x<\/i> and then return the result. For example, if <i>x<\/i> is a nxk matrix and centers matrix is 2xk then the returned matrix has to be nx2. Since flexclust determines how many rows will be passed in the centers matrix the distance function must be able to handle an arbitrary number of them. Each row in the centers matrix.<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 7.24638%\" colspan=\"2\">\n<h4>Matching Data Across Data Frames: Find the First Date in One After Dates in a Column of Another<\/h4>\n<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 7.24638%\" colspan=\"2\"><span style=\"color: #008000\">#You have two data frames. You need to find the largest value of some column<\/span><br \/>\n<span style=\"color: #008000\">#in the second data frame less than or equal to each value in the first data frame<\/span><br \/>\n<span style=\"color: #008000\">#and do this ID by ID of some sort. For example, the last rating date prior to a<\/span><br \/>\n<span style=\"color: #008000\">#trade date bond Cusip by bond Cusip. You then want to record some information from the<\/span><br \/>\n<span style=\"color: #008000\">#second data frame. In the example, the bond&#8217;s rating. After the code is finished<\/span><br \/>\n<span style=\"color: #008000\">#the resulting data frame will NOT be sorted. If you needed it back in original<\/span><br \/>\n<span style=\"color: #008000\">#sort order you need to add a line to do that.<\/span><br \/>\n<span style=\"color: #008000\">#You need the dplyr library for this to work. In this sample code uses an inner_join<\/span><br \/>\n<span style=\"color: #008000\">#so data is only kept if it is in both data frames.<\/span><\/p>\n<p>library(dplyr)<\/p>\n<p><span style=\"color: #008000\">#Some sample data, say bond trades.<\/span><br \/>\nb &lt;- data.frame(cusip=c(rep(&#8220;A&#8221;,30),rep(&#8220;B&#8221;,10),rep(&#8220;C&#8221;,5)),td=c(62:91,173:182,266,268,272,279,281),stringsAsFactors=F)<\/p>\n<p><span style=\"color: #008000\">#More sample data, say bond ratings.<\/span><br \/>\nr &lt;- data.frame(cusip=c(rep(&#8220;A&#8221;,10),rep(&#8220;B&#8221;,5),rep(&#8220;C&#8221;,7)), rd=5*c(12:21,33:37,52:58), rate=c(101:122), stringsAsFactors=F)<\/p>\n<p><span style=\"color: #008000\">#Create the new data with the columns containing the value in table 2 from the column rate in r that corresponds with the<\/span><br \/>\n<span style=\"color: #008000\">#largest value in r from the column rd that is less than or equal to td for each cusip in b.<\/span><br \/>\nvv &lt;- inner_join(b,r) %&gt;% filter(rd&lt;=td) %&gt;% group_by(cusip,td) %&gt;% filter(rd==max(rd))<\/p>\n<p><span style=\"color: #008000\">#Turn the output back into a data frame and sort by cusip and td.<\/span><br \/>\nvv &lt;- data.frame(vv)<br \/>\nvv &lt;- vv[with(vv,order(cusip,td)),]<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>[\/vc_column_text][\/vc_column][\/vc_row]<\/p>\n<\/div>","protected":false},"excerpt":{"rendered":"<p>[vc_row][vc_column][vc_column_text] A Very Few Useful Programming and Data Tips. &nbsp; Translating Financial Statement Variables Into COMPUSTAT Data Items Financial Statement Variable COMPUSTAT Data Number Long Term Debt DATA9 Property Plant and Equipment (PPE): Wrong Sign! DATA30 Current Portion of Long Term Debt DATA44 Sale of Property Plant and Equipment. (Does not appear to match values [&hellip;]<\/p>\n","protected":false},"author":3,"featured_media":0,"parent":0,"menu_order":4,"comment_status":"closed","ping_status":"closed","template":"","meta":{"inline_featured_image":false,"_genesis_hide_title":false,"_genesis_hide_breadcrumbs":false,"_genesis_hide_singular_image":false,"_genesis_hide_footer_widgets":false,"_genesis_custom_body_class":"","_genesis_custom_post_class":"","_genesis_layout":"","footnotes":""},"class_list":{"0":"post-14","1":"page","2":"type-page","3":"status-publish","5":"entry"},"_links":{"self":[{"href":"https:\/\/faculty.som.yale.edu\/matthewspiegel\/wp-json\/wp\/v2\/pages\/14","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/faculty.som.yale.edu\/matthewspiegel\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/faculty.som.yale.edu\/matthewspiegel\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/faculty.som.yale.edu\/matthewspiegel\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/faculty.som.yale.edu\/matthewspiegel\/wp-json\/wp\/v2\/comments?post=14"}],"version-history":[{"count":5,"href":"https:\/\/faculty.som.yale.edu\/matthewspiegel\/wp-json\/wp\/v2\/pages\/14\/revisions"}],"predecessor-version":[{"id":130,"href":"https:\/\/faculty.som.yale.edu\/matthewspiegel\/wp-json\/wp\/v2\/pages\/14\/revisions\/130"}],"wp:attachment":[{"href":"https:\/\/faculty.som.yale.edu\/matthewspiegel\/wp-json\/wp\/v2\/media?parent=14"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}