Thursday, July 26, 2012

Space and Tab: How to Process Data into Excel

See previous related post: Processing List of Numbers into Excel

For anyone who has done analysis in Microsoft Excel, it should obvious that being able to successfully process raw data, whether from websites or text files, into the spreadsheet is the first crucial step. Each discrete entry should be in its own cell. Unfortunately, copying and pasting will not always work. Often times upon pasting, everything goes into the one currently highlighted cell. Often times, spacing and tabbing issue cause improper processing of data.

We'll work with these samples of data: historical S&P 500 index prices from Google Finance, and stats on Zack Greinke, a Milwaukee Brewers pitcher, from ESPN. In both cases, a mere selecting, copying, and pasting the data into an Excel sheet will result in inscrutable mess of everything in the one cell. Instead, select and copy the original table from the websites. Open Notepad, and paste the values. Being simplistic in nature, Notepad is able to rescind the formatting of the text pasted into it. Then, highlight these pasted items on the Notepad, and copy. Although the the values of these items are the same as the original, the different formatting will make all the difference. Now go to the Excel cell, make sure the formula bar is not selected (make sure the cursor is not flashing anywhere). Paste, and the values will be nicely displayed in the table.

Why does this work? There are no visual ways to distinguishing tabbing and multiple spacings, but use the left and right arrows to traverse through the values in Notepad, and if the cursor jumps over the blank spaces, the empty space is due to tabbing. If the cursor only moves one position per arrow click, the spacing is just a space. Either way, Excel recognizes tab as sign to break the data into separate columns. Spaces will not trigger splitting the data separate cells. Instead, the text, which includes the spaces, will be pasted into the one cell. Lastly, line break trigger breaking the data into separate rows.

Sometimes, text-to-columns processing may be needed (see the previous related post). But often while retrieving data from websites, this method of copy from website, paste onto Notepad, copy from Notepad, paste onto Excel allows for the processing of data necessary to perform further analysis.

Sources: