Saturday, December 22, 2012

Bicycle vs. Scooter: Usability vs. Storage

The bicycle and the scooter are two simple non-motorized modes of private transportation. Each has its advantages and disadvantages, and in the end the choice comes down to a trade-off between usability and storage.

It's not surprising that the bicycle is by far the more convenient way to get from point A to point B. It also handles inclines better, and its brakes give it the better stopping mechanism. The biggest drawback of the bicycle is storage. Point B may or may not have a convenient place to store the bicycle. Despite locks, bikes secured outside are not immune to damage from other people or the weather.

The scooter is the answer to the storage issue, since it is relatively easy to fold into a small unit that can be carried with one hand into facilities like restaurants, which would not be able to house a bicycle. However, on average the scooter may go only half as fast as the bike. It is also slowed down much more on upward inclines, and it tires the legs much more (the standing leg in particular, more so than the kicking leg).

What are the implications of the trade-off between usability and storage? Bicycles have a comparative advantage on longer trips, while scooters have a comparative advantage on shorter trips. There is no fixed cutoff distance, as it will vary with each person's strength, the terrain of the path, and the bike storage options around the destination, but 1-2 miles serves as a reasonable estimate. For those who are more physically fit, the cost of using a scooter is lower. Smoother terrain benefits the scooter, while inclines benefit the bike. Finally, better storage options of course benefit the bicycle.

Thursday, December 13, 2012

Life Lessons from Longboarding

Believe it or not, there may be some lessons from longboarding that can be applied to real life. For background, a longboard is just like a skateboard, except that it is bigger and goes faster. A comfortable cruising speed on a flat surface is around 8 to 10 mph. As the wheels are larger than those of a skateboard, the ride is much smoother. That said, the wheels are still small compared to those of a bicycle, and as such, the longboard is still anything but immune to cracks in the ground.

Lesson one: focus on the present, with an eye on the future. While longboarding, it is imperative to focus on the ground directly underneath and the terrain directly in front. Always be on the lookout for cracks in the ground, for they can and will trap the wheels and cause accidents if they are too big for the wheels. No matter how smooth the ride currently is, always focus on the present and the immediate future, and be ready to act on it.

Lesson two: uphold moderation. It is so much more fun to go faster on the longboard. However, with speed comes a lack of stability. When going down a hill, it may be tempting to accelerate. But unlike a bicycle, a longboard does not come with brakes. It becomes much harder to stop when the speed is high, so it's always better to practice moderation by maintaining a steady speed while going down hills. Do this by stepping off with one foot to decelerate before the speed gets too high.

Friday, November 30, 2012

Spurs Fined for a Strategic Move

On Friday, NBA commissioner David Stern announced that the San Antonio Spurs will be fined $250,000 for sending 4 key players home instead of playing them in a highly anticipated, nationally televised game against the Miami Heat on Thursday. Stern claimed that the Spurs "did a disservice to the league and [the] fans" by taking their stars out in their only visit to Miami this regular season "without informing the Heat, the media, or the league office in a timely way." Spurs head coach Gregg Popovich's decision called for sending Tim Duncan, Tony Parker, Manu Ginobili, and Danny Green back to San Antonio to get some rest ahead of Saturday's home game against Memphis.

Heat star LeBron James summarized the collective disapproval of Stern's decision across the sports realm in very simple words: "it's not in the rules to tell you you can't send your guys home." Certainly some viewers may have been disappointed at not seeing the superstars from the two teams battle on Thursday night, but the fact of the matter is that Popovich and the Spurs had a broader goal in mind: to best prepare for the entire season, not just this one game. Popovich had well-justified reasons to rest his players. Thursday was the team's 6th road game in 9 days, and its 4th game of the week. On the flip side, the Heat were playing their first game since Saturday, and only their 5th game in 2 weeks. Given how lopsided the teams' schedules were coming into the game, the Spurs knew they were at a disadvantage, and it was strategically in their interest to rest their stars. Popovich said he had made this decision "when the schedule came out in July."

Stern didn't buy any of it, calling the action an "unacceptable decision." Yet this is vastly different from the badminton scandal during the Olympics this summer, in which several badminton players from China, South Korea and Indonesia were accused of intentionally losing matches so that "they could face easier opponents in future matches." There was little disagreement that deliberately losing games was in violation of the Olympic and sporting spirit, and the players were expelled for their blatant actions.

What the Spurs did, though, was drastically different from the actions of those badminton players. Many playoff-bound teams rest their star players in the final games of the regular season. Sure, November is still early in the season, but the lopsided schedules of the two teams heading into Thursday's game were beyond the control of the Spurs, and they had to deal with this disadvantage. The Spurs made a strategic move, not a move that demeaned the spirit of the game. If anything, the short-handed Spurs over-delivered on expectations, as their usual bench players fought neck and neck with the defending champions. It wasn't until 22.3 seconds were left in the game that the Heat delivered what proved to be the game-winning shot. The Spurs took a gamble that nearly exceeded all expectations. Even if the game hadn't been close, their decision was highly calculated, and the NBA's decision to fine them discourages strategic maneuvering in its own sport.

Tuesday, October 30, 2012

Sandy Aftermath Pictures from Upper Manhattan

Here are some pictures taken today in Manhattan during a quick bike ride, one day after Hurricane Sandy tore through New York City and the Northeast region.

Southbound on Lexington Ave, around E. 79th: hardly looks different

Northbound on Central Park West, around W. 72nd: destruction aftermath in view

Central Park West, around W. 82nd: downed tree

Hudson Greenway: open for a stretch

Hudson Greenway: closed at around 79th

Cherry Walk was anything but cherry-like, rather all muddy

Southbound Henry Hudson Parkway closed due to cleanup

Even McDonald's on W. 125th & Broadway was closed

Most city parks remain closed

Southbound Riverside Drive, around W. 116th St: with southbound Henry Hudson closed, cars jam Riverside Drive

Saturday, October 6, 2012

Frick Collection, Frick House, Frick Mansion

See previous related post: Alpine, NJ 07620: Nation's Most Expensive Zip Code

The Frick Collection is an art museum located on Manhattan's Upper East Side, at the corner of 5th Avenue and East 70th Street. The Frick House houses the Frick Collection today. The Frick Mansion is the 30,000 square-foot house, listed for $68 million in 2010, across the Hudson River in Alpine, New Jersey, the country's most expensive zip code. All of these entities relate to Henry Clay Frick, "the Pittsburgh coke and steel industrialist," who lived from 1849 to 1919.

Frick was an art patron who collected paintings and other art objects. The Frick House was built in 1913 at its present location, with Frick's intention that he would eventually "leave his house and his art collection to the public." The Frick Collection opened to the public in 1935, 16 years after the death of Frick himself, and presently contains over 1000 works of art, "from the Renaissance to the late nineteenth century." The Frick Mansion in Alpine traces back to an estate the Frick family built in the 1930s. In 2006, real estate investor Richard Kurtz paid $58 million to acquire the 60-acre estate, and built the 30,000 square-foot mansion on the property. Immediately after completion, Kurtz put it on the market. The Frick Mansion, located on a completely gated drive, contains "12 bathrooms, 19 bedrooms, a library, a ballroom, a main kitchen, a catering kitchen, a basketball court, a movie theater and an 11-car garage," and furthermore can be controlled from anywhere with a smartphone.

Monday, October 1, 2012

Basic MATLAB Demonstrations of Markov Chain and Limiting Probabilities

Consider the following transition matrix of a 3-state Markov chain, denoted p:

    0.1000    0.4000    0.5000
    0.3000    0.6000    0.1000
    0.7000    0.2000    0.1000

The limiting probabilities (stationary vector), denoted by π, satisfy the equation π = π*p. While the steps aren't shown here, π is calculated to be:
    0.3269    0.4423    0.2308

In the long run, 32.69% of the time will be spent in state 1, 44.23% in state 2, and 23.08% in state 3. In the subsequent code, the stationary vector is denoted by s, since MATLAB cannot work with the symbol π as a variable name.
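
Since the steps for finding π are skipped above, here is one possible way to obtain it in MATLAB. This is only a sketch, not necessarily how the numbers above were produced: it solves the linear system s*p = s together with the constraint that the entries of s sum to 1. The helper names A and b are introduced here just for illustration.

```matlab
% Transition matrix from the text (each row sums to 1).
p = [0.1 0.4 0.5;
     0.3 0.6 0.1;
     0.7 0.2 0.1];

% s*p = s is equivalent to (p' - I)*s' = 0. That system alone is singular,
% so the last equation is replaced with the constraint sum(s) = 1.
n = size(p, 1);
A = p' - eye(n);
A(n, :) = ones(1, n);
b = [zeros(n-1, 1); 1];

s = (A \ b)';   % stationary vector as a 1x3 row
disp(s)
```

With this p, the result should agree with the vector quoted above to four decimal places.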

s*p
    0.3269    0.4423    0.2308
This demonstrates that π*p indeed equals the original π stationary vector.

s*p(:,1)
    0.3269
s*p(:,2)
    0.4423
s*p(:,3)
    0.2308
Each of the operations s*p(:,i) multiplies the 1x3 stationary vector π by the 3x1 i-th column vector of p, and each yields the i-th entry of π. This demonstrates the property that π_j = Σ π_i * p_ij, summed over i, where j is the column number of the matrix p. That is the same as π * (j-th column of p). The physical interpretation of s*p(:,1) is the long-run proportion of transitions into state 1. That number equals s(1), which is the proportion of time spent in state 1. It makes sense that the proportion of transitions into a certain state is also the proportion of time spent in that same state.
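
Another way to see why these are called limiting probabilities is to raise p to a large power: every row of the result should approach π, regardless of the starting state. The sketch below assumes 50 steps is large enough for this particular matrix, and pLimit is just a name chosen here.

```matlab
% Transition matrix from the text.
p = [0.1 0.4 0.5;
     0.3 0.6 0.1;
     0.7 0.2 0.1];

% Entry (i,j) of p^n is the probability of being in state j after n steps,
% starting from state i. For large n the starting state stops mattering.
pLimit = p^50;
disp(pLimit)
```

Each of the three rows should display as roughly 0.3269, 0.4423, 0.2308.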

Saturday, September 8, 2012

Toss Coin Until Head Appears Twice In a Row

Take a fair coin and toss it until a head appears twice in a row. There is no finite sample space for this experiment, for it can theoretically go on forever. That said, the first few elements are: {H,H}, {T,H,H}, {T,T,H,H}, {H,T,H,H}. To determine the probability of tossing exactly 4 times, we can take the elements in the sample space that correspond to exactly 4 tosses. In this case, they are {T,T,H,H} and {H,T,H,H}. The probability of tossing either sequence is 1/16, and two such sequences give a total probability of 1/8.

To see the result simulated, the following MATLAB code runs 1 million trials. Comments are added to clarify the algorithm.
probability = 0.5;      %prob of tossing head
numTrials = 1000000;    %number of simulation trials
trialResults = zeros(1,numTrials);

for trial = 1:numTrials
    done = false;
    lastHead = false;
    toss = 0;
    while ~done         %this loop controls each trial
        roll = rand;
        toss = toss + 1;
        if roll < probability && lastHead
            trialResults(trial) = toss;
            done = true;    %trial ends: previous toss was H, and this toss is too
        elseif roll < probability && ~lastHead
            lastHead = true;    %previous toss was not H, but this toss is
        else
            lastHead = false;   %this toss is not H
        end
    end
end

sum(trialResults == 4) / numTrials

The results vary very little from 0.1250.
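
As a cross-check on the simulation, the same probability can be computed exactly by treating the tosses as a small Markov chain with an absorbing state. This is a sketch under the same fair-coin assumption; the state numbering and the names P, dist, and probExact are choices made here for illustration.

```matlab
q = 0.5;    % probability of tossing a head

% States: 1 = last toss was not H (or no toss yet),
%         2 = last toss was H,
%         3 = two heads in a row (absorbing).
P = [1-q  q  0;
     1-q  0  q;
     0    0  1];

maxTosses = 10;
probExact = zeros(1, maxTosses);   % probExact(n) = P(game ends on toss n)
dist = [1 0 0];                    % start in state 1
absorbedBefore = 0;
for n = 1:maxTosses
    dist = dist * P;
    probExact(n) = dist(3) - absorbedBefore;   % newly absorbed on toss n
    absorbedBefore = dist(3);
end

disp(probExact(4))
```

The entry for 4 tosses comes out to exactly 0.125, matching the 1/8 derived by counting sequences.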

Tuesday, August 21, 2012

Showcase Showdown Analysis: Circular Reasoning and Oscillating Nash Equilibrium

The Showcase Showdown is a portion of the game show The Price Is Right in which contestants spin the Big Wheel, which has 20 sections in scrambled order, valued from 5 cents to $1.00 in 5-cent increments. The objective is to get as close to $1.00 as possible without going over, with one initial spin and an optional second spin. Three contestants play to determine who has the highest value. What's the strategy in this game for the first contestant? An intuitive response may be to spin again if the first spin is 50 or less, and to keep the first spin if it's 55 or greater. Unfortunately, it's not as simple as that.

One strategy would be to spin the wheel a second time only if the first spin is less than the expected total from two spins, conditional on the outcome of the first spin. We can analyze the extreme cases first to get a better understanding. If the first spin is 100, every second spin will make the total go over, resulting in a total score of 0. In that case, the contestant will definitely keep the first spin of 100. At the other extreme, the expected total from the two spins, conditional on the first being 5, is 52.25. In that case, taking the second spin is the better choice. Here is the complete table, with the higher result reflecting the course of action pursued:

    First Spin    2 Spins Avg Total    Higher Result
    5             52.25                52.25
    10            51.75                51.75
    15            51                   51
    20            50                   50
    25            48.75                48.75
    30            47.25                47.25
    35            45.5                 45.5
    40            43.5                 43.5
    45            41.25                45
    50            38.75                50
    55            36                   55
    60            33                   60
    65            29.75                65
    70            26.25                70
    75            22.5                 75
    80            18.5                 80
    85            14.25                85
    90            9.75                 90
    95            5                    95
    100           0                    100

If the first spin is 40 or less, taking a second spin will on average produce a better result. The average of the third column gives the expected result from the game: 63. However, there is one hole in this reasoning when applied to the game. It would work perfectly fine if the first contestant played alone, concerned only with maximizing the individual score given the risk-reward trade-off. Instead, of the three contestants, only the one with the highest result wins. If the first contestant got anything from 45 to 60, inclusive, on the first spin, the reasoning above says it would be better to keep it. However, since the objective is to beat all other contestants rather than to maximize the individual score, at that point of the game it may be more reasonable to spin again nevertheless.
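
The table and the overall expected value of 63 can be reproduced with a few lines of MATLAB. This is a sketch that assumes each of the 20 wheel values is equally likely on every spin and that going over 100 scores 0; the variable names are chosen here for illustration.

```matlab
values = 5:5:100;               % the 20 sections of the wheel
n = numel(values);

twoSpinAvg = zeros(1, n);
for k = 1:n
    totals = values(k) + values;    % every possible two-spin total
    totals(totals > 100) = 0;       % going over 100 scores nothing
    twoSpinAvg(k) = mean(totals);   % second spin is equally likely to be any value
end

higher = max(values, twoSpinAvg);   % keep or spin again, whichever is better

disp([values' twoSpinAvg' higher'])
disp(mean(higher))
```

The three displayed columns correspond to First Spin, 2 Spins Avg Total, and Higher Result, and the final number is the average of the third column.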

This is where the circular reasoning kicks in. If the contestant decides to spin a second time whenever the first spin is less than 63, then the total expected value drops to 59.95. Essentially, the contestant tries to base the individual decision on the overall expected result, yet the overall expected result depends on exactly that decision. It's a circular route of logic, and it also illustrates game theory at work. A cutoff between 40 and 45 would be the best risk-reward decision on the individual level. However, when that is the baseline, the dominant strategy is then to use 63 as the decision's critical point. When players do that, they all incur more risk and push the overall expected result downward. Would contestants then use 59.95 as the decision's critical point?

If the contestants did, the next critical point would be 61.3. In turn, if 61.3 were the critical point, it would force 59.95 to be the overall expected value again, just like 63 did. Therefore, the "Nash equilibrium" is a perpetual oscillation between 59.95 and 61.3. In the end, the only definite conclusions are to spin again if the first spin is 40 or lower, and to keep the first spin if it's 65 or greater. A first spin of 45 or 50 stands in the grey area, and 60 is stuck in no man's land, caught between the two oscillating values.
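
The oscillation can be traced by repeatedly feeding the expected value back in as the cutoff. The sketch below assumes the rule "spin again whenever the first spin is below the current cutoff" and starts from 63; the names cutoff, history, and the choice of 6 iterations are for illustration only.

```matlab
values = 5:5:100;
twoSpinAvg = zeros(size(values));
for k = 1:numel(values)
    totals = values(k) + values;
    totals(totals > 100) = 0;       % going over 100 scores nothing
    twoSpinAvg(k) = mean(totals);
end

cutoff = 63;                % starting point: expected value of the solo strategy
history = zeros(1, 6);
for iter = 1:6
    respin = values < cutoff;               % first spins that trigger a second spin
    outcome = values;                       % kept spins score their face value
    outcome(respin) = twoSpinAvg(respin);   % respins score their two-spin average
    cutoff = mean(outcome);                 % new overall expected value
    history(iter) = cutoff;
end

disp(history)
```

The sequence alternates between 59.95 and 61.3 and never settles, because 60 falls between the two values and flips from "spin again" to "keep" on every round.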

Saturday, August 4, 2012

Evaluating Cleveland RTA Park-and-Ride Locations

See previous related post: 10PM, Tower City Terminal

Having convenient and pleasant park-and-ride rail stations is key to developing effective public transportation systems that reach those in the suburbs. In Cleveland, many stations along the rail lines come with free parking. Park-and-ride can save two types of costs: the fixed cost of parking, and the variable cost of driving, which includes gasoline and depreciation. Accounting for parking alone, park-and-ride is a cost-effective alternative even for two people ($9 round-trip) compared to most parking garage rates downtown, particularly on event nights. However, the system doesn't serve all regions of the metropolitan area evenly.

Bypassing Downtown
If one is to stop downtown but ultimately bypass it on the overall route, East 55th Street Rapid Station has the best strategic location for park-and-ride. It is located right off the junction of I-490 and I-77, offering quick access back onto the highway. Furthermore, all three lines (Blue, Green, Red) run from E. 55th to Tower City, minimizing the expected wait time to catch a train. This quick stretch of 7 minutes is unique in that the heavy-rail (Red) and light-rail (Blue & Green) lines run side by side. Despite these advantages and its renovation completed in 2011, E. 55th Station does pose several important challenges. The parking lot is quite small, capable of holding fewer than 50 cars. The bigger challenge is that it may be too close to downtown, with its inner-city location potentially deterring suburban riders who prefer stations in areas they are more familiar with. Adding to the problem, the parking lot is not easily accessible coming off I-490. Bower Ave, which has direct access to the main lot on the eastern side of E. 55th, is a restricted one-way lane. The easiest lot to get into is located across the street, and crossing the busy street requires a bit of walking to, and waiting at, the traffic lights.

From East Side
For those coming from the Southeast, the terminus of either the Blue or the Green Line offers a park-and-ride option. While both lines take around the same time to reach Tower City (approx. 30 min) and are around the same distance from I-271, the Green Line holds a slight advantage. The terminus of the Blue Line, Warrensville, does not offer direct parking. Riders have to park at the adjacent Farnsleigh station. This would be fine, except that upon getting out of the lot onto Van Aken Blvd, drivers can only turn westbound. As a result, leaving the station is more burdensome than leaving the terminus at Green Road. From the Northeast, only two stations, Superior and Windermere along the Red Line, both in East Cleveland, offer park-and-ride. Windermere Station, however, is over 2 miles off I-90 through East Cleveland. As a result, options are sparser for those coming from the Northeast.

From West Side
Despite the Red Line being the sole rail line west of the Cuyahoga River, the West Side has more convenient choices. Every station from Brookpark to West Blvd has free parking exceeding 100 spaces, and the choice of station depends on whether one is predominantly coming along I-90 or I-480. Brookpark is conveniently located near the junction of I-71 and I-480, offering the best alternative for those coming from the Southwest. From the Northwest, Triskett and W. 117th Stations are convenient due to their proximity to I-90, along the Cleveland-Lakewood border, and these stations are only around 15 min away from Tower City.

South?
With Lake Erie off to the north of the city, the RTA system currently leaves the South devoid of rail connection. For those predominantly coming from the direction of I-77, there is no effective park-and-ride location. E. 55th would be the natural choice from the map, but for the reasons described above, it may not be the ideal choice. Plus, going up to I-490 is almost going into downtown anyway; barely any money is saved on gas. The RTA does have several Park-N-Ride routes using express buses. They originate from southern suburbs like Parma and Brecksville, but these programs only operate on weekdays, during rush hours, and only in the rush-hour direction. As a result, they are not viable for night or weekend events. The current system favors the West (whether Northwest from I-90 or Southwest from I-480) and the Southeast the most, while the Northeast and particularly the South don't get nearly as much benefit.

Thursday, August 2, 2012

Olympics Basketball Record

Team USA was the heavy favorite to win the Men's Basketball tournament, and today it went on to set records in its 156-73 win against Nigeria. What's more impressive about the score, particularly when put into perspective for American fans used to the NBA style, is that Olympic games are only 40 minutes long in regulation, as opposed to 48 in the NBA. Keeping the scoring rate constant, the 156 points would've translated to 156 / 40 * 48 = 187.2 points over a 48-minute period.

The NBA record for the most points by a team in one game is 186, scored by Detroit against Denver in 1983. Furthermore, that game went into a third overtime period. Detroit won 186-184, as the teams combined for 370 points, another NBA record. The margin of victory in today's game (83) also shattered the NBA record of 68. The NBA record for the fewest points in a game in the modern era is 49, recorded by Chicago in 1999. That is only about a quarter of Team USA's projected score over 48 minutes (a factor of 3.82).
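
For anyone who wants to verify the arithmetic, here is a small MATLAB sketch using only the figures quoted in this post. The variable names are made up for illustration.

```matlab
usaPoints = 156;
nigeriaPoints = 73;
olympicMinutes = 40;     % regulation length of an Olympic game
nbaMinutes = 48;         % regulation length of an NBA game
nbaFewestPoints = 49;    % Chicago, 1999

projected = usaPoints / olympicMinutes * nbaMinutes;   % score scaled to 48 minutes
margin = usaPoints - nigeriaPoints;                    % margin of victory
factor = projected / nbaFewestPoints;                  % projected score vs. the NBA low

fprintf('Projected 48-minute score: %.1f\n', projected);
fprintf('Margin of victory: %d\n', margin);
fprintf('Factor over the NBA low: %.2f\n', factor);
```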

Monday, July 30, 2012

10PM, Tower City Terminal

On a Monday night at 10 PM, the RTA Green Line departs from Tower City Terminal, carrying many fans, who had just witnessed the Cleveland Indians avoid a sweep at the hands of the Baltimore Orioles, back home away from downtown. It is packed. It is loud. It is lively. Passengers who just barely made it onto the train cannot find a seat. But at least standing is better than sitting in a traffic jam trying to get onto the highway, right? The train goes on, and the speedometer on the iPhone measures it at over 45 mph at its peak. Barely anyone gets off the train at the first few stops, during the inner-city portion of the track. At Shaker Square, the first station outside of Cleveland city proper, a noticeable number of passengers leave. Then another bunch gets off at Warrensville Station, which houses a sizeable parking lot. Finally, just under 30 minutes after departure, the remaining people get off at Green Road Station, the terminus.

Despite being named the best public transportation system in North America in 2007, RTA still remains far removed from many Greater Clevelanders who don't live within the city proper or the inner-ring suburbs. A reaction I received recently, upon suggesting driving from a western suburb and park-and-riding at one of the Red Line stations, was a candid "Cleveland has trains?" It was hard to blame that reaction. Unless one goes along Shaker or Van Aken Blvd occasionally, or is truly interested in this field, it is plausible for suburbanites here to not even know the extent of the rail system. Does anyone here know that Cleveland was the first North American city to have a direct rail line between the international airport and downtown? Speaking from experience, the journey on the Red Line from Case Western to Hopkins Airport is far easier than heading to LaGuardia or JFK from Columbia University. But as population density usually dictates the usage of a public transportation system, Cleveland is not New York. Rail lines are more limited, the trains less frequent, and the stations less crowded, seemingly reinforcing what an introductory economics book would say: that public transportation is an inferior good; that is, the demand for it increases as wages decrease.

But the scene aboard that Green Line train Monday night told a different story. These people, myself included, just didn't want to deal with the hassle, not to mention the cost, of parking and construction. Driving may have meant a faster journey overall, but the 30-minute ride was reasonable, especially coupled with the free park-and-ride system. Every year, there are a few events, St. Patrick's Day most notably, that cause a massive inflow of people downtown, leading to an overflow of passengers on the RTA that ultimately makes the newspaper headlines. But this wasn't one of those nights. It was just another Indians game, and a pretty mediocre one too in terms of attendance: 18,264. While the Indians haven't fared well in overall attendance, the team truly is capable of drawing huge crowds: just three days later, over 34,000 fans attended the Thursday game against the Detroit Tigers. This was also a Monday night, not the liveliest day of the week when it comes to downtown entertainment that draws crowds, such as the casino. Furthermore, this game on Monday night took place before ODOT closed the Ontario Street ramp to westbound Interstate 90 for a year, causing potential havoc for drivers coming from Indians games. All this is to say that the scene on Monday night is far from the busiest downtown Cleveland can see.

The passengers on the Green Line that night, most of whom got off at the park-and-ride stations in Shaker Heights, illustrated that the traffic was predominantly made up of those who had driven from their homes to the stations. Few took the train to get home directly. This is crucial in understanding the usage and service of the rail lines. They do indeed appeal to suburban residents, even if still only to a small portion of them. RTA recently initiated a program to provide ticket discounts to families going to the games, trying to address the issue that carpooling reduces the cost of driving. It may take more awareness, or continued frustration with parking downtown, to convince more suburbanites to ride the train. But until then, those who do enjoy the system got a glimpse, on what was but an average Monday night, of what it has to offer. This may not be New York, and people aren't out at Public Square at midnight like they are in Times Square. But nevertheless, there is indeed an urban liveliness to be experienced.

Crowded Green Line train departing Tower City immediately following Indians game on 7/23/12

Sunday, July 29, 2012

NBC's Disgraceful Decision to Skip Opening Ceremony Tribute to Terror Victims

Having exclusive rights to broadcast Olympic coverage in the United States is a double-edged sword for NBC. It draws massive viewership, but its decisions are heavily scrutinized, particularly due to the time difference between London and the United States, and hence NBC's ability to control what is broadcast to the American audience. Many objected to NBC's decision not to live stream the Opening Ceremony, which happened in the early afternoon hours in the United States. Instead, NBC tape-delayed the show for prime time on both coasts, explaining that the Opening and Closing Ceremonies are "entertainment spectacles best shown at a time when friends and family are able to gather together to watch." Criticized or not, the decision paid off for NBC. The Opening Ceremony Friday drew "the best overnight ratings ever for an Olympics held outside the United States," a 7% increase in ratings from the Beijing Opening Ceremony.

The debates will continue, but one of its decisions Friday night deserves heavy criticism. When a tribute to terror victims was played during the ceremony, NBC instead decided to show an interview with Michael Phelps. This was purely a monetary decision aimed at boosting its ratings, as the cost of US TV rights fees topped $1.18 billion. Upon being questioned about this particular decision, NBC spokesman Greg Hughes admitted that its "programming is tailored for the US audience." However, the crucial issue is that the tribute was not irrelevant to the United States. In fact, the tribute was more relevant to the United States than to any other country in the world. Great Britain has been the United States' staunchest ally in the War on Terror. In Afghanistan alone, over 400 British troops have sacrificed their lives, more than the toll Britain suffered in the Falklands War. The tribute on Friday was widely interpreted as a memorial for the 52 victims of the terrorist bombings in 2005, which occurred on July 7, the day after London was awarded these 2012 Olympic Games. The 2005 bombings were believed to be al-Qaeda retribution for British involvement in the conflicts in Afghanistan and Iraq.

While it is understandable that NBC has financial incentives to boost its audience with its selection of content broadcast exclusively to Americans, this particular decision was ignominious. The July 7 bombings were Britain's 9/11 attacks, and they occurred while the country was still in jubilation, knowing it would host the 2012 Olympics. For any broadcaster to purposefully skip such a dedicated tribute at that moment for marginal financial gain is classless, but for NBC, on behalf of the United States, to do so was even more disgraceful.

Friday, July 27, 2012

2012 Olympics Athlete Rate Per Country Population

As athletes march into the Olympic Stadium in London for the Opening Ceremony, here's a look at the number of athletes by country or territory. Specifically, which countries have the most and the fewest athletes relative to their population? A report by The Guardian has compiled the raw data on the athletes for analysis.

The top countries or territories on the list of athletes per population are dominated by very small countries that send a few athletes. Atop the list is the Cook Islands, an associated state of New Zealand located in the South Pacific. With 8 athletes representing its population of 10,777, according to the CIA World Factbook as of July 2011, the Cook Islands has one athlete for around every 1,350 people. At the bottom of the list is Bangladesh. With 5 athletes representing a country of more than 161 million people, that is a rate of over 32 million people per athlete. The factor of difference between the rate of the Cook Islands and that of Bangladesh is nearly 24,000.
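
The per-athlete rates and the factor between them can be checked with a short MATLAB sketch. The Cook Islands figures are the ones quoted above; for Bangladesh, the population is assumed here to be exactly 161 million, which is only the lower bound given in the text, so the resulting factor is approximate.

```matlab
cookPopulation = 10777;     % CIA World Factbook, July 2011 (from the text)
cookAthletes = 8;

bangPopulation = 161e6;     % assumption: the text only says "more than 161 million"
bangAthletes = 5;

cookRate = cookPopulation / cookAthletes;   % people per athlete, Cook Islands
bangRate = bangPopulation / bangAthletes;   % people per athlete, Bangladesh
factor = bangRate / cookRate;               % how many times larger Bangladesh's rate is

fprintf('Cook Islands: %.0f people per athlete\n', cookRate);
fprintf('Bangladesh:   %.0f people per athlete\n', bangRate);
fprintf('Factor of difference: %.0f\n', factor);
```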

Among the countries or territories at the top of the per capita list, if a filter of 100,000+ population is applied, Iceland tops the list. With 28 athletes for a population of over 300,000, the rate is over 11,000 people per athlete. When a filter of 1,000,000+ population is applied, New Zealand tops the list. In comparison, the United States, with 534 athletes, has over 580,000 people per athlete.

Thursday, July 26, 2012

Space and Tab: How to Process Data into Excel

See previous related post: Processing List of Numbers into Excel

For anyone who has done analysis in Microsoft Excel, it should be obvious that being able to successfully get raw data, whether from websites or text files, into the spreadsheet is the first crucial step. Each discrete entry should be in its own cell. Unfortunately, copying and pasting will not always work. Oftentimes upon pasting, everything goes into the one currently highlighted cell. Oftentimes, spacing and tabbing issues cause the data to be processed improperly.

We'll work with these samples of data: historical S&P 500 index prices from Google Finance, and stats on Zack Greinke, a Milwaukee Brewers pitcher, from ESPN. In both cases, merely selecting, copying, and pasting the data into an Excel sheet will result in an inscrutable mess of everything in one cell. Instead, select and copy the original table from the website. Open Notepad, and paste the values. Being simple in nature, Notepad strips the formatting from the text pasted into it. Then, highlight these pasted items in Notepad, and copy. Although the values of these items are the same as the original, the different formatting will make all the difference. Now go to the Excel cell, and make sure the formula bar is not selected (make sure the cursor is not flashing anywhere). Paste, and the values will be nicely displayed in the table.

Why does this work? There is no visual way to distinguish a tab from multiple spaces, but use the left and right arrow keys to traverse the values in Notepad: if the cursor jumps over the blank space, the empty space is a tab. If the cursor only moves one position per key press, the spacing consists of ordinary spaces. Either way, Excel recognizes a tab as the signal to break the data into separate columns. Spaces will not trigger splitting the data into separate cells. Instead, the text, including the spaces, will be pasted into the one cell. Lastly, line breaks trigger breaking the data into separate rows.
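The splitting rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this post (tabs separate columns, line breaks separate rows, spaces stay inside the cell), not a description of how Excel is actually implemented; the function name `split_like_paste` and the sample text are made up for the example.

```python
def split_like_paste(text):
    """Split pasted text into rows and cells.

    Line breaks start a new row, tabs start a new column,
    and ordinary spaces are left inside the cell.
    """
    return [line.split("\t") for line in text.splitlines()]


# Hypothetical sample: two rows, two columns, with spaces inside some cells.
sample = "Zack Greinke\tMilwaukee Brewers\nS&P 500\tindex price"
for row in split_like_paste(sample):
    print(row)
```

Note that "Zack Greinke" stays together as one cell because it contains a space rather than a tab.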

Sometimes, text-to-columns processing may be needed (see the previous related post). But often while retrieving data from websites, this method of copy from website, paste onto Notepad, copy from Notepad, paste onto Excel allows for the processing of data necessary to perform further analysis.


Wednesday, July 25, 2012

Correlations Among Facebook and Zynga

See previous related post: Facebook: Two Months Since IPO

After the market closed Wednesday, Zynga (NASDAQ: ZNGA) delivered a disappointing Q2 earnings report and outlook. Revenue was $332.5 million, short of the $343.1 million predicted by analysts. Profit was 1 cent a share, short of the 6-cent average estimate. Furthermore, Zynga stated that, due to "delays in launching new games, a faster decline in existing web games due in part to a more challenging environment on the Facebook web platform, and reduced expectations for Draw Something," 2012 bookings will be around $1.15 billion to $1.225 billion, short of the April projection of $1.43 billion to $1.5 billion. In after-hours trading, shares of Zynga have fallen by over 38%, but Zynga wasn't the only company negatively affected by this news. Facebook saw its shares dip over 7% after hours.

The link between Zynga and Facebook (NASDAQ: FB) is evident in the latter's mention in Zynga's report today. Just how correlated have the stock movements of the two companies been? How about their correlations to LinkedIn (NYSE: LNKD)? Historical prices since Facebook's IPO in mid-May were compiled from Google Finance, and the daily changes were documented:

Date Zynga Facebook LinkedIn Boeing
25-Jul 3.25% 3.13% -0.78% 2.78%
24-Jul -3.34% -1.04% -1.16% -1.21%
23-Jul 6.04% -0.03% -1.94% -1.33%
20-Jul 5.26% -0.83% -2.20% -1.30%
19-Jul -1.08% -0.38% 3.86% 1.31%
18-Jul 0.66% 3.63% 0.74% 1.07%
17-Jul -5.18% -0.53% 0.61% 0.19%
16-Jul -1.43% -8.07% -2.25% -0.73%
13-Jul -2.39% -0.29% 0.91% 2.51%
12-Jul 1.41% -0.52% 3.52% 0.27%
11-Jul -1.20% -1.59% 1.56% -2.32%
10-Jul -4.39% -2.18% -3.35% -1.09%
9-Jul -2.24% 1.39% -5.40% 0.46%
6-Jul -1.47% 0.83% 0.57% -1.01%
5-Jul 0.74% 0.87% -0.28% 0.23%
3-Jul -2.88% 1.40% 0.85% 1.49%
2-Jul 2.21% -1.06% 1.28% -1.51%
29-Jun 1.12% -0.83% 3.44% 3.80%
28-Jun -4.44% -2.70% -2.31% -0.40%
27-Jun -2.26% -2.63% -1.17% 1.33%
26-Jun -4.95% 3.24% 3.27% -0.17%
25-Jun 1.00% -3.00% -3.56% -1.26%
22-Jun 4.90% 3.80% 4.56% 0.83%
21-Jun -2.89% 0.76% -2.93% -2.25%
20-Jun -1.34% -0.97% -0.07% 0.12%
19-Jun 3.29% 1.59% 0.17% 1.42%
18-Jun 3.96% 4.67% 3.09% -0.13%
15-Jun 10.76% 6.08% 3.78% 0.19%
14-Jun -0.40% 3.74% 2.89% -0.29%
13-Jun 1.20% -0.47% 1.13% -0.72%
12-Jun -10.27% 1.48% 0.18% 3.52%
11-Jun -8.26% -0.37% -2.05% 0.24%
8-Jun 0.33% 3.00% 2.26% -0.01%
7-Jun -2.11% -1.86% 1.13% 1.35%
6-Jun 7.50% 3.63% 0.09% 2.13%
5-Jun 0.35% -3.83% 2.10% 0.12%
4-Jun -4.99% -2.96% -0.46% 0.39%
1-Jun -3.99% -6.35% -4.78% -3.40%
31-May 6.64% 5.00% -2.07% 0.32%
30-May -3.61% -2.25% -1.81% -1.43%
29-May -7.87% -9.62% 1.44% 0.57%
25-May -2.79% -3.39% -0.28% -1.95%
24-May -3.82% 3.22% -4.60% -0.25%
23-May 3.97% 3.23% 2.20% 0.13%
22-May -4.09% -8.90% 4.64% -0.42%
21-May -0.98% -10.99% -2.20% 3.80%
18-May -13.42% 0.61% -5.65% -0.83%

As a control group, Boeing (NYSE: BA) was also selected to represent a company from the vastly different industrial sector. The correlations among these vectors were calculated using the =CORREL() Excel function, and the results are as follows:
  • ZNGA / FB: 0.41262114
  • ZNGA / LNKD: 0.327412705
  • FB / LNKD: 0.187973772
  • ZNGA / BA: 0.067813629
  • FB / BA: 0.112054524
  • LNKD / BA: 0.285873686
Among these results, Facebook and Zynga did have the highest correlation, greater than either one's correlation to LinkedIn. At the same time, Boeing was more correlated with LinkedIn than with either Zynga or Facebook. Given the relatively high correlation between Zynga and Facebook, the latter of which is due to deliver its first earnings report as a public company on Thursday, it looks like Zynga, right after its disappointing report, has a good immediate chance to either rebound or dip further.
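For readers who want to reproduce this kind of calculation outside of Excel, here is a minimal Python sketch of the Pearson correlation coefficient, which is what =CORREL() computes. The function name `correl` is just a label chosen for this example, and only the first five rows of the table above (25-Jul through 19-Jul) are used, so the printed value will not match the full-period figures listed above.

```python
from math import sqrt


def correl(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Daily changes in percent, first five rows of the table only.
zynga = [3.25, -3.34, 6.04, 5.26, -1.08]
facebook = [3.13, -1.04, -0.03, -0.83, -0.38]
print(round(correl(zynga, facebook), 4))
```

Feeding in the full columns of the table would be the way to check the numbers in the list.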


Tuesday, July 17, 2012

Facebook: Two Months Since IPO

See previous related post: Facebook Ready for IPO

To truly visualize the disappointment of Facebook (NASDAQ: FB) two months after its initial public offering, compare it with none other than LinkedIn (NYSE: LNKD) during the same period. From the IPO price of $38 at the May 18th launch, Facebook has closed as low as $25.87. It made modest gains in June, climbing back as high as $33.10 on June 26th, before falling back down to close Tuesday at $28.09. This comes amid news that the number of users in both the United States and Europe had fallen over the past six months. The research by Capstone reported that US users declined by 1.1%, while 14 of 23 countries in which Facebook had over 50% penetration experienced "fewer users or saw little change." This report comes days after Warren Buffett, who's known to hold onto stocks for their long-term value, stated that "investors frustrated with the stock’s decline since its public offering are paying the price for betting on a short-term rally." Given the closing price of $28.09, that is a 26.08% drop during the period.

LinkedIn, the professional network, in the meantime saw its shares fall from the May 17th closing price of $104.95 to the present $103.84. That is only a 1.06% drop. What may be surprising is that the financial statements of Facebook look much healthier. While people may claim Facebook is overvalued at its P/E ratio approaching 90, LinkedIn still has its ratio soaring above 600. Facebook's Q1 profit margin also trumped that of LinkedIn, 19.38% to 2.65%. But it's the growth that's raising the concern. Facebook's profit margin is actually down from 2011, when it was 26.95%, while LinkedIn has seen its margin rise from 2.28%. It's not just Wall Street that is punishing Facebook. In the latest American Customer Satisfaction Index E-Business Report, Facebook sank to a record low, falling behind Google+ and LinkedIn. The complaints about Facebook included "an excess of ads and privacy concerns."

Tumbling with Facebook over the past two months is Zynga (NASDAQ: ZNGA). From the $8.27 closing price of May 17th, it is currently at $4.58, representing a whopping 44.62% decline. Zynga is heavily reliant on Facebook to launch its social game services, and the Q1 profit margin was a dismal -26.59%.
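The percentage declines quoted in this post follow from the start and end prices given in the text. A short Python sketch of that arithmetic is below; the helper name `percent_change` is simply chosen for this example.

```python
def percent_change(start_price, end_price):
    """Percentage change from start_price to end_price."""
    return (end_price - start_price) / start_price * 100


# (starting price, latest closing price) as stated in the post.
moves = {
    "FB": (38.00, 28.09),      # IPO price vs. Tuesday's close
    "LNKD": (104.95, 103.84),  # May 17th close vs. present
    "ZNGA": (8.27, 4.58),      # May 17th close vs. present
}
for ticker, (start, end) in moves.items():
    print(f"{ticker}: {percent_change(start, end):.2f}%")
```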


Sunday, July 15, 2012

Location of MLB Stadiums with Respect to City Centers

Sports stadiums are usually located at the center of metropolitan areas, but that is not always the case. Some are not even located in the city proper of the metropolitan area's largest city. Here we examine the 30 Major League Baseball stadiums and their location with respect to the city center. While it may be difficult to pinpoint one single city center, we will use the specific location given by Google Maps when only the city name is entered. That yields a good approximation of the core of each city's downtown area. The distance from the baseball stadium to that location is calculated for each of the 30 stadiums.

A few cases necessitate some remarks, namely those in which the city that the ballpark is located in isn't the largest in its metropolitan area. The Texas Rangers are based in Arlington, the third largest city in the Dallas-Ft. Worth area. The home of the Rays, St. Petersburg, is smaller than the city of Tampa. Anaheim is also part of the Los Angeles area. Nevertheless, because all of these cities are of fairly large size (all around 200K - 300K in population), it's fair to consider the distance from their center to the baseball stadium for this exercise. Finally, it's worth noting that the center location for New York was given as Broadway & Chambers St. in Lower Manhattan. While this location was indeed used for this exercise, there are tenable arguments for treating Midtown Manhattan as the "center" of the city.

Here is the ranking by distance from the baseball stadium to the chosen city center:

Team City Ballpark Miles
Mets New York Citi Field 10.1
Yankees New York Yankee Stadium 9.3
Royals Kansas City Kauffman Stadium 7.6
Athletics Oakland The Coliseum 5.8
Cubs Chicago Wrigley Field 5.1
Brewers Milwaukee Miller Park 4.2
White Sox Chicago US Cellular Field 3.8
Phillies Philadelphia Citizens Bank Park 3.6
Angels Anaheim Angel Stadium 3.5
Nationals Washington, DC Nationals Park 2.6
Rangers Arlington Rangers Ballpark 2.3
Red Sox Boston Fenway Park 2.2
Dodgers Los Angeles Dodger Stadium 2.1
Giants San Francisco AT&T Park 2.0
Rockies Denver Coors Field 1.4
Blue Jays Toronto Rogers Centre 1.2
Mariners Seattle Safeco Field 1.1
Braves Atlanta Turner Field 1.0
Astros Houston Minute Maid Park 1.0
Marlins Miami Marlins Park 1.0
Rays St. Petersburg Tropicana Field 1.0
Orioles Baltimore Oriole Park 0.9
Twins Minneapolis Target Field 0.9
Pirates Pittsburgh PNC Park 0.8
Indians Cleveland Progressive Field 0.7
Padres San Diego PETCO Park 0.7
Tigers Detroit Comerica Park 0.6
Diamondbacks Phoenix Chase Field 0.4
Cardinals St. Louis Busch Stadium 0.4
Reds Cincinnati Great American Ballpark 0.3

Over half of the stadiums are located within 2 miles of the city center. At first glance, it seems as though big cities have stadiums farther from the city center. This certainly is true for New York and Chicago. Indeed, the Yankees and Mets have stadiums in The Bronx and Queens, respectively, away from the central business districts of Midtown or Downtown. The White Sox and Cubs are known for being the teams of the "South Side" and "North Side," respectively. It seems intuitive that for cities like New York and Chicago, with already heavily crowded downtown centers and limited parking spots, it may be more reasonable to keep the ballpark crowds away from these centers. For smaller cities like Cleveland, sporting events are a great way to draw people downtown.

This is merely a hypothesis. In any event, even under such an assumption, cities like Kansas City and Milwaukee are high in the rankings, seeming to contradict the trend. Bigger cities like Houston and San Diego are near the bottom of the rankings. It's important to note that other factors play into the location of the stadiums. While this exercise may give some insight, it's far from a comprehensive picture.


Saturday, July 14, 2012

Distribution of S&P 500 Constituent Market Cap

The S&P 500 index, according to Standard and Poor's website, "remains a leading indicator of U.S. equities, reflecting the risk and return characteristics of the broader large cap universe on an on-going basis." Presently, companies with market cap over $4 billion are considered. But given the 500 companies, what exactly is the distribution of their market cap like?

The short answer is an enormously positive skew. The data used for this analysis was as of June 28, 2012, factoring in the recent inclusion of Monster Beverage Corp. (NASDAQ: MNST) in place of Sara Lee Corp. (NYSE: SLE). As a note of interest, after the trading day of June 29th, Progress Energy (NYSE: PGN) was removed from the index, while Seagate Technology (NASDAQ: STX) was added, but this transition had not been factored into the data for this analysis.

The 500 companies totaled over $12.89 trillion in market capitalization. However, the top 10 companies comprised over 20% of that figure, led by Apple, Exxon Mobil, and Microsoft at $544B, $398B, and $255B, respectively. The average market cap of the 500 companies was over $25.7 billion, while the median figure was only around $12 billion, and the skew value was over 5.48. In fact, the average value of $25.7 billion falls at the 77.7th percentile. Further remarkable is that even after taking the logarithm of all of the market cap values, the skew is still positive at 0.668.

Here is the summary table of these 500 market cap values, along with the box-and-whisker plot of the logarithm of the values:

Max: $544,206,062,500.00
95% P-tile: $91,226,411,328.12
75% P-tile: $23,949,059,570.31
Median: $11,998,596,191.41
25% P-tile: $6,461,437,377.93
5% P-tile: $2,973,694,091.80
Min: $1,301,555,053.71
Sum: $12,892,289,301,269.50
Average: $25,784,578,602.54
P-tile of Avg: 77.70%
Stdev: $46,745,814,552.27
Skew: 5.484828298
Skew of Log: 0.668860873
Top 10: 20.93%

The market values were obtained through Bloomberg on June 29th.
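As a rough illustration of how the mean, median, and skew interact, here is a small Python sketch. The market cap values in it are hypothetical stand-ins, not the Bloomberg data; the `sample_skew` function follows the adjusted sample skewness formula that Excel documents for =SKEW(), n / ((n-1)(n-2)) times the sum of cubed standardized deviations, using the sample standard deviation.

```python
from math import log10
from statistics import mean, median, stdev


def sample_skew(values):
    """Adjusted sample skewness (the formula documented for Excel's SKEW)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values")
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    return n / ((n - 1) * (n - 2)) * sum(((v - m) / s) ** 3 for v in values)


# Hypothetical market caps in dollars: many mid-sized firms, a few giants.
caps = [2e9, 4e9, 6e9, 9e9, 12e9, 15e9, 24e9, 60e9, 150e9, 500e9]
logs = [log10(c) for c in caps]

print("mean above median:", mean(caps) > median(caps))
print("skew of values:", sample_skew(caps))
print("skew of logs:", sample_skew(logs))
```

A handful of very large values is enough to pull the mean well above the median, which is the pattern seen in the summary table.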

Saturday, June 30, 2012

Making Box and Whisker Plot in Excel

There's a simple method to construct a box and whisker diagram in Excel, by taking advantage of the fact that a box and whisker diagram is just a scatter plot with values fixed on one of the axes to give height to the box. Given a sample of data in column A of an Excel worksheet, here are the parameters necessary, along with the formulas needed to obtain those values:
  • Minimum: =MIN(A:A)
  • 1st quartile:  =QUARTILE(A:A,1) or =PERCENTILE(A:A,0.25)
  • Median: =MEDIAN(A:A)
  • 3rd quartile:  =QUARTILE(A:A,3) or =PERCENTILE(A:A,0.75)
  • Maximum: =MAX(A:A)
Once those values are obtained, here is the most crucial component of the diagram: pasting the values into this table for a horizontal box and whisker diagram. Simply reverse the columns for a vertical diagram:
Min 1
Min 3
Min 2
Q1 2
Q1 3
Q1 1
Median 1
Median 3
Q1 3
Q3 3
Q3 1
Median 1
Q3 1
Q3 2
Max 2
Max 1
Max 3
Once the numerical values are filled in, this is simply a set of coordinate points. Select the table, and choose Scatter Plot > Scatter with Straight Lines and Markers. As you might guess, the values 1, 2, 3 are arbitrarily chosen; since we are dealing with one-dimensional data, and just want the other axis to give some height to the boxes, the values 1, 2, 3 are the easiest to work with. In fact, as soon as the scatter plot is made, it's best to delete the values of that axis. Here's a box and whisker plot of 50 randomly chosen numbers between 0 and 1:

To see that this is indeed the result of the table from above, trace the points on this plot, by going down the rows of the table. The labels on the plot should help to trace. Start from (min, 1) at the bottom left of the plot. The next row of the table is (min, 3) at the top left of the plot. In order to move onto the Q1 value on the plot, we must go down to (min, 2) and then to (Q1, 2) in order to preserve the straight edges of the box and whisker plot. And so on, the table lists the points that must be traced over to ultimately make the plot.
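The same table of coordinate points can be generated programmatically. The Python sketch below builds the 17 points in the order listed in the table above; the function name `box_whisker_points` is made up for this example, and the quartiles are computed with the "inclusive" method of `statistics.quantiles`, which is assumed here to be the closest match to Excel's =QUARTILE().

```python
from statistics import quantiles


def box_whisker_points(data):
    """Return the (value, height) points that trace a horizontal box plot."""
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    lo, hi = min(data), max(data)
    return [
        (lo, 1), (lo, 3), (lo, 2),     # left whisker cap, back to mid-height
        (q1, 2), (q1, 3), (q1, 1),     # whisker to box, left edge of box
        (med, 1), (med, 3),            # bottom edge to median, median line
        (q1, 3), (q3, 3), (q3, 1),     # top edge, right edge of box
        (med, 1), (q3, 1), (q3, 2),    # bottom edge, up to mid-height
        (hi, 2), (hi, 1), (hi, 3),     # right whisker and its cap
    ]


points = box_whisker_points([1, 2, 3, 4, 5])
for x, y in points:
    print(f"{x}\t{y}")
```

Because the output is tab-separated, it can be pasted into Excel as two columns and plotted with Scatter with Straight Lines and Markers. Every consecutive pair of points shares either its value or its height, which is what keeps the edges straight.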

Wednesday, June 27, 2012

Introduction to Array Formulas in Excel: Transpose

Standard formulas in Excel begin at something as simple as =SUM(A1:A3), typed into a single cell. Suppose that formula was written in cell B1; in this case the sum of the numerical values from cell A1 to A3 is returned in cell B1. Among their many capabilities, array formulas can return multiple results, into multiple cells. Let's illustrate that with a simple case of transposing, that is, converting a row of an array into a column, or vice versa.

Suppose that we have a table with the integers 1 through 5 listed in the first column, and the square of those integers in the second column, calculated through formula: for example, cell B1 contains =A1^2, and so forth. In the end, the table looks like this:

1 1
2 4
3 9
4 16
5 25

Now say that we want to transpose this table, so that instead of a 5x2 table, it'll be a 2x5 table with the integers 1 through 5 in the top row, and the squares of those numbers in the bottom row. We can actually do this easily by copying and, instead of simply pasting, right-clicking the mouse and selecting Paste Special. When the option box comes up, check the Transpose box and hit OK. The table should look like this, as envisioned:

1 2 3 4 5
1 4 9 16 25

However, these two tables are not linked by any formulas. Suppose that in the original table, we want to replace the value 5 with 10. Next to it, 100 will automatically appear since the calculation is based on the formula. However, the transposed table will not automatically change. This may be problematic if the data need to be linked.

Fortunately, there is a way to transpose not through copy / Paste Special, but rather through formulas, just like =SUM() or =A1^2. However, do realize that the output is going to be a 2x5 table, not a single cell. Here is where array formulas can perform this task. To begin, select and activate a range of 2 rows by 5 columns. While those cells are highlighted, begin typing =TRANSPOSE(A1:B5). As before, the range may be selected with the mouse. Now, if we hit Enter right at this stage, we would get an error. The key to using array formulas is to press Control + Shift + Enter to enter the formula. Upon hitting those three keys, the transposed table will appear. Not only will it appear, it will be linked to the original table. Any change in the original table will automatically be reflected in the transposed table.

If we click on any of the cells that comprise the transposed table, we see that their formulas all read {=TRANSPOSE(A1:B5)}. The curly brackets indicate that the cell is part of an array. Another characteristic of an array is that its cells can't be modified individually. Try editing the content of any of the cells and an error message appears. The only way to exit this message is to press Esc.
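For comparison, the same transposition can be sketched in Python. This is not how Excel implements =TRANSPOSE(); it only mirrors the idea that the transposed table is computed from the original, so recomputing after a change keeps the two in sync. The function name `transpose` is chosen for this example.

```python
def transpose(table):
    """Turn the rows of a rectangular table into columns."""
    return [list(column) for column in zip(*table)]


# The 5x2 table from the post: integers 1 through 5 and their squares.
table = [[n, n ** 2] for n in [1, 2, 3, 4, 5]]
print(transpose(table))

# Replace 5 with 10 in the original and recompute, as in the linked case.
table[4] = [10, 10 ** 2]
print(transpose(table))
```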

In future posts, we will see how array formulas can perform multiple calculations on one or more items.

Tuesday, June 26, 2012

Quick Summary on Options

Options are one of the most common forms of derivatives, financial instruments that derive their values from those of other assets. Call options give the bearers the right to buy an asset for a specified price, called the exercise or strike price, on or before the expiration date. Put options, on the other hand, give bearers the right to sell an asset for a specified price on or before the expiration date. For a call option, here are the possible outcomes:
  • If exercise price = current stock price, the option is an "at the money" call.
  • If exercise price > current stock price, the option is an "out of the money" call.
  • If exercise price < current stock price, the option is an "in the money" call, as the bearer can purchase the stock at the lower exercise price and sell at the higher current stock price.
Buyers of call options are betting that the stock price will be higher than the exercise price by the expiration date. The seller, or writer, of the call option bets that the stock price will end up below the exercise price. For put options, it's the opposite. Those who buy put options bet that the stock price will fall, so profit can be made by buying the security at the lower market price and selling at the higher exercise price. On the other hand, writers of put options bet that the stock price will increase.
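The cases above can be written out as a short Python sketch. It only covers the payoff at exercise and ignores the price paid for the option itself; the function names are made up for this example.

```python
def call_payoff(stock_price, exercise_price):
    """Payoff from exercising a call: buy at the exercise price, sell at market."""
    return max(stock_price - exercise_price, 0.0)


def put_payoff(stock_price, exercise_price):
    """Payoff from exercising a put: buy at market, sell at the exercise price."""
    return max(exercise_price - stock_price, 0.0)


def call_moneyness(stock_price, exercise_price):
    """Classify a call option using the three cases listed above."""
    if exercise_price < stock_price:
        return "in the money"
    if exercise_price > stock_price:
        return "out of the money"
    return "at the money"


# Hypothetical prices: exercise price of 50, stock at 60.
print(call_moneyness(60.0, 50.0), call_payoff(60.0, 50.0))
print(put_payoff(60.0, 50.0))
```

The `max(..., 0.0)` reflects that the bearer can simply let an unfavorable option expire.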

Numerous factors go into how options are priced. If the price of the underlying security increases, so will the price of call options given a fixed exercise price. This is because there will be a greater range of outcomes in which the option is "in the money." By similar arguments, the lower the exercise price, the higher the value and price of the call option. The results are opposite for put options in these cases. However, if the volatility of the underlying security increases, so will the value of both call and put options. This is because the downside loss is fixed; bearers of options do not need to buy or sell the asset if the price is unfavorable. Potential upside, on the other hand, increases the value of the option. Similarly, a longer time to expiration increases the value of both options. Finally, rising interest rates depress the present value of the exercise price and make the call option more valuable. Dividends depress the expected capital gain of stocks and thus the value of the call option.
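The volatility argument can be illustrated with a toy example in Python. This is not an option pricing model; it simply assumes two equally likely stock prices at expiration, with a hypothetical exercise price of 100, and compares a narrow spread of outcomes with a wide one.

```python
def expected_payoff(payoff, prices):
    """Average payoff over equally likely stock prices at expiration."""
    return sum(payoff(p) for p in prices) / len(prices)


EXERCISE_PRICE = 100.0  # hypothetical exercise price


def call(stock_price):
    return max(stock_price - EXERCISE_PRICE, 0.0)


def put(stock_price):
    return max(EXERCISE_PRICE - stock_price, 0.0)


low_volatility = [90.0, 110.0]   # stock ends 10 below or 10 above
high_volatility = [70.0, 130.0]  # stock ends 30 below or 30 above

for label, prices in [("low", low_volatility), ("high", high_volatility)]:
    print(label, expected_payoff(call, prices), expected_payoff(put, prices))
```

In both scenarios the bad outcome pays zero, so widening the spread only adds to the good outcome, for the call and the put alike.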