Thursday, May 24, 2012

Evaluating Methods of Communication: Rules of Phone Calls?

A conversation with a friend led to the question: are there rules for when one can call someone? Of course there are not, unless we are dealing with restraining orders. But in today's world, where the modes of communication range from phone calls and text messages to Facebook posts and instant messages, is there a respectable convention?

No matter how technology progresses, the modes of communication can be placed on a spectrum of attention-seeking. At the highest end is calling on the phone or video-calling on Skype; it's hard to multi-task while doing that. At the other end of the spectrum are methods like sending emails (supposing the recipient checks email consistently), leaving Facebook posts, or even sending text messages. For these, at the moment the information is passed on, the recipient can choose whether or not to give it attention, and can reply at a moment of convenience. In between is instant messaging. While the recipient can choose to pay attention and respond at his or her discretion, a modest amount of attention is still held, for a response is expected soon; otherwise, it turns into relaying messages back and forth.

The most crucial property of a piece of information in dictating its communication method is its urgency. If the information is urgent, it justifies seeking the immediate attention of the other person. Here's an illustration. Suppose you're meeting a friend at 12:00. If you realize at 11:50 that day that you can't make it, the best bet is to call, lest the friend is already on his or her way. This is an urgent message, and that justifies seeking the friend's attention, regardless of what the friend is doing at the time. Now suppose it's one week before the scheduled meeting, and it needs to be rescheduled. All of the forms of communication are acceptable, but methods toward the bottom of the attention-seeking spectrum are most appropriate. These non-attention-seeking methods will still convey the necessary information in time to serve its purpose (to schedule a new meeting time in the upcoming week).

Of course, there is nothing inherently "wrong" with calling the friend one week in advance purely to say that the meeting needs to be canceled. Instead, it's a probabilistic inconvenience. Go back to the moment one week before the scheduled meeting. The friend may be idle at the moment you call, in which case there's no real detraction from calling. However, what if the friend is busy at the time, such that taking the non-urgent call disturbs what he or she is doing? Let's use the concept of utility: specifically, the mutual utility of having the information successfully communicated, and the change in personal utility from receiving the information through the particular method. In the disturbed case, the friend's personal utility decreases due to the disturbance, but no mutual utility is gained from having the information conveyed instantly. This wouldn't be true in an urgent situation, where if the information isn't conveyed in time, the mutual utility plummets. Back to our case, though: there is no gain from using the attention-seeking method, given that a non-attention-seeking method would almost certainly get the information across in time. So regardless of the probability that the friend is busy at the moment, the expected change in combined utility is non-positive. It may be unchanged, and it can decrease; in no way can it increase.
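The utility argument condenses into one line of arithmetic. Here is a minimal Python sketch with entirely hypothetical numbers: q is the chance the friend is busy when called, and c is the utility lost to the disturbance.

```python
# Hypothetical expected-utility sketch of the argument above.
# For a NON-urgent message, every method conveys the information in
# time, so the mutual utility gained is the same for all methods and
# cancels out of the comparison; only the disturbance term remains.

def expected_utility_change_from_calling(q, c):
    """Expected change in combined utility from calling instead of
    using a non-attention-seeking method, for a non-urgent message.
    q: probability the friend is busy; c: utility lost if disturbed."""
    return q * (-c) + (1 - q) * 0.0

# Whatever q is, the expectation is never positive.
for q in (0.0, 0.25, 0.5, 1.0):
    assert expected_utility_change_from_calling(q, c=1.0) <= 0.0
```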

That's why making a non-urgent phone or video call is inefficient for both parties. If the information to be conveyed truly is not urgent, there is no benefit from utilizing an attention-seeking method of communication. Of course, some can argue for the increased heartfelt happiness that comes from the more intimate communication methods. Here, we are assuming there are no such effects. Amorous environments are another realm entirely: when one tries to quantify love, quantifiable results are lost.

Tuesday, May 15, 2012

Misrepresentation of GPA

Looking back at this semester, there is quite a strong negative correlation between the amount of effort and commitment I put into my classes and the final grades I received from them. It's a clear demonstration that GPA cannot capture the amount of take-away for the future, which happens to be the most crucial element of education.

It's an unfortunate truth that the grades we get are predominantly based on how we perform on one or two particular exams, relative to the others in the class. Fine, let that be, because there usually is no better way to go about it. But in the long run, it's the personally absorbed knowledge and future take-away that actually matter. Cramming all the knowledge the night before, acing the exam, and subsequently forgetting everything has little value for the future. Knowing how to actually use and apply the learned knowledge will go much further, whether or not we remember how to solve that one particular problem on that particular day on that particular exam.

Sunday, May 13, 2012

Dow Jones and S&P 500 Daily Change Correlations

Do the daily fluctuations in one stock index align well with the changes observed in another? For this exercise, let's take the Dow Jones and S&P 500 indices for April 2012. For each index, after extracting the closing value for each day, the daily percentage change was computed, as summarized below:

Date Dow %Δ S&P %Δ
30-Apr-12 -0.11% -0.39%
27-Apr-12 0.18% 0.24%
26-Apr-12 0.87% 0.67%
25-Apr-12 0.69% 1.36%
24-Apr-12 0.58% 0.37%
23-Apr-12 -0.78% -0.84%
20-Apr-12 0.50% 0.12%
19-Apr-12 -0.53% -0.59%
18-Apr-12 -0.63% -0.41%
17-Apr-12 1.50% 1.55%
16-Apr-12 0.56% -0.05%
13-Apr-12 -1.05% -1.25%
12-Apr-12 1.41% 1.38%
11-Apr-12 0.70% 0.74%
10-Apr-12 -1.65% -1.71%
9-Apr-12 -1.00% -1.14%
5-Apr-12 -0.11% -0.06%
4-Apr-12 -0.95% -1.02%
3-Apr-12 -0.49% -0.40%
2-Apr-12 0.40% 0.75%



Average 0.004% -0.034%
Std Dev 0.869% 0.925%

When the pairs of daily percentage changes are plotted against each other, there is almost a 1:1 relationship, with a strong correlation:
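As a check on that claim, the correlation can be computed directly from the table above. Here is a Python sketch (the helper function is mine; any statistics package would give the same Pearson coefficient):

```python
import math

# Daily percentage changes from the April 2012 table above,
# Dow Jones and S&P 500 respectively, in percent.
dow = [-0.11, 0.18, 0.87, 0.69, 0.58, -0.78, 0.50, -0.53, -0.63, 1.50,
       0.56, -1.05, 1.41, 0.70, -1.65, -1.00, -0.11, -0.95, -0.49, 0.40]
sp  = [-0.39, 0.24, 0.67, 1.36, 0.37, -0.84, 0.12, -0.59, -0.41, 1.55,
       -0.05, -1.25, 1.38, 0.74, -1.71, -1.14, -0.06, -1.02, -0.40, 0.75]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r = pearson_r(dow, sp)   # close to 1: the two indices co-move strongly
```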



Wednesday, May 9, 2012

4 Home-Run Game Probability

On Tuesday night, Texas Rangers outfielder Josh Hamilton hit 4 home runs in one game. This was only the 16th time in Major League Baseball history that such a feat has been accomplished, and the first since 2003. Hamilton also had a double in the game, going 5-for-5 overall and totaling 18 bases in a single game, just one shy of the Major League record. The Rangers won the game 10-3 against the Baltimore Orioles.

To estimate the probability of this event, the following data for Hamilton from each of his past four seasons with the Rangers were retrieved:

Year At Bat Home Run
2008 624 32
2009 336 10
2010 518 32
2011 487 25
Sum 1965 99

Since Hamilton hit 99 home runs in 1,965 at-bats over the past four seasons, let's assume that his probability of hitting a home run on any at-bat is 99/1965 = 5.038%. On Tuesday night, Hamilton had 5 at-bats and hit 4 home runs. The probability of hitting 4 home runs in a game with 5 at-bats is the product of the following terms:
  • nCr(5,4) to indicate the number of combinations
  • (5.038%) ^ 4 to indicate the 4 home-runs
  • (1-5.038%) to indicate the 1 non-HR at-bat
In MATLAB, the product is calculated as nchoosek(5,4)*(99/1965)^4*(1-99/1965) = 3.0592e-05. That is around 1/32,688. Let's not forget the double he had. If we factor that into the calculation as well, the probability turns out to be 1/487,968.
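The same computation can be sketched in Python; math.comb plays the role of MATLAB's nchoosek, and the per-at-bat home-run probability is the 99/1965 estimate from above:

```python
from math import comb

# Assumes each at-bat is an independent trial with
# P(home run) = 99/1965, estimated from Hamilton's 2008-2011 totals.
p_hr = 99 / 1965                        # ≈ 5.038%

# P(exactly 4 HR in 5 at-bats): choose which at-bats are the HRs,
# then multiply the per-at-bat probabilities.
p_game = comb(5, 4) * p_hr**4 * (1 - p_hr)**1
# p_game ≈ 3.0592e-05, i.e. about 1 in 32,688
```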

To put this in perspective, each season is 162 games. Across the league there are 30 teams, and roughly 9 players per team come to bat in each game. That is 30*162*9 = 43,740 player-game performances in a season, assuming uniformity. That number is already greater than 32,688, which means that if every player in MLB had the same HR-hitting probability as Hamilton, the league should expect to see an occurrence like Tuesday night's every season. Clearly that isn't the case. Four-HR games are rarer than perfect games or no-hitters; in baseball, only 20-strikeout games and unassisted triple plays have been less frequent.


Thursday, May 3, 2012

Simple Linear Regression Illustration

Given a set of coordinate points, how do we find the linear regression using the least-squares method? Take a set of 10 points like this:

Xi Yi
215 30.8
201 32.5
196 35.4
226 28.1
226 24.4
348 24.1
226 28.5
348 24.2
148 32.8
226 28.0

The regression line will take the form y = B0 + B1*x, with error variance σ^2. The equations for B0, B1, and σ^2 are:
  • B1 = (Σxi*yi - Σxi*Σyi/n) / (Σxi^2 - (Σxi)^2/n)
  • B0 = ȳ - B1*x̄
  • σ^2 = (Σyi^2 - n*ȳ^2 - B1*(Σxi*yi - Σxi*Σyi/n)) / (n-2)
Now create columns for the squares of the x- and y-values, as well as the products between them, and sum each column:

Xi Yi Xi^2 Yi^2 Xi*Yi
215 30.8 46225 948.64 6622
201 32.5 40401 1056.25 6532.5
196 35.4 38416 1253.16 6938.4
226 28.1 51076 789.61 6350.6
226 24.4 51076 595.36 5514.4
348 24.1 121104 580.81 8386.8
226 28.5 51076 812.25 6441
348 24.2 121104 585.64 8421.6
148 32.8 21904 1075.84 4854.4
226 28.0 51076 784 6328
Sum 2360 288.8 593458 8481.56 66389.7
  • B1 = (66389.7-2360*288.8/10) / (593458-2360^2/10) = -0.0484
  • B0 = (288.8/10) - (-0.0484)(2360/10) = 40.3024
  • σ^2 = (8481.56-10*(288.8/10)^2-(-0.0484)*(66389.7-2360*288.8/10)) / 8 = 6.9360
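Here is the same hand computation as a Python sketch, applying the three formulas above to the same 10 points. Because it never rounds B1 along the way, its B0 and σ^2 differ slightly from the rounded hand values (and land closer to Excel's):

```python
# The 10 (x, y) points from the table above.
xs = [215, 201, 196, 226, 226, 348, 226, 348, 148, 226]
ys = [30.8, 32.5, 35.4, 28.1, 24.4, 24.1, 28.5, 24.2, 32.8, 28.0]

n = len(xs)
sum_x, sum_y = sum(xs), sum(ys)
sum_xx = sum(x * x for x in xs)
sum_yy = sum(y * y for y in ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))

sxy = sum_xy - sum_x * sum_y / n          # Σxi*yi - Σxi*Σyi/n
sxx = sum_xx - sum_x ** 2 / n             # Σxi^2 - (Σxi)^2/n
b1 = sxy / sxx                            # slope
b0 = sum_y / n - b1 * sum_x / n           # intercept, ȳ - B1*x̄
sigma2 = (sum_yy - n * (sum_y / n) ** 2 - b1 * sxy) / (n - 2)

# b1 ≈ -0.0484, b0 ≈ 40.306, sigma2 ≈ 6.932 at full precision;
# the hand calculation rounds b1 first, hence its 40.3024 and 6.9360.
```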
At last, let's compare the results to Excel:


Although the value of the coefficient of determination is quite low, the equation of the least-squares best-fit line matches the numbers we obtained. Also, using the function =STEYX(C2:C11,B2:B11) in Excel, the standard error of the estimate was calculated to be 2.632951. Squaring that number gives 6.932, nearly identical to the value we calculated above.

Wednesday, May 2, 2012

Calculating Type II (β) Error

In hypothesis testing, β is the probability of failing to reject the null hypothesis when it is actually false. Let's first consider a two-sided case to see how it is calculated. Begin with the following given conditions:
  • Let x̄(n) = 174.5, s(n) = 6.9, n = 50, α = 0.05
  • H0: μ = 175, H1: μ ≠ 175
  • Compare against alternative hypothesis μ = 173
At α = 0.05, we are concerned with z = 1.96 and z = -1.96. Turn these z values into critical values for x̄:
  • Recall that for samples (for both two- or one-sided cases), z = (x̄-μ) / (s(n) / sqrt(n))
  • 1.96 = (x̄ - 175) / (6.9 / sqrt(50)) --> x̄ = 176.913
  • -1.96 = (x̄ - 175) / (6.9 / sqrt(50)) --> x̄ = 173.087
Under standard conditions, the null hypothesis would be accepted if x̄ falls between those values. But what if the mean is really 173? That is the alternative hypothesis we are considering. To evaluate it, use a normal distribution with 173 as the mean, and find the z-values corresponding to 173.087 and 176.913:
  • (173.087 - 173) / (6.9 / sqrt(50)) = 0.0892
  • (176.913 - 173) / (6.9 / sqrt(50)) = 4.0100
From here, β is just the area under the normal distribution curve between the two critical values. Using MATLAB, normcdf(4.0100)-normcdf(0.0892) = β = 0.4644. The power of the test, or probability of correctly rejecting the null hypothesis when it is false, is 1-β = 0.5356.
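These steps can be reproduced in Python. This is a sketch that builds the normal CDF from math.erf (standing in for MATLAB's normcdf); since it skips the intermediate rounding of the critical values, β comes out a hair below the 0.4644 above:

```python
import math

def normcdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Two-sided case from above: H0: mu = 175, s = 6.9, n = 50, alpha = 0.05;
# alternative mean considered: 173.
mu0, mu1, s, n = 175, 173, 6.9, 50
se = s / math.sqrt(n)

# Critical sample means under H0 (z = ±1.96)
x_hi = mu0 + 1.96 * se        # ≈ 176.913
x_lo = mu0 - 1.96 * se        # ≈ 173.087

# Re-standardize the acceptance region under the alternative mean
z_lo = (x_lo - mu1) / se      # ≈ 0.0896 (0.0892 above, from rounding x̄ first)
z_hi = (x_hi - mu1) / se      # ≈ 4.0096
beta = normcdf(z_hi) - normcdf(z_lo)   # ≈ 0.4643
power = 1 - beta                       # ≈ 0.5357
```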

What about a one-sided case? It is computed similarly. Consider the following given conditions:
  • Let p̂ = 0.535, n = 1000, α = 0.05
  • H0: p = 0.50, H1: p > 0.50
  • Compare against the alternative hypothesis p = 0.52
Remember that for the one-sided case, α = 0.05 gives the critical value z = 1.645. Again turn that value into a value for p̂, similar to the way we solved for x̄ in the example above:
  • Recall that for proportions (for both two- or one-sided cases), z = (p̂-p) / sqrt(p*(1-p)/n)
  • 1.645 = (p̂ - 0.5) / sqrt(0.5*(1-0.5)/1000)
  • p̂ = 0.52601
Under standard conditions, the null hypothesis would be accepted if p̂ is 0.52601 or less. Now use a normal distribution with 0.52 as the mean. We want the area to the left of 0.52601, because that is the region where H0 will not be rejected. Note carefully that if our H1 were p < 0.50, we would instead be concerned with the area to the right of the corresponding (lower) critical value. Back to our scenario: we now need to turn 0.52601 into a z-score.
  • Do not forget that p = 0.52 now, instead of 0.50
  • (0.52601 - 0.52) / sqrt(0.52*0.48 / 1000)
  • = 0.3804
To obtain the area to the left of that z-score, simply run normcdf(0.3804) = 0.6482 = β. The power, or the probability of correctly rejecting the null hypothesis when it is false, would again be 1-β = 0.3518.
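And the one-sided case, sketched the same way (again using math.erf in place of MATLAB's normcdf):

```python
import math

def normcdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# One-sided case from above: H0: p = 0.50 vs H1: p > 0.50,
# n = 1000, alpha = 0.05; alternative considered: p = 0.52.
p0, p1, n = 0.50, 0.52, 1000

# Critical proportion under H0 (one-sided z = 1.645)
p_crit = p0 + 1.645 * math.sqrt(p0 * (1 - p0) / n)   # ≈ 0.52601

# Re-standardize under the alternative p = 0.52
z = (p_crit - p1) / math.sqrt(p1 * (1 - p1) / n)     # ≈ 0.3804
beta = normcdf(z)          # ≈ 0.6482
power = 1 - beta           # ≈ 0.3518
```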