Wednesday, August 31, 2011

What I've learned...

Any fool can make a task harder.

Prayer to action

Based on the triplet found in Reinhold Niebuhr's Serenity Prayer:

Prayer to action

God, grant me the clarity to see,
the boldness to walk,
and the discipline to stay the right path.

Tuesday, August 30, 2011

4 stages of competence

Condensed from Wikipedia:

The Four Stages

Unconscious Incompetence
You don't know you suck.

Conscious Incompetence
You know you suck, but you also know where you suck (i.e., where to improve).

Conscious Competence
You've got something going, but it takes an aneurysm and a half to get things done right.

Unconscious Competence
Your awesomeness is practically on autopilot.

Monday, August 29, 2011

Ggplot2 Panels

For some time now I've been trying to use R to plot each column of my data in its own panel. The columns all share the same time frame, of course, so I wanted the panels stacked along a common x-axis.

ggplot2 has a "facet" grammar you can use to plot different aspects of your data in a meaningful way -- i.e., different parts of an observation. The trick is to contort your data with the "melt" function, which will create two new variables per observation.

The first variable is the name of one of the columns.
The second variable is the value that corresponds to said variable.

I tried taking a peek at the data length. Observations should increase by a factor equal to the number of columns you are melting, e.g. if I'm melting 3 variables:


> melted.data = melt(factor.data, measure.vars=c("variable_one", "variable_two", "variable_three"))
> length(factor.data$val)
85
> length(melted.data$val)
340


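The data length is multiplied by the number of melted columns. But of course. (Note that 340 is 4 × 85, so this particular output corresponds to a four-column melt; three columns would give 255 rows.)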
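The row arithmetic can be checked with a toy version of melt. This is only a pure-Python sketch of the idea, not how the R package implements it; the function name `melt_rows` and the sample rows are made up for illustration.

```python
def melt_rows(rows, measure_vars):
    """Return a long-format copy: one output row per (input row, melted column)."""
    melted = []
    for var in measure_vars:
        for row in rows:
            # Keep the id columns, i.e. everything that is not being melted.
            long_row = {k: v for k, v in row.items() if k not in measure_vars}
            long_row["variable"] = var      # name of the melted column
            long_row["value"] = row[var]    # value that column held in this row
            melted.append(long_row)
    return melted


# Hypothetical two-day sample with three factor columns.
rows = [
    {"date": "2011-08-01", "momentum": 0.1, "size": -0.2, "growth": 0.3},
    {"date": "2011-08-02", "momentum": 0.0, "size": 0.1, "growth": -0.1},
]
melted = melt_rows(rows, ["momentum", "size", "growth"])
print(len(rows), len(melted))  # 2 6
```

Two rows melted over three columns give six rows, each carrying the date plus one variable/value pair, which is the shape facet_grid(variable ~ .) wants.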

Here's the code I'm using to format, melt, plot, and save the data to pdf:


library(ggplot2)
library(reshape2)  # assumed source of melt(); the older reshape package provides it too

process.open <- function(filename) {
  return(read.csv(file=filename, header=TRUE, sep=",", dec="."))
}

filename <- "z1_output_quint.csv"
output <- "z1_output_quint_panel.pdf"

factor.data <- process.open(filename)
factor.data$date <- as.Date(as.character(factor.data$date), format="%Y%m%d")
# melt() writes its output to a column named "value", so move the existing one aside
factor.data$val <- factor.data$value
factor.data$value <- NULL

vars <- c("val", "momentum", "size")
vars_2 <- c("dividend_yield", "profitability", "growth")
vars_3 <- c("earnings_variability", "trading_activity", "volatility", "leverage")

# One page per group of factors, one panel per factor, all sharing the date axis.
# print() makes sure each plot is drawn even when the script is run via source().
pdf(output)
data <- melt(factor.data, measure.vars=vars)
print(ggplot(data, aes(date)) + geom_hline(yintercept=0) +
  geom_line(aes(y=value)) +
  facet_grid(variable ~ .))

data <- melt(factor.data, measure.vars=vars_2)
print(ggplot(data, aes(date)) + geom_hline(yintercept=0) +
  geom_line(aes(y=value)) +
  facet_grid(variable ~ .))

data <- melt(factor.data, measure.vars=vars_3)
print(ggplot(data, aes(date)) + geom_hline(yintercept=0) +
  geom_line(aes(y=value)) +
  facet_grid(variable ~ .))

dev.off()

Sunday, August 28, 2011

Extend vs Append

Could be useful.


>>> x = [1,2,3]
>>> y = [4,5,6]
>>> x.append(y)
>>> x
[1, 2, 3, [4, 5, 6]]
>>> x.remove(y)
>>> x
[1, 2, 3]
>>> x.extend(y)
>>> x
[1, 2, 3, 4, 5, 6]
>>>

Friday, August 26, 2011

Words from a Quant

The Director of Quantitative Research at my firm offered wonderful advice on what it takes to understand macro trends in finance:

1) A strong (formal) economic background
2) Great market experience
3) Patience and intelligence

One thing I asked him about was physical fitness and rest. He said that because of the high level of concentrated study, adequate sleep and physical training (he did weightlifting) were necessary. I will take these words to heart when I go to study for my MSCS next year.

As an added bonus, he said that being the opposite of "scatter-brained" was one of the important qualities of a Ph.D. candidate. I think it is definitely something for a short-attention-span person like me to think about.

Thursday, August 25, 2011

A stitch in time

Saves nine.

Was chasing down a bug in my factor model for a week. Went through a lot of stuff.

I decided to look at one day that seemed out of whack. Surely, if this simple calculation was wrong, it must be the data. Otherwise, I would have to keep looking in my own code.

Surprise surprise, the period I was looking at did not match up with my expectations. When I inspected the FactSet code with the resident quant, I instantly saw the problem...

Was using P_TOTAL_RETURNC(0/0/-2,0/0/-1) rather than P_TOTAL_RETURNC(0/-2/0,0/-1/0)

I was grabbing yearly returns, instead of monthly!!!

Lesson: Be more thorough, and use methods that narrow in on the problem.

Wednesday, August 24, 2011

Unit Tests and Python floats

I realized where my downfall lies with Python's easy dynamic typing: unit tests.

Since I use ints to simplify things, I forget that special "." that makes 1./3 different from 1/3 in Python 2, where dividing two ints truncates to 0 (hint: the difference is a third).

Which really means: remember to use "." any time you don't specifically mean an integer!
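Here is a small sketch of how the missing "." changes a weight calculation. It is written for Python 3, where `/` is always true division, so `//` stands in for what Python 2's `/` did with two ints. The lists are made-up sample values, not the real test data.

```python
values_int = [1, 3]
values_float = [1., 3.]

# Floor division mimics Python 2's "/" on two ints: every weight collapses to 0.
floor_weights = [v // sum(values_int) for v in values_int]

# With floats the weights are the fractions we actually meant.
true_weights = [v / sum(values_float) for v in values_float]

print(floor_weights)  # [0, 0]
print(true_weights)   # [0.25, 0.75]
```

On Python 2, putting `from __future__ import division` at the top of the module is another way to get true division everywhere.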


import unittest

import process  # the module under test


class TestNormalizeFunction(unittest.TestCase):
    def setUp(self):
        ## Setup for weight test
        self.dateList2 = [1,1,1,2,2,2]
        self.countryList2 = ["us","us","mx","us","us","mx"]
        self.list2 = [1.,3.,3.,1.,3.,3.] ## <--- NOT THE SAME AS [1,3,3,1,3,3]!

    def test_weights(self):
        # Each value is weighted against the others sharing its date and country
        self.assertEqual(process.calculateWeights(
            self.dateList2, self.countryList2, self.list2),
            [self.list2[0] / sum(self.list2[:2]),
             self.list2[1] / sum(self.list2[:2]),
             1.,
             self.list2[3] / sum(self.list2[3:5]),
             self.list2[4] / sum(self.list2[3:5]),
             1.])

if __name__ == "__main__":
    unittest.main()

Monday, August 22, 2011

The Basel Multiplier

Chebyshev's Inequality says that, for any distribution with mean $\mu$ and standard deviation $\sigma$:
$$P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}$$

Suppose we want the largest possible VaR at the 99 percent confidence level.

That means only 1 percent of samples should show a loss that exceeds this level. Without making any assumptions about the distribution, Chebyshev's inequality bounds the tail probability by 1/k^2, so setting 1/k^2 = 0.01 gives k = 10 at the 1% tail end.

Granted, that's pretty big, but if we assume the distribution is symmetric then only half of that probability sits in the loss tail, so we can halve the right-hand side of the inequality: 1/(2k^2) = 0.01, and the new k becomes about 7.

If a financial institution had instead assumed a normal distribution, the k value for which VaR is not exceeded with 99% confidence is closer to 2.33. Dividing our earlier k (about 7) by this k value, we get ~3. Thus the correction multiplier is about 3.


Example taken from Jorion
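The arithmetic behind the multiplier can be checked in a few lines. This is just a quick sketch of the calculation described above; the variable names are mine, and `statistics.NormalDist` needs Python 3.8 or later.

```python
from math import sqrt
from statistics import NormalDist  # standard library, Python 3.8+

tail_prob = 0.01  # 1 minus the 99% confidence level

# Chebyshev, no distributional assumption: 1/k^2 = tail_prob
k_chebyshev = sqrt(1 / tail_prob)

# Symmetric distribution, one tail only: 1/(2 k^2) = tail_prob
k_symmetric = sqrt(1 / (2 * tail_prob))

# Normal distribution: the one-tailed 99% quantile in standard deviations
k_normal = NormalDist().inv_cdf(1 - tail_prob)

# Ratio of the symmetric Chebyshev bound to the normal quantile
multiplier = k_symmetric / k_normal

print(k_chebyshev, k_symmetric, k_normal, multiplier)
```

The ratio comes out just above 3, in line with the multiplier in the post's title.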

Tail events

There are two ways to measure VaR (Value at Risk):
  • Finding quantiles using empirical data
  • Matching a parametric distribution to data
ETL (Expected Tail Loss) averages the losses beyond the VaR threshold, which gives a better perspective on tail risk outcomes than VaR alone.
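A rough sketch of both VaR approaches and of ETL, on made-up data. The returns are simulated with a fixed seed, not real market data, and the quantile convention (taking the smallest loss among the worst 1% of days) is one simple choice among several.

```python
import random
from statistics import NormalDist, mean, stdev

random.seed(0)  # fixed seed so the made-up sample is reproducible
returns = [random.gauss(0.0, 0.01) for _ in range(1000)]  # hypothetical daily returns

confidence = 0.99
losses = sorted(-r for r in returns)  # losses as positive numbers, ascending

# 1) Empirical quantile: the smallest loss among the worst 1% of days.
n_tail = max(1, round((1 - confidence) * len(losses)))
tail = losses[-n_tail:]
empirical_var = tail[0]

# 2) Parametric: fit a normal distribution and read off its 99% quantile.
z = NormalDist().inv_cdf(confidence)
parametric_var = z * stdev(returns) - mean(returns)

# ETL: the average of the losses at or beyond the empirical VaR.
etl = mean(tail)

print(empirical_var, parametric_var, etl)
```

Because ETL averages everything in the tail, it is never smaller than the empirical VaR it is built from.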

There are still drawbacks to using empirical data.
"The most powerful statistical techniques cannot make short histories reveal once-in-a-lifetime events." (Jorion, Chapter 5)
For this, we move on to Stress Testing.