Saturday, July 24, 2010

Seven reasons to move to California

7-Day Forecast for Brisbane, CA
7-Day Forecast for Corvallis, OR
7-Day Forecast for Charlotte, NC
7-Day Forecast for Atlanta, GA

The fun part about this post is that the NOAA forecasts will update as the date changes, and every day my argument will get a little stronger.

Friday, July 23, 2010

Fixing the lack of music on my computer

I don't really want to load this baby down with MP3s (like any OCD kid, once I get started with something, like downloading MP3s... legally... I can't stop), but I had to get some music.  So I'm picking some favorites from what I have... and downloading a few "old classics" I forgot about.

Specifically, today I've been thinking hard about starting to train my muscles like in martial arts again, which involves a lot of stretching and easy calisthenics for a while until I can handle flexibility work. This relates to music because for about 4 years I did the exact same calisthenic workout every afternoon to the exact same CD, so I am downloading a few songs from that CD, which is called Naveed.  It's pretty great if you like Our Lady Peace, but if you aren't into them, most people think they sound not-so-great. Maybe it's just memories of childhood musical enjoyment, but oh well.

I think I'm too tired to be blogging right now, having some trouble thinking straight.

Museum of Meme

http://www.youtube.com/watch?v=Ll-lia-FEIY

It's located in Wasilla, AK.

Wednesday, July 21, 2010

Once upon a time I was athletic

Small Torso Physics

... but actually I just had a competitive advantage because of the small torso.

Tuesday, July 13, 2010

not a nested for loop--- but I got a regression loop unnested!!!

oh the happiness-- I still must figure out how to nest the columns, but I managed to tell it to do one column versus another FOR a whole transect..... want to see? prepare for the ugliest code ever, which could probably be summarized by something... but hell, I made it work! summary later! work now!

look at that loop right there... mmmm....

for(i in c(1:11)) print(folbio<-nls(transect1[[i]]$YearsACC~nonlin(transect1[[i]]$NbioF, intercept, power), data=transect1[[i]], start=list(intercept=0, power=0.5), trace=T))
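
It turns out the "summarized by something" part isn't even hard-- here's a possible cleaner sketch using lapply(), assuming transect1 really is a list of 11 data frames and nonlin() is the model function I defined elsewhere:

fits <- lapply(transect1, function(d) nls(YearsACC ~ nonlin(NbioF, intercept, power), data=d, start=list(intercept=0, power=0.5), trace=T))
lapply(fits, summary)

lapply() runs the fit once per transect and keeps all eleven models in a list, so the summaries don't get thrown away by print().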

Thursday, July 08, 2010

from hja

did not bring my camera (crap!) but i just saw the avalanche maker go off!  it was actually a dam-failure experiment this time, but WOW! i am really amazed at how the thing worked-- they built a small "landscape" at the bottom of the avalanche maker and set off the water.  in the water were various types of confetti with different colors and weights (masses? weights? i am not sure which is the right word here). then when it went off, it washed out the fake landscape at the bottom. now they are sorting through it and finding the confetti to see how the water went down the "flume" and how it destroys landscapes.

they are doing it again in august! i think with mud-- that would be amazing to see.

list of things to try to figure out how to work into my research:
1. go to the antarctic lter
2. go to the arctic lter
3. use the avalanche machine
4. take the snowmobiles up into the mountains-- the three sisters are the back of our basin and are still snow covered, even today, when it is 100 degrees in the cascades.  but they are about 8000-10000 feet high-- sounds like heaven!
5. go to the canopy crane, and go "canopy camping"-- you put your hammock up-- 500 ft. above the ground.  holy crap, it would be terrifying.. and awesome!!

Wednesday, July 07, 2010

wt....t?

it is 102 degrees in corvallis today. so readeth my thermometer. which leads me to ask, what the... temperature?

Sunday, July 04, 2010

Half of the below, in "R" language

For my own sake, since I am working in R right now, I've begun to take the logic of the information below and move it into R... here's a sort of sloppy thing I'm putting together for my metadata, which will of course get cleaner later.  But I thought it would be interesting, since we can now see a direct difference between R and Matlab!... I think Matlab may be cooler.
# R uses # to denote comments.

# Fitting commands used in R (for the NPP model). NPP is net primary production; essentially it is biomass minus mortality, calculated using the forester's F, which is an expansion from trees per plot to plot size in hectares. Biomass is in megagrams (Mg), the standard unit for carbon (kg is also used sometimes).
# Opening the file in R—select what you want by hand. I generally copy and paste this command from a website with a tutorial about residents of a village, so my initial variable name is often "village."  I changed it to "file" here, but you could use anything you want. The most important part is telling it, if you have a csv, to use read.csv.  If you have a dat or txt file, you can use read.table. R knows about the delimiters in csv when you do this, so you don't have to specify.
>file=read.csv(file=file.choose())
#look at your file. R hates NaN's so you need to do something about them at this point. I took care of mine in the original data.
>file
#rename your file. "npp2007"-- net primary production, 2007. 
>npp2007=file
#look at certain columns. This command tells it to show me the column of Median values (the column named "Median").  read.csv assumes your table has a header row, so if it doesn't, at some point you need to tell it "header=FALSE".
>npp2007$Median
#create a dataframe. This is an all-important step that converts the information into a form R's model functions can read. R usually refers to lists of stuff as "vectors" and it reads columns, not rows. You can tell it to read rows, though. Or just transpose. 
>data1<-data.frame(npp2007)
# Write your model. Explanation below.
>m<-nls(npp2007$NPPT~I(npp2007$Median^power), data=data1, start=list(power=0.5), trace=T)
# m is the name of my model.  nls is the function for nonlinear least squares regression.  the model form is: my response variable (npp2007$NPPT) is related to the dependent variable (npp2007$Median) raised to a power. the I() wrapper just tells R to treat the ^ as plain arithmetic instead of formula syntax. because of what I am modeling, and what I have read in the literature about this particular model, I designed a form with no intercept. I will show an intercept form below. the data I am using is "data1" and the iterations will begin with my guess of power as 0.5. trace=T doesn't change the fit; it just prints the residual sum of squares at each iteration so you can watch the convergence.
>summary(m)
#this command gives you your summary statistics.  For nls this means the parameter estimates with their standard errors, t values, and p values, plus the residual standard error and the number of iterations to convergence. 
> summary(m)$coefficients
# the command above just prints the coefficients--- better for copying and pasting with, my dear.
# the next command is just you making a solution vector with your model.

> solution1<-(npp2007$Median^0.5966045)
# the next command performs the ks test. there is also an option of using an upper-tail or lower-tail version. Since I am fitting to the Median already, I did not deal with the robust statistics, but that is also an option throughout. exact=NULL is just the default: it lets R decide for itself whether to compute an exact p-value, which it does when the sample is small enough, and skips when that would be too slow.
> ks.test(npp2007$NPPT, solution1, exact=NULL)
#plot your model to see if it looks good-- and note that for the ks test a LARGE p-value (say > 0.05) is what you want, since the null hypothesis is that the fitted values and the data come from the same distribution-- sweetness! The plot thing is really straightforward, the hardest part is to remember: when you regress, regress Y on X, when you plot, plot X on Y.  I have a tendency to accidentally regress X on Y because of the plot command.
> plot(npp2007$Median, npp2007$NPPT)
# the lines command adds a fit line to the model without destroying the original plot. 
> lines(npp2007$Median, npp2007$Median^0.5966045)
For models of a non-linear form other than the power form. Okay, so originally I wrote this with the Processing language comment marker, which is //, and it's a pain in the butt to change, so I am going to leave it.
//plot your data. Same as above. 
> plot(village$Median, village$NPPF)
//make your function. to the left of the arrow is what you name your function. Your function is a FUNCTION of parentheses (put your parameters in here), and takes on the form in the braces {in this case, a power form with an intercept} 
> f<-function(med, b0, b1) {b0+med^b1}
This next thing runs the function on some data-- again we use the non-linear fit, but note that this time we regress npp2007$NPPF (the foliage data vector) onto f(npp2007$Median, intercept, power), and we supply starting guesses for the intercept and power on the right. 
> foliage<-nls(npp2007$NPPF~f(npp2007$Median, intercept, power), data=data1, start=list(intercept=0, power=0.5), trace=T)
// show and plot your function
> summary(foliage)
> lines(npp2007$Median, predict(foliage), col="blue")
//ks test
> solutionf<-(0.64052+npp2007$Median^0.33563)
> ks.test(npp2007$NPPF, solutionf, exact=NULL)
Using these I was able to get a "good fit" according to the KS statistic... now to understand the automation part. Which means first I need to clean my data to automate. 
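
And for future me, here is the whole sequence above condensed into one block (a sketch-- it assumes the same column names, Median and NPPF, and clean data with no NaN's):

npp2007 <- read.csv(file.choose())
data1 <- data.frame(npp2007)
f <- function(med, b0, b1) {b0 + med^b1}
foliage <- nls(NPPF ~ f(Median, intercept, power), data=data1, start=list(intercept=0, power=0.5), trace=T)
summary(foliage)
ks.test(npp2007$NPPF, predict(foliage), exact=NULL)
plot(npp2007$Median, npp2007$NPPF)
lines(sort(npp2007$Median), predict(foliage)[order(npp2007$Median)], col="blue")

The sort()/order() pair just makes sure the fitted line gets drawn left to right, in case the Median column isn't sorted.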

Nonlinear curve fitting tutorial

Linear fitting is easy. Nonlinear curve fitting is not, and can be a giant headache. I wrote a sort of tutorial/outline for how to do this for our resident forester, and I thought I'd post it here, since I was puzzling out how to do this last year and I remember it being really annoying to learn.

Curve fitting seems like it's almost never the interesting part, but it's important to get right just the same. Which is what makes it frustrating -- you're thinking, come on, I have this function, it has two or three adjustable parameters, all I want to know is what combination of these parameters makes this curve look like my data? But, since it's nonlinear, generally there's no way to guarantee you've got the best answer, so lots of smart people have invested an enormous amount of time figuring out 1000 different ways to get something that's a pretty good approximation of the best answer.

Anyway, in the hopes that this can help someone else who is about to enter the baffling and not-at-all-fun world of nonlinear curve fitting, here's a tutorial I wrote this morning for how you can do this. (Using mostly Matlab code as examples, but also some Python, because I didn't plan out this thing too well before I started, and I'm way too lazy to go back and edit it...heh.)

# This is "pseudocode" (sort of like a skeleton for how to write a program like this,
# rather than actual code that you could run).
# Lines starting with a pound sign are comments, and are ignored by Python.  Matlab has
# a similar feature (the % sign is used instead of #), and I'm sure R does too.
#
# Step 1: import your data
# You can do this by hand, or by typing in the proper commands.
# In Python (and in Matlab) this is very straightforward, and you just need to specify
# where the files are located.
# For example, to load a file "data.txt" located in the C:/type/kristin/file/location/here/
# directory, you would type:

INFILE = open("C:/type/kristin/file/location/here/data.txt",'r')

# Python now can access data.txt through the variable INFILE.
# (The option 'r' specifies that you're only interested in reading from the file.  You can
# use option 'w' to specify that you will be writing data back to the file.)
#
# Depending on what format data.txt is in, how you will need to process the file will vary.
#
# You'll have to look up how to do this in R.  In Python, if you wanted to just copy the first
# line of the file into a variable and store it:

kristin_data = INFILE.readline()

# The variable kristin_data now is storing the contents of the first line of data.txt.  Suppose
# that's all the data in the file, and we're going to fit a curve to the data in that line.  Now
# close the file.

INFILE.close()

# Python is very flexible, but you have to tell it how to process the data by hand.  There are
# various ways to do this; the best one will depend on the nature of your data files.
#
# However, Matlab was designed with tasks like this in mind, and it is very simple to import
# data from flat files.  ('Flat file' is a generic term for any text file containing delimited data
# -- for example, comma-delimited (csv, which stands for 'comma separated values'), tab-delimited,
# space-delimited, etc.)  You just have to specify what the delimiter is.  You would type:

kristin_data = importdata('C:/type/kristin/file/location/here/data.txt',',')

# This specifies to Matlab that you want to import data.txt, and the delimiter is a comma.  If it
# was instead tab-delimited, you would type:

kristin_data = importdata('C:/type/kristin/file/location/here/data.txt','\t')

# The data is now stored in kristin_data, ready to be analyzed.
#
# Note -- \t is the 'escape character' for tab -- escape characters show up a 
# lot in programming, presumably because the compiler wouldn't know what to do
# with the word "tab."  Another commonly used escape character is \n, which
# is "newline," meaning start a new line.  (i.e., the equivalent of enter.)
#
# Now you're ready to actually work with the data!  (All code will be Matlab code.)
#
# The first thing I usually do when trying to analyze more than one data set is try and
# make my life easier by figuring out what they have in common.  For example, if your
# data sets have wildly differing amplitudes, normalizing them might be a good idea.
# (In the sense of dividing by the sum, not converting to a normal distribution.)
#
# Say kristin_data has all your x-axis values in the first row, and all the y-axis values
# on the second row.  Let's move these into separate variables, for clarity:

x = kristin_data(1,:);
y = kristin_data(2,:);

# In Matlab, the : means "take everything here."  That is, x = kristin_data(1,:) means take
# data from all columns of row 1, and place them into the variable x.
#
# (If you wanted to only take part of row 1, you can use : to specify a range -- for
# example, kristin_data(1,3:7) would mean select elements 3 through 7 on row 1.  This
# syntax is enormously useful, and once you learn it, it will annoy you to no end when
# you find yourself working in a language that doesn't have a similar feature.)
#
# For a SINGLE data set, here's how you would do a linear fit in Matlab:

[linear_fit_parameters, errors] = polyfit(x, y, 1);
R_squared = 1 - errors.normr^2 / norm(y - mean(y))^2;

# First, notice that an initial guess is not necessary; this is a nice feature of linear
# fitting.  You can use the built-in polyfit() function for any polynomial fitting,
# too, not just linear -- for example, a quadratic fit would be polyfit(x, y, 2).
#
# Here, the square brackets indicate that the function polyfit() will return more than
# one value, so we need to give it two variables to get all the information we need.
# The first variable, linear_fit_parameters, is a vector containing the two fitting parameters
# for a linear fit (the slope and intercept).
# The second variable, errors, is a complicated-looking data structure containing details
# of how Matlab did the fit.  (It uses a matrix factorization called the QR decomposition to
# do this kind of fit.)  The second line of code uses this information to calculate an
# R-squared value.
#
# (If you're ever confused about a particular built-in function, you can just type
# help polyfit (or other function) at the command line, and Matlab's extensive built-in
# documentation usually has all the info you'll need.)
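#
# (R aside: the linear fit is a one-liner there.  A sketch, assuming vectors x
# and y already exist in your workspace:)

linear_fit <- lm(y ~ x)
r_squared <- summary(linear_fit)$r.squared

# Like polyfit(), lm() needs no initial guess, because the fit is linear.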
#
# For nonlinear fitting, it's slightly more complicated.  Say you also want to check and
# see if your data fits to an exponential function, a log-normal function, and a power law.
# These need to be stored in SEPARATE files (called m-files), which just contain the
# mathematical form of the function.  If you actually use Matlab to do this, I'm happy to
# show you exactly how to do this; for the purposes of this explanation, I'll just assume
# that they exist.  The syntax to access functions in separate m-files is @function_name.
# The @ symbol indicates that you want to access a function stored in a separate file,
# and function_name should be replaced by whatever the m-file name is.
#
# If you want to enable robust fitting (to minimize the impact of outliers), you need to
# specify that using the statset() function.  We'll create a variable called options,
# containing this information.

options = statset('Robust','on');

# By default, robust fits use Tukey's bisquare to reweight a least-squares fit.  You can
# specify if you want a different weighting function; Matlab has a bunch of them built-in.
#
# The nonlinear fitting function is nlinfit().  Unlike a linear fit, you need to tell it
# your initial guesses for the parameters, and these are important.  Define a vector p0
# which contains your initial guesses for the parameters (here, I'll assume you have 2
# parameters, but of course you might actually have more):

p0 = [1, 2];

# The command to do a nonlinear fit to an exponential function stored in my_exponential_function.m
# is:

[P, r, J, SIGMA, mse] = nlinfit(x, y, @my_exponential_function, p0, options);

# P now contains the best-fit parameter estimates for this fit.  The other variables contain
# various estimates of how good the fit was.  Matlab has a built-in function nlparci() for
# calculating confidence intervals for fits done using nlinfit().  To take advantage of this, you
# would use:

CI = nlparci(P, r, 'covar', SIGMA);

# CI now contains confidence intervals for your fit.
#
# You can do identical commands for your lognormal and power law fits (or whatever else
# you want to try).  You just need to specify the file containing your fitting function,
# and of course initial parameter guesses.
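#
# (R aside: the analogue of the nlinfit() + nlparci() pair is nls() plus
# confint().  A sketch, assuming x, y, and an exponential model y = a*exp(b*x):)

m <- nls(y ~ a*exp(b*x), start=list(a=1, b=2))
CI <- confint(m)

# confint() on an nls fit returns profile-likelihood confidence intervals for
# the parameters, playing the role of nlparci() above.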
#
# Ok, so suppose (as is likely) that you have a rough idea of what the parameters should be
# but not so close that you're confident that you can get a good fit on the first try.  You
# could automate the guessing procedure as follows:
#
# p0 is the initial guess that you make:

p0 = [1,2];

# The function rand(1,2) will create a 1-by-2 matrix (that is, a vector with the same
# dimensions as p0) containing random decimals between 0 and 1.  If you subtract 0.5 from
# both elements, then you'll have random decimals between -0.5 and 0.5.  If you then add
# this to p0, you've got a nice, automatic way of randomly adjusting p0 up and down from
# your initial guess:

p0 = p0 + rand(1,2) - 0.5;

# You can use a for loop to do this multiple times.  p0 is a vector, but suppose we lack
# confidence in our guessing capabilities, so we want to try 1000 different guesses, adjusted
# using the random number generator, as discussed above.  To store all these initial guesses,
# let's turn p0 into a 1000-by-2 matrix, so that our randomized guesses are stored on each row:

p0 = zeros(1000,2);
base_guess = [1,2];
for i = 1:1000
 p0(i,:) = base_guess + rand(1,2) - 0.5;
end

# What did we just do there?  p0 = zeros(1000,2) just creates a 1000-by-2 matrix of all zeros,
# and assigns it to the variable p0.  base_guess = [1,2] keeps our original guess in its own
# variable, so that filling the rows of p0 with randomized guesses never disturbs the guess itself.
#
# The next three lines are a "for loop," meaning that they execute the command on the middle
# line a specified number of times automatically.  for i = 1:1000 means to do whatever is
# inside the loop 1000 times.  The variable i is the "loop index," and it keeps track of
# how many times the program has executed the loop so far.
#
# The middle line does the same thing that we discussed above (adjusts the guesses using random
# numbers).  The big difference here is that it does it 1000 times, and stores the results
# automatically on the 1000 rows of p0!  Notice that we're actually using the loop index i
# to specify which row of p0 we want to assign guesses to.  This is a very handy trick which
# you'll come back to again and again when you make use of for loops.
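#
# (R aside: the same randomized-guess matrix can be built in one vectorized
# line, with runif() supplying the uniform noise.  A sketch:)

p0 <- matrix(c(1, 2), nrow=1000, ncol=2, byrow=TRUE) + runif(2000) - 0.5

# Each row of p0 is now the base guess [1, 2] plus noise in [-0.5, 0.5].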
#
# Now you can use these 1000 guesses to do 1000 separate fits of all of your fitting functions.
# We'll ignore the linear case, since the guesses don't matter there.
#
# To do an exponential fit:

parameters = zeros(1000,2);
confidence = zeros(1000,2);
for i = 1:1000
 [P, r, J, SIGMA, mse] = nlinfit(x, y, @my_exponential_function, p0(i,:), options);
 CI = nlparci(P, r, 'covar', SIGMA);
 parameters(i,:) = P;
 confidence(i,1) = abs(P(1) - abs(CI(1,1)));
 confidence(i,2) = abs(P(2) - abs(CI(2,1)));
 clear P r J SIGMA mse CI;
end

# Notice how the only difference in the call to nlinfit() is that we're now specifying p0(i,:)
# as the rows of our initial guess matrix!  Also notice how we've repeatedly used the trick
# mentioned above, where we can use the loop index to specify positions in our matrices.
#
# Now you've got a 1000-by-2 matrix of parameter estimates, a 1000-by-2 matrix of confidence
# intervals for each parameter, and a 1000-by-2 matrix of your initial guesses.
#
# Since these are all indexed identically, you can go back and pick which fit you think is
# the best using a method of your choosing!  For example, suppose you just want the fit
# with the smallest total confidence intervals (I'm sure you have a better way of doing this,
# but this is just for example):

[smallest_CI, smallest_CI_position] = min(sum(confidence')');

# The sum() function sums down columns of a matrix; therefore, we need to transpose the
# confidence matrix.  In Matlab, transpose is done with an apostrophe: confidence' is
# the transpose of confidence.  So sum(confidence') sums across rows, which is what
# we want, and then we need to transpose it back, so another apostrophe
# sum(confidence')'.  We've just summed together both elements of all rows of the
# confidence matrix, so we now have a 1000-by-1 matrix (a vector), and we want to know what
# the smallest number is in this vector.  The min() function returns two pieces of 
# information: first, the value of the smallest number in the vector, and second, the
# position in the vector where that number is located.  So
# [smallest_CI, smallest_CI_position] will store those pieces of information.
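#
# (R aside: the same selection collapses to one line, with rowSums() standing
# in for the double transpose -- a sketch, assuming a numeric matrix confidence:)

smallest_CI_position <- which.min(rowSums(confidence))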
#
# Armed with this information, if you're really in a hurry, you can just say:

final_parameters_exponential = parameters(smallest_CI_position,:);
final_CI_exponential = confidence(smallest_CI_position,:);

# This takes the parameters and confidence interval which we selected according to the
# rules above, and places them in two vectors (each containing only two entries).
# Alternately, you might select, say, the best 10 fits, then actually open those up in
# R, and eyeball the fits to see if they're reasonable.
# You can more or less copy-and-paste this code to do fits to other functional forms
# of interest.  In this way, you can test many nonlinear fits in a very short time!
#
# Realistically, you'll probably have more than just a single data set that you need to
# do these fits to.  You can use another for loop to repeat this entire procedure
# for multiple data sets.  Say you've got 500 files, all in the same directory,
# named as data1.txt, data2.txt, data3.txt, etc.

filepath = 'C:/type/kristin/file/location/here/';
filename = cell(500,1);
kristin_data = cell(500,1);
final_parameters_exponential = cell(500,1);
final_CI_exponential = cell(500,1);
options = statset('Robust','on');
base_guess = [1,2];
p0 = zeros(1000,2);

for j = 1:500
 name = strcat(strcat('data', int2str(j)), '.txt');
 filename{j} = strcat(filepath, name);
 
 kristin_data{j} = importdata(filename{j}, ',');

 x = kristin_data{j}(1,:);
 y = kristin_data{j}(2,:);

 parameters = zeros(1000,2);
 confidence = zeros(1000,2);
 
 for i = 1:1000
  p0(i,:) = base_guess + rand(1,2) - 0.5;
  [P, r, J, SIGMA, mse] = nlinfit(x, y, @my_exponential_function, p0(i,:), options);
  CI = nlparci(P, r, 'covar', SIGMA);
  parameters(i,:) = P;
  confidence(i,1) = abs(P(1) - abs(CI(1,1)));
  confidence(i,2) = abs(P(2) - abs(CI(2,1)));
  clear P r J SIGMA mse CI;
 end
 
 [smallest_CI, smallest_CI_position] = min(sum(confidence')');

 final_parameters_exponential{j} = parameters(smallest_CI_position,:);
 final_CI_exponential{j} = confidence(smallest_CI_position,:);
end

# Ok, there's kind of a lot to unpack there -- in fact, with the exception of the m-file
# (which will be in a separate file, my_exponential_function.m), that's a complete
# Matlab program to do your fits for you!
#
# First, cell arrays.  cell(500,1) creates a data structure called a 'cell array' that is
# possibly unique to Matlab, and is enormously useful.  Think of it as a list of matrices
# that you can store anything in that you want.  In other words, if you have 500 data sets,
# you can store them as 500 'cells' within a single cell array variable.  This is extremely
# useful for indexing purposes, as you can imagine!  The cells within a cell array are
# accessed using curly braces {}.  So for example, to access matrix element (1,2) within
# the second cell of the filename cell array, you would type: filename{2}(1,2).
#
# So right up front, we create four cell arrays, all capable of holding 500 matrices.
# filename will just be our way of storing the names of the data files; it's a lot easier
# doing it this way than clicking on 500 files separately.  kristin_data is the cell
# array that will hold all your data sets, once we've imported them.  
# final_parameters_exponential and final_CI_exponential are cell arrays to hold the matrices
# containing our best fit parameters and confidence intervals, respectively.
#
# Now, notice that we have two for loops, one inside the other.  This is called a 'nested
# loop,' and if you need to manipulate elements of a matrix (for example) one by one, you'll
# find yourself using nested loops a lot!  The loop indices of course are different (we're
# using j for the 'outer' loop and i for the 'inner' loop, but they can be any variable).
#
# Ok, now we're just doing that same code that we wrote before (for a single data set) over
# and over again.  We're making use of the outer index j to select the cell that we're
# interested in.  The following two lines:
# 
# name = strcat(strcat('data', int2str(j)), '.txt');
# filename{j} = strcat(filepath, name);
#
# are sort of cryptic looking, but all they're doing is setting up the file names auto-
# matically.  strcat() is "string concatenation," meaning that it takes two words and
# concatenates them.  So strcat('why', 'hello') would produce the output 'whyhello'.
#
# In this case, the path of the directory where all the data files are stored is already
# stored in the filepath variable; we just need to generate the data#.txt extension.
# So the way we do this is: strcat('data', int2str(j)) takes the word 'data' and
# attaches the number j on the end of it (int2str() converts an integer to the string
# 'version' of the integer -- this is done simply because you can't concatenate different
# data types together).  So say j = 1 (remember j is the loop index), then 
# strcat('data',int2str(j)) will produce:
#
# data1
#
# but we still need the .txt extension, so we then concatenate '.txt' to it with another
# strcat(): strcat(strcat('data', int2str(j)), '.txt').  This produces the output:
#
# data1.txt
#
# which we assign to the variable name.  We can then strcat() once more with filepath: 
# strcat(filepath, name).  This gives us the full name we need:
#
# C:/type/kristin/file/location/here/data1.txt
#
# ...which seems like a lot of pointless work, but it's really nice because it creates
# all filenames data1.txt, data2.txt, ..., data500.txt automatically.  The full file
# name is then stored in cell 1 of the filename cell array, filename{1}.
#
# Everything else is pretty straightforward -- we're just doing the same thing as before,
# but storing everything as cells within cell arrays, so that the program just runs
# while we sit around and play video games, and then after a bit we'll have fitting
# results for all 500 data sets tested against 1000 possible initial parameter values.
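
And since the intended audience works in R, here's roughly what the same multi-start trick might look like there, compressed into a few lines. This is only a sketch under the same assumptions (vectors x and y, an exponential model), and for simplicity it picks the winner by residual deviance rather than by confidence-interval width:

base_guess <- c(a=1, b=2)
fits <- lapply(1:1000, function(i) {
  p0 <- as.list(base_guess + runif(2) - 0.5)
  try(nls(y ~ a*exp(b*x), start=p0), silent=TRUE)
})
fits <- Filter(function(f) inherits(f, "nls"), fits)
best <- fits[[which.min(sapply(fits, deviance))]]
coef(best)

The try() wrapper just skips any starting guesses where nls() fails to converge, which happens a lot in nonlinear fitting.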

Saturday, July 03, 2010

Burritos

Clearly this is going to be at least 200000 times as deep as the post below, as you can see by the title. Maybe deeper.

Burrito joints are quite attractive to me.  They smell nice, you have no-meat and no-cheese options, and the spices are natural (not MSG type stuff that upsets the stomach).  Additionally it's a pretty darn good balanced meal, if you order the right thing.

But there is one problem that, of all things, should not be a problem for me: the Spanish. Right, I know. I speak Spanish! So here's the deal-- I've been to the typical burrito joints, the corporately owned ones, like Q-dog (that's what the runner circle in Atlanta called it), Baja Fresh, Moe's, you know. They are good, I guess.  Baja Fresh has a good "feel" to me because I know a certain physicist likes their fish tacos... hmmm... But I've been getting exposed to Oregon's sustainability culture, and one thing that is really easy to do, and also kind of fun, is to avoid chain joints and go instead to individual local places.  Of course, I still end up at Fred Meyer to buy stuff... but trying to go to the co-op for groceries when I need them, or eating at local places when possible, I think that's a nice life change.  Which is a long story made short to say that I really wanted to go to the Conga. It's a decent deal, not as good as the 3.00 place in the forestry building, but with more salsa variety. And besides that, a giant burrito for me is at least two meals (it would be more, but when you eat mostly veggies the bulk consumed increases while the density decreases).

So I went to the Conga this morning for a 4.00 burrito to last me all day, and ordered what I thought was explicitly a veggie burrito.  It was pretty evident that the person behind the counter did not speak English.  It would have been incredibly easy for me to express what I wanted-- just a burrito shell with vegetables, a little beans, and salsa-- nothing else-- in Spanish.  But no, I did not.

So I got my epic burrito home and I ate just some of the end part where all the shell folds up, which was pretty much lettuce and salsa.  I was in heaven.  Nothing like a fresh, warm burrito shell (I wonder if it is a local brand) and salsa, plus those vinegar carrots on the side.  Then for dinner I was all excited to eat another 1/3 or so of the burrito, but when I bit in, it just tasted very funny and tangy, really unpleasant.  I opened the thing up and lo.. sour cream.  K does not really like sour cream... in fact, K hates the taste of sour cream.  And it wasn't just localized sour cream, it was all throughout, in the beans, etc.

Not wanting to waste the burrito, I dissected it and removed the sour cream where I could-- which meant I lost a good bit of the salsa.  My sour-cream-less burrito was, I am not kidding, 1/2 the size of the prior one (that's a lot of sour cream). It was a damn good burrito... but it shrank on me.

The lesson of the day is: do not be so embarrassed of my horribly Americanized Spanish accent that I am too proud to say, "Por favor, no me gustan productos de los animales. No carne ni queso ni crema ni guacamole. Solamente tortilla, pico o salsa, lechuga, y unas poquitas habichuelas!" ("Please, I don't like animal products. No meat, cheese, cream, or guacamole. Just tortilla, pico or salsa, lettuce, and a few beans!")

Now please continue thinking of deep things.  Clearly this was the most important part of my very non-productive day.

Friday, July 02, 2010

A (possibly) less cryptic way to calculate the cross product

Here is the latest entry in my "how to take simple, well-understood problems and convert them to horrible-looking exercises in matrix multiplication" file. (I'd say about 99% of my graduate work so far should also be stored in this rather unwieldy file...)

The cross product (which I think only exists in ordinary, three-dimensional space) of two vectors $\mathbf{a}$ and $\mathbf{b}$, written $\mathbf{a}\times\mathbf{b}$, is a way of combining $\mathbf{a}$ and $\mathbf{b}$ to produce a new vector, oriented perpendicularly to both $\mathbf{a}$ and $\mathbf{b}$. In symbols, this is
\[ \mathbf{a} \times \mathbf{b} = |\mathbf{a}| |\mathbf{b}| \sin \left(\theta \right) \mathbf{\hat{n}}, \]
where $|\mathbf{a}| = \sqrt{{a_1}^2+{a_2}^2+{a_3}^2}$ is the length of $\mathbf{a}$, and $\mathbf{\hat{n}}$ is an as-yet-unknown unit vector orthogonal to both $\mathbf{a}$ and $\mathbf{b}$. The standard way of calculating the components of $\mathbf{a} \times \mathbf{b}$ is by constructing a very strange-looking determinant:
\[ \mathbf{a} \times \mathbf{b} = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix} \times \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} = \left| \begin{matrix} \mathbf{\hat{e}}_1 & \mathbf{\hat{e}}_2 & \mathbf{\hat{e}}_3 \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3\end{matrix} \right| = \begin{bmatrix} a_2 b_3 - a_3 b_2 \\ a_3 b_1 - a_1 b_3 \\ a_1 b_2 - a_2 b_1 \end{bmatrix}. \]
I remember feeling puzzled when I first learned this as an undergraduate. It does give you the right answer, but seriously, what's going on with that determinant? Its first row is a set of unit vectors, and its second and third rows are populated by the elements of the vectors $\mathbf{a}$ and $\mathbf{b}$. There's probably some obvious explanation that I'm missing, but to me, that determinant has always been maddeningly cryptic.

Anyway, I was waiting for Maple to grind through some calculations for me this afternoon, so I thought I'd take a crack at writing the cross product in a somewhat less confusing way. Looking at the result of the above equation, it's clear that we're looking for the following three inner products:
\[ \mathbf{a} \times \mathbf{b} = \begin{bmatrix} a_2 b_3 - a_3 b_2 \\ a_3 b_1 - a_1 b_3 \\ a_1 b_2 - a_2 b_1 \end{bmatrix} = \begin{bmatrix} \begin{bmatrix} a_2 & -a_3 \end{bmatrix} \begin{bmatrix} b_3 \\ b_2 \end{bmatrix} \\ \begin{bmatrix} -a_1 & a_3 \end{bmatrix} \begin{bmatrix} b_1 \\ b_3 \end{bmatrix} \\ \begin{bmatrix} a_1 & -a_2 \end{bmatrix} \begin{bmatrix} b_2 \\ b_1 \end{bmatrix} \end{bmatrix}. \]
It should be possible to form a matrix $\mathbf{A}$ from the components of $\mathbf{a}$, while leaving $\mathbf{b}$ untouched. (Or vice-versa.) Clearly, $\mathbf{A}$ will be a $3 \times 3$ matrix, and, since $\mathbf{a} \times \mathbf{b}$ is orthogonal to $\mathbf{a}$ and $\mathbf{b}$, it should have zeroes on its diagonal. After a bit of tinkering,
\[ \mathbf{A} = \begin{bmatrix} 0 & -a_3 & a_2 \\ a_3 & 0 & -a_1 \\ -a_2 & a_1 & 0 \end{bmatrix}, \]
which allows us to compute the cross product by multiplying $\mathbf{b}$ on the left by the matrix $\mathbf{A}$:
\[ \mathbf{A} \mathbf{b} = \mathbf{a} \times \mathbf{b}. \]
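
As a quick numerical sanity check, here's a sketch in R with two arbitrary vectors, comparing $\mathbf{A}\mathbf{b}$ against the component formula from the determinant method:

a <- c(2, -1, 3)
b <- c(1, 4, -2)

A <- matrix(c(    0, -a[3],  a[2],
               a[3],     0, -a[1],
              -a[2],  a[1],     0), nrow=3, byrow=TRUE)

axb <- c(a[2]*b[3] - a[3]*b[2],   # the usual component formula, for comparison
         a[3]*b[1] - a[1]*b[3],
         a[1]*b[2] - a[2]*b[1])

all.equal(as.vector(A %*% b), axb)  # TRUE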
Two immediately evident properties of $\mathbf{A}$ are that it is traceless (since its diagonal elements are all zero),
\[ \text{tr}\left(\mathbf{A}\right) = 0, \]
and it is skew-symmetric (equal to the negative of its transpose):
\[ \mathbf{A} = -\mathbf{A}^\mathrm{T}. \]
A quick calculation reveals that $\mathbf{A}$ has three eigenvalues: zero, whose eigenvector is $\mathbf{a}$ itself (since $\mathbf{A}\mathbf{a} = \mathbf{a} \times \mathbf{a} = \mathbf{0}$), and the purely imaginary pair
\[ \lambda_{\pm} = \pm i | \mathbf{a} |. \]

I don't know if this is otherwise useful, but it seems to me that it's quite a bit less confusing than the determinant method of calculating $\mathbf{a} \times \mathbf{b}$. One possible application might be to define the cross product in higher-dimensional spaces, using a larger matrix $\mathbf{A}$ with the same internal structure.

Thursday, July 01, 2010

Math factoid of the day

Here's an odd tidbit that I came up with: a way to write a multinomial expansion with all coefficients equal to 1. It turns out this factorization was not very useful for my particular problem, but I thought maybe it could be useful to someone, so I'm posting it here.

First, a quick bit of background math. An ordinary binomial is just a sum of two variables raised to some power, $(a+b)^N$. The binomial theorem is a way of calculating the coefficients of the series expansion of a binomial,
\[ (a+b)^N = \sum_{i=0}^N \frac{N!}{i!(N-i)!} a^i b^{N-i}, \]
where $\frac{N!}{i!(N-i)!}$ is called the $i^{\text{th}}$ binomial coefficient. There are well-known ways of writing similar sums for longer multinomials, and obtaining the coefficients of the sum that way.

I had an unusual problem involving a sum that was kind of like a multinomial sum, but without the coefficients. That is, in the two-variable case (I guess it would be called a 'unit-coefficient binomial'?), I had a sum of the form
\[ \sum_{i=0}^N a^i b^{N-i}, \]
which is on first glance actually a simpler sum, but has the great disadvantage that it can't be easily condensed into a binomial $(a+b)^N$. Since I was ultimately interested in taking derivatives of combinations of these types of sums, I thought it would be nice to have a way of expressing the sum more compactly.

There probably is a more efficient way of doing this, but the method I came up with is to rewrite the sum as a product of relatively simple matrices. It seems to work for arbitrarily long multinomials raised to any power. It's probably easiest to explain by example. For instance, consider a cubed unit-coefficient binomial (that is, the series expansion of $(a+b)^3$ with all coefficients equal to one),
\[ a^{3} + a^{2} b + a b^{2} + b^3 = \sum_{i=0}^3 a^i b^{3-i}. \]
For some reason, my mind tends to try and turn any math problem into an exercise in linear algebra. So naturally, I noticed that this sum may be rewritten as an inner product of vectors, with elements given by powers of $a$ and $b$:
\[ \sum_{i=0}^3 a^i b^{3-i} = \begin{bmatrix} a^3 & a^2 & a & 1 \end{bmatrix} \begin{bmatrix} 1 \\ b \\ b^2 \\ b ^ 3 \end{bmatrix}. \]
This works for higher powers than three; the only difference is that the vectors are longer. This immediately suggests that a unit-coefficient trinomial (here, $(a+b+c)^2$ with all coefficients equal to one),
\[ a^{2} + a b + b^{2} + a c + b c + c^2 = \sum_{i=0}^2 \sum_{j=0}^i a^j b^{i-j} c^{2-i}, \]
may be rewritten by inserting a triangular Toeplitz matrix between the two vectors:
\[ \sum_{i=0}^2 \sum_{j=0}^i a^j b^{i-j} c^{2-i} = \begin{bmatrix} a^2 & a & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ b & 1 & 0 \\ b^2 & b & 1 \end{bmatrix} \begin{bmatrix} 1 \\ c \\ c^2 \end{bmatrix}. \]
In this case, the unit-coefficient trinomial is squared (that is, the upper limit on the sum is $2$), so it is a $3 \times 3$ matrix; in general, the size of the Toeplitz matrix would be $(N+1) \times (N+1)$, and each vector would also of course be $N+1$ dimensional.
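
A quick numerical spot-check of the trinomial case, as a sketch in R (with arbitrary values for the three variables; I use cc since c() is a function in R):

a <- 1.3; b <- 0.7; cc <- 2.1

lhs <- a^2 + a*b + b^2 + a*cc + b*cc + cc^2

left_vec  <- c(a^2, a, 1)
T_b <- matrix(c(1,   0, 0,
                b,   1, 0,
                b^2, b, 1), nrow=3, byrow=TRUE)
right_vec <- c(1, cc, cc^2)

all.equal(lhs, as.numeric(left_vec %*% T_b %*% right_vec))  # TRUE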

This factorization holds generally for unit-coefficient multinomial expansions. For example, the cube of the unit-coefficient quadrinomial $\left( a + b + c + d \right)^3$ factors into a pair of $4 \times 4$ matrices:
\[ \sum_{i=0}^3 \sum_{j=0}^i \sum_{\ell =0}^{3-i} a^j b^{i-j} c^{\ell} d^{3-i-\ell} = \begin{bmatrix} a^3 & a^2 & a & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ b & 1 & 0 & 0 \\ b^2 & b & 1 & 0\\ b^3 & b^2 & b & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ c & 1 & 0 & 0 \\ c^2 & c & 1 & 0\\ c^3 & c^2 & c & 1 \end{bmatrix} \begin{bmatrix} 1 \\ d \\ d^2 \\ d^3 \end{bmatrix}. \]
For larger multinomials, each extra term in the unit-coefficient multinomial results in an extra Toeplitz matrix in the above product, with the dimensions of the matrices and vectors set by the upper limit on the sum (as before). No idea if this factorization strategy is useful (it turns out in my case it wasn't), but I thought it was sort of neat regardless...

Kava!!

I was looking for some tea today and lucky me, they have KAVA tea now for sale. It is my favorite...

That is all.