# R uses # to denote comments.
# Fitting Commands used in R (for NPP Model). NPP is net primary production, essentially it is biomass-mortality calculated using forester's F, which is an expansion of trees per plot to plot size in hectares. Biomass is in Megagrams, the standard unit for carbon (Kg is also used sometimes).
# Opening the files s in R—select what you want by hand. I generally copy and paste this command from a website with a tutorial about residents of a village, so my initial file name is often "village." I changed it to "file" here, but you could use anything you want. The most important part is telling it, if you have a csv, to read.csv. If you have a dat or txt file, you can use read.table. R knows about the delimiters in csv when you do this so you don't have to specify.
>file=read.csv(file=file.choose())
>file=read.csv(file=file.choose())
#look at your file. R hates NaN's so you need to do something about them at this point. I took care of mine in the original data.
>file
#rename your file. "npp2007"-- net primary production, 2007.
>npp2007=file
#look at certain columns. This command tells me to look at the column of Median values (with a dimname "Median"). R automatically assumes you have a header in your table, so if you don't, at some point you need to tell it "dimnames=FALSE".
>npp2007$Median
#create a dataframe. This is an all important step that converts the information into whatever R can read. R usually refers to lists of stuff as "vectors" and it reads columns, not rows. You can tell it to read rows, though. Or just transpose.
Data1<-data.frame(npp2007)
# Write your model. Explanation below.
m<-nls(npp2007$NPPT~I(npp2007$Median^power), data=data1, start=list(power=0.5), trace=T)
# m is the name of my model. nls is the code for non linear regression. the model form is my response variable (npp2007) is related to a constant I times the dependent variable (npp2007$Median) raised to a power). because of what I am modeling, and what I have read in the literature about this particular model, I designed a form with no intercept. I will show an intercept form below. the data i am using is "data1" and the iterations will begin with my guess of power as 0.5. The model itself will guess the "I" for me. I don't know what trace=T does, but I read that you will have a better fit if you put it in, so I do.
summary(m)
#this command gives you your summary statistics. In this case it is an ANOVA of the predicted model versus your real data, coefficients, confidence intervals, iterations to convergence, P values, F values, and a few things about how it converged.
> summary(m)$coefficients
# the command above just prints the coefficents--- better for copying and pasting with, my dear.# the next command is just you making a solution vector with your model.
> solution1<-(village$Median^0.5966045)
# the next command performs the ks test. there is also an option of using a upper-tail or lower-tail version. Since I am fitting to the Median already, I did not deal with the robust statistics, but that is also an option throughout (different places you can say robust=TRUE). Exact=NULL just speeds up the time it takes to calculate. If i remember, this means that as it converges in on a solution, it doesn't bother solving things that aren't the solution? I think... I don't really know...
> ks.test(village$Median, solution1,exact=NULL)
#plot your model to see if it looks good-- if you have a p-value < 0.05 or whatever you desire, sweetness! The plot thing is really straight forward, the hardest part is to remember: when you regress, regress Y on X, when you plot, plot X on Y. I have a tendency to accidentally regress X on Y because of the plot command.
> plot(village$Median, village$NPPT)
# the lines command adds a fit line to the model without destroying the original plot.
> lines(village$Median, village$Median^0.5966045)
For models of a non-linear form other than power form. Okay so originally I wrote this with theProcessing language comment thing, which //, and it's a pain the butt to change this, so I am going to leave it.//plot your data. Same as above.
> plot(village$Median, village$NPPF)
//make your function. to the left of the arrow is what you name your function. Your function is a FUNCTION of parentheses (put your parameters in here), and takes on the form of {in this case, exponential form}
> f<-function(med, b0, b1) {b0+med^b1}
This next thing runs the function on some data-- again we use the non-linear fit, but note that this time we regress village$NPPF (the foliage data vector) onto f(med, intercept, power), which we are making guesses for to the right.
> foliage<-nls(village$NPPF~f(med, intercept, power), data=data1, start=list(intercept=0, power=0.5), trace=T)
// show and plot your function
> summary(foliage)
> lines(village$Median, predict(foliage, list(med=med)), col="blue")
//ks test
> solutionf<-(0.64052+0.33563*village$Median)
> ks.test(village$Median, solutionf, exact=NULL)
Using these I was able to get a "good fit" according to the KS statistic... now to understand the automation part. WHich mean first I need to clean my data to automate.
No comments:
Post a Comment