I hereby summon ye, knower of all that is having to do with making computers do menial tasks I don't want to do by hand, to give thy best advice regarding a data situation:
Suppose that I have 3000 text files (saved as whatever format you want, right now in .txt) which each represent one day (for example, 01.01.2005) with a format like this inside of each:
PLOT ID X Y CARBON WATER
101 100 200 0.009 3.00
102 106 201 0.003 5.00
And each has say, 11000 lines in it. So each text file is showing me data for 11000 locations on a specific day.
This is not what I am interested in, though. What I want to know is, how does the CARBON on a specific locaiton, like PLOT ID 102, change over the course of the 3000 days (each of which are in a different file). Notice that each file has these two things that uniquely identify the PLOT-- the first is the PLOT ID and the second is the X and Y (together) coordinates. The X and Y are more descriptive, of course, than the PLOT ID, but also in two different columns.
I am not sure where to start on this. Any ideas? This is what I have thought of so far-- but I think I am still in the mindset of "think like an accountant" and not "think like a ph.d. student who needs to learn to think in complex ways." this is summarized as: "lulz i r n00b." here's my n00bish thoughts:
1. put all data from one year into a single spreadsheet. use the sort function in excel to sort that year by plot ID.... then do something (??) to separate each of those plot ID's into separate files-- maybe a command in MatLab can do this?
2. do this for all years (let's say we are using 10 of them)
3. when trying to use matlab, just find all of those files using the command similar to the one about data1.txt, data2.txt that is in the worlds most awesome tutorial for MATLAB use that you made for me (also known as "reference manual of the gods").
I like this idea, but I know i will be updating the file information a lot over the next however many years-- is there a way to develop an automated "process" that the computer can run to do this for me? it sounds to me like a task for python but I'm still on the "Hello WORLD" level with that one.
Since you have MS Access (I think you do?), the easiest thing to do is probably to create a database, with a table for each spreadsheet. So you'll have tables file01, file02, file03, etc. Then you can do an "inner join" to retrieve only the rows with the correct plot ID's, from each table:
ReplyDeleteSELECT * FROM file01 INNER JOIN file02
ON file01.PlotID = file02.PlotID;
Alternately, you could just import all the data into Matlab, then tell Matlab to search for the corresponding plot ID's in the different data structures you've imported, then place the matching results into another matrix. Since you're going to be using Matlab to analyze the data anyway, that might be easier than doing it in Access.
HAVE I MENTIONED THAT YOU ARE AWESOME! MANY THANKS!!!
ReplyDeleteHappy to help! Let me know if you have trouble getting it to work.
ReplyDeleteyes to the above... i am having trouble figuring out how to automate access bringing in each of the files-- there are several thousand of them, and i think there must be a better way to do this than by hand, but no such luck!
ReplyDeletethe access is a really cool thing because as i have learned, ArcGIS can read Access data bases, which would make my life much easier in the future.. :)
ReplyDeleteI haven't used Access in a long time; is there an "import directory" option? I remember it being fairly natural to interface with Excel, since they're both MS programs. It may be easier for you to just process all the data in Matlab, especially since you'll be using Matlab to analyze it anyway (or R?). You should be able to use that code I wrote (in the blog post tutorial) to have Matlab automatically set up file names for you, provided the files are all named in a similar way. Then do importdata on all the files.
ReplyDeleteOnce you've got everything in data matrices in Matlab, you can just have it match up PlotID's for the different matrices. Are all the PlotID's arranged in exactly the same way in each file? (Like could you safely say that if line 2 is PlotID 102 in file 1, it's that way for every other file?) If not let me know, it's a little tricky but still doable. First get all the data into one giant cell array. You should be able to do something like:
uniquePlotIDs = unique(dataCell{1}(:,1));
for i = 1:length(uniquePlotIDs)
for j = 1:length(dataCell)
matches = (find(dataCell{j}(:,1) == uniquePlotIDs(i)));
if j == 1
matchedDataMatrix(1:length(matches),:) = matches;
sumLengths = length(matches);
else
matchedDataMatrix((sumLengths+1):(sumLengths+length(matches),:) = matches;
sumLengths = sumLengths + length(matches);
end
clear matches;
end
end
This will create a big matrix with all the 100's stacked on top of each other (from each file), then the 102's, etc. If you wanted them all to be arranged on the same row you could modify the code a little bit and do that too! Let me know if the code doesn't work. I just typed it here and don't have time to test it at the moment so it might have errors in it!
Crud, brain-fart in the above comment -- in the matches assignment line, it should read:
ReplyDeletematches = dataCell{j}((find(dataCell{j}(:,1) == uniquePlotIDs(i))),:);
That should work...hopefully...
yes, i think i found the problem... access actually doesn't import directories (neither does excel), which many on internet forums have bitcheth about. the file was saved as a txt file when it came out of the program, and it has several long and terrible headers. i went back into the program and changed it to csv and changed the headers-- i'm going to see if that will let matlab import it in the way you taught before! (i hope!)
ReplyDelete