I attended a VASC talk today at Carnegie Mellon. Aleix Martinez presented "The Secret Life of Linear Methods: Why Linear Methods Work, Do Not."

I guess the main idea here was that we are looking for heuristics to classify real-world data. Usually it's image data, but gene data was also mentioned as an example. Each data point is expressed as a vector, and then we choose some method like LDA (Linear Discriminant Analysis) and have at it. Aleix mentioned that as an undergraduate, he was frustrated by the poor performance of a robot he built to navigate hallways and such, which used LDA.
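To make "expressed as a vector" concrete, here is a minimal sketch. It assumes a grayscale image stored as a list of pixel rows and concatenates the rows into one long feature vector (row-major order). The name `image_to_vector` is made up for illustration.

```python
def image_to_vector(image):
    """Flatten a grayscale image, given as a list of pixel rows,
    into one feature vector by concatenating the rows (row-major order)."""
    width = len(image[0])
    if any(len(row) != width for row in image):
        raise ValueError("all rows must have the same width")
    return [pixel for row in image for pixel in row]


# A tiny 2x3 "image" becomes a 6-dimensional vector.
img = [[10, 20, 30],
       [40, 50, 60]]
vec = image_to_vector(img)
print(len(vec), vec)  # 6 [10, 20, 30, 40, 50, 60]
```

So a 100x100 image would become a point in a 10,000-dimensional space, which is where methods like LDA go looking for useful directions.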

Despite this undergraduate experience, Aleix is a big fan of linear methods because, he points out, they are much more intuitive than non-linear ones. I can testify to this: I was able to follow most of the math after taking a reasonable class in linear algebra. I'm not confident I would have understood as much if the topic had been non-linear methods. I need to brush up on my statistics, though.

Anyway, he presented some nifty methods of enhancing LDA. As I understand it, the main thrust of LDA is that we would like to know which directions in our n-dimensional data space (linear combinations of the original dimensions) are "the important ones." The important directions are the ones that best distinguish data points of one class from those of another. The implicit assumption here, of course, is that such directions exist. Apparently plain vanilla LDA can become confused by certain datasets and produce an incorrect result. His method involves deciding whether LDA would become confused and, if so, partitioning the datasets. Man, I hope I got that right! There are probably some details I'm missing or getting wrong.
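For reference, here is a minimal sketch of plain two-class Fisher LDA on 2-D points. This is the textbook version, not Aleix's enhanced method. It computes the direction w proportional to S_w^{-1}(m_a - m_b), where m_a and m_b are the class means and S_w is the within-class scatter matrix. The function names and toy datasets are made up for illustration, and it is written in plain Python (no NumPy) to keep the 2x2 algebra visible.

```python
from statistics import mean


def fisher_lda_2d(class_a, class_b):
    """Plain two-class Fisher LDA for 2-D points.

    Returns the unit direction w proportional to S_w^{-1} (m_a - m_b).
    Projecting onto w maximizes the separation of the class means
    relative to the spread inside each class.
    """
    def centroid(pts):
        return (mean(p[0] for p in pts), mean(p[1] for p in pts))

    m_a, m_b = centroid(class_a), centroid(class_b)

    # Within-class scatter: sum of (x - m)(x - m)^T over both classes.
    sxx = sxy = syy = 0.0
    for pts, (mx, my) in ((class_a, m_a), (class_b, m_b)):
        for x, y in pts:
            dx, dy = x - mx, y - my
            sxx += dx * dx
            sxy += dx * dy
            syy += dy * dy

    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")

    # Multiply the inverse of [[sxx, sxy], [sxy, syy]] by the mean difference.
    d0, d1 = m_a[0] - m_b[0], m_a[1] - m_b[1]
    w0 = (syy * d0 - sxy * d1) / det
    w1 = (-sxy * d0 + sxx * d1) / det

    norm = (w0 * w0 + w1 * w1) ** 0.5
    if norm == 0:
        # Equal class means: the Fisher criterion gives no usable direction.
        raise ValueError("class means coincide; LDA direction undefined")
    return (w0 / norm, w1 / norm)


def project(points, w):
    """Project each 2-D point onto direction w (a dot product)."""
    return [x * w[0] + y * w[1] for x, y in points]


# Toy data: the classes differ along x, but each varies much more along y.
a = [(0, 0), (0, 2), (0, 4), (1, 1), (1, 3)]
b = [(5, 0), (5, 2), (5, 4), (6, 1), (6, 3)]
w = fisher_lda_2d(a, b)
print(w)  # points along the x axis (up to sign), ignoring the noisy y axis
print(max(project(b, w)) < min(project(a, w)))  # True: classes separate

# Failure case: one class sits in two clumps on either side of the other,
# so both class means are (0, 0.5) and LDA has no direction to offer.
outer = [(-5, 0), (5, 0), (-5, 1), (5, 1)]
inner = [(0, 0), (0, 1), (0, 0.5), (0, 0.5)]
try:
    fisher_lda_2d(outer, inner)
except ValueError as err:
    print(err)  # class means coincide; LDA direction undefined
```

The second example shows one simple way plain LDA can get "confused." When a class is split into clumps, its mean can land right on top of the other class's mean, and S_w^{-1}(m_a - m_b) collapses to zero even though the classes are easy to tell apart. Treating each clump as its own group is the flavor of fix that "partitioning the datasets" suggests. This sketch does not implement that part.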

I hope I didn't stand out too much at the talk. It was a pretty small audience, and the posted start time was 10 minutes earlier than the actual one, so I ended up arriving too early. I think they probably just assumed I was an undergraduate, though. Which is pretty much correct.

Anyway, it was an interesting talk. I thought Aleix was pretty down-to-earth, despite the complexity of the topic involved. And as any EE student knows, linear methods are still worthy of respect, even in this day of desktop supercomputers.

He mentioned that his current funding is coming from the NIH. Maybe there's some overlap with all the biotechnology work that's been going on. I've been hearing that biotechnology is going to be huge in the future. I guess with our aging population and rising standard of living... Definitely worth thinking about. Also, I'm going to grab some stats books next time I visit the library. I need to get at least Gaussians down cold.

Edit: Arthur pointed out to me that in image processing, image data is generally expressed as a vector, not as a matrix.