Talk:Document-term matrix

Latest comment: 14 years ago by Danielx

Comments

I need some help here:

Do you think I focused too much on vectors?

We definitely need more applications. Kh251

I don't agree with the last changes. Performing eigenvalue decomposition reduces the size of the matrix, and thus improves speed but decreases accuracy. I know I might be wrong, but I'd like to understand... KH251 09:32, 21 July 2005 (UTC)

Not necessarily: what you say is one valid interpretation of the reduction, but the reduction can also be interpreted as creating a "better" matrix, since the operation tends to "soften" the representation and reduce possible noise.
Also, it's not always true that this makes it easier on the computational side; for instance, LSA is rather heavier than just leaving the thing alone (I have a reference for that somewhere, I am just rather busy at the moment...). Hope it helps! Cheers! Rama 12:14, 21 July 2005 (UTC)
Yes, but LSA is computed once; the important part is having real-time answers to queries. Once the matrix is smaller, this will be faster, won't it? KH251 12:37, 21 July 2005 (UTC)
LSA produces a very serious computation burden on a search engine. Right now, if you type a word at a search engine, it looks the word up in a trie and finds documents that contain that word in O(1) time (independent of the number of documents in the collection). If you had a search engine that looked up documents in the LSA latent space, it would have to perform high-dimensional nearest neighbor search. LSA is typically used with 100+ dimensions, so none of the computational geometry speed-ups for nearest neighbor search apply. Therefore, the search would be O(N), where N is the number of documents in the collection. For Google, that would be 8,000,000,000. As you can see, this is disastrous for searching the web. -- hike395 06:14, July 22, 2005 (UTC)
Oh! That's how! Thank you very much for the explanation. You made my day. KH251 09:02, 22 July 2005 (UTC)
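For anyone reading this thread later, the cost difference hike395 describes can be sketched roughly as follows. This is only an illustration under stated assumptions: the three documents and the 2-dimensional "latent" vectors are made up (they are not the output of a real LSA run), a Python dict stands in for the trie mentioned above, and the names `keyword_lookup` and `latent_search` are hypothetical.

```python
from collections import defaultdict
import math

# Toy collection (invented for illustration).
docs = {
    0: "the cat sat on the mat",
    1: "the dog chased the cat",
    2: "stock markets fell sharply",
}

# Inverted index: term -> set of ids of the documents containing it.
# A dict is used here instead of a trie; the point is the same.
inverted_index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        inverted_index[term].add(doc_id)

def keyword_lookup(term):
    """Find the posting set with a single lookup; no document is scanned."""
    return sorted(inverted_index.get(term, set()))

# Hypothetical latent vectors, one per document (assumed, not computed by LSA).
latent = {0: (0.9, 0.1), 1: (0.8, 0.2), 2: (0.1, 0.95)}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def latent_search(query_vec, k=2):
    """Brute-force nearest neighbours: compares the query with all N vectors."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in latent.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

print(keyword_lookup("cat"))
print(latent_search((1.0, 0.0)))
```

The keyword path touches only the index entry for the query term, whereas the latent-space path loops over every document vector, which is the O(N) behaviour discussed above.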

Since several of us seem to have a taste for this, would anyone fancy creating an "NLP project" on Wikipedia? Rama 12:18, 22 July 2005 (UTC)

Intro Improvement Request

I encountered this term for the first time just a few minutes ago. I read the intro, but I still don't have a clear idea of what a document-term matrix is, other than it is a mathematical matrix and that it is related to a body of text. Danielx (talk) 01:42, 2 November 2009 (UTC)
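A small concrete case might help here. The following is a minimal sketch, assuming the simplest variant: rows are documents, columns are terms, and each cell holds a raw count. The two sentences are invented for illustration, and weighting schemes such as tf-idf are left out.

```python
# Two toy documents (invented for illustration).
docs = ["I like databases", "I dislike databases"]

# Lowercase and split on whitespace; a real tokenizer would do more.
tokenized = [d.lower().split() for d in docs]

# Columns: every distinct term in the collection, in sorted order.
vocabulary = sorted({term for tokens in tokenized for term in tokens})

# Rows: one per document; each cell counts how often the term occurs in it.
matrix = [[tokens.count(term) for term in vocabulary] for tokens in tokenized]

print(vocabulary)
for row in matrix:
    print(row)
```

Each row can then be read as a vector describing one document, which is what makes the vector-based comparisons discussed in the section above possible.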