24.6 tf-idf: Term frequency, inverse document frequency

The goal is to quantify what a document is about by looking at the words it contains.

tf-idf: a measure of how important a word is to a document in a collection (or corpus) of documents.

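Term frequency (standardized): the number of times a word appears in a document, divided by the total number of words in that document.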
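
A minimal sketch of standardized term frequency, assuming it means a word's count divided by the document's total word count. The function name `term_frequency` and the whitespace tokenization are illustrative choices, not something these notes prescribe.

```python
from collections import Counter

def term_frequency(text):
    """Return each word's count divided by the total number of words."""
    words = text.lower().split()  # naive whitespace tokenization (assumption)
    counts = Counter(words)
    total = len(words)
    return {word: count / total for word, count in counts.items()}

tf = term_frequency("the cat sat on the mat")
print(tf["the"])  # "the" accounts for 2 of the 6 words
```

Because each count is divided by the same total, the term frequencies within one document sum to 1, which makes documents of different lengths comparable.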

Zipf’s law states that the frequency with which a word appears is inversely proportional to its rank, where rank 1 is the most frequent word.
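
One way to check this relationship on a real corpus is to rank words by frequency and look at rank times frequency, which would stay roughly constant if frequency is inversely proportional to rank. The sketch below uses a hypothetical helper `rank_frequency` and a toy word list that is far too small to show the pattern; it only illustrates the bookkeeping.

```python
from collections import Counter

def rank_frequency(words):
    """Return (rank, word, frequency) tuples, most frequent word first."""
    counts = Counter(words)
    total = sum(counts.values())
    ranked = counts.most_common()  # sorted by count, descending
    return [
        (rank, word, count / total)
        for rank, (word, count) in enumerate(ranked, start=1)
    ]

words = "a a a a b b c".split()  # toy data, not a real corpus
table = rank_frequency(words)
for rank, word, freq in table:
    # Under Zipf's law, rank * freq is approximately constant.
    print(rank, word, round(freq, 3), round(rank * freq, 3))
```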

The idea of tf-idf is to find the important words for the content of each document by decreasing the weight of commonly used words and increasing the weight of words that are not used very much in a collection or corpus of documents.
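
The weighting described above can be sketched as follows, assuming the common definition idf(term) = ln(number of documents / number of documents containing the term), with tf-idf as the product of term frequency and idf. The function name `tf_idf`, the input shape, and the three toy documents are all made up for illustration.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Map each document name to a {word: tf-idf} dict.

    documents: dict of name -> list of words (hypothetical input shape).
    """
    n_docs = len(documents)
    # Number of documents in which each word appears at least once.
    doc_freq = Counter(
        word for words in documents.values() for word in set(words)
    )
    scores = {}
    for name, words in documents.items():
        counts = Counter(words)
        total = len(words)
        scores[name] = {
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
    return scores

docs = {
    "doc1": "the cat sat on the mat".split(),
    "doc2": "the dog chased the cat".split(),
    "doc3": "the bird sang".split(),
}
scores = tf_idf(docs)
print(scores["doc1"]["the"])  # appears in every document, so idf = ln(1) = 0
print(scores["doc1"]["mat"])  # appears only in doc1, so it gets a higher weight
```

A word that occurs in every document gets a weight of zero under this definition, while a word confined to a single document gets the largest idf, which matches the intent of down-weighting common words.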