Term Frequency in Lecture Slides

by Lennart Hessenauer
Number of replies: 1

Hi,

I have trouble understanding the computed term frequencies in the VSM example on slide 29 of lecture 4.
It says on the slide: tf("Frodo", d2) = 1 and tf("stab", d2) = 2. When I try to calculate them, I get $\mathrm{tf}(\text{Frodo}, d_2) = \frac{1 + \log_{10}(1)}{1 + \log_{10}(2)} \approx 0.77$ and $\mathrm{tf}(\text{stab}, d_2) = \frac{1 + \log_{10}(2)}{1 + \log_{10}(2)} = 1$, because "Frodo" occurs once and "stab" occurs twice in d3. Also, "stab" (or "orc") is the most frequent term in d3 and is therefore used for normalization, hence the denominator.
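For reference, here is a minimal Python sketch of the log-scaled, max-normalized tf used in the calculation above, next to the raw count. The token list is hypothetical, built only from the counts mentioned in the post ("Frodo" once, "stab" and "orc" twice each), not from the actual slide document, and `log_max_tf` is an illustrative helper name.

```python
import math
from collections import Counter

def log_max_tf(term, doc_tokens):
    """Log-scaled tf normalized by the document's most frequent term:
    (1 + log10(count)) / (1 + log10(max_count)); 0.0 if the term is absent."""
    counts = Counter(doc_tokens)
    count = counts[term]
    if count == 0:
        return 0.0
    max_count = max(counts.values())
    return (1 + math.log10(count)) / (1 + math.log10(max_count))

# Hypothetical document: counts assumed from the post, not the real slide.
d = ["Frodo", "stab", "stab", "orc", "orc"]

raw = Counter(d)
print(raw["Frodo"], raw["stab"])          # raw tf: 1 2
print(round(log_max_tf("Frodo", d), 2))   # 0.77
print(log_max_tf("stab", d))              # 1.0
```

The raw counts (1 and 2) match the values on the slide, while the log-normalized variant gives roughly 0.77 and 1.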

It appears as if the absolute term frequency is used in this example rather than the logarithmic and normalized term frequency. Is this correct? And which variant should we use in the exam?

Thanks for your help :)

Best,
Lennart

In reply to Lennart Hessenauer

Re: Term Frequency in Lecture Slides

by Goran Glavaš

Hi Lennart,

Indeed, it would appear that raw term frequency was used as the tf in this example. 

In an exam, you'd be given a concrete formula for both TF and IDF, since in practice there are multiple variants of each (i.e., the definitions are not unique).
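To illustrate why a concrete formula matters, here is a minimal Python sketch of a few common TF and IDF variants (roughly those found in the SMART weighting notation). The function names and the 4-document collection are hypothetical, chosen only for illustration.

```python
import math

def tf_variants(count, max_count):
    """Common TF weightings for a term seen `count` times in a document whose
    most frequent term is seen `max_count` times (absent terms get 0)."""
    if count == 0:
        return {"raw": 0, "boolean": 0, "log": 0.0, "augmented": 0.0}
    return {
        "raw": count,                                # natural / absolute tf
        "boolean": 1,                                # term present or not
        "log": 1 + math.log10(count),                # logarithmic tf
        "augmented": 0.5 + 0.5 * count / max_count,  # max-normalized tf
    }

def idf_variants(n_docs, df):
    """Two common IDF weightings for a term occurring in `df` of `n_docs` documents."""
    ratio = (n_docs - df) / df
    return {
        "idf": math.log10(n_docs / df),
        # probabilistic idf, clipped at 0 (and 0 when the term is in every document)
        "prob_idf": max(0.0, math.log10(ratio)) if ratio > 0 else 0.0,
    }

# Counts from the question: "Frodo" once, "stab" twice, max count 2.
for term, count in [("Frodo", 1), ("stab", 2)]:
    print(term, tf_variants(count, 2))

print(idf_variants(4, 1))  # hypothetical collection: 4 documents, df = 1
```

With the counts from the question, raw tf gives 1 vs. 2, while the augmented variant gives 0.75 vs. 1.0, so the same document yields different weights depending on the definition.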

Cheers,

Goran