Term Frequency in Lecture Slides

by Lennart Hessenauer,
Number of replies: 1

Hi,

I'm having trouble understanding the computed term frequencies in the VSM example on slide 29 of lecture 4.
The slide says: $\mathrm{tf}(\text{Frodo}, d_2) = 1$ and $\mathrm{tf}(\text{stab}, d_2) = 2$. When I calculate them myself, I get $\mathrm{tf}(\text{Frodo}, d_2) = \frac{1 + \log_{10}(1)}{1 + \log_{10}(2)} \approx 0.77$ and $\mathrm{tf}(\text{stab}, d_2) = \frac{1 + \log_{10}(2)}{1 + \log_{10}(2)} = 1$, because "Frodo" occurs once and "stab" occurs twice in d3. Also, "stab" (or, equally, "orc") is the most frequent term in d3 and is therefore used for normalization, which gives the denominator.
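The two readings of tf being compared here can be checked with a minimal Python sketch. It assumes the log-scaled, max-normalized formula used in the calculation above, (1 + log10(f)) / (1 + log10(max_f)), and a hypothetical, reduced token list for the document that keeps only the counts mentioned in the post ("Frodo" once, "stab" and "orc" twice each); the function names are illustrative only.

```python
import math
from collections import Counter

def raw_tf(term, doc_tokens):
    """Raw (absolute) term frequency: number of occurrences of term in the document."""
    return Counter(doc_tokens)[term]  # Counter returns 0 for unseen terms

def log_max_norm_tf(term, doc_tokens):
    """Log-scaled tf normalized by the most frequent term of the document:
    (1 + log10(f)) / (1 + log10(max_f)); 0 if the term does not occur."""
    counts = Counter(doc_tokens)
    f = counts[term]
    if f == 0:
        return 0.0
    max_f = max(counts.values())
    return (1 + math.log10(f)) / (1 + math.log10(max_f))

# Hypothetical reduced document: only the counts stated in the post are assumed.
d3 = ["Frodo", "stab", "stab", "orc", "orc"]

for term in ("Frodo", "stab"):
    # prints the term, its raw tf, and its log-normalized tf (rounded)
    print(term, raw_tf(term, d3), round(log_max_norm_tf(term, d3), 3))
```

With raw tf the values are 1 and 2, matching the slide; with the log-normalized variant they become roughly 0.77 and 1.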

It appears as if the absolute term frequency is used in this example rather than the logarithmic and normalized term frequency. Is this correct? And which variant should we use in the exam?

Thanks for your help :)

Best,
Lennart

In reply to Lennart Hessenauer

Re: Term Frequency in Lecture Slides

by Goran Glavaš,

Hi Lennart,

Indeed, it would appear that raw term frequency was used as the tf in this example. 

In the exam, you would be given the concrete formulas for both TF and IDF, since in practice there are multiple variants of each (i.e., the definitions are ambiguous).
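To illustrate why the concrete formulas matter, here is a small Python sketch of a few commonly used TF and IDF variants. The function names and the example counts are hypothetical, and none of these is claimed to be the definition used in the lecture or the exam; the point is only that the same term gets a different weight depending on which pair of variants is chosen.

```python
import math

# Illustrative TF variants (f = raw count of the term in the document).
def tf_raw(f):
    return f

def tf_boolean(f):
    return 1 if f > 0 else 0

def tf_log(f):
    # 1 + log10(f) for f > 0, else 0
    return 1 + math.log10(f) if f > 0 else 0.0

# Illustrative IDF variants (N = number of documents, df = document frequency, df >= 1).
def idf_plain(N, df):
    return math.log10(N / df)

def idf_prob(N, df):
    # max(0, log10((N - df) / df)); 0 whenever df >= N / 2, which also avoids log10(0) when df == N
    if 2 * df >= N:
        return 0.0
    return math.log10((N - df) / df)

def tf_idf(f, N, df, tf=tf_log, idf=idf_plain):
    return tf(f) * idf(N, df)

# Hypothetical numbers: a term occurring twice in a document and in 1 of 4 documents.
for tf_fn in (tf_raw, tf_boolean, tf_log):
    for idf_fn in (idf_plain, idf_prob):
        print(tf_fn.__name__, idf_fn.__name__, round(tf_idf(2, 4, 1, tf_fn, idf_fn), 3))
```

Each of the six combinations prints a different weight for the same term, which is why an exam question has to fix the formulas.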

Cheers,

Goran