Term Frequency in Lecture Slides

by Lennart Hessenauer
Number of replies: 1

Hi,

I am having trouble understanding the computed term frequencies in the VSM example on slide 29 of lecture 4.
The slide states \( tf("Frodo", d_2)=1 \) and \( tf("stab", d_2)=2 \). When I calculate these myself, I get \( tf("Frodo",d_2)=\frac{1+\log_{10}(1)}{1+\log_{10}(2)}\neq 1 \) and \( tf("stab",d_2)=\frac{1+\log_{10}(2)}{1+\log_{10}(2)}=1 \), because "Frodo" occurs once and "stab" occurs twice in \( d_3 \). Also, "stab" (or "orc") is the most frequent term in \( d_3 \) and is therefore used for normalization, which gives the denominator.

It appears that the absolute (raw) term frequency is used in this example rather than the logarithmically scaled, normalized term frequency. Is this correct? And which variant should we use in the exam?

Thanks for your help :)

Best,
Lennart

In reply to Lennart Hessenauer

Re: Term Frequency in Lecture Slides

by Goran Glavaš

Hi Lennart,

Indeed, it would appear that raw term frequency was used as the tf in this example. 

In an exam, you'd be given a concrete formula for both TF and IDF (since in practice, there are multiple variants for each, i.e., definitions are not unambiguous). 
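To make the difference between the two variants concrete, here is a minimal Python sketch. It assumes the log-scaled, max-normalized formula quoted in the question; the token list `doc` is a hypothetical stand-in that only mirrors the counts mentioned there ("Frodo" once, "stab" and "orc" twice each), not the actual document from the slide. The function names are illustrative as well.

```python
import math
from collections import Counter


def raw_tf(term, tokens):
    """Raw term frequency: the number of times `term` occurs in `tokens`."""
    return Counter(tokens)[term]


def log_normalized_tf(term, tokens):
    """Log-scaled tf, normalized by the log-scaled count of the most frequent term.

    Assumed formula (taken from the question above):
        (1 + log10(count(term))) / (1 + log10(max_count))
    Terms that do not occur get a tf of 0.
    """
    counts = Counter(tokens)
    if counts[term] == 0:
        return 0.0
    max_count = max(counts.values())
    return (1 + math.log10(counts[term])) / (1 + math.log10(max_count))


# Hypothetical document: only the counts match the ones discussed in the thread.
doc = ["frodo", "stab", "orc", "stab", "orc"]

for term in ("frodo", "stab"):
    print(term, raw_tf(term, doc), log_normalized_tf(term, doc))
```

With these counts, the raw variant gives 1 for "frodo" and 2 for "stab", matching the values on the slide, while the log-normalized variant gives roughly 0.77 and exactly 1.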

Cheers,

Goran