
What am I missing about frequency stats (i.e., why do the "hits" seem off)?


Jonathan Kiel


I'm looking for some frequency statistics and my numbers don't match up.

 

First, I search for =נתן [Range Isa] and Accordance tells me the frequency is 0.12 (hits / 1,000 words). However, when I compare the returned numbers I don't get the same frequency. 

 

=נתן occurs 56 times in Isa.

* [Range Isa] returns 22,838 total words in the book.

Using those numbers, the frequency would be 2.45. 
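
The arithmetic behind that 2.45 figure can be checked with a short sketch. It uses only the two numbers quoted in this post; the helper name `per_thousand` is made up for illustration and is not anything Accordance exposes.

```python
# Numbers quoted in the post above.
hits_in_isaiah = 56        # occurrences of the lemma in Isaiah
words_in_isaiah = 22_838   # total returned by the * [Range Isa] search


def per_thousand(hits, total_words):
    """Return the number of hits per 1,000 words (hypothetical helper)."""
    return hits * 1000 / total_words


frequency = per_thousand(hits_in_isaiah, words_in_isaiah)
print(round(frequency, 2))
```

Rounded to two places this gives 2.45, so the hand calculation is right as far as it goes; the open question is which word total the program divides by.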

 

What am I doing wrong? 

Edited by Jonathan Kiel

It's important to remember that there is a distinction between the [RANGE] command and actually specifying a Range condition using the (+) button.  95% of the time, yes, they are functionally equivalent, but this is one of those 5% differences :)

 

Under the hood, the [RANGE] command filters results by the range, while the Range condition causes the search to run only on the specified range.  This means that, as one distinction, statistics get reported differently.  Since your search covers the entire corpus and then restricts its results to those in Isaiah, your statistic of 0.12 is relative to the entire text, not just the book of Isaiah.  If you instead add the Range search condition, you'll get a number closer to what you are expecting.
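
As a rough illustration of that distinction, here is a toy model of the two behaviors. Everything in it is an assumption made for the example: the book names, the word lists, and the function names are invented, and it is not a description of how Accordance is actually implemented.

```python
# Toy data only: two made-up "books" of five words each.
corpus = {
    "BookA": ["give", "word", "word", "give", "word"],
    "BookB": ["word", "give", "word", "word", "word"],
}


def per_thousand(hits, total_words):
    """Return the number of hits per 1,000 words."""
    return hits * 1000 / total_words


def filter_after_search(corpus, term, book):
    """Mimic the [RANGE] behavior described above: the whole corpus is
    searched, only hits in `book` are kept, and the frequency is measured
    against the size of the whole corpus."""
    total_words = sum(len(words) for words in corpus.values())
    hits = corpus[book].count(term)
    return per_thousand(hits, total_words)


def search_within_range(corpus, term, book):
    """Mimic the Range condition described above: only `book` is searched,
    so the frequency is measured against the size of `book` alone."""
    words = corpus[book]
    return per_thousand(words.count(term), len(words))


filtered = filter_after_search(corpus, "give", "BookA")  # 2 hits / 10 words
scoped = search_within_range(corpus, "give", "BookA")    # 2 hits / 5 words
print(filtered, scoped)
```

The hit count is the same in both cases; only the denominator changes, which is why the two frequencies differ.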

 

Two more points of interest:

 

First, don't forget to consider other Analytics tools, such as the Table or Table chart, when getting frequency counts.  They automatically split up your results by book, so they return the same numbers regardless of whether you use [RANGE] or a Range condition.

 

Second, you'll notice that, even when you specify the range with a condition, the frequency doesn't match up with the numbers you provided.  This is because the concept of a "word" is actually a bit ambiguous.  You searched for *, which tells Accordance to find all lexical forms.  This excludes forms like suffixes!  You'll get a bit closer using a "*" search (with the quotation marks) to find all inflected forms, but the best word count can be seen directly on the Table analytic (Set Table Display -> Show total word count).

 

I hope this helps!


Joel, 

 

I appreciate your response! Can you clarify what you mean by the "(+) button"? My results don't change when I use the character '+' or when I insert an <AND> command. 

 

Also, what do you mean by "your statistic of 0.12 is relative to the entire text"? In what way is it relative to the entire text? (Searching the entire Hebrew text for נתן gives a hits count of 4.22.)

 

Thanks again...


For my own info and as a possible caution to Jonathan, do Ketib/Qere pairs count as 2 in these types of statistics?

 

Best,

Pete


Yes, both Ketib and Qere will be included unless you ignore bracketed words.

 

 

 


Edited by Mark Allison

Jonathan, I'm referring to the 'Add Condition' button that has a plus on the right side of the entry box.  See attached screenshot:

[Attached screenshot: Screen Shot 2018-09-27 at 12.19.22 PM.png]

 

Once you press it, you can set the Range to be a specific book (you may need to define your range as well):

 

[Attached screenshot: Screen Shot 2018-09-27 at 12.19.29 PM.png]

 

 

To clarify my point, think specifically about the search you asked Accordance to do.  You said: find נתן across the entire Hebrew corpus, but then restrict the results to just those in the book of Isaiah.  This means the 0.12 figure is the number of hits in Isaiah per 1,000 words of the entire Hebrew corpus.

If you use the Range search condition (as described earlier in this post), you are instead asking Accordance to find נתן within the book of Isaiah, so the frequency you see is computed against that book alone.
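
One way to sanity-check this is to work backwards from the displayed 0.12 and see what word total it implies. The sketch below uses only figures quoted in this thread (56 hits, 22,838 words, 0.12 per 1,000); since 0.12 is a rounded display value, the result is only approximate.

```python
hits_in_isaiah = 56         # from the first post
words_in_isaiah = 22_838    # from the first post
reported_frequency = 0.12   # hits per 1,000 words, as displayed (rounded)

# frequency = hits * 1000 / total_words
# so: total_words = hits * 1000 / frequency
implied_total_words = hits_in_isaiah * 1000 / reported_frequency

print(round(implied_total_words))
print(implied_total_words > 10 * words_in_isaiah)
```

The implied denominator comes out at roughly 467,000 words, far larger than Isaiah's 22,838, which is what you would expect if the hits are being divided by the size of the whole Hebrew text rather than the book.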

 

Does this help clarify?


Joel,

 

Again, very helpful! I admit, though, that I don't understand some of the different results. You say the number 0.12 is the hit count for =נתן for the entire Hebrew corpus. Why would that number differ from a search of just =נתן without any specified range, which returns a hit count of 4.22?

 

I understand the difference between restricting the results from a search of the entire corpus and restricting my search *to* a specific corpus. However, I still don't understand where the 0.12 number is coming from. 

 

Thanks again!


Okay, I figured it out... נתן occurs about 4.22 times per 1,000 words over the entire Hebrew corpus. It occurs roughly 2.2 times per 1,000 words in Isaiah. And if you count only the occurrences in Isaiah and divide that by the total number of words in the Hebrew corpus, you get 0.12 per 1,000 words.

 

My trouble is that the last one seems to be a somewhat unhelpful number, since it measures the instances of נתן in Isa against a word count that includes everything outside Isa. Still, it was what I was asking the search to do!

 

Thanks for your help, Joel! 
