Searching for frequent words in a passage

August 17, 2012

Hi,

I'm wondering whether it is possible to search within a given passage to find a set of words that appear (substantially) more frequently in that passage than they do elsewhere in the book/OT. For example, if I were looking at 1 Sam 9, the verb מצא appears 7 times in 721 words, which is a frequency of 1/100ish. But generally in the OT the verb מצא appears only 457 times in 500,000-odd words, a frequency more like 1/1000. Is there any way I could give Accordance a passage and have it tell me which words in that passage are more frequent than I might expect?
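The comparison described here can be sketched as a ratio of two relative frequencies. This is only an illustration using the figures quoted in the post; the function name `frequency_ratio` is made up for this sketch and is not an Accordance feature, and 500,000 is the post's rough estimate of the OT word count.

```python
def frequency_ratio(hits_in_passage, passage_words, hits_in_corpus, corpus_words):
    """Return how many times more frequent a word is in the passage than in the corpus."""
    passage_freq = hits_in_passage / passage_words
    corpus_freq = hits_in_corpus / corpus_words
    return passage_freq / corpus_freq


# Figures from the post: the verb מצא in 1 Sam 9 versus the whole OT.
# The corpus size (500,000) is an approximation, so the ratio is approximate too.
ratio = frequency_ratio(7, 721, 457, 500_000)
print(f"about {ratio:.1f} times more frequent in the passage")
```

A ratio well above 1 flags a candidate Leitwort; where to set the threshold is a judgment call.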

Such a search would aid significantly in identifying thematic concepts (Leitwörter) in narrative analysis.

Thanks for your help,

Nathan

August 17, 2012

I would search for all the words with * in a word search and a range. Then try the different Analysis options in Details. Some of the count options relate to the relative frequency compared to other passages, but they won't give exact numbers. For those you'll need to compare searches in and outside of your range.

See this Help page for details:

Comprehensive Reference > Dialog Boxes > Set XX Display > Set Analysis Display Dialog Box for Non-Tagged Texts

i.e.:

Count Pop-up Menu: Lets you choose the statistical method by which each form is counted. You can choose from:

Number displays the total number of "hits" (times each word appears) in the current search range.
Frequency identifies the words which appear most frequently in the current search range. This number shows the number of hits per 1000 words in the search range.
Uniqueness identifies those words which are more or less unique to the current search range. This number is computed by dividing the frequency of hits in the search range by the frequency of verses containing this word in the entire text.
Importance identifies those words which are most important in the current search range. This number is computed by multiplying the number of hits in the search range by the uniqueness of each word (as defined above).
None displays the words without any count.
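Read literally, the three computed options in the Help text above could be sketched as follows. This is one interpretation, not Accordance's actual code: it assumes that "frequency of verses containing this word" means verses per 1000 verses, and all numbers in the example are hypothetical.

```python
def frequency(hits, range_words):
    """Hits per 1000 words in the search range."""
    return hits * 1000 / range_words


def verse_frequency(verses_with_word, total_verses):
    """ASSUMPTION: verses containing the word per 1000 verses of the entire text."""
    return verses_with_word * 1000 / total_verses


def uniqueness(range_frequency, text_verse_frequency):
    """Frequency in the range divided by the verse frequency in the entire text."""
    return range_frequency / text_verse_frequency


def importance(hits, uniq):
    """Number of hits in the range multiplied by the uniqueness."""
    return hits * uniq


# Hypothetical counts, chosen only to keep the arithmetic easy to follow.
freq = frequency(hits=12, range_words=3000)
verse_freq = verse_frequency(verses_with_word=40, total_verses=20000)
uniq = uniqueness(freq, verse_freq)
print(freq, verse_freq, uniq, importance(12, uniq))
```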

August 17, 2012

Hi Helen,

Thanks for such a prompt response! I had no idea that information was available (I don't think I'd ever pressed cmd-T on that window before!). Still, I'm not sure how Accordance is calculating these numbers. Can you help? They don't seem to line up with the description in the help file.

So, for example, if I want to analyse אברהם in Genesis, I type:

[range gen] <and> אברהם

This tells me that אברהם occurs 133 times in 117 verses. The analysis window (set to "number") agrees. So far so good. When I switch the analysis window to frequency, it tells me that there are 0.28 occurrences per 1000 words in Genesis. This seemed very low to me, so I investigated. Searching for:

[range gen] <and> *

tells me that there are 28 648 words in Genesis, in 1533 verses. Shouldn't this mean that אברהם has a frequency of 133 / 28.648 = 4.64? Where did Accordance come up with 0.28?

In the same way, uniqueness relies on the "frequency of verses containing the word in the entire text." I take it that the "entire text" refers to HMT-W4? So searching for * tells me that there are 427 030 words in HMT-W4 in 23 213 verses, and searching for אברהם tells me that it occurs 175 times in 159 verses. Therefore the frequency of verses containing the word in the entire text would be 159 / 23.213 = 6.85. Uniqueness should therefore be 4.64 / 6.85 = 0.677. Accordance tells me that it is 0.836?

Importance gives me the answer I expect (once I grant Accordance's calculation of uniqueness): 133 * 0.836 = 111.2. Accordance gives me 111.

I'm sure I could figure out a way to use the analysis window, but I think I need to understand it better first! Any help?

August 17, 2012

This isn't an issue with the analysis, but actually an issue with Range. This is a common misconception within Accordance, because it can get a little tricky:

There are two ways to specify range, either with the [RANGE] command, or the Range popup. They function similarly, but slightly differently. The Range popup only searches the chosen range. The [RANGE] command searches the entire text, but only displays results from the [RANGE]. For many searches, this doesn't make a difference. But, for some counting searches, or especially frequency in the Analysis, this matters a lot.

Since you did [RANGE gen], it searched the entire Hebrew Bible, but only gave you results within Genesis. So, your frequency, uniqueness, and other statistics apply when compared to the entire Bible, not just the book of Genesis. If you want to limit those statistics, do a normal search for אברהם, but use the Range popup menu to select Genesis (you may have to define the Range), e.g. "Search within every Verse in Genesis".
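A rough check of this explanation, using only the counts quoted earlier in the thread: dividing the 133 Genesis hits by the word count of the whole text instead of Genesis alone moves the figure from about 4.64 to about 0.31 per 1000 words. That is near, though not equal to, the 0.28 that Accordance reported, so the denominator Accordance uses internally presumably differs somewhat from the * count. The helper name `per_thousand` is made up for this sketch.

```python
def per_thousand(hits, words):
    """Hits per 1000 words."""
    return hits * 1000 / words


hits_in_genesis = 133        # אברהם in Genesis, from the post above
genesis_words = 28_648       # words found by * in Genesis
whole_text_words = 427_030   # words found by * in the whole text

# Denominator limited to Genesis (what the Range popup would give).
freq_popup = per_thousand(hits_in_genesis, genesis_words)

# Denominator is the whole text (the effect of [RANGE gen] described above).
freq_range_command = per_thousand(hits_in_genesis, whole_text_words)

print(f"{freq_popup:.2f} vs {freq_range_command:.2f} hits per 1000 words")
```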

Hope this helps clarify!

August 17, 2012

Joel,

that is the best and clearest description of the difference between the two "ranges" that I have seen! Thanks!!!

August 17, 2012

Glad to help! To clarify [RANGE] a bit, it actually respects the Range menu as well. So, when I said above that [RANGE] searches the 'entire text', that should really read "searches the entire 'Range popup menu'", which is technically more confusing, but more accurate. Using this information, though, you can actually construct interesting searches, such as [COUNT 5] [RANGE Gen] with the Range popup set to the Pentateuch. That will give you all the words in Genesis that are used exactly 5 times throughout the Pentateuch. Very powerful, but it can get a bit tricky.
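The logic of that combined search can be sketched outside Accordance. The following is a toy model of the behaviour described above, not Accordance's implementation: the function name `words_with_count` and the placeholder tokens are hypothetical, and the example uses a count of 2 instead of 5 to keep the data small.

```python
from collections import Counter


def words_with_count(outer_range, inner_book, n):
    """Words of inner_book that occur exactly n times across all of outer_range.

    outer_range maps a book name to its list of words (the Range popup);
    inner_book plays the role of the [RANGE ...] command.
    """
    totals = Counter(word for words in outer_range.values() for word in words)
    return sorted({word for word in outer_range[inner_book] if totals[word] == n})


# Hypothetical two-book "Pentateuch" with placeholder tokens instead of Hebrew.
toy_range = {
    "Gen": ["a", "b", "c", "a"],
    "Exod": ["b", "c", "c", "d"],
}
result = words_with_count(toy_range, "Gen", 2)
print(result)
```

Here "c" is excluded because it occurs three times across both books, even though it occurs only once in "Gen".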

August 18, 2012

Joel, thanks. That is indeed a very helpful reply. I get something close to the results expected using the Range popup, but still not exactly. I suspect the reason is that searching for * isn't giving me an accurate idea of how many words are in the book?

If I look at a table analysis for a search * on the book of Genesis (using the Range popup) I get this:

          Total Hits    Hits per 1000 words
Genesis   28648         889.16

Why does * only find 889.16 hits per 1000 words? I would have thought it would find all words? Looking at the highlighted text after the search, it doesn't seem to find pronominal suffixes. But, I assume that when statistics (like uniqueness and importance) are calculated, they include all "words"? What's the correct way for me to find out how many "words" are in a text, so that I can use your analysis window, but supplement these figures with some of my own calculations?

Or am I on the wrong track entirely here? Thanks again.

[EDIT] Never mind that last question. I see that the table analysis will also tell me how many words are in the book. I take it that omitting pronominal suffixes from a search for * is intentional, and including them in the statistics is the reason all my figures are off?
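For supplementary calculations, the total word count behind the table figures can be backed out from the two numbers shown (28,648 hits at 889.16 per 1000 words). This assumes the "hits per 1000 words" column is simply hits divided by the total word count that Accordance uses; the variable names are made up for this sketch.

```python
total_hits = 28_648          # hits for * in Genesis, from the table
hits_per_thousand = 889.16   # "Hits per 1000 words", from the table

# ASSUMPTION: hits_per_thousand = total_hits * 1000 / total_words
implied_total_words = total_hits * 1000 / hits_per_thousand

# Frequency of אברהם (133 hits in Genesis) against that implied total.
abraham_per_thousand = 133 * 1000 / implied_total_words

print(f"implied total: {implied_total_words:.0f} words")
print(f"אברהם: {abraham_per_thousand:.2f} hits per 1000 words")
```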

Edited August 18, 2012 by Nathan Lovell

August 18, 2012

The count of total words includes the punctuation, so it's never entirely accurate. To include the suffixes, search for each inflected form rather than each lemma (the suffixes don't have a lemma) by entering "*" with the quotation marks.

Posters, in the order of the posts above: Nathan Lovell, Helen Brown, Nathan Lovell, Joel Brown, Ken Simpson, Joel Brown, Nathan Lovell, Helen Brown.