Comparison of 4 bigram association measures


Mohamad's interest is in Programming (Mobile, Web, Database and Machine Learning). He is studying at the Center For Artificial Intelligence Technology (CAIT), Universiti Kebangsaan Malaysia (UKM).

  1. Pointwise Mutual Information (PMI)

  2. Mutual Information (MI)

  3. Log-Likelihood Ratio (LLR)

  4. Chi-square (χ²) Test


Bigram association measures are statistical metrics that quantify how strongly pairs of consecutive words (bigrams) are associated in a corpus. They help separate meaningful word pairs from chance co-occurrences, which is useful in natural language processing (NLP) tasks such as text classification, information retrieval, and language modeling. Here are four commonly used bigram association measures and their characteristics:

  1. Pointwise Mutual Information (PMI):

    • PMI measures the degree of association by comparing the observed frequency of a bigram with the expected frequency under independence.

    • It calculates the log-ratio of the joint probability of the bigram to the product of the individual word probabilities.

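    • PMI is signed: positive values mean the words co-occur more often than chance, negative values mean less often, and zero means independence. Higher values indicate stronger attraction, although scores for low-frequency bigrams tend to be inflated.

    • Suitable Scenario: PMI is widely used in applications like keyword extraction, sentiment analysis, and collocation detection. It is suitable when you need an interpretable strength-of-association score, usually combined with a minimum frequency threshold; it is not a significance test.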

  2. Mutual Information (MI):

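    • MI measures the amount of information that two words share: how much knowing whether the first word occurs reduces uncertainty about whether the second word occurs.

    • It is the expected value of PMI: the log-ratio of the joint probability to the product of the individual word probabilities, weighted by the joint probability and summed over all four presence/absence combinations of the two words.

    • MI is never negative, and it is zero only when the two words are independent. Higher values indicate stronger dependence, but the value alone does not say whether the words attract or repel each other.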

    • Suitable Scenario: MI is commonly used in information retrieval tasks, such as document retrieval and query expansion, where identifying word associations helps improve search accuracy.

  3. Log-Likelihood Ratio (LLR):

    • LLR measures the likelihood of the observed bigram frequency compared to the expected frequency under a null hypothesis of independence.

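    • It calculates G² = 2 Σ O · ln(O / E), summed over the four cells of the bigram's 2×2 contingency table, where O is the observed count and E is the count expected under independence.

    • LLR is never negative. Higher values indicate stronger evidence against independence, and the direction of the association has to be read off by comparing the observed bigram count with its expected count.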

    • Suitable Scenario: LLR is often used in text classification tasks, such as spam detection and sentiment analysis, where identifying significant word associations can improve classification accuracy.

  4. Chi-square (χ²) Test:

    • The chi-square test measures the difference between the observed frequency and the expected frequency of a bigram under the null hypothesis of independence.

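    • It calculates the chi-square statistic χ² = Σ (O − E)² / E over the four cells of the bigram's 2×2 contingency table, which indicates the deviation from independence.

    • Chi-square is never negative. Attraction and repulsion both produce large values, so higher values indicate a larger deviation from independence, and the direction has to be read off by comparing the observed count with the expected count. The statistic becomes unreliable when expected counts are small.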

    • Suitable Scenario: The chi-square test is commonly used in feature selection for text classification and information retrieval tasks. It helps identify informative bigrams that are more likely to be associated with specific classes or topics.
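For reference, the four measures can be written in their standard textbook forms for a bigram (w₁, w₂). The notation is an assumption introduced here for illustration: O_ij and E_ij are the observed and expected counts in the 2×2 contingency table (w₁ present or absent, w₂ present or absent), R_i and C_j are its row and column totals, and N is the total number of bigrams.

```latex
\begin{align*}
\mathrm{PMI}(w_1, w_2) &= \log_2 \frac{P(w_1, w_2)}{P(w_1)\,P(w_2)} \\
\mathrm{MI}(W_1; W_2) &= \sum_{x \in \{w_1, \neg w_1\}} \sum_{y \in \{w_2, \neg w_2\}}
    P(x, y)\, \log_2 \frac{P(x, y)}{P(x)\,P(y)} \\
G^2 &= 2 \sum_{i,j} O_{ij} \ln \frac{O_{ij}}{E_{ij}}, \qquad E_{ij} = \frac{R_i\, C_j}{N} \\
\chi^2 &= \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}
\end{align*}
```

When the probabilities are estimated as relative frequencies (P = O / N), the second and third lines are related by G² = 2 · N · ln 2 · MI, so MI and LLR rank bigrams from the same corpus identically.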

When choosing a suitable bigram association measure, consider the specific requirements of your NLP task and the nature of the data. PMI, MI, LLR, and chi-square are all popular measures, but their suitability depends on the particular application and the desired interpretation of word associations.
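To make the comparison concrete, here is a minimal Python sketch that computes all four measures from the same four counts. The function names and the example counts are hypothetical, chosen only for illustration, and the sketch assumes every expected cell count is greater than zero.

```python
import math

def contingency(n11, n1p, np1, n):
    """Observed and expected cells of the 2x2 table for a bigram (w1, w2).

    n11: count of the bigram, n1p: bigrams starting with w1,
    np1: bigrams ending with w2, n: total bigrams in the corpus.
    """
    observed = [n11, n1p - n11, np1 - n11, n - n1p - np1 + n11]
    rows = [n1p, n1p, n - n1p, n - n1p]
    cols = [np1, n - np1, np1, n - np1]
    expected = [r * c / n for r, c in zip(rows, cols)]
    return observed, expected

def pmi(n11, n1p, np1, n):
    # log2 of observed over expected bigram count (signed)
    return math.log2(n11 * n / (n1p * np1))

def mutual_information(n11, n1p, np1, n):
    # expected PMI over all four cells, in bits (never negative)
    obs, exp = contingency(n11, n1p, np1, n)
    return sum(o / n * math.log2(o / e) for o, e in zip(obs, exp) if o > 0)

def log_likelihood_ratio(n11, n1p, np1, n):
    # G^2 = 2 * sum O * ln(O / E); empty cells contribute 0
    obs, exp = contingency(n11, n1p, np1, n)
    return 2 * sum(o * math.log(o / e) for o, e in zip(obs, exp) if o > 0)

def chi_square(n11, n1p, np1, n):
    # Pearson's statistic: sum (O - E)^2 / E
    obs, exp = contingency(n11, n1p, np1, n)
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))

# Hypothetical counts: the bigram occurs 30 times, w1 starts 100 bigrams,
# w2 ends 50 bigrams, and the corpus holds 10,000 bigrams in total.
counts = (30, 100, 50, 10_000)
for measure in (pmi, mutual_information, log_likelihood_ratio, chi_square):
    print(f"{measure.__name__}: {measure(*counts):.3f}")
```

Only PMI changes sign when a bigram occurs less often than expected; the other three stay non-negative and grow with any deviation from independence.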


When comparing the effectiveness of Pointwise Mutual Information (PMI) with Log-Likelihood Ratio (LLR) or Chi-square (χ²) for bigram association measures, it's important to consider the characteristics and performance of each measure in relation to the specific task and dataset. Here are some factors to consider:

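  1. Sensitivity to Sparse Data: Sentiment analysis datasets often contain many low-frequency bigrams, and the three measures react to them differently. PMI always yields a score, but it tends to overestimate rare bigrams: a pair seen only once whose words are themselves rare receives a very high PMI, so a minimum frequency threshold or smoothing is usually applied. χ² relies on a large-sample approximation that becomes unreliable when expected counts are small. LLR is generally regarded as the more robust of the two test statistics on sparse data.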

  2. Balanced Handling of Positive and Negative Associations: PMI is signed, so it directly distinguishes pairs that co-occur more often than chance from pairs that co-occur less often. LLR and χ² are non-negative and report only the size of the deviation, so the direction has to be recovered by comparing observed and expected counts. Note that the sign of an association is not the same thing as sentiment polarity. However, when PMI is computed between a bigram and a sentiment class, its sign shows whether the bigram is over- or under-represented in that class, which makes it a convenient indicator for both positive and negative sentiment.

  3. Interpretability: PMI is often considered more interpretable than LLR and χ². It comes from information theory and is a log-ratio of observed to expected co-occurrence, so with base-2 logarithms a PMI of 3 means the bigram occurs about 2³ = 8 times more often than independence would predict. LLR and χ² values, on the other hand, are test statistics: they indicate how strongly the data contradict independence and grow with corpus size, which makes them less intuitive to read directly as association strength.

  4. Computational Complexity: PMI, LLR, and χ² have similar computational complexity and are relatively efficient to compute. Therefore, the computational aspect is not a significant differentiator when comparing their effectiveness.

It's worth noting that the effectiveness of bigram association measures can vary depending on the specific dataset, task requirements, and the underlying characteristics of the text being analyzed. It's recommended to experiment with different measures and evaluate their performance on your specific sentiment analysis task to determine the most effective measure for your particular scenario.
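One way to run such an experiment is to score the same candidate bigrams under more than one measure and compare the rankings. The sketch below does this for PMI and LLR on a tiny made-up token stream; the helper names (`bigram_counts`, `rank`) and the corpus are hypothetical, and a real evaluation would use your own tokenized data.

```python
import math
from collections import Counter

def bigram_counts(tokens):
    """Count bigrams plus how often each word appears as first/second element."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    left, right = Counter(), Counter()
    for (w1, w2), count in bigrams.items():
        left[w1] += count
        right[w2] += count
    return bigrams, left, right, sum(bigrams.values())

def pmi(n11, n1p, np1, n):
    return math.log2(n11 * n / (n1p * np1))

def llr(n11, n1p, np1, n):
    # G^2 over the 2x2 contingency table; assumes no expected cell is zero
    observed = [n11, n1p - n11, np1 - n11, n - n1p - np1 + n11]
    expected = [n1p * np1 / n, n1p * (n - np1) / n,
                (n - n1p) * np1 / n, (n - n1p) * (n - np1) / n]
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)

def rank(tokens, score, min_count=1):
    """Return bigrams sorted by score, keeping those seen at least min_count times."""
    bigrams, left, right, n = bigram_counts(tokens)
    scored = {bg: score(c, left[bg[0]], right[bg[1]], n)
              for bg, c in bigrams.items() if c >= min_count}
    return sorted(scored, key=scored.get, reverse=True)

# Toy corpus (made up): "." is kept as a token to mark sentence ends.
tokens = ("not good . " * 5 + "very good . " * 5
          + "not bad . " * 2 + "quite awful .").split()

print("PMI:", rank(tokens, pmi)[:3])
print("LLR:", rank(tokens, llr)[:3])
print("PMI, min_count=2:", rank(tokens, pmi, min_count=2)[:3])
```

On this toy corpus the bigram that occurs only once ("quite awful") ranks first under PMI, LLR puts a frequent pair first, and raising `min_count` removes the once-seen pair from the PMI ranking. Comparing such rankings against a held-out sentiment task is what ultimately decides which measure works better.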


The following are a few examples of sentiment analysis tasks where Pointwise Mutual Information (PMI) could be more effective than Log-Likelihood Ratio (LLR) or Chi-square (χ²):

  1. Fine-Grained Sentiment Analysis: In fine-grained sentiment analysis, the goal is to classify text into multiple sentiment categories, such as positive, negative, and neutral, or even more specific emotions like joy, sadness, anger, etc. PMI can be more effective in this scenario because it can be computed between a bigram and each category separately, giving one signed score per category. These scores help identify sentiment-bearing bigrams that are informative for distinguishing between the categories.

  2. Domain-Specific Sentiment Analysis: Sentiment analysis often needs to be tailored to specific domains, such as product reviews in e-commerce or social media discussions about movies. In these cases, PMI can be more effective because it is estimated directly from the domain corpus and therefore picks up word associations that carry sentiment only within that domain. Whether this translates into more accurate results than LLR or χ² depends on the data and should be checked empirically.

  3. Sentiment Lexicon Expansion: Sentiment lexicons are valuable resources for sentiment analysis, containing words and phrases mapped to their associated sentiment polarity. PMI can be useful for expanding sentiment lexicons by identifying new bigrams that are strongly associated with a sentiment category. Because PMI is signed, the difference between a bigram's PMI with the positive class and its PMI with the negative class yields a polarity score directly, which LLR and χ² do not provide without an extra step.

  4. Comparative Sentiment Analysis: Comparative sentiment analysis involves comparing sentiment between multiple entities or aspects within a text, such as comparing the sentiment towards different products in a set of reviews. PMI can be more effective in this scenario because it can score bigrams against each entity or aspect and so highlight the word pairs that signal preference, comparison, or contrast, such as "better than" or "worse than".
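As an illustration of how PMI can link bigrams to sentiment categories, the sketch below scores each bigram by PMI(bigram, positive) minus PMI(bigram, negative), which simplifies to the log-ratio of the bigram's document frequency in the two classes. The function name `polarity_scores`, the add-one smoothing, and the labeled examples are all assumptions made for this sketch, not part of any particular library or dataset.

```python
import math
from collections import Counter

def polarity_scores(docs, smoothing=1.0):
    """Score bigrams from labeled documents.

    docs: iterable of (tokens, label) pairs with label "pos" or "neg".
    Returns {bigram: PMI(bigram, pos) - PMI(bigram, neg)} in bits.
    """
    doc_count = Counter()
    bigram_count = {"pos": Counter(), "neg": Counter()}
    for tokens, label in docs:
        doc_count[label] += 1
        # count each bigram at most once per document
        bigram_count[label].update(set(zip(tokens, tokens[1:])))

    scores = {}
    for bg in set(bigram_count["pos"]) | set(bigram_count["neg"]):
        # smoothed estimates of P(bigram | class); smoothing avoids log(0)
        p_pos = (bigram_count["pos"][bg] + smoothing) / (doc_count["pos"] + 2 * smoothing)
        p_neg = (bigram_count["neg"][bg] + smoothing) / (doc_count["neg"] + 2 * smoothing)
        # PMI(bg, pos) - PMI(bg, neg) = log2 P(bg | pos) / P(bg | neg)
        scores[bg] = math.log2(p_pos / p_neg)
    return scores

# Made-up labeled snippets, for illustration only.
docs = [
    ("highly recommend this phone".split(), "pos"),
    ("would highly recommend it".split(), "pos"),
    ("this phone is fine".split(), "pos"),
    ("waste of money".split(), "neg"),
    ("total waste of time".split(), "neg"),
    ("this phone is broken".split(), "neg"),
]

scores = polarity_scores(docs)
for bg in sorted(scores, key=scores.get, reverse=True):
    print(" ".join(bg), round(scores[bg], 2))
```

Bigrams with clearly positive scores are candidates for the positive side of a lexicon, clearly negative scores for the negative side, and scores near zero (such as "phone is" here) carry little polarity. With so few documents the scores are dominated by the smoothing, so a real lexicon would need far more data and a frequency threshold.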

As with the general comparison, these are tendencies rather than guarantees: the effectiveness of PMI, LLR, and χ² still depends on the dataset, the task requirements, and the characteristics of the text being analyzed. Evaluate the measures on your own sentiment analysis task before settling on one.