Ambiguity in NLP

Aakrit Singhal
Sep 23, 2021


This will be a paper review of Natural Language Processing: A Perspective from Computation in Presence of Ambiguity, Resource Constraint and Multilinguality.

Rationale

NLP’s 3 dimensions [1]

The main aim of NLP is to make computers understand natural languages. Computers are fairly efficient at handling structured data such as tables, databases, and spreadsheets; natural language, however, is unstructured data, and that is where NLP techniques come in. The paper frames language processing through the trinity of NLP: language, algorithm, and problem. It then walks through the stages and tasks of processing, such as morphological analysis, part-of-speech tagging, named entity recognition, parsing (shallow and deep), semantics extraction, pragmatics, and discourse processing, along with the ambiguities that arise at each stage.

Introduction: The Process of Ambiguity

In Example 1, the word ‘Son’ has sense ambiguity: WordNet gives it two senses, one meaning an offspring and the other the Son of God, and clearly the first sense applies here. Next comes the phrasal verb/multiword ‘get up’. We cannot combine the individual meanings of ‘get’ and ‘up’, since the phrase must be treated as a single entity to recover its true meaning; extracting the meaning of each word individually would not make sense of the phrase. This multiword is itself ambiguous, since it can mean either to wake up or to rise. The word ‘open’ shows POS ambiguity: it can be a noun, a verb, or an adjective. Here it is an adjective, with the 21st sense stored in WordNet as its meaning. The question ‘Should you bunk?’ is interpreted as an ellipsis, so it is filled with the word ‘school’ as the place being bunked; as before, ‘bunk’ carries both POS and sense ambiguity.

The word ‘father’ appears three times. The first occurrence involves pragmatics, where the ambiguity lies in situation-specific information. The second shows co-reference disambiguation and is associated with John, whom I took to be the principal of the school. Co-reference disambiguation arises when pronouns are used without a direct association to nouns and more than one noun could be the referent, as the paper’s dog-and-cat example also illustrates. In languages such as Hindi and German, proper nouns cannot be identified by capitalization, while in English named entity recognition is easier because proper nouns are capitalized.
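To make the two kinds of ambiguity concrete, here is a minimal sketch using a hypothetical miniature sense inventory (the entries and glosses are illustrative assumptions, not the actual WordNet):

```python
# A toy illustration of lexical (sense) and POS ambiguity.
# The inventory below is a made-up stand-in for WordNet.
SENSE_INVENTORY = {
    "son": [
        ("noun", "a male offspring"),
        ("noun", "the Son of God"),
    ],
    "open": [
        ("adjective", "not shut or closed"),
        ("verb", "cause to become open"),
        ("noun", "a clear or unobstructed space"),
    ],
    "get up": [  # multiword: stored as a single lexical entity
        ("verb", "rise from bed; wake up"),
        ("verb", "rise to a standing position"),
    ],
}

def ambiguity_report(word):
    """Return the number of senses and the set of POS tags for a word."""
    senses = SENSE_INVENTORY.get(word, [])
    return len(senses), {pos for pos, _ in senses}

for w in ("son", "open", "get up"):
    n, pos_tags = ambiguity_report(w)
    print(f"{w!r}: {n} senses, POS = {sorted(pos_tags)}")
```

‘son’ has sense ambiguity within one POS, while ‘open’ has both sense and POS ambiguity, which is exactly the distinction the example draws.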

As seen in another example, NER in English also involves person-place-organization disambiguation: the same proper noun can refer to two different kinds of entity. The paper also notes that sentence splitting, tokenization, and morphology carry ambiguity of their own, due to the large number of morphological forms of English (1st/2nd/3rd person, gender, plurality).

Stages of NLP and its Ambiguities

Phonology and Phonetics

The first problem here is homophony, where two identical-sounding words have different meanings. Near-homophony is when rapid speech makes two different words sound alike. Word-boundary detection ambiguity arises when one pronunciation string can be split in two ways, each with a different meaning.
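Word-boundary ambiguity can be sketched with a tiny dictionary-based segmenter; the lexicon and the example string below are illustrative assumptions:

```python
# One pronunciation string, segmented into words in more than one way.
LEXICON = {"god", "is", "now", "here", "nowhere"}

def segmentations(s, lexicon=LEXICON):
    """Return every way of splitting s into dictionary words."""
    if not s:
        return [[]]
    results = []
    for i in range(1, len(s) + 1):
        prefix = s[:i]
        if prefix in lexicon:
            for rest in segmentations(s[i:], lexicon):
                results.append([prefix] + rest)
    return results

# Two valid word-boundary placements, with opposite meanings.
for seg in segmentations("godisnowhere"):
    print(" ".join(seg))
```

The same character stream yields both “god is now here” and “god is nowhere”, which is precisely the boundary-detection ambiguity described above.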

Morphology

Morphology is the study of the internal structure of word forms through components such as root words, lexemes, and morphological processes. Different languages have different morphological richness depending on how many word forms they admit. The ambiguity here is the one mentioned earlier: a string can be split into different word forms.
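A minimal morphological analyzer can be sketched as suffix stripping; the suffix table below is a toy assumption (real analyzers use full paradigms and a root lexicon), but it shows how one surface form can receive more than one analysis:

```python
# Map a surface word form to (root, features) by stripping suffixes.
SUFFIXES = [
    ("ies", "y", {"number": "plural"}),
    ("s",   "",  {"number": "plural"}),
    ("ed",  "",  {"tense": "past"}),
    ("ing", "",  {"aspect": "progressive"}),
]

def analyze(word):
    """Return all (root, features) analyses; several rules may apply."""
    analyses = []
    for suffix, replacement, feats in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            analyses.append((word[: -len(suffix)] + replacement, feats))
    return analyses or [(word, {})]

print(analyze("walked"))   # one analysis
print(analyze("cities"))   # two competing analyses: ambiguity
```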

Lexicon

The lexicon stores word information such as the POS, semantic tag, and morphology, which is used in applications such as question answering and information extraction. Word sense disambiguation here means finding the intended meaning from the context of the sentence; the word senses themselves are stored in WordNet.
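Finding meaning through sentence context can be sketched in the spirit of the Lesk algorithm: pick the sense whose gloss overlaps most with the surrounding words. The two-sense lexicon entry below is an illustrative assumption:

```python
# Context-based sense selection: maximize gloss/context word overlap.
SENSE_LEXICON = {
    "bank": [
        ("financial institution that accepts deposits and lends money",
         "finance"),
        ("sloping land beside a body of water such as a river",
         "geography"),
    ],
}

def disambiguate(word, sentence):
    """Return the (gloss, tag) pair with the largest word overlap."""
    context = set(sentence.lower().split())
    def overlap(sense):
        gloss, _ = sense
        return len(context & set(gloss.split()))
    return max(SENSE_LEXICON[word], key=overlap)

sent = "he sat on the bank of the river and watched the water"
print(disambiguate("bank", sent)[1])  # -> geography
```

With “river” and “water” in the context, the geographic sense wins; in a sentence about deposits and money, the financial sense would.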

Parsing

Parsing is deciphering the structure of a sentence and the associations within it. It relies on grammar rules such as PP → P NP, which says a prepositional phrase consists of a preposition followed by a noun phrase. Parsing involves structural ambiguity of two kinds. (a) Scope ambiguity refers to how many words a particular word takes in its scope. For instance, in Example 5 it is not known whether both men and women are taken to safe locations, or whether only the women fall within the scope. (b) Attachment ambiguity refers to the uncertainty of attaching a phrase or a clause to a part of a sentence, and arises when prepositions can play a dual role. In Example 7, it is ambiguous whether the telescope is with the boy or with the person who saw the boy. Rule-based solutions craft attachment rules and selectional preferences drawn from WordNet, while ML-based approaches learn attachments from an annotated corpus.
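Attachment ambiguity can be made concrete by counting parses under a tiny grammar in Chomsky normal form, where a PP may attach either to the verb phrase or to the noun phrase. The grammar below is a toy assumption, not the paper’s:

```python
from collections import defaultdict

# Binary rules (lhs -> rhs1 rhs2); the two PP rules create the ambiguity.
BINARY = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # attach PP to the verb phrase (saw *with* it)
    ("NP", "NP", "PP"),   # attach PP to the noun phrase (the boy has it)
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]
LEXICAL = {
    "I": "NP", "saw": "V", "the": "Det", "a": "Det",
    "boy": "N", "telescope": "N", "with": "P",
}

def count_parses(words, start="S"):
    """CKY-style chart that counts distinct parse trees per span."""
    n = len(words)
    chart = defaultdict(int)  # (i, j, symbol) -> number of trees
    for i, w in enumerate(words):
        chart[(i, i + 1, LEXICAL[w])] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for lhs, r1, r2 in BINARY:
                    chart[(i, j, lhs)] += chart[(i, k, r1)] * chart[(k, j, r2)]
    return chart[(0, n, start)]

print(count_parses("I saw the boy with a telescope".split()))  # -> 2
```

The two parses correspond exactly to the two readings: the telescope as the instrument of seeing (VP attachment) versus the telescope in the boy’s possession (NP attachment).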

Semantics Processing

Refers to extracting the meanings of words. Even though there might be multiple meanings, the rule is that a sentence must be represented in one unambiguous form, using predicate calculus, a semantic net, frames, conceptual dependency, conceptual structures, etc. The paper also states that IIT Bombay has been using the UNL framework, wherein the main verb of a sentence is the entry point, the individual nodes are called universal words, and the edges connecting them are semantic relations. The framework consists of universal words with restrictions on them (written in parentheses, so that the words are disambiguated), relations (the edges), and attributes (added to nodes to indicate, e.g., plurality, tense, main predicate, emphasis, topicalization, speech act, etc.).
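A UNL-style expression can be sketched as a small labelled graph; the relation names, restrictions, and attribute strings below follow common UNL conventions but should be read as illustrative assumptions rather than the paper’s exact notation:

```python
# Universal words as nodes, semantic relations as labelled edges,
# attributes attached to the nodes.
from dataclasses import dataclass, field

@dataclass
class UniversalWord:
    headword: str
    restriction: str = ""            # parenthesised, disambiguating
    attributes: set = field(default_factory=set)

    def __str__(self):
        restr = f"({self.restriction})" if self.restriction else ""
        attrs = "".join(f".@{a}" for a in sorted(self.attributes))
        return f"{self.headword}{restr}{attrs}"

# "The boy eats rice": the main verb is the entry point of the graph.
eat  = UniversalWord("eat",  "icl>consume", {"entry", "present"})
boy  = UniversalWord("boy",  "icl>person")
rice = UniversalWord("rice", "icl>food",    {"pl"})

edges = [("agt", eat, boy), ("obj", eat, rice)]  # agent / object relations
for rel, src, dst in edges:
    print(f"{rel}({src}, {dst})")
```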

Pragmatics Processing

This is one of the most difficult stages of NLP because it deals with understanding user sentiment, intention, and emotion. As Example 12 shows, ‘are my sandals there’ can be asked just to learn whether the sandals are there or not, or it can be asked with urgency, expecting the listener to take action and check whether the sandals are there. Large-scale context, history, intent, and tone make this task difficult.

Discourse Processing

Refers to the task of processing connected sentences. As seen in Example 13, each sentence changes the meaning/association of ‘John’; however, the last sentence, which reveals John to be a janitor, completely overturns the previous three.

Textual Humour and Ambiguity

Refers to computational humour: humour is caused by incongruity, and incongruity is caused by ambiguity, which establishes the relation between humour and ambiguity.

Resource Constrained Word Sense Disambiguation

This part focuses on one type of ambiguity, i.e., lexical/sense ambiguity. Word sense disambiguation means marking a target word WT with a sense id, where the sense id comes from WordNet. As stated earlier, the ML approach (annotated corpora) is one way to handle this, but a sense-marked corpus is expensive to create. As Table 1 shows, a domain-specific WSD system (built for a particular language-domain pair) can be extended along both the language and domain axes, filling in the entire matrix, and it performs better than a general-domain WSD system.

Parameters for WSD

The most important parameter is domain-specific information: when a wordnet lists multiple meanings of a word, each meaning refers to the context of a particular domain, e.g., the tourism domain. Other parameters include wordnet-dependent parameters and corpus-dependent parameters.

Scoring Function for WSD

Based on these parameters, a scoring function for WSD is devised. It uses the disambiguation of words, learnt sense distributions, domain-dominant concepts, and the interaction of a synset with the others in the sentence. In Equation 1, θi·Vi is the energy due to the self-activation of a neuron, while Wij·Vi·Uj represents the weight of the connection between two neurons in terms of corpus co-occurrence and conceptual and wordnet-based distance to the other words in the sentence.
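A hedged sketch of an Equation-1-style score: each candidate sense gets a self-activation term θi·Vi plus interaction terms Wij·Vi·Uj with the senses already chosen for the other words. All numbers below are made-up illustrative weights, not the paper’s:

```python
# Hopfield-style energy for one candidate sense of a target word.
def sense_score(theta_i, V_i, W_i, U):
    """theta_i*V_i + sum_j W_i[j]*V_i*U[j]"""
    return theta_i * V_i + sum(w * V_i * u for w, u in zip(W_i, U))

# Two candidate senses, interacting with two neighbour senses whose
# activations U are already fixed.
U = [1.0, 1.0]
candidates = {
    "sense_1": sense_score(theta_i=0.6, V_i=1.0, W_i=[0.8, 0.3], U=U),
    "sense_2": sense_score(theta_i=0.9, V_i=1.0, W_i=[0.1, 0.2], U=U),
}
best = max(candidates, key=candidates.get)
print(best, candidates[best])
```

Note that sense_1 wins despite its lower self-activation, because its connections to the neighbouring senses are stronger; this is the interplay the scoring function is designed to capture.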

WSD Algorithm employing the scoring function

The algorithm is iterative: it first tags the monosemous words in the sentence, then disambiguates the remaining words in increasing order of their degree of polysemy, picking for each word the sense with the highest score (calculated by Equation 1). The paper uses a tourism-domain sense-marked corpus whose data was downloaded from Indian tourism websites. Tables 2, 3, and 4 show the tourism- and health-domain data along with the polysemous words, the average degree of wordnet polysemy, and the average degree of corpus polysemy for each category. The algorithms used and their functions are stated in the paper: IWSD and EGS allow comparing the scoring function in greedy vs. iterative settings; PPR, SVM, and McCarthy et al. compare it with other WSD algorithms; and RB, WFS, and MFS compare the results with the reported WSD baselines. As Table 5 shows, IWSD beat both RB and WFS by a large margin on precision, recall, and F-score, where precision is true positives / (true positives + false positives) and recall is true positives / (true positives + false negatives). MFS requires large amounts of sense-marked corpora, making it hard to match, yet IWSD comes very close to the MFS scores. Lastly, SVM outperforms IWSD.
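The iterative loop can be sketched as follows. The sense inventory and the overlap-based score are toy assumptions standing in for WordNet and Equation 1; only the control flow (monosemous words first, then increasing polysemy, then argmax over the score) mirrors the description above:

```python
# Toy sense inventory: word -> candidate senses tagged with a domain.
SENSES = {
    "bank":     ["bank#finance", "bank#geo"],
    "money":    ["money#finance"],          # monosemous: tagged first
    "river":    ["river#geo"],              # monosemous: tagged first
    "interest": ["interest#finance", "interest#hobby"],
}

def score(sense, fixed):
    """Toy stand-in for Equation 1: count agreement with fixed senses."""
    domain = sense.split("#")[1]
    return sum(1 for s in fixed.values() if s.split("#")[1] == domain)

def iwsd_sketch(words):
    # Step 1: monosemous words get their only sense.
    fixed = {w: SENSES[w][0] for w in words if len(SENSES[w]) == 1}
    # Step 2: remaining words in increasing order of polysemy.
    remaining = sorted((w for w in words if w not in fixed),
                       key=lambda w: len(SENSES[w]))
    # Step 3: pick the highest-scoring sense given what is fixed so far.
    for w in remaining:
        fixed[w] = max(SENSES[w], key=lambda s: score(s, fixed))
    return fixed

print(iwsd_sketch(["bank", "money", "interest"]))  # finance reading
print(iwsd_sketch(["bank", "river"]))              # geographic reading
```

The same ambiguous word ‘bank’ resolves differently depending on which monosemous anchors the sentence provides, which is the point of tagging monosemous words first.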

Parameter Projection

The scoring function shows that a wordnet and sense-marked corpora are costly but required resources. However, both cost and time can be reduced by an expansion approach, using a synset-based multilingual dictionary in which the words within synsets of different languages are cross-linked, as shown in Table 6 and Fig. 3. Here, Hindi is the pivot language linking the other Indian languages.

As shown in the paper, a word W in language L has k senses. First, the probability of each sense given the word is estimated, as expressed in Formula 2. The words of the other language are cross-linked to the meanings of those words in Hindi, as shown in Fig. 5, and then the probability of those senses is calculated using the cross-linked Hindi words, as shown by the probability formula in the paper (this is used for calculating the sense distributions of Marathi words from the sense-marked Hindi corpus of the same domain). This yields the relative values of the sense distributions of words. Table 6 compares the sense distributions of Marathi words learnt from their own corpus with those projected from the Hindi sense-tagged corpus, i.e., the probability of certain Marathi words taking certain senses. Table 7 shows the results of IWSD, PageRank, and the Wordnet Baseline when run on Marathi and Bengali test corpora after training on the Hindi corpus; it reinforces the strong performance of IWSD relative to PageRank and the Wordnet Baseline.
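The projection step can be sketched in a few lines. The words, synset ids, and counts below are made-up illustrative data, not the paper’s; only the mechanism (estimate P(sense | word) for a resource-poor language from pivot-language counts via cross-linked synsets) follows the description above:

```python
# Hindi sense-marked corpus: sense counts for the cross-linked Hindi word.
HINDI_SENSE_COUNTS = {
    "saagar": {"synset_sea": 80, "synset_abundance": 20},
}
# Marathi word -> cross-linked Hindi word sharing the same synsets.
CROSSLINK = {"samudra": "saagar"}

def projected_distribution(marathi_word):
    """P(synset | Marathi word), estimated from the Hindi pivot counts."""
    counts = HINDI_SENSE_COUNTS[CROSSLINK[marathi_word]]
    total = sum(counts.values())
    return {syn: c / total for syn, c in counts.items()}

print(projected_distribution("samudra"))
```

No sense-marked Marathi corpus is consulted at all; the distribution comes entirely from the Hindi side of the cross-linked dictionary, which is exactly the cost saving the expansion approach is after.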

Conclusion

This paper explained the stages of language processing and the ambiguities associated with them, then went in depth on WSD and ways to solve it under resource constraints. It also projected sense distributions in corpora from one language to another, and explained the parameters, the scoring function, and the comparison between different algorithms on domain-specific data.

Reference

[1] Bhattacharyya, P. “Natural Language Processing: A Perspective from Computation in Presence of Ambiguity, Resource Constraint and Multilinguality.” 2012.
