No Sentence Is Too Confusing To Ignore



Proceedings of the 2010 Workshop on NLP and Linguistics: Finding the Common Ground, ACL 2010, pages 61–69, Uppsala, Sweden, 16 July 2010. ©2010 Association for Computational Linguistics

    No sentence is too confusing to ignore

Paul Cook
Department of Computer Science
University of Toronto
Toronto, Canada

Suzanne Stevenson
Department of Computer Science
University of Toronto
Toronto, Canada


We consider sentences of the form No X is too Y to Z, in which X is a noun phrase, Y is an adjective phrase, and Z is a verb phrase. Such constructions are ambiguous, with two possible (and opposite!) interpretations, roughly meaning either that "Every X Zs," or that "No X Zs." The interpretations have been noted to depend on semantic and pragmatic factors. We show here that automatic disambiguation of this pragmatically complex construction can be largely achieved by using features of the lexical semantic properties of the verb (i.e., Z) participating in the construction. We discuss our experimental findings in the context of construction grammar, which suggests a possible account of this phenomenon.

    1 No noun is too adjective to verb

    Consider the following two sentences:

(1) No interest is too narrow to deserve its own newsletter.

    (2) No item is too minor to escape his attention.

Each of these sentences has the form No X is too Y to Z, where X, Y, and Z are a noun phrase, adjective phrase, and verb phrase, respectively. Sentence (1) is generally taken to mean that every interest deserves its own newsletter, regardless of how narrow it is. On the other hand, (2) is typically interpreted as meaning that no item escapes his attention, regardless of how minor it is. That is, sentences with the identical form No X is too Y to Z can either mean that every X Zs, or can mean the opposite: that no X Zs!¹

¹Note that in examples (1) and (2), the nouns interest and item are the subjects of the verbs deserve and escape, respectively. In this construction the noun can also be the object of the verb, as in the title of this paper, which claims that no sentence can (or should) be ignored.

This verbal illusion (Wason and Reich, 1979), so called because there are two opposite interpretations for the very same structure, is of interest to us for two reasons. First, the contradictory nature of the possible meanings has been explained in terms of pragmatic factors concerning the relevant presuppositions of the sentences. According to Wason and Reich (1979) (as explained in more detail below), sentences such as (2) are actually nonsensical, but people coerce them into a sensible reading by reversing the interpretation. One of our goals in this work is to explore whether computational linguistic techniques, specifically automatic corpus analysis drawing on lexical resources, can help to elucidate the factors influencing the interpretation of such sentences across a collection of actual usages.

The second reason for our interest in this construction is that it illustrates a complex ambiguity that can cause difficulty for natural language processing applications that seek to semantically interpret text. Faced with the above two sentences, a parsing system (in the absence of specific knowledge of this construction) will presumably find the exact same structure for each, giving no basis on which to determine the correct meaning from the parse. (Unsurprisingly, when we run the C&C Parser (Curran et al., 2007) on (1) and (2), it assigns the same structure to each sentence.) Our second goal in this work is thus to explore whether increased linguistic understanding of this phenomenon could be used to disambiguate such examples automatically. Specifically, we use this construction as an example of the kind of difficulties faced in semantic interpretation when meaning may be determined by pragmatic or other extra-syntactic factors, in order to explore whether lexical semantic features can be used as cues to resolving pragmatic ambiguity when a complex semantico-pragmatic model is not feasible.

In the remainder of this paper, we present the first computational study of the No X is too Y to Z phenomenon, which attempts to automatically determine the meaning of instances of this semantically and pragmatically complex construction. In Section 2 we present previous analyses of this construction, and our hypothesis. In Section 3, we describe the creation of a dataset of instances that verifies that both interpretations ("every" and "no") indeed occur in corpora. We then analyze the human annotations in this dataset in more detail in Section 4. In Section 5, we present the feature model we use to describe the instances, which taps into the lexical semantics and polarity of the constituents. In Section 6, we describe machine learning experiments and classification results that support our hypothesis that the interpretation of this construction largely depends on the semantics of its component verb. In Section 7 we suggest that our results support an analysis of this phenomenon within construction grammar, and we point to some future directions in our research in Section 8.

    2 Background and our proposal

The No X is too Y to Z construction was investigated by Wason and Reich (1979), and discussed more recently by Pullum (2004) and Liberman (2009a,b). Here we highlight some of the most important properties of this complex phenomenon. Our presentation owes much to the lucid discussion and clarification of this topic, and of the work of Wason and Reich specifically, by Liberman.

Wason and Reich argue that the compositional interpretation of sentences of the form of (1) and (2) is "every X Zs." Intuitively, this can be understood by considering a sentence identical to sentence (1), but without a negative subject: This interest is too narrow to deserve its own newsletter, which means that this interest is so narrow that it does not deserve a newsletter. This example indicates that the meaning of too narrow to deserve its own newsletter is so narrow that it does not deserve a newsletter. When this negative too assertion is compositionally combined with the No interest subject of sentence (1), it results in a meaning with two negatives: No interest is so narrow that it does not deserve a newsletter, or simply, Every interest deserves a newsletter. Wason and Reich note that in sentences such as (1), the compositional "every" interpretation is consistent with common beliefs about the world, and thus refer to such sentences as pragmatic.

By contrast, the compositional interpretation of sentences such as (2) does not correspond to our common-sense beliefs. Consider an analogous (non-negative subject) sentence to sentence (2), i.e., This item is too minor to escape his attention. It is nonsensical that This item is so minor that it does not escape his attention, since being more minor entails more likelihood of escaping attention, not less. The compositional interpretation of (2) is similarly nonsensical, i.e., that No item is so minor that it does not escape his attention. Such sentences are thus termed non-pragmatic by Wason and Reich, who argue that the complexity of the non-pragmatic sentences, arising in part from the number of negations they contain, causes the listener or reader to misconstrue them. According to their reasoning, listeners choose an interpretation that is consistent with their beliefs about the world, namely that "no X Zs" (in this case, that No item escapes his attention), instead of the compositional interpretation (Every item escapes his attention).

While Wason and Reich focus on the compositional semantics and pragmatics of these sentences, they also note that the non-pragmatic examples typically use a verb that itself has some aspect of negation, such as ignore, miss, and overlook. This property is also pointed out by Pullum (2004), who notes that avoid in his example of the construction means "manage to not do something." Building on this observation, we hypothesize that lexical properties of the component constituents of this construction, particularly the verb, can be important cues to its semantico-pragmatic interpretation. Specifically, we hypothesize that the pragmatic ("every" interpretation) and non-pragmatic ("no" interpretation) sentences will tend to involve verbs with different semantics. Given that verbs of different semantic classes have different selectional preferences, we also expect to see the "every" and "no" sentences associated with semantically different nouns and adjectives.

    3 Dataset

    3.1 Extraction

To create a dataset of usages of the construction no NP is too AP to VP, referred to as the target construction, we use two corpora: the British National Corpus (Burnard, 2000), an approximately one-hundred-million-word corpus of late-twentieth-century British English, and The New York Times Annotated Corpus (Sandhaus, 2008), approximately one billion words of non-newswire text from the New York Times from the years 1987–2006. We extract all sentences in these corpora containing the sequence of strings no, is too, and to, separated by one or more words. We then manually filter out all sentences that do not have no NP as the subject of is too, or that do not have to VP as an argument of is too. After removing duplicates, this results in 170 sentences. We randomly select 20 of these sentences as development data, leaving 150 sentences for testing.
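The loose string-matching step described above can be sketched as a regular-expression filter. This is an illustrative reconstruction, not the authors' code: the pattern deliberately over-generates (matching no, then is too, then to, each separated by one or more words), and the results are then filtered manually as described.

```python
import re

# Loose pattern for candidate instances of "no X is too Y to Z":
# the anchor strings "no", "is too", and "to", separated by one or
# more intervening words (case-insensitive). Over-generates by design;
# matches are subsequently filtered by hand.
PATTERN = re.compile(
    r"\bno\b(?:\s+\S+)+?\s+is\s+too\b(?:\s+\S+)+?\s+to\b",
    re.IGNORECASE,
)

def candidate_sentences(sentences):
    """Yield sentences that loosely match the target construction."""
    for s in sentences:
        if PATTERN.search(s):
            yield s

examples = [
    "No interest is too narrow to deserve its own newsletter.",
    "No item is too minor to escape his attention.",
    "Nothing is too hard.",  # not matched: no standalone "no ... is too ... to"
]
matches = list(candidate_sentences(examples))
```

Note that, as discussed below, this pattern misses variants such as Nothing is too Y to Z, since the subject there does not contain the standalone word no.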

Although we find only 170 examples of the target construction in 1.1 billion words of text, note that our extraction process is quite strict and misses some relevant usages. For example, we do not extract sentences of the form Nothing is too Y to Z, in which the subject NP does not contain the word no. Nor do we extract usages of the related construction No X is too Y for Z, where Z is an NP related to a verb, as in No interest is too narrow for attention. (We would only extract the latter if there were an infinitive verb embedded in or following the NP.) In the present study we limit our consideration to sentences of the form discussed by Wason and Reich (1979), but intend to consider related constructions such as these, which appear to exhibit the same ambiguity as the target construction, in the future.

We next manually identify the noun, adjective, and verb that participate in the target construction in each sentence. Although this could be done automatically using a parser (e.g., Collins, 2003) or chunker (e.g., Abney, 1991), here we want to ensure error-free identification. We also note a number of sentences containing co-ordination, such as in the following example.

(3) These days, no topic is too recent or specialized to disqualify it from museum apotheosis.

This sentence contains two instances of the target construction: one corresponding to the noun-adjective-verb triple topic, recent, disqualify, and the other to the triple topic, specialized, disqualify. In general, we consider each unique noun-adjective-verb triple participating in the target construction as a separate instance.
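The expansion of a coordinated sentence into per-triple instances amounts to taking the cross-product of the candidate constituents. A minimal sketch, using example (3) (the lists here are filled in by the manual identification step above):

```python
from itertools import product

# Constituents identified in example (3); the adjectives are
# coordinated, so the sentence yields two instances.
nouns = ["topic"]
adjectives = ["recent", "specialized"]
verbs = ["disqualify"]

# One instance per unique noun-adjective-verb triple.
instances = list(product(nouns, adjectives, verbs))
```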

    3.2 Annotation

We used Amazon Mechanical Turk (AMT) to obtain judgements as to the correct interpretation of each instance of the target construction in both the development and testing datasets. For each instance, we generated two paraphrases, one corresponding to each of the interpretations discussed in Section 1. We then presented the given instance of the target construction along with its two paraphrases to annotators through AMT, as shown in Table 1. In generating the paraphrases, one of the authors selected the most appropriate paraphrase, in their judgement, where can in the paraphrases in Table 1 was selected from can, should, will, and . Note that the paraphrases do not contain the adjective from the target construction. In the case of multiple instances of the target construction with differing adjectives but the same noun and verb, we only solicited judgements for one instance, and used these judgements for the other instances. In our dataset we observe that all instances obtained from the same sentence which differ only with respect to their noun or verb have the same interpretation. We therefore believe that instances with the same noun and verb but a different adjective are unlikely to differ in their interpretation.


    Read the sentence below.

Based on your interpretation of that sentence, select the answer that most closely matches your interpretation.

Select "I don't know" if neither answer is close to your interpretation, or if you are really unsure.

That success was accomplished in large part to tight control on costs, and no cost is too small to be scrutinized.

    Every cost can be scrutinized.

    No cost can be scrutinized.

I don't know.

Enter any feedback you have about this HIT. We greatly appreciate you taking the time to do so.

Table 1: A sample of the Amazon Mechanical Turk annotation task.


We also allowed the judges to optionally enter any feedback about the annotation task, which in some cases (discussed in the following section) was useful in determining whether the judges found a particular instance difficult to annotate.²

For each instance of the target construction we obtained three judgements from unique workers on AMT. For approximately 80% of the items, the judgements were unanimous. In the remaining cases we solicited four additional judgements, and used the majority judgement. We paid $0.05 per judgement; the average time spent on each annotation was approximately twenty seconds, resulting in an average hourly wage of about $10.
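The aggregation scheme just described (three judgements; if not unanimous, four more, then the overall majority) can be sketched as follows. This is an illustrative reconstruction; the function names and label strings are our own.

```python
from collections import Counter

def aggregate(first_three, extra_four=None):
    """Aggregate AMT judgements as described in Section 3.2:
    accept a unanimous first round of three; otherwise take the
    majority over all seven judgements."""
    if len(set(first_three)) == 1:
        return first_three[0]
    votes = Counter(first_three + (extra_four or []))
    return votes.most_common(1)[0][0]
```

With seven binary judgements, a strict majority always exists; ties could only arise if "I don't know" responses were counted, which this sketch does not handle.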

The development data was also annotated by three native-English-speaking experts (computational linguists with extensive linguistic background, two of whom are also authors of this paper). The inter-annotator agreement among these judges is very high, with pairwise observed agreements of 1.00, 0.90, and 0.90, and corresponding unweighted Kappa scores of 1.00, 0.79, and 0.79. The majority judgements of these annotators are the same as those obtained from AMT on the development data, giving us confidence in the reliability of the AMT judgements. These findings are consistent with those of Snow et al. (2008) in showing that AMT judgements can be as reliable as those of expert judges.
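The pairwise observed agreement and unweighted (Cohen's) kappa reported above can be computed as follows. This is a generic sketch; the label sequences in the test are invented, not the paper's annotations.

```python
from collections import Counter

def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two annotators' label sequences."""
    assert len(a) == len(b)
    n = len(a)
    # observed agreement: fraction of items the annotators label identically
    po = sum(x == y for x, y in zip(a, b)) / n
    # expected (chance) agreement from each annotator's label distribution
    ca, cb = Counter(a), Counter(b)
    pe = sum((ca[l] / n) * (cb[l] / n) for l in set(a) | set(b))
    return (po - pe) / (1 - pe)
```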

Finally, we remove a small number of items from the testing dataset which were difficult to paraphrase due to ellipsis of the verb participating in the target construction, or an extra negation in the verb phrase. We further remove one sentence because we believe the paraphrases we provided are in fact misleading. The number of sentences and of instances (i.e., noun-adjective-verb triples) of the target construction in the development and testing datasets is given in Table 2. 160 of the 199 testing instances (80%) have the "every" interpretation, with the remainder having the "no" interpretation.

    4 Analysis of annotation

We now more closely examine the annotations obtained from AMT to better determine the extent to which they are reliable. We also consider specific instances of the target construction that are judged inconsistently, to establish some of the causes of disagreement.

²In other cases the comments were more humorous. In response to the following sentence, "If you've ever yearned to live on Sesame Street, where no problem is too big to be solved by a not-too-big slice of strawberry-rhubarb pie, this is the spot for you," one judge told us her preferred types of pie.

Dataset       # sentences   # instances
Development   20            33
Test          140           199

Table 2: The number of sentences containing the target construction, and the number of resulting instances.

One of the three experts who annotated the development items (discussed in Section 3.2) also annotated twenty items selected at random from the testing data. In this case two instances are judged differently than the majority judgement obtained from AMT. These instances are given below with the noun, adjective, and verb in the target construction underlined.

(4) When it comes to the clash of candidates on national television, no detail, it seems, is too minor for negotiation, no risk too small to eliminate.

(5) Lectures by big-name Wall Street felons will show why no swindler is too big to beat the rap by peaching on small-timers.

For sentence (4), the AMT judgements were unanimously for the "no" interpretation, whereas the expert annotator chose the "every" interpretation. We are uncertain as to the reason for this disagreement, but are convinced that the "every" interpretation is the intended one.

In the case of sentence (5), the AMT judgements were split four to three for the "every" and "no" interpretations, respectively, while the expert annotator chose the "no" interpretation. For this sentence the provided paraphrases were Every swindler can beat the rap and No swindler can beat the rap. If attention in the sentence is restricted to the target construction, i.e., no swindler is too big to beat the rap by peaching on small-timers, either of the "no" and "every" interpretations is possible. That is, this clause alone can mean that no swindler is big enough to be able to beat the rap (the "no" interpretation), or that no swindler is big enough that they are above peaching on small-timers (or in other words, every swindler is able to beat the rap by peaching on small-timers, the "every" interpretation). However, the intention of the sentence as the "no" interpretation is clear from the referral in the main clause to big-name Wall Street felons, which implies that big swindlers have not beaten the rap. Since the AMT annotators may not be devoting a large amount of attention to the task, they may focus only on the target construction and not the preliminary disambiguating material. In this event, they may be choosing between the "every" and "no" interpretations based on how cynical they are of the ability (or lack thereof) of the American legal system to punish Wall Street criminals.

We also examine a small number of examples in the testing set which do not receive a clear majority judgement from AMT. For this analysis we consider items for which the difference in the number of judgements for each of the "every" and the "no" interpretations is one or less. This gives four instances of the target construction, one of which we have already discussed above, example (5); the others are presented below, again with the noun, adjective, and verb participating in the target construction underlined:

(6) Where are our priorities when we so carefully weigh costs and medical efficacy in deciding to offer a medical lifeline to the elderly, yet no amount of money is too great to spend on the debatable paths we've taken in our war against terror?

(7) No neighborhood is too remote to diminish Mr. Levine's determination to discover and announce some previously unheralded treat.

(8) No one is too remote anymore to be concerned about style, Ms. Hansen suggested.

In example (6) the author is using the target construction to express somebody else's viewpoint, that any amount should be spent on the war against terror. Therefore the literal reading of the target construction appears to be the "every" interpretation. However, this construction is being used rhetorically (as part of the overall sentence) to express the author's belief that too much money is being spent on the war against terror, which is close in meaning to the "no" interpretation. It appears that the annotators are split between these two readings. For sentence (7), the atypicality of neighbourhood as the subject of diminish may make this instance particularly difficult for the judges. Sentence (8) appears to us to be a clear example of the "every" interpretation. The paraphrases for this usage are Everyone should be concerned about style and No one should be concerned about style. In this case it is possible that the judges are biased by their beliefs about whether one should be concerned about style, and that this is giving rise to the lack of agreement. These examples illustrate that some of these usages are clearly complex for people to annotate. Such complex examples may require more context to be annotated with confidence.

    5 Model

To test our hypothesis that the interaction of the semantics of the noun, adjective, and verb in the target construction contributes to its pragmatic interpretation, we represent each instance in our dataset as a vector of features that capture aspects of the semantics of its component words.

WordNet To tap into general lexical semantic properties of the words in the construction, we use features that draw on the semantic classes of words in WordNet (Fellbaum, 1998). These binary features each represent a synset in WordNet, and are turned on or off for the component words (the noun, adjective, and verb) in each instance of the target construction. A synset feature is on for a word if the synset occurs on the path from all senses of the word to the root, and off otherwise. We use WordNet version 3.0, accessed using NLTK version 2.0 (Bird et al., 2009).
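The on/off condition for a synset feature (on only if the synset lies on the path to the root for every sense of the word) can be illustrated with a toy hand-built taxonomy. The mini-taxonomy and sense inventory below are invented for illustration; the paper itself uses WordNet 3.0 via NLTK.

```python
# Toy hypernym taxonomy (invented): maps each synset to its parent.
HYPERNYM = {
    "escape.v.01": "avoid.v.01",
    "avoid.v.01": "act.v.01",
    "miss.v.01": "fail.v.01",
    "fail.v.01": "act.v.01",
}
SENSES = {  # word -> its senses (synsets); also invented
    "escape": ["escape.v.01", "miss.v.01"],
}

def path_to_root(synset):
    """Follow hypernym links from a synset up to the taxonomy root."""
    path = [synset]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path

def synset_features(word):
    """A synset feature is on iff the synset lies on the path to the
    root for every sense of the word (Section 5)."""
    paths = [set(path_to_root(s)) for s in SENSES[word]]
    return set.intersection(*paths)

# Here both toy senses of "escape" pass through act.v.01, so only
# that synset's feature is on.
feats = synset_features("escape")
```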

Polarity Because of the observation that the verb in the target construction, in particular, has some property of negativity in the "no" interpretation, we also use features representing the semantic polarity of the noun, adjective, and verb in each instance. The features are ternary, representing positive, neutral, or negative polarity. We obtain polarity information from the subjectivity lexicon provided by Wilson et al. (2005), and consider words to be neutral if they have both positive and negative polarity, or are not in the lexicon.
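A sketch of the polarity lookup, using a toy stand-in for the Wilson et al. (2005) subjectivity lexicon (the entries below are invented; only the neutrality rule follows the text above):

```python
# Toy stand-in for the subjectivity lexicon: word -> polarities listed.
LEXICON = {
    "deserve": {"positive"},
    "escape": {"negative"},
    "trivial": {"negative"},
    "great": {"positive", "negative"},  # both polarities -> neutral
}

def polarity(word):
    """Ternary polarity: words with both polarities, or absent from
    the lexicon, are treated as neutral (Section 5)."""
    pols = LEXICON.get(word, set())
    if pols == {"positive"}:
        return "positive"
    if pols == {"negative"}:
        return "negative"
    return "neutral"

def polarity_features(noun, adj, verb):
    return {"noun_pol": polarity(noun),
            "adj_pol": polarity(adj),
            "verb_pol": polarity(verb)}
```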

    6 Experimental results

    6.1 Experimental setup

To evaluate our model we conduct a 5-fold cross-validation experiment using the items in the testing dataset. When partitioning the items in the testing dataset into the five parts necessary for the cross-validation experiment, we ensure that all the instances of the target construction from a single sentence are in the same part. This ensures that no instance used for training is from the same sentence as an instance used for testing. We further ensure that the proportion of items in each class is roughly the same in each split.

For each of the five runs, we linearly scale the training data to be in the range [−1, 1], and apply the same transformation to the testing data. We train a support vector machine (LIBSVM version 2.9, Chang and Lin, 2001) with a radial basis function kernel on the training portion in each run, setting the cost and gamma parameters using cross-validation on just the training portion, and then test the classifier on the testing portion for that run using the same parameter settings. We micro-average the accuracy obtained on each of the five runs. Finally, we repeat each 5-fold cross-validation experiment five times, with five random splits, and report the average accuracy over these trials.
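The evaluation protocol above (sentence-grouped 5-fold splits, scaling fit on the training portion only, and RBF-kernel parameters tuned by nested cross-validation) can be approximated as follows. The paper uses LIBSVM 2.9 directly; this sketch substitutes scikit-learn, whose SVC wraps LIBSVM, and runs on synthetic stand-in data, since the real features of Section 5 are not reproduced here.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import GroupKFold, GridSearchCV

# Synthetic stand-in data: 40 instances, 8 features, with sentence ids
# as groups (two instances per sentence, as with coordination).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 8))
y = (X[:, 0] + 0.1 * rng.normal(size=40) > 0).astype(int)
groups = np.repeat(np.arange(20), 2)

accs = []
# GroupKFold keeps all instances from one sentence in the same fold.
for train, test in GroupKFold(n_splits=5).split(X, y, groups):
    # scale to [-1, 1] using the training portion only
    scaler = MinMaxScaler(feature_range=(-1, 1)).fit(X[train])
    Xtr, Xte = scaler.transform(X[train]), scaler.transform(X[test])
    # tune cost (C) and gamma by cross-validation on the training portion
    grid = GridSearchCV(SVC(kernel="rbf"),
                        {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]},
                        cv=3)
    grid.fit(Xtr, y[train])
    accs.append(grid.score(Xte, y[test]))

mean_acc = sum(accs) / len(accs)  # folds are equal-sized here
```

With equal-sized folds this mean equals the micro-averaged accuracy; in the general case one would weight by fold size.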

    6.2 Results

Results for experiments using various subsets of the features are presented in Table 3. We restrict the component word (the noun, adjective, or verb) for which we extract features to those listed in column "Word," and extract only the features given in column "Features" (WordNet, polarity, or all). The majority baseline is 80%, corresponding to always selecting the "every" interpretation. Accuracies shown in boldface are significantly better than the majority-class baseline using a paired t-test. (In all cases where the difference is significant, we obtain p < 0.01.)

We first consider the results using features extracted only for the noun, adjective, or verb individually, using all features. The best accuracy in this group of experiments, 87%, is achieved using the verb features, and is significantly higher than the majority baseline. On the other hand, the classifiers trained on the noun and adjective features individually perform no better than the baseline. These results support our hypothesis that lexical semantic properties of the component verb in the No X is too Y to Z construction do indeed play an important role in determining its interpretation. Although we proposed that selectional constraints from the verb would also lead to differing semantics of the nouns and adjectives in the two interpretations, our WordNet features are likely too simplistic to capture this effect, if it does hold. Before ruling out the semantic contribution of these words to the interpretation, we need to explore whether a more sophisticated model of selectional preferences, as in Ciaramita and Johnson (2000) or Clark and Weir (2002), yields more informative features for the noun and adjective.

Word        Features   % accuracy
Noun        All        80
Adjective   All        80
Verb        All        87
All         WordNet    88
All         Polarity   80
All         All        88
Majority baseline      80

Table 3: % accuracy on testing data for each experimental condition and the majority baseline. Accuracies in boldface are statistically significantly different from the baseline.

We now consider the results using the WordNet and polarity features individually, but extracted for all three component words. The WordNet features perform as well as the best results using all features for all three words, which gives further support to our hypothesis that the semantics of the components of the target construction are related to its interpretation. The polarity features perform poorly. This is perhaps unsurprising, as polarity is a poor approximation to the property of negativity that we are attempting to capture. Moreover, many of the nouns, adjectives, and verbs in our dataset either have neutral polarity or are not in the polarity lexicon, and therefore the polarity features are not very discriminative. In future work, we plan to examine the WordNet classes of the verbs that occur in the "no" interpretation to try to more precisely characterize the property of negativity that these verbs tend to have.

    6.3 Error analysis

To better understand the errors our classifier is making, we examine the specific instances which are classified incorrectly. Here we focus on the experiment using all features for all three component words. There are 23 instances which are consistently misclassified in all runs of the experiment. According to the AMT judgements, each of these instances corresponds to the "no" interpretation. These errors reflect the bias of the classifier towards the more frequent class, the "every" interpretation.

We further note that two of the instances discussed in Section 4, examples (4) and (6), are among those instances consistently classified incorrectly. The majority judgement from AMT for both of these instances is the "no" interpretation, while in our assessment they are in fact the "every" interpretation. We are therefore not surprised to see these items misclassified as "every."

Example (8) was incorrectly classified in one trial. In this case we agree with the gold-standard label obtained from AMT in judging this instance as the "every" interpretation; nevertheless, this does appear to be a difficult instance, given the low agreement observed for the AMT judgements.

It is interesting that no items with an "every" interpretation are consistently misclassified. In the context of our overall results showing the impact of the verb features on performance, we conclude that the "no" interpretation arises due to particular lexical semantic properties of certain verbs. We suspect then that the consistent errors on the 21 truly misclassified expressions (23 minus the 2 instances discussed above that we believe to be annotated incorrectly) are due to sparse data. That is, if it is indeed the verb that plays a major role in leading to a "no" interpretation, there may simply be insufficient numbers of such verbs for training a supervised model in a dataset with only 39 examples of those usages.

    7 Discussion

We have presented the first computational study of the semantically and pragmatically complex construction No X is too Y to Z. We have developed a computational model that automatically disambiguates the construction with an accuracy of 88%, reducing the error rate over the majority baseline from 20% to 12%, a relative reduction of 40%. The model uses features that tap into the lexical semantics of the component words participating in the construction, particularly the verb. These results demonstrate that lexical properties can be successful in resolving an ambiguity previously thought to depend on complex pragmatic inference over presuppositions (as in Wason and Reich (1979)).

These results can be usefully situated within the context of linguistic and psycholinguistic work on semantic interpretation processing. Beginning around 20 years ago, work in modeling human semantic preferences has focused on the extent to which properties of lexical items influence the interpretation of various linguistic ambiguities (e.g., Trueswell and Tanenhaus, 1994). While semantic context and plausibility are also proposed to play a role in human interpretation of ambiguous sentences (e.g., Crain and Steedman, 1985; Altmann and Steedman, 1988), it has been pointed out that it would be difficult to operationalize the complex interactions of presuppositional factors with real-world knowledge in a precise algorithm for disambiguation (Jurafsky, 1996). Although not intended as proposing a cognitive model, the work here can be seen as connected to these lines of research, in investigating the extent to which lexical factors can be used as proxies to more "hidden" features that underlie the appropriate interpretation of a pragmatically complex construction.

Moreover, as in the approach of Jurafsky (1996), the phenomenon we investigate here may be best considered within a constructional analysis (e.g., Langacker, 1987), in which both the syntactic construction and the particular lexical items contribute to the determination of the meaning of a usage. We suggest that a clause of the form No X is too Y to Z might be the (identical) surface expression of two underlying constructions, one with the "every" interpretation and one with the "no" interpretation, which place differing constraints on the semantics of the verb. (E.g., in the "no" interpretation, the verb typically has some negative semantic property, as noted in Section 2.) Looked at from the other perspective, the lexical semantic properties of the verb might determine which No X is too Y to Z construction (and associated interpretation) it is compatible with. Our results support this view, by showing that semantic classes of verbs have predictive value in selecting the correct interpretation.

Note that such a constructional analysis of this phenomenon assumes that both interpretations of these sentences are linguistically valid, given the appropriate lexical instantiation. This stands in contrast to the analysis of Wason and Reich (1979), which presumes that people are applying some higher-level reasoning to "correct" an ill-formed statement in the case of the "no" interpretation. While such extra-grammatical inference may play a role in support of language understanding when people are faced with noisy data, it seems unlikely to us that a construction that is used quite readily and with a predictable interpretation is nonsensical according to rules of grammar. Our results point to an alternative linguistic analysis, one whose further development may also help to improve automatic disambiguation of instances of No X is too Y to Z. In the next section, we discuss directions for future work that could elaborate on these preliminary findings.

    8 Future Work

One limitation of this study is that the dataset used is rather small, consisting of just 199 instances of the target construction. As discussed in Section 3.1, the extraction process we use to obtain our experimental items has low recall; in particular, it misses variants of the target construction such as Nothing is too Y to Z and No X is too Y for Z. In the future we intend to expand our dataset by extracting such usages. Furthermore, the data used in the present study is primarily taken from news text. While we do not adopt the view of some that usages of the target construction having the "no" interpretation are errors, it could be the case that such usages are more frequent in less formal text. In the future we also intend to extract usages of the target construction from datasets of less formal text, such as blogs (e.g., Burton et al., 2009).

    Constructions other than No X is too Y to Z exhibit a similar ambiguity. For example, the construction X didn't wait to Y is ambiguous between "X did Y right away" and "X didn't do Y at all" (Karttunen, 2007). In the future we would like to extend our study to consider more such constructions which are ambiguous due to the interpretation of negation.

    In Section 4 we note that for some instances the complexity of the sentences containing the target construction may make it difficult for the annotators to judge the meaning of the target. In the future we intend to present simplified versions of these sentences (which retain the noun, adjective, and verb from the target construction in the original sentence) to the judges to avoid this issue. Such an approach will also help us to focus more clearly on observable lexical semantic effects.

    We are particularly interested in further exploring the hypothesis that it is the semantics of the component verb that gives rise to the meaning of the target construction. Recall Pullum's (2004) observation that the verb in the "no" interpretation involves explicitly not acting. Using this intuition, we have informally observed that it is largely possible to (manually) predict the interpretation of the target construction knowing only the component verb. We are interested in establishing the extent to which this observation holds, and precisely which aspects of a verb's meaning give rise to the interpretation of the target construction.

    Our current model of the semantics of the target construction does not capture Wason and Reich's (1979) observation that the compositional meaning of instances having the "no" interpretation is non-pragmatic. While we do not adopt their view that these usages are somehow "errors", we do think that their observation can indicate other possible lexical semantic properties that may help to identify the correct interpretation. Taking the classic example from Wason and Reich, no head injury is too trivial to ignore, one clue to the "no" interpretation is that generally a head injury is not something that is ignored. On the other hand, considering Wason and Reich's example no missile is too small to ban, it is widely believed that missiles should be banned. We would like to add features that capture this knowledge to our model.

    In preliminary experiments we have used co-occurrence information as an approximation to this knowledge. (For example, we would expect that head injury would tend to co-occur less with ignore than with antonymous verbs such as treat or address.) Although our early results using co-occurrence features do not indicate that they are an improvement over the other features considered (WordNet and polarity), it may also be the case that our present formulation of these co-occurrence features does not effectively capture the intended knowledge. In the future we plan to further consider such features, especially those that model the selectional preferences of the verb participating in the target construction.
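    One simple way such a co-occurrence feature could be operationalized is as pointwise mutual information (PMI) between the construction's noun and candidate verbs, estimated from corpus counts. The sketch below is an assumption about one possible formulation, not the feature actually used in the paper's experiments, and the counts are toy values rather than real corpus data.

```python
from collections import Counter
import math

def make_pmi(pair_counts, noun_counts, verb_counts, total):
    """Build a PMI scorer from co-occurrence counts:
    PMI(n, v) = log2( p(n, v) / (p(n) * p(v)) )."""
    def score(noun, verb):
        joint = pair_counts[(noun, verb)]
        if joint == 0:
            return float("-inf")  # the pair never co-occurs
        return math.log2(
            (joint / total)
            / ((noun_counts[noun] / total) * (verb_counts[verb] / total))
        )
    return score

# Toy counts, purely illustrative (not drawn from the paper's data):
pairs = Counter({("head injury", "treat"): 8, ("head injury", "ignore"): 1})
nouns = Counter({"head injury": 9})
verbs = Counter({"treat": 20, "ignore": 30})
pmi = make_pmi(pairs, nouns, verbs, total=1000)
```

    Under these toy counts, `pmi("head injury", "treat")` exceeds `pmi("head injury", "ignore")`, matching the intuition that head injuries are generally treated rather than ignored.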

    These several strands of future work (increasing the size of the dataset, improving the quality of annotation, and exploring additional features in our computational model) will enable us to extend our linguistic analysis of this interesting phenomenon, as well as to improve performance on automatic disambiguation of this complex construction.


    Acknowledgments

    We thank Magali Boizot-Roche and Timothy Fowler for their help in preparing the data for this study. This research was financially supported by the Natural Sciences and Engineering Research Council of Canada and the University of Toronto.


    References

    Steven Abney. 1991. Parsing by chunks. In Robert Berwick, Steven Abney, and Carol Tenny, editors, Principle-Based Parsing: Computation and Psycholinguistics, pages 257–278. Kluwer Academic Publishers.

    Gerry T. M. Altmann and Mark Steedman. 1988. Interaction with context during human sentence processing. Cognition, 30(3):191–238.

    Steven Bird, Edward Loper, and Ewan Klein. 2009. Natural Language Processing with Python. O'Reilly Media Inc.

    Lou Burnard. 2000. The British National Corpus Users' Reference Guide. Oxford University Computing Services.

    Kevin Burton, Akshay Java, and Ian Soboroff.2009. The ICWSM 2009 Spinn3r Dataset. InProc. of the Third International Conference onWeblogs and Social Media. San Jose, CA.

    Chih-Chung Chang and Chih-Jen Lin. 2001. LIB-SVM: a library for support vector machines.Software available at

    Massimiliano Ciaramita and Mark Johnson. 2000. Explaining away ambiguity: Learning verb selectional preference with Bayesian networks. In Proceedings of the 18th International Conference on Computational Linguistics (COLING 2000), pages 187–193. Saarbrücken, Germany.

    Stephen Clark and David Weir. 2002. Class-based probability estimation using a semantic hierarchy. Computational Linguistics, 28(2):187–206.

    Michael Collins. 2003. Head-driven statistical models for natural language parsing. Computational Linguistics, 29(4):589–637.

    Stephen Crain and Mark Steedman. 1985. On not being led up the garden path: The use of context by the psychological syntax processor. In David R. Dowty, Lauri Karttunen, and Arnold M. Zwicky, editors, Natural language parsing: Psychological, computational, and theoretical perspectives, pages 320–358. Cambridge University Press, Cambridge.

    James Curran, Stephen Clark, and Johan Bos. 2007. Linguistically motivated large-scale NLP with C&C and Boxer. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume: Proceedings of the Demo and Poster Sessions, pages 33–36. Prague, Czech Republic.

    Christiane Fellbaum, editor. 1998. WordNet: An Electronic Lexical Database. Bradford Books.

    Daniel Jurafsky. 1996. A probabilistic model of lexical and syntactic access and disambiguation. Cognitive Science, 20(2):137–194.

    Lauri Karttunen. 2007. Word play. Computational Linguistics, 33(4):443–467.

    Ronald W. Langacker. 1987. Foundations ofCognitive Grammar: Theoretical Prerequisites,volume 1. Stanford University Press, Stanford.

    Mark Liberman. 2009a. No detail too small.Retrieved 9 February 2010 from

    Mark Liberman. 2009b. No wug is toodax to be zonged. Retrieved 9 February2010 from

    Geoffrey K. Pullum. 2004. Too complex toavoid judgment? Retrieved 7 April 2010 from

    Evan Sandhaus. 2008. The New York Times An-notated Corpus. Linguistic Data Consortium,Philadelphia, PA.

    Rion Snow, Brendan O'Connor, Daniel Jurafsky, and Andrew Y. Ng. 2008. Cheap and fast – but is it good? Evaluating non-expert annotations for natural language tasks. In Proceedings of EMNLP-2008, pages 254–263. Honolulu, HI.

    John Trueswell and Michael J. Tanenhaus. 1994. Toward a lexicalist framework for constraint-based syntactic ambiguity resolution. In Charles Clifton, Lyn Frazier, and Keith Rayner, editors, Perspectives on Sentence Processing, pages 155–179. Lawrence Erlbaum, Hillsdale, NJ.

    Peter Wason and Shuli Reich. 1979. A verbal illusion. The Quarterly Journal of Experimental Psychology, 31(4):591–597.

    Theresa Wilson, Janyce Wiebe, and Paul Hoffmann. 2005. Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of HLT/EMNLP-2005, pages 347–354. Vancouver, Canada.