1 From WordNet, to EuroWordNet, to the Global Wordnet Grid: anchoring languages to universal meaning Piek Vossen VU University Amsterdam.

  • Published on
    27-Mar-2015

Transcript

  • Slide 1

From WordNet, to EuroWordNet, to the Global Wordnet Grid: anchoring languages to universal meaning
Piek Vossen
VU University Amsterdam

  • Slide 2

What kind of resource is wordnet?
    Most used database in language technology
    Enormous impact in language technology development
    Large
    Free and downloadable
    English

  • Slide 3

WordNet
http://wordnet.princeton.edu/
    Developed by George Miller and his team at Princeton University, as the implementation of a mental model of the lexicon
    Organized around the notion of a synset: a set of synonyms in a language that represent a single concept
    Semantic relations between concepts
    Covers over 117,000 concepts and over 150,000 English words

  • Slide 4

Relational model of meaning
    man, woman, boy, girl, cat, kitten, dog, puppy, animal
    man, woman, boy, meisje (Dutch for "girl"), cat, kitten, dog, puppy, animal

  • Slide 5

Wordnet: a network of semantically related words
    {conveyance; transport}
    {vehicle}
    {motor vehicle; automotive vehicle}
    {car; auto; automobile; machine; motorcar}
    {bumper}
    {car door}
    {car window}
    {car mirror}
    {armrest}
    {doorlock}
    {hinge; flexible joint}
    {cruiser; squad car; patrol car; police car; prowl car}
    {cab; taxi; hack; taxicab}

  • Slide 6

  • Slide 8

Some observations on Wordnet
    synsets are more compact representations for concepts than word meanings in traditional lexicons
    synonyms and hypernyms are substitutional variants:
        begin - commence
        I once had a canary. The bird got sick. The poor animal died.
    hyponymy and meronymy chains are important transitive relations for predicting properties and explaining textual properties:
        object -> artifact -> vehicle -> 4-wheeled vehicle -> car
    strict separation of part of speech although concepts are closely related (bed - sleep) and are similar (dead - death)
    lexicalization patterns reveal important mental structures

  • Slide 9

Lexicalization patterns
    25 unique beginners
    basic level concepts
    balance of two principles: predict most features; apply to most subclasses
    where most concepts are created
    amalgamate most parts
    most abstract level to draw a picture
    Diagram labels: garbage, tree, organism, animal, bird, canary, church, building, artifact, object, plant, flower, rose, waste, threat, entity, common canary, abbey, crocodile, dog

  • Slide 10

Wordnet top level

  • Slide 11

Meronymy & pictures
    beak, tail, leg

  • Slide 12

Meronymy & pictures

  • Slide 13

Co-reference constraint in wordnet: Cats cannot be a kind of cats
    S: (n) cat, true cat (feline mammal usually having thick soft fur and no ability to roar: domestic cats; wildcats)
    S: (n) guy, cat, hombre, bozo (an informal term for a youth or man) "a nice guy"; "the guy's only doing it for some doll"
    S: (n) cat (a spiteful woman gossip) "what a cat she is!"
    S: (n) kat, khat, qat, quat, cat, Arabian tea, African tea (the leaves of the shrub Catha edulis which are chewed like tobacco or used to make tea; has the effect of a euphoric stimulant) "in Yemen kat is used daily by 85% of adults"
    S: (n) cat-o'-nine-tails, cat (a whip with nine knotted cords) "British sailors feared the cat"
    S: (n) Caterpillar, cat (a large tracked vehicle that is propelled by two endless metal belts; frequently used for moving earth in construction and farm work)
    S: (n) big cat, cat (any of several large cats typically able to roar and living in the wild)
    S: (n) computerized tomography, computed tomography, CT, computerized axial tomography, computed axial tomography, CAT (a method of examining body organs by scanning them with X rays and using a computer to construct a series of cross-sectional scans along a single axis)
    S: (n) domestic cat, house cat, Felis domesticus, Felis catus (any domesticated member of the genus Felis)

  • Slide 14

  • Slide 15

Wordnet 3.0 statistics
    POS          Unique Strings    Synsets    Total Word-Sense Pairs
    Noun         117,798           82,115     146,312
    Verb         11,529            13,767     25,047
    Adjective    21,479            18,156     30,002
    Adverb       4,481             3,621      5,580
    Totals       155,287           117,659    206,941

  • Slide 16

Wordnet 3.0 statistics
    POS          Monosemous Words and Senses    Polysemous Words    Polysemous Senses
    Noun         101,863                        15,935              44,449
    Verb         6,277                          5,252               18,770
    Adjective    16,503                         4,976               14,399
    Adverb       3,748                          733                 1,832
    Totals       128,391                        26,896              79,450

  • Slide 17

Wordnet 3.0 statistics
    POS          Average Polysemy Including Monosemous Words    Average Polysemy Excluding Monosemous Words
    Noun         1.24                                           2.79
    Verb         2.17                                           3.57
    Adjective    1.40                                           2.71
    Adverb       1.25                                           2.50

  • Slide 18

http://www.visuwords.com

  • Slide 19

  • Slide 20

Usage of Wordnet
    Improve recall of textual based analysis: Query -> Index
        Synonyms: commence - begin
        Hypernyms: taxi -> car
        Hyponyms: car -> taxi
        Meronyms: trunk -> elephant
        Lexical entailments: gun -> shoot
    Inferencing: what things can burn?
    Expression in language generation and translation: alternative words and paraphrases

  • Slide 21

Improve recall
    Information retrieval: small databases without redundancy, e.g. image captions, video text
    Text classification: small training sets
    Question & Answer systems: query analysis: who, whom, where, what, when

  • Slide 22

Improve recall
    Anaphora resolution:
        The girl fell off the table. She....
        The glass fell off the table. It...
    Coreference resolution:
        When he moved the furniture, the antique table got damaged.
    Information extraction (unstructured text to structured databases):
        generic forms or patterns "vehicle" -> text with specific cases "car"

  • Slide 23

Improve recall
    Summarizers:
        Sentence selection based on word counts -> concept counts
        Avoid repetition in summary -> language generation
    Limited inferencing: detect locations, organisations, etc.

  • Slide 24

Many others
    Data sparseness for machine learning: hapaxes can be replaced by semantic classes
    Use redundancy for more robustness: spelling correction and speech recognition can build semantic expectations using Wordnet and make better choices
    Sentiment and opinion mining
    Natural language learning

  • Slide 25

Recall & Precision
    query: cell phone
    recall = intersection / relevant
    precision = intersection / found
    Recall < 20% for basic search engines! (Blair & Maron 1985)
    Diagram labels: found, intersection, relevant; mobile phones, nerve cell, police cell, jail, neuron

  • Slide 26

EuroWordNet
    The development of a multilingual database with wordnets for several European languages
    Funded by the European Commission, DG XIII, Luxembourg as projects LE2-4003 and LE4-8328
    March 1996 - September 1999
    2.5 Million EURO.
    http://www.hum.uva.nl/~ewn
    http://www.illc.uva.nl/EuroWordNet/finalresults-ewn.html

  • Slide 27

EuroWordNet
    Languages covered:
        EuroWordNet-1 (LE2-4003): English, Dutch, Spanish, Italian
        EuroWordNet-2 (LE4-8328): German, French, Czech, Estonian.
    Size of vocabulary:
        EuroWordNet-1: 30,000 concepts - 50,000 word meanings.
        EuroWordNet-2: 15,000 concepts - 25,000 word meanings.
    Type of vocabulary:
        the most frequent words of the languages
        all concepts needed to relate more specific concepts

  • Slide 28

EuroWordNet Model
    I = Language Independent link
    II = Link from Language Specific to Inter-Lingual-Index
    III = Language Dependent Link
    Inter-Lingual-Index: ILI-record {drive}
    Ontology: 2OrderEntity, Location, Dynamic
    Domains: Traffic, Air, Road
    Lexical Items Table (Italian): cavalcare, andare, muoversi, guidare
    Lexical Items Table (Dutch): bewegen, gaan, rijden, berijden
    Lexical Items Table (English): drive, ride, move, go
    Lexical Items Table (Spanish): cabalgar, jinetear, conducir, mover, transitar

  • Slide 29

EuroWordNet Design
    Inter-Lingual-Index (ENGLISH): Vehicle, Car, Train
    Domains: Transport, Road, Air, Water
    DOLCE, SUMO: Object, Device, TransportDevice
    English Words: vehicle, car, train
    Czech Words: dopravní prostředník, auto, vlak
    French Words: véhicule, voiture, train
    Estonian Words: liiklusvahend, auto, killavoor
    German Words: Fahrzeug, Auto, Zug
    Spanish Words: vehículo, auto, tren
    Italian Words: veicolo, auto, treno
    Dutch Words: voertuig, auto, trein

  • Slide 30

Differences in relations between EuroWordNet and WordNet
    Added features to relations
    Cross-Part-Of-Speech relations
    New relations to differentiate shallow hierarchies
    New interpretations of relations

  • Slide 31

EWN Relationship Labels
Disjunction/Conjunction of multiple relations of the same type
WordNet1.5
    door 1 -- (a swinging or sliding barrier that will close the entrance to a room or building; "he knocked on the door"; "he slammed the door as he left")
        PART OF: doorway, door, entree, entry, portal, room access
    door 6 -- (a swinging or sliding barrier that will close off access into a car; "she forgot to lock the doors of her car")
        PART OF: car, auto, automobile, machine, motorcar.
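The EuroWordNet design on slides 28 and 29 links every language-specific word to a language-neutral Inter-Lingual-Index record rather than directly to words in other languages. A minimal Python sketch of that idea follows, using a subset of the words from slide 29. The `ili:` identifiers, the dictionary layout and the function name `translate_via_ili` are assumptions made for illustration, not part of the EuroWordNet database format.

```python
# Hypothetical miniature of the design on slide 29: each word points to an
# ILI record, and cross-language equivalents are found through that record.
# The "ili:..." identifiers are invented labels for this sketch.
ILI_LINKS = {
    "English": {"vehicle": "ili:vehicle", "car": "ili:car", "train": "ili:train"},
    "Dutch": {"voertuig": "ili:vehicle", "auto": "ili:car", "trein": "ili:train"},
    "German": {"Fahrzeug": "ili:vehicle", "Auto": "ili:car", "Zug": "ili:train"},
    "Italian": {"veicolo": "ili:vehicle", "auto": "ili:car", "treno": "ili:train"},
}


def translate_via_ili(word, source_lang, target_lang):
    """Return target-language words that share an ILI record with `word`."""
    record = ILI_LINKS[source_lang].get(word)
    if record is None:
        return []  # the word has no link into the index
    return sorted(
        target_word
        for target_word, target_record in ILI_LINKS[target_lang].items()
        if target_record == record
    )


print(translate_via_ili("trein", "Dutch", "German"))
print(translate_via_ili("car", "English", "Italian"))
```

Because every language only needs links to the index, adding a ninth language in this sketch means adding one more dictionary, not one mapping per existing language.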
  • Slide 32

EWN Relationship Labels
    {airplane}    HAS_MERO_PART: conj1          {door}
                  HAS_MERO_PART: conj2 disj1    {jet engine}
                  HAS_MERO_PART: conj2 disj2    {propeller}
    {door}        HAS_HOLO_PART: disj1          {car}
                  HAS_HOLO_PART: disj2          {room}
                  HAS_HOLO_PART: disj3          {entrance}
    {dog}         HAS_HYPERONYM: conj1          {mammal}
                  HAS_HYPERONYM: conj2          {pet}
    {albino}      HAS_HYPERONYM: disj1          {plant}
                  HAS_HYPERONYM: disj2          {animal}
Default Interpretation: non-exclusive disjunction

  • Slide 33

EWN Relationship Labels
Factive/Non-factive CAUSES (Lyons 1977)
    factive (default interpretation):
        to kill causes to die: {kill} CAUSES {die}
    non-factive: E1 probably or likely causes event E2, or E1 is intended to cause some event E2:
        to search may cause to find: {search} CAUSES {find} non-factive

  • Slide 34

Cross-Part-Of-Speech relations
    WordNet1.5: nouns and verbs are not interrelated by basic semantic relations such as hyponymy and synonymy:
        adornment 2 -> change of state -- (the act of changing something)
        adorn 1 -> change, alter -- (cause to change; make different)
    EuroWordNet: words of different parts of speech can be inter-linked with explicit xpos-synonymy, xpos-antonymy and xpos-hyponymy relations:
        {adorn V} XPOS_NEAR_SYNONYM {adornment N}
        {size N} XPOS_NEAR_HYPONYM {tall A} {short A}

  • Slide 35

Role relations
In the case of many verbs and nouns the most salient relation is not the hyperonym but the relation between the event and the involved participants. These relations are expressed as follows:
    {knife} ROLE_INSTRUMENT {to cut}
    {to cut} INVOLVED_INSTRUMENT {knife} reversed
    {school} ROLE_LOCATION {to teach}
    {to teach} INVOLVED_LOCATION {school} reversed
These relations are typically used when other relations, mainly hyponymy, do not clarify the position of the concept network, but the word is still closely related to another word.
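The relation labels on slide 32 and the reversed role relations on slide 35 can be sketched as a small data structure. This is an illustrative Python sketch only: the `Relation` class, the `satisfies` and `reverse` helpers, and the reading that relations with the same `conj` index are alternatives while different `conj` indices must all hold are assumptions made here, not the EuroWordNet implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Relation:
    source: str
    rel_type: str
    target: str
    conj: Optional[int] = None  # conjunction index, e.g. conj2
    disj: Optional[int] = None  # disjunction index, e.g. disj1


# The {airplane} example from slide 32.
AIRPLANE_PARTS = [
    Relation("airplane", "HAS_MERO_PART", "door", conj=1),
    Relation("airplane", "HAS_MERO_PART", "jet engine", conj=2, disj=1),
    Relation("airplane", "HAS_MERO_PART", "propeller", conj=2, disj=2),
]


def satisfies(relations, present):
    """Assumed reading: every conj group needs at least one present target.

    Members of one group are non-exclusive alternatives, so having both a
    jet engine and a propeller is also accepted.
    """
    groups = {}
    for rel in relations:
        groups.setdefault(rel.conj, []).append(rel.target)
    return all(any(t in present for t in targets) for targets in groups.values())


# Role relations and their reversed counterparts from slide 35.
REVERSE = {
    "ROLE_INSTRUMENT": "INVOLVED_INSTRUMENT",
    "ROLE_LOCATION": "INVOLVED_LOCATION",
}
REVERSE.update({inverse: role for role, inverse in list(REVERSE.items())})


def reverse(rel):
    """Build the reversed relation, e.g. knife/to cut becomes to cut/knife."""
    return Relation(rel.target, REVERSE[rel.rel_type], rel.source)


print(satisfies(AIRPLANE_PARTS, {"door", "propeller"}))
print(reverse(Relation("knife", "ROLE_INSTRUMENT", "to cut")))
```

Storing the labels as optional fields keeps unlabelled relations, such as the {door} HAS_HOLO_PART alternatives, in a single default group, which matches the default interpretation of non-exclusive disjunction stated on slide 32.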
  • Slide 36

Co_Role relations
    guitar player    HAS_HYPERONYM            player
                     CO_AGENT_INSTRUMENT      guitar
    player           HAS_HYPERONYM            person
                     ROLE_AGENT               to play music
                     CO_AGENT_INSTRUMENT      musical instrument
    to play music    HAS_HYPERONYM            to make
                     ROLE_INSTRUMENT          musical instrument
    guitar           HAS_HYPERONYM            musical instrument
                     CO_INSTRUMENT_AGENT      guitar player
    ice saw          HAS_HYPERONYM            saw
                     CO_INSTRUMENT_PATIENT    ice
    saw              HAS_HYPERONYM            saw
                     ROLE_INSTRUMENT          to saw
    ice              CO_PATIENT_INSTRUMENT    ice saw (REVERSED)

  • Slide 37

Co_Role relations
Examples of the other relations are:
    criminal               CO_AGENT_PATIENT        victim
    novel writer / poet    CO_AGENT_RESULT         novel / poem
    dough                  CO_PATIENT_RESULT       pastry / bread
    photographic camera    CO_INSTRUMENT_RESULT    photo

  • Slide 38

Overview of the Language Internal relations in EuroWordnet
Same Part of Speech relations:
    NEAR_SYNONYMY                apparatus - machine
    HYPERONYMY/HYPONYMY          car - vehicle
    ANTONYMY                     open - close
    HOLONYMY/MERONYMY            head - nose
Cross-Part-of-Speech relations:
    XPOS_NEAR_SYNONYMY           dead - death; to adorn - adornment
    XPOS_HYPERONYMY/HYPONYMY     to love - emotion
    XPOS_ANTONYMY                to live - dead
    CAUSE                        die - death
    SUBEVENT                     buy - pay; sleep - snore
    ROLE/INVOLVED                write - pencil; hammer - hammer
    STATE                        the poor - poor
    MANNER                       to slurp - noisily
    BELONG_TO_CLASS              Rome - city

  • Slide 39

Horizontal & vertical semantic relations
    Diagram labels: chronic patient; mental patient; patient; HYPONYM; PROCEDURE; LOCATION; STATE; CAUSE; cure; PATIENT; treat; doctor; disease; disorder; physiotherapy; medicine, etc.; hospital, etc.; stomach disease, kidney disorder; PATIENT; AGENT; child doctor; child; co-AGENT-PATIENT; HYPONYM

  • Slide 40

The Multilingual Design
    Inter-Lingual-Index: unstructured fund of concepts to provide an efficient mapping across the languages;
    Index-records are mainly based on WordNet synsets and consist of synonyms, glosses and source references;
    Various types of complex equivalence relations are distinguished;
    Equivalence relations from synsets to index records: not on a word-to-word basis;
    Indirect matching of synsets linked to the same index items;

  • Slide 41

Equivalent Near Synonym
1. Multiple Targets (1:many)
    Dutch wordnet: schoonmaken (to clean) matches with 4 senses of clean in WordNet1.5:
        make clean by removing dirt, filth, or unwanted substances from
        remove unwanted substances from, such as feathers or pits, as of chickens or fruit
        remove in making clean; "Clean the spots off the rug"
        remove unwanted substances from - (as in chemistry)
2. Multiple Sources (many:1)
    Dutch wordnet: versiersel near_synonym versiering
    ILI-Record: decoration.
3. Multiple Targets and Sources (many:many)
    Dutch wordnet: toestel near_synonym apparaat
    ILI-records: machine; device; apparatus; tool

  • Slide 42

Equivalent Hyperonymy
Typically used for gaps in English WordNet:
    genuine, cultural gaps for things not known in English culture:
        Dutch: klunen, to walk on skates over land from one frozen water to the other
    pragmatic, in the sense that the concept is known but is not expressed by a single lexicalized form in English:
        Dutch: kunststof = artifact substance, artifact object

  • Slide 43

Equivalent Hyponymy
has_eq_hyponym
Used when wordnet1.5 only provides more narrow terms. In this case there can only be a pragmatic difference, not a genuine cultural gap, e.g.:
    Spanish dedo = either finger or toe.
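The three mapping shapes listed on slide 41 (1:many, many:1, many:many) can be derived mechanically from a list of equivalence links. The following Python sketch uses the Dutch examples from that slide. The link list, the function name `mapping_shape` and the sense labels `clean#1` to `clean#4` are hypothetical, since the slide only says that four senses of "clean" are matched without numbering them.

```python
# Near-synonym equivalence links from Dutch synsets to ILI records,
# following the examples on slide 41. The "clean#n" labels are invented
# placeholders for the four WordNet1.5 senses the slide describes.
EQ_NEAR_SYNONYM = [
    ("schoonmaken", "clean#1"),
    ("schoonmaken", "clean#2"),
    ("schoonmaken", "clean#3"),
    ("schoonmaken", "clean#4"),
    ("versiersel", "decoration"),
    ("versiering", "decoration"),
]
# toestel and apparaat are each linked to all four ILI records.
for dutch in ("toestel", "apparaat"):
    for ili in ("machine", "device", "apparatus", "tool"):
        EQ_NEAR_SYNONYM.append((dutch, ili))


def mapping_shape(links, sources):
    """Classify the mapping for a group of near-synonymous source synsets."""
    targets = {ili for synset, ili in links if synset in sources}
    left = "many" if len(sources) > 1 else "1"
    right = "many" if len(targets) > 1 else "1"
    return f"{left}:{right}"


print(mapping_shape(EQ_NEAR_SYNONYM, {"schoonmaken"}))
print(mapping_shape(EQ_NEAR_SYNONYM, {"versiersel", "versiering"}))
print(mapping_shape(EQ_NEAR_SYNONYM, {"toestel", "apparaat"}))
```

Keeping the links as plain (synset, ILI record) pairs reflects the point on slide 40 that equivalence is stated from synsets to index records and not on a word-to-word basis.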
  • Slide 44

Complex mappings across languages
    { toe : part of foot }
    { finger : part of hand }
    { dedo, dito : finger or toe }
    { head : part of body }
    { hoofd : human head }
    { kop : animal head }
    Diagram labels: EN-Net: toe, finger, head; IT-Net: dito; ES-Net: dedo; NL-Net: hoofd, kop
    Legend: normal equivalence; eq_has_hyponym; eq_has_hyperonym

  • Slide 45

Typical gaps in the (English) ILI
Dutch:
    doodschoppen (to kick to death): eq_hyperonym {kill}V and to {kick}V
    aardig (Adjective, to like): eq_near_synonym {like}V
    cassière (female cashier): eq_hyperonym {cashier}, {woman}
    kunstproduct (artifact substance): eq_hyperonym {artifact} and to {product}
Spanish:
    alevín (young fish): eq_hyperonym {fish} and eq_be_in_state {young}
    cajera (female cashier): eq_hyperonym {cashier}, {woman}

  • Slide 46

Wordnets as semantic structures
Wordnets are unique language-specific structures:
    different lexicalizations
    differences in synonymy and homonymy
    different relations between synsets
    same organizational principles: s...
