Piek Vossen VU University Amsterdam

  • Published on
    11-Feb-2016

DESCRIPTION

From WordNet, to EuroWordNet, to the Global Wordnet Grid: anchoring languages to universal meaning. Piek Vossen, VU University Amsterdam. Overview: Wordnet, EuroWordNet; Global Wordnet Grid; Stevin project Cornetto; 7th Framework project KYOTO. WordNet: http://wordnet.princeton.edu/

Transcript

From WordNet, to EuroWordNet, to the Global Wordnet Grid: anchoring languages to universal meaning
Piek Vossen
VU University Amsterdam
Guest lecture, Language Engineering Applications, February 26th, 2009, Leuven

Overview
• Wordnet, EuroWordNet
• Global Wordnet Grid
• Stevin project Cornetto
• 7th Framework project KYOTO

WordNet
http://wordnet.princeton.edu/
• Lexical semantic database for English
• Developed by George Miller and his team at Princeton University as the implementation of a mental model of the lexicon
• Organized around the notion of a synset: a set of synonyms in a language that represent a single concept
• Semantic relations hold between concepts (synsets), not between words
• Currently covers over 117,000 concepts (synsets) and over 150,000 English words

Relational model of meaning
[Figure: network of the words man, woman, boy, girl, cat, kitten, dog, puppy and animal]

Wordnet: a network of semantically related words
[Figure: the synset {car; auto; automobile; machine; motorcar} with its hyper(o)nym, hyponym and meronym links]
Hyponymy and meronymy relations are transitive and directed.

Wordnet Semantic Relations
WN 1.5 starting point
The synset as a weak notion of synonymy: "two expressions are synonymous in a linguistic context C if the substitution of one for the other in C does not alter the truth value" (Miller et al. 1993).
Relations between synsets:
Relation   | Parts of speech        | Example
HYPONYMY   | noun-to-noun           | car / vehicle
           | verb-to-verb           | walk / move
MERONYMY   | noun-to-noun           | head / nose
ANTONYMY   | adjective-to-adjective | good / bad
           | verb-to-verb           | open / close
ENTAILMENT | verb-to-verb           | buy / pay
CAUSE      | verb-to-verb           | kill / die

Wordnet Data Model
[Figure: the vocabulary of a language (bank, fiddle, violin, violist, fiddler, string) linked to concept records (rec: 12345 financial institute; rec: 54321 side of a river; rec: 9876 small string instrument; rec: 65438 musician playing violin; rec: 42654 musician; rec: 25876 string instrument; rec: 35576 string of instrument; rec: 29551 underwear), with type-of and part-of relations between the concepts, illustrating polysemy and synonymy]

Some observations on Wordnet
• Synsets are more compact representations of concepts than the word meanings in traditional lexicons
• Synonyms and hypernyms are substitutional variants: begin / commence; "I once had a canary. The bird got sick. The poor animal died."
• Hyponymy and meronymy chains are important transitive relations for predicting properties and explaining textual properties: object -> artifact -> vehicle -> 4-wheeled vehicle -> car
• Strict separation of parts of speech, although concepts can be closely related (bed / sleep) or similar (dead / death)
• Lexicalization patterns reveal important mental structures

Lexicalization patterns
[Figure: hierarchy below the 25 unique beginners with the concepts entity, object, artifact, building, church, abbey, organism, animal, bird, canary, common canary, crocodile, dog, plant, tree, flower, rose, garbage, waste, threat]
Basic level concepts balance two principles: they predict most features and they apply to most subclasses. They are also:
• the level where most concepts are created
• the level where most parts are amalgamated
• the most abstract level at which one can draw a picture

Wordnet top level
[Figure]

Meronymy & pictures
[Figure: beak]

Meronymy & pictures
[Figure]

Wordnet 3.0 statistics
POS       | Unique Strings | Synsets | Total Word-Sense Pairs
Noun      | 117,798        | 82,115  | 146,312
Verb      | 11,529         | 13,767  | 25,047
Adjective | 21,479         | 18,156  | 30,002
Adverb    | 4,481          | 3,621   | 5,580
Totals    | 155,287        | 117,659 | 206,941

Wordnet 3.0 statistics
POS       | Monosemous Words and Senses | Polysemous Words | Polysemous Senses
Noun      | 101,863                     | 15,935           | 44,449
Verb      | 6,277                       | 5,252            | 18,770
Adjective | 16,503                      | 4,976            | 14,399
Adverb    | 3,748                       | 733              | 1,832
Totals    | 128,391                     | 26,896           | 79,450

Wordnet 3.0 statistics
POS       | Average Polysemy Including Monosemous Words | Average Polysemy Excluding Monosemous Words
Noun      | 1.24                                        | 2.79
Verb      | 2.17                                        | 3.57
Adjective | 1.40                                        | 2.71
Adverb    | 1.25                                        | 2.50

http://www.visuwords.com

Usage of Wordnet
• Most widely used database in language technology
• Enormous impact on language technology development
• Large
• Free and downloadable
• English

Usage of Wordnet
Improve recall of text-based analysis (Query -> Index):
• Synonyms: commence -> begin
• Hypernyms: taxi -> car
• Hyponyms: car -> taxi
• Meronyms: trunk -> elephant
• Lexical entailments: gun -> shoot
Inferencing: what things can burn?
Expression in language generation and translation: alternative words and paraphrases

Improve recall
• Information retrieval: effective on small databases without redundancy, e.g. image captions, video text
• Text classification: expand small training sets; reduce training effort
• Question & Answer systems: question classification (who, where, what, when); match answers to question types

Improve recall
• Anaphora resolution: "The girl fell off the table. She..." versus "The glass fell off the table. It..."
• Coreference resolution: "When he moved the furniture, the antique table got damaged."
• Information extraction (unstructured text to structured databases): generic forms or patterns ("vehicle") -> text with specific cases ("car")

Improve recall
• Summarizers: sentence selection based on concept counts instead of word counts
• Avoiding repetition in a summary (language generation): pick another synonym or hypernym
• Limited inferencing: detecting locations, people, organisations, etc.

Enabling technologies
• Semantic similarity: which sentences or expressions are semantically similar?
• Semantic relatedness and textual entailment: smoke entails fire, fire entails damage
• Word-sense disambiguation
Erwin Marsi, University of Tilburg, http://daeso.uvt.nl/demos/index.html

Recall & Precision
[Figure: cellphone, mobile phones, nerve cell, neuron, police cell, jail]
recall = intersection / relevant
precision = intersection / retrieved
(intersection: the documents that are both relevant and retrieved)
Recall < 20% for basic search engines! (Blair & Maron 1985)

Many others
• Data sparseness for machine learning: hapaxes can be replaced by semantic classes that match classes from the training set
• Use redundancy for more robustness: spelling correction and speech recognition can build semantic expectations using Wordnet and make better choices
• Sentiment and opinion mining
• Natural language learning

EuroWordNet
• The development of a multilingual database with wordnets for several European languages
• Funded by the European Commission, DG XIII, Luxembourg, as projects LE2-4003 and LE4-8328
• March 1996 - September 1999
• 2.5 million euro
http://www.hum.uva.nl/~ewn
http://www.illc.uva.nl/EuroWordNet/finalresults-ewn.html

EuroWordNet
• Languages covered:
  EuroWordNet-1 (LE2-4003): English, Dutch, Spanish, Italian
  EuroWordNet-2 (LE4-8328): German, French, Czech, Estonian
• Size of vocabulary:
  EuroWordNet-1: 30,000 concepts - 50,000 word meanings
  EuroWordNet-2: 15,000 concepts - 25,000 word meanings
• Type of vocabulary: the most frequent words of the languages, plus all concepts needed to relate more specific concepts

EuroWordNet Model
[Figure: language-specific wordnets connected through the Inter-Lingual-Index]
I = Language Independent Link
II = Link from Language Specific to Inter-Lingual-Index
III = Language Dependent Link

Differences in relations between EuroWordNet and WordNet
• Added features to relations
• Cross-part-of-speech relations
• New relations to differentiate shallow hierarchies
• New interpretations of relations

EWN Relationship Labels
{airplane}
  HAS_MERO_PART: conj1 {door}
  HAS_MERO_PART: conj2 disj1 {jet engine}
  HAS_MERO_PART: conj2 disj2 {propeller}
{door}
  HAS_HOLO_PART: disj1 {car}
  HAS_HOLO_PART: disj2 {room}
  HAS_HOLO_PART: disj3 {entrance}
Default interpretation: non-exclusive disjunction

Overview of the Language Internal relations in EuroWordNet
Same Part of Speech relations:
• HYPERONYMY/HYPONYMY: car - vehicle
• ANTONYMY: open - close
• HOLONYMY/MERONYMY: head - nose
• NEAR_SYNONYMY: apparatus - machine
Cross-Part-of-Speech relations:
• XPOS_NEAR_SYNONYMY: dead - death; to adorn - adornment
• XPOS_HYPERONYMY/HYPONYMY: to love - emotion
• XPOS_ANTONYMY: to live - dead
• CAUSE: die - death
• SUBEVENT: buy - pay; sleep - snore
• ROLE/INVOLVED: write - pencil; hammer - hammer
• STATE: the poor - poor
• MANNER: to slurp - noisily
• BELONG_TO_CLASS: Rome - city

Co_Role relations
• criminal CO_AGENT_PATIENT victim
• novel writer / poet CO_AGENT_RESULT novel / poem
• dough CO_PATIENT_RESULT pastry / bread
• photographic camera CO_INSTRUMENT_RESULT photo
[Diagram: guitar player HAS_HYPERONYM player, CO_AGENT_INSTRUMENT guitar; player HAS_HYPERONYM person, ROLE_AGENT to play music, CO_AGENT_INSTRUMENT musical instrument; to play music HAS_HYPERONYM to make, ROLE_INSTRUMENT musical instrument; guitar HAS_HYPERONYM musical instrument, CO_INSTRUMENT_AGENT guitar player]

Horizontal & vertical semantic relations
[Diagram: horizontal and vertical relations around treat / cure, linking patient (hyponyms: chronic patient, mental patient), doctor (hyponym: child doctor), child, disease / disorder (hyponyms: stomach disease, kidney disorder), physiotherapy, medicine etc., and hospital etc.; relation labels: HYPONYM, -PATIENT, -AGENT, -PROCEDURE, -LOCATION, STATE-CAUSE, co-AGENT-PATIENT]

The Multilingual Design
• Inter-Lingual-Index: an unstructured fund of concepts that provides an efficient mapping across the languages
• Index records are mainly based on WordNet synsets and consist of synonyms, glosses and source references
• Various types of complex equivalence relations are distinguished
• Equivalence relations go from synsets to index records, not on a word-to-word basis
• Synsets linked to the same index items are matched indirectly

Equivalent Near Synonym
1. Multiple Targets (1:many)
   Dutch wordnet: schoonmaken (to clean) matches 4 senses of clean in WordNet 1.5:
   - make clean by removing dirt, filth, or unwanted substances from
   - remove unwanted substances from, such as feathers or pits, as of chickens or fruit
   - remove in making clean; "Clean the spots off the rug"
   - remove unwanted substances from (as in chemistry)
2. Multiple Sources (many:1)
   Dutch wordnet: versiersel near_synonym versiering
   ILI record: decoration
3. Multiple Targets and Sources (many:many)
   Dutch wordnet: toestel near_synonym apparaat
   ILI records: machine; device; apparatus; tool

Equivalent Hyperonymy
Typically used for gaps in English WordNet:
• genuine, cultural gaps for things not known in English culture:
  Dutch: klunen, to walk on skates over land from one frozen water to the other
• pragmatic gaps, where the concept is known but is not expressed by a single lexicalized form in English:
  Dutch: kunststof = artifact substance / artifact object

EuroWordNet statistics
Language    | Synsets | Senses  | Senses/synset | Entries | Senses/entry | LI rels | LI rels/synset | EQ rels to ILI | EQ rels/synset | Synsets without ILI
Dutch       | 44,015  | 70,201  | 1.59          | 56,283  | 1.25         | 111,639 | 2.54           | 53,448         | 1.21           | 7,203
Spanish     | 23,370  | 50,526  | 2.16          | 27,933  | 1.81         | 55,163  | 2.36           | 21,236         | 0.91           | 0
Italian     | 40,428  | 48,499  | 1.20          | 32,978  | 1.47         | 117,068 | 2.90           | 71,789         | 1.78           | 1,561
French      | 22,745  | 32,809  | 1.44          | 18,777  | 1.75         | 49,494  | 2.18           | 22,730         | 1.00           | 20
German      | 15,132  | 20,453  | 1.35          | 17,098  | 1.20         | 34,818  | 2.30           | 16,347         | 1.08           | 0
Czech       | 12,824  | 19,949  | 1.56          | 12,283  | 1.62         | 26,259  | 2.05           | 12,824         | 1.00           | 0
Estonian    | 7,678   | 13,839  | 1.80          | 10,961  | 1.26         | 16,318  | 2.13           | 9,004          | 1.17           | 0
English     | 16,361  | 40,588  | 2.48          | 17,320  | 2.34         | 42,140  | 2.58           | n.a.           | n.a.           | n.a.
WordNet 1.5 | 94,515  | 187,602 | 1.98          | 126,617 | 1.48         | 211,375 | 2.24           | n.a.           | n.a.           | n.a.
(LI rels = language-internal relations; EQ rels = equivalence relations to the Inter-Lingual-Index)

Wordnets as semantic structures
Wordnets are unique language-specific structures:
• same organizational principles: the same synset structure and the same set of semantic relations
• different lexicalizations
• differences in synonymy and homonymy:
  "decoration" in English versus "versiersel/versiering" in Dutch
  "bank" in English (money/river) versus "bank" in Dutch (money/furniture)
• BUT also different relations for similar synsets

Autonomous & Language-Specific
[Figure: WordNet 1.5 and Dutch Wordnet hierarchies for voorwerp {object}, lepel {spoon}, werktuig {tool}, tas {bag}, bak {box}, blok {block} and lichaam {body}]

Linguistic versus Artificial Ontologies
Artificial ontology: better control or performance, or a more compact and coherent structure.
• introduce artificial levels for concepts which are not lexicalized in a language (e.g. instrumentality, hand tool)
• neglect levels which are lexicalized but not relevant for the purpose of the ontology (e.g. tableware, silverware, merchandise)
What properties can we infer for spoons?
spoon -> container; artifact; hand tool; object; made of metal or plastic; for eating, pouring or cooking

Linguistic versus Artificial Ontologies
Linguistic ontology: exactly reflects the relations between all the lexicalized words and expressions in a language. It captures valuable information about the lexical capacity of languages: what is the available fund of words and expressions in a language?
What words can be used to name spoons?
spoon -> object, tableware, silverware, merchandise, cutlery

Wordnets versus ontologies
Wordnets: autonomous language-specific lexicalization patterns in a relational network.
Usage: to predict substitution in text for information retrieval, text generation, machine translation and word-sense disambiguation.
Ontologies: data structure with formally defined concepts.
Usage: making semantic inferences.

From EuroWordNet to Global WordNet
• EuroWordNet ended in 1999
• The Global Wordnet Association was founded in 2000 to maintain the framework: http://www.globalwordnet.org
• Currently, wordnets exist for more than 50 languages, including: Arabic, Bantu, Basque, Chinese, Bulgarian, Estonian, Hebrew, Icelandic, Japanese, Kannada, Korean, Latvian, Nepali, Persian, Romanian, Sanskrit, Tamil, Thai, Turkish, Zulu...
• Many of these languages are genetically and typologically unrelated

Global Wordnet Association
[Figure: wordnets for Danish, Norwegian, Swedish, Portuguese, Korean, Russian, Basque, Catalan, Thai, Arabic, Polish, Welsh, Chinese, 20 Indian languages, Brazilian Portuguese, Hebrew, Latvian, Persian, Kurdish, Avestan, Baluchi, Hungarian, English, German, Spanish, French, Italian, Dutch, Czech, Estonian, Romanian, Bulgarian, Turkish, Slovenian, Greek, Serbian; groupings: EuroWordNet, BalkaNet]
http://www.globalwordnet.org

Some downsides of the EuroWordNet model
• Construction is not done uniformly
• Coverage differs
• Not all wordnets can communicate with one another, i.e. they are linked to different versions of the English wordnet
• Proprietary rights restrict free access and usage
• A lot of semantics is duplicated
• Complex and obscure equivalence relations due to linguistic differences between English and other languages

Next step: Global WordNet Grid
[Figure: an Inter-Lingual Ontology (Object, Device, TransportDevice) linked to Czech words (dopravní prostředník, auto, vlak), French words (véhicule, voiture, train), Estonian words (liiklusvahend, auto, killavoor) and Dutch words (voertuig, auto, trein)]

GWNG: Main Features
• Construct separate wordnets for each Grid language
• Contributors from each language encode the same core set of concepts plus culture/language-specific ones
• Synsets (concepts) are mapped crosslinguistically via an ontology instead of just the English Wordnet

The Ontology: Main Features
• The list of concepts is not just based on the lexicon of a particular language (unlike in EuroWordNet) but uses ontological observations
• The ontology contains only upper and mid-level concepts
• Concepts are related in a type hierarchy
• Concepts are defined with axioms

The Ontology: Main Features
Minimal set of concepts (reductionist view):
• to express equivalence across languages
• to support inferencing
The ontology need not and cannot provide a concept for every concept found in the Grid languages:
• Lexicalization in a language is not sufficient to warrant inclusion in the ontology
• Lexicalization in all or many languages may be sufficient
• Ontological observations will be used to define the concepts in the ontology
The ontological framework must still be powerful enough to encode all concepts that are lexically expressed in any of the Grid languages. Additional lexicalized concepts are related to the ontology through complex relations.
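The GWNG idea of mapping synsets crosslinguistically through a shared ontology, instead of through the English Wordnet, can be sketched as a lookup via concept identifiers. This is a minimal illustration under stated assumptions. The word lists come from the Global WordNet Grid figure. The concept names `Car` and `Train`, the word-to-concept assignments, the `Object > Device > TransportDevice` ordering, and the names `GRID`, `PARENT`, `equivalents` and `ancestors` are hypothetical and do not describe the Grid's actual content.

```python
# Minimal sketch of ontology-mediated cross-lingual lookup (hypothetical data).
# Word lists follow the Global WordNet Grid slide; the concept labels "Car"
# and "Train" and all word-to-concept assignments are assumptions.
GRID = {
    "nl": {"voertuig": "TransportDevice", "auto": "Car", "trein": "Train"},
    "fr": {"véhicule": "TransportDevice", "voiture": "Car", "train": "Train"},
    "cs": {"dopravní prostředník": "TransportDevice", "auto": "Car", "vlak": "Train"},
}

# Hypothetical type hierarchy, child -> parent (the ordering is assumed).
PARENT = {
    "Car": "TransportDevice",
    "Train": "TransportDevice",
    "TransportDevice": "Device",
    "Device": "Object",
}


def equivalents(word: str, source: str, target: str) -> list[str]:
    """Words in `target` linked to the same ontology concept as `word` in `source`."""
    concept = GRID[source].get(word)
    if concept is None:
        return []  # no shared concept to go through: treated as a gap
    return sorted(w for w, c in GRID[target].items() if c == concept)


def ancestors(concept: str) -> list[str]:
    """Walk up the type hierarchy from `concept` to the top."""
    chain = []
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


if __name__ == "__main__":
    print(equivalents("trein", "nl", "cs"))  # ['vlak']
    print(equivalents("auto", "nl", "fr"))   # ['voiture']
    print(ancestors("Car"))                  # ['TransportDevice', 'Device', 'Object']
```

Each language links to the ontology rather than to English, so in this sketch Dutch and Czech words are matched without an English pivot. Inferences such as "a car is a device" come from the one shared hierarchy instead of being duplicated in every wordnet.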
Ontological observations
Identity criteria as used in OntoClean (Guarino & Welty 2002):
• rigidity: to what extent are properties true for entities in all worlds? You are always a human, but you can be a student for a short while.
• essence: what properties are essential for an entity? Shape is essential for a statue but not for the clay it is made of.
• unicity: what represents a whole and what entities are parts of these wholes? An ocean is a whole but the water it contains is not.

Type-role distinction
Current WordNet treatment, hyponyms of dog:
lapdog:1 # toy dog:1, toy:4 # hunting dog:1 # working dog:1, etc. dalmatian:2, coach dog:1, carriage dog:1 # Leonberg:1 # Newfoundland:1 # poodle:1, poodle dog:1, etc.
(1) a husky is a kind of dog (type)
(2) a husky is a kind of working dog (role)
What's wrong? (2) is defeasible, (1) is not:
*This husky is not a dog
This husky is not a working dog

Ontology and lexicon
Hierarchy of disjoint types:
Canine: PoodleDog; NewfoundlandDog; GermanShepherdDog; Husky
Lexicon:
• NAMES for TYPES: {poodle}EN, {poedel}NL, {pudoru}JP
  (instance x Poodle)
• LABELS for ROLES: {watchdog}EN, {waakhond}NL, {banken}JP
  ((instance x Canine) and (role x GuardingProcess))

Ontology and lexicon
Hierarchy of disjoint types: River; Clay; etc.
Lexicon:
• NAMES for TYPES: {river}EN, {rivier, stroom}NL
  (instance x River)
• LABELS for dependent concepts:
  {rivierwater}NL (water from a river => water is not a unit)
  ((instance x Water) and (instance y River) and (portion x y))
  {kleibrok}NL (irregularly shaped piece of clay => non-essential)
  ((instance x Object) and (instance y Clay) and (portion x y) and (shape x Irregular))

KIF expression for gender marking
{teacher}EN ((instance x Human) and (agent x TeachingProcess))
{Lehrer}DE ((instance x Man) and (agent x TeachingProcess))
{Lehrerin}DE ((instance x Woman) and (agent x TeachingProcess))

KIF expression for perspective
sell: subj(x), direct obj(z), indirect obj(y) versus buy: subj(y), direct obj(z), indirect obj(x)
(and (instance x Human) (instance y Human) (instance z Entity) (instance e FinancialTransaction) (source x e) (destination y e) (patient e)
The same process, but a different perspective through subject and object realization:
• marry: Russian has two verbs
• apprendre in French can mean both teach and learn

Aspectual variants
• Slavic languages: the two members of a verb pair express an ongoing event and a completed event.
• English: can mark perfectivity with particles, as in the phrasal verbs eat up and read through.
• Romance languages: mark aspect by verb conjugations on the same verb.
• Dutch: verbs with marked aspect can be created by prefixing a verb with door: doorademen, dooreten, doorfietsen, doorlezen, doorpraten (continue to breathe/eat/bike/read/talk).
These verbs are restrictions on phases of the same process. This does NOT warrant extending the ontology with separate processes for each aspectual variant.

Kinship relations in Arabic
• (Eam~) father's brother, paternal uncle
• (xaAl) mother's brother, maternal uncle
• (Eam~ap) father's sister, paternal aunt
• (xaAlap) mother's sister, maternal aunt

Kinship relations in Arabic
• ($aqiyqap) full sister, sister on the paternal and maternal side (as distinct from (>uxot): 'sister', which may refer to a sister from the paternal or maternal side, or both sides).
• (vakolAna) father bereaved of a child (as opposed to (yatiym), or (yatiymap) for feminine: 'orphan', a person whose father or mother died or both father and mother died).
• (vakolaYa) mother bereaved of a child (as opposed to or for feminine: 'orphan', a person whose father or mother died or both father and mother died).

Complex Kinship concepts
father's brother, paternal uncle
WORDNET: paternal uncle => uncle => brother of ....????
ONTOLOGY:
(=> (paternalUncle ?P ?UNC) (exists (?F) (and (father ?P ?F) (brother ?F ?UNC))))

Universality as evidence
The English verb cut abstracts from the precise process, but there are troponyms that implicate the manner: snip and clip imply scissors; chop and hack imply a large knife or an axe.
In Dutch there is no general verb, only specific verbs: knippen ('clip, snip, cut with scissors or a scissor-like tool'), snijden ('cut with a knife or knife-like tool'), hakken ('chop, hack, cut with an axe or similar tool').
If lexicalization of the specific processes is more universal, this can be seen as evidence that the specific processes, and not the generic verb, should be listed in the ontology.

Open Questions/Challenges
• What is a word, i.e. a lexical unit?
• What is the status of complex lexemes like English lightning rod, word of mouth, find out, kick the bucket?
• What is a semantic unit, i.e. a concept?

Open Questions/Challenges
• Is there a core inventory of concepts that are universally encoded? If so, what are these concepts?
• How can crosslinguistic equivalence be verified?
• Is there systematicity to the language-specific extensions?
• What are the lexicalization patterns of individual languages?
• Are lexical gaps accidental or systematic?
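The paternalUncle axiom on the Complex Kinship concepts slide can be made concrete over a small fact base. This sketch rests on several assumptions. It treats the axiom as a definition that holds in both directions, whereas the slide states only an implication. It reads (father ?P ?F) as "F is the father of P" and (brother ?F ?UNC) as "UNC is a brother of F". All names, facts and helper functions (`FACTS`, `related`, `uncles_via`, `paternal_uncles`, `maternal_uncles`) are hypothetical. The maternal variant for the Arabic xaAl (mother's brother) follows the same pattern with an assumed `mother` relation.

```python
# Sketch: evaluating the paternalUncle definition over hypothetical facts.
# Facts are (relation, a, b) triples; argument order follows the KIF slide.
FACTS = {
    ("father", "Ali", "Hassan"),    # Hassan is Ali's father
    ("mother", "Ali", "Layla"),     # Layla is Ali's mother
    ("brother", "Hassan", "Omar"),  # Omar is Hassan's brother
    ("brother", "Layla", "Karim"),  # Karim is Layla's brother
}


def related(relation, a, facts):
    """All b such that (relation a b) holds in `facts`."""
    return {b for (r, x, b) in facts if r == relation and x == a}


def uncles_via(parent_relation, person, facts):
    """UNC such that some F satisfies (parent_relation person F) and (brother F UNC)."""
    return {
        unc
        for parent in related(parent_relation, person, facts)
        for unc in related("brother", parent, facts)
    }


def paternal_uncles(person, facts):
    # (paternalUncle ?P ?UNC) <=> exists ?F: (father ?P ?F) and (brother ?F ?UNC)
    return uncles_via("father", person, facts)


def maternal_uncles(person, facts):
    # Same pattern for xaAl (mother's brother); the "mother" relation is assumed.
    return uncles_via("mother", person, facts)


if __name__ == "__main__":
    print(paternal_uncles("Ali", FACTS))  # {'Omar'}
    print(maternal_uncles("Ali", FACTS))  # {'Karim'}
```

The "father's brother" condition is exactly what the WordNet chain on the slide leaves open ("brother of ....????"). In this sketch, English uncle would correspond to the union of the two sets.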
Coverage: what belongs in a universal lexical database?
• Formal, linguistic criteria for inclusion
• Informal, cultural criteria
Both are difficult to define and apply!

Advantages of the Global Wordnet Grid
• Shared and uniform world knowledge: universal inferencing; uniform text analysis and interpretation
• More compact and less redundant databases
• A clearer notion of how languages map to the knowledge: better criteria for expressing knowledge; better criteria for understanding variation

CORNETTO (STEVIN TENDER)
Combinatorial and Relational Network as Toolkit for Dutch Language Technology
http://www2.let.vu.nl/oz/cornetto

Goals of the Cornetto project
Goal: to develop a lexical semantic database for Dutch:
• 40K entries: the generic and central part of the language
• Rich horizontal and vertical semantic relations
• Combinatoric information
• Ontological information
Method: merge data from the Dutch Wordnet (DWN) and the Referentie Bestand Nederlands (RBN)
April 2006 - March 2008, extended to July 2008
The final results of the Cornetto project are available through the TST-centrale of the Nederlandse Taalunie (free for research).

Project overview
[Diagram: Dutch Wordnet, Referentie Bestand, English Wordnet, SUMO (KIF), DOLCE (KIF) and WN-DOMAINS are aligned and merged into Cornetto (ontology: DOLCE, SUMO; record fields: Entry, LU/Synset, PoS, DWN data, RBN data, SUMO pointer, PWN pointer, Domain), supported by acquisition toolkits, corpora, validation and editing]

Database
Collections:
• Lexical Units (LU): mainly derived from the RBN
• Synsets (SY): mainly derived from the DWN
• Terms (TE) and axioms: mainly derived from SUMO and MILO
• Domains (DM): based on Wordnet Domains
Mappings:
• LU to SY
• SY to SY (within Dutch and from Dutch to English)
• SY to TE
• SY to DM

Data Organization
[Diagram: internal relations; Princeton Wordnet; Wordnet Domains; Spanish, Czech, German, French, Korean and Arabic wordnets; SUMO and MILO as a collection of terms and axioms]

Database
Implemented in DebVisDic: http://deb.fi.muni.cz/index.php
Demo version available: http://www2.let.vu.nl/oz/cornetto/demo.html

Overview of results
                    | ALL     | NOUNS  | VERBS  | ADJ    | ADV   | OTHERS
Synsets             | 70,371  | 52,847 | 9,017  | 7,689  | 220   | 598
Lexical Units       | 119,108 | 85,449 | 17,314 | 15,712 | 475   | 158
Lemmas (form+pos)   | 92,686  | 70,315 | 9,051  | 12,288 | 1,032 | n.a.
Synonyms in synsets | 103,762 | 75,476 | 14,138 | 12,914 | 408   | 826
CID records         | 104,556 | 76,537 | 14,214 | 13,132 | 483   | 190
Synonyms per synset | 1.47    | 1.43   | 1.57   | 1.68   | 1.85  | 1.38
Senses per lemma    | 1.29    | 1.22   | 1.91   | 1.28   | 0.46  | n.a.

Mapping relations
Status                | Count   | Share
No status value       | 55,976  | 53.54%
Status value          | 48,580  | 46.46%
  manual              | 10,108  | 9.67%
  B-95                | 4,944   | 4.73%
  BM-90               | 4,215   | 4.03%
  D-55 adjectives     | 171     | 0.16%
  D-58 verbs          | 774     | 0.74%
  D-75 nouns          | 2,085   | 1.99%
  M-97                | 25,236  | 24.14%
  RESUME-75           | 1,047   | 1.00%
TOTAL                 | 104,556 |

Match                 | Count   | Share
DWN and RBN matches   | 35,289  | 37.74%
LUs only in DWN       | 54,983  | 58.81%
LUs only in RBN       | 3,223   | 3.45%
Total                 | 93,495  |

Overview of synset data
Synsets                  | 70,371
Synonyms                 | 103,762
Internal relations       | 153,370
Equivalence relations    | 86,830
Definitions              | 35,620
WordNet Domains mappings | 93,822
SUMO mappings            | 70,654
Base Level Concepts      | 8,828

English Wordnet to SUMO mapping
through two-place relations:
= the synset is equivalent to the SUMO concept: circle (= Circle)
+ the synset is subsumed by the SUMO concept: branch (+ PlantBranch)
@ the synset is an instance of the SUMO concept: Amsterdam (@ City)

Cornetto SUMO Mappings through triplets
• Equality: cirkel: (=, 0, Circle) or (=, , Circle)
• Subsumption: tak: (+, 0, PlantBranch) or (+, , PlantBranch)
• Related: blad: (part, 0, PlantBranch) or (part, , PlantBranch)
• Axiomatized: theewater: (instance, 0, Water) (instance, 1, Making) (instance, 2, Tea) (resource, 0, 1) (result, 2, 1)
  OR (instance, , Water) (instance, 1, Making) (instance, 2, Tea) (resource, , 1) (result, 2, 1)

Ontology mapping: female/male variants
teacher (a person whose occupation is teaching)
SUMO: equivalent to Teacher
In Dutch there is no neutral form:
• leraar (male teacher): (+, , Teacher), (+, , Man)
• lerares (female teacher): (+, , Teacher), (+, , Woman)

KYOTO (ICT-211423)
Yielding Ontologies for Transition-Based Organization
FP7: Intelligent Content and Semantics
http://www.kyoto-project.eu/

KYOTO (ICT-211423) Overview
• Title: Yielding Ontologies for Transition-Based Organization
• Funded: 7th Framework Program-ICT of the European Union: Intelligent Content and Semantics; Taiwan and Japan funded by national grants
• Goal: a platform for knowledge sharing across languages and cultures
  - Enables knowledge transition and information search across different target groups, transgressing linguistic, cultural and geographic boundaries
  - Open text mining and deep semantic search
  - A wiki environment that allows people in the field to maintain their knowledge and agree on meaning without knowledge engineering skills
• URL: http://www.kyoto-project.eu/
• Duration: March 2008 - March 2011
• Effort: 364 person months of work
KYOTO cycle
[Figure: term hierarchy with frog, endemic frogs, common frog, poison frog, Golden poison frog, gopher frog, Dusky gopher frog, forest frog]
"Garden ponds are havens for wildlife. They provide food and shelter for frogs, newts and aquatic insects, including damselflies and dragonflies."
(garden pond, haven, wildlife)
(garden pond, has_food, frog)
(garden pond, has_food, newt)
(garden pond, has_food, aquatic insect)
(garden pond, is_shelter, frog)
(garden pond, is_shelter, newt)
(garden pond, is_shelter, aquatic insect)

Environmental organizations
[Figure]

Kyoto main application
Wikyoto (Wiki platform):
• Connects people with a shared interest as a community
• Upload documents and sources
• View and edit terms and concepts learned from these documents
• Combines concepts with other taxonomies
• Discuss and agree with others in the community, across languages, regions and cultures

Kyoto main application
Tybots:
• Learn terms and concepts from a document collection
• Organize terms as a hierarchy
• Connect terms to other hierarchies
• Define: definitions; relations to other terms; properties and criteria for terms

Kyoto main application
Kybot:
• Detects facts of interest in text and combines these in a comprehensive overview
• Uses the knowledge represented for terms to detect facts in any document, regardless of language
• Allows you to specify any collection of types of knowledge of your interest

Kyoto databases
• Database of users that forms the community
• Database of sources and documents provided by the users
• Database of terms, presented as a domain wordnet in each language
• Database of concepts (so-called ontology) that connects the terms of the different languages
• Databases of facts derived from various document and source collections provided by the users

Thank you for your attention

CID: Cornetto identifiers
ID/bookkeeping
