Semantic Relation Extraction by Enriching Word Embeddings Exploiting Turkish Morphology

Completed

Author:

Gökhan Ercan

Text

English

1 Section

296.33 KB

Work Type:

Book

Book Subtype:

Article

Işık University / Graduate Education Institute (Lisansüstü Eğitim Enstitüsü) / Department of Computer Engineering Program

2025

Source Institution:

Işık University

Subject:

Distributed representations (DR) capture semantic and syntactic patterns in language by analyzing the distributional relationships of words within textual data. The modeling methods that produce DR rest on the distributional hypothesis, the assumption that "words that occur in the same context tend to have similar meanings," which is inherent to the nature of language. Because these methods are unsupervised, they can be trained without human judgment input, allowing researchers to train on large datasets at relatively low cost. Although word-based models perform effectively for languages with limited vocabularies, such as English, they are considerably less efficient when applied to morphologically rich languages with unlimited vocabularies, such as Turkish. We observed that n-gram and statistical segmentation methods, which are commonly used in subword modeling to handle out-of-vocabulary and rare words, are highly sensitive to orthographic similarity. Consequently, these methods struggle to distinguish between unrelated concepts that are spelled alike (e.g., shrink - shrine). Moreover, we noted that the literature reports inconsistent results on how morphological segmentation methods affect these problems. This thesis aims to make conceptual assumptions and improvements concerning different types of semantic relationships (e.g., relatedness and similarity), to model the role of language morphology as an input in subword DR models, and to develop dataset generation methodologies and evaluation methods to measure this effect. Within the scope of the study, different models and segmentation methods were empirically tested, the AnlamVer and OSimUnr datasets were produced, and the task of relatedness classification, together with associated evaluation methods, was proposed to measure the noise that segmentation introduces into the model.
Our experiments demonstrate that morphological segmentation produces significantly less noise than n-gram-based methods and can lead to substantial performance improvements depending on the nature of the task.
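
The orthographic sensitivity described in the abstract can be illustrated with a minimal sketch. It is not the thesis's own method or evaluation: the function names are hypothetical, and the n-gram range of 3 to 6 with `<` and `>` boundary markers is an assumption borrowed from a common subword-embedding convention. The sketch only measures how many character n-grams two words share, which is the raw material an n-gram subword model builds their vectors from.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Return the set of character n-grams of a word.

    The word is wrapped in '<' and '>' so that prefixes and suffixes
    yield distinct n-grams (an assumed, commonly used convention).
    """
    marked = f"<{word}>"
    return {
        marked[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(marked) - n + 1)
    }


def jaccard(a, b):
    """Jaccard overlap between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def ngram_overlap(w1, w2):
    """Share of character n-grams that two words have in common."""
    return jaccard(char_ngrams(w1), char_ngrams(w2))


if __name__ == "__main__":
    # Unrelated in meaning, but close in spelling.
    print("shrink / shrine:", round(ngram_overlap("shrink", "shrine"), 3))
    # Related in meaning, but no shared spelling.
    print("shrink / reduce:", round(ngram_overlap("shrink", "reduce"), 3))
```

Under these assumptions, "shrink" and "shrine" share 10 of their 26 distinct n-grams, while "shrink" and "reduce" share none, so a model that composes word vectors from such n-grams starts with a bias toward the spelling-based pair.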

Request Date:

Wednesday, June 4, 2025

Scanned by:

Mehmet Turan

Date Entered into System:

Wednesday, June 4, 2025
