Master Thesis Source Code

This repository contains the source code for my Master's thesis project.

Abstract

This thesis explores the potential of natural language processing (NLP) in the social sciences, specifically the clustering of contextual word embeddings. However, the limited interpretability of these techniques makes it difficult to understand what the resulting clusters represent. To address this issue, this thesis proposes a strategy that gives social scientists human-friendly explanations of word clusters, using the context surrounding each clustered word to explain each cluster.
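The README does not say which clustering algorithm is applied to the embeddings. As an illustration only, the sketch below uses k-means (Lloyd's algorithm), a common choice that may not be the one used in the thesis, on toy 2-D vectors standing in for contextual word embeddings. The `kmeans` function is hypothetical and not part of the marc_thesis package.

```python
import math
import random
from typing import List, Sequence, Tuple

def kmeans(
    vectors: Sequence[Sequence[float]], k: int, iterations: int = 20, seed: int = 0
) -> Tuple[List[int], List[List[float]]]:
    """Group vectors into k clusters with Lloyd's algorithm (plain k-means)."""
    rng = random.Random(seed)
    # Start from k distinct input vectors, chosen at random, as centroids.
    centroids = [list(v) for v in rng.sample(list(vectors), k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid (Euclidean).
        labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c])) for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

# Toy 2-D vectors standing in for (much higher-dimensional) contextual embeddings.
toy_embeddings = [[0.1, 0.2], [0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
labels, centroids = kmeans(toy_embeddings, k=2)
```

With real embeddings, each vector would be the contextual representation of one occurrence of a word, so occurrences used in similar contexts end up in the same cluster.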

Several explainability techniques are used to generate salience scores that rank the contextual elements of each sentence by importance. A probing classifier then evaluates the information highlighted by each technique by using it to predict the cluster to which each embedded word belongs.
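To make the salience step concrete, here is a minimal sketch of one explainability technique, occlusion (leave-one-out): a context token's salience is how much the score drops when that token is removed. The README does not name the techniques actually used, so treat this as a stand-in. The cue-word weights and `toy_cluster_score` are hypothetical replacements for a real model's confidence in a cluster; the top-ranked tokens are the kind of highlighted information a probing classifier could then consume.

```python
from typing import Callable, Dict, List, Tuple

def occlusion_salience(
    tokens: List[str], score: Callable[[List[str]], float]
) -> List[Tuple[str, float]]:
    """Rank tokens by how much the score drops when each one is removed."""
    base = score(tokens)
    saliences = []
    for i, token in enumerate(tokens):
        occluded = tokens[:i] + tokens[i + 1:]  # the sentence without token i
        saliences.append((token, base - score(occluded)))
    # Highest salience first; ties keep their original sentence order.
    return sorted(saliences, key=lambda pair: pair[1], reverse=True)

# Hypothetical cue weights standing in for a model's evidence for one cluster
# (here, a "finance" sense of the target word "bank").
CUE_WEIGHTS: Dict[str, float] = {"money": 2.0, "deposit": 1.5, "river": -1.0}

def toy_cluster_score(tokens: List[str]) -> float:
    """Hypothetical stand-in for a model: sums the cue weights in the context."""
    return sum(CUE_WEIGHTS.get(t, 0.0) for t in tokens)

sentence = ["she", "went", "to", "the", "bank", "to", "deposit", "money"]
ranking = occlusion_salience(sentence, toy_cluster_score)
highlighted = [token for token, _ in ranking[:2]]  # top-2 context tokens
print(highlighted)  # ['money', 'deposit']
```

Occlusion only needs forward passes of the model, which makes it easy to apply to any scorer; gradient-based techniques are cheaper for long sentences but require access to the model's internals.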

The results indicate that explainability techniques can generate informative explanations that help us understand the distinctions between clusters of contextual word embeddings. Ultimately, we believe this work can help social scientists use contextual word embeddings with more confidence in various NLP tasks.

Installation

From the main folder of the repository (the one containing setup.py), install the module with: pip install .

Usage

The available arguments are:

  --sentences_generation: generates sentences from the datasets.
  --clustering_embeddings: clusters the embeddings.
  --extract_sentences_with_target: extracts the sentences that contain the target word.
  --salience_extraction: extracts the salience scores.
  --training_classifier: trains the probing classifier.

To run a stage, pass its argument to the module: python -m marc_thesis [argument]
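If you want to run every stage in one go, a small driver like the following could call the module once per argument. This is a hypothetical helper (`run_all.py`, `build_commands` and `run_pipeline` are not part of the repository), and it assumes the stages should run in the order listed above, each reading the previous stage's output from the experiments folder; the README does not confirm either point. By default it only prints the commands; pass `--run` to execute them.

```python
import subprocess
import sys
from typing import List, Sequence

# The five stage arguments listed in this README. Running them in this order
# is an ASSUMPTION; the README does not state a required order.
STAGES = (
    "--sentences_generation",
    "--clustering_embeddings",
    "--extract_sentences_with_target",
    "--salience_extraction",
    "--training_classifier",
)

def build_commands(stages: Sequence[str] = STAGES) -> List[List[str]]:
    """Return one `python -m marc_thesis <argument>` command per stage."""
    return [[sys.executable, "-m", "marc_thesis", flag] for flag in stages]

def run_pipeline(dry_run: bool = True) -> List[List[str]]:
    """Print each stage command and, unless dry_run is set, execute it."""
    commands = build_commands()
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError, stopping at the first failing stage.
            subprocess.run(cmd, check=True)
    return commands

if __name__ == "__main__":
    # Hypothetical driver flag: only `python run_all.py --run` executes the stages.
    run_pipeline(dry_run="--run" not in sys.argv[1:])
```

Using `sys.executable` ensures the stages run with the same interpreter (and virtual environment) as the driver itself.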

Experiments folder

This project requires the environment variable MARC_THESIS_EXPERIMENT_FOLDER to be set. It should hold the absolute path of a folder that stores all the information generated by the different functionalities, which use it as shared read/write storage.

mkdir experiments_folder
export MARC_THESIS_EXPERIMENT_FOLDER=/absolute/path/experiments_folder
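Since every stage reads and writes through this folder, a pre-flight check can catch a missing or mistyped value before a long run starts. The sketch below is a hypothetical helper (`experiment_folder` is not taken from the marc_thesis code, which may handle the variable differently); it requires an absolute path to an existing directory, matching the example above.

```python
import os
from pathlib import Path

ENV_VAR = "MARC_THESIS_EXPERIMENT_FOLDER"

def experiment_folder() -> Path:
    """Return the experiments folder, failing early if the variable is misconfigured."""
    value = os.environ.get(ENV_VAR)
    if not value:
        raise RuntimeError(f"{ENV_VAR} is not set; export it before running marc_thesis.")
    folder = Path(value)
    if not folder.is_absolute():
        raise RuntimeError(f"{ENV_VAR} must be an absolute path, got {value!r}.")
    if not folder.is_dir():
        raise RuntimeError(f"{ENV_VAR}={value!r} is not an existing directory; create it first.")
    return folder
```

Calling `experiment_folder()` at the start of a script turns a confusing file-not-found error deep inside a stage into a clear message about the environment variable.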

marc-gav/MasterThesis

Folders and files

Latest commit

History

Repository files navigation

Master Thesis Source Code

Abstract

Installation

Usage

Experiments file

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages