Section outline


    27.4. Session #1: Introduction

    • Recap of DS4DH1
    • Course organization


    4.5. Session #2: Recap of DS4DH1

    • Pandas
    • NumPy, SciPy
    • Seaborn


    11.5. Session #3: Corpus linguistics

    • Lexical association measures
    • Multi-word expressions, collocations, idioms
    • Lexico-semantic resources: WordNet, BabelNet, PanLex
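
    As an illustration of one lexical association measure, the sketch below computes pointwise mutual information (PMI) for adjacent word pairs. It is a minimal example, not course material: the function name pmi_scores, the toy sentence, and the choice of PMI over other measures are all assumptions made here.

```python
import math
from collections import Counter


def pmi_scores(tokens, min_count=1):
    """Return PMI (log base 2) for each adjacent word pair in `tokens`.

    Hypothetical helper for illustration; probabilities are plain
    relative frequencies with no smoothing.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # skip pairs seen fewer than min_count times
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Toy corpus (invented for this example).
tokens = "new york is a big city and new york is busy".split()
scores = pmi_scores(tokens)

# Print pairs from strongest to weakest association.
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

    On such a small sample, pairs of words that each occur only once receive the highest scores, which is the well-known bias of PMI toward rare events; raising min_count is one simple way to filter them.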


    22.5. Session #3: Topic modeling -- exceptionally on Monday, 14-16

    • Latent Dirichlet Allocation
    • Practical examples with LDA in Gensim
    • Homework project #1: pick a corpus, induce topics, analyze topics and topical distribution of documents, prepare a small-scale presentation 


    25.5. Session #4: Networks

    • Introduction to Graph Theory
    • Node importance -- degree centrality, closeness centrality, betweenness centrality
    • Shortest paths
    • Practical exercises with networkx
    • Homework project #2: analysis of a large-scale network dataset; prepare a small-scale presentation with insights


    1.6. Session #5: Evaluation & Statistical Testing

    • Common evaluation measures for classification and regression
    • Gold-standard annotation and inter-annotator agreement
    • Significance testing (parametric: Student’s t-test; non-parametric: Wilcoxon’s test)
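
    As an illustration of inter-annotator agreement, the sketch below computes Cohen's kappa for two annotators who labeled the same items. It is a minimal example under stated assumptions: the function name cohen_kappa and the toy labels are invented here, and kappa is only one of several agreement coefficients.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators over the same items.

    Hypothetical helper for illustration: kappa = (p_o - p_e) / (1 - p_e),
    where p_o is observed agreement and p_e is chance agreement.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")

    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement from each annotator's own label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Toy annotations (invented for this example).
annotator_1 = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg"]
annotator_2 = ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "pos"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

    Here the annotators agree on 6 of 8 items (0.75), while chance agreement is 0.5, so kappa corrects the raw agreement down to 0.5.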

    15.6. Session #6: Student presentations -- Topic Modeling Homeworks 

    22.6. Session #7: Student presentations -- Network Analysis Homeworks

    29.6. Session #8: Deep Learning

    • Convolutional NNs
    • Recurrent NNs
    • Attention mechanism and Transformers
    • Practical exercises in Keras


    6.7. Session #9: Interpretability & Fairness

    • Explainability and interpretability of machine learning models
    • Biases and fairness: data bias, model bias


    13.7. Session #10: Guest Lecture 

    • A talk by a prominent researcher in the area of Computational Humanities