Section overview

    • Lecture content: Data collection and data acquisition: designing data collection surveys,
      collecting publicly available data from the web (scraping, public APIs), crowdsourcing;
      types of data: structured, semi-structured, unstructured;
      data preparation, preprocessing, and cleaning: error correction, deduplication, normalization, handling missing values;
      data privacy and intellectual property rights.

      Tutorial content: Scraping and extracting public content from the Web (Python libraries: scrapy and tweepy); Data loading, organization, preparation, formatting, and manipulation (Python libraries: pandas);

      Homework: Usage scenario – Correction of optical character recognition (OCR) errors
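
      The cleaning steps named in the lecture content (normalization, handling missing values, deduplication) can be illustrated with a minimal pandas sketch. It is not taken from the course material: the toy records, the column names, the helper name `clean`, and the chosen policies (dropping rows without a name, filling a missing city with "unknown", imputing age with the median) are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy records; none of these values come from the course material.
raw = pd.DataFrame(
    {
        "name": [" Alice ", "alice", "Bob", "Carol", None],
        "city": ["Berlin", "Berlin", "Paris", None, "Rome"],
        "age": [30.0, 30.0, None, 41.0, 25.0],
    }
)


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative cleaning pipeline (assumed policies, not the course's)."""
    # Missing values, step 1: rows without a name cannot be identified, so drop them.
    out = df.dropna(subset=["name"]).copy()

    # Normalization: trim surrounding whitespace and unify capitalization,
    # so that " Alice " and "alice" become the same value.
    out["name"] = out["name"].str.strip().str.title()

    # Missing values, step 2: fill a missing city with a placeholder
    # and impute a missing age with the median of the known ages.
    out["city"] = out["city"].fillna("unknown")
    out["age"] = out["age"].fillna(out["age"].median())

    # Deduplication: after normalization, identical rows collapse into one.
    return out.drop_duplicates().reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
```

      With these toy records, three rows remain: the two "Alice" entries merge once their spelling is normalized, and the row without a name is dropped. Normalizing before deduplicating matters here, because `drop_duplicates` only removes rows that match exactly.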