What do we mean when we talk about web corpora and how are they built?: DiLCo Methods Day 2022 (7 October) - Adrien Barbaresi - Universität Hamburg
Using texts from the web to observe language seems simple, but methodological issues are inevitable, and the data collection phase can sometimes become a project in itself.
After a brief history of web corpus linguistics, corpus building methods will be reviewed, from major data sources and their quirks to concrete steps focussing on the discovery and processing of web page contents, including the example of blogs and blog comments.
DiLCo Methods Day 2022 - Natural language processing for digital language
DiLCo organised a "Methods Day" on computational and quantitative analysis of born-digital language. The workshop targets linguists and also other students and researchers from the humanities and beyond who want to broaden their methodological skills. Three lectures will introduce current innovative techniques of meaning representation, social media data collection and analysis.
DiLCo (‘Digital language variation in context’) is a 3-year international research network initiated in 2021 at the University of Hamburg. The network brings together researchers from Europe and the USA with expertise in computational, interactional, and ethnographic approaches to digital language and linguistics. It aims to provide a platform for the development of interdisciplinary ideas in digital language and communication research, and for early-career capacity building.