  • 5/2018: "Module 1 - Context profiles" was implemented
  • 12/2017: List of 200,000 lemmas was created and cleaned
  • 6/2017: Project start


Project objectives

Our project aims to develop and test an interdisciplinary Open Access “research and experimenting platform” (CAL²Lab) for evidence-based analyses of legal language and semantics. The platform will use data from the previously assembled CAL² Corpus of German Law (JuReko) and provide semi-automatic tools to pre-structure the analysis of legal semantics along several relevant dimensions. Specifically, analyses will focus on the (in)determinacy of legal terms, from a diachronic perspective (changes over time) as well as a synchronic one (cross-sections across legal schools, media, genres, legal domains, etc.).

The platform will be developed and simultaneously tested in cooperation with legal philosophers, sociologists, legal linguists and practitioners from legislative bodies (Ministries of Justice). The project is funded by the Academy of Sciences (Baden-Wuerttemberg) and continues the work of JuReko (descriptions available in English and German).


Project phases

We seek to provide user-friendly tools to explore and statistically analyze the CAL² corpus. Copyright restrictions prohibit a full release of the corpus, but we are working on interfaces to it, including an online platform that generates keyword-in-context (KWIC) views and word lists and supplies the following statistics:

  1. Multi-level context analysis: Creating context profiles for each of the 200,000 most frequent tokens and n-grams (where n ∈ {2, 3, 5}). This will allow us to measure how usage varies across time, domain, subject area and text type.
  2. Measurement of rigidity and vagueness: Quantifying and comparing the degree to which the usage of a certain expression is fixed (as a “set” phrase) in the language of lawyers. We can thus empirically test notions of “rigidity” and “vagueness”.
  3. Semantic similarity (partial synonymy): Visualizing similar expressions across various metadata (e.g., different points in time or academic journals) using self-organizing maps (calculated by comparing context profiles in a multidimensional matrix and clustering similar profiles).
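To make the terms above more concrete, here is a minimal sketch in Python of a KWIC view, a context profile and a profile comparison. It is an illustration under stated assumptions, not the CAL²Lab implementation: the function names (`kwic`, `context_profile`, `cosine_similarity`), the window size and the toy sentence are all hypothetical, and cosine similarity stands in here for the self-organizing maps mentioned in the list.

```python
from collections import Counter
from math import sqrt


def kwic(tokens, keyword, window=2):
    """Return (left context, keyword, right context) for each hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


def context_profile(tokens, target, window=2):
    """Count words within `window` positions of each occurrence of `target`."""
    profile = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            profile.update(tokens[max(0, i - window):i])
            profile.update(tokens[i + 1:i + 1 + window])
    return profile


def cosine_similarity(p, q):
    """Cosine of the angle between two count profiles (0.0 if either is empty)."""
    dot = sum(p[w] * q[w] for w in p.keys() & q.keys())
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


# Toy English sentence standing in for corpus text (assumption for illustration).
tokens = "the court may decide the case and the judge may decide the appeal".split()

for left, kw, right in kwic(tokens, "may"):
    print(f"{left:>12} [{kw}] {right}")

court = context_profile(tokens, "court")
judge = context_profile(tokens, "judge")
similarity = cosine_similarity(court, judge)
print(f"similarity(court, judge) = {similarity:.3f}")
```

In this toy setting, words that occur in similar surroundings (here "court" and "judge") receive a high similarity score. Computing such profiles separately per time slice, legal domain or text type would be one way to compare usage across metadata.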



Project directors

Prof. Dr. phil. Friedemann Vogel
Dr. Dr. iur. Hanjo Hamann

Technical implementation

Isabelle Gauer

Assistant researcher

Yinchun Bai


Project reports (in German)

Gauer, Isabelle; Vogel, Friedemann; Hamann, Hanjo (2017): Juristische Semantik messend verstehen. CAL²Lab – Eine computergestützte Forschungs- und Experimentierplattform als Beitrag zu einer datengestützten Rechtslinguistik. In: Friedemann Vogel (ed.), Recht ist kein Text: Studien zur Sprachlosigkeit im verfassten Rechtsstaat. Berlin: Duncker & Humblot.
