• 5/2018: "Module 1 - Context profiles" was implemented
• 12/2017: List of 200,000 lemmas was created and cleaned
• 6/2017: Project start


Project objectives

Our project aims to develop and test an interdisciplinary Open Access “research and experimentation platform” (CAL²Lab) for evidence-based analyses of legal language and semantics. The platform will use data from the previously assembled CAL² Corpus of German Law (JuReko) and provide semi-automatic tools to pre-structure the analysis of legal semantics along several relevant dimensions. Specifically, analyses will focus on the (in)determinacy of legal terms, from a diachronic perspective (changes over time) as well as a synchronic one (cross-sections across legal schools, media, genres, legal domains, etc.).

The platform will be developed and simultaneously tested in cooperation with legal philosophers, sociologists and legal linguists, as well as practitioners from legislative bodies (Ministries of Justice). The project is funded by the Academy of Sciences (Baden-Wuerttemberg) and continues the work of JuReko (descriptions available in English and German).


Project phases

We seek to provide user-friendly tools to explore and statistically analyze the CAL² corpus. Copyright restrictions prohibit releasing the complete corpus, but we are working on interfaces, including an online platform that generates keyword-in-context (KWIC) views and word lists and supplies the following statistics:

  1. Multi-level context analysis: Creating context profiles for each of the 200,000 most frequent tokens and n-grams (where n ∈ {2, 3, 5}). This will allow us to measure how usage varies over time and across domains, subject areas and text types.
  2. Measurement of rigidity and vagueness: Quantifying and comparing the degree to which the usage of a certain expression is fixed (as a “set” phrase) in the language of lawyers. We can thus empirically test notions of “rigidity” and “vagueness”.
  3. Semantic similarity (partial synonymy): Visualizing similar expressions across various metadata (e.g., different points in time or academic journals) using self-organizing maps (calculated by comparing context profiles in a multidimensional matrix and clustering similar profiles).
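
To make the terms above more concrete, the following is a minimal sketch of a KWIC view, a context profile and a profile comparison. It is not the CAL²Lab implementation: the function names, the window size, the choice of cosine similarity and the two toy sentences are all assumptions made purely for illustration.

```python
from collections import Counter
from math import sqrt


def kwic(tokens, keyword, window=3):
    """Return (left context, keyword, right context) for every hit."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            hits.append((" ".join(left), tok, " ".join(right)))
    return hits


def context_profile(tokens, keyword, window=3):
    """Count the words that occur within +/- `window` tokens of `keyword`."""
    profile = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            profile.update(tokens[max(0, i - window):i])
            profile.update(tokens[i + 1:i + 1 + window])
    return profile


def cosine_similarity(p, q):
    """Cosine similarity of two count profiles (0.0 if either is empty)."""
    dot = sum(p[w] * q[w] for w in p.keys() & q.keys())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


# Toy sentences (invented, not taken from the corpus), standing in for
# two sub-corpora such as two decades or two journals.
tokens_a = "the court held that the contract was void".split()
tokens_b = "the court found that the contract was valid".split()

hits_a = kwic(tokens_a, "contract", window=2)
profile_a = context_profile(tokens_a, "contract", window=2)
profile_b = context_profile(tokens_b, "contract", window=2)
similarity = cosine_similarity(profile_a, profile_b)

print(hits_a)
print(similarity)
```

In this sketch, a lower similarity between the profiles of the same expression in two sub-corpora would hint at a shift in usage. The clustering of profiles into self-organizing maps is a separate step that is not shown here.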



Project directors

Jun.-Prof. Dr. phil. Friedemann Vogel
Dr. Dr. iur. Hanjo Hamann

Technical implementation

Isabelle Gauer

Assistant researcher

Yinchun Bai