backend tooling for modulplaner-frontend

Noah Vogt 628c784785 clean history commit 1 month ago
config 628c784785 clean history commit 1 month ago
parse 628c784785 clean history commit 1 month ago
.gitignore 628c784785 clean history commit 1 month ago
LICENSE 628c784785 clean history commit 1 month ago
README.md 628c784785 clean history commit 1 month ago
parse_class_pdf.py 628c784785 clean history commit 1 month ago
requirements.txt 628c784785 clean history commit 1 month ago

README.md

modulplaner-backend

Provides backend tooling for modulplaner.

The original repo only contains frontend code, and its data updates were slow and opaque, so I created this repo as a solution.

Basic Usage

After installing the Python 3 dependencies listed in requirements.txt, execute parse_class_pdf.py to parse a class timetable PDF. The script reads the input file named by the constant CLASS_PDF_INPUT_FILE, defined in config/constants.py, and writes its output to the file named by CLASSES_JSON_OUTPUT_FILE. By default, these files are klassen.pdf and classes.json.
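As a rough illustration of how the two constants fit together, here is a minimal sketch. The default file names are the ones stated in this README; the helper resolve_paths is hypothetical and not part of the repository, and the real constants live in config/constants.py.

```python
from pathlib import Path
from typing import Tuple

# Defaults as stated in this README; the actual definitions are in
# config/constants.py and may differ.
CLASS_PDF_INPUT_FILE = "klassen.pdf"
CLASSES_JSON_OUTPUT_FILE = "classes.json"


def resolve_paths(base_dir: str = ".") -> Tuple[Path, Path]:
    """Hypothetical helper: return (input_pdf, output_json) under base_dir."""
    base = Path(base_dir)
    return base / CLASS_PDF_INPUT_FILE, base / CLASSES_JSON_OUTPUT_FILE
```

Calling resolve_paths() with no argument yields the two paths relative to the current working directory, which is presumably where parse_class_pdf.py is run from.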

Project Roadmap

Currently I am working on refining the core data generation. In the future, I can see myself also working on:

  • adding documentation on how the extraction works, and on the problems with this approach
  • adding documentation on the JSON files the frontend expects, and formulating JSON schemas for them
  • addressing the problems in the source data and the frontend data formats (see the following sections)
  • verifying module / lecturer shorthands and rooms in class PDF cells
  • fixing module mapping + verification
  • fixing module dependencies

Problems in the Source Data

  • cells in the class PDF sometimes cut off data such as lecturer shorthands; this could be repaired by cross-referencing with the lecturer PDF
  • the room line (third line) of class PDF cells sometimes contains an unclear DSMixe entry, or is missing altogether
  • non-ASCII characters are present (e.g. in lecturer shorthands)
  • the redundant class name in class PDF cells sometimes gets mixed up with the module shorthand, which is especially annoying when part of the class name is cut off as well (this case is handled)
  • missing degree programs in the text above the table have to be guessed via ugly heuristics
  • there is a class called alle that is degree-program agnostic
  • the degree_program values Kontext BWL, Kontext Kommunikation and Kontext GSW have mixed classes, which creates the need for a table to differentiate the modules
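The cross-referencing idea for cut-off lecturer shorthands could look roughly like the following sketch. It assumes that a truncated shorthand is a prefix of the full one and that the set of full shorthands has already been extracted from the lecturer PDF. The function name repair_shorthand is hypothetical, and this is not how the repository currently handles the problem.

```python
from typing import Iterable, Optional


def repair_shorthand(truncated: str, known: Iterable[str]) -> Optional[str]:
    """Return the known shorthand that `truncated` most plausibly stands for.

    An exact match is returned as-is. Otherwise the result is the single
    known shorthand starting with `truncated`. If no candidate or several
    candidates match, None is returned so the cell can be flagged for
    manual review instead of being guessed.
    """
    known_set = set(known)
    if not truncated:
        return None
    if truncated in known_set:
        return truncated
    candidates = [k for k in known_set if k.startswith(truncated)]
    return candidates[0] if len(candidates) == 1 else None
```

Python strings are Unicode, so prefix matching also works for shorthands containing non-ASCII characters, provided both PDFs are extracted with the same normalization.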

Problems in the Frontend Data Formats

  • some teaching_type values seem to be defined that may never occur in class PDFs
  • lecturer shorthands cannot be changed or deprecated without breaking the view of older semesters
  • the usefulness of part_of_other_classes needs to be further investigated
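One conceivable way around the lecturer shorthand problem is to scope shorthands to a semester, so that a change in one semester leaves older ones untouched. The sketch below only illustrates that idea. The data layout, the semester labels and the names LECTURERS_BY_SEMESTER and lookup_lecturer are all made up for this example and do not reflect the current frontend format.

```python
from typing import Dict, Optional

# Hypothetical data: lecturer shorthands keyed by semester, then by shorthand.
LECTURERS_BY_SEMESTER: Dict[str, Dict[str, str]] = {
    "2024HS": {"abc": "Example Lecturer A"},
    "2025FS": {"abd": "Example Lecturer A", "xyz": "Example Lecturer B"},
}


def lookup_lecturer(semester: str, shorthand: str) -> Optional[str]:
    """Resolve a shorthand within a single semester only.

    Returns None if the semester or the shorthand is unknown.
    """
    return LECTURERS_BY_SEMESTER.get(semester, {}).get(shorthand)
```

In this made-up data, the shorthand abc was replaced by abd in a later semester, yet the older semester still resolves abc correctly.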

Licensing

modulplaner-backend is free software (as in “free speech” and also as in “free beer”). It is distributed under the GNU Affero General Public License v3 (or any later version); see the accompanying LICENSE file for more details.