class_factory.utils package

Submodules

class_factory.utils.base_model module

class class_factory.utils.base_model.BaseModel(lesson_no: int, course_name: str, lesson_loader: LessonLoader, output_dir: Path | str = None, verbose: bool = False)[source]

Bases: object

A base class for educational modules that provides common setup and utility functions, such as loading lesson readings and setting user-defined objectives.

lesson_no

The specific lesson number for the current instance.

Type:

int

course_name

Name of the course, used as context in other methods and prompts.

Type:

str

lesson_loader

Instance for loading lesson-related data.

Type:

LessonLoader

output_dir

Directory where outputs are saved; defaults to ‘ClassFactoryOutput’.

Type:

Path

logger

Logger instance for the class.

Type:

Logger

user_objectives

Dictionary of user-defined objectives, if provided.

Type:

Optional[Dict[str, str]]

_load_readings(lesson_numbers: Union[int, range]) -> Dict[str, List[str]]: Loads and returns readings for the specified lesson(s) as a dictionary.

set_user_objectives(objectives: Union[List[str], Dict[str, str]], lesson_range: Union[int, range]) -> Dict[str, str]: Sets user-defined objectives for each lesson in the specified range and updates self.user_objectives.

_get_lesson_objectives(lesson_num: int) -> str: Retrieves lesson objectives for a given lesson number, falling back to extracted objectives if no user objectives exist.

set_user_objectives(objectives: List[str] | Dict[str, str], lesson_range: int | range) Dict[str, str][source]

Set user-defined objectives for each lesson in lesson_range, supporting both list and dictionary formats. Updates the self.user_objectives attribute with the processed objectives.

Parameters:
  • objectives (Union[List[str], Dict[str, str]]) – User-provided objectives, either as a list (converted to a dictionary) or as a dictionary keyed by lesson numbers as strings.

  • lesson_range (Union[int, range]) – Single lesson number or range of lesson numbers for objectives.

Returns:

Processed dictionary of user objectives keyed by lesson numbers as strings.

Return type:

Dict[str, str]

Raises:
  • ValueError – If the number of objectives in the list does not match the number of lessons in lesson_range.

  • TypeError – If objectives is neither a list nor a dictionary.
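
The list-versus-dictionary handling described above can be sketched as follows. This is a minimal illustration rather than the package's implementation: the helper name normalize_objectives is hypothetical, and the sketch assumes that list items are paired with lessons in ascending order.

```python
from typing import Dict, List, Union


def normalize_objectives(
    objectives: Union[List[str], Dict[str, str]],
    lesson_range: Union[int, range],
) -> Dict[str, str]:
    """Hypothetical helper mirroring the documented behaviour."""
    # Treat a single lesson number as a one-element range.
    lessons = [lesson_range] if isinstance(lesson_range, int) else list(lesson_range)

    if isinstance(objectives, list):
        if len(objectives) != len(lessons):
            raise ValueError(
                f"Got {len(objectives)} objectives for {len(lessons)} lessons."
            )
        # Assumption: objectives are paired with lessons in order.
        return {str(n): text for n, text in zip(lessons, objectives)}

    if isinstance(objectives, dict):
        # Normalise keys to strings so lookups are consistent.
        return {str(key): value for key, value in objectives.items()}

    raise TypeError("objectives must be a list or a dictionary.")


result = normalize_objectives(["Define power", "Explain anarchy"], range(3, 5))
```

Here result maps "3" and "4" to the two objectives; a list of the wrong length raises ValueError before anything is stored.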

class_factory.utils.llm_validator module

class class_factory.utils.llm_validator.ValidatorInterimResponse(*, accuracy: float, completeness: float, consistency: float, reasoning: str, additional_guidance: str)[source]

Bases: BaseModel

accuracy: float
completeness: float
consistency: float
reasoning: str
additional_guidance: str
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class class_factory.utils.llm_validator.ValidatorResponse(*, overall_score: float, accuracy: float, completeness: float, consistency: float, reasoning: str, additional_guidance: str, status: int)[source]

Bases: BaseModel

overall_score: float
accuracy: float
completeness: float
consistency: float
reasoning: str
additional_guidance: str
status: int
classmethod from_interim(interim: ValidatorInterimResponse, threshold: float = 8.0) ValidatorResponse[source]
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
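
The reference above does not say how from_interim derives overall_score and status. The sketch below shows one plausible shape for such a conversion. It uses plain dataclasses with hypothetical names (InterimSketch, ResponseSketch), and both scoring rules in it are assumptions made for illustration: the overall score is taken as the unweighted mean of the three sub-scores, and status is 1 when that mean reaches the threshold and 0 otherwise.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class InterimSketch:
    accuracy: float
    completeness: float
    consistency: float
    reasoning: str
    additional_guidance: str


@dataclass
class ResponseSketch:
    overall_score: float
    accuracy: float
    completeness: float
    consistency: float
    reasoning: str
    additional_guidance: str
    status: int

    @classmethod
    def from_interim(cls, interim: InterimSketch, threshold: float = 8.0) -> "ResponseSketch":
        # ASSUMPTION: overall score is the unweighted mean of the sub-scores.
        overall = mean([interim.accuracy, interim.completeness, interim.consistency])
        # ASSUMPTION: status 1 means "passed", 0 means "failed".
        status = 1 if overall >= threshold else 0
        return cls(
            overall_score=overall,
            accuracy=interim.accuracy,
            completeness=interim.completeness,
            consistency=interim.consistency,
            reasoning=interim.reasoning,
            additional_guidance=interim.additional_guidance,
            status=status,
        )


interim = InterimSketch(9.0, 8.0, 7.0, "Mostly accurate.", "Tighten the summary.")
response = ResponseSketch.from_interim(interim)
```

Consult the linked source for the actual aggregation and status codes before relying on either.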

class class_factory.utils.llm_validator.Validator(llm: any, temperature: float = 0.2, log_level: int = 20, system_prompt: str | None = None, human_prompt: str | None = None, tracer: any | None = None, score_threshold: float = 8.0)[source]

Bases: object

A class for validating responses generated by an LLM (large language model).

The Validator checks the accuracy, completeness, and relevance of the LLM’s response to ensure it meets the requirements specified in the task prompt. Validation results include a score, status, reasoning, and any additional guidance.

validate(task_description: str, generated_response: str, task_schema: str = '', specific_guidance: str = '') dict[source]

class_factory.utils.load_documents module

Document Loading and Processing Module

This module provides functionality to load, process, and extract text from various document types (PDF, DOCX, and TXT) for generating lesson-specific content. It includes support for OCR processing of scanned documents and handling of structured educational materials like syllabi and lesson readings.

Classes

LessonLoader

Main class for handling document loading and processing operations.

Key Functions

The LessonLoader class provides these key functionalities:

  • Document Loading:
    • load_directory: Load all documents from a specified directory

    • load_lessons: Load lessons from multiple directories with lesson number inference

    • load_readings: Extract text from individual documents

    • load_beamer_presentation: Load Beamer presentation content

  • Syllabus Processing:
    • extract_lesson_objectives: Extract objectives for specific lessons

    • load_docx_syllabus: Load and parse DOCX syllabus content

    • find_docx_indices: Locate lesson sections within syllabus

  • Text Extraction:
    • extract_text_from_pdf: Extract text from PDF files

    • ocr_pdf: Perform OCR on scanned documents

    • convert_pdf_to_docx: Convert PDF files to DOCX format

Dependencies

Core Dependencies:
  • pypdf: PDF text extraction

  • python-docx: DOCX file handling

  • pathlib: File path operations

  • typing: Type hints

Optional OCR Dependencies:
  • pytesseract: OCR processing

  • pdf2image: PDF to image conversion

  • spacy: Text processing

  • contextualSpellCheck: Text correction

  • img2table: Table extraction from images

Example Usage

from class_factory.utils.load_documents import LessonLoader

# Initialize loader with paths
loader = LessonLoader(
    syllabus_path="path/to/syllabus.docx",
    reading_dir="path/to/readings"
)

# Load specific lesson content
lesson_content = loader.load_lessons(lesson_number_or_range=5)

# Extract lesson objectives
objectives = loader.extract_lesson_objectives(current_lesson=5)

Notes

  • OCR functionality requires additional package installation via pip install class_factory[ocr]

  • Directory structure should follow consistent naming (e.g., ‘L1’, ‘L2’, etc.)

  • Supports both PDF and DOCX syllabus formats with automatic conversion if needed

See also

  • class_factory.utils.tools.logger_setup: Logger configuration

  • class_factory.utils.base_model: Base model implementation

class class_factory.utils.load_documents.LessonLoader(syllabus_path: Path | str, reading_dir: Path | str, slide_dir: Path | str | None = None, project_dir: Path | str | None = None, verbose: bool = True, tabular_syllabus: bool = False)[source]

Bases: object

A class for loading and managing educational content from various document formats.

This class handles loading and processing of lesson materials including syllabi, readings, and Beamer presentations. It supports multiple file formats (PDF, DOCX, TXT) and provides OCR capabilities for scanned documents when necessary.

reading_dir

Directory containing lesson reading materials

Type:

Path

slide_dir

Directory containing Beamer presentation slides

Type:

Path

project_dir

Root project directory

Type:

Path

syllabus_path

Path to the course syllabus file

Type:

Path

logger

Class logger instance

Type:

Logger

Parameters:
  • syllabus_path (Union[Path, str]) – Path to the syllabus file

  • reading_dir (Union[Path, str]) – Directory containing lesson readings

  • slide_dir (Union[Path, str], optional) – Directory for Beamer slides. Defaults to None.

  • project_dir (Union[Path, str], optional) – Root directory for the project. Defaults to None.

  • verbose (bool, optional) – Whether to show detailed logging. Defaults to True.

property slide_dir
static ocr_available()[source]

Check whether the currently installed packages support OCR.

static missing_ocr_packages()[source]
load_readings(file_path: str | Path) str[source]

Load text content from a single document file.

Parameters:

file_path (Union[str, Path]) – Path to the document file to load

Returns:

Extracted text content prefixed with the file title

Return type:

str

Raises:
  • ValueError – If file type is unsupported or file is corrupted

  • ImportError – If OCR packages are needed but not installed

load_directory(load_from_dir: Path | str) List[str][source]

Load all valid document files within a directory.

Parameters:

load_from_dir (Union[Path, str]) – Directory path to load documents from

Returns:

List of extracted text content from all valid documents

Return type:

List[str]

load_lessons(lesson_number_or_range: int | range, logger=None) Dict[str, List[str]][source]

Load specific lessons by scanning directories based on lesson numbers.

Parameters:
  • lesson_number_or_range (Union[int, range]) – A single lesson number or range of lesson numbers to load.

  • logger – Optional custom logger.

Returns:

A dictionary where each key is a lesson number and each value is a list of readings.

Return type:

Dict[str, List[str]]
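
To make the directory scan concrete, here is a rough sketch of how lesson folders could be mapped to dictionary keys. It is not the LessonLoader implementation: the helper collect_lesson_files is hypothetical, it assumes the 'L1', 'L2' folder naming mentioned in the module notes, and it collects file names where the real method returns extracted reading text.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Union


def collect_lesson_files(reading_dir: Path, lessons: Union[int, range]) -> Dict[str, List[str]]:
    """Hypothetical helper: map each lesson number to the files in its folder."""
    numbers = [lessons] if isinstance(lessons, int) else list(lessons)
    found: Dict[str, List[str]] = {}
    for n in numbers:
        lesson_dir = reading_dir / f"L{n}"  # assumed naming convention
        if lesson_dir.is_dir():
            found[str(n)] = sorted(p.name for p in lesson_dir.iterdir() if p.is_file())
    return found


# Build a throwaway reading directory with two lesson folders.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for n, name in [(1, "intro.txt"), (2, "theory.txt")]:
        (root / f"L{n}").mkdir()
        (root / f"L{n}" / name).write_text("sample reading")
    lesson_files = collect_lesson_files(root, range(1, 4))
```

Lesson 3 has no folder in this example, so the sketch leaves it out of the result; how the real loader treats missing folders is not stated here.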

load_beamer_presentation(tex_path: Path) str[source]

Loads a Beamer presentation from a .tex file and returns it as a string.

Parameters:

tex_path (Path) – The path to the .tex file containing the Beamer presentation.

Returns:

The content of the .tex file.

Return type:

str

find_prior_beamer_presentation(lesson_no: int, max_attempts: int = 3) Path | str[source]

Dynamically finds the most recent prior lesson to use as a template for slide creation.

Parameters:
  • lesson_no (int) – The current lesson number.

  • max_attempts (int) – The maximum number of previous lessons to attempt loading (default 3).

Returns:

The path to the found Beamer file from a prior lesson.

Return type:

Path | str

Raises:

FileNotFoundError – If no valid prior lesson file is found within the max_attempts range.
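
The backward search can be pictured with the sketch below. It is only an illustration under stated assumptions: the helper find_prior_tex is hypothetical, and the flat file layout (slide_dir/L<n>.tex) is invented for the example, since the actual naming scheme is not documented in this section.

```python
import tempfile
from pathlib import Path


def find_prior_tex(slide_dir: Path, lesson_no: int, max_attempts: int = 3) -> Path:
    """Hypothetical helper: walk backwards from lesson_no - 1 to find a .tex file."""
    for offset in range(1, max_attempts + 1):
        prior = lesson_no - offset
        if prior < 1:
            break  # no lessons before lesson 1
        candidate = slide_dir / f"L{prior}.tex"  # assumed file layout
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(
        f"No prior Beamer file found within {max_attempts} lessons of lesson {lesson_no}."
    )


with tempfile.TemporaryDirectory() as tmp:
    slides = Path(tmp)
    (slides / "L3.tex").write_text("\\documentclass{beamer}")
    # Lesson 4 has no slides, so the search falls back to lesson 3.
    found_name = find_prior_tex(slides, lesson_no=5).name
```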

extract_text_from_pdf(pdf_path: str | Path) str[source]

Extract text content from a PDF file, with OCR fallback if needed.

Parameters:

pdf_path (Union[str, Path]) – Path to the PDF file

Returns:

Extracted text content from the PDF

Return type:

str

Raises:

ImportError – If text extraction fails and OCR packages are not available

ocr_pdf(pdf_path: Path, max_workers: int = 4) str[source]

Perform OCR on a PDF file to extract text content.

Parameters:
  • pdf_path (Path) – Path to the PDF file

  • max_workers (int, optional) – Number of parallel workers for OCR. Defaults to 4.

Returns:

Extracted text content from OCR

Return type:

str

load_docx_syllabus(syllabus_path) List[str][source]
extract_lesson_objectives(current_lesson: int | str, only_current: bool = False) str[source]

Extract lesson objectives from the syllabus for specified lesson(s).

Parameters:
  • current_lesson (Union[int, str]) – The lesson number to extract objectives for

  • only_current (bool, optional) – If True, return only current lesson objectives. If False, include previous and next lessons. Defaults to False.

Returns:

Extracted lesson objectives text. Returns “No lesson objectives provided” if no syllabus path is set.

Return type:

str

find_docx_indices(syllabus: List[str], current_lesson: int, lesson_identifier: str = '') Tuple[int | None, int | None, int | None, int | None][source]

Finds the indices of the lessons in the syllabus content.

Parameters:
  • syllabus (List[str]) – A list of strings where each string represents a line in the syllabus document.

  • current_lesson (int) – The lesson number for which to find surrounding lessons.

  • lesson_identifier (str, optional) – The keyword that marks the start of a new lesson in the syllabus (e.g., “Lesson” or “Week”). Defaults to ''.

Returns:

The indices of the previous, current, next, and the end of the next lesson.

Return type:

Tuple[int | None, int | None, int | None, int | None]
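
As an illustration of the four indices, the sketch below searches a list of syllabus lines for lesson headings. The helper find_lesson_indices is hypothetical, and it assumes that each lesson begins on a line starting with the identifier followed by the lesson number (for example "Lesson 4"); the real matching rules may differ.

```python
import re
from typing import List, Optional, Tuple


def find_lesson_indices(
    syllabus: List[str], current_lesson: int, lesson_identifier: str = "Lesson"
) -> Tuple[Optional[int], Optional[int], Optional[int], Optional[int]]:
    """Hypothetical helper: indices of previous, current, next, and end-of-next lessons."""

    def index_of(n: int) -> Optional[int]:
        # \b stops "Lesson 1" from matching "Lesson 10".
        pattern = re.compile(rf"{re.escape(lesson_identifier)}\s+{n}\b")
        for i, line in enumerate(syllabus):
            if pattern.match(line.strip()):
                return i
        return None  # heading not found

    # Assumption: the next lesson ends where the lesson after it begins.
    return (
        index_of(current_lesson - 1),
        index_of(current_lesson),
        index_of(current_lesson + 1),
        index_of(current_lesson + 2),
    )


syllabus = [
    "Course overview",
    "Lesson 1: Intro",
    "Read ch. 1",
    "Lesson 2: Power",
    "Read ch. 2",
    "Lesson 3: Anarchy",
    "Read ch. 3",
    "Lesson 4: Review",
]
indices = find_lesson_indices(syllabus, current_lesson=2)
```

For the final lesson in this sample, the last two entries come back as None because no later heading exists.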

class_factory.utils.response_parsers module

class class_factory.utils.response_parsers.MultipleChoiceQuestion(*, question: str, A: str, B: str, C: str, D: str, correct_answer: str)[source]

Bases: BaseModel

question: str
A: str
B: str
C: str
D: str
correct_answer: str
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class class_factory.utils.response_parsers.TrueFalseQuestion(*, question: str, A: str, B: str, C: str = '', D: str = '', correct_answer: str)[source]

Bases: BaseModel

question: str
A: str
B: str
C: str
D: str
correct_answer: str
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class class_factory.utils.response_parsers.FillInTheBlankQuestion(*, question: str, A: str, B: str, C: str, D: str, correct_answer: str)[source]

Bases: BaseModel

question: str
A: str
B: str
C: str
D: str
correct_answer: str
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class class_factory.utils.response_parsers.Quiz(*, multiple_choice: List[MultipleChoiceQuestion], true_false: List[TrueFalseQuestion], fill_in_the_blank: List[FillInTheBlankQuestion])[source]

Bases: BaseModel

multiple_choice: List[MultipleChoiceQuestion]
true_false: List[TrueFalseQuestion]
fill_in_the_blank: List[FillInTheBlankQuestion]
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class class_factory.utils.response_parsers.Relationship(*, concept_1: str, relationship_type: str | None, concept_2: str)[source]

Bases: BaseModel

concept_1: str
relationship_type: str | None
concept_2: str
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class class_factory.utils.response_parsers.ExtractedRelations(*, relationships: List[Relationship])[source]

Bases: BaseModel

relationships: List[Relationship]
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class_factory.utils.slide_pipeline_utils module

class_factory.utils.slide_pipeline_utils.verify_lesson_dir(lesson_no: int, reading_dir: Path) bool[source]

Ensure that the lesson directory referenced by the user exists.

class_factory.utils.slide_pipeline_utils.verify_beamer_file(beamer_file: Path) bool[source]

Check that the suggested file actually exists.

class_factory.utils.slide_pipeline_utils.comment_out_includegraphics(latex_content: str) str[source]

Searches for \includegraphics commands in the LaTeX content and comments them out by adding a ‘%’ at the beginning of the line.

Parameters:

latex_content (str) – The raw LaTeX content as a string.

Returns:

The modified LaTeX content with \includegraphics commands commented out.

Return type:

str
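
One way to get this behaviour is a multiline regular expression, sketched below. The function name comment_out_graphics is hypothetical, and skipping lines that are already commented is an assumption of the sketch rather than documented behaviour.

```python
import re


def comment_out_graphics(latex_content: str) -> str:
    """Hypothetical helper: prefix lines containing \\includegraphics with '%'."""
    # Match whole lines that contain \includegraphics and are not already comments.
    pattern = re.compile(r"^(?![ \t]*%)(.*\\includegraphics.*)$", flags=re.MULTILINE)
    return pattern.sub(r"%\1", latex_content)


sample = "\\begin{frame}\n  \\includegraphics[width=\\textwidth]{map.png}\n\\end{frame}"
commented = comment_out_graphics(sample)
```

Because commented lines are skipped, running the sketch a second time leaves the text unchanged.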

class_factory.utils.slide_pipeline_utils.validate_latex(latex_code: str, latex_compiler: str = 'pdflatex') bool[source]

Validates LaTeX by attempting to compile it using a LaTeX engine.

Parameters:
  • latex_code (str) – The LaTeX code to validate.

  • latex_compiler (str) – Name of the LaTeX compiler executable, or its full path if it is not on the PATH. Defaults to 'pdflatex'.

Returns:

True if LaTeX compiles successfully, False otherwise.

Return type:

bool
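
A compile-based check of this kind might look like the sketch below. It is an approximation under assumptions, not the package's code: the helper name compiles, the temporary working directory, the 60-second timeout, and the choice to return False when the compiler is missing are all illustrative.

```python
import subprocess
import tempfile
from pathlib import Path


def compiles(latex_code: str, latex_compiler: str = "pdflatex") -> bool:
    """Hypothetical helper: True if the compiler exits with status 0."""
    with tempfile.TemporaryDirectory() as tmp:
        tex_file = Path(tmp) / "check.tex"
        tex_file.write_text(latex_code, encoding="utf-8")
        try:
            result = subprocess.run(
                [latex_compiler, "-interaction=nonstopmode", "-halt-on-error", tex_file.name],
                cwd=tmp,  # keep auxiliary files inside the temporary directory
                capture_output=True,
                timeout=60,
            )
        except (FileNotFoundError, subprocess.TimeoutExpired):
            # Assumption: a missing or hung compiler counts as "not validated".
            return False
        return result.returncode == 0


document = "\\documentclass{article}\\begin{document}Hi\\end{document}"
missing = compiles(document, latex_compiler="no-such-latex-compiler")
```

With an executable name that does not exist, the sketch returns False instead of raising.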

class_factory.utils.slide_pipeline_utils.clean_latex_content(latex_content: str) str[source]

Clean LaTeX content by removing any text before the \title command and stripping extraneous LaTeX code block markers.

Parameters:

latex_content (str) – The LaTeX content to be cleaned.

Returns:

The cleaned LaTeX content.

Return type:

str
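
The cleaning step can be sketched as below. The helper clean_latex is hypothetical; it assumes the "code block markers" are Markdown-style triple-backtick fence lines, as commonly produced by LLMs, and that content without a \title command is returned with only the fences removed.

```python
FENCE = "`" * 3  # Markdown code fence marker


def clean_latex(latex_content: str) -> str:
    """Hypothetical helper: drop fence lines and anything before \\title."""
    # Remove lines that are only fence markers (optionally with a language tag).
    lines = [ln for ln in latex_content.splitlines() if not ln.strip().startswith(FENCE)]
    text = "\n".join(lines)
    # Keep everything from the first \title command onwards, if present.
    start = text.find("\\title")
    if start != -1:
        text = text[start:]
    return text.strip()


raw = (
    FENCE + "latex\n"
    "Here is your deck:\n"
    "\\title{Lesson 5}\n"
    "\\begin{document}\n"
    "\\end{document}\n" + FENCE
)
cleaned = clean_latex(raw)
```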

class_factory.utils.tools module

class_factory.utils.tools.reset_loggers(log_level=30, log_format='%(asctime)s - %(levelname)s - %(message)s - raised_by: %(name)s')[source]
class_factory.utils.tools.logger_setup(logger_name='query_logger', log_level=20)[source]

Set up and return a logger with the specified name and level. Avoids affecting the root logger by setting propagate to False.

Parameters:
  • logger_name (str) – The name of the logger.

  • log_level (int) – The logging level (e.g., logging.INFO, logging.DEBUG).

Returns:

Configured logger instance.

Return type:

logging.Logger
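
For readers unfamiliar with the propagate flag, the sketch below shows the general pattern using only the standard logging module. The function make_logger is a hypothetical stand-in; the handler type and message format are assumptions.

```python
import logging


def make_logger(logger_name: str = "query_logger", log_level: int = logging.INFO) -> logging.Logger:
    """Hypothetical stand-in for a named logger that leaves the root logger alone."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(log_level)
    # Stop records from bubbling up to the root logger's handlers.
    logger.propagate = False
    # Attach a handler only once, so repeated calls do not duplicate output.
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter("%(asctime)s - %(levelname)s - %(message)s"))
        logger.addHandler(handler)
    return logger


log = make_logger("sketch_logger", logging.DEBUG)
log.debug("Logger configured.")
```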

class_factory.utils.tools.retry_on_json_decode_error(max_retries: int = 3, delay: float = 2.0)[source]

Decorator to retry a function if a JSONDecodeError or ValueError is encountered.

Parameters:
  • max_retries (int) – The maximum number of retries.

  • delay (float) – The delay in seconds between retries.

Returns:

The decorated function with retry logic.

Return type:

Callable
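
A retry decorator along these lines can be sketched as follows. The name retry_on_decode_error is hypothetical, and two details are assumptions: max_retries is treated as the total number of attempts, and the last error is re-raised once attempts are exhausted. Catching ValueError is enough to cover both cases because json.JSONDecodeError is a subclass of ValueError.

```python
import functools
import json
import time
from typing import Callable


def retry_on_decode_error(max_retries: int = 3, delay: float = 2.0) -> Callable:
    """Hypothetical decorator: retry when parsing raises ValueError/JSONDecodeError."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except ValueError:  # includes json.JSONDecodeError
                    if attempt == max_retries:
                        raise  # assumption: give up by re-raising the last error
                    time.sleep(delay)

        return wrapper

    return decorator


calls = {"count": 0}


@retry_on_decode_error(max_retries=3, delay=0.0)
def flaky_parse() -> dict:
    # Fails twice with invalid JSON, then succeeds on the third call.
    calls["count"] += 1
    payload = '{"ok": true}' if calls["count"] == 3 else "not json"
    return json.loads(payload)


parsed = flaky_parse()
```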

class_factory.utils.tools.print_directory_tree(path, level=0)[source]

Recursively formats the directory structure in a Markdown-friendly way.
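
A recursive formatter of this kind might resemble the sketch below. The helper format_tree is hypothetical: it returns a string with nested Markdown bullets, marks directories with a trailing slash, and sorts entries by path, none of which is confirmed by the reference above.

```python
import tempfile
from pathlib import Path


def format_tree(path: Path, level: int = 0) -> str:
    """Hypothetical helper: render a directory as nested Markdown bullets."""
    indent = "  " * level
    label = f"{path.name}/" if path.is_dir() else path.name
    lines = [f"{indent}- {label}"]
    if path.is_dir():
        # Recurse into children, one indentation level deeper.
        for child in sorted(path.iterdir()):
            lines.append(format_tree(child, level + 1))
    return "\n".join(lines)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "course"
    (root / "L1").mkdir(parents=True)
    (root / "L1" / "reading.txt").write_text("text")
    (root / "syllabus.docx").write_text("stub")
    tree = format_tree(root)
```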

class_factory.utils.validator_prompts module

Prompts for the Validator class (used for LLM output validation).

Module contents