class_factory.utils package
Submodules
class_factory.utils.base_model module
- class class_factory.utils.base_model.BaseModel(lesson_no: int, course_name: str, lesson_loader: LessonLoader, output_dir: Path | str = None, verbose: bool = False)
Bases: object
A base class for educational modules that provides common setup and utility functions, such as loading lesson readings and setting user-defined objectives.
- lesson_no
The specific lesson number for the current instance.
- Type:
int
- course_name
Name of the course, used as context in other methods and prompts.
- Type:
str
- lesson_loader
Instance for loading lesson-related data.
- Type:
LessonLoader
- output_dir
Directory where outputs are saved; defaults to ‘ClassFactoryOutput’.
- Type:
Path
- logger
Logger instance for the class.
- Type:
Logger
- user_objectives
Dictionary of user-defined objectives, if provided.
- Type:
Optional[Dict[str, str]]
- _load_readings(lesson_numbers: Union[int, range]) -> Dict[str, List[str]]
Loads and returns readings for the specified lesson(s) as a dictionary.
- set_user_objectives(objectives: Union[List[str], Dict[str, str]], lesson_range: Union[int, range]) -> Dict[str, str]
Sets user-defined objectives for each lesson in the specified range and updates self.user_objectives.
- _get_lesson_objectives(lesson_num: int) -> str
Retrieves lesson objectives for a given lesson number, falling back to extracted objectives if no user objectives exist.
- set_user_objectives(objectives: List[str] | Dict[str, str], lesson_range: int | range) -> Dict[str, str]
Set user-defined objectives for each lesson in lesson_range, supporting both list and dictionary formats. Updates the self.user_objectives attribute with the processed objectives.
- Parameters:
objectives (Union[List[str], Dict[str, str]]) – User-provided objectives, either as a list (converted to a dictionary) or as a dictionary keyed by lesson numbers as strings.
lesson_range (Union[int, range]) – Single lesson number or range of lesson numbers for objectives.
- Returns:
Processed dictionary of user objectives keyed by lesson numbers as strings.
- Return type:
Dict[str, str]
- Raises:
ValueError – If the number of objectives in the list does not match the number of lessons in lesson_range.
TypeError – If objectives is neither a list nor a dictionary.
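The list-versus-dictionary handling described above can be sketched as a standalone function. This is a minimal sketch, not the library's code: the name normalize_objectives is hypothetical, and coercing dictionary keys to strings is an assumption.

```python
from typing import Dict, List, Union


def normalize_objectives(
    objectives: Union[List[str], Dict[str, str]],
    lesson_range: Union[int, range],
) -> Dict[str, str]:
    """Hypothetical stand-in for the normalization done by set_user_objectives."""
    # Treat a single lesson number as a one-element range.
    if isinstance(lesson_range, int):
        lessons = range(lesson_range, lesson_range + 1)
    else:
        lessons = lesson_range

    if isinstance(objectives, list):
        # List entries are paired with lessons in order, so the lengths must match.
        if len(objectives) != len(lessons):
            raise ValueError(
                f"Got {len(objectives)} objectives for {len(lessons)} lessons."
            )
        return {str(lesson): text for lesson, text in zip(lessons, objectives)}

    if isinstance(objectives, dict):
        # ASSUMPTION: keys are coerced to strings so that 3 and "3" are treated alike.
        return {str(key): text for key, text in objectives.items()}

    raise TypeError("objectives must be a list or a dictionary")


print(normalize_objectives(["Define key terms", "Apply the framework"], range(3, 5)))
# {'3': 'Define key terms', '4': 'Apply the framework'}
```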
class_factory.utils.llm_validator module
- class class_factory.utils.llm_validator.ValidatorInterimResponse(*, accuracy: float, completeness: float, consistency: float, reasoning: str, additional_guidance: str)
Bases: pydantic.BaseModel
- accuracy: float
- completeness: float
- consistency: float
- reasoning: str
- additional_guidance: str
- model_config = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class class_factory.utils.llm_validator.ValidatorResponse(*, overall_score: float, accuracy: float, completeness: float, consistency: float, reasoning: str, additional_guidance: str, status: int)
Bases: pydantic.BaseModel
- overall_score: float
- accuracy: float
- completeness: float
- consistency: float
- reasoning: str
- additional_guidance: str
- status: int
- classmethod from_interim(interim: ValidatorInterimResponse, threshold: float = 8.0) -> ValidatorResponse
- model_config = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
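The page does not state how from_interim combines the three criterion scores or how it sets status. The following is a hypothetical sketch that uses standard-library dataclasses in place of the pydantic models and assumes that overall_score is the unweighted mean of accuracy, completeness, and consistency, with status set to 1 when that mean meets threshold and 0 otherwise.

```python
from dataclasses import dataclass


@dataclass
class InterimScores:
    """Hypothetical stand-in for ValidatorInterimResponse (same fields, no pydantic)."""
    accuracy: float
    completeness: float
    consistency: float
    reasoning: str
    additional_guidance: str


@dataclass
class ValidationResult:
    """Hypothetical stand-in for ValidatorResponse (same fields, no pydantic)."""
    overall_score: float
    accuracy: float
    completeness: float
    consistency: float
    reasoning: str
    additional_guidance: str
    status: int

    @classmethod
    def from_interim(cls, interim: InterimScores, threshold: float = 8.0) -> "ValidationResult":
        # ASSUMPTION: the overall score is the unweighted mean of the three criteria.
        overall = (interim.accuracy + interim.completeness + interim.consistency) / 3
        return cls(
            overall_score=overall,
            accuracy=interim.accuracy,
            completeness=interim.completeness,
            consistency=interim.consistency,
            reasoning=interim.reasoning,
            additional_guidance=interim.additional_guidance,
            # ASSUMPTION: 1 means the response passed validation, 0 means it did not.
            status=1 if overall >= threshold else 0,
        )


result = ValidationResult.from_interim(InterimScores(9.0, 8.0, 7.0, "Mostly correct.", ""))
print(result.overall_score, result.status)  # 8.0 1
```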
- class class_factory.utils.llm_validator.Validator(llm: any, temperature: float = 0.2, log_level: int = 20, system_prompt: str | None = None, human_prompt: str | None = None, tracer: any | None = None, score_threshold: float = 8.0)
Bases: object
A class for validating responses generated by a large language model (LLM).
The Validator checks the accuracy, completeness, and relevance of the LLM’s response to ensure it meets the requirements specified in the task prompt. Validation results include an overall score, per-criterion scores (accuracy, completeness, consistency), a status, the reasoning behind the scores, and any additional guidance.
class_factory.utils.load_documents module
Document Loading and Processing Module
This module provides functionality to load, process, and extract text from various document types (PDF, DOCX, and TXT) for generating lesson-specific content. It includes support for OCR processing of scanned documents and handling of structured educational materials like syllabi and lesson readings.
Classes
- LessonLoader
Main class for handling document loading and processing operations.
Key Functions
The LessonLoader class provides these key functionalities:
- Document Loading:
load_directory: Load all documents from a specified directory
load_lessons: Load lessons from multiple directories with lesson number inference
load_readings: Extract text from individual documents
load_beamer_presentation: Load Beamer presentation content
- Syllabus Processing:
extract_lesson_objectives: Extract objectives for specific lessons
load_docx_syllabus: Load and parse DOCX syllabus content
find_docx_indices: Locate lesson sections within syllabus
- Text Extraction:
extract_text_from_pdf: Extract text from PDF files
ocr_pdf: Perform OCR on scanned documents
convert_pdf_to_docx: Convert PDF files to DOCX format
Dependencies
- Core Dependencies:
pypdf: PDF text extraction
python-docx: DOCX file handling
pathlib: File path operations
typing: Type hints
- Optional OCR Dependencies:
pytesseract: OCR processing
pdf2image: PDF to image conversion
spacy: Text processing
contextualSpellCheck: Text correction
img2table: Table extraction from images
Example Usage
from class_factory.utils.load_documents import LessonLoader
# Initialize loader with paths
loader = LessonLoader(
syllabus_path="path/to/syllabus.docx",
reading_dir="path/to/readings"
)
# Load specific lesson content
lesson_content = loader.load_lessons(lesson_number_or_range=5)
# Extract lesson objectives
objectives = loader.extract_lesson_objectives(current_lesson=5)
Notes
OCR functionality requires additional packages, which can be installed via pip install class_factory[ocr]
Directory structure should follow consistent naming (e.g., ‘L1’, ‘L2’, etc.)
Supports both PDF and DOCX syllabus formats with automatic conversion if needed
See also
- class_factory.utils.tools.logger_setup: Logger configuration
- class_factory.utils.base_model: Base model implementation
- class class_factory.utils.load_documents.LessonLoader(syllabus_path: Path | str, reading_dir: Path | str, slide_dir: Path | str | None = None, project_dir: Path | str | None = None, verbose: bool = True, tabular_syllabus: bool = False)
Bases: object
A class for loading and managing educational content from various document formats.
This class handles loading and processing of lesson materials including syllabi, readings, and Beamer presentations. It supports multiple file formats (PDF, DOCX, TXT) and provides OCR capabilities for scanned documents when necessary.
- reading_dir
Directory containing lesson reading materials
- Type:
Path
- slide_dir
Directory containing Beamer presentation slides
- Type:
Path
- project_dir
Root project directory
- Type:
Path
- syllabus_path
Path to the course syllabus file
- Type:
Path
- logger
Class logger instance
- Type:
Logger
- Parameters:
syllabus_path (Union[Path, str]) – Path to the syllabus file
reading_dir (Union[Path, str]) – Directory containing lesson readings
slide_dir (Union[Path, str], optional) – Directory for Beamer slides. Defaults to None.
project_dir (Union[Path, str], optional) – Root directory for the project. Defaults to None.
verbose (bool, optional) – Whether to show detailed logging. Defaults to True.
- property slide_dir
- load_readings(file_path: str | Path) -> str
Load text content from a single document file.
- Parameters:
file_path (Union[str, Path]) – Path to the document file to load
- Returns:
Extracted text content prefixed with the file title
- Return type:
str
- Raises:
ValueError – If file type is unsupported or file is corrupted
ImportError – If OCR packages are needed but not installed
- load_directory(load_from_dir: Path | str) -> List[str]
Load all valid document files within a directory.
- Parameters:
load_from_dir (Union[Path, str]) – Directory path to load documents from
- Returns:
List of extracted text content from all valid documents
- Return type:
List[str]
- load_lessons(lesson_number_or_range: int | range, logger=None) -> Dict[str, List[str]]
Load specific lessons by scanning directories based on lesson numbers.
- Parameters:
lesson_number_or_range (Union[int, range]) – A single lesson number or range of lesson numbers to load.
logger – Optional custom logger.
- Returns:
A dictionary keyed by lesson number (as a string), where each value is a list of readings.
- Return type:
Dict[str, List[str]]
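A minimal sketch of the directory scan, assuming readings live in folders named L1, L2, and so on under reading_dir (the naming convention from the module notes). It reads only .txt files so it runs without pypdf or python-docx; the function name load_lessons_sketch is hypothetical.

```python
import re
from pathlib import Path
from typing import Dict, List, Union


def load_lessons_sketch(
    reading_dir: Path, lesson_number_or_range: Union[int, range]
) -> Dict[str, List[str]]:
    """Hypothetical: collect .txt readings from lesson folders named L1, L2, ..."""
    if isinstance(lesson_number_or_range, int):
        wanted = {lesson_number_or_range}
    else:
        wanted = set(lesson_number_or_range)

    readings: Dict[str, List[str]] = {}
    for folder in sorted(Path(reading_dir).iterdir()):
        # ASSUMPTION: the lesson number is inferred from folder names such as "L5".
        match = re.fullmatch(r"L(\d+)", folder.name)
        if not folder.is_dir() or match is None:
            continue
        lesson_no = int(match.group(1))
        if lesson_no in wanted:
            # Only plain-text files here; the real loader also handles PDF and DOCX.
            texts = [p.read_text(encoding="utf-8") for p in sorted(folder.glob("*.txt"))]
            readings[str(lesson_no)] = texts  # keys are strings, as documented
    return readings
```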
- load_beamer_presentation(tex_path: Path) -> str
Loads a Beamer presentation from a .tex file and returns its content as a string.
- Parameters:
tex_path (Path) – The path to the .tex file containing the Beamer presentation.
- Returns:
The content of the .tex file.
- Return type:
str
- find_prior_beamer_presentation(lesson_no: int, max_attempts: int = 3) -> Path | str
Dynamically finds the most recent prior lesson to use as a template for slide creation.
- Parameters:
lesson_no (int) – The current lesson number.
max_attempts (int) – The maximum number of previous lessons to attempt loading (default 3).
- Returns:
The path to the found Beamer file from a prior lesson.
- Return type:
Path | str
- Raises:
FileNotFoundError – If no valid prior lesson file is found within the max_attempts range.
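A hypothetical sketch of the backward search, assuming each lesson's deck is stored as slide_dir/L<n>.tex; the real file layout is not documented here, and the function name is illustrative only.

```python
from pathlib import Path


def find_prior_beamer_sketch(slide_dir: Path, lesson_no: int, max_attempts: int = 3) -> Path:
    """Hypothetical: look back up to max_attempts lessons for an existing .tex deck."""
    for offset in range(1, max_attempts + 1):
        prior = lesson_no - offset
        if prior < 1:
            break  # there are no lessons before lesson 1
        # ASSUMPTION: decks are named L<n>.tex directly inside slide_dir.
        candidate = Path(slide_dir) / f"L{prior}.tex"
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(
        f"No Beamer file found in the {max_attempts} lessons before lesson {lesson_no}."
    )
```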
- extract_text_from_pdf(pdf_path: str | Path) -> str
Extract text content from a PDF file, with OCR fallback if needed.
- Parameters:
pdf_path (Union[str, Path]) – Path to the PDF file
- Returns:
Extracted text content from the PDF
- Return type:
str
- Raises:
ImportError – If text extraction fails and OCR packages are not available
- ocr_pdf(pdf_path: Path, max_workers: int = 4) -> str
Perform OCR on a PDF file to extract text content.
- Parameters:
pdf_path (Path) – Path to the PDF file
max_workers (int, optional) – Number of parallel workers for OCR. Defaults to 4.
- Returns:
Extracted text content from OCR
- Return type:
str
- extract_lesson_objectives(current_lesson: int | str, only_current: bool = False) -> str
Extract lesson objectives from the syllabus for specified lesson(s).
- Parameters:
current_lesson (Union[int, str]) – The lesson number to extract objectives for
only_current (bool, optional) – If True, return only current lesson objectives. If False, include previous and next lessons. Defaults to False.
- Returns:
Extracted lesson objectives text, or “No lesson objectives provided” if no syllabus path is set.
- Return type:
str
- find_docx_indices(syllabus: List[str], current_lesson: int, lesson_identifier: str = '') -> Tuple[int | None, int | None, int | None, int | None]
Finds the indices of the previous, current, and next lessons in the syllabus content.
- Parameters:
syllabus (List[str]) – A list of strings where each string represents a line in the syllabus document.
current_lesson (int) – The lesson number for which to find surrounding lessons.
lesson_identifier (str, optional) – The word that marks the start of a new lesson in the syllabus (e.g., “Lesson” or “Week”). Defaults to ''.
- Returns:
The indices of the previous, current, next, and the end of the next lesson.
- Return type:
Tuple[Optional[int], Optional[int], Optional[int], Optional[int]]
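A hypothetical sketch of the index search, assuming a lesson header is any line that begins with lesson_identifier followed by the lesson number (for example “Lesson 4”), and that the next lesson ends at the following header or at the end of the document. The function name is illustrative only.

```python
import re
from typing import List, Optional, Tuple


def find_lesson_indices_sketch(
    syllabus: List[str], current_lesson: int, lesson_identifier: str = ""
) -> Tuple[Optional[int], Optional[int], Optional[int], Optional[int]]:
    """Hypothetical: header lines for lessons n-1, n, n+1, plus where lesson n+1 ends."""

    def header_index(n: int) -> Optional[int]:
        # ASSUMPTION: a header line starts with e.g. "Lesson 4"; the trailing \b keeps
        # "Lesson 1" from matching "Lesson 10".
        pattern = re.compile(rf"^\s*{re.escape(lesson_identifier)}\s*{n}\b")
        for i, line in enumerate(syllabus):
            if pattern.match(line):
                return i
        return None

    prev_idx = header_index(current_lesson - 1)
    curr_idx = header_index(current_lesson)
    next_idx = header_index(current_lesson + 1)
    end_idx = header_index(current_lesson + 2)
    if end_idx is None and next_idx is not None:
        end_idx = len(syllabus)  # the next lesson runs to the end of the document
    return prev_idx, curr_idx, next_idx, end_idx


syllabus = ["Course intro", "Lesson 1: Basics", "Read ch. 1", "Lesson 2: Methods", "Read ch. 2", "Lesson 3: Cases"]
print(find_lesson_indices_sketch(syllabus, 2, "Lesson"))  # (1, 3, 5, 6)
```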
class_factory.utils.response_parsers module
- class class_factory.utils.response_parsers.MultipleChoiceQuestion(*, question: str, A: str, B: str, C: str, D: str, correct_answer: str)
Bases: pydantic.BaseModel
- question: str
- A: str
- B: str
- C: str
- D: str
- correct_answer: str
- model_config = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class class_factory.utils.response_parsers.TrueFalseQuestion(*, question: str, A: str, B: str, C: str = '', D: str = '', correct_answer: str)
Bases: pydantic.BaseModel
- question: str
- A: str
- B: str
- C: str
- D: str
- correct_answer: str
- model_config = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class class_factory.utils.response_parsers.FillInTheBlankQuestion(*, question: str, A: str, B: str, C: str, D: str, correct_answer: str)
Bases: pydantic.BaseModel
- question: str
- A: str
- B: str
- C: str
- D: str
- correct_answer: str
- model_config = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class class_factory.utils.response_parsers.Quiz(*, multiple_choice: List[MultipleChoiceQuestion], true_false: List[TrueFalseQuestion], fill_in_the_blank: List[FillInTheBlankQuestion])
Bases: pydantic.BaseModel
- multiple_choice: List[MultipleChoiceQuestion]
- true_false: List[TrueFalseQuestion]
- fill_in_the_blank: List[FillInTheBlankQuestion]
- model_config = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class class_factory.utils.response_parsers.Relationship(*, concept_1: str, relationship_type: str | None, concept_2: str)
Bases: pydantic.BaseModel
- concept_1: str
- relationship_type: str | None
- concept_2: str
- model_config = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class class_factory.utils.response_parsers.ExtractedRelations(*, relationships: List[Relationship])
Bases: pydantic.BaseModel
- relationships: List[Relationship]
- model_config = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
class_factory.utils.slide_pipeline_utils module
- class_factory.utils.slide_pipeline_utils.verify_lesson_dir(lesson_no: int, reading_dir: Path) -> bool
Check that the lesson directory referenced by the user exists within reading_dir.
- class_factory.utils.slide_pipeline_utils.verify_beamer_file(beamer_file: Path) -> bool
Check that the suggested Beamer file exists.
- class_factory.utils.slide_pipeline_utils.comment_out_includegraphics(latex_content: str) -> str
Search the LaTeX content for \includegraphics commands and comment them out by adding a ‘%’ at the beginning of each matching line.
- Parameters:
latex_content (str) – The raw LaTeX content as a string.
- Returns:
The modified LaTeX content with \includegraphics commands commented out.
- Return type:
str
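A minimal regex sketch of the documented behavior; the function name is hypothetical, and leaving lines that are already commented untouched is an added assumption.

```python
import re

# ASSUMPTION: lines that already start with % (after optional spaces or tabs) are skipped.
_INCLUDEGRAPHICS_LINE = re.compile(r"(?m)^(?![ \t]*%)(.*\\includegraphics.*)$")


def comment_out_includegraphics_sketch(latex_content: str) -> str:
    """Hypothetical: prefix '%' to each line that contains an \\includegraphics command."""
    # (?m) lets ^ and $ match at every line boundary; \1 re-inserts the original line.
    return _INCLUDEGRAPHICS_LINE.sub(r"%\1", latex_content)
```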
- class_factory.utils.slide_pipeline_utils.validate_latex(latex_code: str, latex_compiler: str = 'pdflatex') -> bool
Validates LaTeX by attempting to compile it using a LaTeX engine.
- Parameters:
latex_code (str) – The LaTeX code to validate.
latex_compiler (str) – The name of the LaTeX compiler executable, or its full path if it is not on the PATH. Defaults to 'pdflatex'.
- Returns:
True if LaTeX compiles successfully, False otherwise.
- Return type:
bool
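A sketch of compile-based validation using subprocess, assuming the compiler accepts pdflatex's -interaction=nonstopmode and -halt-on-error flags. Treating a missing compiler or a timeout as a failure is also an assumption, and the function name is hypothetical.

```python
import subprocess
import tempfile
from pathlib import Path


def validate_latex_sketch(latex_code: str, latex_compiler: str = "pdflatex") -> bool:
    """Hypothetical: return True if the compiler exits cleanly on latex_code."""
    with tempfile.TemporaryDirectory() as tmp:
        tex_file = Path(tmp) / "check.tex"
        tex_file.write_text(latex_code, encoding="utf-8")
        try:
            result = subprocess.run(
                [latex_compiler, "-interaction=nonstopmode", "-halt-on-error", tex_file.name],
                cwd=tmp,  # keep .aux, .log, and .pdf output inside the temporary directory
                capture_output=True,
                timeout=60,
            )
        except (FileNotFoundError, subprocess.TimeoutExpired):
            # ASSUMPTION: a missing compiler or a hung run counts as "does not compile".
            return False
        return result.returncode == 0
```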
- class_factory.utils.slide_pipeline_utils.clean_latex_content(latex_content: str) -> str
Clean LaTeX content by removing any text before the \title command and stripping extraneous code-block markers.
- Parameters:
latex_content (str) – The LaTeX content to be cleaned.
- Returns:
The cleaned LaTeX content.
- Return type:
str
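A minimal sketch of the cleanup, assuming the “code-block markers” are Markdown fence lines made of three backtick characters optionally followed by a language tag such as latex. The function name is hypothetical.

```python
import re

# ASSUMPTION: a marker is a line holding only three backticks plus an optional language tag.
_FENCE_LINE = re.compile(r"(?m)^[ \t]*`{3}[A-Za-z]*[ \t]*(?:\n|$)")


def clean_latex_content_sketch(latex_content: str) -> str:
    """Hypothetical: drop fence lines and any text before the first \\title command."""
    cleaned = _FENCE_LINE.sub("", latex_content)
    start = cleaned.find("\\title")
    if start != -1:
        cleaned = cleaned[start:]  # discard any chatter that precedes the title
    return cleaned.strip()
```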
class_factory.utils.tools module
- class_factory.utils.tools.reset_loggers(log_level=30, log_format='%(asctime)s - %(levelname)s - %(message)s - raised_by: %(name)s')
- class_factory.utils.tools.logger_setup(logger_name='query_logger', log_level=20)
Set up and return a logger with the specified name and level. Avoids affecting the root logger by setting propagate to False.
- Parameters:
logger_name (str) – The name of the logger.
log_level (int) – The logging level (e.g., logging.INFO, logging.DEBUG).
- Returns:
Configured logger instance.
- Return type:
logging.Logger
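A minimal sketch of the documented setup using the standard logging module. Skipping handler creation when the logger already has handlers, and the format string, are assumptions added so that repeated calls do not duplicate output; the function name is hypothetical.

```python
import logging


def logger_setup_sketch(logger_name: str = "query_logger", log_level: int = logging.INFO) -> logging.Logger:
    """Hypothetical: return a named logger that does not propagate to the root logger."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(log_level)
    logger.propagate = False  # keep records out of the root logger's handlers
    # ASSUMPTION: only attach a handler once, so repeated calls do not double-print.
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter("%(asctime)s - %(levelname)s - %(message)s"))
        logger.addHandler(handler)
    return logger
```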
- class_factory.utils.tools.retry_on_json_decode_error(max_retries: int = 3, delay: float = 2.0)
Decorator to retry a function if a JSONDecodeError or ValueError is encountered.
- Parameters:
max_retries (int) – The maximum number of retries.
delay (float) – The delay in seconds between retries.
- Returns:
The decorated function with retry logic.
- Return type:
Callable
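A sketch of such a decorator. It treats max_retries as the total number of attempts, which is an assumption since the page does not say whether the first call counts, and it re-raises the last error once attempts run out. The decorator name is hypothetical.

```python
import functools
import json
import time
from typing import Any, Callable


def retry_on_json_decode_error_sketch(max_retries: int = 3, delay: float = 2.0) -> Callable:
    """Hypothetical: retry the wrapped call on JSONDecodeError or ValueError."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # ASSUMPTION: max_retries counts every attempt, including the first.
            for attempt in range(1, max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except (json.JSONDecodeError, ValueError):
                    # JSONDecodeError subclasses ValueError; both are listed to mirror the docs.
                    if attempt == max_retries:
                        raise  # out of attempts: surface the last error
                    time.sleep(delay)

        return wrapper

    return decorator


@retry_on_json_decode_error_sketch(max_retries=3, delay=0)
def parse(text: str) -> Any:
    return json.loads(text)
```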
class_factory.utils.validator_prompts module
Prompts for the Validator class (used for LLM output validation).