API Reference

This section contains the complete API documentation for all ClassFactory modules.

Core Module

ClassFactory Module

The ClassFactory module provides a unified interface for managing AI-powered educational content generation modules.

Supported Modules

  • BeamerBot: Automates LaTeX Beamer slide generation based on lesson materials

  • ConceptWeb: Creates concept maps showing relationships between key lesson concepts

  • QuizMaker: Generates quizzes with interactive features and similarity analysis

Key Functionalities

  1. Module Management:
    • Dynamic module creation via create_module()

    • Shared context and configurations across modules

    • Consistent error handling and validation

  2. Resource Management:
    • Centralized path handling for lesson materials

    • Organized output structure in ClassFactoryOutput

    • Automated resource loading and validation

  3. AI Integration:
    • Flexible LLM support (GPT-4, LLaMA, etc.)

    • Consistent AI interaction patterns

    • Shared context across operations

Output Directory Structure

ClassFactoryOutput/
├── BeamerBot/
│   └── L{lesson_no}/
├── ConceptWeb/
│   └── L{lesson_no}/
└── QuizMaker/
    └── L{lesson_no}/
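
The layout above can be reproduced with pathlib. This is a minimal sketch rather than the library's own code: the helper name module_output_dir is hypothetical, and it assumes the ClassFactoryOutput folder sits directly under the project directory.

```python
from pathlib import Path


def module_output_dir(project_dir: Path, module_name: str, lesson_no: int) -> Path:
    """Return ClassFactoryOutput/<module_name>/L<lesson_no> under project_dir.

    Hypothetical helper for illustration; ClassFactory manages these paths itself.
    """
    return Path(project_dir) / "ClassFactoryOutput" / module_name / f"L{lesson_no}"


if __name__ == "__main__":
    # Build (but do not create) the QuizMaker folder for lesson 10.
    target = module_output_dir(Path("my_course"), "QuizMaker", 10)
    print(target.as_posix())
```

The function only composes the path; creating the directory is left to the caller (for example with Path.mkdir(parents=True, exist_ok=True)).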

Usage

from class_factory import ClassFactory
from langchain_openai import ChatOpenAI
from pathlib import Path

# Initialize factory
factory = ClassFactory(
    lesson_no=10,
    syllabus_path="path/to/syllabus.docx",
    reading_dir="path/to/readings",
    llm=ChatOpenAI(api_key="your_key")
)

# Create and use modules
slides = factory.create_module("BeamerBot").generate_slides()
concept_map = factory.create_module("ConceptWeb").build_concept_map()
quiz = factory.create_module("QuizMaker").make_a_quiz()

Dependencies

  • pathlib: Path handling

  • langchain: LLM integration

  • pyprojroot: Project directory management

  • Custom modules: BeamerBot, ConceptWeb, QuizMaker

Notes

  • BeamerBot operates on single lessons only

  • ConceptWeb and QuizMaker support lesson ranges

  • All modules inherit factory-level configurations

  • Output directories are automatically created and managed

class class_factory.ClassFactory.ClassFactory(lesson_no: int, reading_dir: str | Path, llm, syllabus_path: str | Path = None, project_dir: str | Path | None = None, output_dir: str | Path | None = None, slide_dir: str | Path | None = None, lesson_range: range | None = None, course_name: str = 'Political Science', verbose: bool = True, tabular_syllabus: bool = False, **kwargs)[source]

Bases: object

A factory class responsible for creating and managing instances of various educational modules.

ClassFactory provides a standardized interface for initializing educational modules designed for generating lesson-specific materials, such as slides, concept maps, and quizzes. Modules are dynamically created based on the specified module name, with configurations for content generation provided by the user.

Modules available for creation include:

  • BeamerBot: Automated LaTeX Beamer slide generation.

  • ConceptWeb: Concept map generation based on lesson objectives and readings.

  • QuizMaker: Quiz creation, hosting, and analysis.

lesson_no

The lesson number for which the module instance is created.

Type:

int

syllabus_path

Path to the syllabus file.

Type:

Path

reading_dir

Path to the directory containing lesson readings.

Type:

Path

slide_dir

Path to the directory containing lesson slides.

Type:

Path

llm

Language model instance used for content generation in modules.

project_dir

Base project directory.

Type:

Path

output_dir

Directory where outputs from modules are saved.

Type:

Path

lesson_range

Range of lessons covered by the factory instance.

Type:

range

course_name

Name of the course for which content is generated.

Type:

str

lesson_loader

Instance of LessonLoader for loading lesson-related data and objectives.

Type:

LessonLoader

By default, all module outputs are saved in a structured directory under “ClassFactoryOutput” within the project directory.

create_module(module_name: str, **kwargs)[source]

Create a specific module instance based on the provided module name.

Parameters:
  • module_name (str) – Name of the module to create. Case-insensitive options:
    • 'BeamerBot'/'beamerbot': For LaTeX slide generation

    • 'ConceptWeb'/'conceptweb': For concept map creation

    • 'QuizMaker'/'quizmaker': For quiz generation and management

  • **kwargs – Module-specific configuration options:
    • output_dir (Path): Custom output directory (defaults to self.output_dir)

    • verbose (bool): Enable detailed logging (defaults to False)

    • course_name (str): Override default course name

    • lesson_range (range): Override default lesson range

    • slide_dir (Path): Custom slide directory (BeamerBot only)

Returns:

The created module instance based on the provided name.

Return type:

Union[BeamerBot, ConceptMapBuilder, QuizMaker]

Raises:

ValueError – If an invalid module name is provided.

Notes

  • Each module’s output is automatically organized in a dedicated subdirectory: ClassFactoryOutput/{ModuleName}/L{lesson_no}/

  • BeamerBot operates on single lessons, while ConceptWeb and QuizMaker can handle lesson ranges
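
The case-insensitive lookup and the ValueError described for create_module() can be pictured as a small registry. The sketch below is an illustration under assumptions, not the actual implementation: the stub classes and the name create_module_sketch are hypothetical, and the real modules take constructor arguments that are omitted here.

```python
class BeamerBotStub:
    """Stand-in for BeamerBot; the real class takes more arguments."""


class ConceptMapBuilderStub:
    """Stand-in for ConceptMapBuilder."""


class QuizMakerStub:
    """Stand-in for QuizMaker."""


# Lower-cased module names mapped to the class to instantiate.
_REGISTRY = {
    "beamerbot": BeamerBotStub,
    "conceptweb": ConceptMapBuilderStub,
    "quizmaker": QuizMakerStub,
}


def create_module_sketch(module_name: str):
    """Resolve a module name case-insensitively, or raise ValueError."""
    try:
        cls = _REGISTRY[module_name.lower()]
    except KeyError:
        valid = ", ".join(sorted(_REGISTRY))
        raise ValueError(
            f"Unknown module {module_name!r}; expected one of: {valid}"
        ) from None
    return cls()
```

Lower-casing the name once at the lookup keeps the registry small and makes 'BeamerBot' and 'beamerbot' resolve to the same class.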

BeamerBot - Slide Generation

BeamerBot Module

The BeamerBot module provides a framework for generating structured LaTeX Beamer slides based on lesson objectives, readings, and prior lesson presentations. By using a language model (LLM), BeamerBot automates the process of slide creation, ensuring a consistent slide structure while allowing for custom guidance and validation.

Key Functionalities

  1. Automated Slide Generation: BeamerBot generates a LaTeX Beamer presentation for each lesson, incorporating:

    • A title page with consistent author and institution information

    • “Where We Came From” and “Where We Are Going” slides

    • Lesson objectives with highlighted action verbs (e.g., \textbf{Analyze} key events)

    • Discussion questions and in-class exercises

    • Summary slides with key takeaways

  2. Previous Lesson Integration:
    • Retrieves and references prior lesson presentations to maintain consistent formatting and flow

    • Preserves author and institution information across presentations

  3. Prompt Customization and Validation:
    • Supports custom prompts and specific guidance for tailored slide content

    • Validates generated LaTeX for correct formatting and content quality

    • Provides multiple retry attempts if validation fails
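
To make the "highlighted action verbs" convention from item 1 concrete, here is a small sketch that wraps the leading verb of an objective in \textbf{...}. It assumes the action verb is the first word of the objective; BeamerBot itself delegates this to the LLM, and the function name is hypothetical.

```python
def highlight_action_verb(objective: str) -> str:
    """Wrap the first word of an objective in a LaTeX bold command."""
    verb, _, rest = objective.strip().partition(" ")
    if not verb:
        return ""
    # Doubled braces produce literal { and } inside an f-string.
    highlighted = f"\\textbf{{{verb}}}"
    return f"{highlighted} {rest}" if rest else highlighted


if __name__ == "__main__":
    print(highlight_action_verb("Analyze key events"))
```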

Dependencies

This module requires:

  • langchain_core: For LLM chain creation and prompt handling

  • pathlib: For file path management

  • Custom utility modules for:
    • Document loading (load_documents)

    • LaTeX validation (llm_validator)

    • Response parsing (response_parsers)

    • Slide pipeline utilities (slide_pipeline_utils)

Usage

  1. Initialize BeamerBot:

```python
beamer_bot = BeamerBot(
    lesson_no=10,
    llm=llm,
    course_name="Political Science",
    lesson_loader=lesson_loader,
    output_dir=output_dir,
)
```

  2. Generate Slides:

```python
# Optional specific guidance
guidance = "Focus on comparing democratic and authoritarian systems"
slides = beamer_bot.generate_slides(specific_guidance=guidance)
```

  3. Save the Slides:

```python
beamer_bot.save_slides(slides)
```

class class_factory.beamer_bot.BeamerBot.BeamerBot(lesson_no: int, llm, course_name: str, lesson_loader: LessonLoader, output_dir: Path | str = None, verbose: bool = False, slide_dir: Path | str = None, lesson_objectives: dict = None)[source]

Bases: BaseModel

A class to generate LaTeX Beamer slides for a specified lesson using a language model (LLM).

BeamerBot automates the slide generation process, creating structured presentations based on lesson readings, objectives, and content from prior presentations when available. Each slide is crafted following a consistent format, and the generated LaTeX is validated for correctness.

lesson_no

Lesson number for which to generate slides.

Type:

int

llm

Language model instance for generating slides.

course_name

Name of the course for slide context.

Type:

str

lesson_loader

Loader for accessing lesson readings and objectives.

Type:

LessonLoader

output_dir

Directory to save the generated Beamer slides.

Type:

Path

slide_dir

Directory containing existing Beamer slides.

Type:

Optional[Path]

llm_response

Stores the generated LaTeX response from the LLM.

Type:

str

prompt

Generated prompt for the LLM.

Type:

str

lesson_objectives

User-provided lesson objectives, used when no syllabus is available.

Type:

Optional[dict]

generate_slides(specific_guidance: str = None, latex_compiler: str = "pdflatex") -> str:

Generates Beamer slides as LaTeX code for the specified lesson.

save_slides(latex_content: str) -> None:

Saves the generated LaTeX content to a .tex file.

set_user_objectives(objectives: Union[List[str], Dict[str, str]]):

Initializes user-defined lesson objectives, converting lists to dictionaries if needed. Inherited from BaseModel.

Internal Methods:
_format_readings_for_prompt() -> str:

Combines readings across lessons into a single string for the LLM prompt.

_find_prior_lesson(lesson_no: int, max_attempts: int = 3) -> Path:

Finds the most recent prior lesson’s Beamer file to use as a template.

_load_prior_lesson() -> str:

Loads the LaTeX content of a prior lesson’s Beamer presentation as a string.

_generate_prompt() -> str:

Constructs the LLM prompt using lesson objectives, readings, and prior lesson content.

_validate_llm_response(generated_slides: str, objectives: str, readings: str, last_presentation: str, prompt_specific_guidance: str = "", additional_guidance: str = "") -> Dict[str, Any]:

Validates the generated LaTeX for quality and accuracy.
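
The backward search that _find_prior_lesson describes can be sketched as follows. This is an assumption-laden illustration, not BeamerBot's code: the file naming pattern L<n>.tex and the function name are hypothetical, and the sketch raises FileNotFoundError to mirror the exception documented for generate_slides().

```python
from pathlib import Path


def find_prior_lesson_sketch(slide_dir: Path, lesson_no: int, max_attempts: int = 3) -> Path:
    """Walk back from lesson_no - 1 and return the first existing .tex file.

    The file name pattern L<n>.tex is an assumption made for this sketch.
    """
    for offset in range(1, max_attempts + 1):
        prior = lesson_no - offset
        if prior < 1:
            break  # there is no lesson before lesson 1
        candidate = Path(slide_dir) / f"L{prior}.tex"
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(
        f"No prior lesson file found within {max_attempts} lessons before lesson {lesson_no}"
    )
```

Limiting the search with max_attempts keeps a missing presentation from silently pulling in a template from a much earlier lesson.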

generate_slides(specific_guidance: str = None, lesson_objectives: dict = None, latex_compiler: str = 'pdflatex') str[source]

Generate LaTeX Beamer slides for the lesson using the language model.

Parameters:
  • specific_guidance (str, optional) – Custom instructions for slide content and structure

  • lesson_objectives (dict, optional) – Override default objectives with custom ones. Format: {lesson_number: "objective text"}

  • latex_compiler (str, optional) – LaTeX compiler to use for validation. Defaults to “pdflatex”

Returns:

Complete LaTeX content for the presentation, including preamble

Return type:

str

Raises:
  • ValueError – If validation fails after maximum retry attempts

  • FileNotFoundError – If required prior lesson files cannot be located

Note

The method includes multiple validation steps:

  1. Content quality validation through LLM

  2. LaTeX syntax validation using specified compiler

  3. Up to 3 retry attempts if validation fails
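
The generate, validate, retry pattern described in this note can be sketched generically. Everything here is a placeholder: generate and validate stand in for the LLM call and the validation steps, the function name is hypothetical, and treating the limit as three attempts in total is an assumption.

```python
from typing import Callable


def generate_with_retries(
    generate: Callable[[str], str],
    validate: Callable[[str], bool],
    max_retries: int = 3,
) -> str:
    """Call generate(), check the result with validate(), and retry on failure.

    Both callables are placeholders: in BeamerBot the generation step is an LLM
    call and validation covers content quality and LaTeX compilation.
    """
    feedback = ""
    for attempt in range(1, max_retries + 1):
        draft = generate(feedback)
        if validate(draft):
            return draft
        # Pass a hint to the next attempt so the generator can adjust.
        feedback = f"Attempt {attempt} failed validation; please correct the LaTeX."
    raise ValueError(f"Validation failed after {max_retries} attempts")
```

Raising ValueError once the attempts are exhausted matches the exception listed under Raises above.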

save_slides(latex_content: str, output_dir: Path | str = None) None[source]

Save the generated LaTeX content to a .tex file.

Parameters:

latex_content (str) – The LaTeX content to save.

ConceptWeb - Concept Mapping

ConceptWeb Module

The ConceptWeb module provides tools to automatically extract, analyze, and visualize key concepts from lesson materials, helping to identify connections across topics and lessons. Central to this module is the ConceptMapBuilder class, which leverages a language model (LLM) to identify and structure important ideas and relationships from lesson readings and objectives into a graph-based representation.

Key functionalities of the module include:

  • Concept Extraction:
    • Identifies key concepts from lesson readings and objectives using an LLM.

    • Summarizes and highlights main themes from each lesson’s content.

  • Relationship Mapping:
    • Extracts and maps relationships between identified concepts based on lesson objectives and content.

    • Facilitates understanding of how topics interrelate within and across lessons.

  • Graph-Based Visualization:
    • Constructs a concept map in which nodes represent concepts and edges represent relationships.

    • Generates both interactive graph-based visualizations (HTML) and word clouds for key concepts.

  • Community Detection:
    • Groups closely related concepts into thematic clusters.

    • Helps identify broader themes or subtopics within the lesson materials.

  • Data Saving:
    • Optionally saves intermediate data (concepts and relationships) as JSON files for further review or analysis.
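
As an illustration of the graph idea above (concepts as nodes, relationships as edges, related concepts grouped together), here is a standard-library sketch. It is not how ConceptMapBuilder works internally: the (concept, relation, concept) triple format is an assumption, and connected components are used as a deliberately simplified stand-in for real community detection, which the module performs on a networkx graph.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_graph(relationships: Iterable[Tuple[str, str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map from (concept, relation, concept) triples."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for source, _relation, target in relationships:
        graph[source].add(target)
        graph[target].add(source)
    return dict(graph)


def connected_groups(graph: Dict[str, Set[str]]) -> List[Set[str]]:
    """Group concepts that are reachable from one another (depth-first search)."""
    seen: Set[str] = set()
    groups: List[Set[str]] = []
    for start in sorted(graph):
        if start in seen:
            continue
        group: Set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(graph[node] - group)
        seen |= group
        groups.append(group)
    return groups
```

Connected components only separate concepts that share no path at all; a modularity-based algorithm would additionally split a densely linked graph into thematic clusters.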

Dependencies

This module depends on:

  • langchain_core: For LLM-based extraction and summarization tasks.

  • networkx: For graph generation and analysis of concept relationships.

  • matplotlib or plotly: For creating visualizations and word clouds.

  • Custom utilities for loading documents, extracting objectives, and handling logging.

Usage Overview

  1. Initialize ConceptMapBuilder: Instantiate ConceptMapBuilder with a LessonLoader (which holds the syllabus, reading, and project paths), an LLM, and the course name.

  2. Generate the Concept Map: Use build_concept_map() to process lesson materials, extract and summarize concepts, map relationships, and generate visualizations.

  3. Save and Review: The generated concept map can be saved as an interactive HTML file or as a static word cloud for easier review and analysis.

Example

from pathlib import Path

from class_factory.concept_web.ConceptWeb import ConceptMapBuilder
from class_factory.utils.load_documents import LessonLoader
from langchain_openai import ChatOpenAI

# Set up paths and initialize components
syllabus_path = Path("/path/to/syllabus.docx")
reading_dir = Path("/path/to/lesson/readings")
project_dir = Path("/path/to/project")
llm = ChatOpenAI(api_key="your_api_key")

# Initialize the lesson loader and concept map builder
lesson_loader = LessonLoader(syllabus_path=syllabus_path, reading_dir=reading_dir, project_dir=project_dir)
concept_map_builder = ConceptMapBuilder(
    lesson_no=1,
    lesson_loader=lesson_loader,
    llm=llm,
    course_name="Sample Course",
    lesson_range=range(1, 5)
)

# Build and visualize the concept map
concept_map_builder.build_concept_map()

class class_factory.concept_web.ConceptWeb.ConceptMapBuilder(lesson_no: int, lesson_loader: LessonLoader, llm, course_name: str, output_dir: str | Path = None, lesson_range: range | int = None, lesson_objectives: List[str] | Dict[str, str] = None, verbose: bool = False, save_relationships: bool = False, **kwargs)[source]

Bases: BaseModel

Orchestrates the extraction, analysis, and visualization of key concepts and their relationships from lesson materials.

Uses a language model (LLM) to summarize content, extract relationships, and build a graph-based concept map. Provides methods for processing lessons, saving intermediate data, and generating interactive visualizations.

load_and_process_lessons(threshold: float = 0.995)[source]

Process lesson materials by summarizing content and extracting concept relationships for each lesson.

Parameters:

threshold (float, optional) – Similarity threshold for extracted concepts. Defaults to 0.995.

For each lesson in lesson_range:
  • Load documents and objectives.

  • Summarize readings using the LLM.

  • Extract relationships between concepts and generate a unique concept list.

build_concept_map(directed: bool = False, concept_similarity_threshold: float = 0.995, dark_mode: bool = True, lesson_objectives: Dict[str, str] | None = None) None[source]

Run the full pipeline to generate a concept map and visualization.

Parameters:
  • directed (bool, optional) – Whether to create a directed concept map. Defaults to False.

  • concept_similarity_threshold (float, optional) – Threshold for concept similarity. Defaults to 0.995.

  • dark_mode (bool, optional) – Use dark mode for visualization. Defaults to True.

  • lesson_objectives (Optional[Dict[str, str]], optional) – User-provided lesson objectives. Defaults to None.

QuizMaker - Quiz Generation

QuizMaker Module

The QuizMaker module offers a comprehensive framework for generating, distributing, and analyzing quiz questions based on lesson content and objectives. At its core, the QuizMaker class uses a language model (LLM) to create targeted quiz questions, ensuring these questions are relevant to the course material. This class also provides utilities for similarity checking, interactive quiz launches, and detailed results assessment.

Key Functionalities:

  1. Quiz Generation:
    • Automatically generates quiz questions from lesson objectives and readings.

    • Customizable difficulty level and quiz content based on user-provided or auto-extracted lesson objectives.

  2. Similarity Checking:
    • Detects overlap with previous quizzes to prevent question duplication.

    • Uses sentence embeddings to flag and remove questions too similar to prior quizzes.

  3. Validation and Formatting:
    • Validates generated questions to ensure proper format and structure.

    • Corrects answer formatting to meet quiz standards (e.g., answers in ‘A’, ‘B’, ‘C’, ‘D’).

  4. Saving Quizzes:
    • Exports quizzes as Excel files or PowerPoint presentations.

    • Customizes PowerPoint presentations using templates for polished quiz slides.

  5. Interactive Quiz Launch:
    • Launches an interactive Gradio-based web quiz for real-time participation.

    • Supports QR code access and real-time result saving.

  6. Results Assessment:
    • Analyzes and visualizes quiz results stored in CSV files.

    • Generates summary statistics, HTML reports, and dashboards for insights into quiz performance.

Dependencies

This module requires:

  • langchain_core: For LLM interaction and prompt handling.

  • sentence_transformers: For semantic similarity detection in quiz questions.

  • pptx: For PowerPoint presentation generation.

  • pandas: For data handling and result assessment.

  • torch: For managing device selection and embedding models.

  • gradio: For interactive quiz interfaces.

  • Custom utilities for document loading, response parsing, logging, and retry decorators.

Usage Overview

  1. Initialize QuizMaker: Instantiate QuizMaker with required paths, lesson loader, and LLM.

  2. Generate a Quiz: Call make_a_quiz() to create quiz questions based on lesson materials, with automatic similarity checking.

  3. Save the Quiz: Use save_quiz() to save the quiz as an Excel file or save_quiz_to_ppt() to export to PowerPoint.

  4. Launch an Interactive Quiz: Use launch_interactive_quiz() to start a web-based quiz, with options for real-time participation and result saving.

  5. Assess Quiz Results: Analyze saved quiz responses with assess_quiz_results(), generating summary statistics, reports, and visualizations.

Example

from pathlib import Path
from class_factory.quiz_maker.QuizMaker import QuizMaker
from class_factory.utils.load_documents import LessonLoader
from langchain_openai import ChatOpenAI

# Set up paths and initialize components
syllabus_path = Path("/path/to/syllabus.docx")
reading_dir = Path("/path/to/lesson/readings")
project_dir = Path("/path/to/project")
llm = ChatOpenAI(api_key="your_api_key")

# Initialize lesson loader and quiz maker
lesson_loader = LessonLoader(syllabus_path=syllabus_path, reading_dir=reading_dir, project_dir=project_dir)
quiz_maker = QuizMaker(
    llm=llm,
    lesson_no=1,
    course_name="Sample Course",
    lesson_loader=lesson_loader,
    output_dir=Path("/path/to/output/dir")
)

# Generate and save a quiz
quiz = quiz_maker.make_a_quiz()
quiz_maker.save_quiz(quiz)
quiz_maker.save_quiz_to_ppt(quiz)

class class_factory.quiz_maker.QuizMaker.QuizMaker(llm, lesson_no: int, course_name: str, lesson_loader: LessonLoader, output_dir: Path | str = None, prior_quiz_path: Path | str = None, lesson_range: range = range(1, 5), quiz_prompt_for_llm: str = None, device=None, lesson_objectives: dict = None, verbose=False)[source]

Bases: BaseModel

A class to generate and manage quizzes based on lesson readings and objectives using a language model (LLM).

QuizMaker generates quiz questions from lesson content, checks for similarity with prior quizzes to avoid redundancy, and validates question format. Quizzes can be saved as Excel or PowerPoint files, launched interactively, and analyzed for performance.

llm

The language model instance for quiz generation.

syllabus_path

Path to the syllabus file.

Type:

Path

reading_dir

Directory containing lesson readings.

Type:

Path

output_dir

Directory for saving quiz files.

Type:

Path

prior_quiz_path

Directory with prior quizzes for similarity checks.

Type:

Path

lesson_range

Range of lessons for quiz generation.

Type:

range

course_name

Name of the course for context in question generation.

Type:

str

device

Device for embeddings (CPU or GPU).

rejected_questions

List of questions flagged as similar to prior quizzes.

Type:

List[Dict]

make_a_quiz(difficulty_level: int = 5, flag_threshold: float = 0.7) -> List[Dict]:

Generates quiz questions with similarity checks to avoid redundancy.

save_quiz(quiz: List[Dict]) -> None:

Saves quiz questions to an Excel file.

save_quiz_to_ppt(quiz: List[Dict] = None, excel_file: Path = None, template_path: Path = None) -> None:

Saves quiz questions to a PowerPoint file, optionally with a template.

launch_interactive_quiz(quiz_data, sample_size: int = 5, seed: int = 42, save_results: bool = False, output_dir: Path = None, qr_name: str = None) -> None:

Launches an interactive quiz using Gradio.

assess_quiz_results(quiz_data: pd.DataFrame = None, results_dir: Path = None, output_dir: Path = None) -> pd.DataFrame:

Analyzes quiz results and generates summary statistics and visualizations.

Internal Methods:
_validate_llm_response(quiz_questions: Dict[str, Any], objectives: str, readings: str, prior_quiz_questions: List[str], difficulty_level: int, additional_guidance: str) -> Dict[str, Any]:

Validates generated quiz questions for relevance and format.

_validate_questions(questions: List[Dict]) -> List[Dict]:

Checks for formatting errors and corrects them.

_build_quiz_chain() -> Any:

Builds the LLM chain for quiz generation.

_load_and_merge_prior_quizzes() -> Tuple[List[str], pd.DataFrame]:

Loads and merges questions from prior quizzes for similarity checking.

_check_question_similarity(generated_questions: List[str], threshold: float = 0.6) -> List[Dict]:

Checks for question similarity against prior quizzes.

_separate_flagged_questions(questions: List[Dict], flagged_questions: List[Dict]) -> Tuple[List[Dict], List[Dict]]:

Separates flagged questions based on similarity results.

make_a_quiz(difficulty_level: int = 5, flag_threshold: float = 0.7) List[Dict][source]

Generate quiz questions based on lesson readings and objectives, checking for similarity with prior quizzes.

Parameters:
  • difficulty_level (int) – Difficulty of generated questions, scale 1-10. Defaults to 5.

  • flag_threshold (float) – Similarity threshold for rejecting duplicate questions. Defaults to 0.7.

Returns:

Generated quiz questions, with duplicates removed. Each dict contains:
  • question (str): The question text

  • type (str): Question type (e.g. “multiple_choice”)

  • A) (str): First answer choice

  • B) (str): Second answer choice

  • C) (str): Third answer choice

  • D) (str): Fourth answer choice

  • correct_answer (str): Letter of correct answer

Return type:

List[Dict[str, Any]]
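
The role of flag_threshold can be illustrated with a toy similarity check. This sketch swaps in a plain bag-of-words cosine similarity where QuizMaker uses sentence embeddings, so the scores are not comparable to the real ones; the function names and the choice of >= for the comparison are assumptions.

```python
import math
import re
from collections import Counter
from typing import List


def _bag_of_words(text: str) -> Counter:
    """Lower-case the text and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the word-count vectors of two strings."""
    va, vb = _bag_of_words(a), _bag_of_words(b)
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def flag_similar(
    new_questions: List[str], prior_questions: List[str], flag_threshold: float = 0.7
) -> List[str]:
    """Return new questions whose best match among prior questions meets the threshold."""
    return [
        q
        for q in new_questions
        if any(cosine_similarity(q, p) >= flag_threshold for p in prior_questions)
    ]
```

A lower threshold rejects more questions as duplicates, while a higher one lets near-paraphrases through.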

save_quiz(quiz: List[Dict]) None[source]

Save quiz questions to an Excel file, standardizing answer key styles.

Parameters:

quiz (List[Dict[str, Any]]) – List of quiz questions to save. Each dict should contain:

  • type (str): Question type

  • question (str): Question text

  • A/B/C/D or A)/B)/C)/D) (str): Answer choices

  • correct_answer (str): Letter of correct answer

Returns:

None
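
"Standardizing answer key styles" can be pictured as renaming the answer keys to one consistent form. The sketch below normalizes keys such as 'A', 'a)' or 'A.' to 'A)'. Which style save_quiz() actually settles on is not stated here, so the target form and the function name are assumptions.

```python
import re
from typing import Any, Dict

# A single letter A-D, optionally followed by ')' or '.', with stray whitespace allowed.
_CHOICE_KEY = re.compile(r"^\s*([A-Da-d])\s*[).]?\s*$")


def standardize_choice_keys(question: Dict[str, Any]) -> Dict[str, Any]:
    """Rename answer keys such as 'A', 'a)' or 'A.' to the form 'A)'."""
    cleaned: Dict[str, Any] = {}
    for key, value in question.items():
        match = _CHOICE_KEY.match(key)
        # Non-answer keys (type, question, correct_answer) pass through unchanged.
        cleaned[f"{match.group(1).upper()})" if match else key] = value
    return cleaned
```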

save_quiz_to_ppt(quiz: List[Dict] = None, excel_file: Path | str = None, template_path: Path | str = None, filename: str = None) None[source]

Save quiz questions to a PowerPoint presentation, with options to use a template.


Parameters:
  • quiz (List[Dict], optional) – List of quiz questions in dictionary format.

  • excel_file (Path, optional) – Path to an Excel file containing quiz questions. If provided, it overrides the quiz argument.

  • template_path (Path, optional) – Path to a PowerPoint template to apply to the generated slides.

Raises:

ValueError – If neither quiz nor excel_file is provided.

Creates a PowerPoint presentation with each question on a slide, followed by the answer slide.

launch_interactive_quiz(quiz_data: DataFrame | Path | str | List[Dict[str, Any]] = None, sample_size: int = 5, seed: int = 42, save_results: bool = False, output_dir: Path | None = None, qr_name: str | None = None) None[source]

Launch an interactive quiz using Gradio, sampling questions from provided data or generating new data if none is provided.

Parameters:
  • quiz_data (Union[pd.DataFrame, Path, str, List[Dict[str, Any]]], optional) – Quiz questions as a DataFrame, Excel path, or list of dictionaries. If None, generates new questions.

  • sample_size (int, optional) – Number of questions to sample. Defaults to 5.

  • seed (int, optional) – Random seed for consistent sampling. Defaults to 42.

  • save_results (bool, optional) – Whether to save quiz results. Defaults to False.

  • output_dir (Path, optional) – Directory to save quiz results. Defaults to the class’s output directory.

  • qr_name (str, optional) – Name of the QR code image file for accessing the quiz.

Raises:

ValueError – If quiz_data is provided but is not a valid type.
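
The effect of sample_size and seed can be shown with a seeded sampler. This is a minimal sketch using the standard library, not the method's actual sampling code; the function name is hypothetical, and capping the sample at the number of available questions is an assumption.

```python
import random
from typing import Any, Dict, List


def sample_questions(
    quiz: List[Dict[str, Any]], sample_size: int = 5, seed: int = 42
) -> List[Dict[str, Any]]:
    """Pick up to sample_size questions; the same seed gives the same selection."""
    rng = random.Random(seed)  # local generator, leaves the global random state alone
    return rng.sample(quiz, k=min(sample_size, len(quiz)))
```

Fixing the seed means every section of a course can be shown the same subset of questions from one quiz file.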

assess_quiz_results(quiz_data: DataFrame | None = None, results_dir: Path | str | None = None, output_dir: Path | str | None = None) DataFrame[source]

Analyze quiz results, generate summary statistics, and visualize responses.

Parameters:
  • quiz_data (pd.DataFrame, optional) – DataFrame of quiz results. If None, loads results from CSV files in results_dir.

  • results_dir (Path, optional) – Directory containing CSV files of quiz results.

  • output_dir (Path, optional) – Directory for saving summary statistics and plots. Defaults to output_dir/'quiz_analysis'.

Returns:

DataFrame with summary statistics, including:
  • question (str): Question text

  • Total Responses (int): Number of unique users who answered

  • Correct Responses (int): Number of correct answers

  • Incorrect Responses (int): Number of incorrect answers

  • Percent Correct (float): Percentage of correct answers

  • Modal Answer (str): Most common answer given

Return type:

pd.DataFrame

Raises:

AssertionError – If quiz_data is provided but is not a pandas DataFrame.
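
The summary columns listed above can be computed from raw responses as sketched below. The sketch works on plain dictionaries rather than a DataFrame, and the input column names (user_id, question, user_answer, correct_answer) are assumptions, as is keeping only the latest answer per user.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def summarize_responses(rows: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Aggregate one row per (user, question) response into per-question statistics.

    Each row is assumed to have 'user_id', 'question', 'user_answer'
    and 'correct_answer' keys.
    """
    by_question: Dict[str, Dict[str, Dict[str, str]]] = defaultdict(dict)
    for row in rows:
        # Keep the latest answer per user so each user counts once per question.
        by_question[row["question"]][row["user_id"]] = row

    summary = []
    for question, answers in by_question.items():
        total = len(answers)
        correct = sum(
            1 for r in answers.values() if r["user_answer"] == r["correct_answer"]
        )
        modal = Counter(r["user_answer"] for r in answers.values()).most_common(1)[0][0]
        summary.append(
            {
                "question": question,
                "Total Responses": total,
                "Correct Responses": correct,
                "Incorrect Responses": total - correct,
                "Percent Correct": 100.0 * correct / total,
                "Modal Answer": modal,
            }
        )
    return summary
```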

Utilities

Base Model

class class_factory.utils.base_model.BaseModel(lesson_no: int, course_name: str, lesson_loader: LessonLoader, output_dir: Path | str = None, verbose: bool = False)[source]

Bases: object

A base class for educational modules that provides common setup and utility functions, such as loading lesson readings and setting user-defined objectives.

lesson_no

The specific lesson number for the current instance.

Type:

int

course_name

Name of the course, used as context in other methods and prompts.

Type:

str

lesson_loader

Instance for loading lesson-related data.

Type:

LessonLoader

output_dir

Directory where outputs are saved; defaults to ‘ClassFactoryOutput’.

Type:

Path

logger

Logger instance for the class.

Type:

Logger

user_objectives

Dictionary of user-defined objectives, if provided.

Type:

Optional[Dict[str, str]]

_load_readings(lesson_numbers: Union[int, range]) -> Dict[str, List[str]]:

Loads and returns readings for the specified lesson(s) as a dictionary.

set_user_objectives(objectives: Union[List[str], Dict[str, str]], lesson_range: Union[int, range]) -> Dict[str, str]:

Sets user-defined objectives for each lesson in the specified range and updates self.user_objectives.

_get_lesson_objectives(lesson_num: int) -> str:

Retrieves lesson objectives for a given lesson number, falling back to extracted objectives if no user objectives exist.

set_user_objectives(objectives: List[str] | Dict[str, str], lesson_range: int | range) Dict[str, str][source]

Set user-defined objectives for each lesson in lesson_range, supporting both list and dictionary formats. Updates the self.user_objectives attribute with the processed objectives.

Parameters:
  • objectives (Union[List[str], Dict[str, str]]) – User-provided objectives, either as a list (converted to a dictionary) or as a dictionary keyed by lesson numbers as strings.

  • lesson_range (Union[int, range]) – Single lesson number or range of lesson numbers for objectives.

Returns:

Processed dictionary of user objectives keyed by lesson numbers as strings.

Return type:

Dict[str, str]

Raises:
  • ValueError – If the number of objectives in the list does not match the number of lessons in lesson_range.

  • TypeError – If objectives is neither a list nor a dictionary.
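
The list-to-dictionary conversion and the two documented exceptions can be sketched as a standalone function. This is an illustration, not the BaseModel method: the function name is hypothetical, and using the bare lesson number as the string key (for example "3") is an assumption about the key format.

```python
from typing import Dict, List, Union


def normalize_objectives(
    objectives: Union[List[str], Dict[str, str]],
    lesson_range: Union[int, range],
) -> Dict[str, str]:
    """Return objectives keyed by lesson number as a string."""
    lessons = [lesson_range] if isinstance(lesson_range, int) else list(lesson_range)
    if isinstance(objectives, dict):
        # Dictionaries are accepted as-is, with keys coerced to strings.
        return {str(key): text for key, text in objectives.items()}
    if isinstance(objectives, list):
        if len(objectives) != len(lessons):
            raise ValueError(
                f"Got {len(objectives)} objectives for {len(lessons)} lessons"
            )
        # Pair each objective with its lesson number, in order.
        return {str(n): text for n, text in zip(lessons, objectives)}
    raise TypeError("objectives must be a list or a dict")
```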

Document Loading

Document Loading and Processing Module

This module provides functionality to load, process, and extract text from various document types (PDF, DOCX, and TXT) for generating lesson-specific content. It includes support for OCR processing of scanned documents and handling of structured educational materials like syllabi and lesson readings.

Classes
LessonLoader

Main class for handling document loading and processing operations.

Key Functions

The LessonLoader class provides these key functionalities:

  • Document Loading:
    • load_directory: Load all documents from a specified directory

    • load_lessons: Load lessons from multiple directories with lesson number inference

    • load_readings: Extract text from individual documents

    • load_beamer_presentation: Load Beamer presentation content

  • Syllabus Processing:
    • extract_lesson_objectives: Extract objectives for specific lessons

    • load_docx_syllabus: Load and parse DOCX syllabus content

    • find_docx_indices: Locate lesson sections within syllabus

  • Text Extraction:
    • extract_text_from_pdf: Extract text from PDF files

    • ocr_pdf: Perform OCR on scanned documents

    • convert_pdf_to_docx: Convert PDF files to DOCX format

Dependencies
Core Dependencies:
  • pypdf: PDF text extraction

  • python-docx: DOCX file handling

  • pathlib: File path operations

  • typing: Type hints

Optional OCR Dependencies:
  • pytesseract: OCR processing

  • pdf2image: PDF to image conversion

  • spacy: Text processing

  • contextualSpellCheck: Text correction

  • img2table: Table extraction from images

Example Usage

from class_factory.utils.load_documents import LessonLoader

# Initialize loader with paths
loader = LessonLoader(
    syllabus_path="path/to/syllabus.docx",
    reading_dir="path/to/readings"
)

# Load specific lesson content
lesson_content = loader.load_lessons(lesson_number_or_range=5)

# Extract lesson objectives
objectives = loader.extract_lesson_objectives(current_lesson=5)

Notes

  • OCR functionality requires additional package installation via pip install class_factory[ocr]

  • Directory structure should follow consistent naming (e.g., ‘L1’, ‘L2’, etc.)

  • Supports both PDF and DOCX syllabus formats with automatic conversion if needed

See also

  • class_factory.utils.tools.logger_setup: Logger configuration

  • class_factory.utils.base_model: Base model implementation

class class_factory.utils.load_documents.LessonLoader(syllabus_path: Path | str, reading_dir: Path | str, slide_dir: Path | str | None = None, project_dir: Path | str | None = None, verbose: bool = True, tabular_syllabus: bool = False)[source]

Bases: object

A class for loading and managing educational content from various document formats.

This class handles loading and processing of lesson materials including syllabi, readings, and Beamer presentations. It supports multiple file formats (PDF, DOCX, TXT) and provides OCR capabilities for scanned documents when necessary.

reading_dir

Directory containing lesson reading materials

Type:

Path

slide_dir

Directory containing Beamer presentation slides

Type:

Path

project_dir

Root project directory

Type:

Path

syllabus_path

Path to the course syllabus file

Type:

Path

logger

Class logger instance

Type:

Logger

Parameters:
  • syllabus_path (Union[Path, str]) – Path to the syllabus file

  • reading_dir (Union[Path, str]) – Directory containing lesson readings

  • slide_dir (Union[Path, str], optional) – Directory for Beamer slides. Defaults to None.

  • project_dir (Union[Path, str], optional) – Root directory for the project. Defaults to None.

  • verbose (bool, optional) – Whether to show detailed logging. Defaults to True.

property slide_dir
static ocr_available()[source]

Check whether the currently installed packages support OCR.

static missing_ocr_packages()[source]
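
A check of this kind can be sketched with the standard library alone. The version below is an illustration, not the LessonLoader implementation: it assumes the import names match the optional OCR dependencies listed earlier, and it takes the package list as a parameter so it can be tried with any names.

```python
from importlib.util import find_spec
from typing import List

# Assumed import names for the optional OCR dependencies listed in this module's docs.
OCR_PACKAGES = ["pytesseract", "pdf2image", "spacy", "contextualSpellCheck", "img2table"]


def missing_ocr_packages(packages: List[str] = OCR_PACKAGES) -> List[str]:
    """Return the names of packages that cannot be found in the current environment."""
    return [name for name in packages if find_spec(name) is None]


def ocr_available(packages: List[str] = OCR_PACKAGES) -> bool:
    """Treat OCR as available only when none of the packages is missing."""
    return not missing_ocr_packages(packages)
```

Calling missing_ocr_packages() before an OCR-dependent step gives a list that can be placed directly in an ImportError message.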
load_readings(file_path: str | Path) str[source]

Load text content from a single document file.

Parameters:

file_path (Union[str, Path]) – Path to the document file to load

Returns:

Extracted text content prefixed with the file title

Return type:

str

Raises:
  • ValueError – If file type is unsupported or file is corrupted

  • ImportError – If OCR packages are needed but not installed

load_directory(load_from_dir: Path | str) List[str][source]

Load all valid document files within a directory.

Parameters:

load_from_dir (Union[Path, str]) – Directory path to load documents from

Returns:

List of extracted text content from all valid documents

Return type:

List[str]

load_lessons(lesson_number_or_range: int | range, logger=None) Dict[str, List[str]][source]

Load specific lessons by scanning directories based on lesson numbers.

Parameters:
  • lesson_number_or_range (Union[int, range]) – A single lesson number or range of lesson numbers to load.

  • logger – Optional custom logger.

Returns:

A dictionary where each key is a lesson number and each value is a list of readings.

Return type:

Dict[str, List[str]]
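
The int-or-range argument and the string-keyed result can be illustrated with a small standalone sketch. It is not the library code: the function name load_lessons_sketch is hypothetical, it reads only .txt files, and it assumes lesson directories are named L{n} as described in the Notes above.

```python
from pathlib import Path
from typing import Dict, List, Union


def load_lessons_sketch(
    reading_dir: Path, lesson_number_or_range: Union[int, range]
) -> Dict[str, List[str]]:
    """Collect the text of every .txt reading under reading_dir/L{n} for each requested lesson."""
    # Normalise a single lesson number to a one-element range.
    if isinstance(lesson_number_or_range, int):
        lessons = range(lesson_number_or_range, lesson_number_or_range + 1)
    else:
        lessons = lesson_number_or_range

    readings: Dict[str, List[str]] = {}
    for lesson_no in lessons:
        lesson_dir = reading_dir / f"L{lesson_no}"
        if not lesson_dir.is_dir():
            continue  # Assumption of this sketch: lessons without a directory are skipped.
        readings[str(lesson_no)] = [
            path.read_text(encoding="utf-8") for path in sorted(lesson_dir.glob("*.txt"))
        ]
    return readings
```

Because Python ranges exclude their end value, range(1, 4) requests lessons 1 through 3.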

load_beamer_presentation(tex_path: Path) str[source]

Loads a Beamer presentation from a .tex file and returns it as a string.

Parameters:

tex_path (Path) – The path to the .tex file containing the Beamer presentation.

Returns:

The content of the .tex file.

Return type:

str

find_prior_beamer_presentation(lesson_no: int, max_attempts: int = 3) Path | str[source]

Dynamically finds the most recent prior lesson to use as a template for slide creation.

Parameters:
  • lesson_no (int) – The current lesson number.

  • max_attempts (int) – The maximum number of previous lessons to attempt loading (default 3).

Returns:

The path to the found Beamer file from a prior lesson.

Return type:

Path | str

Raises:

FileNotFoundError – If no valid prior lesson file is found within the max_attempts range.
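
The backward search described here can be sketched as follows. This is a minimal illustration under stated assumptions, not the method itself: the function name find_prior_tex is hypothetical, slides are assumed to live in slide_dir/L{n}, and the first .tex file in alphabetical order is returned.

```python
from pathlib import Path


def find_prior_tex(slide_dir: Path, lesson_no: int, max_attempts: int = 3) -> Path:
    """Walk backwards from lesson_no - 1 and return the first .tex file found."""
    for offset in range(1, max_attempts + 1):
        prior_lesson = lesson_no - offset
        if prior_lesson < 1:
            break  # There are no lessons before the first one.
        lesson_dir = slide_dir / f"L{prior_lesson}"
        if not lesson_dir.is_dir():
            continue
        candidates = sorted(lesson_dir.glob("*.tex"))
        if candidates:
            return candidates[0]
    raise FileNotFoundError(
        f"No prior Beamer file found within {max_attempts} lessons before lesson {lesson_no}."
    )
```

With max_attempts=3 and lesson_no=7, the sketch looks in L6, L5, and L4 in that order and raises FileNotFoundError if none of them contains a .tex file.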

extract_text_from_pdf(pdf_path: str | Path) str[source]

Extract text content from a PDF file, with OCR fallback if needed.

Parameters:

pdf_path (Union[str, Path]) – Path to the PDF file

Returns:

Extracted text content from the PDF

Return type:

str

Raises:

ImportError – If text extraction fails and OCR packages are not available

ocr_pdf(pdf_path: Path, max_workers: int = 4) str[source]

Perform OCR on a PDF file to extract text content.

Parameters:
  • pdf_path (Path) – Path to the PDF file

  • max_workers (int, optional) – Number of parallel workers for OCR. Defaults to 4.

Returns:

Extracted text content from OCR

Return type:

str

load_docx_syllabus(syllabus_path) List[str][source]
extract_lesson_objectives(current_lesson: int | str, only_current: bool = False) str[source]

Extract lesson objectives from the syllabus for specified lesson(s).

Parameters:
  • current_lesson (Union[int, str]) – The lesson number to extract objectives for

  • only_current (bool, optional) – If True, return only current lesson objectives. If False, include previous and next lessons. Defaults to False.

Returns:

Extracted lesson objectives text. Returns “No lesson objectives provided” if no syllabus path is set.

Return type:

str

find_docx_indices(syllabus: List[str], current_lesson: int, lesson_identifier: str = '') Tuple[int | None, int | None, int | None, int | None][source]

Finds the indices of the lessons in the syllabus content.

Parameters:
  • syllabus (List[str]) – A list of strings where each string represents a line in the syllabus document.

  • current_lesson (int) – The lesson number for which to find surrounding lessons.

  • lesson_identifier (str, optional) – The word that marks a new lesson in the syllabus (e.g., “Lesson” or “Week”). Defaults to ''.

Returns:

The indices of the previous, current, next, and the end of the next lesson.

Return type:

Tuple[int | None, int | None, int | None, int | None]
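
One way to picture the four returned indices is the sketch below. It is an assumption-laden illustration rather than the actual method: the name find_lesson_indices is hypothetical, lesson headings are assumed to start a line with the identifier followed by the number, and the end of the next lesson is taken to be the heading of the lesson after it.

```python
import re
from typing import List, Optional, Tuple


def find_lesson_indices(
    syllabus: List[str], current_lesson: int, lesson_identifier: str = "Lesson"
) -> Tuple[Optional[int], Optional[int], Optional[int], Optional[int]]:
    """Return line indices for the previous, current, and next lesson, plus the end of the next."""

    def first_index(lesson_no: int) -> Optional[int]:
        # Match e.g. "Lesson 5" at the start of a line, but not "Lesson 50".
        pattern = re.compile(
            rf"^\s*{re.escape(lesson_identifier)}\s*{lesson_no}\b", re.IGNORECASE
        )
        return next((i for i, line in enumerate(syllabus) if pattern.search(line)), None)

    return (
        first_index(current_lesson - 1),
        first_index(current_lesson),
        first_index(current_lesson + 1),
        first_index(current_lesson + 2),  # Heading after the next lesson marks where it ends.
    )
```

Any heading that is absent from the syllabus yields None in its position, which is why each element of the return type is optional.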

LLM Validator

class class_factory.utils.llm_validator.ValidatorInterimResponse(*, accuracy: float, completeness: float, consistency: float, reasoning: str, additional_guidance: str)[source]

Bases: BaseModel

accuracy: float
completeness: float
consistency: float
reasoning: str
additional_guidance: str
model_config = {}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

class class_factory.utils.llm_validator.ValidatorResponse(*, overall_score: float, accuracy: float, completeness: float, consistency: float, reasoning: str, additional_guidance: str, status: int)[source]

Bases: BaseModel

overall_score: float
accuracy: float
completeness: float
consistency: float
reasoning: str
additional_guidance: str
status: int
classmethod from_interim(interim: ValidatorInterimResponse, threshold: float = 8.0) ValidatorResponse[source]
model_config = {}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
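
The reference does not state how from_interim derives overall_score and status, so the following sketch only shows one plausible shape. It uses plain dataclasses as stand-ins for the Pydantic models, and it assumes that the overall score is the unweighted mean of the three criteria and that status is 1 when the score meets the threshold and 0 otherwise. The class names InterimSketch and ResponseSketch are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class InterimSketch:
    accuracy: float
    completeness: float
    consistency: float
    reasoning: str
    additional_guidance: str


@dataclass
class ResponseSketch:
    overall_score: float
    accuracy: float
    completeness: float
    consistency: float
    reasoning: str
    additional_guidance: str
    status: int

    @classmethod
    def from_interim(cls, interim: InterimSketch, threshold: float = 8.0) -> "ResponseSketch":
        # ASSUMPTION: the overall score is the unweighted mean of the three criteria.
        overall = mean([interim.accuracy, interim.completeness, interim.consistency])
        # ASSUMPTION: status 1 means the threshold was met, 0 means it was not.
        status = 1 if overall >= threshold else 0
        return cls(
            overall_score=overall,
            accuracy=interim.accuracy,
            completeness=interim.completeness,
            consistency=interim.consistency,
            reasoning=interim.reasoning,
            additional_guidance=interim.additional_guidance,
            status=status,
        )
```

Keeping the threshold as a parameter of the conversion lets the caller tighten or relax acceptance without changing the fields of the interim response.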

class class_factory.utils.llm_validator.Validator(llm: any, temperature: float = 0.2, log_level: int = 20, system_prompt: str | None = None, human_prompt: str | None = None, tracer: any | None = None, score_threshold: float = 8.0)[source]

Bases: object

A class for validating responses generated by a large language model (LLM).

The Validator checks the accuracy, completeness, and consistency of the LLM’s response to ensure it meets the requirements specified in the task prompt. Validation results include a score, status, reasoning, and any additional guidance.

validate(task_description: str, generated_response: str, task_schema: str = '', specific_guidance: str = '') dict[source]