API Reference¶
This section contains the complete API documentation for all ClassFactory modules.
Core Module¶
ClassFactory Module¶
The ClassFactory module provides a unified interface for managing AI-powered educational content generation modules.
Supported Modules¶
BeamerBot: Automates LaTeX Beamer slide generation based on lesson materials
ConceptWeb: Creates concept maps showing relationships between key lesson concepts
QuizMaker: Generates quizzes with interactive features and similarity analysis
Key Functionalities¶
- Module Management:
Dynamic module creation via create_module()
Shared context and configurations across modules
Consistent error handling and validation
- Resource Management:
Centralized path handling for lesson materials
Organized output structure in ClassFactoryOutput
Automated resource loading and validation
- AI Integration:
Flexible LLM support (GPT-4, LLaMA, etc.)
Consistent AI interaction patterns
Shared context across operations
Output Directory Structure¶
ClassFactoryOutput/
├── BeamerBot/
│ └── L{lesson_no}/
├── ConceptWeb/
│ └── L{lesson_no}/
└── QuizMaker/
└── L{lesson_no}/
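The layout above can be reproduced with a few lines of pathlib. This is only a sketch: the helper name module_output_dir is hypothetical and not part of ClassFactory, which creates and manages these directories itself.

```python
from pathlib import Path


def module_output_dir(project_dir: Path, module_name: str, lesson_no: int) -> Path:
    """Build ClassFactoryOutput/{ModuleName}/L{lesson_no} under project_dir.

    Hypothetical helper for illustration; it only computes the path.
    """
    return Path(project_dir) / "ClassFactoryOutput" / module_name / f"L{lesson_no}"


path = module_output_dir(Path("my_course"), "QuizMaker", 10)
print(path.as_posix())  # my_course/ClassFactoryOutput/QuizMaker/L10
```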
Usage¶
from class_factory import ClassFactory
from langchain_openai import ChatOpenAI
from pathlib import Path
# Initialize factory
factory = ClassFactory(
lesson_no=10,
syllabus_path="path/to/syllabus.docx",
reading_dir="path/to/readings",
llm=ChatOpenAI(api_key="your_key")
)
# Create and use modules
slides = factory.create_module("BeamerBot").generate_slides()
concept_map = factory.create_module("ConceptWeb").build_concept_map()
quiz = factory.create_module("QuizMaker").make_a_quiz()
Dependencies¶
pathlib: Path handling
langchain: LLM integration
pyprojroot: Project directory management
Custom modules: BeamerBot, ConceptWeb, QuizMaker
Notes
BeamerBot operates on single lessons only
ConceptWeb and QuizMaker support lesson ranges
All modules inherit factory-level configurations
Output directories are automatically created and managed
- class class_factory.ClassFactory.ClassFactory(lesson_no: int, reading_dir: str | Path, llm, syllabus_path: str | Path = None, project_dir: str | Path | None = None, output_dir: str | Path | None = None, slide_dir: str | Path | None = None, lesson_range: range | None = None, course_name: str = 'Political Science', verbose: bool = True, tabular_syllabus: bool = False, **kwargs)[source]
Bases:
object
A factory class responsible for creating and managing instances of various educational modules.
ClassFactory provides a standardized interface for initializing educational modules designed for generating lesson-specific materials, such as slides, concept maps, and quizzes. Modules are dynamically created based on the specified module name, with configurations for content generation provided by the user.
Modules available for creation include:
BeamerBot: Automated LaTeX Beamer slide generation.
ConceptWeb: Concept map generation based on lesson objectives and readings.
QuizMaker: Quiz creation, hosting, and analysis.
- lesson_no
The lesson number for which the module instance is created.
- Type:
int
- syllabus_path
Path to the syllabus file.
- Type:
Path
- reading_dir
Path to the directory containing lesson readings.
- Type:
Path
- slide_dir
Path to the directory containing lesson slides.
- Type:
Path
- llm
Language model instance used for content generation in modules.
- project_dir
Base project directory.
- Type:
Path
- output_dir
Directory where outputs from modules are saved.
- Type:
Path
- lesson_range
Range of lessons covered by the factory instance.
- Type:
range
- course_name
Name of the course for which content is generated.
- Type:
str
- lesson_loader
Instance of LessonLoader for loading lesson-related data and objectives.
- Type:
LessonLoader
By default, all module outputs are saved in a structured directory under “ClassFactoryOutput” within the project directory.
- create_module(module_name: str, **kwargs)[source]
Create a specific module instance based on the provided module name.
- Parameters:
module_name (str) – Name of the module to create. Case-insensitive options:
'BeamerBot'/'beamerbot': For LaTeX slide generation
'ConceptWeb'/'conceptweb': For concept map creation
'QuizMaker'/'quizmaker': For quiz generation and management
**kwargs – Module-specific configuration options:
output_dir (Path): Custom output directory (defaults to self.output_dir)
verbose (bool): Enable detailed logging (defaults to False)
course_name (str): Override default course name
lesson_range (range): Override default lesson range
slide_dir (Path): Custom slide directory (BeamerBot only)
- Returns:
The created module instance based on the provided name.
- Return type:
Union[BeamerBot, ConceptMapBuilder, QuizMaker]
- Raises:
ValueError – If an invalid module name is provided.
Notes
Each module’s output is automatically organized in a dedicated subdirectory: ClassFactoryOutput/{ModuleName}/L{lesson_no}/
BeamerBot operates on single lessons, while ConceptWeb and QuizMaker can handle lesson ranges.
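The case-insensitive lookup and the ValueError on unknown names can be pictured with a small registry. This is a sketch of the pattern, not ClassFactory's actual implementation: the stub classes and the REGISTRY name are hypothetical stand-ins for the real modules.

```python
from typing import Callable, Dict


class SlideModule:
    """Stand-in for a slide-generating module (not the real BeamerBot)."""


class QuizModule:
    """Stand-in for a quiz-generating module (not the real QuizMaker)."""


# Registry keyed by lower-cased module name.
REGISTRY: Dict[str, Callable[[], object]] = {
    "beamerbot": SlideModule,
    "quizmaker": QuizModule,
}


def create_module(module_name: str) -> object:
    """Return a new module instance, matching the name case-insensitively."""
    try:
        factory = REGISTRY[module_name.lower()]
    except KeyError:
        raise ValueError(f"Unknown module name: {module_name!r}") from None
    return factory()


print(type(create_module("BeamerBot")).__name__)  # SlideModule
```

Lower-casing the name once at the lookup keeps the registry small while still accepting both 'BeamerBot' and 'beamerbot'.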
BeamerBot - Slide Generation¶
BeamerBot Module¶
The BeamerBot module provides a framework for generating structured LaTeX Beamer slides based on lesson objectives, readings, and prior lesson presentations. By using a language model (LLM), BeamerBot automates the process of slide creation, ensuring a consistent slide structure while allowing for custom guidance and validation.
Key Functionalities¶
- Automated Slide Generation:
BeamerBot generates a LaTeX Beamer presentation for each lesson, incorporating:
A title page with consistent author and institution information
“Where We Came From” and “Where We Are Going” slides
Lesson objectives with highlighted action verbs (e.g., \textbf{Analyze} key events)
Discussion questions and in-class exercises
Summary slides with key takeaways
- Previous Lesson Integration:
Retrieves and references prior lesson presentations to maintain consistent formatting and flow
Preserves author and institution information across presentations
- Prompt Customization and Validation:
Supports custom prompts and specific guidance for tailored slide content
Validates generated LaTeX for correct formatting and content quality
Provides multiple retry attempts if validation fails
Dependencies¶
This module requires:
langchain_core: For LLM chain creation and prompt handling
pathlib: For file path management
Custom utility modules for:
Document loading (load_documents)
LaTeX validation (llm_validator)
Response parsing (response_parsers)
Slide pipeline utilities (slide_pipeline_utils)
Usage¶
Initialize BeamerBot:
```python
# Assumes llm, lesson_loader, and output_dir are already defined
beamer_bot = BeamerBot(
    lesson_no=10,
    llm=llm,
    course_name="Political Science",
    lesson_loader=lesson_loader,
    output_dir=output_dir,
)
```
Generate Slides:
```python
# Optional specific guidance
guidance = "Focus on comparing democratic and authoritarian systems"
slides = beamer_bot.generate_slides(specific_guidance=guidance)
```
Save the Slides:
```python
beamer_bot.save_slides(slides)
```
- class class_factory.beamer_bot.BeamerBot.BeamerBot(lesson_no: int, llm, course_name: str, lesson_loader: LessonLoader, output_dir: Path | str = None, verbose: bool = False, slide_dir: Path | str = None, lesson_objectives: dict = None)[source]
Bases:
BaseModel
A class to generate LaTeX Beamer slides for a specified lesson using a language model (LLM).
BeamerBot automates the slide generation process, creating structured presentations based on lesson readings, objectives, and content from prior presentations when available. Each slide is crafted following a consistent format, and the generated LaTeX is validated for correctness.
- lesson_no
Lesson number for which to generate slides.
- Type:
int
- llm
Language model instance for generating slides.
- course_name
Name of the course for slide context.
- Type:
str
- lesson_loader
Loader for accessing lesson readings and objectives.
- Type:
LessonLoader
- output_dir
Directory to save the generated Beamer slides.
- Type:
Path
- slide_dir
Directory containing existing Beamer slides.
- Type:
Optional[Path]
- llm_response
Stores the generated LaTeX response from the LLM.
- Type:
str
- prompt
Generated prompt for the LLM.
- Type:
str
- lesson_objectives
User-provided lesson objectives, used if the syllabus is not available.
- Type:
optional, dict
- generate_slides(specific_guidance: str = None, latex_compiler: str = "pdflatex") -> str:
Generates Beamer slides as LaTeX code for the specified lesson.
- save_slides(latex_content: str) -> None:
Saves the generated LaTeX content to a .tex file.
- set_user_objectives(objectives: Union[List[str], Dict[str, str]]):
Initializes user-defined lesson objectives, converting lists to dictionaries if needed. Inherited from BaseModel.
- Internal Methods:
- _format_readings_for_prompt() -> str:
Combines readings across lessons into a single string for the LLM prompt.
- _find_prior_lesson(lesson_no: int, max_attempts: int = 3) -> Path:
Finds the most recent prior lesson’s Beamer file to use as a template.
- _load_prior_lesson() -> str:
Loads the LaTeX content of a prior lesson’s Beamer presentation as a string.
- _generate_prompt() -> str:
Constructs the LLM prompt using lesson objectives, readings, and prior lesson content.
- _validate_llm_response(generated_slides: str, objectives: str, readings: str, last_presentation: str, prompt_specific_guidance: str = "", additional_guidance: str = "") -> Dict[str, Any]:
Validates the generated LaTeX for quality and accuracy.
- generate_slides(specific_guidance: str = None, lesson_objectives: dict = None, latex_compiler: str = 'pdflatex') str[source]
Generate LaTeX Beamer slides for the lesson using the language model.
- Parameters:
specific_guidance (str, optional) – Custom instructions for slide content and structure
lesson_objectives (dict, optional) – Override default objectives with custom ones. Format: {lesson_number: "objective text"}
latex_compiler (str, optional) – LaTeX compiler to use for validation. Defaults to “pdflatex”
- Returns:
Complete LaTeX content for the presentation, including preamble
- Return type:
str
- Raises:
ValueError – If validation fails after maximum retry attempts
FileNotFoundError – If required prior lesson files cannot be located
Note
The method includes multiple validation steps:
1. Content quality validation through the LLM
2. LaTeX syntax validation using the specified compiler
3. Up to 3 retry attempts if validation fails
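The generate, validate, retry cycle described in this note can be sketched as a small loop. This illustrates the control flow only: generate_with_retries and the toy generator and validator are hypothetical, and the real method validates with an LLM and a LaTeX compiler instead of a string check.

```python
from typing import Callable


def generate_with_retries(
    generate: Callable[[], str],
    is_valid: Callable[[str], bool],
    max_attempts: int = 3,
) -> str:
    """Call generate() until is_valid() accepts the result or attempts run out."""
    for _ in range(max_attempts):
        candidate = generate()
        if is_valid(candidate):
            return candidate
    raise ValueError(f"Validation failed after {max_attempts} attempts")


# Toy generator: the first draft lacks \end{document}, the second is complete.
drafts = iter([r"\begin{document}", r"\begin{document}\end{document}"])
result = generate_with_retries(
    generate=lambda: next(drafts),
    is_valid=lambda tex: tex.endswith(r"\end{document}"),
)
print(result)
```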
- save_slides(latex_content: str, output_dir: Path | str = None) None[source]
Save the generated LaTeX content to a .tex file.
- Parameters:
latex_content (str) – The LaTeX content to save.
ConceptWeb - Concept Mapping¶
ConceptWeb Module¶
The ConceptWeb module provides tools to automatically extract, analyze, and visualize key concepts from lesson materials, helping to identify connections across topics and lessons. Central to this module is the ConceptMapBuilder class, which leverages a language model (LLM) to identify and structure important ideas and relationships from lesson readings and objectives into a graph-based representation.
Key functionalities of the module include:
- Concept Extraction:
Identifies key concepts from lesson readings and objectives using an LLM.
Summarizes and highlights main themes from each lesson’s content.
- Relationship Mapping:
Extracts and maps relationships between identified concepts based on lesson objectives and content.
Facilitates understanding of how topics interrelate within and across lessons.
- Graph-Based Visualization:
Constructs a concept map in which nodes represent concepts and edges represent relationships.
Generates both interactive graph-based visualizations (HTML) and word clouds for key concepts.
- Community Detection:
Groups closely related concepts into thematic clusters.
Helps identify broader themes or subtopics within the lesson materials.
- Data Saving:
Optionally saves intermediate data (concepts and relationships) as JSON files for further review or analysis.
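To make the node-and-edge representation concrete, here is a minimal sketch that turns relationship triples into an undirected adjacency map. It uses plain dictionaries instead of networkx to stay dependency-free, and the triples and the build_undirected_graph name are hypothetical. The data structures ConceptMapBuilder uses internally may differ.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical (concept, relationship, concept) triples, as an LLM might return them.
relationships: List[Tuple[str, str, str]] = [
    ("democracy", "requires", "elections"),
    ("elections", "confer", "legitimacy"),
    ("authoritarianism", "contrasts with", "democracy"),
]


def build_undirected_graph(triples: List[Tuple[str, str, str]]) -> Dict[str, Set[str]]:
    """Map each concept (node) to the set of concepts it shares an edge with."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for source, _label, target in triples:
        graph[source].add(target)
        graph[target].add(source)
    return dict(graph)


graph = build_undirected_graph(relationships)
print(sorted(graph["democracy"]))  # ['authoritarianism', 'elections']
```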
Dependencies¶
This module depends on:
langchain_core: For LLM-based extraction and summarization tasks.
networkx: For graph generation and analysis of concept relationships.
matplotlib or plotly: For creating visualizations and word clouds.
Custom utilities for loading documents, extracting objectives, and handling logging.
Usage Overview¶
- Initialize ConceptMapBuilder:
Instantiate ConceptMapBuilder with paths to project directories, reading materials, and the syllabus file.
- Generate the Concept Map:
Use build_concept_map() to process lesson materials, extract and summarize concepts, map relationships, and generate visualizations.
- Save and Review:
The generated concept map can be saved as an interactive HTML file or as a static word cloud for easier review and analysis.
Example
from pathlib import Path
from class_factory.concept_web.ConceptMapBuilder import ConceptMapBuilder
from class_factory.utils.load_documents import LessonLoader
from langchain_openai import ChatOpenAI
# Set up paths and initialize components
syllabus_path = Path("/path/to/syllabus.docx")
reading_dir = Path("/path/to/lesson/readings")
project_dir = Path("/path/to/project")
llm = ChatOpenAI(api_key="your_api_key")
# Initialize the lesson loader and concept map builder
lesson_loader = LessonLoader(syllabus_path=syllabus_path, reading_dir=reading_dir, project_dir=project_dir)
concept_map_builder = ConceptMapBuilder(
lesson_no=1,
lesson_loader=lesson_loader,
llm=llm,
course_name="Sample Course",
lesson_range=range(1, 5)
)
# Build and visualize the concept map
concept_map_builder.build_concept_map()
- class class_factory.concept_web.ConceptWeb.ConceptMapBuilder(lesson_no: int, lesson_loader: LessonLoader, llm, course_name: str, output_dir: str | Path = None, lesson_range: range | int = None, lesson_objectives: List[str] | Dict[str, str] = None, verbose: bool = False, save_relationships: bool = False, **kwargs)[source]
Bases:
BaseModel
Orchestrates the extraction, analysis, and visualization of key concepts and their relationships from lesson materials.
Uses a language model (LLM) to summarize content, extract relationships, and build a graph-based concept map. Provides methods for processing lessons, saving intermediate data, and generating interactive visualizations.
- load_and_process_lessons(threshold: float = 0.995)[source]
Process lesson materials by summarizing content and extracting concept relationships for each lesson.
- Parameters:
threshold (float, optional) – Similarity threshold for extracted concepts. Defaults to 0.995.
- For each lesson in lesson_range:
Load documents and objectives.
Summarize readings using the LLM.
Extract relationships between concepts and generate a unique concept list.
- build_concept_map(directed: bool = False, concept_similarity_threshold: float = 0.995, dark_mode: bool = True, lesson_objectives: Dict[str, str] | None = None) None[source]
Run the full pipeline to generate a concept map and visualization.
- Parameters:
directed (bool, optional) – Whether to create a directed concept map. Defaults to False.
concept_similarity_threshold (float, optional) – Threshold for concept similarity. Defaults to 0.995.
dark_mode (bool, optional) – Use dark mode for visualization. Defaults to True.
lesson_objectives (Optional[Dict[str, str]], optional) – User-provided lesson objectives. Defaults to None.
QuizMaker - Quiz Generation¶
QuizMaker Module¶
The QuizMaker module offers a comprehensive framework for generating, distributing, and analyzing quiz questions based on lesson content and objectives. At its core, the QuizMaker class uses a language model (LLM) to create targeted quiz questions, ensuring these questions are relevant to the course material. This class also provides utilities for similarity checking, interactive quiz launches, and detailed results assessment.
Key Functionalities:¶
- Quiz Generation:
Automatically generates quiz questions from lesson objectives and readings.
Customizable difficulty level and quiz content based on user-provided or auto-extracted lesson objectives.
- Similarity Checking:
Detects overlap with previous quizzes to prevent question duplication.
Uses sentence embeddings to flag and remove questions too similar to prior quizzes.
- Validation and Formatting:
Validates generated questions to ensure proper format and structure.
Corrects answer formatting to meet quiz standards (e.g., answers in ‘A’, ‘B’, ‘C’, ‘D’).
- Saving Quizzes:
Exports quizzes as Excel files or PowerPoint presentations.
Customizes PowerPoint presentations using templates for polished quiz slides.
- Interactive Quiz Launch:
Launches an interactive Gradio-based web quiz for real-time participation.
Supports QR code access and real-time result saving.
- Results Assessment:
Analyzes and visualizes quiz results stored in CSV files.
Generates summary statistics, HTML reports, and dashboards for insights into quiz performance.
Dependencies¶
This module requires:
langchain_core: For LLM interaction and prompt handling.
sentence_transformers: For semantic similarity detection in quiz questions.
pptx: For PowerPoint presentation generation.
pandas: For data handling and result assessment.
torch: For managing device selection and embedding models.
gradio: For interactive quiz interfaces.
Custom utilities for document loading, response parsing, logging, and retry decorators.
Usage Overview¶
- Initialize QuizMaker:
Instantiate QuizMaker with required paths, lesson loader, and LLM.
- Generate a Quiz:
Call make_a_quiz() to create quiz questions based on lesson materials, with automatic similarity checking.
- Save the Quiz:
Use save_quiz() to save the quiz as an Excel file or save_quiz_to_ppt() to export to PowerPoint.
- Launch an Interactive Quiz:
Use launch_interactive_quiz() to start a web-based quiz, with options for real-time participation and result saving.
- Assess Quiz Results:
Analyze saved quiz responses with assess_quiz_results(), generating summary statistics, reports, and visualizations.
Example
from pathlib import Path
from class_factory.quiz_maker.QuizMaker import QuizMaker
from class_factory.utils.load_documents import LessonLoader
from langchain_openai import ChatOpenAI
# Set up paths and initialize components
syllabus_path = Path("/path/to/syllabus.docx")
reading_dir = Path("/path/to/lesson/readings")
project_dir = Path("/path/to/project")
llm = ChatOpenAI(api_key="your_api_key")
# Initialize lesson loader and quiz maker
lesson_loader = LessonLoader(syllabus_path=syllabus_path, reading_dir=reading_dir, project_dir=project_dir)
quiz_maker = QuizMaker(
llm=llm,
lesson_no=1,
course_name="Sample Course",
lesson_loader=lesson_loader,
output_dir=Path("/path/to/output/dir")
)
# Generate and save a quiz
quiz = quiz_maker.make_a_quiz()
quiz_maker.save_quiz(quiz)
quiz_maker.save_quiz_to_ppt(quiz)
- class class_factory.quiz_maker.QuizMaker.QuizMaker(llm, lesson_no: int, course_name: str, lesson_loader: LessonLoader, output_dir: Path | str = None, prior_quiz_path: Path | str = None, lesson_range: range = range(1, 5), quiz_prompt_for_llm: str = None, device=None, lesson_objectives: dict = None, verbose=False)[source]
Bases:
BaseModel
A class to generate and manage quizzes based on lesson readings and objectives using a language model (LLM).
QuizMaker generates quiz questions from lesson content, checks for similarity with prior quizzes to avoid redundancy, and validates question format. Quizzes can be saved as Excel or PowerPoint files, launched interactively, and analyzed for performance.
- llm
The language model instance for quiz generation.
- syllabus_path
Path to the syllabus file.
- Type:
Path
- reading_dir
Directory containing lesson readings.
- Type:
Path
- output_dir
Directory for saving quiz files.
- Type:
Path
- prior_quiz_path
Directory with prior quizzes for similarity checks.
- Type:
Path
- lesson_range
Range of lessons for quiz generation.
- Type:
range
- course_name
Name of the course for context in question generation.
- Type:
str
- device
Device for embeddings (CPU or GPU).
- rejected_questions
List of questions flagged as similar to prior quizzes.
- Type:
List[Dict]
- make_a_quiz(difficulty_level: int = 5, flag_threshold: float = 0.7) -> List[Dict]:
Generates quiz questions with similarity checks to avoid redundancy.
- save_quiz(quiz: List[Dict]) -> None:
Saves quiz questions to an Excel file.
- save_quiz_to_ppt(quiz: List[Dict] = None, excel_file: Path = None, template_path: Path = None) -> None:
Saves quiz questions to a PowerPoint file, optionally with a template.
- launch_interactive_quiz(quiz_data, sample_size: int = 5, seed: int = 42, save_results: bool = False, output_dir: Path = None, qr_name: str = None) -> None:
Launches an interactive quiz using Gradio.
- assess_quiz_results(quiz_data: pd.DataFrame = None, results_dir: Path = None, output_dir: Path = None) -> pd.DataFrame:
Analyzes quiz results and generates summary statistics and visualizations.
- Internal Methods:
- _validate_llm_response(quiz_questions: Dict[str, Any], objectives: str, readings: str, prior_quiz_questions: List[str], difficulty_level: int, additional_guidance: str) -> Dict[str, Any]:
Validates generated quiz questions for relevance and format.
- _validate_questions(questions: List[Dict]) -> List[Dict]:
Checks for formatting errors and corrects them.
- _build_quiz_chain() -> Any:
Builds the LLM chain for quiz generation.
- _load_and_merge_prior_quizzes() -> Tuple[List[str], pd.DataFrame]:
Loads and merges questions from prior quizzes for similarity checking.
- _check_question_similarity(generated_questions: List[str], threshold: float = 0.6) -> List[Dict]:
Checks for question similarity against prior quizzes.
- _separate_flagged_questions(questions: List[Dict], flagged_questions: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
Separates flagged questions based on similarity results.
- make_a_quiz(difficulty_level: int = 5, flag_threshold: float = 0.7) List[Dict][source]
Generate quiz questions based on lesson readings and objectives, checking for similarity with prior quizzes.
- Parameters:
difficulty_level (int) – Difficulty of generated questions, scale 1-10. Defaults to 5.
flag_threshold (float) – Similarity threshold for rejecting duplicate questions. Defaults to 0.7.
- Returns:
- Generated quiz questions, with duplicates removed. Each dict contains:
question (str): The question text
type (str): Question type (e.g. “multiple_choice”)
A) (str): First answer choice
B) (str): Second answer choice
C) (str): Third answer choice
D) (str): Fourth answer choice
correct_answer (str): Letter of correct answer
- Return type:
List[Dict[str, Any]]
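The role of flag_threshold can be illustrated with a toy similarity check. The sketch below is an assumption-laden stand-in: it compares word-count vectors with cosine similarity, whereas QuizMaker uses sentence embeddings, and the function names and the choice of `>=` for the cutoff are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between word-count vectors of two strings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def split_by_similarity(
    new: List[str], prior: List[str], flag_threshold: float = 0.7
) -> Tuple[List[str], List[str]]:
    """Return (kept, rejected): a question is rejected if it is too close to any prior one."""
    kept, rejected = [], []
    for question in new:
        score = max((cosine_similarity(question, p) for p in prior), default=0.0)
        (rejected if score >= flag_threshold else kept).append(question)
    return kept, rejected


prior = ["What is a democracy?"]
new = ["What is a democracy?", "Define federalism in one sentence."]
kept, rejected = split_by_similarity(new, prior)
print(kept)      # ['Define federalism in one sentence.']
print(rejected)  # ['What is a democracy?']
```

A lower threshold rejects more aggressively; with no prior questions, every new question is kept.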
- save_quiz(quiz: List[Dict]) None[source]
Save quiz questions to an Excel file, standardizing answer key styles.
- Parameters:
quiz (List[Dict[str, Any]]) – List of quiz questions to save. Each dict should contain: - type (str): Question type - question (str): Question text - A/B/C/D or A)/B)/C)/D) (str): Answer choices - correct_answer (str): Letter of correct answer
- Returns:
None
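Standardizing answer key styles might look like the following sketch, which renames 'A)' through 'D)' to bare letters. Both the helper name normalize_answer_keys and the choice of bare letters as the target style are assumptions; save_quiz may standardize in the other direction.

```python
from typing import Any, Dict, List

PAREN_KEYS = {"A)", "B)", "C)", "D)"}


def normalize_answer_keys(quiz: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Rename 'A)'..'D)' keys to 'A'..'D', leaving every other key untouched."""
    return [
        {(key[0] if key in PAREN_KEYS else key): value for key, value in question.items()}
        for question in quiz
    ]


quiz = [{"type": "multiple_choice", "question": "2 + 2 = ?",
         "A)": "3", "B)": "4", "C)": "5", "D)": "6", "correct_answer": "B"}]
print(sorted(normalize_answer_keys(quiz)[0]))
# ['A', 'B', 'C', 'D', 'correct_answer', 'question', 'type']
```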
- save_quiz_to_ppt(quiz: List[Dict] = None, excel_file: Path | str = None, template_path: Path | str = None, filename: str = None) None[source]
Create a PowerPoint presentation with each question on a slide, followed by the answer slide.
- Parameters:
quiz (List[Dict], optional) – List of quiz questions in dictionary format.
excel_file (Path, optional) – Path to an Excel file containing quiz questions. If provided, it overrides the quiz argument.
template_path (Path, optional) – Path to a PowerPoint template to apply to the generated slides.
- Raises:
ValueError – If neither quiz nor excel_file is provided.
- launch_interactive_quiz(quiz_data: DataFrame | Path | str | List[Dict[str, Any]] = None, sample_size: int = 5, seed: int = 42, save_results: bool = False, output_dir: Path | None = None, qr_name: str | None = None) None[source]
Launch an interactive quiz using Gradio, sampling questions from provided data or generating new data if none is provided.
- Parameters:
quiz_data (Union[pd.DataFrame, Path, str, List[Dict[str, Any]]], optional) – Quiz questions as a DataFrame, Excel path, or list of dictionaries. If None, generates new questions.
sample_size (int, optional) – Number of questions to sample. Defaults to 5.
seed (int, optional) – Random seed for consistent sampling. Defaults to 42.
save_results (bool, optional) – Whether to save quiz results. Defaults to False.
output_dir (Path, optional) – Directory to save quiz results. Defaults to the class’s output directory.
qr_name (str, optional) – Name of the QR code image file for accessing the quiz.
- Raises:
ValueError – If quiz_data is provided but is not a valid type.
- assess_quiz_results(quiz_data: DataFrame | None = None, results_dir: Path | str | None = None, output_dir: Path | str | None = None) DataFrame[source]
Analyze quiz results, generate summary statistics, and visualize responses.
- Parameters:
quiz_data (pd.DataFrame, optional) – DataFrame of quiz results. If None, loads results from CSV files in results_dir.
results_dir (Path, optional) – Directory containing CSV files of quiz results.
output_dir (Path, optional) – Directory for saving summary statistics and plots. Defaults to output_dir/’quiz_analysis’.
- Returns:
- DataFrame with summary statistics, including:
question (str): Question text
Total Responses (int): Number of unique users who answered
Correct Responses (int): Number of correct answers
Incorrect Responses (int): Number of incorrect answers
Percent Correct (float): Percentage of correct answers
Modal Answer (str): Most common answer given
- Return type:
pd.DataFrame
- Raises:
AssertionError – If quiz_data is provided but is not a pandas DataFrame.
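The summary columns listed above can be computed along these lines. This sketch uses only the standard library instead of pandas, assumes one row per user per question, and invents the input field names (user, question, answer, is_correct); the real results CSVs may be laid out differently.

```python
from collections import Counter, defaultdict
from typing import Dict, List

# Hypothetical response rows; the real CSV column names may differ.
responses = [
    {"user": "u1", "question": "Q1", "answer": "A", "is_correct": True},
    {"user": "u2", "question": "Q1", "answer": "B", "is_correct": False},
    {"user": "u3", "question": "Q1", "answer": "A", "is_correct": True},
    {"user": "u1", "question": "Q2", "answer": "C", "is_correct": False},
]


def summarize(rows: List[dict]) -> Dict[str, dict]:
    """Compute per-question totals, percent correct, and the modal answer."""
    by_question = defaultdict(list)
    for row in rows:
        by_question[row["question"]].append(row)
    summary = {}
    for question, group in by_question.items():
        correct = sum(1 for r in group if r["is_correct"])
        summary[question] = {
            "Total Responses": len({r["user"] for r in group}),  # unique users
            "Correct Responses": correct,
            "Incorrect Responses": len(group) - correct,
            "Percent Correct": 100.0 * correct / len(group),
            "Modal Answer": Counter(r["answer"] for r in group).most_common(1)[0][0],
        }
    return summary


print(summarize(responses)["Q1"])
```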
Utilities¶
Base Model¶
- class class_factory.utils.base_model.BaseModel(lesson_no: int, course_name: str, lesson_loader: LessonLoader, output_dir: Path | str = None, verbose: bool = False)[source]
Bases:
object
A base class for educational modules that provides common setup and utility functions, such as loading lesson readings and setting user-defined objectives.
- lesson_no
The specific lesson number for the current instance.
- Type:
int
- course_name
Name of the course, used as context in other methods and prompts.
- Type:
str
- lesson_loader
Instance for loading lesson-related data.
- Type:
LessonLoader
- output_dir
Directory where outputs are saved; defaults to ‘ClassFactoryOutput’.
- Type:
Path
- logger
Logger instance for the class.
- Type:
Logger
- user_objectives
Dictionary of user-defined objectives, if provided.
- Type:
Optional[Dict[str, str]]
- _load_readings(lesson_numbers: Union[int, range]) -> Dict[str, List[str]]:
Loads and returns readings for the specified lesson(s) as a dictionary.
- set_user_objectives(objectives: Union[List[str], Dict[str, str]], lesson_range: Union[int, range]) -> Dict[str, str]:
Sets user-defined objectives for each lesson in the specified range and updates self.user_objectives.
- _get_lesson_objectives(lesson_num: int) -> str:
Retrieves lesson objectives for a given lesson number, falling back to extracted objectives if no user objectives exist.
- set_user_objectives(objectives: List[str] | Dict[str, str], lesson_range: int | range) Dict[str, str][source]
Set user-defined objectives for each lesson in lesson_range, supporting both list and dictionary formats. Updates the self.user_objectives attribute with the processed objectives.
- Parameters:
objectives (Union[List[str], Dict[str, str]]) – User-provided objectives, either as a list (converted to a dictionary) or as a dictionary keyed by lesson numbers as strings.
lesson_range (Union[int, range]) – Single lesson number or range of lesson numbers for objectives.
- Returns:
Processed dictionary of user objectives keyed by lesson numbers as strings.
- Return type:
Dict[str, str]
- Raises:
ValueError – If the number of objectives in the list does not match the number of lessons in lesson_range.
TypeError – If objectives is neither a list nor a dictionary.
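The list-to-dictionary conversion and the two error cases can be sketched as a standalone function. normalize_objectives is a hypothetical name, and this shows the documented behavior as understood here, not the method's actual source.

```python
from typing import Dict, List, Union


def normalize_objectives(
    objectives: Union[List[str], Dict[str, str]],
    lesson_range: Union[int, range],
) -> Dict[str, str]:
    """Return objectives keyed by lesson number (as strings)."""
    lessons = [lesson_range] if isinstance(lesson_range, int) else list(lesson_range)
    if isinstance(objectives, dict):
        return {str(key): value for key, value in objectives.items()}
    if isinstance(objectives, list):
        if len(objectives) != len(lessons):
            raise ValueError("Number of objectives must match number of lessons")
        return {str(n): text for n, text in zip(lessons, objectives)}
    raise TypeError("objectives must be a list or a dict")


print(normalize_objectives(["Define the state", "Compare regimes"], range(1, 3)))
# {'1': 'Define the state', '2': 'Compare regimes'}
```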
Document Loading¶
Document Loading and Processing Module¶
This module provides functionality to load, process, and extract text from various document types (PDF, DOCX, and TXT) for generating lesson-specific content. It includes support for OCR processing of scanned documents and handling of structured educational materials like syllabi and lesson readings.
Classes¶
- LessonLoader
Main class for handling document loading and processing operations.
Key Functions¶
The LessonLoader class provides these key functionalities:
- Document Loading:
load_directory: Load all documents from a specified directory
load_lessons: Load lessons from multiple directories with lesson number inference
load_readings: Extract text from individual documents
load_beamer_presentation: Load Beamer presentation content
- Syllabus Processing:
extract_lesson_objectives: Extract objectives for specific lessons
load_docx_syllabus: Load and parse DOCX syllabus content
find_docx_indices: Locate lesson sections within syllabus
- Text Extraction:
extract_text_from_pdf: Extract text from PDF files
ocr_pdf: Perform OCR on scanned documents
convert_pdf_to_docx: Convert PDF files to DOCX format
Dependencies¶
- Core Dependencies:
pypdf: PDF text extraction
python-docx: DOCX file handling
pathlib: File path operations
typing: Type hints
- Optional OCR Dependencies:
pytesseract: OCR processing
pdf2image: PDF to image conversion
spacy: Text processing
contextualSpellCheck: Text correction
img2table: Table extraction from images
Example Usage¶
from class_factory.utils.load_documents import LessonLoader
# Initialize loader with paths
loader = LessonLoader(
syllabus_path="path/to/syllabus.docx",
reading_dir="path/to/readings"
)
# Load specific lesson content
lesson_content = loader.load_lessons(lesson_number_or_range=5)
# Extract lesson objectives
objectives = loader.extract_lesson_objectives(current_lesson=5)
Notes
OCR functionality requires additional package installation via pip install class_factory[ocr]
Directory structure should follow consistent naming (e.g., ‘L1’, ‘L2’, etc.)
Supports both PDF and DOCX syllabus formats with automatic conversion if needed
See also
class_factory.utils.tools.logger_setup: Logger configuration
class_factory.utils.base_model: Base model implementation
- class class_factory.utils.load_documents.LessonLoader(syllabus_path: Path | str, reading_dir: Path | str, slide_dir: Path | str | None = None, project_dir: Path | str | None = None, verbose: bool = True, tabular_syllabus: bool = False)[source]
Bases:
object
A class for loading and managing educational content from various document formats.
This class handles loading and processing of lesson materials including syllabi, readings, and Beamer presentations. It supports multiple file formats (PDF, DOCX, TXT) and provides OCR capabilities for scanned documents when necessary.
- reading_dir
Directory containing lesson reading materials
- Type:
Path
- slide_dir
Directory containing Beamer presentation slides
- Type:
Path
- project_dir
Root project directory
- Type:
Path
- syllabus_path
Path to the course syllabus file
- Type:
Path
- logger
Class logger instance
- Type:
Logger
- Parameters:
syllabus_path (Union[Path, str]) – Path to the syllabus file
reading_dir (Union[Path, str]) – Directory containing lesson readings
slide_dir (Union[Path, str], optional) – Directory for Beamer slides. Defaults to None.
project_dir (Union[Path, str], optional) – Root directory for the project. Defaults to None.
verbose (bool, optional) – Whether to show detailed logging. Defaults to True.
- property slide_dir
- static ocr_available()[source]
Check whether the currently installed packages support OCR.
- static missing_ocr_packages()[source]
- load_readings(file_path: str | Path) str[source]
Load text content from a single document file.
- Parameters:
file_path (Union[str, Path]) – Path to the document file to load
- Returns:
Extracted text content prefixed with the file title
- Return type:
str
- Raises:
ValueError – If file type is unsupported or file is corrupted
ImportError – If OCR packages are needed but not installed
- load_directory(load_from_dir: Path | str) List[str][source]
Load all valid document files within a directory.
- Parameters:
load_from_dir (Union[Path, str]) – Directory path to load documents from
- Returns:
List of extracted text content from all valid documents
- Return type:
List[str]
- load_lessons(lesson_number_or_range: int | range, logger=None) Dict[str, List[str]][source]
Load specific lessons by scanning directories based on lesson numbers.
- Parameters:
lesson_number_or_range (Union[int, range]) – A single lesson number or range of lesson numbers to load.
logger – Optional custom logger.
- Returns:
A dictionary where each key is a lesson number and each value is a list of readings.
- Return type:
Dict[str, List[str]]
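The shape of the return value can be illustrated with a small stand-alone sketch. It does not call `LessonLoader`; `collect_lessons` and `fake_loader` are hypothetical names, and the stand-in loader replaces the directory scan that the real method performs. Note that a Python `range` excludes its end value, so `range(8, 11)` covers lessons 8, 9, and 10.

```python
from typing import Callable, Dict, List, Union


def collect_lessons(
    lesson_number_or_range: Union[int, range],
    load_one: Callable[[int], List[str]],
) -> Dict[str, List[str]]:
    """Build a {lesson number: readings} mapping for one lesson or a range of lessons."""
    if isinstance(lesson_number_or_range, int):
        lessons = [lesson_number_or_range]
    else:
        lessons = list(lesson_number_or_range)
    # Keys are strings to mirror the documented Dict[str, List[str]] return type.
    return {str(n): load_one(n) for n in lessons}


# Hypothetical stand-in for reading the files in a lesson directory.
def fake_loader(lesson_no: int) -> List[str]:
    return [f"reading for lesson {lesson_no}"]
```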
- load_beamer_presentation(tex_path: Path) str[source]
Loads a Beamer presentation from a .tex file and returns it as a string.
- Parameters:
tex_path (Path) – The path to the .tex file containing the Beamer presentation.
- Returns:
The content of the .tex file.
- Return type:
str
- find_prior_beamer_presentation(lesson_no: int, max_attempts: int = 3) Path | str[source]
Dynamically finds the most recent prior lesson to use as a template for slide creation.
- Parameters:
lesson_no (int) – The current lesson number.
max_attempts (int) – The maximum number of previous lessons to attempt loading (default 3).
- Returns:
The path to the found Beamer file from a prior lesson.
- Return type:
Path | str
- Raises:
FileNotFoundError – If no valid prior lesson file is found within the max_attempts range.
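The backward search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's code: the function name `find_prior_tex` is hypothetical, and the `L{lesson_no}` subdirectory layout is assumed by analogy with the output directory structure.

```python
from pathlib import Path


def find_prior_tex(slide_dir: Path, lesson_no: int, max_attempts: int = 3) -> Path:
    """Walk backwards from lesson_no - 1 and return the first .tex file found."""
    for offset in range(1, max_attempts + 1):
        # Assumed naming convention: one folder per lesson, e.g. "L9".
        candidate_dir = slide_dir / f"L{lesson_no - offset}"
        if candidate_dir.is_dir():
            tex_files = sorted(candidate_dir.glob("*.tex"))
            if tex_files:
                return tex_files[0]
    raise FileNotFoundError(
        f"No prior Beamer file found within {max_attempts} lessons before lesson {lesson_no}."
    )
```

Bounding the search with `max_attempts` keeps the template close to the current lesson, so a very old presentation is never silently picked up.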
- extract_text_from_pdf(pdf_path: str | Path) str[source]
Extract text content from a PDF file, with OCR fallback if needed.
- Parameters:
pdf_path (Union[str, Path]) – Path to the PDF file
- Returns:
Extracted text content from the PDF
- Return type:
str
- Raises:
ImportError – If text extraction fails and OCR packages are not available
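The fallback pattern can be sketched independently of any PDF library. In the illustration below, `extract_with_fallback`, `extract_text`, `ocr_text`, and `min_chars` are all hypothetical names; the two callables stand in for a direct text extractor and an OCR routine, and the emptiness check is an assumed trigger for the fallback.

```python
from typing import Callable, Optional


def extract_with_fallback(
    pdf_path: str,
    extract_text: Callable[[str], str],
    ocr_text: Optional[Callable[[str], str]] = None,
    min_chars: int = 1,
) -> str:
    """Try direct extraction first; fall back to OCR when too little text comes back."""
    text = extract_text(pdf_path).strip()
    if len(text) >= min_chars:
        return text
    if ocr_text is None:
        # Mirrors the documented behaviour: fail loudly when OCR is needed but unavailable.
        raise ImportError("Text extraction returned no text and OCR support is not available.")
    return ocr_text(pdf_path)
```

Passing `ocr_text=None` models an environment without OCR packages, which is the case where the documented `ImportError` applies.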
- ocr_pdf(pdf_path: Path, max_workers: int = 4) str[source]
Perform OCR on a PDF file to extract text content.
- Parameters:
pdf_path (Path) – Path to the PDF file
max_workers (int, optional) – Number of parallel workers for OCR. Defaults to 4.
- Returns:
Extracted text content from OCR
- Return type:
str
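The role of `max_workers` can be illustrated with a generic page-parallel sketch. Whether `ocr_pdf` uses threads or processes internally is not stated here, so the thread pool is an assumption; `ocr_pages` and `ocr_page` are hypothetical names, with `ocr_page` standing in for a per-page OCR call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def ocr_pages(
    pages: Sequence[object],
    ocr_page: Callable[[object], str],
    max_workers: int = 4,
) -> str:
    """Run ocr_page over every page in parallel and join the results in page order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, regardless of completion order.
        texts: List[str] = list(pool.map(ocr_page, pages))
    return "\n".join(texts)
```

Because `Executor.map` preserves input order, the joined text reads in page order even when later pages finish first.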
- load_docx_syllabus(syllabus_path) List[str][source]
- extract_lesson_objectives(current_lesson: int | str, only_current: bool = False) str[source]
Extract lesson objectives from the syllabus for specified lesson(s).
- Parameters:
current_lesson (Union[int, str]) – The lesson number to extract objectives for
only_current (bool, optional) – If True, return only current lesson objectives. If False, include previous and next lessons. Defaults to False.
- Returns:
Extracted lesson objectives text. Returns “No lesson objectives provided” if no syllabus path is set.
- Return type:
str
- find_docx_indices(syllabus: List[str], current_lesson: int, lesson_identifier: str = '') Tuple[int | None, int | None, int | None, int | None][source]
Finds the indices of the lessons in the syllabus content.
- Parameters:
syllabus (List[str]) – A list of strings where each string represents a line in the syllabus document.
current_lesson (int) – The lesson number for which to find surrounding lessons.
lesson_identifier (str, optional) – The keyword that marks a new lesson in the syllabus (e.g., “Lesson” or “Week”). Defaults to ''.
- Returns:
The indices of the previous, current, next, and the end of the next lesson.
- Return type:
Tuple[int | None, int | None, int | None, int | None]
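The four indices can be located with a simple heading search. The sketch below is an assumption about how such a lookup might work, not the library's implementation: `find_lesson_indices` is a hypothetical name, and it assumes each lesson begins on a line that starts with the identifier followed by the lesson number (for example "Lesson 3: Methods").

```python
import re
from typing import List, Optional, Tuple


def find_lesson_indices(
    syllabus: List[str],
    current_lesson: int,
    lesson_identifier: str = "Lesson",
) -> Tuple[Optional[int], Optional[int], Optional[int], Optional[int]]:
    """Return line indices for the previous, current, and next lesson, plus the end of the next."""

    def index_of(lesson_no: int) -> Optional[int]:
        # \b stops "Lesson 1" from matching "Lesson 10".
        pattern = re.compile(
            rf"\s*{re.escape(lesson_identifier)}\s*{lesson_no}\b", re.IGNORECASE
        )
        for i, line in enumerate(syllabus):
            if pattern.match(line):
                return i
        return None

    return (
        index_of(current_lesson - 1),
        index_of(current_lesson),
        index_of(current_lesson + 1),
        # The heading of lesson N+2 marks where lesson N+1 ends.
        index_of(current_lesson + 2),
    )
```

Any heading that is missing from the syllabus yields `None` in its slot, which matches the optional integers in the method signature.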
LLM Validator¶
- class class_factory.utils.llm_validator.ValidatorInterimResponse(*, accuracy: float, completeness: float, consistency: float, reasoning: str, additional_guidance: str)[source]
Bases: BaseModel
- accuracy: float
- completeness: float
- consistency: float
- reasoning: str
- additional_guidance: str
- model_config = {}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- class class_factory.utils.llm_validator.ValidatorResponse(*, overall_score: float, accuracy: float, completeness: float, consistency: float, reasoning: str, additional_guidance: str, status: int)[source]
Bases: BaseModel
- overall_score: float
- accuracy: float
- completeness: float
- consistency: float
- reasoning: str
- additional_guidance: str
- status: int
- classmethod from_interim(interim: ValidatorInterimResponse, threshold: float = 8.0) ValidatorResponse[source]
- model_config = {}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
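The relationship between the interim and final response models can be sketched with plain dataclasses. Everything about the scoring rule here is an assumption: the documentation does not say how `overall_score` is derived or what the `status` values mean, so the unweighted mean and the 1/0 pass flag below are illustrative choices only, and `InterimScores` and `finalize` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class InterimScores:
    """Stand-in for the three numeric fields of ValidatorInterimResponse."""
    accuracy: float
    completeness: float
    consistency: float


def finalize(interim: InterimScores, threshold: float = 8.0) -> dict:
    """Derive an overall score and a pass/fail status from the three sub-scores."""
    # Assumption: the overall score is the unweighted mean of the sub-scores.
    overall = (interim.accuracy + interim.completeness + interim.consistency) / 3
    # Assumption: status 1 means the score met the threshold, 0 means it did not.
    status = 1 if overall >= threshold else 0
    return {"overall_score": overall, "status": status}
```

Under these assumptions, scores of 9.0, 8.0, and 7.0 average to exactly the default threshold of 8.0 and would therefore pass.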
- class class_factory.utils.llm_validator.Validator(llm: any, temperature: float = 0.2, log_level: int = 20, system_prompt: str | None = None, human_prompt: str | None = None, tracer: any | None = None, score_threshold: float = 8.0)[source]
Bases: object
A class for validating responses generated by an LLM (large language model).
The Validator checks the accuracy, completeness, and relevance of the LLM’s response to ensure it meets the requirements specified in the task prompt. Validation results include a score, status, reasoning, and any additional guidance.
- validate(task_description: str, generated_response: str, task_schema: str = '', specific_guidance: str = '') dict[source]