class_factory.concept_web package

Submodules

class_factory.concept_web.ConceptWeb module

ConceptWeb Module

The ConceptWeb module provides tools to automatically extract, analyze, and visualize key concepts from lesson materials, helping to identify connections across topics and lessons. Central to this module is the ConceptMapBuilder class, which leverages a language model (LLM) to identify and structure important ideas and relationships from lesson readings and objectives into a graph-based representation.

Key functionalities of the module include:

  • Concept Extraction:
    • Identifies key concepts from lesson readings and objectives using an LLM.

    • Summarizes and highlights main themes from each lesson’s content.

  • Relationship Mapping:
    • Extracts and maps relationships between identified concepts based on lesson objectives and content.

    • Facilitates understanding of how topics interrelate within and across lessons.

  • Graph-Based Visualization:
    • Constructs a concept map in which nodes represent concepts and edges represent relationships.

    • Generates both interactive graph-based visualizations (HTML) and word clouds for key concepts.

  • Community Detection:
    • Groups closely related concepts into thematic clusters.

    • Helps identify broader themes or subtopics within the lesson materials.

  • Data Saving:
    • Optionally saves intermediate data (concepts and relationships) as JSON files for further review or analysis.

Dependencies

This module depends on:

  • langchain_core: For LLM-based extraction and summarization tasks.

  • networkx: For graph generation and analysis of concept relationships.

  • matplotlib or plotly: For creating visualizations and word clouds.

  • Custom utilities for loading documents, extracting objectives, and handling logging.

Usage Overview

  1. Initialize ConceptMapBuilder: Create a LessonLoader with paths to the project directory, reading materials, and the syllabus file, then pass it to ConceptMapBuilder together with an LLM and the course name.

  2. Generate the Concept Map: Use build_concept_map() to process lesson materials, extract and summarize concepts, map relationships, and generate visualizations.

  3. Save and Review: The generated concept map can be saved as an interactive HTML file or as a static word cloud for easier review and analysis.

Example

from pathlib import Path

from class_factory.concept_web.ConceptWeb import ConceptMapBuilder
from class_factory.utils.load_documents import LessonLoader
from langchain_openai import ChatOpenAI

# Set up paths and initialize components
syllabus_path = Path("/path/to/syllabus.docx")
reading_dir = Path("/path/to/lesson/readings")
project_dir = Path("/path/to/project")
llm = ChatOpenAI(api_key="your_api_key")

# Initialize the lesson loader and concept map builder
lesson_loader = LessonLoader(syllabus_path=syllabus_path, reading_dir=reading_dir, project_dir=project_dir)
concept_map_builder = ConceptMapBuilder(
    lesson_no=1,
    lesson_loader=lesson_loader,
    llm=llm,
    course_name="Sample Course",
    lesson_range=range(1, 5)
)

# Build and visualize the concept map
concept_map_builder.build_concept_map()
class class_factory.concept_web.ConceptWeb.ConceptMapBuilder(lesson_no: int, lesson_loader: LessonLoader, llm, course_name: str, output_dir: str | Path = None, lesson_range: range | int = None, lesson_objectives: List[str] | Dict[str, str] = None, verbose: bool = False, save_relationships: bool = False, **kwargs)[source]

Bases: BaseModel

Orchestrates the extraction, analysis, and visualization of key concepts and their relationships from lesson materials.

Uses a language model (LLM) to summarize content, extract relationships, and build a graph-based concept map. Provides methods for processing lessons, saving intermediate data, and generating interactive visualizations.

load_and_process_lessons(threshold: float = 0.995)[source]

Process lesson materials by summarizing content and extracting concept relationships for each lesson.

Parameters:

threshold (float, optional) – Similarity threshold for extracted concepts. Defaults to 0.995.

For each lesson in lesson_range:
  • Load documents and objectives.

  • Summarize readings using the LLM.

  • Extract relationships between concepts and generate a unique concept list.

build_concept_map(directed: bool = False, concept_similarity_threshold: float = 0.995, dark_mode: bool = True, lesson_objectives: Dict[str, str] | None = None) None[source]

Run the full pipeline to generate a concept map and visualization.

Parameters:
  • directed (bool, optional) – Whether to create a directed concept map. Defaults to False.

  • concept_similarity_threshold (float, optional) – Threshold for concept similarity. Defaults to 0.995.

  • dark_mode (bool, optional) – Use dark mode for visualization. Defaults to True.

  • lesson_objectives (Optional[Dict[str, str]], optional) – User-provided lesson objectives. Defaults to None.

class_factory.concept_web.build_concept_map module

Build and analyze concept maps from relationship data.

This module provides functionality to create, analyze and visualize concept maps based on relationships between concepts extracted from educational content.

Functions:

  • build_graph: Create a weighted graph from concept relationships.

  • detect_communities: Identify concept clusters using various community detection algorithms.

The module supports both directed and undirected graphs, with features including:

  • Edge weight normalization

  • Node centrality calculation

  • Community detection using multiple algorithms (leiden, louvain, spectral)

  • Visualization preparation with node sizes and community labels

class_factory.concept_web.build_concept_map.build_graph(processed_relationships: List[Tuple[str, str, str]], directed: bool = False) Graph | DiGraph[source]

Build a weighted (directed or undirected) graph from processed concept relationships.

Parameters:
  • processed_relationships (List[Tuple[str, str, str]]) – List of (concept1, relationship, concept2) tuples.

  • directed (bool, optional) – If True, creates a directed graph. Defaults to False.

Returns:

Graph with node and edge attributes for visualization and analysis.

Return type:

nx.Graph | nx.DiGraph

Raises:

ValueError – If relationships are not correctly formatted.
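The idea of turning relationship tuples into weighted edges can be sketched with the standard library alone. This is a minimal illustration, not the module's implementation: the helper name build_edge_weights, the use of co-occurrence counts as weights, and the scaling by the maximum count are all assumptions made for the example.

```python
from collections import Counter
from typing import Dict, List, Tuple

Relationship = Tuple[str, str, str]


def build_edge_weights(
    relationships: List[Relationship], directed: bool = False
) -> Dict[Tuple[str, str], float]:
    """Count how often each concept pair appears and scale counts to (0, 1].

    Hypothetical helper: weighting by frequency is an assumption.
    """
    counts: Counter = Counter()
    for item in relationships:
        # Mirror the documented ValueError for badly formatted input.
        if len(item) != 3 or not all(isinstance(part, str) for part in item):
            raise ValueError(
                f"Expected (concept1, relationship, concept2), got {item!r}"
            )
        source, _relationship, target = item
        # In an undirected map, (a, b) and (b, a) are the same edge.
        key = (source, target) if directed else tuple(sorted((source, target)))
        counts[key] += 1
    if not counts:
        return {}
    max_count = max(counts.values())
    return {edge: count / max_count for edge, count in counts.items()}


relationships = [
    ("democracy", "relies on", "voting rights"),
    ("voting rights", "enable", "democracy"),
    ("democracy", "requires", "rule of law"),
]
weights = build_edge_weights(relationships)
```

In the undirected case the first two tuples collapse into a single edge with the highest weight, while directed=True keeps them as separate edges.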

class_factory.concept_web.build_concept_map.detect_communities(G: Graph | DiGraph, method: str = 'leiden', num_clusters: int | None = None) Graph | DiGraph[source]

Detect communities in a concept graph using the specified algorithm.

Parameters:
  • G (nx.Graph | nx.DiGraph) – The input graph.

  • method (str, optional) – Community detection algorithm (‘leiden’, ‘louvain’, ‘spectral’). Defaults to ‘leiden’.

  • num_clusters (int | None, optional) – Number of clusters for spectral clustering. Defaults to None.

Returns:

Graph with ‘community’ node attributes.

Return type:

nx.Graph | nx.DiGraph

Raises:

ValueError – If the specified method is not recognized.

class_factory.concept_web.concept_extraction module

concept_extraction.py

Functions to extract, normalize, and process concept relationships from educational content (lesson readings and objectives).

Features:

  • Summarizes lesson readings using LLMs and course-specific prompts.

  • Extracts key concepts and their relationships, guided by lesson objectives.

  • Normalizes and consolidates concept names using embeddings and inflection.

  • Prepares structured data for downstream visualization and analysis.

Dependencies:

  • Language Models: OpenAI GPT or similar, DistilBERT (transformers)

  • Core Libraries: langchain, torch, inflect

Example

from class_factory.concept_web.concept_extraction import extract_relationships, process_relationships

# llm is a previously configured language model instance
text = "Democracy relies on voting rights…"
objectives = "Understand principles of democracy"
relationships = extract_relationships(text, objectives, "Political Science", llm)
processed = process_relationships(relationships)

This module is part of the ClassFactory concept mapping pipeline.

class_factory.concept_web.concept_extraction.summarize_text(text: str, prompt: ChatPromptTemplate, course_name: str, llm: Any, parser: StrOutputParser = StrOutputParser(), verbose: bool = False) str[source]

Summarize the provided text using a language model and a structured prompt.

Parameters:
  • text (str) – The text to be summarized.

  • prompt (ChatPromptTemplate) – Prompt template for summarization.

  • course_name (str) – Name of the course for context.

  • llm (Any) – Language model instance.

  • parser (StrOutputParser, optional) – Output parser. Defaults to StrOutputParser().

  • verbose (bool, optional) – Enable detailed logging. Defaults to False.

Returns:

The summary generated by the language model.

Return type:

str

Raises:

ValueError – If validation fails after max retries.

class_factory.concept_web.concept_extraction.extract_relationships(text: str, objectives: str, course_name: str, llm: Any, verbose: bool = False, logger: Logger | None = None) List[Tuple[str, str, str]][source]

Extract key concepts and their relationships from the provided text using an LLM.

Parameters:
  • text (str) – The summarized text.

  • objectives (str) – Lesson objectives for context.

  • course_name (str) – Name of the course.

  • llm (Any) – Language model instance.

  • verbose (bool, optional) – Enable detailed logging. Defaults to False.

  • logger (Optional[logging.Logger], optional) – Logger instance. Defaults to None.

Returns:

List of (concept1, relationship, concept2) tuples.

Return type:

List[Tuple[str, str, str]]

Raises:
  • ValueError – If validation fails after max retries.

  • JSONDecodeError – If response parsing fails.
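The two documented error cases can be illustrated with a small parsing sketch. The response format below (a JSON object with a "relationships" key holding three-item lists) and the helper name parse_relationships are assumptions for illustration only; the actual prompt and response schema are not described here.

```python
import json
from typing import List, Tuple


def parse_relationships(response_text: str) -> List[Tuple[str, str, str]]:
    """Parse a JSON reply into (concept1, relationship, concept2) tuples.

    Hypothetical helper: the expected JSON layout is an assumption.
    """
    # json.loads raises json.JSONDecodeError on malformed input.
    payload = json.loads(response_text)
    if not isinstance(payload, dict):
        raise ValueError("Expected a JSON object at the top level")
    triples = []
    for entry in payload.get("relationships", []):
        if len(entry) != 3:
            raise ValueError(
                f"Expected three items per relationship, got {entry!r}"
            )
        concept1, relationship, concept2 = (str(part).strip() for part in entry)
        triples.append((concept1, relationship, concept2))
    return triples


reply = '{"relationships": [["democracy", "relies on", "voting rights"]]}'
parsed = parse_relationships(reply)
```

Separating decoding failures from shape-validation failures lets a caller retry on either, which fits the "max retries" wording above.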

class_factory.concept_web.concept_extraction.extract_concepts_from_relationships(relationships: List[Tuple[str, str, str]]) List[str][source]

Extract unique concept names from a list of relationships.

Parameters:

relationships (List[Tuple[str, str, str]]) – List of relationship tuples or dicts.

Returns:

Unique concept names.

Return type:

List[str]
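A minimal sketch of collecting unique concept names from relationship tuples follows. The helper name unique_concepts is hypothetical, it handles tuples only, and preserving first-appearance order is an assumption; the real function may order its results differently.

```python
from typing import List, Tuple


def unique_concepts(relationships: List[Tuple[str, str, str]]) -> List[str]:
    """Return each concept once, in order of first appearance."""
    seen = {}
    for concept1, _relationship, concept2 in relationships:
        # dict keys keep insertion order, so this deduplicates stably.
        seen.setdefault(concept1, None)
        seen.setdefault(concept2, None)
    return list(seen)


concepts = unique_concepts(
    [
        ("democracy", "relies on", "voting rights"),
        ("democracy", "requires", "rule of law"),
    ]
)
```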

class_factory.concept_web.concept_extraction.get_embeddings(concepts: List[str]) Dict[str, Tensor][source]

Generate normalized embeddings for a list of concepts using DistilBERT.

Parameters:

concepts (List[str]) – List of concept strings.

Returns:

Mapping of concept to normalized embedding tensor.

Return type:

Dict[str, torch.Tensor]

class_factory.concept_web.concept_extraction.normalize_concept(concept: str) str[source]

Normalize a single concept string (lowercase, singularize, strip underscores).

Parameters:

concept (str) – Concept string.

Returns:

Normalized concept string.

Return type:

str

class_factory.concept_web.concept_extraction.normalize_for_embedding(concepts: str | List[str]) str | List[str][source]

Normalize one or more concepts for embedding.

Parameters:

concepts (Union[str, List[str]]) – Concept or list of concepts.

Returns:

Normalized concept(s).

Return type:

Union[str, List[str]]

class_factory.concept_web.concept_extraction.normalize_for_output(concept: str) str[source]

Format a concept for output by replacing spaces with underscores and removing ‘is’.

Parameters:

concept (str) – Concept string.

Returns:

Output-formatted concept string.

Return type:

str

class_factory.concept_web.concept_extraction.replace_similar_concepts(existing_concepts: Set[str], new_concept: str, concept_embeddings: Dict[str, Tensor], threshold: float = 0.995) str[source]

Replace a new concept with an existing similar concept if cosine similarity exceeds threshold.

Parameters:
  • existing_concepts (Set[str]) – Set of existing concepts.

  • new_concept (str) – New concept string.

  • concept_embeddings (Dict[str, torch.Tensor]) – Embeddings for all concepts.

  • threshold (float, optional) – Similarity threshold. Defaults to 0.995.

Returns:

Existing or new concept string.

Return type:

str
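The threshold comparison can be sketched with plain Python lists standing in for torch tensors. The helper names, the toy two-dimensional embeddings, and the choice to return the single best match are assumptions for illustration, not the library's code.

```python
import math
from typing import Dict, List, Set


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pick_canonical(
    existing: Set[str],
    new_concept: str,
    embeddings: Dict[str, List[float]],
    threshold: float = 0.995,
) -> str:
    """Return the most similar existing concept above threshold, else new_concept."""
    best_match, best_score = new_concept, threshold
    for candidate in sorted(existing):
        score = cosine_similarity(embeddings[candidate], embeddings[new_concept])
        if score > best_score:
            best_match, best_score = candidate, score
    return best_match


# Toy embeddings, invented for the example.
embeddings = {
    "voting right": [1.0, 0.0],
    "voting rights": [0.999, 0.01],
    "rule of law": [0.0, 1.0],
}
merged = pick_canonical({"voting right"}, "voting rights", embeddings)
kept = pick_canonical({"voting right"}, "rule of law", embeddings)
```

With a threshold as high as 0.995, only near-identical embeddings merge, so distinct concepts such as "rule of law" are kept as new entries.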

class_factory.concept_web.concept_extraction.process_relationships(relationships: List[Tuple[str, str, str]], threshold: float = 0.995, max_retries: int = 3) List[Tuple[str, str, str]][source]

Normalize and consolidate relationships by merging similar concepts.

Parameters:
  • relationships (List[Tuple[str, str, str]]) – List of (concept1, relationship, concept2) tuples.

  • threshold (float, optional) – Similarity threshold for merging. Defaults to 0.995.

  • max_retries (int, optional) – Max attempts to resolve duplicates. Defaults to 3.

Returns:

Processed relationships with normalized concepts.

Return type:

List[Tuple[str, str, str]]

class_factory.concept_web.prompts module

Prompts for LLM summarization and relationship extraction.

class_factory.concept_web.visualize_graph module

This module provides functions to visualize a concept map generated from processed relationships between concepts. It includes functionalities to create interactive graph visualizations and generate word clouds representing the concepts.

The primary functionalities include:

  1. Interactive Graph Visualization: Converts a NetworkX graph into an interactive HTML visualization using pyvis. The graph can be manipulated dynamically in a web browser, allowing for physics simulations, node filtering, and clustering.

  2. Word Cloud Generation: Creates a word cloud image from a list of concepts, visually representing the frequency of each concept.

Main Functions:

  • visualize_graph_interactive(G: nx.Graph, output_path: Union[Path, str]) -> None: Visualizes the given graph interactively using pyvis and saves the result as an HTML file. The nodes are colored based on their community, and the visualization allows for interactive exploration of the graph.

Workflow:

  1. Graph Conversion: Converts the provided NetworkX graph into a pyvis graph, applying styles and attributes like node size and edge width based on centrality and relationship frequency.

  2. Interactive Visualization: Saves the interactive graph as an HTML file, which can be explored in any web browser.

Dependencies:

  • NetworkX: For graph data structure and manipulation.

  • Matplotlib: For color mapping and displaying the word cloud.

  • Pyvis: For creating interactive graph visualizations in HTML.

class_factory.concept_web.visualize_graph.visualize_graph_interactive(G: Graph, output_path: Path | str, directed: bool = False, dark_mode: bool = True, max_nodes: int = 250, centrality_method: str = 'degree', expand_neighbors: bool = True) None[source]

Create an interactive HTML visualization of a concept map using pyvis.

Parameters:
  • G (nx.Graph) – The graph to visualize (with community and text_size attributes).

  • output_path (Union[Path, str]) – Path to save the HTML file.

  • directed (bool, optional) – If True, show edge arrows. Defaults to False.

  • dark_mode (bool, optional) – Use dark background. Defaults to True.

class_factory.concept_web.visualize_graph.filter_graph_by_centrality(G: Graph, max_nodes: int = 250, method: str = 'pagerank', expand_neighbors: bool = True) Graph[source]

Return an induced subgraph containing up to max_nodes most-central nodes.

Parameters:
  • G (nx.Graph) – Original NetworkX graph.

  • max_nodes (int, optional) – Desired maximum number of nodes in the returned graph. Defaults to 250.

  • method (str, optional) – Centrality method used to rank nodes (‘pagerank’, ‘degree’, ‘betweenness’). Defaults to ‘pagerank’.

  • expand_neighbors (bool, optional) – If True, after selecting the top central nodes, try to include their 1-hop neighbors to preserve local context until max_nodes is reached. Defaults to True.

Returns:

A copy of the induced subgraph with selected nodes.
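The selection strategy can be sketched on a plain adjacency dictionary, using degree as the centrality measure. This is an illustration under stated assumptions, not the function's implementation: the helper name top_nodes_by_degree is hypothetical, and seeding with half of the node budget before expanding to neighbors is an invented heuristic.

```python
from typing import Dict, Set


def top_nodes_by_degree(
    adjacency: Dict[str, Set[str]], max_nodes: int, expand_neighbors: bool = True
) -> Dict[str, Set[str]]:
    """Keep the highest-degree nodes, optionally padded with their neighbors."""
    # Rank by degree (descending), breaking ties alphabetically.
    ranked = sorted(adjacency, key=lambda node: (-len(adjacency[node]), node))
    # Assumption: with expansion, seed with half the budget to leave room.
    seed_count = max(1, max_nodes // 2) if expand_neighbors else max_nodes
    selected = list(ranked[:seed_count])
    if expand_neighbors:
        for node in list(selected):
            for neighbor in sorted(adjacency[node]):
                if len(selected) >= max_nodes:
                    break
                if neighbor not in selected:
                    selected.append(neighbor)
    keep = set(selected[:max_nodes])
    # Induced subgraph: keep only edges whose endpoints both survive.
    return {node: adjacency[node] & keep for node in keep}


adjacency = {
    "democracy": {"voting rights", "rule of law", "elections"},
    "voting rights": {"democracy"},
    "rule of law": {"democracy", "courts"},
    "elections": {"democracy"},
    "courts": {"rule of law"},
}
subgraph = top_nodes_by_degree(adjacency, max_nodes=3)
```

Expanding to neighbors favors keeping a central node's immediate context over adding a second, possibly disconnected, high-ranking node.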

Module contents