Glossary - KI:connect / RWTHgpt
Affiliation
A standardized Shibboleth attribute following the eduPerson schema that describes the type of affiliation a person has with an institution (e.g., student or staff; in scoped form, student@uni-example.de). Used for role-based access control in federated infrastructure.
Agentic AI / AI Agent
An AI system that autonomously plans and executes multi-step tasks by invoking tools, evaluating intermediate results, and iterating toward a goal. Unlike simple function calling, which executes a single tool invocation per request, an agent chains many such steps without human intervention.
Application Programming Interface (API)
A standardized programming interface that defines rules and protocols enabling software applications to communicate with each other. AI services typically provide a REST-based API through which models can be accessed programmatically.
API Call
A single HTTP request sent to an API endpoint, containing the model name, conversation history, and optional parameters (e.g., Temperature, Top P), receiving a completion as a response. An API call is the atomic unit of interaction with an AI API and forms the basis for billing and monitoring.
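A minimal sketch of such a request body, in the style of OpenAI-compatible chat APIs (field and model names here are assumptions and vary by provider; no network call is made):

```python
import json

# Sketch of a typical chat-completions request body. An HTTP client would
# POST this to the provider's endpoint with the API key in an
# Authorization header; the response contains the completion.
payload = {
    "model": "example-model",  # assumed model name
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize this text ..."},
    ],
    "temperature": 0.2,  # optional inference parameter
    "top_p": 1.0,        # optional inference parameter
}

body = json.dumps(payload)
```

Both the messages (input tokens) and the returned completion (output tokens) count toward billing for this single call.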
API Endpoint
The URL of an AI service provider through which a model is accessed via API. Endpoints are typically secured with an API key and define which models and functions are accessible.
API Key
A secret access credential passed for authentication against an API. The API key identifies the caller, enables cost attribution, and controls access rights. It should never be stored in publicly accessible code.
Authentication and Authorisation Infrastructure (AAI / DFN-AAI)
A federated identity management system for research and education institutions. In Germany, the AAI is operated by the DFN-Verein and enables cross-institutional Single Sign-On based on the SAML protocol.
Cached Prompt / Prompt Caching
A mechanism in which frequently reused inputs (e.g., system prompts) are pre-processed and cached. Cached prompts are billed at a lower rate than regular input tokens.
Chatbot
A computer-based dialogue system that conducts natural-language conversations with users. Modern chatbots are based on large language models (LLMs) and can respond contextually to questions and instructions.
Chunk / Chunking
A method of splitting documents into smaller text segments (chunks) for indexing in a vector database. Chunk size affects retrieval quality: smaller chunks increase precision, while larger ones provide more context. This decision is made at indexing time and cannot be changed retroactively without re-indexing the documents.
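A minimal sketch of fixed-size chunking with overlap; production systems usually split on sentence or paragraph boundaries and measure size in tokens rather than characters:

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks with overlap.

    Overlap repeats the tail of one chunk at the head of the next so that
    information straddling a boundary is still retrievable.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
    return chunks

chunks = chunk_text("x" * 500, size=200, overlap=50)
```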
Completion
The response generated by an AI model in reply to a prompt. The term originates from common LLM API terminology and describes the model's process of "completing" a given input.
Context Window
The maximum amount of text (measured in tokens) that a model can consider in a single processing step. The context window encompasses the system prompt, all previous conversation turns, and the current input. If the limit is exceeded, older parts of the conversation may no longer be taken into account.
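The truncation behavior described above can be sketched as a trimming step that drops the oldest turns until the history fits the budget. Real applications use an exact tokenizer; here a counting function is injected so the logic stays self-contained:

```python
def trim_history(messages, max_tokens, count_tokens):
    """Drop the oldest non-system turns until the history fits the budget."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    total = lambda msgs: sum(count_tokens(m["content"]) for m in msgs)
    while turns and total(system + turns) > max_tokens:
        turns.pop(0)  # oldest turn is no longer taken into account
    return system + turns

# Toy counter: one token per word (a rough stand-in, not a real tokenizer).
count = lambda s: len(s.split())
history = [
    {"role": "system", "content": "be brief"},
    {"role": "user", "content": "one two three four"},
    {"role": "assistant", "content": "five six"},
    {"role": "user", "content": "seven"},
]
trimmed = trim_history(history, max_tokens=6, count_tokens=count)
```

Note that the system prompt is kept: dropping it would change the model's configured behavior mid-conversation.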
Data Processing Region (Region)
The geographic area in which an AI model is operated and input data is processed. Important: With some providers, the data storage location (data residency) may differ from the actual processing location – data may, for example, be stored in Europe but processed globally.
Deployer (EU AI Act)
A natural or legal person that deploys an AI system under its own authority (Art. 3(4) EU AI Act). In the university context, the respective university is typically the deployer, even if the technical infrastructure is operated by a central service provider. The deployer bears transparency and information obligations toward users.
Digital Sovereignty
The ability of an organization to independently exercise control over its digital infrastructure, data, and processes. In the university context, digital sovereignty encompasses three areas of action: AI competence, self-operated infrastructure for inference and training, and self-controlled applications.
eduPersonEntitlement
A Shibboleth attribute that describes a person's rights or roles as a URN-encoded string (e.g., urn:geant:dfn.de:...). Enables fine-grained access control independent of institutional boundaries.
Embedding
A numerical vector representation of text or other content that encodes its semantic meaning. Texts with similar meaning are located close to each other in vector space. Embeddings form the basis for similarity search (e.g., RAG) and classification tasks.
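The "close in vector space" notion is usually measured with cosine similarity. A sketch with toy 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical embeddings: semantically similar texts get similar vectors.
cat = [0.9, 0.1, 0.0]
kitten = [0.85, 0.2, 0.05]
invoice = [0.0, 0.1, 0.95]

sim_close = cosine_similarity(cat, kitten)
sim_far = cosine_similarity(cat, invoice)
```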
Fine-Tuning
Continued training of a pre-trained AI model on a task-specific dataset to specialize it for a particular domain or task. Fine-tuning permanently modifies the model weights.
Frequency Penalty
An inference parameter that reduces the probability of tokens in proportion to how often they have already appeared in the generated output. Higher values promote more varied phrasing in the output.
Function Calling
The ability of certain AI models to invoke external functions or tools (e.g., web search, code execution) in a structured manner. The model produces a machine-readable function call rather than free text, which is then processed by the calling application.
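The round trip can be sketched as follows: the model returns a structured call (field names vary by provider and are assumed here), and the calling application dispatches it to a hypothetical tool implementation:

```python
import json

# What a function-calling model might return instead of free text.
model_output = json.dumps({
    "name": "get_weather",
    "arguments": {"city": "Aachen"},
})

# Hypothetical tool the calling application provides.
def get_weather(city: str) -> str:
    return f"Weather lookup for {city} would happen here."

TOOLS = {"get_weather": get_weather}

# The application, not the model, executes the function and would then
# feed the result back into the conversation.
call = json.loads(model_output)
result = TOOLS[call["name"]](**call["arguments"])
```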
Generative AI Model
An AI system that generates new content, such as text, images, audio, or code, from inputs. It learns statistical patterns from training data and produces plausible new outputs rather than retrieving stored answers.
GPT (Generative Pre-trained Transformer)
A class of AI language models based on the transformer architecture, pre-trained on large text corpora. GPT models learn statistical patterns of natural language through unsupervised pre-training and can be used for many tasks without task-specific fine-tuning. GPT denotes a model architecture and is not synonymous with AI chatbots in general.
Grounding
A technique for anchoring AI outputs in verifiable sources, e.g., through RAG or web search. Grounding reduces hallucinations by having the model base its responses on concrete, verifiable information.
Hallucination
A phenomenon in which an AI model outputs factually incorrect or fabricated information in a seemingly credible manner. Hallucinations arise because the model generates statistically plausible continuations rather than retrieving verified facts.
Identity Provider (IdP)
A service that manages user identities and confirms them to other services (service providers). In the university context, the IdP is typically the identity management system of the respective institution.
Image-Generative Model
An AI model that generates images from textual input (prompts). Image-generative models typically rely on diffusion-based processes or transformer architectures.
Inference
The process by which a trained AI model processes an input and generates an output. Unlike training—where the model learns—inference applies previously learned knowledge. GPU resources for inference are a central planning criterion for AI platforms.
Input Tokens
Tokens derived from the user's prompt, the system prompt, and the entire conversation history so far. The number of input tokens substantially determines the resource consumption and cost of a request.
Knowledge Base
A structured or unstructured collection of documents, data, or information provided to an AI model as additional context via RAG. Knowledge bases enable domain-specific responses without model retraining.
Large Language Model (LLM)
An AI model trained on large quantities of text data to enable human-like language understanding and generation. LLMs can perform tasks such as text comprehension, text generation, translation, and code synthesis.
MCP (Model Context Protocol)
An open standard for connecting external tools, data sources, and services to AI models in a standardized way. MCP defines a uniform protocol through which applications (e.g., IDEs, CLI tools) can provide context information and tools to AI models.
Multimodality / Multimodal Model
The ability of an AI model to jointly process different input data types (e.g., text and images). Multimodal models can, for example, describe images, analyze documents, or jointly evaluate combined text and image inputs.
Open Source AI Model
An AI model whose weights (and optionally training code and training data) are made publicly available. Open-source models can be operated on own hardware, supporting digital sovereignty and data privacy compliance. Examples: Llama, Mixtral. Note: Many models labeled “open source” are strictly speaking open-weight only (weights available, but restrictive license).
Output
The response generated by an AI model based on a prompt. Depending on the model type, the output may comprise text, images, audio, or structured data.
Output Tokens
Tokens generated by the model as a response (completion). Output tokens are typically billed at a higher rate than input tokens, as their generation is more computationally intensive.
Presence Penalty
An inference parameter that reduces the probability that the model will repeat topics or concepts already present in the output. Unlike Frequency Penalty, this parameter responds to the mere presence of a topic rather than the repetition frequency of individual tokens.
Prompt / Input
The text input submitted by a user or system to an AI model in order to obtain a response. A prompt may contain questions, instructions, examples, or contextual information and substantially influences the quality of the generated output.
Prompt Engineering
The systematic process of designing and optimizing prompts in order to guide an AI model toward producing precise and useful outputs. Techniques include few-shot examples, role instructions, and chain-of-thought prompting, among others.
Reasoning / Reasoning Model
AI models that perform an internal thinking and planning step ("chain of thought") before generating a response. Reasoning models are particularly well suited for complex, multi-step tasks and consume additional so-called reasoning tokens, which may be billed separately.
Reasoning Tokens
Additional tokens consumed internally by reasoning models during their thinking and planning step before generating a response. Reasoning tokens are not visible to end users but are taken into account for billing purposes.
Retrieval-Augmented Generation (RAG)
An architectural principle in which an AI model retrieves relevant information from a knowledge base (retrieval) before generating a response and incorporates it into the context. RAG enriches the model with current or domain-specific information without requiring retraining. RAG prompts are typically significantly longer than standard prompts.
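The retrieve-then-generate step can be sketched with a toy knowledge base of precomputed vectors (a real system would use an embedding model and a vector database such as those named under Vector Database):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy knowledge base: (chunk text, hypothetical embedding).
knowledge_base = [
    ("Enrollment deadline is 15 March.", [0.9, 0.1]),
    ("The cafeteria opens at 11:30.", [0.1, 0.9]),
]

def rag_prompt(question: str, query_vec, k: int = 1) -> str:
    # Retrieval: rank chunks by similarity to the query embedding.
    ranked = sorted(knowledge_base,
                    key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    context = "\n".join(text for text, _ in ranked[:k])
    # Augmentation: prepend the retrieved chunks, which is why RAG
    # prompts are typically much longer than standard prompts.
    return f"Context:\n{context}\n\nQuestion: {question}"

prompt = rag_prompt("When is the enrollment deadline?", [0.95, 0.05])
```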
Shibboleth
An open-source framework for federated identity management based on the SAML protocol. It enables cross-institutional Single Sign-On and is widely deployed across the German higher education landscape.
Single Sign-On (SSO)
An authentication method that allows users to log in once with their home institution credentials to gain access to multiple services without having to authenticate separately for each service.
Streaming
A transmission method in which the response of an AI model is sent to the client token by token in real time, rather than waiting for the complete response. Streaming reduces perceived latency and improves the user experience for longer responses.
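A sketch of the consumer side: real APIs deliver tokens as server-sent events over HTTP, but a generator can stand in for the network stream to show why perceived latency drops:

```python
def stream_completion(text: str):
    """Yield a response piece by piece, as a streaming API would."""
    for token in text.split():
        yield token + " "  # each piece arrives before generation finishes

received = []
for piece in stream_completion("Streaming reduces perceived latency."):
    received.append(piece)  # a UI would render each piece immediately

full = "".join(received).strip()
```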
System Prompt (System Message)
A predefined instruction passed to an AI model at the start of each conversation, typically not visible to end users. The system prompt defines the model's behavior, tone, and scope, enabling application-specific customization without model training.
Temperature
An inference parameter that controls the randomness or creativity of the model's output. Low values (e.g., 0) produce more deterministic, focused responses; higher values (e.g., 1–2) promote more varied and creative outputs. Appropriate values depend on the model and use case.
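Under the hood, temperature rescales the model's raw scores (logits) before they are turned into a probability distribution. A sketch of that step (the value 0 is special-cased by real APIs as greedy selection, since division by zero is undefined):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits into probabilities.

    Lower temperature sharpens the distribution (more deterministic);
    higher temperature flattens it (more varied output).
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
cold = softmax_with_temperature(logits, temperature=0.2)
hot = softmax_with_temperature(logits, temperature=2.0)
```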
Tenant
In a multi-tenant architecture, a logically separated organizational unit (e.g., a university) that uses the same service but maintains its own configuration, user data, and billing units. Multi-tenancy enables shared platform operation for multiple institutions with strict data separation.
Token
The smallest processing unit of a language model. Tokens correspond to words, word fragments, or punctuation marks. The token count of a request determines the resource and cost requirements of processing. A rough rule of thumb: 1,000 tokens correspond to approximately 750 words in English (or roughly 500–600 words in German due to longer compound words).
Token Limit
A collective term for model-specific token limits. The context window limit caps input and output combined; the output limit (max_tokens) caps only the generated response. Requests exceeding a limit are either rejected or truncated.
Top P (TopP)
An inference parameter, also called "nucleus sampling", that controls output variability. The model considers only the most probable tokens during selection until their cumulative probability reaches the Top P value. Typically used as an alternative to Temperature.
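The candidate-selection step of nucleus sampling can be sketched as follows (the actual random draw from the kept set is omitted; token probabilities here are invented):

```python
def nucleus(probabilities: dict[str, float], top_p: float) -> dict[str, float]:
    """Keep the most probable tokens until their cumulative probability
    reaches top_p, then renormalize the kept set."""
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}

probs = {"the": 0.5, "a": 0.3, "banana": 0.15, "xylophone": 0.05}
candidates = nucleus(probs, top_p=0.8)
```

With top_p = 0.8, the two most probable tokens already cover the cumulative threshold, so the low-probability tail is cut off entirely.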
Vector Database
A specialized database optimized for the efficient storage and retrieval of high-dimensional vectors (embeddings). Vector databases form the technical foundation for Retrieval-Augmented Generation (RAG) and semantic search. Examples: Qdrant, Milvus, Pinecone.
Vector Dimension (Dimensions)
The number of dimensions of an embedding vector. Higher dimensionality allows finer encoding of semantic nuances but increases memory and computational requirements. The number of dimensions is model-specific and must be identical at indexing time and at query time.

