github.com/gemaraproj/gemara@v1.3.0

test/test-data/good-vector-owasp-mapping.yaml

# AIGF Risk Vectors to OWASP LLM Top 10 Mapping Document
title: AIGF Risk Vectors to OWASP Top 10 for LLMs 2025
metadata:
  id: AIR-OWASP-MAP-001
  version: "0.1.0"
  type: MappingDocument
  gemara-version: "1.1.0"
  description: >
    Maps AIGF risk vectors to OWASP Top 10 for LLM Applications 2025
    entries where a semantic relationship exists. Vectors without a
    direct OWASP counterpart are recorded as no-match.
  author:
    id: finos
    name: FINOS
    type: Human
  mapping-references:
    - id: AIR-VEC
      title: AI Governance Framework Risk Vectors
      version: "0.1.0"
      url: "https://aigf.finos.org/risks"
    - id: OWASP-LLM-2025
      title: OWASP Top 10 for LLM Applications 2025
      version: "2025"
      url: "https://genai.owasp.org/llm-top-10/"

source-reference:
  reference-id: AIR-VEC
  entry-type: Vector
target-reference:
  reference-id: OWASP-LLM-2025
  entry-type: Vector
remarks: >
  AIGF risk vectors mapped to OWASP Top 10 for LLM Applications 2025.
  Mappings derived from OWASP references in AIGF risk frontmatter.

mappings:
  # Information Leakage vectors → LLM02 Sensitive Information Disclosure
  - id: MAP-RC001-01-LLM02
    source: AIR-RC-001-01
    relationship: relates-to
    targets:
      - entry-id: "LLM02:2025"
        rationale: >
          Model memorization of sensitive data from training or user
          interactions directly contributes to sensitive information
          disclosure.

  - id: MAP-RC001-02-LLM02
    source: AIR-RC-001-02
    relationship: relates-to
    targets:
      - entry-id: "LLM02:2025"
        rationale: >
          Prompt-based extraction techniques target memorized sensitive
          information, a primary mechanism for LLM information disclosure.

  - id: MAP-RC001-03-LLM02
    source: AIR-RC-001-03
    relationship: relates-to
    targets:
      - entry-id: "LLM02:2025"
        rationale: >
          Inadequate provider data controls increase the likelihood
          of sensitive information disclosure through hosted models.

  - id: MAP-RC001-04-LLM02
    source: AIR-RC-001-04
    relationship: relates-to
    targets:
      - entry-id: "LLM02:2025"
        rationale: >
          Deficient provider data handling practices around retention,
          encryption, and deletion expose sensitive information.

  - id: MAP-RC001-05-LLM02
    source: AIR-RC-001-05
    relationship: relates-to
    targets:
      - entry-id: "LLM02:2025"
        rationale: >
          Fine-tuning with proprietary data embeds sensitive information
          in model weights, creating persistent disclosure risk.

  # Data Poisoning vectors → LLM04 Data and Model Poisoning
  - id: MAP-SEC009-01-LLM04
    source: AIR-SEC-009-01
    relationship: relates-to
    targets:
      - entry-id: "LLM04:2025"
        rationale: >
          Training data manipulation through label changes or crafted
          data points is a direct form of data and model poisoning.

  - id: MAP-SEC009-02-LLM04
    source: AIR-SEC-009-02
    relationship: relates-to
    targets:
      - entry-id: "LLM04:2025"
        rationale: >
          Exploiting continuous learning pipelines to feed misleading
          information is an ongoing form of data poisoning.

  - id: MAP-SEC009-03-supplychain
    source: AIR-SEC-009-03
    relationship: relates-to
    targets:
      - entry-id: "LLM03:2025"
        rationale: >
          Compromise of third-party data feeds represents a supply chain
          vulnerability that introduces poisoned data into AI systems.
      - entry-id: "LLM04:2025"
        rationale: >
          Compromise of third-party data feeds represents a supply chain
          vulnerability that introduces poisoned data into AI systems.

  - id: MAP-SEC009-04-poisoning
    source: AIR-SEC-009-04
    relationship: relates-to
    targets:
      - entry-id: "LLM04:2025"
        rationale: >
          Deliberate bias introduction through data poisoning corrupts
          model decision-making and produces discriminatory outputs.
      - entry-id: "LLM05:2025"
        rationale: >
          Deliberate bias introduction through data poisoning corrupts
          model decision-making and produces discriminatory outputs.

  # Model Availability vectors → LLM10 Unbounded Consumption
  - id: MAP-OP007-01-LLM10
    source: AIR-OP-007-01
    relationship: relates-to
    targets:
      - entry-id: "LLM10:2025"
        rationale: >
          Denial of Wallet attacks exploit unbounded consumption through
          excessive token usage, long prompts, or poorly throttled
          agentic systems.

  - id: MAP-OP007-02-NOMATCH
    source: AIR-OP-007-02
    relationship: no-match
    remarks: >
      TSP outage or degradation is an infrastructure availability
      risk with no direct OWASP LLM Top 10 counterpart; it concerns
      provider operational maturity rather than LLM-specific
      vulnerabilities.

  - id: MAP-OP007-03-LLM10
    source: AIR-OP-007-03
    relationship: relates-to
    targets:
      - entry-id: "LLM10:2025"
        rationale: >
          VRAM exhaustion from configuration changes, caching, or memory
          leaks is a resource exhaustion condition aligned with unbounded
          consumption.

  # Prompt Injection vectors → LLM01 Prompt Injection
  - id: MAP-SEC010-01-LLM01
    source: AIR-SEC-010-01
    relationship: relates-to
    targets:
      - entry-id: "LLM01:2025"
        rationale: >
          Direct prompt injection (jailbreaking) is the primary attack
          pattern described in LLM01.

  - id: MAP-SEC010-02-injection
    source: AIR-SEC-010-02
    relationship: relates-to
    targets:
      - entry-id: "LLM01:2025"
        rationale: >
          Indirect prompt injection via poisoned third-party content
          is covered in LLM01 and can hijack multi-agent decision-making
          aligning with LLM06 excessive agency risks.
      - entry-id: "LLM06:2025"
        rationale: >
          Indirect prompt injection via poisoned third-party content
          is covered in LLM01 and can hijack multi-agent decision-making
          aligning with LLM06 excessive agency risks.

  - id: MAP-SEC010-03-probing
    source: AIR-SEC-010-03
    relationship: relates-to
    targets:
      - entry-id: "LLM01:2025"
        rationale: >
          Model profiling and inversion use prompt injection techniques
          to probe internal model structure and extract proprietary
          system prompts and configurations.
      - entry-id: "LLM07:2025"
        rationale: >
          Model profiling and inversion use prompt injection techniques
          to probe internal model structure and extract proprietary
          system prompts and configurations.

  # Model Overreach → LLM06 Excessive Agency
  - id: MAP-OP018-LLM06
    source: AIR-OP-018
    relationship: relates-to
    targets:
      - entry-id: "LLM06:2025"
        rationale: >
          Model overreach and expanded use beyond validated scope aligns
          with excessive agency where AI systems operate beyond intended
          boundaries.

  # Reputational Risk → LLM09 Misinformation
  - id: MAP-OP020-LLM09
    source: AIR-OP-020
    relationship: relates-to
    targets:
      - entry-id: "LLM09:2025"
        rationale: >
          AI-generated offensive, misleading, or inaccurate outputs that
          damage reputation are a manifestation of LLM misinformation
          risks.