This lesson covers one of the v2.1 BoK updates — a new emphasis on data governance and IP policy specifically for AI. The exam now explicitly tests your ability to evaluate and update data governance policies for AI requirements.
Traditional data governance focuses on accuracy, access control, retention, and compliance. AI introduces additional requirements:
Data provenance — Where did the training data come from? Can you trace its origin? This matters for legal compliance, bias detection, and regulatory audits.
Data lineage — How has the data been transformed from its source to the training dataset? Every transformation step (cleaning, augmentation, labeling) must be documented.
Representativeness — Does the training data adequately represent all groups the AI will affect? Unrepresentative data leads to biased models.
Purpose limitation — Was the data collected for a purpose compatible with AI training? Using customer data collected for service delivery to train an AI model may violate privacy regulations.
Data quality for AI — AI-specific quality dimensions include: label accuracy (for supervised learning), temporal relevance (is the data current?), and distributional alignment (does the training distribution match the deployment environment?).
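Distributional alignment can be checked with a simple statistical comparison. A minimal sketch (using hypothetical category data, not any real dataset) that measures the gap between training and deployment category frequencies via total variation distance:

```python
from collections import Counter

def distribution_gap(train_labels, deploy_labels):
    """Total variation distance between two categorical distributions.

    0.0 means identical distributions; 1.0 means completely disjoint.
    Values near 0 suggest good distributional alignment.
    """
    train = Counter(train_labels)
    deploy = Counter(deploy_labels)
    n_train = sum(train.values())
    n_deploy = sum(deploy.values())
    categories = set(train) | set(deploy)
    return 0.5 * sum(
        abs(train[c] / n_train - deploy[c] / n_deploy) for c in categories
    )

# Hypothetical example: training data skews urban, deployment does not
train = ["urban"] * 90 + ["rural"] * 10
deploy = ["urban"] * 60 + ["rural"] * 40
print(round(distribution_gap(train, deploy), 3))  # 0.3 — a notable gap
```

In practice you would run this kind of comparison per demographic group or feature, and treat a large gap as a trigger for the representativeness review described above.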
AI creates novel IP challenges that the AIGP exam tests from multiple angles:
Training data rights:
- Using copyrighted material to train AI models is legally contested (ongoing lawsuits brought by The New York Times, authors, and artists)
- Open-source data may have license restrictions on commercial use
- Web-scraped data may violate terms of service
- Personal data used for training requires lawful basis under privacy law
AI-generated content ownership:
- Who owns content generated by AI? The user who prompted it? The organization? The AI company?
- US Copyright Office position: purely AI-generated works are not copyrightable
- Works with significant human authorship that use AI as a tool may be copyrightable
- Organizations need clear policies on IP ownership of AI-assisted work
Trade secret protection:
- Employees inputting trade secrets into third-party AI tools may destroy trade secret status
- AI model weights and training data may themselves be trade secrets
- Reverse engineering risks: model outputs may reveal proprietary training data
The v2.1 BoK specifically requires you to evaluate and update existing data governance policies for AI. Here's a practical framework:
Step 1: Inventory — Identify all data sources used for AI training, validation, and operation.
Step 2: Rights assessment — For each data source, verify: Do we have the right to use this data for AI? Are there license restrictions or consent requirements?
Step 3: Quality assessment — Evaluate data quality against AI-specific dimensions (representativeness, label accuracy, temporal relevance).
Step 4: Policy gaps — Compare existing data governance policies against AI requirements. Common gaps include: no purpose limitation policy for AI training, no data provenance requirements, no synthetic data governance.
Step 5: Policy updates — Update policies to address identified gaps. Include AI-specific provisions in data classification, retention, access control, and quality assurance policies.
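The five-step framework can be sketched as a simple policy-gap review. The record fields and gap checks below are hypothetical illustrations, not an official AIGP checklist:

```python
from dataclasses import dataclass

@dataclass
class DataSource:
    # Step 1 (inventory): one entry per AI data source — hypothetical fields
    name: str
    use: str                        # "training", "validation", or "operation"
    licensed_for_ai: bool           # Step 2: rights assessment
    consent_obtained: bool
    provenance_documented: bool     # Step 3: AI-specific quality checks
    representativeness_reviewed: bool

def policy_gaps(source: DataSource) -> list[str]:
    """Step 4: compare one source against AI-specific policy requirements."""
    gaps = []
    if not source.licensed_for_ai:
        gaps.append("no confirmed right to use for AI training")
    if not source.consent_obtained:
        gaps.append("no lawful basis / consent recorded")
    if not source.provenance_documented:
        gaps.append("data provenance undocumented")
    if not source.representativeness_reviewed:
        gaps.append("representativeness not assessed")
    return gaps  # Step 5: each gap maps to a policy update

# Hypothetical source: customer data collected for service delivery
crm_export = DataSource("CRM export", "training",
                        licensed_for_ai=False, consent_obtained=True,
                        provenance_documented=True,
                        representativeness_reviewed=False)
print(policy_gaps(crm_export))
# ['no confirmed right to use for AI training', 'representativeness not assessed']
```

Running the review across the full inventory produces the gap list that Step 5's policy updates must close.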