China Health Data De-identification Framework

Overview

China has rapidly developed a comprehensive data protection framework in recent years, with specific provisions addressing health data. The framework is characterized by stringent requirements, particularly for sensitive personal information like health data, and places a strong emphasis on data localization and security.

Recent Implementation Timeline

September 2021: Data Security Law took effect
November 2021: Personal Information Protection Law (PIPL) took effect
February 2022: Revised Measures for Cybersecurity Review took effect
September 2022: Measures for Security Assessment of Cross-border Data Transfer took effect
February 2023: Measures for Standard Contracts for Cross-border Transfer of Personal Information issued

Legal Framework

China's health data protection and de-identification framework is built upon several key laws and regulations:

Primary Legislation

Personal Information Protection Law (PIPL): Implemented on November 1, 2021, this is China's first comprehensive, national-level personal information protection law. It establishes strict requirements for processing sensitive personal information, including health data.
Cybersecurity Law: Effective since June 1, 2017, this law provides foundational data security requirements, including provisions relevant to health data.
Data Security Law: Effective since September 1, 2021, this law establishes a framework for classifying and protecting data according to its importance.

Reference Links:

Personal Information Protection Law (English translation): http://www.npc.gov.cn/englishnpc/c23934/202112/1abd8829788946ecab270e469b13c39c.shtml
Cyberspace Administration of China (CAC): http://www.cac.gov.cn/english/
National Health Commission of China: http://en.nhc.gov.cn/

Health-Specific Regulations

Guiding Opinions on Promoting and Regulating the Application and Development of Health and Medical Big Data (2016): State Council policy document setting direction for the collection, storage, sharing, and use of healthcare data.
Administrative Measures for Standards, Safety and Services of National Health and Medical Big Data (Trial) (2018): Assigns responsibilities for standards, security, and service management of health and medical big data.
Regulations on the Administration of Human Genetic Resources (2019): Governs genetic data collection, storage, and cross-border transfer.
Measures for the Management of Population Health Information (2014): Regulates the collection and use of population health information.

Example: Scope of Health Data Regulation

Under China's framework, regulated health data includes:

Medical records and treatment histories
Disease diagnostic information
Genetic testing results
Biometric data related to health
Health monitoring data from wearable devices
Prescription and medication information
Health insurance claims data

Key Concepts and Classifications

Classification of Health Data

Under the PIPL, health data is explicitly classified as "Sensitive Personal Information" (SPI), which Article 28 defines (in unofficial translation) as:

"Personal information that, once leaked or illegally used, may easily cause harm to the dignity of natural persons or grave harm to personal or property security, including information on biometric characteristics, religious beliefs, specially-designated status, medical health, financial accounts, individual location tracking, etc., as well as the personal information of minors under the age of 14."

Articles 28 to 30 of the PIPL require that processors of sensitive personal information:

Have a specific purpose and sufficient necessity for processing
Implement strict protection measures
Inform individuals about the necessity of processing and potential impacts
Obtain separate consent (or other legal basis) for processing

De-identification Concepts

Chinese law distinguishes between two levels of de-identification:

Concept	Definition	Legal Status	Examples
De-identification (去标识化)	Processing of personal information so that a specific individual cannot be identified without the use of additional information	Still considered personal information and subject to PIPL	Replacing patient names with codes; removing ID numbers but keeping a separate key
Anonymization (匿名化)	Processing of personal information so that specific individuals cannot be identified and the processed information cannot be restored	Not considered personal information and falls outside PIPL scope	Aggregating patient data into statistical reports; removing identifiers and destroying any linking key (hashing an identifier alone is usually not enough, because short identifiers can be recovered by enumeration)

Reference:

The two terms are defined in Article 73(3) (de-identification) and Article 73(4) (anonymization) of the PIPL: http://www.npc.gov.cn/npc/c30834/202108/a8c4e3672c74491a80b53a172bb753fe.shtml
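
The difference between the two concepts can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the record fields, the `pseudonymize` and `aggregate_by_diagnosis` helpers, and the use of HMAC-SHA256 are not prescribed by the PIPL or by any national standard. Keyed pseudonymization leaves a route back to the individual for whoever holds the key, so the output stays de-identified personal information. The aggregate table contains no per-patient rows.

```python
import hashlib
import hmac
import secrets
from collections import Counter

# Hypothetical secret key; it would be stored separately from the data.
SECRET_KEY = secrets.token_bytes(32)

def pseudonymize(patient_id: str, key: bytes) -> str:
    """Replace an identifier with a keyed code (HMAC-SHA256, truncated)."""
    digest = hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return "Patient-" + digest[:12].upper()

def aggregate_by_diagnosis(records):
    """Return counts per diagnosis; no per-patient rows survive."""
    return dict(Counter(r["diagnosis"] for r in records))

# Illustrative records only.
records = [
    {"patient_id": "P001", "diagnosis": "diabetes"},
    {"patient_id": "P002", "diagnosis": "diabetes"},
    {"patient_id": "P003", "diagnosis": "hypertension"},
]

# De-identification: one row per patient, re-linkable with the key.
deidentified = [
    {"code": pseudonymize(r["patient_id"], SECRET_KEY), "diagnosis": r["diagnosis"]}
    for r in records
]

# Aggregation: only group totals remain.
summary = aggregate_by_diagnosis(records)
```

Whether an aggregate table actually meets the legal test for anonymization depends on context, for example on whether any group is small enough to single out a patient, so the sketch shows the mechanics rather than a compliance guarantee.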

Special Requirements for Health Data

As Sensitive Personal Information, health data is subject to heightened protection requirements:

1. Enhanced Consent Requirements

Processing health data requires separate, specific, and explicit consent
The purpose, method, and scope of processing must be clearly explained
Individuals must be informed of the necessity of processing health data
Consent from a parent or other guardian is required to process the health data of minors under 14

Example: Consent Form Requirements

A health data consent form designed for compliance in China should include:

Specific description of health data to be collected
Explicit purpose of collection (e.g., "for diabetes treatment monitoring")
Retention period (e.g., "data will be retained for 5 years after your last visit")
Security measures implemented to protect the data
Individual rights to access, correct, and delete their data
Separate checkbox specifically for health data consent
Clear language avoiding technical or legal jargon
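
The checklist above can be sketched as a simple completeness check. This is a minimal illustration under stated assumptions: the `HealthDataConsent` class, its field names, and the `missing_elements` helper are hypothetical, and the check only confirms that each element is present. It cannot judge whether the wording is clear or legally sufficient.

```python
from dataclasses import dataclass, fields

@dataclass
class HealthDataConsent:
    """Hypothetical record of what a consent form told the patient."""
    data_description: str = ""    # which health data is collected
    purpose: str = ""             # e.g. "for diabetes treatment monitoring"
    retention_period: str = ""    # e.g. "5 years after your last visit"
    security_measures: str = ""   # safeguards described to the patient
    rights_notice: str = ""       # access, correction, and deletion rights
    separate_health_consent: bool = False  # dedicated checkbox was ticked

def missing_elements(consent: HealthDataConsent) -> list[str]:
    """List the checklist elements that are blank or not affirmed."""
    missing = []
    for f in fields(consent):
        value = getattr(consent, f.name)
        if isinstance(value, bool):
            if not value:
                missing.append(f.name)
        elif not value.strip():
            missing.append(f.name)
    return missing

# Illustrative draft form with three elements still to be completed.
draft = HealthDataConsent(
    data_description="Blood glucose readings",
    purpose="for diabetes treatment monitoring",
    retention_period="data will be retained for 5 years after your last visit",
)
gaps = missing_elements(draft)
```

Here `gaps` lists `security_measures`, `rights_notice`, and `separate_health_consent`, which a reviewer would resolve before the form is used.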

2. Impact Assessments

Personal Information Protection Impact Assessments (PIPIAs) are mandatory before processing health data
PIPIA reports and records of processing activities must be kept for at least three years
Assessments must evaluate risks and implement corresponding protection measures
Results must be available for regulatory inspection
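
The three-year minimum retention period can be tracked with simple date arithmetic. The sketch below is an assumption-laden illustration: the function name `earliest_disposal_date` is hypothetical, and the choice to roll a February 29 record forward to March 1 is a conservative convention picked here, not a rule from the PIPL.

```python
from datetime import date

RETENTION_YEARS = 3  # minimum period stated above; internal policy may be longer

def earliest_disposal_date(created: date, years: int = RETENTION_YEARS) -> date:
    """Return the first date on which a record has been kept for `years` years."""
    try:
        return created.replace(year=created.year + years)
    except ValueError:
        # Record created on Feb 29 and the target year is not a leap year:
        # roll forward to Mar 1 so the record is never kept for too short a time.
        return date(created.year + years, 3, 1)

assessment_date = date(2022, 5, 10)  # illustrative assessment date
keep_until = earliest_disposal_date(assessment_date)
```

Because the requirement is a minimum, the result is the earliest date on which disposal could be considered, not a deletion deadline.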

Case Study: Hospital PIPIA Process

A major Shanghai hospital implemented a standardized PIPIA process for its new patient management system that includes:

Cataloging all health data collected and processed
Identifying potential risks of data breaches or misuse
Evaluating necessity of each data element collected
Implementing technical safeguards including encryption and access controls
Establishing a data breach response plan
Documenting the entire assessment process for regulatory compliance

3. Localization Requirements

Critical Information Infrastructure operators must store within China the health data they collect or generate there
Personal information processors whose processing volume reaches the threshold set by the Cyberspace Administration of China (CAC) must also store that data within China
Cross-border transfers of health data by these operators and processors require a security assessment organized by the CAC
Other organizations must pass a security assessment, obtain personal information protection certification, or sign the standard contract formulated by the CAC before transferring health data abroad

Reference:

Measures for Security Assessment of Cross-border Data Transfer (effective September 1, 2022): http://www.cac.gov.cn/2022-07/07/c_1658811536396503.htm

Technical Standards for De-identification

Several national standards provide technical guidance for de-identification in China:

Information Security Technology Standards

GB/T 35273-2020: Personal Information Security Specification - Provides detailed technical requirements for personal information security, including de-identification methods.
GB/T 37964-2019: Information Security Technology - Guide for De-identifying Personal Information - Offers specific technical guidance on de-identification processes.
GB/T 39335-2020: Information Security Technology - Personal Information Security Impact Assessment Guide - Details how to conduct impact assessments for personal information processing.
GB/T 39725-2020: Information Security Technology - Guide for Health Data Security - Guidance specific to the protection of health and medical data.

Reference:

National Standards can be accessed through the Standardization Administration of China: http://www.sac.gov.cn/

The technical standards recommend several de-identification methods:

Method	Description	Example in Health Context
Data Masking	Replacing identifiers with special characters or placeholders	Replacing "Zhang Wei, ID: 310104198012121234" with "Zhang *, ID: 310104********1234"
Pseudonymization	Replacing direct identifiers with artificial identifiers or pseudonyms	Replacing patient names with randomly generated codes (e.g., "Patient-A12B34")
Generalization	Reducing precision of data (e.g., converting exact age to age ranges)	Changing "43 years old" to "40-44 years old"; changing "Shanghai Pudong District" to "Shanghai"
Data Aggregation	Combining data to prevent individual identification	Reporting "15% of patients experienced side effect X" rather than individual patient reactions
Data Perturbation	Adding statistical noise to data	Slightly adjusting laboratory values within clinically insignificant ranges
K-anonymity	Ensuring each record is indistinguishable from at least k-1 other records	Ensuring any combination of quasi-identifiers applies to at least 5 patients in the dataset
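
Three of the methods in the table (masking, generalization, and a k-anonymity check) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names, the choice to keep the first six and last four characters of an ID number, and the non-overlapping five-year age bands are all hypothetical choices, not requirements taken from the national standards.

```python
from collections import Counter

def mask_id(id_number: str, keep_start: int = 6, keep_end: int = 4) -> str:
    """Mask the middle of an identifier, keeping the leading and trailing characters."""
    hidden = len(id_number) - keep_start - keep_end
    if hidden <= 0:
        return "*" * len(id_number)  # too short to keep anything safely
    return id_number[:keep_start] + "*" * hidden + id_number[len(id_number) - keep_end:]

def generalize_age(age: int, width: int = 5) -> str:
    """Map an exact age to a non-overlapping band such as '40-44'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def smallest_group(rows, quasi_identifiers) -> int:
    """Size of the smallest group of rows sharing the same quasi-identifier values."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(counts.values(), default=0)

def is_k_anonymous(rows, quasi_identifiers, k: int) -> bool:
    """True if every quasi-identifier combination covers at least k rows.

    An empty table has a smallest group of 0, so it is treated as failing.
    """
    return smallest_group(rows, quasi_identifiers) >= k

# Illustrative rows: two patients share a combination, one stands alone.
rows = [
    {"age_band": generalize_age(43), "city": "Shanghai"},
    {"age_band": generalize_age(41), "city": "Shanghai"},
    {"age_band": generalize_age(44), "city": "Beijing"},
]
masked = mask_id("310104198012121234")
ok = is_k_anonymous(rows, ["age_band", "city"], k=2)
```

In this example `ok` is false because the Beijing row is the only one with its combination of age band and city. A typical response is to generalize further or suppress the outlying row, then run the check again.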

Example: De-identification Process for Clinical Trial Data

A pharmaceutical company conducting clinical trials in China implemented this de-identification process:

Removal of direct identifiers (names, ID numbers, contact information)
Replacement of patient IDs with study-specific codes
Generalization of demographic data (age ranges instead of exact ages)
Removal of rare disease information that could enable identification
Generalization of location data to city level only
Removal of exact dates, replacing with study day numbers
Implementation of access controls for the pseudonymization key
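
Several of these steps can be sketched as a single transformation. The sketch is illustrative only and rests on assumptions: the record fields, the `assign_study_code` and `deidentify_visit` helpers, the ten-year age bands, and the convention that the enrollment date is study day 1 are all hypothetical. The rare-disease review step is not covered because it requires clinical judgment.

```python
import secrets
from datetime import date

def assign_study_code(patient_id: str, key_table: dict) -> str:
    """Return the study code for a patient, creating one if needed.

    key_table is the pseudonymization key (patient_id -> code). It is meant
    to be stored apart from the output and protected by access controls.
    """
    if patient_id not in key_table:
        code = "STUDY-" + secrets.token_hex(4).upper()
        while code in key_table.values():  # regenerate on a rare collision
            code = "STUDY-" + secrets.token_hex(4).upper()
        key_table[patient_id] = code
    return key_table[patient_id]

def deidentify_visit(record: dict, enrollment: date, key_table: dict) -> dict:
    """Build an output row from an allow-list of fields.

    Names, ID numbers, contact details, and districts are dropped simply
    because they are never copied into the output.
    """
    low = (record["age"] // 10) * 10
    return {
        "study_code": assign_study_code(record["patient_id"], key_table),
        "age_range": f"{low}-{low + 9}",
        "city": record["city"],
        # Assumed convention: the enrollment date counts as study day 1.
        "study_day": (record["visit_date"] - enrollment).days + 1,
        "result": record["result"],
    }

# Illustrative source record; all values are made up.
key_table = {}
visit = {
    "patient_id": "SH-0042", "name": "Zhang Wei",
    "id_number": "310104198012121234", "phone": "000-0000",
    "age": 43, "city": "Shanghai", "district": "Pudong",
    "visit_date": date(2023, 3, 15), "result": 6.8,
}
row = deidentify_visit(visit, enrollment=date(2023, 3, 1), key_table=key_table)
```

Building the output from an allow-list, rather than deleting known identifiers from a copy, means a newly added source field is excluded by default until someone decides it is safe to release.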

Enforcement and Penalties

China's framework includes strict enforcement mechanisms for violations:

Fines of up to 50 million RMB (approximately 7.7 million USD) or 5% of the previous year's annual revenue for serious violations
Personal liability for responsible individuals, including fines and disqualification from certain positions
Potential criminal penalties for severe violations
Suspension or termination of applications or services
Recording of violations in credit files under the Social Credit System, with public disclosure, affecting an organization's ability to operate in China
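
The two fine ceilings listed above can be compared with simple arithmetic. The sketch below is a hedged illustration: the `fine_ceilings` function and the revenue figure are hypothetical, and the code only computes the two numbers. Which ceiling a regulator would apply, and the actual amount of any fine, are legal questions that the arithmetic does not answer.

```python
FIXED_CEILING_RMB = 50_000_000  # "up to 50 million RMB"
REVENUE_PERCENT = 5             # "5% of annual revenue"

def fine_ceilings(annual_revenue_rmb: int) -> dict:
    """Return both ceilings for a given annual revenue, in whole RMB."""
    return {
        "fixed_ceiling_rmb": FIXED_CEILING_RMB,
        # Integer arithmetic avoids floating-point rounding on large amounts.
        "revenue_ceiling_rmb": annual_revenue_rmb * REVENUE_PERCENT // 100,
    }

# Illustrative organization with 2 billion RMB in annual revenue.
exposure = fine_ceilings(2_000_000_000)
```

For this hypothetical revenue the revenue-based ceiling is 100 million RMB, twice the fixed figure, which shows why the percentage-based limit matters most for large organizations.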

Enforcement Example: Health App Penalties

In April 2022, the CAC announced penalties against multiple mobile health applications for excessive collection of personal information and failure to properly de-identify health data. Violations included:

Collection of health data beyond the scope necessary for service provision
Failure to obtain explicit consent for health data processing
Insufficient de-identification measures for shared health data
Inadequate security measures for sensitive personal information

Penalties included fines, mandatory rectification within specified timeframes, and temporary removal from app stores.

Reference:

CAC Enforcement Announcements: http://www.cac.gov.cn/2022-04/25/c_1651927474338504.htm

Recent Developments

China continues to evolve its approach to health data protection:

Health Code Systems: Following COVID-19, China developed health code systems that raised new questions about health data protection and de-identification
Medical Big Data Centers: The 14th Five-Year Plan (2021-2025) includes initiatives to establish national and regional medical big data centers with standardized de-identification requirements
National Healthcare Security Administration Guidelines: New guidelines on health insurance data security and sharing issued in 2022
Cross-border Health Data Transfer Rules: Specific rules for cross-border transfers of health and genetic data implemented in 2022
Genetic Data Protection: Enhanced regulations on genetic data collection and use

Reference:

14th Five-Year Plan for National Health: http://www.gov.cn/zhengce/zhengceku/2022-05/20/content_5691905.htm

National Healthcare Security Administration: http://www.nhsa.gov.cn/

How It Compares to HIPAA Safe Harbor

China's approach differs from HIPAA Safe Harbor in several key ways:

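Regulatory Approach: Defines de-identification by outcome rather than prescribing a fixed list of identifiers to remove
Consent Requirements: Continues to apply consent rules to de-identified data, which remains personal information under the PIPL, whereas data de-identified under HIPAA Safe Harbor is no longer treated as protected health information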
Cross-border Transfers: Imposes stricter limitations on cross-border transfers of health data
Data Localization: Has stronger data localization requirements
Penalties: Includes significantly higher penalties for violations (up to 5% of annual revenue vs. HIPAA's statutory cap of $1.5 million per year for violations of the same provision, before inflation adjustments)
Integration: Integrates health data protection more explicitly into a broader data security framework
State Security: Emphasizes state security considerations alongside individual privacy protection
Technical Standards: Provides more detailed technical standards for de-identification through national standards

Key Practical Difference Example

A multinational pharmaceutical company conducting clinical trials in both the US and China would need to:

In the US (HIPAA): Remove 18 specific identifiers to qualify for Safe Harbor, with limited restrictions on data transfer
In China (PIPL): Implement more comprehensive de-identification, store data locally, conduct impact assessments, complete a CAC security assessment, certification, or standard contract before any cross-border transfer, and implement stricter security measures