
Oklahoma Bar Journal

Processing Health Records With AI Under HIPAA

By Jason T. Seay, Philip D. Hixon and Richard M. Cella

Artificial intelligence tools, including large language models, are quickly transforming how health care organizations process and analyze vast amounts of clinical data. Health care information has been exchanged by health care entities in digital format for over 30 years, and most electronic health records are highly structured, making them easily processable by computers. AI holds the promise to make current data operations even faster and more efficient. From predictive modeling to workflow automation, AI offers insights and turnaround times that were previously impossible for human teams alone. Many health care entities use AI as part of their data analytics operations to conduct payment and coding audits, automate prior authorization processes and directly interface with patients. Some entities are actively exploring the use of generative AI for predictive modeling, which holds the promise of new diagnostic insights that may increase provider efficiency and ultimately lower health care costs.

The upside that AI offers for health care data processing is not without risk. Industry experts have expressed concern that patient privacy rights may not be adequately protected when processing data with AI.[1] Some have expressed concern that AI processing of patient information may lead to the automated, inadvertent redisclosure of patient personally identifiable information in AI output.[2] Health care data often includes protected health information (PHI), which is governed by the strict privacy and security requirements of the Health Insurance Portability and Accountability Act of 1996 and related statutes and regulations[3] (collectively HIPAA). PHI, by definition, contains personally identifiable information of patients.

Processing PHI with AI tools can raise serious HIPAA compliance risks if proper safeguards are not implemented. Data should generally be processed only for treatment, payment or health care operations as permitted by HIPAA. For processing activities falling outside these categories, it may be necessary to create de-identified data sets. This article presents the general parameters under which covered entities may use AI to process PHI under HIPAA and explains the process by which PHI may be de-identified for uses falling outside those parameters.

WHAT ARE PHI, COVERED ENTITIES AND BUSINESS ASSOCIATES?

Under HIPAA, individually identifiable health information[4] or PHI[5] includes demographic data tied to an individual’s past, present or future physical or mental health, health care services received or payment related to health care services that either identify an individual or provide a reasonable basis to believe can be used to identify the individual. Such information held or transmitted in any form or media by a “covered entity” or “business associate” is protected under the privacy rule. “Covered entity” under the privacy rule means a health plan, a health care clearinghouse or a health care provider that transmits any health information in electronic form in connection with certain transactions under the rule.[6] “Business associates” include persons or organizations that perform functions on behalf of covered entities where such functions involve the use or disclosure of PHI.[7]

GENERAL USE RESTRICTIONS ON PROCESSING PHI

In the absence of express patient consent for specific processing activities, the privacy rule generally permits a covered entity to only use or disclose PHI for treatment, payment or health care operations as set forth in the regulations, with some limited exceptions.[8] Treatment is broadly defined as the provision, coordination or management of health care and related services.[9] Payment includes a range of activities from health plan coverage activities, risk adjusting, billing, claim adjustment, collection activities and receipt of payment.[10] Health care operations include, among other things, a host of internal (and sometimes external) quality assessment and improvement activities.[11] The permissible uses and disclosures may be for purposes of the covered entity’s own activities or for those of another covered entity having a relationship with the individual whose information is used or disclosed.[12]

Covered entities are generally limited to processing PHI for these purposes, unless a patient consents in writing to other uses of their PHI.[13] Many uses of AI can be adequately categorized as a treatment, payment or health care operation because the regulations are agnostic as to the manner of permissible processing. However, some use cases of AI do not squarely fall within them. An example may arise where PHI from multiple patients is combined into a single data set, which is then used to train AI to perform automated diagnostic functions. Another example is where such data sets are used to create data analytics[14] used for purposes unrelated to treatment, payment or health care operations – such as finding potential participants in a clinical study or trial. For such uses, it may be necessary to first de-identify the data before it is processed.

DE-IDENTIFICATION OF PHI

Generally, “de-identification” is a process whereby all personally identifiable information, and information that could be used to identify an individual, is removed from the data set to be processed. De-identified health information may typically be used or shared without restriction because such data is no longer considered PHI under HIPAA regulations. It generally neither identifies nor provides a reasonable basis to identify an individual.[15] HIPAA regulations permit the de-identification of PHI through either formal determination by a qualified expert or by the removal of specified identifiers.[16]

The Expert Determination Method

The first method requires a qualified individual to have appropriate knowledge of and experience with statistical and scientific principles for rendering information not individually identifiable. The expert must apply such methods to determine that the risk is “very small” that the information could be used to identify an individual. In making the determination, the expert must document their methods and results, justifying the decision.[17]

Experts need not be connected to the health care field and may come from statistical, mathematical or other scientific domains.[18] Considerations on the identification risk are fact-dependent, and an acceptable “very small” risk is based on the ability of an anticipated recipient to identify an individual in each circumstance. Some principles relied on to determine risk include replicability, data source availability and distinguishability.[19] The expert is often looking at the degree to which a data set can be “linked” to a data source that reveals the identity of the corresponding individuals.[20] At times, the expert may recommend modifying a data set in order to mitigate the risk. Such modifications include adjusting certain features or values in the data to ensure that any unique, identifiable elements no longer exist.[21]
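One of the principles noted above, distinguishability, lends itself to a simple quantitative illustration: an expert might measure what share of records in a data set are unique on a combination of quasi-identifiers (a k-anonymity-style check), since a record that shares its combination with no one else is the easiest to link to an outside data source. The following is a minimal sketch, not an actual expert methodology; the field names and sample data are hypothetical:

```python
from collections import Counter

def uniqueness_rate(records, quasi_identifiers):
    """Fraction of records whose quasi-identifier combination is unique.

    A k-anonymity-style distinguishability check: a record whose
    combination of quasi-identifiers appears only once in the data set
    is the most readily linkable to an outside data source.
    """
    combos = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    unique = sum(1 for r in records
                 if combos[tuple(r[q] for q in quasi_identifiers)] == 1)
    return unique / len(records)

# Hypothetical sample: two records share a combination; one is unique.
data = [
    {"zip3": "741", "birth_year": "1980", "sex": "F"},
    {"zip3": "741", "birth_year": "1980", "sex": "F"},
    {"zip3": "731", "birth_year": "1955", "sex": "M"},
]
print(uniqueness_rate(data, ["zip3", "birth_year", "sex"]))  # → 0.3333...
```

A real expert determination would go well beyond such a check, weighing the anticipated recipient, available outside data sources and possible mitigating modifications to the data.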

The Safe Harbor Method

The second “safe harbor” method requires the removal of 18 specific identifiers[22] (see the sidebar) in combination with a covered entity having no “actual knowledge” that any remaining information could be used to re-identify an individual.[23] The identifiers of the individual, as well as those of the relatives, employers or household members of the individual, must be removed.

“Actual knowledge” in this context means “clear and direct knowledge that the remaining information could be used, either alone or in combination with other information, to identify an individual who is a subject of the information.”[24] Essentially, if the covered entity is not aware that the de-identified information can be used, alone or in conjunction with other data, to identify an individual, then it will likely not have “actual knowledge” of a de-identification risk.

Both de-identification methods yield data that retains some risk of identification. Yet, regardless of the method, the privacy rule does not restrict the use or disclosure of de-identified health information because the de-identified data is no longer considered PHI under HIPAA.[25] Data sets de-identified under HIPAA regulations may therefore be used for purposes unrelated to treatment, payment or health care operations.

OTHER CONSIDERATIONS

This article provides a brief overview of HIPAA compliance risk associated with AI-related processing. It does not substantively address contractual and other regulatory risks related to AI processing of health information.

Contractual obligations between or among covered entities, business associates and subcontractors may either prohibit de-identification altogether or condition de-identification on consent of an upstream entity, such as the covered entity. Such restrictions may be contained in the parties’ business associate agreement or in the substantive services agreement. For example, many private payors place restrictions on a data recipient’s ability to de-identify data. Many health information exchanges, which are used to exchange PHI between covered entities, also impose contractual restrictions on the use and processing of PHI. Before undertaking de-identification, the covered entity, business associate or subcontractor should confirm that its contract does not contain a prohibition on de-identification and, likewise, does not restrict processing of de-identified information with AI.

The use of AI to process health information can potentially result in a reportable breach.[26] A somewhat analogous scenario arises with protected attorney-client or work product information, where protection may be forfeited when such information is used with large language model AI.[27] Before using AI to process PHI in connection with a permissible use, an evaluation of the AI tool is necessary to confirm that it is HIPAA compliant.

The use of AI to process either PHI or de-identified health information may also trigger other obligations under HIPAA, such as patient authorization and consent issues[28] or amendment of privacy practice notice documents,[29] which are outside the scope of this discussion. It may be necessary to establish a specialized compliance function within an entity to address these issues.

THE SAFE HARBOR METHOD

The “safe harbor” method requires the removal of 18 specific identifiers in combination with a covered entity having no “actual knowledge” that any remaining information could be used to re-identify an individual (see endnote 22). The identifiers are:

1) Names;

2) Geographic subdivisions smaller than a state (street address, city, county, precinct, zip code, equivalent geocodes), except the initial three digits of a zip code may be retained if the geographic unit formed by combining all zip codes sharing those initial three digits contains more than 20,000 people (otherwise the initial three digits must be changed to 000);

3) All elements of dates (except year) directly related to an individual (birth, admission, discharge, death) and all ages over 89, including all elements of dates indicative of such age, except that such ages and elements may be aggregated into a single category of 90 or older;

4) Phone numbers;

5) Fax numbers;

6) Email addresses;

7) Social security numbers;

8) Medical record numbers;

9) Health plan beneficiary numbers;

10) Account numbers;

11) Certificate/license numbers;

12) Vehicle identifiers and serial numbers, including license plates;

13) Device identifiers and serial numbers;

14) URLs;

15) IP addresses;

16) Biometric identifiers, including voice and fingerprints;

17) Full-face photographic images and comparable images; and

18) Any other unique identifying numbers, characteristics or codes.
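As a rough illustration of the mechanics the safe harbor method contemplates (and emphatically not a compliance-grade tool), the transformations above might be sketched as follows; all field names and the restricted zip prefix list are hypothetical stand-ins:

```python
# Illustrative only: field names and RESTRICTED_ZIP3 are hypothetical,
# not drawn from any actual system or Census data.
DIRECT_IDENTIFIERS = {
    "name", "phone", "fax", "email", "ssn", "mrn",
    "beneficiary_no", "account_no", "license_no", "vin",
    "device_serial", "url", "ip_address",
}

# 45 C.F.R. §164.514(b)(2)(i)(B): the first three zip digits may be kept
# only if the corresponding geographic unit exceeds 20,000 people;
# otherwise they must be changed to 000. This set stands in for the
# Census-derived list of restricted prefixes.
RESTRICTED_ZIP3 = {"036", "059", "102", "203", "556", "692", "821", "823"}

def scrub_record(record: dict) -> dict:
    """Apply safe-harbor-style transformations to one patient record."""
    out = {}
    for key, value in record.items():
        if key in DIRECT_IDENTIFIERS:
            continue  # remove the identifier entirely
        if key == "zip":
            zip3 = str(value)[:3]
            out["zip3"] = "000" if zip3 in RESTRICTED_ZIP3 else zip3
        elif key in ("birth_date", "admission_date", "discharge_date"):
            # retain only the year element of dates
            out[key.replace("_date", "_year")] = str(value)[:4]
        elif key == "age":
            # ages over 89 are aggregated into a single "90+" category
            out["age"] = "90+" if int(value) > 89 else value
        else:
            out[key] = value
    return out

record = {"name": "Jane Doe", "zip": "03601", "birth_date": "1932-04-17",
          "age": 93, "diagnosis_code": "E11.9"}
print(scrub_record(record))
# → {'zip3': '000', 'birth_year': '1932', 'age': '90+', 'diagnosis_code': 'E11.9'}
```

Even a perfect implementation of these transformations does not complete the safe harbor analysis; the covered entity must also lack "actual knowledge" that the remaining information could be used to re-identify an individual.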

Authors’ Note: The authors acknowledge the assistance of Mallory Duncan, a 2026 J.D. candidate at the OU College of Law, whose research contributed to this article.  


ABOUT THE AUTHORS

Jason T. Seay is a certified information privacy professional (CIPP/US). He is of counsel with the law firm of GableGotwals, where he maintains a regulatory and transactional practice focused on health care law as well as data privacy, security and governance.


Philip D. Hixon is a shareholder with the law firm of GableGotwals, where he focuses on health care law and civil litigation. Mr. Hixon served as editor-in-chief of the third edition of Oklahoma Civil Procedure Forms and Practice (3d ed. 2024, Matthew Bender). He also represents District 6 on the OBA Board of Governors.


Richard M. Cella is a former federal prosecutor and shareholder at the law firm of GableGotwals, where he represents businesses, including health care companies, in complex commercial litigation, regulatory enforcement matters and internal investigations.



ENDNOTES

[1] See, e.g., Blake Murdoch, “Privacy and Artificial Intelligence: Challenges for Protecting Health Information in a New Era,” BMC Medical Ethics 22, No. 122 (2021), available at https://bit.ly/4rApnZB.

[2] Lauren Quinn, “Are Your Secrets Safe?: Imposing a Fiduciary Duty on Healthcare AI Developers Dealing with Sensitive Health Information,” 94 Fordham L. Rev. 383, 400 (2025), available at https://bit.ly/3ZI0wac (noting that “a[s] certain types of healthcare AI aim to identify patterns or predict predispositions, they may generate new health information that individuals do not want disclosed.”); see also W. Nicholson Price II, “Problematic Interactions Between AI and Health Privacy,” 2021 Utah L. Rev. 925, 928 (2021) (noting that “by enabling accurate and sophisticated inferences about health information from large sets of data that are not obviously tied to health, AI reduces the efficacy of trying to protect (or even identify what counts as) ‘health data.’”).

[3] See, e.g., Health Information Technology for Economic and Clinical Health Act of 2009 (HITECH), Pub. L. 111-5, Tit. XIII, §§13001-13424 (Feb. 17, 2009) (enhancing HIPAA’s privacy, security and breach-notification requirements); see also Cal. Civ. Code §56.05 et seq. (Confidentiality of Medical Information Act); N.Y. Pub. Health Law §18 (patient access to records); Tex. Health & Safety Code §181.101 et seq. (Texas Medical Records Privacy).

[4] 45 C.F.R. §160.103, “Individually Identifiable Health Information.”

[5] 45 C.F.R. §160.103, “Protected Health Information.”

[6] 45 C.F.R. §160.103, “Covered entity.”

[7] 45 C.F.R. §160.103, “Business associate.”

[8] 45 C.F.R. §164.506(a), (c).

[9] 45 C.F.R. §164.501, “Treatment.”

[10] 45 C.F.R. §164.501, “Payment.”

[11] 45 C.F.R. §164.501, “Health care operations.”

[12] 45 C.F.R. §164.506(c).

[13] For health care entities with large patient populations, with historical data sets extending back years, or where multiple consent documents have been used over time, it can be difficult to collate what specific subsets of PHI are subject to which patient consent documents. Restricting AI processing activities to treatment, payment or health care operations can help alleviate the need to rely on patient consent for processing PHI.

[14] “Data analytics” generally means the process of examining data to produce actionable insights – for example, using statistics, querying and computation to describe large trends or patterns identified in data. Such output is often referred to as “insights data.”

[15] See OCR “Privacy Brief, Summary of the HIPAA Privacy Rule,” https://bit.ly/3ZPfJWT, last revised 05/03, page 4, De-Identified Health Information; see also 45 C.F.R. §§164.502(d)(2), 164.514(a) and (b).

[16] 45 C.F.R. §164.514(b).

[17] 45 C.F.R. §164.514(b)(1)(i)-(ii).

[18] “Guidance on De-identification of Protected Health Information,” Nov. 26, 2012, https://bit.ly/4qYgIjw, page 10.

[19] Id. at 13-14.

[20] Id. at 15.

[21] Id. at 18.

[22] See identifiers listed in the sidebar on page XX. 45 C.F.R. §164.514(b)(2)(i)(A)-(R). Examples of things that may fall under the “any other unique” category include clinical trial numbers, barcodes from medical records or electronic prescribing systems and occupations or characteristics that set an individual apart. See Guidance on De-identification of Protected Health Information, Nov. 26, 2012, https://bit.ly/4kBKqc1, page 26.

[23] 45 C.F.R. §164.514(b)(2)(i)-(ii).

[24] “Guidance on De-identification of Protected Health Information,” supra note 18, at 27.

[25] See id. at 6.

[26] See 45 C.F.R. §164.400 et seq. (Breach Notification Rule).

[27] See Wesley Weeks, Nick Peterson and Rachel Tuteur, “Careless Generative AI Use Puts Attorney-Client Privilege at Risk,” Bloomberg Law News (Jan. 21, 2025), available at https://bit.ly/45NcFOH (visited Oct. 13, 2025); Ismail Amin, “Client Beware: The Utilization of Artificial Intelligence Platforms and the Potential Waiver of Attorney-Client Privilege,” JDSupra (Aug. 22, 2025), available at https://bit.ly/4qmlma6 (visited Oct. 13, 2025).

[28] See 45 C.F.R. §§164.506(b), 164.508(a).

[29] See 45 C.F.R. §164.520.


Originally published in the Oklahoma Bar Journal, OBJ 97 No. 3 (March 2026)

Statements or opinions expressed in the Oklahoma Bar Journal are those of the authors and do not necessarily reflect those of the Oklahoma Bar Association, its officers, Board of Governors, Board of Editors or staff.