As more states create APCDs, their uses are expanding and evolving. APCD data are being used to calculate disease, injury, chronic condition, and utilization rates by demographic group, geographic area, and payer type. States are also considering linking administrative claims data with clinical data to better assess the quality of health care provider services. In addition, states are moving toward providing greater access to their APCD data through public websites and/or by subscription through a dedicated portal. To fulfill these and other proposed uses, capturing and sharing Protected Health Information (PHI) is critical.
To identify members as they move across the health care system and to segment the data when necessary, a significant amount of PHI must be collected. At a minimum, the following data elements should ideally be collected:
- Names (last, first, middle initial)
- Social Security Number
- Contract Number
- Date of Birth
- Place of Residence (street address, municipality, zip code, state name)
Depending on the legal constraints in a particular state, the data elements listed above are collected and processed using different approaches or methodologies. In some states, direct identifiers (e.g., name, SSN) are hashed by the data submitters using a common algorithm (usually SHA-512) before the files are transmitted to the data processor via SFTP. Because every submitter uses the same algorithm, a submitter could hash its own members' identifiers and locate those members in the released data after they have moved to another plan. To prevent this, the hashed elements are encrypted (usually with Triple DES) before the data are released.
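The hashing step can be sketched with Python's standard `hashlib` module. This is a minimal sketch under stated assumptions, not any state's specification: the field names and the normalization rule (uppercase, letters and digits only) are hypothetical, and the Triple DES encryption step is omitted because it is not part of Python's standard library. The usage lines show why that second step matters: two submitters hashing the same member independently produce the same digest.

```python
import hashlib

def normalize(value: str) -> str:
    # Hypothetical rule: uppercase and keep only letters and digits,
    # so "O'Brien" matches "OBRIEN" and "123-45-6789" matches "123456789".
    return "".join(ch for ch in value.upper() if ch.isalnum())

def hash_element(value: str) -> str:
    # Unkeyed SHA-512 digest of the normalized value (128 hex characters).
    return hashlib.sha512(normalize(value).encode("utf-8")).hexdigest()

def hash_direct_identifiers(record: dict, fields=("last_name", "first_name", "ssn")) -> dict:
    # Return a copy of the record with each (hypothetical) direct identifier field hashed.
    out = dict(record)
    for f in fields:
        out[f] = hash_element(record[f])
    return out

# Two submitters hashing the same member independently.
plan_a = hash_direct_identifiers({"last_name": "O'Brien", "first_name": "Ann", "ssn": "123-45-6789", "paid": 150.0})
plan_b = hash_direct_identifiers({"last_name": "OBRIEN", "first_name": "ANN", "ssn": "123456789", "paid": 80.0})
print(plan_a["ssn"] == plan_b["ssn"])  # True: identical inputs yield identical digests
```

Because the space of possible SSNs is small, an unkeyed hash of an SSN can also be reversed by hashing every possible value, which is another reason the hashed values are not released as-is.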
In other states, direct identifiers are collected and then used to create a non-identifiable master member/patient index (or crosswalk) before the data are released. In one state, the data submitters send the direct identifiers to a third-party vendor, which creates the master index and returns it to the data submitters for inclusion in the files provided to the data processing vendor.
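A master member/patient index can be thought of as a crosswalk table held only by the party that builds it (the state or its third-party vendor). In this minimal Python sketch, the deterministic matching key (normalized last name, date of birth, and SSN) and all field names are hypothetical; production indexes typically rely on more sophisticated, often probabilistic, matching. Surrogate IDs are drawn at random with the standard `secrets` module rather than issued sequentially, so they reveal nothing about enrollment order.

```python
import secrets

def _norm(value: str) -> str:
    # Hypothetical normalization: uppercase, letters and digits only.
    return "".join(ch for ch in value.upper() if ch.isalnum())

def matching_key(last_name: str, dob: str, ssn: str) -> str:
    # Hypothetical deterministic matching key built from normalized identifiers.
    return "|".join(_norm(v) for v in (last_name, dob, ssn))

class MasterMemberIndex:
    """Crosswalk from matching key to surrogate ID, kept only by the index holder."""

    def __init__(self):
        self._crosswalk = {}  # matching key -> surrogate member ID
        self._issued = set()  # every ID handed out so far

    def member_id(self, key: str) -> str:
        # Reuse the existing ID for a known member; otherwise issue a new random one.
        if key not in self._crosswalk:
            new_id = secrets.token_hex(8)
            while new_id in self._issued:
                new_id = secrets.token_hex(8)
            self._issued.add(new_id)
            self._crosswalk[key] = new_id
        return self._crosswalk[key]

DIRECT_IDS = ("last_name", "first_name", "ssn")

def attach_member_ids(index: MasterMemberIndex, records: list) -> list:
    # Replace direct identifiers with the surrogate member ID before files go to the processor.
    out = []
    for r in records:
        key = matching_key(r["last_name"], r["dob"], r["ssn"])
        clean = {k: v for k, v in r.items() if k not in DIRECT_IDS}
        clean["member_id"] = index.member_id(key)
        out.append(clean)
    return out
```

Feeding the same member's records from two plans through one index yields a single `member_id`, which is what lets analysts follow that member across plans without seeing the name or SSN.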
Once the direct identifiers have been converted to an encrypted value or a master member/patient identifier, they are no longer considered PHI. However, if these data are released along with the other descriptive member data elements needed for the required analyses (Date of Birth, Gender, and Place of Residence), individuals could potentially be identified indirectly if the APCD data are linked with other information (e.g., voter registration lists). To prevent this, both risk mitigation techniques (suppression, generalization, perturbation) and administrative/legal restrictions (data use agreements) must be applied when the data are released.
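The three risk mitigation techniques can be illustrated on a simple count table. In this Python sketch, the record layout (a `YYYY-MM-DD` date of birth, a five-digit ZIP code, and gender), the cell-size threshold of 11, and the ±2 noise range are illustrative assumptions, not recommended values. The uniform noise shown is a simple perturbation and does not by itself provide a formal guarantee such as differential privacy.

```python
import random
from collections import Counter

def generalize(record: dict) -> tuple:
    # Generalization: birth year instead of full DOB, 3-digit ZIP prefix instead of 5 digits.
    return (record["dob"][:4], record["zip"][:3], record["gender"])

def tabulate(records: list) -> Counter:
    # Count members in each (birth year, ZIP3, gender) cell.
    return Counter(generalize(r) for r in records)

def min_group_size(records: list) -> int:
    # Smallest number of members sharing one combination of these quasi-identifiers (the "k").
    return min(tabulate(records).values())

def suppress(counts: dict, threshold: int = 11) -> dict:
    # Suppression: blank out any cell with fewer than `threshold` members.
    return {cell: (n if n >= threshold else None) for cell, n in counts.items()}

def perturb(counts: dict, max_noise: int = 2, seed=None) -> dict:
    # Perturbation: add small random integer noise to each published (unsuppressed) count.
    rng = random.Random(seed)
    return {cell: None if n is None else max(0, n + rng.randint(-max_noise, max_noise))
            for cell, n in counts.items()}
```

A small `min_group_size` signals that some combination of birth year, ZIP prefix, and gender is rare enough to be matched against an outside list such as voter registrations; suppression and perturbation then limit what a published table reveals about those rare cells.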
Given the amount of PHI contained in most APCDs, along with their expanding uses and greater accessibility, careful consideration must be given to managing the release of the data to avoid violating the pertinent provisions of the HIPAA Privacy Rule.
The Federal rules associated with Protected Health Information (PHI), 45 CFR Parts 160 and 164 – Standards for Privacy of Individually Identifiable Health Information, define three approaches to protect individuals from being associated with their health care data by users of claims and other health care related data. The three approaches are found in §164.514.
- The first approach, described in §164.514(b)(2) and commonly referred to as the “safe harbor method,” allows the release of health care data as long as the specified identifiers of the individual and of the individual’s relatives, employers, or household members are removed. These include all direct identifiers; all geographic units smaller than a state (except the first three digits of the ZIP code, provided the area formed by all ZIP codes sharing those three digits contains more than 20,000 people); and all elements of dates (except year) directly related to the individual, with ages over 89 aggregated into a single category of 90 or older.
As the name implies, this approach carries the lowest risk that an individual will be linked to his or her health care data. However, by prohibiting the release of all dates other than year and most geographic detail, it greatly diminishes the utility of the APCD for potential non-government users. For example, if health plans or employers wanted to evaluate utilization or readmission rates by hospital service area, they could not do so without geographic and date of service data.
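A safe harbor transformation of a single member record might look like the following Python sketch. The field names are hypothetical, and only the identifiers present in this hypothetical record layout are handled; a real implementation must address every identifier listed in §164.514(b)(2). The set of restricted three-digit ZIP prefixes is deliberately a parameter: which prefixes cover 20,000 or fewer people must be determined from current Census data and is not hard-coded here.

```python
from datetime import date

# Hypothetical field names for the direct identifiers and sub-state geography in this layout.
DIRECT_IDENTIFIERS = {"last_name", "first_name", "middle_initial", "ssn",
                      "contract_number", "street_address", "municipality"}

def age_on(dob: date, as_of: date) -> int:
    # Completed years of age on the as_of date.
    return as_of.year - dob.year - ((as_of.month, as_of.day) < (dob.month, dob.day))

def safe_harbor(record: dict, restricted_zip3: set, as_of: date) -> dict:
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Keep the 3-digit ZIP prefix unless its area has 20,000 or fewer people ("000" instead).
    zip3 = record["zip"][:3]
    out["zip"] = "000" if zip3 in restricted_zip3 else zip3
    # Reduce dates to year; ages over 89 collapse to "90+" and the birth year is dropped.
    dob = date.fromisoformat(record["dob"])
    age = age_on(dob, as_of)
    if age > 89:
        out["dob"], out["age"] = None, "90+"
    else:
        out["dob"], out["age"] = dob.year, age
    out["service_date"] = date.fromisoformat(record["service_date"]).year
    return out
```

The output keeps the state, a year of service, and a ZIP prefix, which is exactly why readmission windows measured in days, or rates by hospital service area, cannot be computed from safe harbor data.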
- The second approach, described in §164.514(b)(1) and further explained in the Department of Health and Human Services Office for Civil Rights 2012 Guidance Regarding Methods for De-identification of Protected Health Information in Accordance with the HIPAA Privacy Rule, is commonly known as the “expert determination method.” This approach relies on a person with appropriate knowledge of and experience with generally accepted statistical and scientific methods to evaluate the data being collected and released, determine whether the risk is very small that the information could be used, alone or in combination with other data sources, to identify an individual, and recommend ways to minimize that risk.
This approach is potentially less restrictive than the other two and gives users outside state government greater flexibility while still keeping the risk of linking an individual to his or her health care data low. However, it also involves a good deal of uncertainty. If the expert is asked to evaluate a system for releasing data, rather than a specific report or analysis containing member-level data, the expert may conclude that the safe harbor approach should be followed, because the content of future releases and their presentation to the public cannot be known in advance. At the same time, as part of the risk evaluation, the expert must also consider whether the recommended de-identification measures will cause information loss that limits the usefulness of the resulting health information in certain circumstances.
- The third approach, described in §164.514(e) and commonly referred to as the “limited data set,” also prohibits the release of most of the same direct identifiers listed under the safe harbor method, but allows the release of geographic units larger than a street or postal address (town or city, state, and ZIP code) and all date fields, provided the data user enters into a data use agreement that specifies how the data may be used and who specifically may access them, and that requires the user to certify that the data will be protected and not accessed by other parties.
Some may view this approach as carrying the greatest risk that an individual will be linked to his or her health care data through indirect identification. However, in states that have established monetary and civil penalties for violating the requirements of a data use agreement, users have a strong incentive to minimize that risk. The benefit of the limited data set approach is the certainty it gives non-government users that the critical data (dates and geographic data) will be available for important analyses, as long as the data use agreement is followed and no data are published that could allow an individual to be linked to his or her health care data.
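The limited data set approach pairs a lighter transformation with an administrative gate. This Python sketch is illustrative only: the excluded field names map just the identifiers present in a hypothetical APCD layout onto the §164.514(e)(2) exclusions, and representing the data use agreement as an object with authorized users, permitted uses, and an expiration date is an assumption about how a state might enforce its terms in software, not a legal requirement.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical APCD fields falling under the limited data set exclusions
# (names, SSN, health plan beneficiary/contract numbers, street address, phone, e-mail).
LDS_EXCLUDED = {"last_name", "first_name", "middle_initial", "ssn",
                "contract_number", "street_address", "phone", "email"}

@dataclass
class DataUseAgreement:
    recipient: str
    authorized_users: set
    permitted_uses: set
    expires: date

def limited_data_set(record: dict) -> dict:
    # Dates and town/city, state, and ZIP code are retained.
    return {k: v for k, v in record.items() if k not in LDS_EXCLUDED}

def release(records: list, dua, user: str, purpose: str, today: date) -> list:
    # Refuse release unless a current agreement covers this user and this purpose.
    if dua is None or today > dua.expires:
        raise PermissionError("no data use agreement in effect")
    if user not in dua.authorized_users or purpose not in dua.permitted_uses:
        raise PermissionError(f"{user!r} is not authorized for {purpose!r}")
    return [limited_data_set(r) for r in records]
```

Unlike the safe harbor sketch, full dates and five-digit ZIP codes pass through unchanged; the protection comes from the agreement check rather than from altering the data.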
Before deciding how to release their APCD data, states will need to weigh the mandated analytic requirements and the expectations of those wanting to use the data against the public’s concern that the data be released in a manner that prevents unscrupulous individuals from linking people to their health care data.