Background
The Sunshine Act
In 2013, the United States Centers for Medicare & Medicaid Services (CMS) introduced a program, created under the Physician Payments Sunshine Act, that required pharmaceutical and medical device manufacturers to report the payments they make to healthcare professionals. CMS publishes this data as the Open Payments dataset to increase transparency about the types of financial relationships between the healthcare industry and healthcare providers. This data is not meant to indicate any inappropriate or unlawful behavior. Still, these relationships may influence doctors' behavior, potentially resulting in impaired patient care, compromised integrity, and increased healthcare costs.
With this in mind, I wanted to explore how these payments may affect healthcare providers, specifically the types of prescriptions they write.
Data
The Open Payments & Medicare Part D Provider Utilization & Payment datasets
Open Payments
In the program's first year, data was collected and released only for August through December 2013; subsequent years cover the full calendar year. The data encompasses two main types of payments: research payments and general payments for things such as travel, food and beverages, gifts, speaking fees, and licensing fees. Altogether, more than 40 million payment records totaling $25 billion were released for 2013 through 2016. Of these, I focused on the 38 million general payments made directly to healthcare providers rather than the 2.5 million research payments. In addition to the payment amounts, the dataset provides basic information about both the doctor receiving the payment and the company making it, along with a classification of the payment type and details about its form and nature.
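As a rough illustration of tallying the general payments each individual provider received, the sketch below streams rows with Python's `csv` module and totals them per provider ID, skipping non-physician recipients such as teaching hospitals. The column names (`Physician_Profile_ID`, `Covered_Recipient_Type`, `Total_Amount_of_Payment_USDollars`, `Nature_of_Payment_or_Transfer_of_Value`) are assumed to match the general payment files and should be checked against the Open Payments data dictionary for each year; the function name and sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Made-up excerpt. Column names are ASSUMED to follow the Open Payments
# general payment files; verify them against each year's data dictionary.
SAMPLE = """Physician_Profile_ID,Covered_Recipient_Type,Total_Amount_of_Payment_USDollars,Nature_of_Payment_or_Transfer_of_Value
101,Covered Recipient Physician,15.25,Food and Beverage
101,Covered Recipient Physician,500.00,Travel and Lodging
,Covered Recipient Teaching Hospital,2500.00,Grant
"""

def summarize_payments(rows):
    """Return {profile_id: (payment_count, total_usd)} for physician recipients only."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    for row in rows:
        # Keep only payments made directly to individual physicians.
        if row["Covered_Recipient_Type"] != "Covered Recipient Physician":
            continue
        pid = row["Physician_Profile_ID"]
        if not pid:
            continue  # no provider ID to aggregate under
        counts[pid] += 1
        totals[pid] += float(row["Total_Amount_of_Payment_USDollars"])
    return {pid: (counts[pid], round(totals[pid], 2)) for pid in counts}

summary = summarize_payments(csv.DictReader(io.StringIO(SAMPLE)))
print(summary)  # {'101': (2, 515.25)}
```

Because `csv.DictReader` reads one row at a time, the same function can be given an open file handle for a full yearly file without loading it into memory.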
Medicare Part D Provider Utilization & Payments
To establish a baseline for the types of prescriptions doctors write, I used the yearly Medicare Part D Prescription datasets, which provide aggregate information about the prescriptions each doctor wrote in a given year. These datasets report the number and cost of each drug prescribed to Medicare Part D participants, along with basic information about the prescriber and the drugs themselves. Altogether, the yearly datasets for 2013 through 2016 contain more than 100 million aggregate records.
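Brand-name prescribing rates are central to this analysis, but how a rate is computed is not specified. One common approach, assumed here rather than taken from this project, is to count a drug as brand-name when its listed drug name differs from its generic name, and to weight each drug by its claim count. The field names `npi`, `drug_name`, `generic_name`, and `total_claim_count` are assumptions about the Part D file layout, and `brand_rates` is a hypothetical helper.

```python
from collections import defaultdict

def brand_rates(records):
    """Return {npi: share of claims written for brand-name drugs}.

    ASSUMPTION: a drug is brand-name when drug_name differs from generic_name
    (case-insensitive), and each drug is weighted by its total_claim_count.
    """
    brand = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        claims = int(rec["total_claim_count"])
        npi = rec["npi"]
        total[npi] += claims
        if rec["drug_name"].strip().upper() != rec["generic_name"].strip().upper():
            brand[npi] += claims
    # Skip prescribers with zero claims to avoid dividing by zero.
    return {npi: brand[npi] / total[npi] for npi in total if total[npi] > 0}

# Fictional prescriber with one brand-name and one generic statin.
records = [
    {"npi": "1234567893", "drug_name": "LIPITOR",
     "generic_name": "ATORVASTATIN CALCIUM", "total_claim_count": "30"},
    {"npi": "1234567893", "drug_name": "ATORVASTATIN CALCIUM",
     "generic_name": "ATORVASTATIN CALCIUM", "total_claim_count": "90"},
]
rates = brand_rates(records)
print(rates)  # {'1234567893': 0.25}
```

Weighting by claims means a frequently prescribed brand-name drug moves the rate more than a rarely prescribed one; counting distinct drugs instead would give each drug equal weight.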
Approach
Matching the payments providers received to the prescriptions they wrote
Every healthcare provider in the United States is assigned a unique 10-digit National Provider Identifier (NPI), essentially the healthcare-professional equivalent of a Social Security number. The NPI is used to track a provider and their data across datasets and settings. Practically all datasets about healthcare providers include this number, and the Medicare Part D Prescription dataset is no exception.
However, federal law prohibits the government from releasing NPIs in the Open Payments data. Instead, a randomly generated unique ID (referred to here as the Payment ID) links a provider's records across all Open Payments datasets, but not to any dataset outside of them. Although NPIs are withheld, the data does include some basic contact information for each record. This posed a challenge: I had to link each provider's NPI to their Payment ID using only a few pieces of identifying information.
Linking Payment IDs to National Provider Identifiers (NPI)
Because the Open Payments dataset includes each provider's first name, last name, specialty, and full address, these fields were the best available basis for linking NPIs to Payment IDs. However, this information is not included in the Prescription dataset. To fill the gap, I used a descriptive NPI dataset and, joining on NPI, attached each provider's first name, last name, specialty, and full address to their records in the Prescription dataset. With these fields present in both the Open Payments and Prescription datasets, I could finally perform the linkage.
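The NPI join described above can be sketched as a dictionary lookup: index the descriptive NPI dataset by NPI, then copy the identifying fields onto each prescription record. This is a minimal sketch assuming the directory has already been loaded into a dict; the field names in `PROVIDER_FIELDS`, the function name, and the sample NPIs are hypothetical.

```python
# Hypothetical field names for the identifying columns being attached.
PROVIDER_FIELDS = ("first_name", "last_name", "specialty", "address", "state", "zip")

def attach_provider_details(prescriptions, npi_directory):
    """Inner-join prescription records to a descriptive NPI directory on NPI.

    prescriptions: iterable of dicts that each contain an "npi" key.
    npi_directory: {npi: dict containing every field in PROVIDER_FIELDS}.
    Records whose NPI is missing from the directory are dropped.
    """
    joined = []
    for rec in prescriptions:
        details = npi_directory.get(rec["npi"])
        if details is None:
            continue  # NPI not found in the directory
        # Keep the original prescription columns and add the provider fields.
        joined.append({**rec, **{f: details[f] for f in PROVIDER_FIELDS}})
    return joined

directory = {
    "1234567893": {"first_name": "JOHN", "last_name": "SMITH",
                   "specialty": "FAMILY MEDICINE", "address": "1 MAIN ST",
                   "state": "NJ", "zip": "07001"},
}
rows = [
    {"npi": "1234567893", "drug_name": "LIPITOR"},
    {"npi": "9999999999", "drug_name": "LIPITOR"},  # not in the directory
]
joined = attach_provider_details(rows, directory)
print(joined)
```

A hash lookup keeps the join linear in the number of prescription records, which matters at the scale of 100 million rows.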
I iterated through each dataset and built two dictionaries: one keyed by every NPI present in the Prescription dataset and one keyed by every Payment ID present in the Open Payments dataset. The value for each key was itself a dictionary whose keys were first name, last name, specialty, state, address, and zip code, and whose values were every value that appeared in that column for that NPI or Payment ID.
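The nested dictionaries described above can be sketched as follows. This version stores each field's values in a set and normalizes them by trimming whitespace and uppercasing; both choices are assumptions made for illustration, and `build_profiles` is a hypothetical name. The same function works for either dataset by passing `"npi"` or a Payment ID column as `id_field`.

```python
from collections import defaultdict

FIELDS = ("first_name", "last_name", "specialty", "state", "address", "zip")

def build_profiles(rows, id_field):
    """Map each ID to {field: set of every normalized value seen for that ID}."""
    profiles = defaultdict(lambda: {f: set() for f in FIELDS})
    for row in rows:
        profile = profiles[row[id_field]]
        for f in FIELDS:
            # ASSUMPTION: trim and uppercase so "Main St" and "MAIN ST " match.
            value = (row.get(f) or "").strip().upper()
            if value:
                profile[f].add(value)
    return dict(profiles)

rows = [
    {"npi": "1", "first_name": "John ", "last_name": "Smith",
     "specialty": "Family Medicine", "state": "NJ",
     "address": "1 Main St", "zip": "07001"},
    {"npi": "1", "first_name": "John", "last_name": "Smith",
     "specialty": "Family Medicine", "state": "NJ",
     "address": "22 Elm Rd", "zip": "07001"},  # second practice location
]
profiles = build_profiles(rows, "npi")
print(profiles["1"]["address"])
```

Keeping every observed value lets a provider who practices at several addresses still match on any one of them.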
Once these dictionaries were built, I wrote an algorithm that iterated through them and scored the similarity between each NPI and Payment ID by comparing the values stored under each key of their nested dictionaries. If an NPI and a Payment ID had a similarity above 90%, they were linked, meaning the Payment ID was treated as belonging to that NPI. Using this method, I was able to link more than 99% of the NPIs in the Prescription dataset to their Payment IDs, with fewer than 0.001% of the linkages being incorrect due to duplicate information.
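The exact similarity score is not specified, so the sketch below makes an assumption: each field scores the best `difflib.SequenceMatcher` ratio between any pair of values, and the overall score is the average across fields populated on both sides. To avoid comparing every NPI with every Payment ID, candidates are first grouped ("blocked") on exact last name and state. All function names are hypothetical, and this is an illustration of threshold-based record linkage, not the project's actual algorithm.

```python
from collections import defaultdict
from difflib import SequenceMatcher

FIELDS = ("first_name", "last_name", "specialty", "state", "address", "zip")
THRESHOLD = 0.9  # link only when similarity is above 90%

def field_score(a_values, b_values):
    """Best string similarity (0.0 to 1.0) between any pair of values."""
    return max(SequenceMatcher(None, a, b).ratio() for a in a_values for b in b_values)

def similarity(p, q):
    """Average field score over the fields populated in both profiles."""
    shared = [f for f in FIELDS if p[f] and q[f]]
    if not shared:
        return 0.0
    return sum(field_score(p[f], q[f]) for f in shared) / len(shared)

def link(npi_profiles, payment_profiles):
    """Return {payment_id: npi} for the best-scoring candidate above THRESHOLD."""
    # Block on (last name, state) so only plausible candidates are compared.
    blocks = defaultdict(set)
    for npi, p in npi_profiles.items():
        for ln in p["last_name"]:
            for st in p["state"]:
                blocks[(ln, st)].add(npi)
    links = {}
    for pid, q in payment_profiles.items():
        candidates = set()
        for ln in q["last_name"]:
            for st in q["state"]:
                candidates |= blocks.get((ln, st), set())
        scored = [(similarity(npi_profiles[n], q), n) for n in candidates]
        if scored:
            score, best = max(scored)
            if score > THRESHOLD:
                links[pid] = best
    return links

npi_profiles = {
    "A": {"first_name": {"JOHN"}, "last_name": {"SMITH"}, "specialty": {"FAMILY MEDICINE"},
          "state": {"NJ"}, "address": {"1 MAIN ST"}, "zip": {"07001"}},
    "B": {"first_name": {"JANE"}, "last_name": {"SMITH"}, "specialty": {"INTERNAL MEDICINE"},
          "state": {"NJ"}, "address": {"9 OAK AVE"}, "zip": {"08540"}},
}
payment_profiles = {
    "P1": {"first_name": {"JOHN"}, "last_name": {"SMITH"}, "specialty": {"FAMILY MEDICINE"},
           "state": {"NJ"}, "address": {"1 MAIN STREET"}, "zip": {"07001"}},
}
print(link(npi_profiles, payment_profiles))
```

Blocking on exact last name trades some recall (a misspelled last name never becomes a candidate) for a far smaller number of comparisons.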
Exploring linked NPIs & Open Payment IDs
Linking NPIs to Open Payment IDs allowed me to use the Prescription dataset to compare the mean brand-name prescribing rates of doctors who did and did not receive payments within the same specialty. For each specialty, I used an unequal-variances t-test (Welch's t-test) to check whether the mean rate of brand-name prescriptions differed between the two groups. To ensure each specialty had enough data to be representative, I only analyzed specialties with at least 30 doctors in both the paid and unpaid groups. The analysis found statistically significant differences in Family Medicine, Internal Medicine, and Ophthalmology: in these specialties, doctors who received payments wrote brand-name prescriptions at higher rates than colleagues who did not.
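The per-specialty comparison can be sketched with the standard library: group brand-name rates by specialty and payment status, skip specialties with fewer than 30 doctors in either group, and compute Welch's t statistic with Welch-Satterthwaite degrees of freedom. The input shape and function names are hypothetical. For a p-value, `scipy.stats.ttest_ind(paid, unpaid, equal_var=False)` performs the same Welch test.

```python
import math
from statistics import mean, variance

MIN_GROUP_SIZE = 30  # at least 30 doctors in both the paid and unpaid groups

def welch_t(a, b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)  # sample variance / n
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def compare_specialties(doctors):
    """doctors: iterable of (specialty, was_paid, brand_rate) tuples.

    Returns {specialty: (t, df)} for specialties meeting MIN_GROUP_SIZE."""
    groups = {}
    for specialty, was_paid, rate in doctors:
        # Index 0 holds paid doctors' rates, index 1 unpaid doctors' rates.
        groups.setdefault(specialty, ([], []))[0 if was_paid else 1].append(rate)
    results = {}
    for specialty, (paid, unpaid) in groups.items():
        if len(paid) >= MIN_GROUP_SIZE and len(unpaid) >= MIN_GROUP_SIZE:
            results[specialty] = welch_t(paid, unpaid)
    return results

# Synthetic rates: 30 paid and 30 unpaid doctors in one specialty,
# and a second specialty too small to be tested.
doctors = ([("Family Medicine", True, r) for r in [0.3, 0.4] * 15]
           + [("Family Medicine", False, r) for r in [0.2, 0.3] * 15]
           + [("Urology", True, 0.5)] * 5
           + [("Urology", False, 0.4)] * 40)
results = compare_specialties(doctors)
print(results)
```

With equal group sizes and equal variances, as in this synthetic example, the degrees of freedom come out at 2(n - 1), which is 58 here.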
Search Function
Limitations
Correlation & Causation
This analysis is not all-encompassing and has many limitations that could significantly alter its results.
- Generic Alternatives
One factor I was not able to account for is whether a generic alternative currently exists for each brand-name drug a doctor prescribed. This matters because doctors whose patients need medications with no generic alternative will naturally have higher brand-name percentages than doctors who can choose to prescribe a generic.
- Medicare Part D
The prescription data covers only a small subset of the American population. Because these prescriptions were mainly written for an older population and certain people with disabilities, they are unlikely to be representative of the medications prescribed across the entire American population. Results may differ if the analysis could encompass all prescriptions written rather than just those for Medicare participants; however, I would expect brand-name versus generic prescription rates to remain similar.
- NPI linkage
Since this study focused on doctors who had written Medicare Part D prescriptions, NPIs for doctors not appearing in the Medicare Part D Prescription dataset were not considered. Only doctors in the Medicare Part D dataset were included in this analysis, so payment recipients who did not write Medicare Part D prescriptions were excluded.
- Confounding Factors
Several factors beyond those incorporated in this analysis may have influenced how many brand-name prescriptions a doctor wrote. Without more information about factors such as a doctor's patient base, location, drug recalls, or other external forces, it is hard to determine whether receiving payments truly affects the number of brand-name drugs a doctor prescribes. The findings show a correlation between the two but do NOT imply causation.
Future Plans
Work In Progress
- Adding More States
I would love to extend the search function beyond my home state of New Jersey to more states, which would allow nearly the entire population to search for and find their doctor. However, because the web hosting service limits how much data can be stored and the payments and prescriptions data is large, this will likely remain an offline feature.
- Classification/Prediction Engine
With the datasets preprocessed and linked, a classification engine could be built to identify the doctors most likely to be influenced by payments: doctors who receive payments and prescribe brand-name drugs at higher rates than the average doctor in their specialty. This could be taken a step further and turned into a prediction engine to determine which doctors not included in this study would fall into this category. On the flip side, such a tool could also be useful to companies looking for doctors to target.
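Before training any classifier, a simple rule-based baseline could flag paid doctors whose brand-name rate sits far above their specialty's mean. The sketch below uses a z-score cutoff of 2 standard deviations; the cutoff, the input shape, and the name `flag_outliers` are all hypothetical. Like the t-tests, such a flag identifies an association only and says nothing about whether payments caused the higher rate.

```python
from statistics import mean, stdev

def flag_outliers(doctors, z_cutoff=2.0):
    """Return NPIs of paid doctors whose brand-name rate is more than z_cutoff
    standard deviations above the mean rate of their specialty.

    doctors: list of (npi, specialty, was_paid, brand_rate) tuples."""
    by_specialty = {}
    for npi, specialty, was_paid, rate in doctors:
        by_specialty.setdefault(specialty, []).append((npi, was_paid, rate))
    flagged = []
    for rows in by_specialty.values():
        rates = [rate for _, _, rate in rows]
        if len(rates) < 2:
            continue  # stdev needs at least two data points
        mu, sigma = mean(rates), stdev(rates)
        if sigma == 0:
            continue  # every doctor has the same rate; nothing stands out
        flagged.extend(npi for npi, was_paid, rate in rows
                       if was_paid and (rate - mu) / sigma > z_cutoff)
    return flagged

# Synthetic data: one paid outlier (X) and one unpaid outlier (Y).
doctors = ([(f"N{i}", "Cardiology", False, 0.2) for i in range(10)]
           + [("X", "Cardiology", True, 0.9)]
           + [(f"M{i}", "Dermatology", True, 0.2) for i in range(10)]
           + [("Y", "Dermatology", False, 0.9)])
print(flag_outliers(doctors))
```

Only paid doctors are eligible to be flagged, so the unpaid outlier in the second specialty is ignored even though its rate is just as extreme.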