Applied Clinical Trials
Sentiment analysis can give healthcare organizations a competitive edge in understanding what customers think about their healthcare experience, helping to reduce costs, improve care services, and prompt new clinical research and treatments. In addition, it taps into a new channel of pharmacovigilance input that can enable Marketing Authorization Holders to keep abreast of opinions on the safety of their products in real time.
In the context of medicinal products and devices, sentiment can refer to an adverse event experience but also to a positive treatment outcome.
The sentiment can be deduced as the final output of a technique that collects large volumes of unstructured information from any source selected as relevant and processes it to identify and extract the implicit subjective judgment or evaluation.
The main goal of the proof of concept (PoC) is to show the applicability of a sentiment analysis approach to clinical data, in the context of social media monitoring, data analysis and reporting. To obtain a sensible quantitative analysis, we collected data from different sources. Because this is a proof of concept, we planned for a limited data investigation. With larger data volumes in a production environment, we expect notable quantitative improvements and analysis refinements.
This proof of concept focuses on sentiment analysis of opinions shared on the Web about two products for the treatment of melanoma marketed by GSK and Roche. Our research involved Roche’s single-agent Zelboraf (vemurafenib) and GSK's combination of Mekinist (trametinib) and Tafinlar (dabrafenib).
In this study we gathered users' comments from three different types of data sources: online patient fora, Facebook public fan pages[1] and the micro-blogging service Twitter, from which we collected tweets daily over a one-month period (to overcome the Twitter Search API's limitations on the number and date of tweets that can be collected).[2]
After a preliminary standard analysis, we noticed that in some cases the standard emotional valences associated with particular words were not correct in this specific domain. As an example, in colloquial English the words “positive” and “negative” almost always carry a positive and a negative valence respectively, while if they refer to a treatment or a disease in a clinical context, the two words might have a completely different meaning. For this reason, we customized the sentiment analysis algorithm using rules specialized for the healthcare domain.
For future analyses, we plan further customization to better handle the unstructured terminology of social media (abbreviations, slang, spelling mistakes, etc.), both in the general sentiment analysis of texts and in adverse event descriptions.
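The kind of domain rule described above can be illustrated with a minimal Python sketch. It is not the algorithm used in the study: the lexicon, the list of clinical context words, the window size and the function name `score` are all assumptions chosen for illustration. The sketch simply neutralizes "positive" and "negative" when they appear near a clinical term, which is only one possible way to treat such words.

```python
import re

# Hypothetical general-purpose valences; not the lexicon used in the study.
GENERAL_LEXICON = {"positive": 1.0, "negative": -1.0, "great": 2.0, "awful": -2.0}

# Hypothetical clinical context words. When one of them is close to
# "positive"/"negative", the word is read as a test result, not an opinion.
CLINICAL_CONTEXT = {"test", "tested", "biopsy", "scan", "node", "nodes",
                    "braf", "mutation", "result", "results"}
AMBIGUOUS = {"positive", "negative"}


def score(text, window=3):
    """Sum word valences, neutralizing ambiguous words near clinical terms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    total = 0.0
    for i, tok in enumerate(tokens):
        valence = GENERAL_LEXICON.get(tok, 0.0)
        if tok in AMBIGUOUS:
            # Look at up to `window` tokens on each side of the ambiguous word.
            nearby = tokens[max(0, i - window): i + window + 1]
            if CLINICAL_CONTEXT.intersection(nearby):
                valence = 0.0  # assumed rule: a clinical finding carries no opinion
        total += valence
    return total


if __name__ == "__main__":
    print(score("I feel positive about this treatment"))
    print(score("My biopsy came back positive"))
```

A production rule set would need to decide whether a clinical "positive" should be neutral or negative (for example, positive lymph nodes), which this sketch deliberately leaves open.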
We collected tweets about Roche and GSK’s therapies over a one-month period and then performed sentiment analysis using the following procedure:
Finally, we visualized results using dashboards and we created reports to compare the results in tabular or other descriptive form.
Once we have created the workflows we can re-run them multiple times to repeat the analysis of the tweets in order to progressively monitor opinions expressed on Twitter about the two companies.
We collected tweets about GSK and Roche’s treatments using as search terms both the trade names (Zelboraf, Tafinlar) and the INNs (vemurafenib, dabrafenib). To improve the analysis, we added Bristol-Myers Squibb’s treatment for metastatic melanoma, Yervoy (ipilimumab), to the search terms.
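As a rough illustration of how collected tweets could be grouped by therapy when both trade names and INNs are used as search terms, the following Python sketch maps each term to its INN. The dictionary layout and the function name `therapies_mentioned` are assumptions for illustration; the tooling actually used in the study is not described here.

```python
import re

# Search terms taken from the text: trade name and INN for each therapy.
THERAPY_TERMS = {
    "vemurafenib": {"zelboraf", "vemurafenib"},
    "dabrafenib": {"tafinlar", "dabrafenib"},
    "ipilimumab": {"yervoy", "ipilimumab"},
}


def therapies_mentioned(text):
    """Return the sorted INNs of all therapies named in a tweet."""
    # Lowercase word tokens, so "#Zelboraf" and "ZELBORAF" both match.
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(inn for inn, terms in THERAPY_TERMS.items() if tokens & terms)


if __name__ == "__main__":
    for tweet in ["Started #Zelboraf last week", "Yervoy vs dabrafenib?"]:
        print(tweet, "->", therapies_mentioned(tweet))
```

Matching on whole tokens rather than substrings keeps the sketch simple, but it would miss misspellings, which the text identifies as an area for later refinement.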
We then created a bar chart showing general sentiment polarity percentages of the tweets (Figure 1). The numbers on the horizontal axis represent the sentiment polarities assigned to the tweets by the sentiment classification algorithm: a positive number indicates a positive sentiment, a negative number indicates a negative sentiment, and 0.0 indicates a neutral sentiment. The higher a positive number, the better the opinion; the larger the magnitude of a negative number, the worse the opinion. The percentage on each bar indicates the share of tweets classified with the corresponding polarity relative to the total number of tweets. We summed the percentages corresponding to positive numbers to get the total of positive results, and did the same for the negative cases.
The overall tone of the texts turned out to be neutral (as shown by the 62.04% peak at 0.0 in Figure 1), followed by positive comments (33.54%), while negative comments made up only a very small share (4.44%).
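The aggregation described above (a percentage per polarity value, then totals for the positive and negative sides) can be sketched in a few lines of Python. The function name `polarity_summary` and the sample scores are hypothetical and are not the study data.

```python
from collections import Counter


def polarity_summary(scores):
    """Return (percentage per polarity value, positive/neutral/negative totals)."""
    n = len(scores)
    if n == 0:
        raise ValueError("no scores to summarize")
    counts = Counter(scores)
    # Percentage of texts at each polarity value: one bar of the chart each.
    distribution = {p: 100.0 * c / n for p, c in sorted(counts.items())}
    totals = {
        "positive": sum(pct for p, pct in distribution.items() if p > 0),
        "neutral": distribution.get(0.0, 0.0),
        "negative": sum(pct for p, pct in distribution.items() if p < 0),
    }
    return distribution, totals


if __name__ == "__main__":
    # Hypothetical polarity scores for ten texts.
    sample = [0.0] * 6 + [1.0, 2.5, 2.5] + [-1.0]
    distribution, totals = polarity_summary(sample)
    print(distribution)
    print(totals)
```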
Figure 1. Polarities associated with tweets about Roche and GSK therapies[3]
The Entity Extraction algorithm returned a list of entities, i.e., names of products, brands, companies, etc., which are visualized using tag clouds (Figure 2). These interactive tools show the most relevant entities, with the importance of each tag indicated by font size or color. As an example, Figure 3 and Figure 4 show the sentiment polarities associated with the entities “Vemurafenib” and “Dabrafenib” respectively, using the same type of chart as in Figure 1.
Figure 4. Sentiment polarities of entities extracted from tweets containing the entity “dabrafenib”
We extracted patient comments about Zelboraf treatment and GSK's therapy Dabrafenib from three online patient fora, namely Macmillan's Online Community forum, the Cancer Survivors Network discussion board and the DailyStrength health-related social network, in order to collect patients’ experiences with the two treatments. Specifically, we carried out the following tasks:
We then visualized results using dashboards and created reports.
We can compare the results to understand which of the two companies (treatments) is more appreciated and why (by extracting sentiment words).
We applied sentiment analysis to the collected texts and then showed the results using bar charts (Figure 5 and Figure 6). As above, the numbers on the horizontal axis represent the sentiment polarities assigned to the texts by the sentiment classification algorithm: a positive number indicates a positive sentiment, a negative number indicates a negative sentiment, and 0.0 indicates a neutral sentiment. The percentage on each bar indicates the share of texts classified with the corresponding polarity relative to the total number of texts. We totaled the percentages corresponding to positive numbers to get the total of positive results, and did the same for the negative cases.
Sentiment analysis of the comments about Zelboraf treatment showed that the overall tone was positive (40%), although there was also a small number of strongly negative cases (-10, -8.5); neutral cases accounted for 23.85% (Figure 5).
The same analysis of the comments about the GSK therapy gave similar results: positive comments were again the largest group (49%), followed by negative (32%) and neutral comments (18.89%) (Figure 6).
Figure 5. Sentiment polarities associated with comments about Zelboraf therapy
This positive attitude is confirmed by the predominance of positive sentiment words, that is, the words expressing the sentiments retrieved in the texts, which can be visualized using a tag cloud (Figure 7).
Figure 7. Most frequent sentiment words
As in the Twitter example, the Entity Extraction algorithm returned a list of entities that included the different therapies (“Zelboraf”, “Dabrafenib”, “Pazopanib”, “Interferon”). We also retrieved several side effects, such as “fevers”, “back pain” and “skin rash” associated with the entity “Dabrafenib”, and “nausea”, “fatigue”, “rash”, “cough” and “headache” associated with the entity “Zelboraf”.
We extracted data from the Facebook fan pages of three non-profit melanoma organizations. These are public Facebook pages where each organization shares news and information about melanoma and other users can comment on the posts.
More specifically, we extracted a total of 836 user comments associated with the messages posted by the organizations, in order to collect melanoma experiences shared in the comments.
We used the following procedure of analysis:
We applied sentiment analysis to the comment texts and then showed the results using a bar chart (Figure 7). The numbers on the vertical axis represent the sentiment polarities assigned to the texts by the sentiment classification algorithm: a positive number indicates a positive sentiment, a negative number indicates a negative sentiment, and 0.0 indicates a neutral sentiment. The percentage on each bar indicates the share of texts classified with the corresponding polarity relative to the total number of texts. We summed the percentages corresponding to positive numbers to get the total of positive results, and did the same for the negative cases. As the bar chart in Figure 7 shows, the overall tone of the comments is neutral, with a 47.74% peak at 0.0. Unlike the previous examples, the sentiment polarities here are more widely scattered, with more high positive and high negative scores (9.5, 10, -12, -15, -14).
Entity extraction enabled us to retrieve references to the GSK and Roche treatments (“Vemurafenib”, “Dabrafenib”, “Trametinib”, “combo”) as well as references to other melanoma treatments, such as Yervoy (Ipilimumab), Dacarbazine, IL-2 and Interferon Alfa. We could also apply sentiment classification to these entities in order to understand users' opinions about them. Figure 8 shows some treatment entities in a tag cloud.
The review of the comments in the social media postings enabled us to retrieve several references to side effects, such as “diarrhea”, “fever”, “fatigue”, “colitis”, “itching” and “weight loss”, and the treatments associated with them (Dabrafenib, Yervoy, Interferon). In some cases, different expressions refer to the same adverse event, such as “tiredness” or “feel fatigued” for “fatigue”. Some of the side-effect mentions are shown with other entities in the tag cloud in Figure 9.
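Grouping different expressions under one adverse event, as in the “tiredness” and “feel fatigued” example, could be approached with a simple synonym table. The Python sketch below is illustrative only: the table `AE_SYNONYMS` and the function `count_adverse_events` are assumptions, and a real pharmacovigilance workflow would more likely map terms to a controlled vocabulary.

```python
import re
from collections import Counter

# Hypothetical synonym table: canonical adverse event -> surface forms.
AE_SYNONYMS = {
    "fatigue": ["fatigue", "fatigued", "tiredness", "tired"],
    "diarrhea": ["diarrhea", "diarrhoea"],
    "fever": ["fever", "fevers"],
    "itching": ["itching", "itchy"],
}

# One case-insensitive whole-word pattern per canonical term.
_PATTERNS = {
    canonical: re.compile(
        r"\b(?:" + "|".join(re.escape(v) for v in variants) + r")\b",
        re.IGNORECASE,
    )
    for canonical, variants in AE_SYNONYMS.items()
}


def count_adverse_events(comments):
    """Count how many comments mention each canonical adverse event."""
    counts = Counter()
    for comment in comments:
        for canonical, pattern in _PATTERNS.items():
            if pattern.search(comment):
                counts[canonical] += 1  # at most once per comment
    return counts


if __name__ == "__main__":
    sample = [
        "I feel fatigued all day",
        "The tiredness is the worst part",
        "Fevers and itching after week two",
    ]
    print(count_adverse_events(sample))
```

Counting each event at most once per comment matches the per-text counting used for the tag clouds, where size reflects the number of texts containing a word.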
This work was a first attempt to demonstrate the feasibility of a sentiment analysis approach that used digital media monitoring to gain insight into the clinical use profile of the GSK and Roche melanoma treatments. These two products were selected for this pilot because they have recently obtained marketing approval and therefore elicited considerable media attention.
Taking into account that our algorithm was still in pilot mode, this first high-level analysis has shown that sentiment analysis has the potential to show differences between the two products, as well as to provide information about similar treatments and adverse events. We are aware that these results depend on the number of sources analyzed and that the precision of the analysis may improve as the number of sources increases.
We intend to conduct additional analyses to improve the method, both in terms of the data sources reviewed and analyzed and in terms of algorithm refinement. The plan is to focus on the following aspects:
In closing, we present a list of the data sources we have used for the analysis.
This list could be extended for further analyses, both in scope (number of social media sites as well as Twitter search terms and types of therapies) and in time frame.
Over 1,000 Twitter timeline messages dated from 01/11/2014 to 01/12/2014 containing the following keywords:
- Vemurafenib
- Dabrafenib
- Ipilimumab
- Zelboraf
- Tafinlar
- Yervoy
Forum:
- Macmillan Cancer Support's Online Community:
http://community.macmillan.org.uk/search/default.aspx#q=dabrafenib
http://community.macmillan.org.uk/search/default.aspx#q=zelboraf
- Cancer Survivors Network Community:
http://csn.cancer.org/forum/145/search?body=Zelboraf&title=
- DailyStrength patient forum:
Facebook:
- AIM at Melanoma public Facebook fan page:
https://www.facebook.com/AIMatMelanoma
- Melanoma UK public Facebook fan page:
https://www.facebook.com/MelanomaUK
- Melanoma International Foundation public Facebook fan page:
https://www.facebook.com/MelanomaInternationalFoundation
Disclosure of potential conflict of interest:
This publication was supported by Ethical GmbH and ALTILIA srl. There was no support by or involvement of Roche and GSK.
[1] A Facebook public fan page is the official profile of a company or organization on the social network.
[2] Historical archives of tweets, available from licensed tweet providers (e.g. Gnip), could be used to go beyond the limits dictated by Twitter Inc. for the online availability of tweets issued in the past.
[3] A positive number indicates a positive sentiment, a negative number indicates a negative sentiment, and 0.0 indicates a neutral sentiment.
[4] The size of each word is related to the number of its occurrences in the texts; the greater the number of texts containing a word, the bigger the font size. The colors are not significant.