Sentiment Analysis: Understand Your Healthcare Customers

March 17, 2015

Applied Clinical Trials

Sentiment analysis can give healthcare organizations a competitive edge in understanding what customers think about their healthcare experience. That understanding can help reduce costs, improve care services and open avenues for new clinical research and treatments. Sentiment analysis also taps into a new source of pharmacovigilance information that can enable Marketing Authorization Holders to keep abreast of opinions on the safety of their products in real time.

In the context of medicinal products and devices, sentiment can refer to an adverse event experience as well as to a positive treatment outcome.

Sentiment is deduced as the final output of a process that collects large volumes of unstructured information from any sources judged relevant and then processes that information to identify and extract the implicit subjective judgment or evaluation.

Proof of concept: Roche-GSK competition on melanoma therapy

The main goal of the proof of concept (PoC) is to show that a sentiment analysis approach can be applied to clinical data in the context of social media monitoring, data analysis and reporting. To obtain a meaningful quantitative analysis, we collected data from different sources. Because this is a proof of concept, we deliberately limited the scope of the data investigation. We expect that the larger data volumes of a production environment would bring substantial quantitative improvements and refinements to the analysis.

Specifications

This proof of concept focuses on sentiment analysis of opinions shared on the Web about two melanoma treatments marketed by Roche and GSK: Roche’s single-agent Zelboraf (vemurafenib) and GSK’s combination of Mekinist (trametinib) and Tafinlar (dabrafenib).

In this study we gathered user comments from three different types of data sources: online patient fora, Facebook public fan pages[1] and the microblogging service Twitter. We collected tweets daily over a one-month period to work around the Twitter Search API’s limits on the number and age of tweets that can be retrieved.[2]

Customization

After a preliminary standard analysis, we noticed that in some cases the standard emotional valences associated with particular words were not correct in this specific domain. In colloquial English, for example, the words “positive” and “negative” almost always carry a positive and a negative valence, respectively. When they refer to a treatment or a disease in a clinical context, however, they may mean something quite different: a patient who “tested positive” is reporting a test result, not a favorable opinion. We therefore customized the sentiment analysis algorithm with rules specialized for the healthcare domain.
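The kind of domain rule described above can be illustrated with a minimal lexicon-based sketch. The lexicon, the list of clinical context words and the three-token window are hypothetical choices made for illustration; the article does not describe the actual rules used.

```python
import re

# Hypothetical general-purpose lexicon: word -> valence score.
GENERAL_LEXICON = {
    "positive": 1.0,
    "negative": -1.0,
    "great": 2.0,
    "awful": -2.0,
    "hope": 1.0,
}

# Hypothetical clinical context words: near one of these, "positive"/"negative"
# is read as a test result rather than as an opinion.
CLINICAL_CONTEXT = {"test", "tested", "tests", "scan", "scans", "biopsy",
                    "braf", "mutation", "result", "results"}
CONTEXT_SENSITIVE = {"positive", "negative"}


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def domain_score(text: str, window: int = 3) -> float:
    """Sum lexicon valences, neutralizing context-sensitive words used clinically."""
    tokens = tokenize(text)
    total = 0.0
    for i, tok in enumerate(tokens):
        valence = GENERAL_LEXICON.get(tok, 0.0)
        if tok in CONTEXT_SENSITIVE:
            neighbours = tokens[max(0, i - window): i + window + 1]
            if CLINICAL_CONTEXT.intersection(neighbours):
                valence = 0.0  # domain rule: clinical usage carries no opinion
        total += valence
    return total
```

With these rules, "I tested positive for BRAF" scores 0.0, while "Feeling positive and full of hope" keeps its positive score. A real system would need many more rules, including ones for negation.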

For future analyses, we plan to refine the customization further so that it better handles the unstructured terminology typical of social media (abbreviations, slang, spelling mistakes, etc.), both in the general sentiment analysis of texts and in adverse event descriptions.

Sources and analysis

Twitter

We collected tweets about Roche’s and GSK’s therapies over a one-month period and then performed sentiment analysis using the following procedure:

A data collection task extracted tweets by keyword search, using both the INNs and the trade names of all drugs involved. Searching for abbreviations, vernacular names, acronyms, etc. of the products was out of scope for the PoC.

We applied a general sentiment analysis algorithm to the tweet texts to determine the overall tone expressed by users.

Entity extraction enabled us to identify the sentiment associated with each of the two companies and their respective treatments.

Finally, we visualized the results in dashboards and created reports comparing them in tabular or other descriptive form.
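The keyword-based collection step can be sketched as a case-insensitive, whole-word match on the INNs and trade names listed in the appendix. This is a hypothetical illustration, not the extractor actually used. Retrieval from the Twitter API is not shown, and the `SEARCH_TERMS` mapping and the function names are assumptions.

```python
import re

# Search terms from the appendix (INNs and trade names), mapped to the product.
SEARCH_TERMS = {
    "vemurafenib": "Zelboraf",
    "zelboraf": "Zelboraf",
    "dabrafenib": "Tafinlar",
    "tafinlar": "Tafinlar",
    "ipilimumab": "Yervoy",
    "yervoy": "Yervoy",
}

# Whole-word, case-insensitive match: "#Zelboraf" matches, but substrings
# inside longer words do not.
_TERM_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, SEARCH_TERMS)) + r")\b", re.IGNORECASE
)


def products_mentioned(tweet_text: str) -> set[str]:
    """Return the products whose INN or trade name appears in the tweet."""
    return {SEARCH_TERMS[m.lower()] for m in _TERM_PATTERN.findall(tweet_text)}


def filter_tweets(tweets):
    """Keep only tweets that mention at least one search term, with their products."""
    result = []
    for text in tweets:
        products = products_mentioned(text)
        if products:
            result.append((text, products))
    return result
```

Mapping each INN and its trade name to the same product means that later entity-level counts do not split one drug across two labels.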

Once created, the workflows can be re-run repeatedly to monitor the opinions expressed on Twitter about the two companies over time.
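Because tweets were collected daily and the workflows are re-run, consecutive runs can return overlapping results. The following minimal sketch shows how daily batches might be merged without double-counting. It assumes each tweet carries a unique identifier, and the `Tweet` record and its field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Tweet:
    # Hypothetical record; real Twitter payloads carry many more fields.
    tweet_id: str
    day: date
    text: str


def merge_daily_batches(batches):
    """Merge daily search results, keeping the first copy of each tweet ID."""
    archive = {}
    for batch in batches:
        for tweet in batch:
            archive.setdefault(tweet.tweet_id, tweet)
    # Chronological order, so the analysis can be re-run over the whole period.
    return sorted(archive.values(), key=lambda t: (t.day, t.tweet_id))
```

Keying on the identifier rather than on the text keeps distinct retweets or identical messages from different users, which may or may not be desirable depending on the analysis.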

Results

We collected tweets about GSK’s and Roche’s treatments using both the trade names (Zelboraf, Tafinlar) and the INNs (vemurafenib, dabrafenib) as search terms. To improve the analysis, we also added Bristol-Myers Squibb’s treatment for metastatic melanoma, Yervoy (ipilimumab), to the search terms.

We then created a bar chart showing the distribution of sentiment polarities across the tweets (Figure 1). The numbers on the horizontal axis are the sentiment polarities assigned to the tweets by the sentiment classification algorithm. A positive number indicates a positive sentiment, a negative number a negative sentiment and 0.0 a neutral sentiment. The higher a positive number, the better the opinion; the larger the magnitude of a negative number, the worse the opinion. The percentage on each bar is the share of all tweets assigned the corresponding polarity. We summed the percentages of all positive values to obtain the total share of positive tweets, and did the same for negative values.
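The aggregation just described can be expressed in a few lines: first the share of texts per polarity value, then the sums over positive and negative values. The sketch assumes the classifier outputs one numeric polarity per text, and the function name is hypothetical.

```python
from collections import Counter


def polarity_distribution(scores):
    """Return (percent per polarity value, totals for positive/neutral/negative)."""
    if not scores:
        raise ValueError("no scores to aggregate")
    n = len(scores)
    counts = Counter(scores)
    # Share of texts assigned each polarity value (one bar per value).
    per_value = {v: 100.0 * c / n for v, c in sorted(counts.items())}
    totals = {
        "positive": sum(p for v, p in per_value.items() if v > 0),
        "neutral": per_value.get(0.0, 0.0),
        "negative": sum(p for v, p in per_value.items() if v < 0),
    }
    return per_value, totals
```

Before rounding, the three totals add up to 100%.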

The overall tone of the tweets was neutral (the 62.04% peak at 0.0 in Figure 1), followed by positive tweets (33.54%). Negative tweets made up only a small fraction (4.44%).

Figure 1. Polarities associated with tweets about Roche and GSK therapies[3]

The entity extraction algorithm returned a list of entities, i.e. names of products, brands, companies, etc., which we visualized as tag clouds (Figure 2). These interactive tools highlight the most relevant entities, with the importance of each tag indicated by its font size or color. As an example, Figures 3 and 4 show the sentiment polarities associated with the entities “Vemurafenib” and “Dabrafenib”, respectively, using the same type of chart as Figure 1.
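Footnote [4] describes the tag-cloud sizing rule: the more texts contain a term, the larger its font. One simple way to implement that rule is a linear mapping from document frequency to font size. The point sizes and the linear scale are assumptions; the article does not state how its tag-cloud tool computes sizes.

```python
from collections import Counter


def tag_cloud_sizes(entity_lists, min_pt=10.0, max_pt=40.0):
    """Map each entity to a font size that grows with its document frequency.

    entity_lists: one list of extracted entities per text (tweet, comment, ...).
    min_pt / max_pt: hypothetical font-size range in points.
    """
    doc_freq = Counter()
    for entities in entity_lists:
        # Count each entity at most once per text (document frequency).
        doc_freq.update({e.lower() for e in entities})
    if not doc_freq:
        return {}
    lo, hi = min(doc_freq.values()), max(doc_freq.values())
    span = (hi - lo) or 1  # if all counts are equal, every tag gets min_pt
    return {
        entity: min_pt + (count - lo) * (max_pt - min_pt) / span
        for entity, count in doc_freq.items()
    }
```

Entities are lowercased here so that "Vemurafenib" and "vemurafenib" count as one tag. A display layer would restore the preferred capitalization.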

Figure 4. Sentiment polarities of entities extracted from tweets containing the entity “dabrafenib”

Patient Forum

We extracted patient comments about Zelboraf and GSK’s dabrafenib therapy from three online patient fora to collect patients’ experiences with the two treatments. The three fora were the Macmillan Cancer Support Online Community forum, the Cancer Survivors Network discussion board and the DailyStrength health-related social network. Specifically, we carried out the following tasks:

Data collection using a Web extractor to gather comments on the topics of interest;

Sentiment analysis to classify texts into three groups by polarity: positive, negative or neutral;

Entity extraction to retrieve treatment mentions, classify the sentiment associated with them and, more importantly, monitor the side effects mentioned.

We then visualized results using dashboards and created reports.

Comparing the results can show which of the two companies (treatments) is more appreciated, and the extracted sentiment words can help explain why.
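Combining the three-way classification with the extracted treatment mentions gives a simple per-treatment comparison. In this sketch a comment is counted once for every treatment it mentions, and a score of exactly 0.0 is neutral, as in the charts. The input format and the function names are assumptions.

```python
from collections import defaultdict


def polarity_label(score: float) -> str:
    """Three-way classification: 0.0 is neutral, as in the article's charts."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def compare_treatments(comments):
    """comments: iterable of (polarity_score, set_of_treatments_mentioned)."""
    counts = defaultdict(lambda: {"positive": 0, "negative": 0, "neutral": 0})
    for score, treatments in comments:
        label = polarity_label(score)
        for treatment in treatments:
            counts[treatment][label] += 1
    # Convert counts to percentages within each treatment.
    shares = {}
    for treatment, c in counts.items():
        total = sum(c.values())
        shares[treatment] = {k: 100.0 * v / total for k, v in c.items()}
    return shares
```

Attributing a whole comment to every treatment it names is a simplification. Comments comparing two drugs would need sentence- or aspect-level sentiment to separate the opinions.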

Results

We applied sentiment analysis to the collected texts and displayed the results as bar charts (Figures 5 and 6). As above, the numbers on the horizontal axis are the sentiment polarities assigned to the texts by the sentiment classification algorithm. A positive number indicates a positive sentiment, a negative number a negative sentiment and 0.0 a neutral sentiment. The percentage on each bar is the share of all texts assigned the corresponding polarity. We totaled the percentages of all positive values to obtain the total share of positive texts, and did the same for negative values.

For the comments about Zelboraf, the overall tone was positive (40%), although there were also a few strongly negative cases (-10, -8.5). Neutral cases accounted for 23.85% (Figure 5).

The same analysis of the comments about the GSK therapy gave similar results: positive comments were again the largest group (49%), followed by negative comments (32%) and neutral comments (18.89%) (Figure 6).

Figure 5. Sentiment polarities associated to comments about Zelboraf therapy

This positive attitude is confirmed by the predominance of positive sentiment words, i.e. the words expressing the sentiments found in the texts, which can be visualized as a tag cloud (Figure 7).

Figure 7. Most frequent sentiment words

As in the Twitter example, the entity extraction algorithm returned a list of entities that included the different therapies (“Zelboraf”, “Dabrafenib”, “Pazopanib”, “Interferon”). We also retrieved several side effects. “Fevers”, “back pain” and “skin rash” were associated with the entity “Dabrafenib”. “Nausea”, “fatigue”, “rash”, “cough” and “headache” were associated with the entity “Zelboraf”.
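A simple way to associate side effects with treatments, as in the lists above, is to count the side-effect terms that appear in the same comment as each treatment entity. This co-occurrence approach is an assumption for illustration and is not necessarily how the entity extraction tool links them. Co-occurrence in a comment also does not show that the patient attributes the effect to that drug. The term lists come from the paragraph above, and the function names are hypothetical.

```python
import re
from collections import Counter

# Term lists taken from the examples in the text.
TREATMENTS = ["zelboraf", "dabrafenib", "pazopanib", "interferon"]
SIDE_EFFECTS = ["fevers", "back pain", "skin rash", "nausea", "fatigue",
                "rash", "cough", "headache"]


def _compile(terms):
    # Longest terms first, so "skin rash" is matched instead of just "rash".
    alternation = "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
    return re.compile(r"\b(" + alternation + r")\b", re.IGNORECASE)


_TREATMENT_RE = _compile(TREATMENTS)
_EFFECT_RE = _compile(SIDE_EFFECTS)


def side_effect_cooccurrence(comments):
    """Count (treatment, side effect) pairs mentioned in the same comment."""
    pairs = Counter()
    for text in comments:
        treatments = {m.lower() for m in _TREATMENT_RE.findall(text)}
        effects = {m.lower() for m in _EFFECT_RE.findall(text)}
        for t in treatments:
            for e in effects:
                pairs[(t, e)] += 1
    return pairs
```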

Facebook

We extracted data from the Facebook fan pages of three non-profit melanoma organizations. These are public Facebook pages where each organization shares news and information about melanoma and where other users can comment on the posts.

In total, we extracted 836 user comments on the messages posted by the organizations, in order to collect the melanoma experiences shared in them.

We used the following procedure of analysis:

Collected data using a dedicated task to extract posts and comments from the selected Facebook fan pages

Used a general sentiment analysis algorithm on the comment texts to determine the overall tone expressed by users

Retrieved references to the therapies of interest using entity extraction

Visualized results using charts

Results

We applied sentiment analysis to the comment texts and displayed the results as a bar chart (Figure 7). The numbers on the vertical axis are the sentiment polarities assigned to the texts by the sentiment classification algorithm. A positive number indicates a positive sentiment, a negative number a negative sentiment and 0.0 a neutral sentiment. The percentage on each bar is the share of all texts assigned the corresponding polarity. We summed the percentages of all positive values to obtain the total share of positive texts, and did the same for negative values.

As the bar chart in Figure 7 shows, the overall tone of the comments is neutral, with a peak of 47.74% at 0.0. Unlike in the previous examples, the sentiment polarities here are spread over a wider range, with more strongly positive and strongly negative scores (9.5, 10, -12, -15, -14).

Entity extraction enabled us to retrieve references to the GSK and Roche treatments (“Vemurafenib”, “Dabrafenib”, “Trametinib”, “combo”). It also retrieved references to other melanoma treatments, such as Yervoy (ipilimumab), dacarbazine, IL-2 and interferon alfa. Sentiment classification could then be applied to these entities to understand users’ opinions about them. Figure 8 shows some of the treatment entities as a tag cloud.

Reviewing the comments in the social media postings enabled us to retrieve several references to side effects, such as “diarrhea”, “fever”, “fatigue”, “colitis”, “itching” and “weight loss”. It also retrieved the treatments associated with them (Dabrafenib, Yervoy, Interferon). In some cases, different expressions referring to the same adverse event were retrieved, such as “tiredness” and “feel fatigued” for “fatigue”. Some of the side-effect mentions are shown with other entities in the tag cloud in Figure 9.
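Grouping different expressions of the same adverse event can be sketched as a synonym table applied before counting. The example in the text, “tiredness” and “feel fatigued” for “fatigue”, shows the idea. The table below is a hypothetical illustration built from the examples above; a production system would more likely map expressions to a controlled terminology such as MedDRA.

```python
import re

# Hypothetical synonym table: surface expression -> canonical adverse-event term.
AE_SYNONYMS = {
    "fatigue": "fatigue",
    "tiredness": "fatigue",
    "feel fatigued": "fatigue",
    "feeling fatigued": "fatigue",
    "diarrhea": "diarrhea",
    "diarrhoea": "diarrhea",
    "itching": "itching",
    "itchy": "itching",
}

# Longest expressions first, so multi-word phrases win over their sub-phrases.
_AE_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(s) for s in sorted(AE_SYNONYMS, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


def normalized_adverse_events(text: str) -> set[str]:
    """Return the canonical adverse-event terms mentioned in a comment."""
    return {AE_SYNONYMS[m.lower()] for m in _AE_PATTERN.findall(text)}
```

Normalizing before counting means that “tiredness” and “feel fatigued” contribute to a single “fatigue” tag instead of splitting its weight in the tag cloud.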

Conclusions and future work

This work was a first attempt to demonstrate the feasibility of a sentiment analysis approach that used digital media monitoring to gain insight into the clinical use profile of the GSK and Roche melanoma treatments. These two products were selected for this pilot because they had recently obtained marketing approval and had therefore attracted considerable media attention.

Our algorithm was still in pilot mode. Even so, this first high-level analysis has shown that sentiment analysis has the potential to reveal differences between the two products and to surface information about similar treatments and adverse events. We are aware that these results depend on the number of sources analyzed, and that the precision of the analysis may improve as more sources are added.

In future work we intend to conduct additional analyses to improve the method, both by extending the data sources reviewed and analyzed and by refining the algorithm. We plan to focus on the following aspects:

Obtain an in-depth analysis of different melanoma treatments and compare their strengths and weaknesses on the basis of sentiment analyses

Use the social media sites reviewed above to compare the risk-benefit profile of the GSK and Roche products against the Core Data Sheet / label. Specifically, verify whether previously unknown adverse events can be found, and whether the frequency of labeled adverse events differs from that expected from the label. Conversely, confirm that social media data corroborate the current benefit-risk profile of both products as described in the SmPC.

Appendix: Data Sources

In closing, we list the data sources used for the analysis.

This list could be extended for further analyses, both in scope (number of social media sites, Twitter search terms and types of therapies) and in time frame.

Twitter

Over 1,000 Twitter timeline messages dated from 01/11/2014 to 01/12/2014 containing the following keywords:

- Vemurafenib

- Dabrafenib

- Ipilimumab

- Zelboraf

- Tafinlar

- Yervoy

Forum:

- Macmillan Cancer Support's Online Community:

http://community.macmillan.org.uk/search/default.aspx#q=dabrafenib

http://community.macmillan.org.uk/search/default.aspx#q=zelboraf

- Cancer Survivors Network Community:

http://csn.cancer.org/forum/145/search?body=Zelboraf&title=

- DailyStrength patient forum:

http://www.dailystrength.org/

Facebook:

- AIM at Melanoma public Facebook fan page:

https://www.facebook.com/AIMatMelanoma

- Melanoma UK public Facebook fan page:

https://www.facebook.com/MelanomaUK

- Melanoma International Foundation public Facebook fan page:

https://www.facebook.com/MelanomaInternationalFoundation

Disclosure of potential conflict of interest:

This publication was supported by Ethical GmbH and ALTILIA srl. There was no support by or involvement of Roche and GSK.

[1] A Facebook public fan page is the official profile of a company or organization on the social network.

[2] Historical archives of tweets, available from licensed tweet providers (e.g. Gnip), could be used to go beyond the limits set by Twitter Inc. on the online availability of past tweets.

[3] A positive number indicates a positive sentiment, a negative number indicates a negative sentiment while 0.0 indicates a neutral sentiment

[4] The size of each word is related to the number of its occurrences in texts: the more texts contain a word, the bigger its font size. The colors are not significant.