Cognitive AI is a unique market, where seemingly similar services are offered at prices that differ by up to 100x.
Also, the quality of service depends on how similar the data used by the vendor to train its models is to the data to be processed. All of this makes comparison and evaluation a must.
At the same time, evaluation is expensive and cumbersome because of the vast differences among the service APIs. Intento simplifies it by providing the same interface to all those services.
Recently we have integrated many Sentiment Analysis services. In this overview, we share some insights we got during the integration. We compare 15 Cloud Sentiment Analysis services, which support a total of 23 languages. The prices vary from $1 to $99 per 10,000 API calls across base tiers of different vendors.
Slides from this report are also available on Slideshare.
We have compared 15 Cloud Sentiment Analysis services with pre-built models, provided via public APIs.
Four of them are hosted on the Algorithmia marketplace. We evaluate services with a public API and public pricing, with two exceptions: Salesforce Einstein, which is bundled with a Salesforce subscription, and TheSay PreCeive, which does not have public pricing.
Here is the list in alphabetical order:
- LexSent by hyindao at Algorithmia
- Sentiment Analysis by mtman at Algorithmia
- Sentiment Analysis by nlu at Algorithmia
- Social Sentiment Analysis by nlu at Algorithmia
- Amazon Web Services Comprehend
- Aylien Text Analysis
- Boson NLP Sentiment Analysis
- Google Cloud Natural Language
- IBM Watson NLU
- Meaning Cloud Sentiment Analysis
- Microsoft Cognitive Services Text Analytics
- Repustate Text Analytics
- Salesforce Einstein Language
- TheSay PreCeive
- Twinword Sentiment Analysis
You may find the full list of services we compared together with links and some other details in the Slideshare deck.
Sentiment analysis aims to determine the attitude of a speaker, writer, or other subject with respect to some topic or the overall contextual polarity or emotional reaction to a document, interaction, or event. [Wikipedia]
Here we focus on services that analyze the expressed attitude towards an object, as opposed to Emotion Analysis, which evaluates the emotional state of the writer.
Sentiment polarity is the main metric of sentiment. Most of the services represent it as a sentiment score within some range between negative and positive ([-1,1] or [0,1]). The alternative approach is to return a set of sentiment labels with a confidence score attached to each label. Different services provide labels for a different number of steps on this scale. One important distinction is Mixed sentiment, which may be indicated by a dedicated label in the latter approach but typically cannot be expressed in the former. Meaning Cloud solves that by introducing a sentiment agreement score. Google Cloud’s sentiment magnitude may address the same problem, provided it can be high for neutral sentiment (this needs testing).
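To compare scores from services that use different ranges, the scores first have to be brought onto a common scale. Below is a minimal sketch in Python, assuming a plain linear rescaling is acceptable, which in turn assumes that the midpoint of each vendor's range corresponds to neutral sentiment. The function name and parameters are hypothetical and not part of any vendor API.

```python
def to_signed_unit_range(score: float, low: float, high: float) -> float:
    """Linearly map a score from [low, high] onto [-1, 1].

    Assumption: the midpoint of [low, high] means "neutral" for the
    vendor in question. This is not guaranteed by any of the services.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    if not low <= score <= high:
        raise ValueError("score is outside the declared range")
    return 2.0 * (score - low) / (high - low) - 1.0


# A score of 0.5 on a [0, 1] scale lands on the neutral point of [-1, 1].
neutral = to_signed_unit_range(0.5, low=0.0, high=1.0)
```

A score that is already in [-1,1] passes through unchanged, so the same function can be applied to every service before comparison.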
IBM Watson, Meaning Cloud and Salesforce Einstein enable building custom models to account for slang or language used in a specific domain. Some other services allow adding custom sentiment dictionaries (lists of words with attached sentiment polarity scores), which we did not count as custom models here.
Sometimes a sentence-level sentiment score is not enough. One case is when the subject can be evaluated along different dimensions or aspects. An example is a restaurant review, which may combine sentiment towards service, meals, and prices in one sentence.
Aspect-based models are always domain-specific. We have found aspect-based sentiment in Aylien, Meaning Cloud and Repustate, with different domain models available at each of the services.
Another way to get more details is to perform entity extraction and then analyse sentiment towards each of the entities mentioned in the sentence. This is supported by Google Cloud Natural Language. However, it is not clear whether it is augmented with cross-sentence anaphora resolution, which should be especially important for conversational texts.
Additionally, Aylien and Meaning Cloud provide a sentiment subjectivity score, which measures how subjective the writer's opinion is.
Surprisingly, only Meaning Cloud provides explicit irony detection. It is not clear if it is used in other models implicitly.
Combined, the services we compared support 23 languages, while a single service supports 17 languages at most (Microsoft and Repustate).
English is the most widely supported (13 vendors, as two of them, LexSent and Boson NLP, are Chinese-only), followed by Spanish (8 vendors), German and French (6 vendors each).
Sentiment Analysis services are typically priced per number of API calls. Two important exceptions are:
- Amazon Comprehend is billed in “NLP units”, one unit per 100 characters processed, with a minimum of 3 units per request. We count it as 3 units per request.
- Services available via Algorithmia are billed by the API request execution time, which is a little counterintuitive (the slower the service works, the more the client pays). We use the average request time provided by Algorithmia. However, these times change almost daily, so the prices may not be accurate.
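The two billing schemes above can be sketched as follows. This is a rough illustration in Python rather than either vendor's actual billing logic: the function names are hypothetical, and the per-second rate is a placeholder to be filled in from the marketplace's current price list.

```python
import math


def comprehend_units(text: str, chars_per_unit: int = 100, min_units: int = 3) -> int:
    """Billable units for one request: one unit per 100 characters,
    rounded up, with a minimum of 3 units per request."""
    return max(min_units, math.ceil(len(text) / chars_per_unit))


def time_billed_cost(avg_seconds: float, price_per_second: float,
                     calls: int = 10_000) -> float:
    """Cost of `calls` requests when billing is by execution time.

    `price_per_second` is an assumed input, not a published rate.
    """
    return avg_seconds * price_per_second * calls


# A short tweet-sized text still costs the 3-unit minimum.
short_text_units = comprehend_units("Great service, would recommend!")
```

Under the first scheme any text of up to 300 characters costs the same 3 units, which is why we count 3 units per request. Under the second, the estimate moves whenever the average execution time does.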
We do not include Salesforce Einstein and TheSay PreCeive in our price comparison, as they do not provide public pricing.
Overall, the pricing models differ a lot, from true pay-as-you-go to flat monthly payments with quotas and overage. We have estimated the volume-based price per 10,000 API calls for each of the services.
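To put such different pricing models on one scale, each plan can be reduced to an effective price per 10,000 calls at a given monthly volume. The sketch below shows one way to do it in Python; it is not necessarily the exact method behind our estimates, and the function names and the plan figures in the example are made up for illustration.

```python
def monthly_cost(calls: int, monthly_fee: float, included_calls: int,
                 overage_per_call: float) -> float:
    """Total monthly bill: flat fee plus overage beyond the quota.

    Pay-as-you-go is the special case monthly_fee=0, included_calls=0.
    """
    extra_calls = max(0, calls - included_calls)
    return monthly_fee + extra_calls * overage_per_call


def price_per_10k(calls: int, monthly_fee: float, included_calls: int,
                  overage_per_call: float) -> float:
    """Effective price per 10,000 calls at the given monthly volume."""
    if calls <= 0:
        raise ValueError("calls must be positive")
    total = monthly_cost(calls, monthly_fee, included_calls, overage_per_call)
    return total / calls * 10_000


# Hypothetical plan: $100 per month, 100,000 calls included, $0.002 per extra call.
half_quota = price_per_10k(50_000, 100.0, 100_000, 0.002)
full_quota = price_per_10k(100_000, 100.0, 100_000, 0.002)
```

With a flat plan the effective price falls as usage approaches the quota, so the same plan can land in different price groups depending on the volume assumed.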
We split all services into four price groups: $1-$3, $4-$10, $15-$30 and > $50 per 10,000 API calls.
There are three “strange” tiers where the price increases with volume. Most likely, for Repustate and Twinword this is meant to incentivize clients to contact sales, as a bigger volume discount requires some commitment. For Aylien, the price spike may be due to the wider range of services available at this tier at the same price.
This huge 100x price difference probably arises because the least expensive services perform only Sentiment Analysis, while others bundle it with varying numbers of other Text Analytics services.
Here, we do not account for document length limits imposed by different services, which may significantly change the per-call price for a specific use.
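To see how a length limit can change the picture, consider a document that has to be split into several requests. The following Python sketch assumes the limit is expressed in characters and that a long document is simply cut into chunks, which may itself affect the sentiment returned. The names and numbers are hypothetical.

```python
import math


def calls_needed(doc_chars: int, max_chars_per_call: int) -> int:
    """Number of API calls required to cover one document."""
    if max_chars_per_call <= 0:
        raise ValueError("max_chars_per_call must be positive")
    return max(1, math.ceil(doc_chars / max_chars_per_call))


def price_per_10k_documents(price_per_10k_calls: float, doc_chars: int,
                            max_chars_per_call: int) -> float:
    """Effective price for 10,000 documents of the same length."""
    return price_per_10k_calls * calls_needed(doc_chars, max_chars_per_call)


# A 2,500-character review under a 1,000-character limit takes 3 calls.
review_calls = calls_needed(2_500, 1_000)
```

For short texts such as tweets the limit does not matter, while for long reviews or articles a service with a low per-call price may end up costing more than one with a higher price and a generous limit.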
As we show in the feature comparison, Sentiment Analysis services from different vendors may be used interchangeably, as sentiment scores can be estimated from labels and confidence, and vice versa. There is some uncertainty over mixed sentiment.
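One possible way to convert between the two representations is sketched below in Python. It assumes a three-label scheme, takes the score as the confidence-weighted mean of the label polarities, and uses an arbitrary threshold for the reverse direction. The label names, polarity values, and threshold are illustrative assumptions and do not come from any particular vendor.

```python
# Assumed polarity of each label on a [-1, 1] scale.
DEFAULT_POLARITY = {"negative": -1.0, "neutral": 0.0, "positive": 1.0}


def score_from_labels(confidences, polarity=DEFAULT_POLARITY):
    """Estimate a [-1, 1] score as the confidence-weighted mean polarity."""
    total = sum(confidences.values())
    if total <= 0:
        raise ValueError("confidences must sum to a positive value")
    weighted = sum(polarity[label] * conf for label, conf in confidences.items())
    return weighted / total


def label_from_score(score, neutral_band=0.25):
    """Map a [-1, 1] score to a label; neutral_band is an arbitrary choice."""
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


# Evenly split confidence collapses to a score of 0.0, which reads as neutral.
mixed_score = score_from_labels({"negative": 0.5, "positive": 0.5})
mixed_label = label_from_score(mixed_score)
```

The last two lines illustrate the uncertainty around mixed sentiment: once strongly positive and strongly negative signals are averaged into a single score, they cannot be told apart from a genuinely neutral text.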
The 100x price difference (even 130x with the top AWS tier) requires careful analysis of feature requirements and performance evaluations. We’re working on that.
Today, the language coverage is limited to 23 languages. The list may be extended using Machine Translation, which is available for roughly 13,000 language pairs.