Sorry State of Sentiment Analysis?

While playing around with the searchable tweet map I brought back to life via node.js so that it could work with the Twitter 1.1 API, I figured it would be cool to see if it were possible to add a sentiment analysis component to it. Maybe I could subtly color-code the tweets by sentiment.
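To make the color-coding idea concrete, here is a minimal sketch of how a numeric sentiment score might be turned into a subtle color for a tweet marker. The helper name `sentimentColor`, the assumed score range of roughly -5 to +5, and the specific muted red/green/gray values are all hypothetical choices for illustration, not part of the original map code.

```javascript
// Hypothetical helper: map a sentiment score to a subtle CSS color.
// Assumes scores fall roughly in [-maxAbs, maxAbs]; anything outside is clamped.
function sentimentColor(score, maxAbs = 5) {
  // Normalize the score to [-1, 1].
  const t = Math.max(-1, Math.min(1, score / maxAbs));
  const neutral = [128, 128, 128];                       // gray for neutral tweets
  const target = t < 0 ? [200, 60, 60] : [60, 160, 60];  // muted red / muted green
  const k = Math.abs(t);
  // Linear blend from the neutral gray toward the target color.
  const [r, g, b] = neutral.map((c, i) => Math.round(c + (target[i] - c) * k));
  return `rgb(${r}, ${g}, ${b})`;
}

console.log(sentimentColor(0));   // rgb(128, 128, 128)
console.log(sentimentColor(2.5)); // rgb(94, 144, 94)
console.log(sentimentColor(-10)); // rgb(200, 60, 60)
```

Blending from a neutral gray keeps most tweets, which tend to score near zero, visually quiet, so only the strongly positive or negative ones stand out.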

It turns out there are really easy ways to get an estimate of sentiment with just a few lines of code and one of the node sentiment modules, but the ones I found were all based on single-word approaches. They all seemed to use the AFINN word list from 2011 and simply sum up the individual word scores. That can of course be wrong whenever the sentiment comes from several words together rather than from each word on its own, and that is the case a lot of the time.
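A minimal sketch of the word-sum idea shows where it breaks down. The tiny word list below only imitates AFINN's format (integer scores from -5 to +5); its entries and values are illustrative, not copied from the real list, and `wordSumScore` is a hypothetical name rather than any module's actual API.

```javascript
// Toy AFINN-style lexicon: word -> integer score in [-5, 5].
// Entries are illustrative only. A Map avoids accidental hits on
// Object.prototype keys such as "constructor".
const LEXICON = new Map([
  ["good", 3],
  ["great", 3],
  ["love", 3],
  ["bad", -3],
  ["terrible", -3],
  ["hate", -3],
]);

// Score text by summing the scores of individual words.
// Words not in the lexicon contribute 0; word order and context are ignored.
function wordSumScore(text) {
  const words = text.toLowerCase().match(/[a-z']+/g) || [];
  return words.reduce((sum, w) => sum + (LEXICON.has(w) ? LEXICON.get(w) : 0), 0);
}

console.log(wordSumScore("I love this phone"));             // 3
console.log(wordSumScore("This phone is not good at all")); // 3, though the text is negative
console.log(wordSumScore("Not bad, not bad at all"));       // -6, though the text is positive
```

Because each word is scored in isolation, "not good" still counts as positive and "not bad" as negative; getting these right requires looking at how words combine.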

It's a hard problem, I guess, but I was disappointed.

I came across a recent open-source engine by Stanford's NLP group, but they seemed to mainly just use Amazon's Mechanical Turk to get sentiment labels for movie reviews from a few years ago, coupled with a lot of fancy stats/methods to derive a model. It's written in Java, and it's huge. Not a few lines in node.js (unless you hit their demo web page every time you want the sentiment of a tweet, which I wouldn't recommend).

After a little bit of googling and reading various recent papers, I was even more disappointed with where things seem to be.

Of course, Twitter is sitting on the goldmine.

I seem to recall that Twitter sometimes returns some sort of sentiment value in the guts of the tweet JSON, but not all the time. The fact that they don't return it every time suggests to me that they probably don't have a model that can estimate these sentiment values quickly. Or I guess they could be holding back on giving those values out freely, but that seems a bit counter to their general openness.

Nevertheless, I had assumed we were farther along in this area. Hopefully, I'm missing something trivial here.
