Deep Learning Sentiment Analysis for Movie Reviews using Neo4j

Sentiment analysis uses natural language processing to extract features of a text that relate to subjective information found in source materials.

Movie Review Sentiment Analysis

A movie review website allows users to submit reviews describing what they liked or disliked about a particular movie. Being able to mine these reviews and generate valuable metadata that describes their content provides an opportunity to understand the general sentiment around that movie in a democratized way. That’s a pretty cool thing if you think about it. Using machine learning we can democratize subjectivity about anything in the world. We can make an objective analysis of subjective content, giving us the ability to better understand trends around products and services, which we can use to make better decisions as consumers.

Sentiment Analysis Data Model

One of the major barriers to unlocking this ability is the way we structure and transform our data. The current state-of-the-art methods include approaches such as Naive Bayes, Support Vector Machines, and Maximum Entropy. The challenge with these approaches remains how features are extracted from a text and structured as data in a way that is least costly in terms of performance. I decided to focus on solving the problem of performance: the way features are selected and extracted, and the availability of that data as the number of features grows over time.

Using a feature selection algorithm I describe here, I used the graph database Neo4j to solve the challenge of data transformation and availability. While the state-of-the-art natural language parsing algorithms are focused on sentence structure, I’ve decided to pursue a statistical approach to natural language grammar induction. My approach focuses on generalizations across a vast corpus of text, generating new features using deep learning to predict the features with the highest probability of being present to the left or right of a new feature.

Graph-based NLP Example

Let’s assume that the phrase “one of the worst” has been extracted as a feature of a set of texts. This phrase was extracted because the phrase it descended from determined it to be the most statistically relevant, meaning that it had the best chance of being matched after the parent phrase. Using Neo4j we can determine the line of inheritance that produced this phrase as a feature.
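To make the idea of a line of inheritance concrete, here is a minimal sketch in Python. It is not the Neo4j data model from the post: it uses a plain in-memory dictionary as a stand-in for the graph, and the class name, the method names, the intermediate phrases and all match counts are hypothetical values chosen only for illustration.

```python
from collections import defaultdict


class PhraseGraph:
    """In-memory stand-in for a graph of phrase features (not Neo4j)."""

    def __init__(self):
        self.parent = {}                    # child phrase -> parent phrase
        self.children = defaultdict(dict)   # parent phrase -> {child: match count}

    def add_descendant(self, parent, child, matches):
        """Record that `child` was matched `matches` times after `parent`."""
        self.parent[child] = parent
        self.children[parent][child] = matches

    def most_relevant_child(self, parent):
        """Return the child phrase matched most often after `parent`, or None."""
        kids = self.children.get(parent)
        if not kids:
            return None
        return max(kids, key=kids.get)

    def lineage(self, phrase):
        """Walk parent links up to the root and return the path root-first."""
        path = [phrase]
        while path[-1] in self.parent:
            path.append(self.parent[path[-1]])
        return list(reversed(path))


# Hypothetical phrases and counts, for illustration only.
graph = PhraseGraph()
graph.add_descendant("one of", "one of the", 120)
graph.add_descendant("one of", "one of my", 45)
graph.add_descendant("one of the", "one of the worst", 30)
graph.add_descendant("one of the", "one of the best", 25)

print(graph.lineage("one of the worst"))
print(graph.most_relevant_child("one of the"))
```

In a graph database the same lookup would be a path traversal from the feature node back to its root, rather than a loop over a dictionary.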


#JavaOne Replay: #eBay and #JavaServer #Faces

During JavaOne, Sushma Sharma and Ken Paulsen of eBay did a session on Gandalf: ‘eBay, Connecting Buyers and Sellers Globally via JavaServer Faces’.

Gandalf is a ‘Quick Listing tool’. It is a JSF-based tool that lets non-professional eBay users list their items in order to sell them. Since Gandalf targets non-professional users, the tool and its user interface have to be intuitive and simple. And as mentioned during the session, that type of user represents the largest population among eBay sellers. Gandalf is an application that is very demanding in terms of features and requirements. Security is obviously a top requirement, but accessibility, responsiveness, … are also very important.

And last but not least, Gandalf is widely used. On a typical day, Gandalf is used by around 200,000 sellers who are adding around 800,000 new listings (again, this is per day!). And that number can grow up to 2.5 million listings on a peak day! So Gandalf is not really a typical enterprise application; it is more of a large-scale, end-user-facing web application.

During their session, Sushma and Ken discussed JSF and how JSF can scale, and they also shared a few JSF tips. Despite the small technical issues (e.g. a few audio hiccups), this J1 session replay is particularly interesting because it kills the ‘JSF doesn’t scale’ myth: it clearly demonstrates that well-designed JSF applications can cope with very demanding requirements.

Surprise! Java is fastest for server-side Web apps

Looks like Oracle’s continued push for Java everywhere, from the “Internet of things” on down, isn’t just based on hype. At least one set of numbers puts Java’s performance head and shoulders above that of the competition for server-side Web frameworks. But is performance alone enough to win over the non-Java faithful?

Since March 2013, software development firm TechEmpower has been running an ongoing series of performance benchmarks for dozens of popular Web application server frameworks, such as Ruby on Rails or Django. Each successive round of tests has benefited from community feedback, with the benchmarks themselves released as open source on GitHub. Those interested in having their own frameworks benchmarked can fork the code, add their own tests, and submit the results.

When the seventh round of TechEmpower benchmarking concluded at the end of October — with 84 frameworks and some 200 different test permutations — the dust settled to reveal that many of the frameworks that performed best across the board were Java-based. Four frameworks in particular stand out: Gemini, Grizzly (created to allow easy use of Java’s New I/O API), Undertow, and Vertx.

What’s most striking is how many of the frameworks that are better known — such as Sinatra for Ruby, various ASP.Net frameworks, and the aforementioned Django for Python — had performance that ranked sometimes orders of magnitude below the big winners. The new kid on the block, Node.js, did exhibit impressive performance, but still only clocked one-fourth to one-third the performance of the fastest contenders.

Oracle’s been beating the Java drum quite loudly of late as a one-size-fits-all solution to, well, everything, but definitely as a solution for building robust Web services. Its plans for Java 8 involve unifying Java’s various editions to make it easier to write code across both embedded devices and servers, which, if the ARM-in-the-server contingent has its way, may come to resemble each other more. And one of the biggest Java-related pushes is Project Avatar, a JavaScript and HTML5 services layer for Java that works with (guess what?) Grizzly.

Java IO Faster Than NIO – Old is New Again!

Alex Blewitt tweeted an article by Paul Tyma titled: Thousands of Threads and Blocking I/O: The old way to write Java servers is new again. Paul is the Founder/CEO of ManyBrain, the creator of Mailinator.

Paul’s 65-slide presentation is a fast read for anyone interested in Java I/O, especially in a client/server setup. What makes the presentation interesting is that Paul began his research of IO vs. NIO with the presumption that all Java developers are running around with: NIO is faster than IO because it’s asynchronous and non-blocking.

The more research he did, the more he found everyone repeating that claim, but he also found a complete lack of benchmarks and research to back it up. Paul sat down and wrote up a quick “blast the server with data” benchmark and found that in every case the NIO-based server was 25% slower than the blocking, thread-based IO server.
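For readers who have not seen the “old way” spelled out, the following is a rough sketch of a blocking, thread-per-connection server. It is written in Python rather than Java to keep it short, it is not Paul’s benchmark code, and it says nothing about relative performance. The function names (handle_client, serve, echo_once) are made up for this illustration.

```python
import socket
import threading


def handle_client(conn):
    """Echo bytes back until the peer closes; recv() blocks only this thread."""
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:          # empty read: the peer closed its side
                break
            conn.sendall(data)


def serve(listener):
    """Accept loop: hand each accepted connection to its own thread."""
    while True:
        conn, _addr = listener.accept()   # blocks until a client connects
        threading.Thread(target=handle_client, args=(conn,), daemon=True).start()


def echo_once(payload):
    """Exercise handle_client over a connected socket pair (no port needed).

    Intended for small payloads only.
    """
    server_side, client_side = socket.socketpair()
    worker = threading.Thread(target=handle_client, args=(server_side,), daemon=True)
    worker.start()
    with client_side:
        client_side.sendall(payload)
        client_side.shutdown(socket.SHUT_WR)   # signal end of input
        chunks = []
        while True:
            chunk = client_side.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    worker.join(timeout=5)
    return b"".join(chunks)
```

Each connection gets a dedicated thread whose reads and writes simply block, so the handler reads as straight-line code. An NIO-style server would instead multiplex many connections on a few threads through a selector. To run serve, pass it a bound and listening TCP socket.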