Java is often criticized for making CSV parsing cumbersome and for its weak support for matrix and vector manipulation. This makes it hard to implement certain types of machine learning algorithms easily and efficiently. In many cases data scientists choose R or Python for modeling and problem solving, and you, as a Java developer, have to rewrite R algorithms in Java or integrate many small Python scripts into a Java application.
But why are so many high-load tools such as Cassandra, Hadoop, Giraph, and Spark written in Java or run on the JVM? What is the secret behind their successful implementation and operation? Maybe we should abandon the old manufacturing-style approach of separating developers from research engineers in production projects?
During the talk, we will discuss typical data mining tasks, the advantages and disadvantages of the Hadoop ecosystem, the battle between Spark and Hadoop for a place in the sun, and the differences between popular machine learning tools and libraries.
Attendees of my talk will become more familiar with the abbreviations and buzzwords of the field and will get useful tips on how to continue learning in this area on their own.