Passionate data mining and backend application developer.
His work projects involve Java web frameworks, high-load systems, and NoSQL technologies.
He uses Hadoop and Spark for data processing. His research projects involve large-scale road graph data processing, social network analysis, data mining, machine learning, traffic jam prediction, and mathematical models of complex systems.
He is the leader of the Google Developer Group in Russia and a Joker/Mobius program committee member. He also organized Java and Android conferences and GDG DevFest in Omsk in 2013-14. He now works at EPAM as a Senior Training and Development Specialist.
Java is often criticized for making CSV datasets hard to parse and for its poor support for matrix and vector manipulation. This makes it difficult to implement certain types of machine learning algorithms easily and efficiently. In many cases data scientists choose R or Python for modeling and problem solving, and you, as a Java developer, have to rewrite R algorithms in Java or integrate many small Python scripts into a Java application.
But why are so many high-load tools such as Cassandra, Hadoop, Giraph, and Spark written in Java or run on the JVM? What is the secret of implementing and running them successfully? Maybe we should abandon the old manufacturing approach of separating developers from research engineers in production projects?
During the talk, we will discuss typical data mining tasks, the advantages and disadvantages of the Hadoop ecosystem, the battle between Spark and Hadoop for a place in the sun, and the differences between popular machine learning tools and libraries.
Attendees of my talk will become more familiar with the various abbreviations and buzzwords, and will also get useful tips on self-education in this area.