
Hadoop and Java Expert for Deployment

Industry: Hi-Tech, Software

Specialization or Business Function

Technical Function: DevOps

Technology & Tools: Big Data and Cloud (Hortonworks, Apache Hadoop, Hadoop YARN), Programming Languages and Frameworks (Java), DevOps Tools (Docker)

CLOSED FOR BIDDING

Project Description

About Us:

We are a web-hosted platform that performs fully automated machine learning with state-of-the-art performance, and it can be deployed either in the cloud or on-premise. Once training datasets are connected to the platform, no human action is needed to get models into production and obtain new predictions from incoming data. We fully automate regression, classification, multi-class classification, segmentation, and time-series forecasting tasks.

When a user launches a use case, it starts a complex workflow composed of multiple tasks: data acquisition, dataset analysis, feature engineering, hyperparameter optimization, modeling, blending...

We rely mainly on Docker. Users connect to a web interface, launch machine learning use cases, and watch the whole data science workflow run automatically. We use Kubernetes for cluster deployments in the cloud, and Docker on a single VM for on-premise deployments.

We take advantage of Kubernetes clusters to deploy these containers.

Our solution consists of two types of services:

- Long-running services, running as Docker containers

- Short-lived tasks, running as jobs scheduled by Docker/Kubernetes

Long-Running Services:

- website: our web interface (Node.js), through which users interact with the platform either manually or programmatically through the APIs

- server_engine: the service (Python) in charge of starting the automated machine learning jobs through the Docker/Kubernetes client APIs

Objective:

There is high demand for on-premise deployment on Hadoop clusters (Hortonworks). The objective is to develop the service that launches jobs through YARN.

Users access our platform through the website or the API. When they launch use cases, a Python microservice called server_engine launches the different tasks. It uses either the Docker daemon (to launch containers) or the Kubernetes API (to launch pods). Users never access the infrastructure directly. This would be different on a Hadoop cluster, where the tasks would run under the users' own identities.
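Today server_engine chooses between the Docker daemon and the Kubernetes API. One way to add YARN is to place all back ends behind a common launcher that always receives the end user's identity.

The Java sketch below is a hypothetical design, not the platform's code. `LaunchRequest`, `TaskLauncher`, `ContainerLauncher`, `YarnLauncher`, and `ServerEngineDispatch` are illustrative names, and the YARN back end is a stub that only records handles. A real YARN back end would instead submit the application with Hadoop's `YarnClient` inside `UserGroupInformation.createProxyUser(endUser, UserGroupInformation.getLoginUser()).doAs(...)`. The same shape would carry over to the Python server_engine.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical request handed from the website to server_engine.
record LaunchRequest(String useCaseId, String task, String endUser) {}

// Hypothetical back-end abstraction: one implementation per deployment target.
interface TaskLauncher {
    String launch(LaunchRequest request);
}

// Docker/Kubernetes style: tasks run under the platform's own service identity,
// so the end user is ignored here.
final class ContainerLauncher implements TaskLauncher {
    private final String backend; // e.g. "docker" or "kubernetes"

    ContainerLauncher(String backend) {
        this.backend = backend;
    }

    @Override
    public String launch(LaunchRequest r) {
        return backend + ":" + r.useCaseId() + "/" + r.task();
    }
}

// YARN style: every task must carry the identity of the user who launched the use case.
// Stub only: a real version would call YarnClient inside a proxy-user doAs block.
final class YarnLauncher implements TaskLauncher {
    final List<String> submitted = new ArrayList<>();

    @Override
    public String launch(LaunchRequest r) {
        if (r.endUser() == null || r.endUser().isBlank()) {
            throw new IllegalArgumentException("YARN tasks need an authenticated end user");
        }
        String handle = "yarn:" + r.endUser() + ":" + r.useCaseId() + "/" + r.task();
        submitted.add(handle);
        return handle;
    }
}

final class ServerEngineDispatch {
    // Launches each task of a use case's workflow, in order, on the configured back end.
    static List<String> runWorkflow(TaskLauncher launcher, String useCaseId,
                                    String endUser, List<String> tasks) {
        List<String> handles = new ArrayList<>();
        for (String task : tasks) {
            handles.add(launcher.launch(new LaunchRequest(useCaseId, task, endUser)));
        }
        return handles;
    }
}
```

The identity travels with every request. The Docker and Kubernetes back ends can ignore it, while the YARN back end refuses to launch without one.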

The objectives are the following:

  • website (running in a Docker container on an edge node) should authenticate users with Kerberos and propagate their identity to server_engine.
  • server_engine (running in a Docker container on an edge node) should be able to launch YARN containers on behalf of the user who launches the use case.
  • The containers launched through YARN should run on the data nodes. Consequently, they cannot run as Docker containers (the data nodes do not have Docker), and they should run on behalf of the user.
  • This lets administrators monitor resource usage on the cluster per user and per queue.
  • The YARN containers also access HDFS resources on behalf of the user.
  • ML tasks can run for a long time; they should not fail as soon as the user logs out.
  • Kerberos should be used to launch YARN containers in the name of the user who launches the use case.
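Not failing after the user logs out is usually handled by submitting the YARN application with delegation tokens obtained on the user's behalf, instead of depending on the user's Kerberos ticket. YARN can renew those tokens while the application runs, but never beyond the token's maximum lifetime.

On HDFS the defaults are:

- a 24-hour renew interval (`dfs.namenode.delegation.token.renew-interval`)
- a 7-day maximum lifetime (`dfs.namenode.delegation.token.max-lifetime`)

Both should be checked on the target cluster. The helper below is a hypothetical sketch (`TokenRenewalPlan.renewalOffsets` is an illustrative name, not a Hadoop API). It computes when renewals would have to happen for a job of a given length, and it rejects jobs that outlive the maximum lifetime.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

final class TokenRenewalPlan {
    /**
     * Offsets from token issue time at which a delegation token would be renewed so it stays
     * valid for the whole job. Each renewal moves expiry to (renewal time + renewInterval),
     * capped at maxLifetime. Renewal happens after safetyFraction of an interval, so always
     * before the current expiry.
     */
    static List<Duration> renewalOffsets(Duration jobDuration, Duration renewInterval,
                                         Duration maxLifetime, double safetyFraction) {
        if (safetyFraction <= 0 || safetyFraction > 1) {
            throw new IllegalArgumentException("safetyFraction must be in (0, 1]");
        }
        if (jobDuration.compareTo(maxLifetime) > 0) {
            // Renewal cannot push a token past its max lifetime: fresh tokens would be needed.
            throw new IllegalStateException("job outlives the token's maximum lifetime");
        }
        long job = jobDuration.toMillis();
        long interval = renewInterval.toMillis();
        long max = maxLifetime.toMillis();
        long step = (long) (interval * safetyFraction);
        if (step <= 0) {
            throw new IllegalArgumentException("renew interval too short");
        }
        List<Duration> offsets = new ArrayList<>();
        long renewedAt = 0;
        long expiry = Math.min(interval, max);
        while (expiry < job) {
            renewedAt += step;
            offsets.add(Duration.ofMillis(renewedAt));
            expiry = Math.min(renewedAt + interval, max);
        }
        return offsets;
    }
}
```

For example, take a 3-day job with a 1-day interval, a 7-day maximum, and renewal at 75% of each interval. It needs renewals at 18, 36, and 54 hours. An 8-day job cannot be covered by renewal alone.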

In any case, if the service is given the right to impersonate users, that right should be limited to a restricted set of users (in a specific group?) with limited access.
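Hadoop restricts impersonation through the `hadoop.proxyuser.<superuser>.hosts`, `.groups`, and `.users` properties in `core-site.xml`, which is one way to meet this requirement. With them, the service account may only impersonate members of the listed groups (or the listed users), and only from the listed hosts.

The sketch below models that rule in plain Java so the policy can be reasoned about. It is a simplification, not Hadoop's implementation: the real check also accepts hostnames and IP ranges. The service account `automl`, the group `ml-users`, and the edge-node address used with it are hypothetical.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Simplified model of Hadoop's proxy-user rule; not the real implementation.
final class ProxyUserPolicy {
    private final Map<String, String> conf; // e.g. properties read from core-site.xml

    ProxyUserPolicy(Map<String, String> conf) {
        this.conf = conf;
    }

    // Splits a comma-separated property value; a missing property means nothing is allowed.
    private Set<String> values(String key) {
        Set<String> out = new HashSet<>();
        String raw = conf.get(key);
        if (raw == null) {
            return out;
        }
        for (String part : raw.split(",")) {
            if (!part.isBlank()) {
                out.add(part.trim());
            }
        }
        return out;
    }

    // True if superUser may act as endUser when the request comes from clientHost.
    boolean mayImpersonate(String superUser, String endUser, Set<String> endUserGroups, String clientHost) {
        String prefix = "hadoop.proxyuser." + superUser + ".";
        Set<String> hosts = values(prefix + "hosts");
        if (!hosts.contains("*") && !hosts.contains(clientHost)) {
            return false; // the host restriction applies in every case
        }
        Set<String> users = values(prefix + "users");
        Set<String> groups = values(prefix + "groups");
        if (users.contains("*") || users.contains(endUser) || groups.contains("*")) {
            return true;
        }
        for (String group : endUserGroups) {
            if (groups.contains(group)) {
                return true;
            }
        }
        return false;
    }
}
```

Suppose only `ml-users` is allowed. A user outside that group is then refused even though the service account itself is trusted, and so is any request arriving from another host.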

Requirements:

  • Hadoop clusters (Hortonworks)
  • Creating Kerberized YARN applications
  • Kerberos (delegation / proxy ...)
  • Java development
  • (Python, Node.js)

The goal is to help the team finish as soon as possible (end of August or beginning of September).

In your proposal, please tell us about your experience with Hadoop clusters, the other requirements mentioned above, and any other relevant experience. Also describe how you would approach this project.

 

Project Overview

  • Posted
    August 01, 2018
  • Planned Start
    August 08, 2018
  • Delivery Date
    August 31, 2018
  • Preferred Location
    From anywhere
