Industry: Hi-Tech, Software
Specialization or Business Function:
Technical Function: DevOps
Technology & Tools: Big Data and Cloud (Hortonworks, Apache Hadoop, Hadoop YARN), Programming Languages and Frameworks (Java), DevOps Tools (Docker)
About Us:
We are a web-hosted platform that performs fully automated machine learning with state-of-the-art performance and can be deployed either in the cloud or on-premise. Once training datasets are connected to the platform, no human action is needed to get models into production and obtain new predictions from incoming data. We fully automate regression, classification, multi-classification, segmentation, and time-series forecasting tasks.
When a user launches a use case, it starts a complex workflow composed of multiple tasks: data acquisition, dataset analysis, feature engineering, hyperparameter optimization, modeling, blending...
Our stack is built primarily on Docker. Users connect to a web interface, launch machine learning use cases, and watch the entire data science workflow run automatically. In the cloud, we take advantage of Kubernetes clusters to deploy these containers; for on-premise, we deploy on a single VM with Docker.
Our solution is composed of two types of services:
- Long-running services: running as Docker containers
- Short-lived tasks: running as jobs scheduled by Docker/Kubernetes
Long-Running Services:
- website: our web interface, where users interact with our platform either from the browser or programmatically through the APIs (Node.js)
- server_engine: the service in charge of starting the automated machine learning jobs using the Docker/Kubernetes client APIs (Python)
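To make the dispatch role of server_engine concrete, here is a minimal sketch of how such a service might turn a task into a Docker container launch. The function names and task fields are hypothetical illustrations, not the actual codebase; the Docker call uses the official Docker SDK for Python.

```python
def build_task_spec(image, command, env):
    """Normalize a task into the keyword arguments accepted by
    docker Container run calls (hypothetical task shape)."""
    return {
        "image": image,
        "command": command,
        "environment": env,
        "detach": True,       # return immediately; the task runs in the background
        "auto_remove": True,  # short-lived tasks clean up after themselves
    }

def launch_docker_task(spec):
    """Start a short-lived task as a Docker container."""
    import docker  # Docker SDK for Python (pip install docker)
    client = docker.from_env()
    return client.containers.run(**spec)
```

In a Kubernetes deployment, the same normalized spec could instead be translated into a Job manifest and submitted through the Kubernetes API, which is what makes a YARN backend a natural third target.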
Objective:
There is high demand for on-premise deployment on Hadoop clusters (Hortonworks). The objective is to develop the service that launches these jobs with YARN.
Users access our platform through the website or the API. When they launch use cases, a microservice (called server_engine, written in Python) launches the different tasks, either through the Docker daemon to start containers or through the Kubernetes API to start pods. Users never access the infrastructure directly; this would be different on a Hadoop cluster.
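One possible approach to the YARN backend is the ResourceManager REST API (Hadoop's Cluster Applications API): first request a new application id, then submit an application with an ApplicationMaster container spec. The sketch below assumes a reachable ResourceManager URL; the resource sizes and command are illustrative placeholders.

```python
def build_yarn_submission(app_id, name, command, memory_mb=2048, vcores=1):
    """Build the JSON payload for POST /ws/v1/cluster/apps."""
    return {
        "application-id": app_id,
        "application-name": name,
        "application-type": "YARN",
        "am-container-spec": {
            "commands": {"command": command},
        },
        "resource": {"memory": memory_mb, "vCores": vcores},
    }

def submit_application(rm_url, command, name):
    """Two-step submission against the ResourceManager REST API."""
    import requests  # third-party HTTP client (pip install requests)
    # Step 1: ask the ResourceManager for a fresh application id.
    new_app = requests.post(f"{rm_url}/ws/v1/cluster/apps/new-application").json()
    app_id = new_app["application-id"]
    # Step 2: submit the application with its container spec.
    payload = build_yarn_submission(app_id, name, command)
    requests.post(f"{rm_url}/ws/v1/cluster/apps", json=payload)
    return app_id
```

On a secured Hortonworks cluster the same calls would additionally need Kerberos (SPNEGO) authentication and, when acting on behalf of a user, a doAs parameter.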
The objectives are the following:
If we have the possibility to impersonate users, the service should only have the right to impersonate a limited set of users (in a specific group?) with limited access.
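Hadoop supports exactly this kind of restricted impersonation through its proxy-user settings in core-site.xml. The fragment below is a sketch assuming the service runs as a hypothetical account named server_engine, whose permitted users belong to a hypothetical automl-users group; the host name is also a placeholder.

```xml
<!-- core-site.xml: allow the server_engine service account to impersonate
     only members of the automl-users group, and only from one host.
     The account, group, and host names here are hypothetical. -->
<property>
  <name>hadoop.proxyuser.server_engine.groups</name>
  <value>automl-users</value>
</property>
<property>
  <name>hadoop.proxyuser.server_engine.hosts</name>
  <value>engine-host.example.com</value>
</property>
```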
Requirements:
The objective is to help the team finish as soon as possible (end of August / beginning of September).
In your proposal, please tell us more about your experience with Hadoop clusters, the other requirements mentioned above, and any other relevant experience. Also tell us how you would approach this.