Looking for a Data Scientist with strong Python application development experience, including hands-on use of Dask and Kubernetes in large pipeline frameworks.
Role Description
We are looking for someone to help us migrate our existing data science pipeline to Dask. We have started the migration but are currently running into issues with Dask’s distributed client and with running it on Kubernetes. We need someone to come in, get an understanding of our codebase, and help us get the pipeline up and running, ultimately in a Kubernetes environment. The current pipeline is essentially a large ETL pipeline for unstructured text that incorporates NLP tasks such as text cleaning, NER, scoring/ranking, and document deduplication. We are moving it to a distributed system to keep up with the massive quantities of unstructured data our system now has access to.
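To give a flavor of the work, here is a minimal sketch of this kind of Dask text pipeline, driven through the distributed client. The file paths and cleaning logic are hypothetical placeholders, and the LocalCluster stands in for the Kubernetes-backed cluster (e.g. one created with dask-kubernetes) that the real pipeline targets.

    # Hypothetical sketch: read raw text documents, clean them, and drop
    # exact duplicates using Dask bag, coordinated by the distributed client.
    import re

    import dask.bag as db
    from dask.distributed import Client, LocalCluster


    def clean(text: str) -> str:
        """Lowercase and collapse whitespace -- a stand-in for real NLP cleaning."""
        return re.sub(r"\s+", " ", text.lower()).strip()


    if __name__ == "__main__":
        # Local stand-in for the Kubernetes-backed cluster the pipeline targets.
        cluster = LocalCluster(n_workers=2, threads_per_worker=1)
        client = Client(cluster)

        docs = db.read_text("data/*.txt")      # hypothetical input path
        cleaned = docs.map(clean)              # per-document text cleaning
        deduped = cleaned.distinct()           # exact-duplicate removal

        print(f"{deduped.count().compute()} unique documents")

        client.close()
        cluster.close()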
Skills Required