Industry Real Estate, Software
Specialization Or Business Function R&D (Performance Analysis, Design Optimization)
Technical Function Analytics (What if/Scenario Analysis, Data Mining, Real-time Analytics, Machine Learning, Time Series Analysis, Descriptive Analysis, Spatial Analysis, Location Analytics)
Technology & Tools Business Intelligence and Visualization, Big Data and Cloud (Amazon Elastic MapReduce, MySQL, Amazon Kinesis, Elasticsearch, AWS Elastic Beanstalk, AWS Identity & Access Mgmt (IAM), Amazon EC2, Amazon Web Services, Linux), Data Analysis and AI Tools, Programming Languages and Frameworks (Scala, Go, R, Python, PySpark), Mapping and GIS (Mapbox)
About Us:
We provide rich analytics for customer workplace optimization. Our SaaS offering uses AI-powered analytics to help customers optimize their workplaces.
Our SaaS product moves clients away from the historically time-consuming, and often error-prone, manual data gathering and analysis required for workplace optimization metrics such as employee attendance and real estate total cost of occupancy ("TCO").
The platform:
1. Aggregates and leverages real-time office-use data that already exists
2. Harmonizes multiple overlapping and complementary data sources using state-of-the-art data science (including machine learning algorithms and advanced statistical methods)
3. Produces robust analytics and actionable insights
We target our data analytics platform at Fortune 500 and Global 2000 corporate tenants with many knowledge workers and large amounts of leased or owner-occupied office space, such as Cisco, ExxonMobil, Lenovo, Dell EMC, BP, AbbVie, Comcast, T. Rowe Price, Hilton, and Uber, among others.
The problem:
Upgrade our existing manually run pipeline, currently a family of Jupyter notebooks and Python scripts, to an automated pipeline that runs (via cron?) when triggered by a green flag from the ETL process. The ETL is being developed by a team of Scala developers and will deliver raw data for processing in Parquet format.
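As a rough sketch of the trigger step, the green flag could be a marker file the ETL writes on success; the orchestrator polls for it before launching the pipeline. The flag name, path, and timing values below are assumptions for illustration; in production the marker might instead be an S3 `_SUCCESS` object checked via boto3.

```python
import time
from pathlib import Path

def wait_for_green_flag(flag_path: str,
                        poll_seconds: float = 60,
                        timeout_seconds: float = 86400) -> bool:
    """Block until the ETL green-flag file appears, or time out.

    Returns True once the flag exists, False if the timeout elapses.
    The local-file flag is an assumption; swap in an S3 check as needed.
    """
    flag = Path(flag_path)
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if flag.exists():
            return True  # ETL finished; safe to start the pipeline run
        time.sleep(poll_seconds)
    return False  # ETL never signaled completion within the window
```

A cron entry (or an EventBridge rule) would invoke a wrapper that calls this check and then runs the pipeline stages in order.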
This process is asynchronous with our SaaS product, a front-end/back-end (FE-BE) pair used by customers. The SaaS product queries the pipeline's output, which is stored in MySQL and Elasticsearch.
We need a senior data scientist to work with our team of talented but junior data scientists to develop an automated, optimized production pipeline and to create new output tables for a future rebuild of the FE-BE SaaS.
Expertise Needed: Python, Spark, AWS, Jupyter, MySQL, Elasticsearch, Parquet.
Data Sources: post-ETL, all raw data will be in Parquet; some lookups against MySQL may be needed.
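A minimal sketch of that read path, with hypothetical table and column names (`events`, `site_lookup`, `site_id`): raw records arrive from Parquet and are enriched with a small MySQL lookup. pandas is used here for brevity; the production pipeline would do the equivalent in PySpark (`spark.read.parquet` plus a join against a JDBC-loaded lookup table).

```python
import pandas as pd

def enrich_events(events: pd.DataFrame, site_lookup: pd.DataFrame) -> pd.DataFrame:
    """Left-join raw events against a site lookup table.

    In production, `events` would come from pd.read_parquet(...) (or
    spark.read.parquet) and `site_lookup` from a MySQL query; both
    frames and their columns here are assumptions for illustration.
    """
    return events.merge(site_lookup, on="site_id", how="left")

# Hypothetical sample data standing in for the Parquet and MySQL sources.
events = pd.DataFrame({"badge_id": [1, 2, 3], "site_id": ["NYC", "SFO", "NYC"]})
site_lookup = pd.DataFrame({"site_id": ["NYC", "SFO"], "leased_sqft": [120000, 80000]})
enriched = enrich_events(events, site_lookup)
```

A left join keeps every raw event even when a site is missing from the lookup, which surfaces data-quality gaps instead of silently dropping rows.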
Current Technology Stack: Python, Spark, AWS, Jupyter, MySQL, Elasticsearch, Parquet
Deliverables: automated pipeline, sysops automation on AWS, code base in GitHub, documentation in-code and in Confluence.
In your proposal please tell us more about: