Industry: Consumer Goods and Retail, Media and Advertising, Hi-Tech, Software
Specialization or Business Function: Media and Advertising
Technical Function: Data Management, Data Warehousing, Software and Web Development
Technology & Tools: Programming Languages and Frameworks (Python)
The Problem:
We currently have proprietary data, collected over a few years, that we would like to supplement with other data. We need to build a robust data pipeline to feed relational tables, which will then be used to create additional features for our models. This data would include our own proprietary data, currently in .csv/.xlsx format, as well as data pulled from online sources such as Facebook, Twitter, Instagram, Soundcloud, YouTube, Spotify, Pandora, Hypebeast, and other brand marketing sources and eCommerce websites. The goal is to figure out how to contextualize the current audience, hype, size, relevance, and potential future of bands, and to do the same for brands. We will also need to track the value of both bands and brands over time.
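One way the requirement to track band and brand values over time might map onto relational tables is sketched below, assuming SQLite through Python's built-in `sqlite3` module. The `entity` table, the `spotify_daily` table, and all column names are hypothetical, chosen only for illustration, and the numbers in the usage example are made up. A shared `entity` table gives every band and brand one unique key; each source gets its own snapshot table keyed on `(entity_id, snapshot_date)`, so each day adds one row per entity and history can be queried in date order.

```python
import sqlite3

# Hypothetical schema: `entity` holds one unique key per band or brand;
# each source gets its own daily snapshot table that references that key.
SCHEMA = """
CREATE TABLE IF NOT EXISTS entity (
    entity_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    kind      TEXT NOT NULL CHECK (kind IN ('band', 'brand')),
    UNIQUE (name, kind)
);
-- One example per-source table; column names are illustrative only.
CREATE TABLE IF NOT EXISTS spotify_daily (
    entity_id     INTEGER NOT NULL REFERENCES entity(entity_id),
    snapshot_date TEXT NOT NULL,  -- ISO-8601 date, one row per entity per day
    followers     INTEGER,
    PRIMARY KEY (entity_id, snapshot_date)
);
"""


def connect(path: str = ":memory:") -> sqlite3.Connection:
    """Open a SQLite database, enforce foreign keys, and create the tables."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def follower_history(conn: sqlite3.Connection, name: str, kind: str):
    """Return [(snapshot_date, followers), ...] for one entity, oldest first."""
    return conn.execute(
        """
        SELECT s.snapshot_date, s.followers
        FROM spotify_daily AS s
        JOIN entity AS e USING (entity_id)
        WHERE e.name = ? AND e.kind = ?
        ORDER BY s.snapshot_date
        """,
        (name, kind),
    ).fetchall()


if __name__ == "__main__":
    conn = connect()
    cur = conn.execute(
        "INSERT INTO entity (name, kind) VALUES (?, ?)", ("Example Band", "band")
    )
    band_id = cur.lastrowid
    # Made-up values, inserted out of order, purely to show the data's shape.
    conn.executemany(
        "INSERT INTO spotify_daily VALUES (?, ?, ?)",
        [(band_id, "2024-01-02", 1050), (band_id, "2024-01-01", 1000)],
    )
    print(follower_history(conn, "Example Band", "band"))
```

Keeping one row per entity per day, rather than overwriting a single current value, is what makes "value over time" queryable later; the composite primary key also rejects an accidental second snapshot for the same day.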
Future phases will include working with our data scientists to improve current models using this data, but this phase is focused primarily on creating the pipeline and making it updatable on a daily basis. We are here to help you understand exactly what data we want from each data source and to help guide the creative process of finding other sources that could help.
Deliverables: This first phase consists of three basic steps:
1) Create scripts that build the pipeline for each data source using its API (no scraping will be needed; we know that breaks pretty much all the time)
2) Create the relational database to hold all of the data for each source, with unique keys to connect all tables (each source will be its own table)
3) Connect all of the current data (CSV and XLSX files, small data) to the same database
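Step 3 could look roughly like the following sketch, which assumes SQLite and Python's standard `csv` module. It copies one existing CSV file into its own table and replaces the band or brand name column with the shared `entity_id` key, creating entity rows as needed. The function names, the `legacy_bands` table, and the `band`/`genre` columns are hypothetical. XLSX sheets are not handled here; they could be exported to CSV first, or read with a library such as pandas (`pandas.read_excel`), which is an extra dependency this sketch avoids.

```python
import csv
import io
import sqlite3
from typing import TextIO

# Minimal shared key table (hypothetical); every source table points at it.
ENTITY_DDL = """
CREATE TABLE IF NOT EXISTS entity (
    entity_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    kind      TEXT NOT NULL,
    UNIQUE (name, kind)
)
"""


def quote_ident(name: str) -> str:
    """Quote a CSV header so it is safe to use as a SQLite identifier."""
    return '"' + name.replace('"', '""') + '"'


def get_or_create_entity(conn: sqlite3.Connection, name: str, kind: str) -> int:
    """Return the unique key for (name, kind), inserting the row if it is new."""
    conn.execute("INSERT OR IGNORE INTO entity (name, kind) VALUES (?, ?)", (name, kind))
    row = conn.execute(
        "SELECT entity_id FROM entity WHERE name = ? AND kind = ?", (name, kind)
    ).fetchone()
    return row[0]


def load_csv(conn: sqlite3.Connection, table: str, fh: TextIO,
             name_col: str, kind: str) -> int:
    """Copy a CSV into its own table, swapping `name_col` for an entity_id key.

    All other columns are stored as TEXT. Returns the number of rows loaded.
    """
    reader = csv.DictReader(fh)
    other_cols = [c for c in reader.fieldnames if c != name_col]
    defs = ["entity_id INTEGER NOT NULL REFERENCES entity(entity_id)"]
    defs += [f"{quote_ident(c)} TEXT" for c in other_cols]
    conn.execute(f"CREATE TABLE IF NOT EXISTS {quote_ident(table)} ({', '.join(defs)})")
    placeholders = ", ".join("?" for _ in range(len(other_cols) + 1))
    insert_sql = f"INSERT INTO {quote_ident(table)} VALUES ({placeholders})"
    count = 0
    for row in reader:
        entity_id = get_or_create_entity(conn, row[name_col], kind)
        conn.execute(insert_sql, [entity_id] + [row[c] for c in other_cols])
        count += 1
    conn.commit()
    return count


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(ENTITY_DDL)
    # Stand-in for open("bands.csv", newline="") on a real file.
    sample = io.StringIO("band,genre\nExample Band,indie\nOther Band,rap\n")
    print(load_csv(conn, "legacy_bands", sample, name_col="band", kind="band"))
```

Storing every legacy column as TEXT keeps the first load simple for small files; typed columns could be introduced once the client confirms what each file contains.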
No GUI is necessary, but all the code must be executable, and we prefer that it be written in Python. The database can be whichever one you are most comfortable building and that fits this size of data. We want this pipeline to collect data once a day. We will have a budget for hosting the data as it grows, although we would first like a small sample of the data so we know what we are looking at.
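Because the pipeline runs once a day, it helps if a run can be repeated safely, for example after a failed API call. The sketch below assumes SQLite 3.24 or newer (for `INSERT ... ON CONFLICT ... DO UPDATE`) and uses a hypothetical fetcher function in place of a real platform API client; `run_daily`, `fake_youtube`, the `youtube_daily` table, and the `subscribers` column are all illustrative names, not part of any real API. Re-running it for the same date updates that day's row instead of adding a duplicate.

```python
import sqlite3
from datetime import date
from typing import Callable, Dict

# Hypothetical fetcher signature: given an entity name, return today's metrics
# from one source. A real version would call the platform's API using the
# client's credentials; the stub below exists only so the sketch runs.
Fetcher = Callable[[str], Dict[str, float]]

DDL = """
CREATE TABLE IF NOT EXISTS entity (
    entity_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS youtube_daily (
    entity_id     INTEGER NOT NULL REFERENCES entity(entity_id),
    snapshot_date TEXT NOT NULL,
    subscribers   REAL,
    PRIMARY KEY (entity_id, snapshot_date)
);
"""


def run_daily(conn: sqlite3.Connection, fetch: Fetcher, run_date: date) -> int:
    """Store one snapshot per entity for run_date; safe to re-run the same day."""
    day = run_date.isoformat()
    entities = conn.execute("SELECT entity_id, name FROM entity").fetchall()
    for entity_id, name in entities:
        metrics = fetch(name)
        # Upsert: insert a new row, or overwrite the value already stored today.
        conn.execute(
            """
            INSERT INTO youtube_daily (entity_id, snapshot_date, subscribers)
            VALUES (?, ?, ?)
            ON CONFLICT (entity_id, snapshot_date)
            DO UPDATE SET subscribers = excluded.subscribers
            """,
            (entity_id, day, metrics.get("subscribers")),
        )
    conn.commit()
    return len(entities)


def fake_youtube(name: str) -> Dict[str, float]:
    """Stand-in for a real API client (hypothetical constant value)."""
    return {"subscribers": 500.0}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    conn.execute("INSERT INTO entity (name) VALUES ('Example Band')")
    run_daily(conn, fake_youtube, date.today())
    run_daily(conn, fake_youtube, date.today())  # re-run: still one row
    print(conn.execute("SELECT COUNT(*) FROM youtube_daily").fetchone()[0])
```

A scheduler such as cron (or any hosted equivalent) could then invoke a script like this once a day; keeping each source behind its own fetcher function means one broken API does not require changes to the others.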
We have some usernames/passwords that can be used for the APIs of many of the data sources we are asking for data from, and we will always be available to communicate to ensure that everything needed is provided. This is in no way a siloed project; we want to be involved as much as we can so that this pipeline is built properly.
Domain expertise in the following areas would be a plus but is not mandatory.
In your proposal, please tell us more about:
After the interview, and once you have a better understanding of our current status, we would like the proposal to include: