Big Data Architects should have extensive experience with all aspects of building end-to-end data platforms and architectures, including data ingestion/integration, data storage, data transformation, data processing, data deployment, data operations, and data cataloging. The experts in this TalentCloud should be able to design, and help big data developers build, a platform capable of executing operational, analytic, and predictive workloads that serve thousands of applications and support machine learning deployment and inference.
Required Skills
- Extensive experience as a data architect or data engineer, with database internals, and in building data-intensive architectures and applications
- Deep understanding of distributed systems and distributed databases
- Extensive experience with ETL, batch processing, and stream processing
- Deep expertise with frameworks like Spark and Kafka and the ecosystems around them
- Deep understanding of big data ingestion, integration, storage, and processing, of transformation/ETL tools and technologies, and of related concepts such as data cataloging and curation
- Track record of implementing Big Data solutions with large enterprises and start-ups
- Deep knowledge of foundational infrastructure requirements such as networking, storage, and hardware optimization, with hands-on experience with Amazon Web Services (AWS)
- Design, implementation, and tuning experience in the big data ecosystem (such as Hadoop, Spark, Presto, Hive), databases (such as Oracle, MySQL, PostgreSQL, MS SQL Server), and data warehouses (such as Redshift, Teradata, Vertica)
- Experience with one or more SQL engines on large data, such as Presto, Impala, Dremio, or Spark SQL
- Good understanding of data governance, encompassing data catalogs, data auditing, lineage, metadata, and master data management
- Experience with DataOps processes and tools
- Programming experience with one or more of Java, Scala, and Python
- Deep experience ensuring that the platform meets non-functional requirements (NFRs) such as scalability, performance, availability, reliability, and fault tolerance
Preferred Skills
- Experience with data warehousing, data modeling, data marts, data virtualization, and MPP-based architectures such as Redshift, Vertica, and BigQuery
- Deep, wide-scale experience with relational databases such as PostgreSQL and MySQL
- NoSQL database architectures and data modeling with one or more of the main types of NoSQL databases:
  - Key-value data stores – Redis, DynamoDB, Riak
  - Document databases – MongoDB, CouchDB, Couchbase
  - Graph databases – Neo4j
  - Wide-column databases – Cassandra, HBase, ScyllaDB
  - Time-series databases – InfluxDB, TimescaleDB
  - Search engines and databases – Elasticsearch, Solr