Job Description:
Develop production-quality software for the processing and analysis of data.
Ensure data quality in the pipeline from data acquisition to the model.
Work in a distributed-memory multiprocessing environment.
Process, cleanse, and verify the integrity of data used for analysis.
Evaluate new data sources for predictive signals.
Identify and refine features in data that are used for modeling.
Contribute to our understanding of data.
Parse, normalize, and understand historical data.
Define metrics and build and analyze dashboards, reports, and key datasets to derive data-informed
Aid in the design of real-time data acquisition systems.
Collaborate closely with data engineers, software engineers, UI/UX designers, and business intelligence analysts.
Participate in data strategy and infrastructure discussions and help define requirements for data structure and data retention.
Work with large volumes of structured and unstructured data from multiple data sources, and design and implement data pipelines to clean and merge these data for research and modeling.
Coordinate with different functional teams to implement models and monitor outcomes.

Preferred Skills:
Expert programming skills, including but not limited to one or more of the following:
  a. Python, Pandas, NumPy, MATLAB, R
  b. Postgres/SQL
  c. HPC
  d. Linux/UNIX: command-line tools, filesystems, signals, pipes
An interest in working on a wide spectrum of projects, including data quality assurance, data onboarding, and feature detection.
Adept at executing every stage of the ML development lifecycle in a business setting, from initial requirements gathering through final model deployment, including iterative measurement and improvement.
Demonstrated experience building and deploying AI/machine learning solutions at scale.
Knowledge of a variety of machine learning techniques (clustering, decision tree learning, artificial neural networks, etc.) and their real-world advantages and drawbacks.
Experience manipulating large data sets through statistical software or other methods.
Experience working with Big Data frameworks and languages, including but not limited to Hadoop, Pig/Hive, Spark, and MapReduce.
Understanding of statistics, including hypothesis testing, p-values, confidence intervals, regression, classification, and optimization.
Strong algorithmic problem-solving skills.
Solid experience working with data visualization technologies (including but not limited to Tableau, Qlik, and Power BI).
Superior verbal, visual, and written communication skills to educate and work with cross-functional teams.
Demonstrated ability to drive projects.
Ability to communicate effectively and work independently with little supervision to deliver quality products on time.
Ability to work in large, collaborative teams to achieve organizational goals, and a passion for building and supporting a fully diverse, inclusive, and innovative team culture.
Ability to understand and apply Federal, State, or Commission rules, regulations, policies, and procedures relating to data management, including but not limited to data security and data compliance (GDPR, HIPAA, FERPA, etc.).
Superb ability and willingness to share technical knowledge with others.
Ability to explain difficult technical topics naturally to everyone from data scientists to engineers to business partners and leaders.
Ability to establish and maintain effective working relationships with others.
Accountability, Communication, Empowerment, Flexibility, Integrity, Respect, Teamwork.
Proven ability to work ethically and with integrity.
Team player at all times.
Continuously contribute to team performance improvement and collaboration.