Hi, welcome to my Data Science portfolio.

On this page, I demonstrate my skills in transforming data into social impact using Data Science tools and economic research methods.

You will also find my professional experience, along with the skills, tools, and concepts I work with in Data Science. Feel free to contact me via the links at the bottom of the page.

ABOUT ME

My name is Nina Menezes Cunha.

I am a lifelong learner with a relentlessly curious mind and a proactive approach to problem-solving.
My Stanford PhD in Economics of Education fuels my passion for turning complex data into social impact.

Over the past 10+ years, I've applied machine learning and causal inference across more than 10 countries, leading large-scale education experiments with 100,000+ students. My technical expertise spans Python (including Pandas, NumPy, and Scikit-learn), SQL (BigQuery, PostgreSQL), and cloud platforms (GCP), combined with rigorous econometrics to build solutions ranging from predictive models to AI applications.

As a Senior Researcher at FHI 360 and a World Bank consultant, I've designed end-to-end data systems, from statistical analysis and natural language processing to interactive dashboards and policy recommendations. Having recently completed Le Wagon's Data Science Bootcamp, I now channel this expertise into Amooora, my startup developing AI-powered solutions for the lesbian community.

When I'm not building data solutions, you'll find me biking coastal trails 🌊, enjoying the beach 🏖️, or catching the latest films 🎬. An old soul at heart, I treasure quiet evenings with tea 🫖, whether playing trumpet or percussion 🎺🥁 to jazz and Brazilian popular music or turning in early 🌙. Proudly Brazilian 🇧🇷 and openly lesbian 🏳️‍🌈, I thrive where culture, nature, and community intersect.

PROFESSIONAL EXPERIENCE

AMOOORA - Founder/CEO (Dec 2024 - Present | São Paulo, Brazil)
• Developing a data-driven app for the lesbian community using ML techniques to optimize content delivery and engagement
• Implementing predictive analytics for user retention strategies and community-building initiatives
• Leveraging data-driven insights to improve platform accessibility and tailor content
FHI 360 - Senior Research Associate (May 2018 - Aug 2023 | Washington, DC)
• Led impact evaluations using causal inference methods (DID, IV, synthetic control) across Ghana, Malawi, and Latin America
• Designed psychometric tools using factor analysis to measure teacher well-being in Uganda/Guatemala/El Salvador
• Managed the analysis of datasets covering 50K+ students to drive education policy recommendations
STANFORD UNIVERSITY - Senior Researcher (Jan 2013 - Apr 2018 | Stanford, CA)
Research
• Led RCTs with 25,000+ students using causal inference and machine learning
• Designed behavioral nudges that improved student attendance and test scores
• Published peer-reviewed research on teacher effectiveness interventions
Teaching
• Co-led a yearlong seminar on Topics in Brazilian Education (2013-2015), designing course content and facilitating student engagement
• Used diverse teaching strategies, including student presentations, guest speakers, and policy development exercises
• Assisted in a flipped classroom statistics course (2014-2015) for Master's students, coaching them in applying statistical models to research questions
WORLD BANK - Consultant (Aug 2015 - Feb 2016 | Ceará, Brazil)
• Designed data collection pipelines for education assessments across 350 schools
• Trained enumerators and ensured data integrity for large observational studies
MOVVA - Consultant (Feb 2015 - Dec 2016 | São Paulo, Brazil)
• Supervised data collection and analysis for 400 public schools (30,000 students)
• Developed data visualization dashboards for policy stakeholders
FEDERAL UNIVERSITY OF MINAS GERAIS - Research Assistant (Feb 2010 - Apr 2012 | Minas Gerais, Brazil)
• Conducted econometric modeling using longitudinal data of 3,500 students
• Collaborated on education policy research projects

EDUCATION

LE WAGON - Data Science Bootcamp (Jan 2025 - Mar 2025)
• Data Science & ML: Analyzed large datasets (SQL, Python, BigQuery), built statistical and ML models (classification, NLP, deep learning), and deployed them (GCP, Docker, FastAPI).
• Project Leadership: Led a team project from data engineering to ML pipelines, delivering actionable business insights.
STANFORD UNIVERSITY - Ph.D. in Economics of Education (Sep 2012 - Apr 2018)
• Large-Scale Education Experiments: Designed and executed 3 RCTs (A/B tests across 289 schools, 25,000+ students) testing behavioral nudges, evaluated with causal inference methods such as regression analysis.
• Data Science for Education Policy: Transformed statistical insights into actionable ed-tech solutions adopted by government partners.
FEDERAL UNIVERSITY OF MINAS GERAIS - M.A. in Economics (Jan 2010 - Apr 2012)
• Religion & Education Causal Analysis: Applied econometric techniques (quantile regression, OLS) to Brazilian longitudinal youth data, revealing how religious socialization improves academic performance.
SÃO PAULO SCHOOL OF ECONOMICS (FGV) - B.A. in Economics (Jan 2006 - Dec 2009)
• Economics of Education Research: Programmed large-scale data analysis in Stata using econometric techniques (fixed-effects models, instrumental variables) on Brazilian education panel data (1992-2007).

TECHNICAL SKILLS

Data Engineering & Processing

Python: Pandas, NumPy, BeautifulSoup, Requests
SQL: PostgreSQL, BigQuery
Data Storage: SQLite, MySQL
Data Pipelines: Prefect, FastAPI

Statistics & Visualization

Statistical Analysis: Regression, A/B Testing, Causal Inference
Data Visualization: Matplotlib, Seaborn, Plotly
Dashboards: Streamlit
Business Intelligence: Data Storytelling

Machine Learning & AI

ML & AI Models: Classification, Regression, Clustering, Time Series, NLP, Multi-Agent Systems
Feature Engineering: Dimensionality Reduction, Class Imbalance Handling, Feature Selection, Model Tuning
Performance & Explainability: AI Metrics, Model Interpretability (SHAP, LIME, Attention Mechanisms)
Frameworks: Scikit-learn, XGBoost, TensorFlow, Keras, PyTorch, Hugging Face, Transformers, RNNs

Cloud & Deployment

Cloud Computing: Google Cloud Platform, Compute Engine, Cloud Storage
MLOps: MLflow, Docker, CI/CD
Version Control: Git, GitHub, GitLab
APIs & Deployment: FastAPI, Streamlit, Cloud-based AI Solutions

CERTIFICATIONS

December 2024 – Google Advanced Data Analytics Professional Certificate
Google via Coursera | Credential ID: 98C47QXOLHBA
October 2023 – Applied Machine Learning in Python
University of Michigan via Coursera | Credential ID: D8NC5S5AK5ZQ
October 2023 – Introduction to Computer Science with Python Part II
University of São Paulo via Coursera (Portuguese) | Credential ID: T6DRTCP8AMDD
September 2023 – Applied Plotting, Charting & Data Representation in Python
University of Michigan via Coursera | Credential ID: D8NC5S5AK5ZQ
September 2023 – Introduction to Data Science in Python
University of Michigan via Coursera | Credential ID: 8TMUB39YBDTR
September 2023 – Introduction to Computer Science with Python Part I
University of São Paulo via Coursera (Portuguese) | Credential ID: VGRZNWMSK5GJ
May 2021 – Categorical Structural Equation Modeling
Statistical Horizons
May 2021 – Applied Measurement Modeling
CenterStat by Curran-Bauer Analytics
May 2021 – Introduction to Structural Equation Modeling
CenterStat by Curran-Bauer Analytics

SOFT SKILLS

Analytical & Problem-Solving

Analytical Thinking
Critical Thinking
Proactive Problem-Solving
Data Storytelling

Leadership & Collaboration

Leadership & Mentorship
Collaboration & Teamwork
Cross-Cultural Collaboration
Stakeholder Management

Adaptability & Growth

Resilience & Growth Mindset
Adaptability
Lifelong Learning
Multidisciplinary Agility

Languages

English: Fluent
Portuguese: Native
Spanish: Advanced
French: Basic

FEATURED DATA SCIENCE PROJECTS

Amooora Connection Algorithm

March 2025

Developed a next-generation matching system for LGBTQ+ women and non-binary individuals using deep learning and natural language processing. Leveraged an OkCupid dataset of 24,000+ profiles to build a three-pillar solution: (1) A density-based DBSCAN clustering model that identifies organic communities with 32% better cohesion than traditional approaches, (2) An optimized text processing pipeline using LDA topic modeling to extract meaningful connection signals from open-ended responses, and (3) A synthetic image generation system (proof-of-concept) for UI prototyping. The final algorithm prioritizes authentic connections over demographic filters, achieving a 0.51 silhouette score while intentionally breaking conventional matching boundaries to foster unexpected but meaningful relationships.
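The clustering-plus-silhouette evaluation described above can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the project's code: profiles are assumed to be already encoded as numeric vectors, the data are synthetic blobs rather than the OkCupid profiles, and `cluster_profiles`, `eps=0.3`, and `min_samples=10` are hypothetical choices. Points that DBSCAN labels as noise (`-1`) are left out of the silhouette calculation, which is one common convention.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def cluster_profiles(features: np.ndarray, eps: float = 0.3, min_samples: int = 10):
    """Cluster feature vectors with DBSCAN and score the clusters (hypothetical helper).

    Returns the labels (-1 marks noise) and the silhouette score computed on
    non-noise points only, or None when fewer than two clusters are found.
    """
    scaled = StandardScaler().fit_transform(features)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(scaled)
    assigned = labels != -1  # exclude noise points from the silhouette calculation
    if len(set(labels[assigned])) < 2:
        return labels, None
    return labels, silhouette_score(scaled[assigned], labels[assigned])


# Synthetic 2-D stand-in for numeric profile features (not the OkCupid data).
centers = [[0, 0], [0, 5], [5, 0], [5, 5]]
X, _ = make_blobs(n_samples=600, centers=centers, cluster_std=0.5, random_state=0)
labels, score = cluster_profiles(X)
n_clusters = len(set(labels[labels != -1]))
print(f"{n_clusters} clusters, {np.sum(labels == -1)} noise points, silhouette = {score:.2f}")
```

Unlike K-Means, DBSCAN does not need the number of clusters in advance; `eps` and `min_samples` control how dense a region must be to count as a community.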

This project demonstrates how machine learning can create more inclusive social platforms by challenging traditional matching paradigms. Key innovations include our hybrid approach combining DBSCAN's density-based clustering with LDA topic modeling for text reduction, and the ethical decision to exclude gender/orientation filters after quantitative analysis showed they created artificial barriers. The system serves as both a technical foundation for Amooora's future platform and a case study in building connection algorithms that prioritize community belonging over categorical matching. It is implemented as an interactive Streamlit demo that showcases how data science can drive social impact.

Methods & Tools:

Clustering algorithms (DBSCAN, K-Means comparison).

Natural Language Processing (LDA topic modeling, BERT embeddings).

Model evaluation (silhouette scoring, cluster validation).

Python (TensorFlow, Scikit-learn, Gensim, NLTK, spaCy).

Google Cloud Platform (Compute Engine, Cloud Storage).

Containerization (Docker, Docker-compose).

API development (FastAPI).

Interactive dashboards (Streamlit).

Computer vision (OpenCV, Keras for synthetic images).

Parental Monitoring & Student Outcomes

Accepted at American Economic Journal: Economic Policy (2025)

This large-scale randomized experiment studied how information interventions affect parental monitoring and student achievement across 289 Brazilian schools (25,000+ students). Using A/B testing methodology, we compared two treatment arms: (1) An information group receiving weekly SMS updates with child-specific attendance/effort data, and (2) A salience group receiving attention-redirecting messages without personalized data. Our causal inference analysis revealed that both interventions improved test scores by 0.3 standard deviations, even though only the information group developed accurate beliefs about attendance levels.
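A simplified version of the arm-by-arm comparison can be written as an OLS regression with one dummy per treatment arm and outcomes standardized against the control group, so coefficients read directly in standard deviations. The data below are simulated with an assumed true effect of 0.3 SD in each arm, chosen only to mirror the magnitude quoted above; whatever controls, fixed effects, or clustered standard errors the study used are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 3000

# Hypothetical random assignment: 0 = control, 1 = information arm, 2 = salience arm.
arm = rng.integers(0, 3, size=n)
info = (arm == 1).astype(float)
salience = (arm == 2).astype(float)

# Simulated raw test scores (mean 50, SD 10) with an assumed 0.3 SD effect in each arm.
raw = 50 + 10 * rng.standard_normal(n) + 10 * 0.3 * (info + salience)

# Standardize against the control group so coefficients are in control-group SD units.
control = raw[arm == 0]
z = (raw - control.mean()) / control.std()

# OLS with an intercept and one dummy per arm; control is the omitted category.
X = np.column_stack([np.ones(n), info, salience])
beta, *_ = np.linalg.lstsq(X, z, rcond=None)
print(f"information: {beta[1]:+.2f} SD, salience: {beta[2]:+.2f} SD")
```

Because assignment is random, the dummy coefficients estimate each arm's average effect relative to control without further adjustment; controls would mainly tighten the standard errors.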

The study employed machine learning techniques to analyze monitoring patterns from parent surveys and administrative data. Regression analysis showed that both treatments increased parental monitoring intensity, with feature importance analysis identifying the specific behavioral changes driving outcomes. An additional experiment using message frequency randomization demonstrated that parents optimize monitoring effort under attentional constraints. The results inform the development of predictive models for educational interventions targeting parental engagement.
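One common way to run a feature importance analysis is permutation importance: shuffle one predictor at a time on held-out data and measure how much model performance drops. The sketch below assumes a random forest and hypothetical survey-style behavior variables (`asks_about_school`, `checks_homework`, `talks_to_teacher`, plus a pure-noise column); it illustrates the technique and is not the study's model or data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
feature_names = ["asks_about_school", "checks_homework", "talks_to_teacher", "noise"]

# Hypothetical behaviors on a 0-4 frequency scale; "noise" is unrelated to the outcome.
X = rng.integers(0, 5, size=(n, len(feature_names))).astype(float)
y = 0.5 * X[:, 0] + 0.3 * X[:, 1] + 0.1 * X[:, 2] + rng.normal(0, 1, n)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

# Shuffle each feature on held-out data and record the mean drop in R^2.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for idx in np.argsort(result.importances_mean)[::-1]:
    print(f"{feature_names[idx]:>18}: {result.importances_mean[idx]:.3f}")
```

Computing importance on held-out data rather than training data keeps an overfit model from inflating the importance of irrelevant features.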

Methods & Tools:

Randomized controlled trial (RCT) design.

Causal inference (DID, IV regression).

Large-scale data collection (289 schools).

Text message intervention system.

Statistical modeling (OLS, logistic regression).

Feature selection for behavioral predictors.

Performance metric analysis (test scores, promotion rates).

Python and R for data analysis.

Developing a New Tool for International Youth Programs

Peer-Reviewed Publications | 2021-2023

Developed a machine learning-powered assessment tool to measure the social-emotional skills of 1,794+ youth across Uganda and Guatemala. Using dimensionality reduction techniques (PCA and factor analysis), we transformed 160+ initial survey questions into a validated 48-item instrument measuring four core competencies: Positive Self-Concept, Negative Self-Concept, Higher-Order Thinking, and Social-Communication Skills.
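The item-reduction idea can be illustrated with exploratory factor analysis in scikit-learn. Everything here is an assumption for illustration: 40 simulated items driven by four latent skills, a varimax rotation, and a 0.4 loading cutoff (a common rule of thumb). The actual instrument development also relied on CFA and invariance testing in dedicated SEM software such as lavaan and Mplus, which this sketch does not cover.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n_people, n_factors, items_per_factor = 1500, 4, 10

# Simulate 40 items, each driven by one of 4 latent skills (hypothetical data).
latent = rng.standard_normal((n_people, n_factors))
true_loadings = np.zeros((n_factors, n_factors * items_per_factor))
for f in range(n_factors):
    block = slice(f * items_per_factor, (f + 1) * items_per_factor)
    true_loadings[f, block] = rng.uniform(0.1, 0.9, items_per_factor)
items = latent @ true_loadings + rng.normal(0, 0.6, (n_people, n_factors * items_per_factor))

# Fit a 4-factor model with varimax rotation on standardized items.
z = StandardScaler().fit_transform(items)
fa = FactorAnalysis(n_components=n_factors, rotation="varimax", random_state=0).fit(z)
loadings = fa.components_.T  # shape (n_items, n_factors)

# Keep items whose strongest absolute loading clears the (assumed) 0.4 cutoff.
primary = np.abs(loadings).argmax(axis=1)
strength = np.abs(loadings).max(axis=1)
keep = np.where(strength >= 0.4)[0]
print(f"kept {keep.size} of {z.shape[1]} items")
for f in range(n_factors):
    print(f"factor {f}: items {keep[primary[keep] == f].tolist()}")
```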

Our multi-stage validation pipeline included: (1) Exploratory Factor Analysis to identify latent constructs from high-dimensional survey data, (2) Confirmatory Factor Analysis to test measurement models, and (3) Multi-Group Invariance Testing demonstrating cross-cultural validity (CFI > 0.95 across all subgroups). The system achieved strong measurement invariance (ΔCFI < 0.01) across country, gender, and socioeconomic status, enabling reliable program evaluation in diverse low-resource settings. The instrument is publicly available in English and Spanish for use and adaptation, with full documentation provided in the development paper linked below.
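The ΔCFI criterion mentioned above compares the Comparative Fit Index of a less constrained model (for example, configural) with a more constrained one (for example, equal loadings across groups); a drop smaller than 0.01 is commonly read as support for invariance. The helpers `cfi` and `invariance_holds` and the chi-square values are hypothetical; in practice these statistics come from SEM software output.

```python
def cfi(chi2_model: float, df_model: int, chi2_base: float, df_base: int) -> float:
    """Comparative Fit Index from model and baseline (independence) chi-square statistics.

    CFI = 1 - max(chi2_M - df_M, 0) / max(chi2_B - df_B, chi2_M - df_M, 0)
    """
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


def invariance_holds(cfi_less_constrained: float, cfi_more_constrained: float,
                     cutoff: float = 0.01) -> bool:
    """Support invariance when adding equality constraints lowers CFI by less than `cutoff`."""
    return (cfi_less_constrained - cfi_more_constrained) < cutoff


# Hypothetical fit statistics for a configural and a metric (equal loadings) model.
configural = cfi(chi2_model=410.0, df_model=340, chi2_base=5200.0, df_base=380)
metric = cfi(chi2_model=455.0, df_model=376, chi2_base=5200.0, df_base=380)
print(f"configural CFI = {configural:.3f}, metric CFI = {metric:.3f}, "
      f"invariant: {invariance_holds(configural, metric)}")
```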

Methods & Tools:

Dimensionality reduction (PCA, EFA, CFA).

Measurement invariance testing (multi-group CFA).

Psychometric validation pipelines.

Stata for data cleaning and preparation.

R (lavaan, psych packages) for factor analysis.

Mplus for advanced structural equation modeling.

Cross-cultural validation frameworks.

Survey data quality control systems.

Teacher Wellbeing Measurement & Intervention

Peer-Reviewed Publications | 2021-2024

Developed and validated a machine learning-powered assessment tool, the Wellbeing Holistic Assessment for Teachers (WHAT), to measure teacher wellbeing in conflict-affected areas, using dimensionality reduction techniques (PCA/EFA) on survey data from 1,659 Salvadoran educators. Our factor analysis pipeline identified key wellbeing constructs with strong psychometric properties (CFI = 0.92, RMSEA = 0.04), enabling precise measurement in high-stress environments.
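For reference, the RMSEA point estimate reported alongside CFI can be computed from a model's chi-square, degrees of freedom, and sample size. The sketch uses the N - 1 convention (some programs use N) and a hypothetical chi-square; it does not reproduce the paper's reported value.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Root Mean Square Error of Approximation (point estimate, N - 1 convention)."""
    if df <= 0:
        raise ValueError("RMSEA requires positive degrees of freedom")
    # Misfit beyond what the degrees of freedom allow, scaled per df and per observation.
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


# Hypothetical chi-square for a CFA fitted on 1,659 respondents.
value = rmsea(chi2=1200.0, df=700, n=1659)
print(f"RMSEA = {value:.3f}")
```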

In the cluster-randomized controlled trial (N=430 treatment, 398 control), we applied causal inference methods to evaluate a social-emotional learning intervention. Despite null effects on most outcomes, our mixed-effects modeling revealed important insights about intervention delivery modes and teacher stress patterns. The system demonstrated strong measurement invariance across diverse educator populations.
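A mixed-effects model for a cluster-randomized trial typically adds a random intercept for each cluster so that teachers in the same school are not treated as independent observations. The sketch below uses statsmodels' `mixedlm` on simulated data with no built-in treatment effect; the column names, the 60-school design, and the variance components are assumptions, and any baseline controls or additional levels the study used are omitted.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n_schools, teachers_per_school = 60, 14

# Hypothetical cluster-randomized design: treatment is assigned at the school level.
school = np.repeat(np.arange(n_schools), teachers_per_school)
treated_school = rng.permutation(np.repeat([0, 1], n_schools // 2))
treated = treated_school[school]

# Simulated wellbeing: school random intercept + teacher-level noise, true effect set to 0.
school_effect = rng.normal(0, 0.5, n_schools)[school]
wellbeing = 3.0 + 0.0 * treated + school_effect + rng.normal(0, 1.0, school.size)

df = pd.DataFrame({"wellbeing": wellbeing, "treated": treated, "school": school})

# Random intercept per school accounts for clustering of teachers within schools.
result = smf.mixedlm("wellbeing ~ treated", df, groups=df["school"]).fit()
print(result.summary())
```

Ignoring the school intercept would understate the standard error of the treatment coefficient, because the effective sample size in a cluster trial is closer to the number of schools than to the number of teachers.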

Methods & Tools:

Dimensionality reduction (PCA, EFA, CFA).

A/B testing framework (cluster-RCT design).

Causal inference (difference-in-differences).

Psychometric validation pipelines.

Mixed-methods analysis (quantitative + qualitative).

Stata/R for statistical modeling.

Measurement invariance testing.

Survey data quality control systems.

Educational Resource Equity Analysis

Peer-Reviewed Publication | 2021

Developed a novel methodological framework to quantify and compare educational resource allocation equity across 53,469 Brazilian public schools (30% of national coverage). Using SAEB/Prova Brasil 2015 data, we standardized resources into three dimensions: teacher quality, school physical environment, and instructional environment, then contrasted allocations between high- and low-needs schools via multidimensional disparity indices.
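The standardize-then-contrast step described above can be sketched as follows: each dimension is converted to z-scores so indicators on different scales become comparable, then high- and low-needs schools are compared on each. The indicators, the needs flag, and the built-in gap are simulated assumptions, and the simple mean gap is only one possible disparity index, not necessarily the one used in the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
n_schools = 1000

# Hypothetical needs flag and raw indicators for the three resource dimensions.
high_needs = rng.random(n_schools) < 0.5
raw_dimensions = {
    "teacher quality": rng.normal(0, 1, n_schools) - 0.3 * high_needs,  # assumed gap
    "physical environment": rng.normal(10, 3, n_schools),
    "instructional environment": rng.normal(50, 15, n_schools),
}


def zscore(x: np.ndarray) -> np.ndarray:
    """Standardize to mean 0 and SD 1 so dimensions in different units are comparable."""
    return (x - x.mean()) / x.std()


# Disparity per dimension: standardized mean in high-needs minus low-needs schools.
gaps = {}
for name, values in raw_dimensions.items():
    z = zscore(values)
    gaps[name] = z[high_needs].mean() - z[~high_needs].mean()
    print(f"{name:>26}: {gaps[name]:+.2f} SD")
```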

Our output-based approach identified systemic inequities, with high-needs schools receiving 15-30% fewer resources per student despite greater need. The framework is adaptable, allowing subnational comparisons (e.g., Northeast vs. Southeast Brazil) and integration with international datasets for cross-country equity benchmarking.
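Two simple equity summaries fit this kind of analysis: the percentage by which high-needs schools' mean per-student resources fall below those of low-needs schools (one plausible reading of the 15-30% figure above), and the Gini coefficient listed under Methods & Tools. The function names and inputs below are illustrative, not taken from the paper.

```python
import numpy as np


def gini(values) -> float:
    """Gini coefficient of non-negative values (0 = perfect equality)."""
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    if n == 0 or x.sum() == 0:
        return 0.0
    ranks = np.arange(1, n + 1)
    # Closed form on ascending-sorted data: sum((2i - n - 1) * x_i) / (n * sum(x)).
    return float(np.sum((2 * ranks - n - 1) * x) / (n * x.sum()))


def relative_gap(high_needs, low_needs) -> float:
    """Percent by which mean per-student resources in high-needs schools fall below low-needs schools."""
    return float(100 * (1 - np.mean(high_needs) / np.mean(low_needs)))


print(gini([1, 1, 1, 1]))  # 0.0 (every school has the same amount)
print(gini([0, 0, 0, 1]))  # 0.75 (one school holds everything)
print(f"{relative_gap([80, 80], [100, 100]):.1f}%")  # 20.0%
```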

Methods & Tools:

Large-scale data integration (SAEB/Prova Brasil census).

Resource standardization frameworks (3-dimension model).

Equity metrics (Gini coefficients, disparity indices).

Geospatial analysis (regional comparisons).

Statistical modeling (OLS, quantile regression).

Policy impact simulation.

Stata/R for data processing.

Data visualization (equity dashboards).

Research Paper

Ceará Teacher Effectiveness Program

Peer-Reviewed Publication | 2018

This large-scale randomized controlled trial demonstrated that a low-cost coaching program (delivered via Skype at $2.40/student) significantly improved teaching practices across 350 public schools in Ceará, Brazil. Our causal inference analysis showed the intervention increased teachers' instructional time by 28% and boosted student engagement by 0.4 standard deviations, with particularly strong effects in math and Portuguese.
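A back-of-the-envelope cost-effectiveness ratio divides an effect size by cost per student. Plugging in the two figures quoted above is pure arithmetic for illustration; it pairs an engagement effect with a per-student cost and is not a ratio reported by the study. The function name is hypothetical.

```python
def sd_gain_per_100_dollars(effect_sd: float, cost_per_student: float) -> float:
    """Standard deviations of impact per $100 spent per student (hypothetical helper)."""
    if cost_per_student <= 0:
        raise ValueError("cost_per_student must be positive")
    return effect_sd / cost_per_student * 100


# Figures quoted above (0.4 SD, $2.40 per student), used only as an arithmetic example.
ratio = sd_gain_per_100_dollars(effect_sd=0.4, cost_per_student=2.40)
print(f"{ratio:.1f} SD per $100 per student")
```

Normalizing to a fixed budget such as $100 per student makes low-cost programs comparable with more expensive interventions on a single scale.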

The program treated classroom practice as malleable and sought to improve it through peer collaboration, addressing research showing wide variation in teacher quality within schools. Using mixed-effects regression modeling, we found that the virtual coaching model overcame traditional barriers to professional development in low-resource settings. The state government is now scaling this evidence-based program statewide based on our findings.
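How much teacher-quality variation sits within rather than between schools can be quantified with a one-way variance decomposition (the law of total variance): replace each teacher's score with their school's mean, and the variance of those means is the between-school part. The function and simulated scores below are hypothetical; a mixed-effects model estimates the same split through its variance components.

```python
import numpy as np


def within_school_share(scores, school_ids) -> float:
    """Share of total variance in teacher scores that lies within schools rather than between them."""
    scores = np.asarray(scores, dtype=float)
    school_ids = np.asarray(school_ids)
    # Map every teacher to their school's mean; the variance of these means is the between part.
    _, inverse = np.unique(school_ids, return_inverse=True)
    school_means = np.bincount(inverse, weights=scores) / np.bincount(inverse)
    between = school_means[inverse].var()
    total = scores.var()
    # Law of total variance (population variances): total = between + within.
    return float((total - between) / total)


# Hypothetical data: 100 schools with 10 teachers each.
rng = np.random.default_rng(5)
ids = np.repeat(np.arange(100), 10)
scores = rng.normal(0, 0.5, 100)[ids] + rng.normal(0, 1.0, ids.size)
print(f"within-school share of variance: {within_school_share(scores, ids):.2f}")
```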

Methods & Tools:

Cluster-randomized controlled trial (350 schools).

Causal inference (multilevel modeling).

Cost-effectiveness analysis.

Classroom observation data processing.

Stata and R for statistical analysis.

Power calculations for field experiments.

Implementation fidelity tracking.

Scalability assessment framework.
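Power calculations for cluster-randomized field experiments are often summarized as a minimum detectable effect size (MDES). The sketch uses the standard two-level formula with treatment assigned at the school level, a normal approximation for the multiplier (about 2.8 for 80% power at a two-sided 5% level), and no covariate adjustment; the inputs (20 observations per school, an ICC of 0.15) are hypothetical and not the study's design parameters.

```python
from math import sqrt
from statistics import NormalDist


def cluster_rct_mdes(n_clusters: int, per_cluster: int, icc: float,
                     treated_share: float = 0.5, alpha: float = 0.05,
                     power: float = 0.80) -> float:
    """Minimum detectable effect size (SD units) for a two-level cluster-randomized trial.

    MDES = M * sqrt(icc / (P(1-P)J) + (1 - icc) / (P(1-P)Jn)), with J clusters,
    n observations per cluster, treated share P, and multiplier M.
    """
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)  # about 2.8 with the defaults
    denom = treated_share * (1 - treated_share) * n_clusters
    return multiplier * sqrt(icc / denom + (1 - icc) / (denom * per_cluster))


# Hypothetical inputs: 350 schools, 20 observations per school, ICC of 0.15.
mdes = cluster_rct_mdes(n_clusters=350, per_cluster=20, icc=0.15)
print(f"MDES = {mdes:.3f} SD")
```

The ICC term usually dominates: adding schools lowers the MDES much more than adding observations within each school.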

Research Paper

PUBLICATIONS

Bettinger, E., Cunha, N. M., Lichand, G., & Madeira, R. (2024). Are the effects of informational interventions driven by salience? American Economic Journal: Economic Policy.
Soares, F., & Cunha, N. M. (2024). The effects of adding social-emotional learning to a comprehensive education intervention in El Salvador on teacher well-being: a mixed methods evaluation. Educational Research and Evaluation, 29(3–4), 201–229.
Omoeva, C., Cunha, N. M., Kyllonen, P., Gates, S., Martinez, A., & Burke, H. M. (2023). Developing a new tool for international youth programs: The YouthPower Action Youth Soft Skills Assessment (YAYSSA). European Journal of Psychological Assessment.
Cunha, N. M., Martinez, A., Kyllonen, P., & Gates, S. (2021). Cross-Country Comparability of a Social-Emotional Skills Assessment Designed for Youth in Low-Resource Environments. International Journal of Testing, 21(3–4), 182–219.
Lichand, G., Bettinger, E., Cunha, N. M., & Madeira, R. (2021). The Psychological Effects of Poverty on Investments in Children’s Human Capital (Working Paper No. 349). University of Zurich, Department of Economics.
Omoeva, C., Cunha, N. M., & Moussa, W. (2021). Measuring equity in education resource allocation: An output-based approach. International Journal of Educational Development, 87.
Soares, F., Cunha, N. M., & Frisoli, P. (2021). How do we know if teachers are well? The wellbeing holistic assessment for teachers tool. Journal on Education in Emergencies, 7(2).
Bruns, B., Costa, L., & Cunha, N. M. (2018). Through the looking glass: can classroom observation and coaching improve teacher performance in Brazil? Economics of Education Review.
Cunha, N. M., Rios Neto, E. L. G., & Hermeto, A. M. (2014). Religiosity and school performance: a study case of Brazilian youth in the metropolitan region of Belo Horizonte. Pesquisa e Planejamento Econômico (Rio de Janeiro), 44, 71–116.

CONTACT

Feel free to contact me with questions about my projects, data science opportunities, or anything else you think is relevant ;)

ninamcunha@gmail.com
LinkedIn
Github