Profile

Hello, I’m Cynthia Song

A Data Professional

About Me

Hello!

I'm a data professional with over 7 years of experience specializing in programming and statistics.

I have a solid background and a profound understanding of machine learning algorithms.

Besides working with numbers, I also love plotting beautiful visualizations.

Recently, I learned full stack web development and got fascinated by it.

Although I built this site on my own, I'm actually a team player too.

As Helen Keller said: "Alone we can do so little; together we can do so much."

In my spare time, I like drawing, cooking and traveling.

Skills

  • Python
  • R
  • SQL
  • Microsoft Excel
  • Tableau
  • Git
  • scikit-learn
  • numpy
  • pandas
  • nltk
  • Spark
  • HTML/CSS
  • JavaScript
  • Node.js
  • React
  • GCP/AWS
  • Docker
  • Kubernetes

I'm Available For Hire

“You can dream, create, design and build the most wonderful place in the world…but it requires people to make the dream a reality.”
- Walt Disney

Data Scientist

I'm proficient in data preprocessing and machine learning algorithms. I have hands-on experience with random forest, gradient boosting, support vector machines, k-means clustering, logistic regression and natural language processing.

Data Engineer

I'm a ninja who can combine data and engineering. I'm experienced in Google BigQuery, Amazon AWS and the Qubole data platform.

Full Stack Developer

After two months of intensive training in the web development bootcamp taught by Dr. Angela Yu, I mastered HTML, CSS, JavaScript, Node.js, MongoDB and React. In addition, after working on several full stack projects, I know Docker and Kubernetes for industry-level production.



Experiences

Work

2021 - Present

Data Scientist / Full Stack Engineer – Box

Strategic Decisions Platforms & Analyses Project

  • The main goal of my work is to help Box move data from its data centers to the public cloud.
  • Our team's purpose is to build a real-time system that monitors each engineering team's cost on each public cloud.
2020

Data Engineer – Gallup

World Poll Survey Project

  • Focused on migrating the traditional World Poll survey system to a more efficient, modern system.
  • Helped implement user-friendly front-end interfaces using Python, ReactJS and JavaScript, and connected them to the back-end databases through APIs.
  • This project will shorten the survey processing time from 6 months to 48 hours.

Artificial Social Intelligence for Successful Teams Project

  • Cleaned and pre-processed data, converting it from JSON format into a more concise and efficient CSV format.
  • The data comes from a virtual reality game and will be used to study and predict the behavior of players.
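
The JSON-to-CSV conversion described above can be sketched roughly as follows. This is a minimal illustration using only Python's standard library, not the project's actual code; it assumes the input is a JSON array of flat objects, and the function name json_records_to_csv is hypothetical.

```python
import csv
import io
import json


def json_records_to_csv(json_text):
    """Convert a JSON array of flat objects into CSV text (illustrative helper)."""
    records = json.loads(json_text)

    # Collect column names in first-seen order so no field is silently dropped.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    # restval fills in an empty cell when a record lacks one of the columns.
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

Nested JSON objects would need to be flattened first; that step is left out here because the structure of the original game data is not described.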
2019

Data Science Intern – Digitas North America

Samsung Mobile Project

  • Used SQL on Google Cloud Platform and Python to automatically extract tables from complex databases.
  • Calculated the migration rate of people switching from their current phones to the latest Samsung phone model.
  • Used R to test statistical significance and visualize results, and identified the optimal number of advertisements to serve to the audience in each phone group.

Macy’s Beauty Project

  • Collaborated with interns from different fields to form a business strategy to promote Macy's beauty sales to Gen Z customers.
  • Used Excel as the primary tool to analyze survey results and applied A/B testing to measure the success of the metrics we created.
  • Our research findings led to a proposed strategy to launch influencer-customized beauty boxes.
2015 - 2017

Research Assistant – Stony Brook University

Scientific Research Project

  • Applied statistical methods such as regression analysis to analyze large datasets, and used MATLAB to visualize the results.
  • Found that water vapor prediction is crucial to the accuracy of cyclone center pressure prediction.
  • Thesis publication link

Projects

2019

IST 718 - Big Data Analytics Class Project

FIFA19 Players Analysis Project

  • Ran machine learning algorithms in a Python Spark environment using pipelines.
  • Predicted FIFA football players' overall scores based on their skills and performance using algorithms such as Linear Regression, Random Forest Regressor, and Gradient Boosting Regressor.
  • Used k-means clustering to group similar players.
2019

IST 707 - Data Analytics Class Project

Airbnb Hosts Review Score Prediction Project

  • Used machine learning algorithms in Python to predict Airbnb hosts' review scores.
  • Specific machine learning models include Linear Regression, Random Forest Regressor, and Gradient Boosting Regressor.
  • Grid search and cross-validation techniques were applied to tune and analyze models.
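
The idea behind tuning with grid search and cross-validation can be sketched as follows. This is an illustrative, standard-library-only example and not the project's code: a one-feature ridge regression stands in for the Random Forest and Gradient Boosting models, and all function names here are hypothetical.

```python
from statistics import mean


def fit_ridge_1d(xs, ys, alpha):
    """Closed-form slope of a no-intercept, one-feature ridge regression."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)


def mse(xs, ys, slope):
    """Mean squared error of the predictions slope * x."""
    return mean((y - slope * x) ** 2 for x, y in zip(xs, ys))


def cross_val_mse(xs, ys, alpha, k=3):
    """Average validation MSE over k folds; fold i holds out every k-th point."""
    pairs = list(zip(xs, ys))
    scores = []
    for fold in range(k):
        train = [p for i, p in enumerate(pairs) if i % k != fold]
        valid = [p for i, p in enumerate(pairs) if i % k == fold]
        slope = fit_ridge_1d([x for x, _ in train], [y for _, y in train], alpha)
        scores.append(mse([x for x, _ in valid], [y for _, y in valid], slope))
    return mean(scores)


def grid_search(xs, ys, alphas, k=3):
    """Return the candidate alpha with the lowest cross-validated MSE."""
    return min(alphas, key=lambda alpha: cross_val_mse(xs, ys, alpha, k))
```

With scikit-learn, the same pattern is usually written with GridSearchCV, which takes an estimator, a parameter grid and the number of folds.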
2020

IST 736 - Text Mining Class Project

Trump's Tweets Analysis Project

  • Utilized Tweepy to scrape Trump's recent tweets.
  • Pre-processing steps include removing stop words while keeping negation words, filtering out non-alphabetical characters, lowercasing and removing empty tweets.
  • The project uses nltk, regular expressions and feature engineering techniques such as count vectorizer, boolean vectorizer and TF-IDF.
  • A Support Vector Machine is applied for sentiment analysis and LDA (Latent Dirichlet Allocation) is used for topic modeling.
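
The pre-processing steps listed above can be sketched roughly as follows. This is a minimal illustration rather than the project's code: the two word lists are hypothetical and deliberately tiny (nltk's stop word list is much larger), and lowercasing is done first so that the stop word lookup works on normalized tokens.

```python
import re

# Hypothetical, shortened word lists for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "in",
              "not", "no", "we", "will"}
NEGATION_WORDS = {"not", "no", "never"}


def preprocess(tweet):
    """Lowercase a tweet, keep alphabetic tokens, drop stop words except negations."""
    text = tweet.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # filter out non-alphabetical characters
    tokens = text.split()
    return [t for t in tokens if t not in STOP_WORDS or t in NEGATION_WORDS]


def preprocess_all(tweets):
    """Pre-process every tweet and drop the ones that end up empty."""
    cleaned = (preprocess(tweet) for tweet in tweets)
    return [tokens for tokens in cleaned if tokens]
```

Keeping negation words matters for sentiment analysis, because dropping "not" would turn "not good" into "good".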

Education

2020

Syracuse University

Master of Science | Major: Applied Data Science | GPA: 3.97

2017

Stony Brook University

Master of Science | Major: Atmospheric Sciences | GPA: 3.5

2014

Nanjing University of Information Science and Technology

Bachelor of Science | Major: Atmospheric Sciences | GPA: 3.63


Projects Demo


“As practice makes perfect, I can not but make progress; each drawing one makes, each study one paints, is a step forward.”
- Vincent van Gogh

Airbnb

IST 707 - Data Analytics Course Project

Airbnb Hosts Review Scores Prediction.

Tags:
Machine Learning, Random Forest, Gradient Boosting, Linear Regression

Fifa

IST 718 - Big Data Analytics Course Project

FIFA19 Players Analysis.

Tags:
Spark, Machine Learning, Regression, Clustering

Trump

IST 736 - Text Mining Course Project

Trump Tweets Analysis.

Tags:
Text Mining, NLP, TF-IDF, Count Vectorizer, Topic Modeling, LDA

Twitter

IST 664 - Natural Language Processing Course Project

Text Mining Summary

Tags:
Natural Language Processing, Text Mining


Contact