Teaching
I am currently affiliated as a Scholar with the Economics Department, University of California, Berkeley.
I am also an instructor for the Data Science class (for executives) at the [Awesome] General Assembly in San Francisco!
Data Science (Presentations available for students - here)
Course Information:
Data Science (DAT8) @ the General Assembly, San Francisco.
Hours: Tues/Thurs 6:30-9:30 pm
Location: 501 Folsom St., San Francisco.
Course's Website: Click Here
Course's Repo: Click Here
Topics:
Lecture 1 / Week 1 - Introduction and Overview
Lecture 2 / Week 2 - Introduction to Statistics
Lecture 3 / Week 2 - Classic Linear Regression Model
- Method of Moments (MOM)
- OLS Estimation Method
- Goodness-of-Fit
- Non-Linear Relationship
- Transform a Nonlinear Relationship into a Linear Model
- Accuracy
- Heteroskedasticity
- Multicollinearity
Lecture 4 / Week 3 - Hypothesis Testing
(Decision rule, Critical Values, P-values, Confidence Intervals, F-Statistic, etc.)
Lecture 5 / Week 3 - Introduction to Choice Modeling
- Logit
- Probit
- Mixed Logit
- Nested Logit
- Random Coefficient (Extra Readings - here)
Lecture 6 / Week 4 - Introduction to Python (python, ipython, git)
Lecture 7 / Week 4 - Introduction to Python’s libraries for Data Science
Lecture 8 / Week 5 - Time Series Analysis + Identification Strategy (IV2SLS)
Lecture 9 / Week 5 - SQL and Relational Theory + Project Pitch in class (5 minutes each)
Lecture 10 / Week 6 - APIs and semi-structured data
- Structuring Data (Vectorizing + TF-IDF)
Lecture 11 / Week 6 - Midterm
Lecture 12 / Week 7 - Regularization
Lecture 13 / Week 7 - Principal Components Analysis
Lecture 14 / Week 8 - Clustering: Hierarchical and K-Means
Lecture 15 / Week 8 - Identification Strategy / Peer Review (Speed Dating Style)
Lecture 16 / Week 9 - Googler Guest Speaker / TBD
Lecture 17 / Week 9 - Grid Search and Parameter Selection
Lecture 18 / Week 10 - IPython.parallel & StarCluster + AWS (Amazon Web Services)
Lecture 19 / Week 10 - Hadoop Distributed File System and Streaming (Map Reduce)
Lecture 20 / Week 11 - Open Office Hours
Lecture 21 / Week 11 - Present white paper
Lecture 22 / Week 12 - Last Lecture - Where to go?
Books (recommended, available for students here):
- Statistics/Econometrics: Introductory Econometrics: A Modern Approach, by Jeffrey M. Wooldridge
- Doing Data Science, by Cathy O'Neil and Rachel Schutt (O'Reilly Media)
- Python: Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython, by Wes McKinney (O'Reilly Media)
- Choice Modeling and Simulation (Markov Chain Monte Carlo Simulation): Discrete Choice Methods with Simulation, by Kenneth Train (UC Berkeley)
Seminars and Talks:
Choice Modeling for Executives
Choice modeling attempts to model the decision process of an individual or segment in a particular context. It may be used to estimate non-market environmental benefits and costs.
In addition, choice modeling is often regarded as the most suitable method for estimating consumers' willingness to pay for quality improvements in multiple dimensions. The 2000 Nobel Prize in economics was awarded to Daniel McFadden, a principal proponent of choice modeling theory.
Our goal is to understand the behavioral process that leads to the agent’s choice, by taking a causal perspective. There are factors that collectively determine, or cause, the agent’s choice. Some of these factors are observed by the researcher and some are not.
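As a rough illustration of the observed/unobserved split described above, here is a minimal sketch of the standard logit choice probabilities. It assumes the observed factors are summarized in a utility value for each alternative and that the unobserved factors follow the usual i.i.d. extreme value assumption, which gives the closed-form logit formula. The travel modes and utility numbers are made up for illustration and do not come from the presentation.

```python
import math


def logit_choice_probabilities(observed_utilities):
    """Return logit choice probabilities for one decision-maker.

    observed_utilities holds the part of utility the researcher observes,
    one value per alternative. The unobserved part is assumed to be
    i.i.d. extreme value, which yields P_i = exp(V_i) / sum_j exp(V_j).
    """
    # Subtracting the maximum keeps exp() numerically stable and does not
    # change the resulting probabilities.
    max_u = max(observed_utilities)
    exp_u = [math.exp(u - max_u) for u in observed_utilities]
    total = sum(exp_u)
    return [e / total for e in exp_u]


# Hypothetical observed utilities for three travel modes.
utilities = {"car": 1.2, "bus": 0.4, "bike": -0.3}
probs = logit_choice_probabilities(list(utilities.values()))

for mode, p in zip(utilities, probs):
    print(f"{mode}: {p:.3f}")
```

The alternative with the highest observed utility gets the highest probability, but never probability one, because the unobserved factors can still tip the choice.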
** Please contact me in order to get the full presentation **
Time Series Analysis for Executives
Time series analysis comprises methods for analyzing time series data in order to extract meaningful statistics and other characteristics of the data. Time series forecasting is the use of a model to predict future values based on previously observed values. Regression analysis is often used to test theories that the current values of one or more independent time series affect the current value of another time series. That kind of analysis is not usually called "time series analysis", a term that refers to comparing values of a single time series at different points in time.
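To make "predict future values based on previously observed values" concrete, here is a minimal sketch that fits a first-order autoregressive model, AR(1), by ordinary least squares and iterates it forward. The choice of AR(1), the function names, and the sample numbers are assumptions for illustration only and are not taken from the presentation.

```python
def fit_ar1(series):
    """Estimate y_t = c + phi * y_{t-1} by ordinary least squares."""
    x = series[:-1]  # lagged values y_{t-1}
    y = series[1:]   # current values y_t
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    phi = cov_xy / var_x
    c = mean_y - phi * mean_x
    return c, phi


def forecast(series, steps):
    """Forecast `steps` future values by iterating the fitted AR(1) model."""
    c, phi = fit_ar1(series)
    predictions = []
    last = series[-1]
    for _ in range(steps):
        last = c + phi * last
        predictions.append(last)
    return predictions


# Made-up observations of a single series at successive points in time.
observed = [10.0, 10.8, 11.1, 11.9, 12.4, 12.6, 13.3, 13.9]
print(forecast(observed, steps=3))
```

Each forecast feeds the next one, so uncertainty grows with the horizon; a real analysis would also check stationarity and report forecast intervals.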
** Please contact me in order to get the full presentation **
Data Science (General Assemb.ly)
Given the prevalence of technology and the amount of data available online about users, products, and the content we generate, businesses could make far better-informed decisions if this vast amount of data were analyzed more deeply using data science. The data science course provides the tools, methods, and practical experience to enable you to make accurate predictions about data, which ultimately leads to better decision-making in business and to smarter technology (think recommendation systems or targeted ads).
This course will provide you with technical skills in machine learning, algorithms, and data modeling, which will allow you to make accurate predictions about your data. You will create your models using R and Python, so you will gain a good grasp of these two programming languages. Furthermore, you will learn how to parse and clean your data, which can take up to 70% of your time as a data scientist.
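As a small taste of the parsing and cleaning step, here is a minimal sketch using only the Python standard library. The column names and the raw records are hypothetical, and dropping invalid rows is just one possible cleaning policy; imputing or flagging them may be a better choice depending on the project.

```python
import csv
import io

# Hypothetical raw export with stray whitespace, blanks, and bad values.
RAW = """user_id,age,spend
1, 34 ,120.50
2,,80
3,abc,15.25
4,29,
5,41,60
"""


def clean_rows(text):
    """Parse CSV text and keep only rows whose numeric fields are valid."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        try:
            cleaned.append({
                "user_id": int(row["user_id"].strip()),
                "age": int(row["age"].strip()),
                "spend": float(row["spend"].strip()),
            })
        except ValueError:
            # Skip rows with missing or non-numeric values.
            continue
    return cleaned


rows = clean_rows(RAW)
print(rows)
```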