Description:
The ability to locate and acquire important data is a valuable skill for data analysis and data science. We'll explore many sources and repositories of valuable data, such as open government and university datasets. We'll also explore popular social APIs (e.g., Facebook, Spotify, Twitter) and domain-specific APIs (e.g., healthcare, news, science and math) that provide access to a wealth of data. Further, we'll discuss methods for querying web servers and for requesting and parsing data to extract the information you need. Finally, we'll cover scraping various types of data from websites, reading and extracting text from documents (e.g., PDF, Word), and cleaning and storing sourced and scraped data.
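As a minimal sketch of the "request and parse" step described above, the snippet below parses a JSON payload of the kind a public web API might return, using only Python's standard library. The payload, its field names (`results`, `name`, `population`), and the helper `extract_records` are hypothetical; a real API defines its own response schema, and in practice the text would come from an HTTP request (for example via `urllib.request.urlopen`) rather than a string literal.

```python
import json

# Hypothetical API response body; a real API defines its own schema.
RAW_RESPONSE = """
{
  "count": 3,
  "results": [
    {"name": "Town A", "population": "167882"},
    {"name": "Town B", "population": null},
    {"name": "Town C", "population": "1500000"}
  ]
}
"""


def extract_records(raw_text):
    """Parse a JSON string and return (name, population) pairs.

    Records with a missing population are skipped, and populations
    are converted from strings to integers.
    """
    payload = json.loads(raw_text)
    records = []
    for item in payload.get("results", []):
        population = item.get("population")
        if population is None:
            continue  # skip incomplete records instead of failing
        records.append((item["name"], int(population)))
    return records


if __name__ == "__main__":
    for name, population in extract_records(RAW_RESPONSE):
        print(f"{name}: {population:,}")
```

Using `.get()` with defaults keeps the parser tolerant of responses that omit optional fields, which is common when working with third-party APIs.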
Course Code/Duration:
BDT120 / 1 Day
Learning Objectives:
After this course, you will be able to:
- Explore a Variety of Public Data Repositories
- Understand Effective Means to Search for Valuable Data
- Use the Python Programming Language to Source and Scrape Data
- Use Popular Social and Domain-specific APIs to Access Data (e.g., Slack)
- Extract Text from Documents (e.g., PDF, Word)
- Access PDF Tables
- Scrape Data from Web Pages
- Clean Scraped Data
- Store Sourced and Scraped Data
Prerequisites:
- Basic Python Programming
Audience:
- Anyone interested in working with data
Course Outline:
Overview of Data Sourcing
- Public Open Datasets
- Government Data
- University Data
Milestone 1: Explore public data repositories
Introduction to the Python Programming Language
- Installing Anaconda
Milestone 2: Learn how to use Jupyter Notebooks
Using Public APIs (Application Programming Interfaces)
- Explore Popular and Domain-specific APIs
- Common Conventions
- Parsing JSON
Milestone 3: Access a public API (e.g., Facebook, Twitter, Google)
Extracting Text from Documents
Milestone 4: Extract data from PDFs
Overview of Data Scraping
- Introduction to BeautifulSoup
- Parsing HTML and JavaScript
Milestone 5: Scrape data from a website
Cleaning Scraped Data
Storing Sourced and Scraped Data
Conclusion: Next steps
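The scraping, cleaning, and storing steps in the outline can be sketched end to end as below. The course itself uses BeautifulSoup; this sketch swaps in Python's standard-library `html.parser` so it runs without extra installs, and it stores rows in an in-memory SQLite database. The sample HTML, the table layout, and the names `TableParser`, `clean_row`, and `store_rows` are hypothetical, chosen only for illustration.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would download this HTML first.
SAMPLE_HTML = """
<table id="stations">
  <tr><th>Station</th><th>Temp (C)</th></tr>
  <tr><td>  North  </td><td>12.5</td></tr>
  <tr><td>South</td><td>n/a</td></tr>
  <tr><td>East</td><td> 9.0 </td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows use <th>, so they stay empty and are skipped
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data


def clean_row(row):
    """Strip whitespace and convert the temperature; return None if unusable."""
    name, temp = row[0].strip(), row[1].strip()
    try:
        return name, float(temp)
    except ValueError:
        return None  # e.g. "n/a" placeholders


def store_rows(rows, conn):
    """Create a table (if needed) and insert the cleaned rows."""
    conn.execute("CREATE TABLE IF NOT EXISTS readings (station TEXT, temp_c REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    parser = TableParser()
    parser.feed(SAMPLE_HTML)
    cleaned = [r for r in (clean_row(row) for row in parser.rows) if r is not None]
    conn = sqlite3.connect(":memory:")
    store_rows(cleaned, conn)
    print(conn.execute("SELECT station, temp_c FROM readings ORDER BY station").fetchall())
```

Keeping parsing, cleaning, and storage in separate functions makes each stage easy to test on its own and to swap out, for example replacing the parser with BeautifulSoup or the SQLite database with a CSV file.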
Structured Activity/Exercises/Case Studies:
- Milestone 1: Explore public data repositories
- Milestone 2: Learn how to use Jupyter Notebooks
- Milestone 3: Access a public API (e.g., Facebook, Twitter, Google)
- Milestone 4: Extract data from PDFs
- Milestone 5: Scrape data from a website
Training material provided:
Yes (Digital format)