Data Acquisition

Public Data

Private Data

Manual Data Acquisition

Application Programming Interfaces (APIs):


  • APIs are built around the HTTP Request/Response Cycle
  • Many APIs require users to sign up for an API key that uniquely identifies them and lets the provider keep a record of all of a user’s requests
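Since the key usually has to ride along with every call, here is a minimal sketch of how it is attached to a request. The endpoint, query parameter, and key are all hypothetical placeholders; the request is built but not sent, so you can inspect what would go over the wire:

```python
import requests

# Hypothetical endpoint and key -- placeholders, not a real service
url = "https://api.example.com/v1/data"
params = {"q": "python"}
headers = {"Authorization": "Bearer YOUR_API_KEY"}

# Build (without sending) the request to inspect the final URL and headers
req = requests.Request("GET", url, params=params, headers=headers).prepare()
print(req.url)  # https://api.example.com/v1/data?q=python
```

Whether the key goes in a header, a query parameter, or elsewhere varies by API, so always check the provider's documentation.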

The API process

💻 Client → (Request) → ⚙ API → (Search) → 🗄 Server → (Response) → 💻 Client

Ethics — With all data acquisition methods, there are important ethical considerations to keep in mind:


  • Who owns the data uploaded to a website by users?
  • When and how should users of services be notified that data about them is being acquired?
  • What kinds of data should be restricted from being acquired about users?
  • How can users protect their privacy and know when it has been breached?

Data acquisition should be:

  • Fair
  • Transparent
  • Respectful

Making an API request


Reference: Data Acquisition Methods | Codecademy

1) Import requests library

2) Use the .get() method to fetch the data from the desired URL

3) Use the .json() method to access the decoded JSON data as a Python object

4) Import csv library

5) Use the .writerows() method to write the JSON records to a CSV file

6) Import pandas library and use the .read_csv() function to read the CSV data into a dataframe object
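The six steps above can be sketched end to end. The endpoint is hypothetical, so the fetched JSON is stood in for by sample records of the same shape; with a real API you would uncomment the requests.get() call:

```python
import csv
import requests
import pandas as pd

# 1-3) Fetch and decode JSON from a (hypothetical) endpoint:
# response = requests.get("https://api.example.com/v1/records")
# records = response.json()
# For a self-contained run, sample records of the same shape:
records = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Grace"},
]

# 4-5) Write the decoded JSON records to a CSV file with .writerows()
with open("records.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "name"])
    writer.writeheader()
    writer.writerows(records)

# 6) Read the CSV back into a pandas DataFrame
df = pd.read_csv("records.csv")
print(df.shape)  # (2, 2)
```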

Rules of scraping:

  • Check the website's T&Cs
  • Do NOT spam the website with a ton of requests
  • Limit yourself to roughly one request to one webpage per second
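A simple way to honor the one-request-per-second rule is to sleep between fetches. The URLs here are hypothetical placeholders, and the actual fetch is left as a comment so the sketch runs without network access:

```python
import time

# Hypothetical pages to scrape
urls = ["https://example.com/page1", "https://example.com/page2"]

start = time.monotonic()
for url in urls:
    # response = requests.get(url)  # fetch one page here...
    time.sleep(1)                   # ...then pause a full second
elapsed = time.monotonic() - start  # at least one second per page
```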

Web scraping steps

1) Import requests and fetch the page with .get(), as in the API steps above

2) Import BeautifulSoup from bs4

3) Pass the response's .content attribute to BeautifulSoup to turn the page into a BeautifulSoup object

4) To retrieve the relevant info, you can use:


  • Tags
  • .find_all()
  • .select()
  • .get_text()
  • .from_dict() from pandas
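A self-contained sketch of those retrieval tools, parsing an inline HTML snippet in place of a downloaded page (the tags, classes, and links are made up for illustration):

```python
from bs4 import BeautifulSoup
import pandas as pd

# Inline HTML stands in for response.content from a real page
html = """
<ul>
  <li class="item"><a href="/a">Alpha</a></li>
  <li class="item"><a href="/b">Beta</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# .find_all() matches by tag and class; .get_text() extracts the text
items = soup.find_all("li", class_="item")
names = [li.get_text(strip=True) for li in items]

# .select() matches by CSS selector
links = [a["href"] for a in soup.select("li.item a")]

# pandas' DataFrame.from_dict() turns the scraped fields into a table
df = pd.DataFrame.from_dict({"name": names, "link": links})
print(df)
```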