Data Acquisition
Public Data
Private Data
Manual Data Acquisition
🤖 Web scraping tools: e.g. requests + BeautifulSoup (used in the steps below), Scrapy, Selenium
Application Programming Interfaces (APIs):
- APIs are built around the HTTP Request/Response Cycle
- Many APIs require users to sign up for an API key that uniquely identifies them and lets the provider keep a record of all their requests
The API process: 💻 Client sends a Request → ⚙ API performs the Search against the 🗄 Server → the Response travels back through the API to the client.
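A minimal sketch of one turn of that request/response cycle, assuming a hypothetical endpoint (api.example.com) and a placeholder API key:

```python
import requests

# Hypothetical endpoint and key -- substitute a real API's values.
URL = "https://api.example.com/v1/search"
API_KEY = "your-api-key-here"  # identifies you; lets the provider log your requests

# Client sends the Request; many APIs take the key as a parameter or header.
response = requests.get(URL, params={"q": "books", "api_key": API_KEY}, timeout=10)

# The Response carries a status code, headers, and a body.
print(response.status_code)                  # e.g. 200 on success
print(response.headers.get("Content-Type"))  # often "application/json"
print(response.json())                       # decoded JSON body as a Python object
```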
Ethics: with all data acquisition methods, there are important ethical considerations to keep in mind:
- Who owns the data uploaded to a website by users?
- When and how should users of services be notified that data about them is being acquired?
- What kinds of data should be restricted from being acquired about users?
- How can users protect their privacy and know when it has been breached?
Data acquisition should be:
- Fair
- Transparent
- Respectful
API steps
1) Import the requests library
2) Use the .get() method to fetch the data from the desired URL (it returns a Response object)
3) Use the .json() method to access the decoded JSON data as a Python object
4) Import the csv library
5) Use a csv writer's .writerows() method to write the JSON records out as a CSV file
6) Import the pandas library and use the .read_csv() function to read the CSV data into a dataframe object (all six steps are combined in the sketch below)
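Put together, a minimal sketch of those six steps, assuming a hypothetical endpoint that returns a JSON list of records with consistent keys:

```python
import csv

import pandas as pd
import requests

# 1-3) Fetch the URL and decode the JSON body (hypothetical endpoint).
response = requests.get("https://api.example.com/v1/books", timeout=10)
records = response.json()  # e.g. [{"title": "...", "author": "..."}, ...]

# 4-5) Write the records to CSV, with a header row taken from the keys.
with open("books.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=records[0].keys())
    writer.writeheader()
    writer.writerows(records)

# 6) Read the CSV back into a pandas DataFrame.
df = pd.read_csv("books.csv")
print(df.head())
```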
Rules of scraping:
- Check the website's T&Cs
- Do NOT spam the website with a flood of requests
- Limit yourself to about one request per webpage per second (see the throttling sketch below)
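One simple way to honor that rate limit, assuming a hypothetical list of pages you are permitted to scrape:

```python
import time

import requests

# Hypothetical URLs -- substitute pages whose T&Cs allow scraping.
urls = [
    "https://example.com/page1",
    "https://example.com/page2",
]

for url in urls:
    response = requests.get(url, timeout=10)
    print(url, response.status_code)
    time.sleep(1)  # throttle: at most one request per second
```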
Web scraping steps
1) Import the requests library
2) Use the .get() method to fetch the webpage
3) Import BeautifulSoup from bs4
4) Pass the response's .content to the BeautifulSoup constructor to turn the page into a BeautifulSoup object
5) To retrieve the relevant info, you can use (combined in the sketch below):
- Tags
- .find_all()
- .select()
- .get_text()
- pandas' DataFrame.from_dict() to load the results into a dataframe
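A minimal sketch tying the steps together, assuming a hypothetical page where each quote sits in a <span class="text"> tag and each author in a <small class="author"> tag:

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup

# Hypothetical target -- check the site's T&Cs before scraping for real.
url = "https://example.com/quotes"

# 1-2) Fetch the webpage.
response = requests.get(url, timeout=10)

# 3-4) Parse the raw HTML into a BeautifulSoup object.
soup = BeautifulSoup(response.content, "html.parser")

# 5) Retrieve the relevant info via tags/selectors (assumed class names).
data = {
    "quote": [s.get_text(strip=True) for s in soup.find_all("span", class_="text")],
    "author": [s.get_text(strip=True) for s in soup.select("small.author")],
}

# Load the scraped columns into a DataFrame.
df = pd.DataFrame.from_dict(data)
print(df.head())
```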