Getting Started
Welcome, Data Analyst/Engineer.
Your mission is to audit the 2023-2025 financial records for TechGadget Inc. This environment simulates a fragmented corporate ecosystem where data is split between a modern API and a legacy warehouse system.
1. Authentication
All requests require a valid X-API-KEY. Without this header, the server will return a 403 Forbidden error.
- Header: X-API-KEY
- Value: Your UUID from the portal
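As a rough illustration of the header described above, the sketch below builds the request headers in Python and checks that the key is shaped like a UUID before any request is sent. The helper name build_headers and the validation step are assumptions made for illustration; they are not part of the API.

```python
import uuid


def build_headers(api_key: str) -> dict:
    """Return request headers carrying the API key.

    Hypothetical helper: validating the key as a UUID is an assumption
    based on the portal issuing UUIDs; the server may accept other formats.
    """
    # uuid.UUID raises ValueError if the string is not a well-formed UUID
    uuid.UUID(api_key)
    return {"X-API-KEY": api_key}


# Placeholder key for illustration only; substitute the UUID from the portal
headers = build_headers("123e4567-e89b-12d3-a456-426614174000")
```

The resulting dictionary can be passed to an HTTP client, for example as the headers argument of requests.get.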
2. Recommended Tools (No-Code Extraction)
Before writing Python scripts, we recommend using a GUI client to explore the data.
Option A: Thunder Client (Recommended)
This is a lightweight VS Code extension. It allows you to test APIs without leaving your coding environment.
- Install the Thunder Client extension in VS Code.
- Click the ⚡ icon on the sidebar and select New Request.
- Set the method to GET and enter the Base URL (https://herdataproject.com/learning/api/).
- Go to the Headers tab. Add X-API-KEY in the "Name" column and your UUID in the "Value" column.
- Hit Send. Use the "Save to File" option to export the JSON response.
Option B: Postman
The industry standard for API development. Best for saving "Collections" of different endpoints.
- Create a New Request. Under the Headers tab, add X-API-KEY.
- Click Send. To extract data, click the Save Response button to download the JSON output.
Option C: Insomnia
A streamlined tool focused on speed and clean UI. Use this if you want a distraction-free data preview.
3. API Endpoints
Base URL: https://herdataproject.com/learning/api/
GET /orders/
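Accesses the master transaction list (1,000,000+ rows). This endpoint uses Cursor Pagination: each response carries one page of records under results and a next URL pointing to the following page, so the full list is never loaded into a single response.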
| Field | Type | Description |
|---|---|---|
| order_id | Integer | Primary Key |
| status | String | Shipped, Processing, Cancelled, Pending |
| order_date | Date | ISO-8601 Format |
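To make the field table concrete, here is a minimal sketch of turning one raw JSON record into a typed Python object. The Order class and parse_order function are hypothetical names, the sample record is made up, and the code assumes order_date arrives as a plain calendar date such as "2024-03-15".

```python
from dataclasses import dataclass
from datetime import date

# Status values listed in the field table above
VALID_STATUSES = {"Shipped", "Processing", "Cancelled", "Pending"}


@dataclass(frozen=True)
class Order:
    order_id: int
    status: str
    order_date: date


def parse_order(raw: dict) -> Order:
    """Convert one raw JSON record into a typed Order.

    Assumes order_date is a plain calendar date such as "2024-03-15";
    a full timestamp would need datetime.fromisoformat instead.
    """
    status = raw["status"]
    if status not in VALID_STATUSES:
        raise ValueError(f"Unexpected status: {status!r}")
    return Order(
        order_id=int(raw["order_id"]),
        status=status,
        order_date=date.fromisoformat(raw["order_date"]),
    )


# Made-up sample record, not real API output
sample = {"order_id": 1, "status": "Shipped", "order_date": "2024-03-15"}
order = parse_order(sample)
```

Rejecting unknown status values early is a design choice for an audit: it surfaces unexpected data at load time rather than during later aggregation.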
GET /order-details/
Line-item data. Use the ?order_id= filter to see what products are in a specific order.
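The filter can be built programmatically rather than by hand. The sketch below uses only the standard library to assemble the filtered URL; the function name order_details_url and the order ID 42 are illustrative assumptions.

```python
from urllib.parse import urlencode, urljoin

BASE_URL = "https://herdataproject.com/learning/api/"


def order_details_url(order_id: int) -> str:
    """Build the order-details URL filtered to a single order."""
    # urljoin keeps the /learning/api/ prefix because BASE_URL ends with "/"
    endpoint = urljoin(BASE_URL, "order-details/")
    return f"{endpoint}?{urlencode({'order_id': order_id})}"


# 42 is an arbitrary example ID
url = order_details_url(42)
```

The resulting URL can then be requested with the same X-API-KEY header used for every other endpoint.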
GET /customers/
The Customer Directory. Returns names, emails, and regional data.
4. The "Missing Data" Rule
To simulate real-world data engineering, Product Costs are restricted and not available via JSON. To calculate profit margins, you must:
- Fetch product names/IDs from the API.
- Write a script to scrape the Legacy Warehouse Inventory.
- Merge the scraped HTML data with your API JSON data in Python/Pandas using product_id as the join key.
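The steps above can be sketched as follows. This is only an illustration under stated assumptions: the layout of the Legacy Warehouse page is not documented here, so the HTML table, the unit_cost and product_name column names, and all sample values are made up. The parser uses the standard library, and the join uses pandas DataFrame.merge.

```python
from html.parser import HTMLParser

import pandas as pd


class TableParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


# Hypothetical scraped HTML; the real warehouse page may look different
SAMPLE_HTML = """
<table>
  <tr><th>product_id</th><th>unit_cost</th></tr>
  <tr><td>101</td><td>12.50</td></tr>
  <tr><td>102</td><td>7.25</td></tr>
</table>
"""

parser = TableParser()
parser.feed(SAMPLE_HTML)
header, *body = parser.rows

# Scraped cells are strings, so cast them before joining
costs = pd.DataFrame(body, columns=header).astype(
    {"product_id": int, "unit_cost": float}
)

# Made-up stand-in for product records fetched from the API
products = pd.DataFrame(
    [
        {"product_id": 101, "product_name": "Widget"},
        {"product_id": 102, "product_name": "Gizmo"},
        {"product_id": 103, "product_name": "Doohickey"},
    ]
)

# A left join keeps every API product, leaving unit_cost empty where no cost was scraped
merged = products.merge(costs, on="product_id", how="left")
```

A left join is used here so that products missing from the warehouse data remain visible as gaps instead of silently disappearing from the margin calculation.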
5. Python Implementation
Use this boilerplate to handle the large-scale dataset with pagination logic.
import requests
import time

HEADERS = {"X-API-KEY": "YOUR_KEY_HERE"}
URL = "https://herdataproject.com/learning/api/"


def fetch_all_records(start_url):
    """Follow the 'next' cursor from start_url and return every record."""
    current_url = start_url
    results = []

    while current_url:
        response = requests.get(current_url, headers=HEADERS)

        if response.status_code == 200:
            data = response.json()
            results.extend(data['results'])
            print(f"Retrieved total: {len(results)} records")

            # Follow the cursor to the next page
            current_url = data.get('next')

            # Optional: Add delay to avoid 429 errors
            # time.sleep(0.2)
        else:
            print(f"Error: {response.status_code}")
            break

    return results


# Start from a paginated endpoint such as orders/, not the bare Base URL
records = fetch_all_records(URL + "orders/")

6. Error Reference
| Code | Meaning | Action |
|---|---|---|
| 401/403 | Unauthorized/Forbidden | Check if your API Key is active in the portal and passed correctly in the header. |
| 429 | Rate Limit | You are sending too many requests. Add time.sleep(0.5) to your loop. |
| 500 | Server Error | Database is likely under heavy load. Check your query parameters and try again. |
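One possible way to act on this table in code is sketched below. Instead of the fixed time.sleep(0.5) suggested above, it uses exponential backoff, which is a swapped-in technique and not something the API prescribes. The function name send_with_retry and its parameters are hypothetical. Responses with 401 or 403 are returned immediately, since retrying will not fix a bad key.

```python
import time

# Status codes from the table that may succeed on a later attempt
RETRYABLE = {429, 500}


def send_with_retry(send, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send is any zero-argument callable returning an object with a
    .status_code attribute, e.g. lambda: requests.get(url, headers=HEADERS).
    """
    for attempt in range(max_attempts):
        response = send()
        if response.status_code not in RETRYABLE:
            return response
        if attempt < max_attempts - 1:
            # Wait 0.5s, 1s, 2s, ... before the next attempt
            sleep(base_delay * 2 ** attempt)
    # All attempts used up: hand back the last retryable response
    return response
```

Passing the request as a callable keeps the retry logic independent of any particular HTTP client, and the sleep parameter can be replaced with a stub when testing.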