Getting Started

Welcome, Data Analyst/Engineer. 

Your mission is to audit the 2023-2025 financial records for TechGadget Inc. This environment simulates a fragmented corporate ecosystem where data is split between a modern API and a legacy warehouse system.

1. Authentication

All requests require a valid API key, sent in the X-API-KEY header. If the header is missing, the server returns a 403 Forbidden error.

Header: X-API-KEY
Value: your UUID from the portal
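To check the header from Python before moving on, the sketch below builds an authenticated request using only the standard library, so nothing needs to be installed. The helper names build_request and get_json are hypothetical, and the sketch assumes the endpoints return JSON. urllib raises urllib.error.HTTPError for a 403, so a missing or invalid key shows up as an exception.

```python
import json
import urllib.request

BASE_URL = "https://herdataproject.com/learning/api/"


def build_request(url: str, api_key: str) -> urllib.request.Request:
    """Hypothetical helper: a GET request that carries the X-API-KEY header."""
    return urllib.request.Request(url, headers={"X-API-KEY": api_key}, method="GET")


def get_json(url: str, api_key: str):
    """Send the request and decode the JSON body (assumes a JSON response).

    urlopen raises urllib.error.HTTPError for non-2xx codes such as 403.
    """
    with urllib.request.urlopen(build_request(url, api_key), timeout=30) as resp:
        return json.load(resp)
```

For example, get_json(BASE_URL + "orders/", "your-uuid") should return the first page of orders if the key is active. Header names are case-insensitive in HTTP, so urllib storing the name as "X-api-key" does not matter to the server.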

2. Recommended Tools (No-Code Extraction)

Before writing Python scripts, we recommend using a GUI client to explore the data.

Option A: Thunder Client (Recommended)

This is a lightweight VS Code extension. It allows you to test APIs without leaving your coding environment.

  1. Install the Thunder Client extension in VS Code.
  2. Click the ⚡ icon on the sidebar and select New Request.
  3. Set the method to GET and enter the Base URL (https://herdataproject.com/learning/api/) followed by an endpoint path, such as orders/.
  4. Go to the Headers tab. Add X-API-KEY in the "Name" column and your UUID in the "Value" column.
  5. Hit Send. Use the "Save to File" option to export the JSON response.

Option B: Postman

A widely used, full-featured API client. Best for saving groups of endpoints as "Collections".

  • Create a new request, set the method to GET, and enter the endpoint URL. Under the Headers tab, add X-API-KEY with your UUID as the value.
  • Click Send. To extract data, use the Save Response button to download the JSON output.

Option C: Insomnia

A streamlined client focused on speed and a clean UI. Use it if you want a distraction-free data preview; as with the other clients, add the X-API-KEY header to each request.

3. API Endpoints

Base URL: https://herdataproject.com/learning/api/

GET /orders/

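Returns the master transaction list (1,000,000+ rows). The endpoint uses cursor pagination: each response contains one page of records under results and a next link to the following page, so the server never returns the whole table in a single response.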

Field        Type      Description
order_id     Integer   Primary key
status       String    Shipped, Processing, Cancelled, Pending
order_date   Date      ISO-8601 format
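Because the audit covers 2023-2025, a first pass over the orders usually filters on order_date and tallies status. A minimal sketch, assuming each record is a dict with the three fields above and that order_date begins with a YYYY-MM-DD calendar date (so both date-only and full-timestamp ISO-8601 strings parse). The function names are illustrative, and a status outside the documented list is reported rather than silently dropped.

```python
from collections import Counter
from datetime import date
from typing import Iterable, List, Set, Tuple

# Status values listed in the /orders/ field table
VALID_STATUSES = {"Shipped", "Processing", "Cancelled", "Pending"}


def parse_order_date(value: str) -> date:
    """Parse order_date, assuming it starts with an ISO-8601 date (YYYY-MM-DD)."""
    return date.fromisoformat(value[:10])


def orders_in_range(orders: Iterable[dict], start: date, end: date) -> List[dict]:
    """Keep orders whose order_date falls within [start, end], both ends inclusive."""
    return [o for o in orders if start <= parse_order_date(o["order_date"]) <= end]


def status_counts(orders: Iterable[dict]) -> Tuple[Counter, Set[str]]:
    """Count orders per status and return any status outside the documented set."""
    counts = Counter(o["status"] for o in orders)
    return counts, set(counts) - VALID_STATUSES
```

Typical use would be orders_in_range(records, date(2023, 1, 1), date(2025, 12, 31)) on the fetched records, followed by status_counts on the result.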

GET /order-details/

Line-item data. Use the ?order_id= filter to see what products are in a specific order.
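Building the filtered URL with urllib.parse avoids hand-assembling query strings. A small sketch; order_details_url is a hypothetical helper, and BASE_URL is the Base URL given in this section.

```python
from urllib.parse import urlencode, urljoin

BASE_URL = "https://herdataproject.com/learning/api/"


def order_details_url(order_id: int, base_url: str = BASE_URL) -> str:
    """Hypothetical helper: the /order-details/ URL filtered to one order."""
    # urljoin keeps the /learning/api/ prefix because base_url ends with "/"
    return urljoin(base_url, "order-details/") + "?" + urlencode({"order_id": order_id})
```

The resulting URL is requested with the same X-API-KEY header as any other endpoint. Note that looking up every order this way means one request per order, which across 1,000,000+ orders is where rate limiting (429) is most likely to appear.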

GET /customers/

The Customer Directory. Returns names, emails, and regional data.
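Customer directories are a common place for audit issues such as the same person registered twice. The field names are not documented here, so this sketch assumes an email key (a hypothetical name; adjust it to the actual JSON) and treats addresses as duplicates when they match after trimming whitespace and lowercasing, a common if slightly lenient normalization.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def find_duplicate_emails(customers: Iterable[dict], email_field: str = "email") -> Dict[str, List[dict]]:
    """Group customers by normalized email; return only emails seen more than once.

    email_field is an assumed key name, not a documented one.
    """
    groups: Dict[str, List[dict]] = defaultdict(list)
    for customer in customers:
        email = (customer.get(email_field) or "").strip().lower()
        if email:  # skip records with no email at all
            groups[email].append(customer)
    return {email: rows for email, rows in groups.items() if len(rows) > 1}
```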

4. The "Missing Data" Rule

To simulate real-world data engineering, product costs are restricted and are not available from the JSON API. To calculate profit margins, you must:

  1. Fetch product names/IDs from the API.
  2. Write a script to scrape the Legacy Warehouse Inventory.
  3. Merge the scraped HTML data with your API JSON data in Python/Pandas using product_id as the join key.
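The three steps can be sketched end to end as follows. The inventory page's URL and markup are not documented here, so this assumes the page contains an HTML table whose first row holds column headers including product_id and unit_cost, and that your API product data has a price column; unit_cost, price, and all function names are hypothetical. The table is parsed with the standard library's html.parser, and the join uses pandas DataFrame.merge with how="left", so products with no scraped cost remain visible as NaN margins instead of silently dropping out.

```python
from html.parser import HTMLParser

import pandas as pd


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_inventory(html: str) -> pd.DataFrame:
    """Assumes the first table row holds the column headers."""
    parser = TableParser()
    parser.feed(html)
    header, *body = parser.rows
    return pd.DataFrame(body, columns=header)


def add_margins(products: pd.DataFrame, inventory: pd.DataFrame) -> pd.DataFrame:
    """Left-join scraped costs onto API products by product_id; gross margin per product."""
    # Scraped cells are strings, so cast the join key and cost before merging
    inventory = inventory.astype({"product_id": int, "unit_cost": float})
    merged = products.merge(inventory, on="product_id", how="left", validate="one_to_one")
    merged["margin"] = (merged["price"] - merged["unit_cost"]) / merged["price"]
    return merged
```

To run it against the real page, pass the page's HTML text (for example, response.text from a requests.get call on the Legacy Warehouse Inventory URL) to scrape_inventory. The validate="one_to_one" argument makes pandas raise an error if either side has duplicate product_id values, which would otherwise inflate the merged row count.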

5. Python Implementation

Use this boilerplate to pull a large, cursor-paginated endpoint such as /orders/ page by page.

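import time

import requests

HEADERS = {"X-API-KEY": "YOUR_KEY_HERE"}  # replace with your UUID from the portal
BASE_URL = "https://herdataproject.com/learning/api/"


def fetch_all_records(start_url):
    """Follow the 'next' cursor page by page and collect every record in 'results'."""
    current_url = start_url
    results = []
    with requests.Session() as session:
        session.headers.update(HEADERS)  # send X-API-KEY on every request
        while current_url:
            response = session.get(current_url, timeout=30)
            if response.status_code == 429:
                # Rate limited: pause, then retry the same page
                time.sleep(0.5)
                continue
            if response.status_code != 200:
                print(f"Error: {response.status_code}")
                break
            data = response.json()
            results.extend(data["results"])
            print(f"Retrieved total: {len(results)} records")

            # Follow the cursor to the next page (empty on the last page)
            current_url = data.get("next")

            # Optional: add a delay between pages to stay under the rate limit
            # time.sleep(0.2)
    return results


# Start at the paginated /orders/ endpoint rather than the bare Base URL
records = fetch_all_records(BASE_URL + "orders/")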
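Holding 1,000,000+ orders in one Python list can use a lot of memory. A variation on the loop above, sketched here as one option rather than a required pattern, yields records page by page and writes them to a JSON Lines file as they arrive. get_page is a hypothetical callable that takes a URL and returns the decoded JSON page (for example, a small wrapper around session.get(url).json()); passing it in also makes the cursor logic testable without the network.

```python
import json
from typing import Callable, Iterator, Optional, TextIO


def iter_records(start_url: str, get_page: Callable[[str], dict]) -> Iterator[dict]:
    """Yield records one at a time, following the 'next' cursor until it is empty."""
    url: Optional[str] = start_url
    while url:
        page = get_page(url)  # hypothetical fetcher: URL -> decoded JSON page
        yield from page["results"]
        url = page.get("next")


def write_jsonl(records: Iterator[dict], out: TextIO) -> int:
    """Write each record as one JSON line; return how many were written."""
    count = 0
    for record in records:
        out.write(json.dumps(record) + "\n")
        count += 1
    return count
```

Typical use would open a file such as orders.jsonl for writing and pass iter_records(<the /orders/ URL>, get_page) together with the file handle to write_jsonl. Only one page is held in memory at a time, and pandas.read_json(path, lines=True) can load the file later for analysis.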

6. Error Reference

Code      Meaning                    Action
401/403   Unauthorized / Forbidden   Check that your API key is active in the portal and sent correctly in the X-API-KEY header.
429       Rate limit                 You are sending too many requests. Add a pause such as time.sleep(0.5) to your loop.
500       Server error               The database is likely under heavy load. Check your query parameters and try again.
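The actions in this table can be folded into one retry policy: stop immediately on 401/403 (retrying will not fix a bad key), and back off and retry on 429 and 500. This sketch uses exponential backoff (0.5 s, 1 s, 2 s, ...), a common substitute for the fixed time.sleep(0.5) suggested above, not something the API is documented to require. get is a hypothetical callable returning a (status_code, body) pair, for example a small wrapper around session.get that returns the status code and, on success, the decoded JSON.

```python
import time
from typing import Any, Callable, Tuple

RETRYABLE = {429, 500}  # rate limit and server load: worth retrying


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based): base * 2**attempt, capped."""
    return min(cap, base * (2 ** attempt))


def get_with_retries(
    url: str,
    get: Callable[[str], Tuple[int, Any]],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call get(url) until it returns HTTP 200, retrying only 429/500 with backoff."""
    for attempt in range(max_attempts):
        status, body = get(url)
        if status == 200:
            return body
        if status in (401, 403):
            # A missing or inactive key will not fix itself; fail fast
            raise PermissionError(f"HTTP {status}: check that X-API-KEY is active and sent")
        if status in RETRYABLE and attempt < max_attempts - 1:
            sleep(backoff_delay(attempt))
            continue
        raise RuntimeError(f"GET {url} failed with HTTP {status}")
    raise ValueError("max_attempts must be at least 1")
```

The sleep parameter defaults to time.sleep; swapping in a list's append method, as the checks below do, records the delays without actually waiting.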