Interview Coding Take-Homes: Part 3
UCLA Health - Population Analysis
This take-home covered data ingestion and cleaning for the UCLA Health population project. The goal was to turn messy JSON from an API into a clean CSV, so the analysis and visualization could focus on the real questions instead of getting bogged down in data cleanup.
Goals
Produce a CSV with a fixed header order and stable types.
Keep each transformation small and unit-testable.
Fail fast on missing required fields and log useful diagnostics.
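As a minimal sketch of the first goal, here is one way to pin the header order in a single place and write rows against it. The names CsvSketch, Header, and ToCsv, the column order, and the sample row are all assumptions for illustration, not code from the repo.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class CsvSketch
{
    // Assumed column order; defining it once keeps the header stable.
    public static readonly string[] Header =
        { "IdState", "State", "IdYear", "Year", "Population", "SlugState" };

    // Quote a field only when it contains a comma, quote, or line break.
    public static string Escape(string field)
    {
        if (field.IndexOfAny(new[] { ',', '"', '\n', '\r' }) < 0)
            return field;
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    // Write the header first, then every row in the same column order.
    public static string ToCsv(IEnumerable<string[]> rows)
    {
        var sb = new StringBuilder();
        sb.Append(string.Join(",", Header)).Append('\n');
        foreach (var row in rows)
        {
            if (row.Length != Header.Length)
                throw new ArgumentException(
                    $"Expected {Header.Length} fields but got {row.Length}.");
            sb.Append(string.Join(",", row.Select(Escape))).Append('\n');
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // Made-up sample row, not real census data.
        var rows = new List<string[]>
        {
            new[] { "04000US00", "Example, State", "2020", "2020", "12345", "example-state" }
        };
        Console.Write(ToCsv(rows));
    }
}
```

Rejecting rows whose length does not match the header is a cheap way to catch a schema drift before it reaches the analysis step.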
Why this approach
Even though the dataset was small and straightforward, it had all the typical real-world issues: keys with spaces, inconsistent casing, and numbers stored as strings. Instead of reaching for heavy tools, I focused on clarity: predictable output and a simple flow that is easy to follow.
Repository
The implementation is in the population-analysis repo under the Analysis/ directory. See the source and examples at:
Process overview
Load JSON from the API or from the cached sample used in tests.
Rename and normalize keys so everything aligns with the internal schema.
Coerce types (strings -> ints/dates) and fail if required data is missing.
Emit two CSVs: one raw mapping for auditing and one cleaned file for analysis.
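The coercion step in this flow can be sketched roughly as below. The helper names RequireInt and RequireText are hypothetical, and the sketch covers only integers and text; dates would follow the same pattern with a date parser. The point is that a missing or malformed required value throws immediately, with the field name and raw value in the message.

```csharp
using System;
using System.Globalization;

public static class Coercion
{
    // Hypothetical helper: parse a required integer field or fail fast.
    public static int RequireInt(string fieldName, string raw)
    {
        if (string.IsNullOrWhiteSpace(raw))
            throw new FormatException(
                $"Required field '{fieldName}' is missing or empty.");
        if (!int.TryParse(raw, NumberStyles.Integer,
                          CultureInfo.InvariantCulture, out var value))
            throw new FormatException(
                $"Field '{fieldName}' is not an integer: '{raw}'.");
        return value;
    }

    // Hypothetical helper: return trimmed text or fail fast when it is blank.
    public static string RequireText(string fieldName, string raw)
    {
        if (string.IsNullOrWhiteSpace(raw))
            throw new FormatException(
                $"Required field '{fieldName}' is missing or empty.");
        return raw.Trim();
    }

    public static void Main()
    {
        Console.WriteLine(RequireInt("Year", "2020"));
        try
        {
            RequireInt("Year", "");
        }
        catch (FormatException ex)
        {
            // The message names the offending field for diagnostics.
            Console.WriteLine(ex.Message);
        }
    }
}
```

Keeping each check in its own small function also makes it straightforward to unit test one rule at a time.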
Key implementation details
Handling messy JSON keys
The prompt required a strict one-to-one mapping between API fields and output columns. Because the API returns keys with spaces and inconsistent casing, the data model uses [JsonPropertyName] attributes to translate them into clean C# properties:
    [JsonPropertyName("ID State")]
    public string IdState { get; set; } = "";
    public string State { get; set; } = "";
    [JsonPropertyName("ID Year")]
    public int IdYear { get; set; }
    public string Year { get; set; } = "";
    public int Population { get; set; }
    [JsonPropertyName("Slug State")]
    public string SlugState { get; set; } = "";
This keeps the mapping explicit and makes it obvious which API field feeds each column.
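To show the attributes at work, here is a small sketch that deserializes a single record with System.Text.Json. The class name StateRecord, the Parse helper, the flat payload shape, and the sample values are assumptions made for illustration; the real API response may wrap its records differently.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical class name; the properties mirror the model above.
public sealed class StateRecord
{
    [JsonPropertyName("ID State")]
    public string IdState { get; set; } = "";
    public string State { get; set; } = "";
    [JsonPropertyName("ID Year")]
    public int IdYear { get; set; }
    public string Year { get; set; } = "";
    public int Population { get; set; }
    [JsonPropertyName("Slug State")]
    public string SlugState { get; set; } = "";
}

public static class Demo
{
    // Made-up sample record, not real census data.
    public const string SampleJson =
        "{\"ID State\":\"04000US00\",\"State\":\"Example\",\"ID Year\":2020," +
        "\"Year\":\"2020\",\"Population\":12345,\"Slug State\":\"example\"}";

    // Deserialize one record; a JSON null is treated as an error.
    public static StateRecord Parse(string json)
    {
        return JsonSerializer.Deserialize<StateRecord>(json)
            ?? throw new JsonException("Payload deserialized to null.");
    }

    public static void Main()
    {
        var record = Parse(SampleJson);
        Console.WriteLine($"{record.IdState} {record.IdYear} {record.Population}");
    }
}
```

Properties without an attribute, such as State and Population, bind because their names match the JSON keys exactly; System.Text.Json matches names case-sensitively by default.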
Fetching and caching the data
The application calls the public census API, but I do not re-fetch on every test run. Instead, the client caches the payload locally so the API stays quiet while I iterate: