
Min-Max Scaling (from Scratch)

MEDIUM

Implement a function that performs Min-Max scaling on a given 1D NumPy array. The function should transform the features to a range, typically [0, 1]. Do not use scikit-learn's MinMaxScaler.

The formula for Min-Max scaling is:

X_scaled = (X - X_min) / (X_max - X_min)

Handle the edge case where X_max - X_min is zero (i.e., all elements in the array are the same). The formula is undefined here, so a convention is needed: return 0 for every element unless told otherwise (0.5 is another common choice).
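To see why the guard matters, here is a small sketch of what the unguarded formula produces on a constant array. The variable names are illustrative only, and np.errstate is used just to silence the warning NumPy would otherwise emit.

```python
import numpy as np

arr = np.array([7.0, 7.0, 7.0])
span = arr.max() - arr.min()  # 0.0, because every element is identical

# Without a guard, the formula computes 0.0 / 0.0 for every element.
# errstate suppresses the RuntimeWarning so the result is easy to inspect.
with np.errstate(invalid="ignore", divide="ignore"):
    unguarded = (arr - arr.min()) / span

print(unguarded)  # every entry is nan
```

The result is an array of NaN values rather than an exception, so the problem can pass silently into later computations if it is not checked explicitly.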

Examples:

Example 1:

Input: arr = np.array([10, 20, 30, 40, 50])
Output: np.array([0.  , 0.25, 0.5 , 0.75, 1.  ])

Example 2:

Input: arr = np.array([15, 5, 20, 10])
Output: np.array([0.666..., 0.    , 1.    , 0.333...])

Example 3 (Edge Case):

Input: arr = np.array([7, 7, 7])
Output: np.array([0., 0., 0.])  (or [0.5, 0.5, 0.5] depending on convention for this case)

Constraints:

The input arr will be a 1D NumPy array of numbers.
The array can contain positive, negative, or zero values.
The array will not be empty.

Function Signature (Python):

import numpy as np

class Solution:
    def min_max_scale(self, arr: np.ndarray) -> np.ndarray:
        # Your code here
        pass

Solution: Min-Max Scaling with NumPy

The Goal: Min-Max scaling is a way to transform numeric data so that all values fall within a specific range, usually 0 to 1. This is useful in many machine learning algorithms where features with larger value ranges might dominate those with smaller ranges.

The formula is: (current_value - minimum_value_in_array) / (maximum_value_in_array - minimum_value_in_array).

If a value is the minimum, it becomes 0.
If a value is the maximum, it becomes 1.
Other values fall proportionally in between.
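As a quick check of these three properties, the arithmetic for Example 2 can be worked through in plain Python. This is only a sketch of the formula, not the NumPy solution; the names lo and hi are illustrative.

```python
values = [15, 5, 20, 10]
lo, hi = min(values), max(values)  # lo = 5, hi = 20

# Apply (value - minimum) / (maximum - minimum) to each element
scaled = [(v - lo) / (hi - lo) for v in values]

for v, s in zip(values, scaled):
    print(f"{v:>2} -> {s:.3f}")
```

The minimum (5) maps to 0, the maximum (20) maps to 1, and 15 lands two thirds of the way between them.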

Implementation using NumPy

We will use NumPy's array operations for efficiency.

Calculate X_min (minimum value in the array).
Calculate X_max (maximum value in the array).
Calculate the range: range_val = X_max - X_min.
Edge Case: If range_val is 0 (meaning all elements in the array are the same), return 0 for every element. Another constant such as 0.5 is also a valid convention; we use 0 here. This check prevents division by zero.
Otherwise, apply the formula: (arr - X_min) / range_val. NumPy handles the element-wise subtraction and division.

import numpy as np

class Solution:
    def min_max_scale(self, arr: np.ndarray) -> np.ndarray:
        # Accept array-likes and make sure we work in floating point.
        # Non-float input (e.g. integers) is converted to float64;
        # input that is already float or complex keeps its dtype.
        arr = np.asarray(arr)
        if arr.dtype.kind not in 'fc':  # 'f' for float, 'c' for complex
            arr = arr.astype(np.float64)

        # The problem guarantees a non-empty array; guard anyway
        if arr.size == 0:
            return np.array([], dtype=np.float64)  # Or raise an error

        X_min = np.min(arr)
        X_max = np.max(arr)
        
        range_val = X_max - X_min
        
        # Handle the edge case where all elements are the same (range_val is 0)
        if range_val == 0:
            # Every value is identical, so return zeros by convention.
            # (Use np.full_like(arr, 0.5) if 0.5 is preferred.)
            # arr is already floating point, so the result dtype matches
            # what the normal path below would return.
            return np.zeros_like(arr)
            
        X_scaled = (arr - X_min) / range_val
        return X_scaled

# Example Usage:
# sol = Solution()
# arr1 = np.array([10, 20, 30, 40, 50])
# print(f"Original: {arr1}, Scaled: {sol.min_max_scale(arr1)}")

# arr2 = np.array([15., 5., 20., 10.]) # Using floats
# print(f"Original: {arr2}, Scaled: {sol.min_max_scale(arr2)}")

# arr3 = np.array([7, 7, 7])
# print(f"Original: {arr3}, Scaled: {sol.min_max_scale(arr3)}")

# arr4 = np.array([-5, 0, 5, 10])
# print(f"Original: {arr4}, Scaled: {sol.min_max_scale(arr4)}")

Explanation of Code:

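The input arr is converted to a NumPy float array if it isn't already. NumPy's / operator already performs true division on integer arrays, but converting up front guarantees a floating-point result on every path (including the all-zeros edge case) and avoids integer overflow when computing X_max - X_min on narrow integer types.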
np.min(arr) and np.max(arr) efficiently find the minimum and maximum values.
The code explicitly checks if range_val is zero. If so, it returns an array of zeros of the same shape as the input. This avoids division by zero and adheres to a common convention for this edge case.
If range_val is not zero, the scaling formula is applied. NumPy's broadcasting rules allow subtracting a scalar (X_min) from an array and dividing an array by a scalar (range_val) element-wise.
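The dtype conversion and the scalar broadcasting described above can be seen in isolation with Example 1. This is a minimal sketch with illustrative variable names, separate from the Solution class.

```python
import numpy as np

ints = np.array([10, 20, 30, 40, 50])
print(ints.dtype.kind)       # 'i': an integer dtype

as_float = ints.astype(np.float64)
print(as_float.dtype.kind)   # 'f': a floating-point dtype

# A scalar on the right-hand side is broadcast against every element
shifted = as_float - as_float.min()
scaled = shifted / (as_float.max() - as_float.min())

print(scaled)
```

No explicit loop is needed: the subtraction and the division each run over the whole array in one vectorized step.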

Complexity Analysis:

Time Complexity: O(N), where N is the number of elements in the array.

np.min() and np.max() each take O(N) time to scan the array.
The element-wise arithmetic operations (subtraction and division) also take O(N) time.

Space Complexity: O(N), because a new array is returned for the scaled values. An in-place version that overwrites the input could get by with O(1) auxiliary space, but mutating a function's input is generally not recommended unless the caller asks for it. Note that .astype() also allocates an O(N) copy when the input is not already floating point.
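For comparison, an in-place variant might look like the sketch below. The function name min_max_scale_inplace is hypothetical, and the sketch assumes the caller passes a non-empty floating-point array and accepts that it will be overwritten. It relies on the out= argument of np.subtract and np.divide to avoid allocating a result array.

```python
import numpy as np

def min_max_scale_inplace(arr: np.ndarray) -> None:
    """Hypothetical in-place variant: overwrites arr with its scaled values."""
    if arr.dtype.kind != "f":
        # Integer storage cannot hold fractional results
        raise TypeError("in-place scaling needs a floating-point array")

    x_min = arr.min()          # scalar copy, unaffected by later writes to arr
    span = arr.max() - x_min

    if span == 0:
        arr.fill(0.0)          # same convention as the main solution
        return

    np.subtract(arr, x_min, out=arr)  # arr <- arr - x_min, no new array
    np.divide(arr, span, out=arr)     # arr <- arr / span, no new array

data = np.array([-5.0, 0.0, 5.0, 10.0])
min_max_scale_inplace(data)
print(data)
```

The trade-off is that the original values are lost, which is why the main solution returns a new array instead.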

Key Takeaways for Interviews:

Understand the Formula: Clearly state the Min-Max scaling formula.
NumPy Proficiency: Demonstrate the use of np.min(), np.max(), and NumPy's vectorized arithmetic.
Edge Case Handling: Crucially, address the case where X_max - X_min = 0 (all elements are identical). Explain how you'd handle division by zero and what the output should be (e.g., all 0s or all 0.5s).
Data Types: Mention the importance of using floating-point numbers for the calculation to get fractional results from the division.
Purpose of Scaling: Briefly explain why Min-Max scaling is used (e.g., to bring features to a common scale for certain machine learning algorithms that are sensitive to feature magnitudes, like gradient descent or distance-based algorithms).
Alternative Scaling Range: While [0, 1] is common, acknowledge that Min-Max scaling can be adapted to other ranges (e.g., [-1, 1]) by modifying the formula slightly: X_scaled = a + (X - X_min)*(b - a) / (X_max - X_min) for range [a, b].
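The generalized formula for a target range [a, b] could be sketched as follows. The function name min_max_scale_to_range and its parameters are illustrative, and mapping a constant array to the lower bound a is an assumption that mirrors the "all zeros" convention used above.

```python
import numpy as np

def min_max_scale_to_range(arr, a=0.0, b=1.0):
    """Hypothetical helper: scale arr linearly so min -> a and max -> b."""
    arr = np.asarray(arr, dtype=np.float64)
    x_min = arr.min()
    span = arr.max() - x_min

    if span == 0:
        # Assumption: a constant array maps to the lower bound a
        return np.full_like(arr, a)

    # X_scaled = a + (X - X_min) * (b - a) / (X_max - X_min)
    return a + (arr - x_min) * (b - a) / span

result = min_max_scale_to_range([10, 20, 30, 40, 50], a=-1.0, b=1.0)
print(result)  # values: -1, -0.5, 0, 0.5, 1
```

With the defaults a=0 and b=1 this reduces to the original formula.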