Aggregate using Python Spark (pyspark)
Finally getting hands-on with data processing, so here is a simple aggregation task using Python Spark (PySpark). The task is to calculate the aggregate spend per customer and display the results in sorted order. The aggregation itself is a simple reduce job over key-value pairs of (customer ID, individual spend).
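A minimal sketch of that reduce job is below. The input file name ("customer-orders.csv") and its column layout (customer ID, item ID, amount) are assumptions for illustration, not part of the original post.

```python
from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("local").setAppName("CustomerSpend")
sc = SparkContext(conf=conf)

def parse_line(line):
    # Hypothetical input format: "customer_id,item_id,amount"
    fields = line.split(",")
    # Emit (customer ID, spend for this single order)
    return (int(fields[0]), float(fields[2]))

lines = sc.textFile("customer-orders.csv")
pairs = lines.map(parse_line)

# Sum each customer's individual spends into one total per customer ID
total_by_customer = pairs.reduceByKey(lambda x, y: x + y)
```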
Spark provides sorting by key [sortByKey()] out of the box, but to sort by value, one needs to provide a lambda to the more generic sortBy() function.
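Continuing the sketch above, sorting the (customer ID, total spend) pairs by the spend value might look like this; the descending order and output formatting are assumptions.

```python
# sortByKey() would order by customer ID; to order by total spend instead,
# pass sortBy() a lambda that picks the value out of each (key, value) pair.
sorted_by_spend = total_by_customer.sortBy(lambda kv: kv[1], ascending=False)

for customer_id, total in sorted_by_spend.collect():
    print(f"{customer_id}\t{total:.2f}")
```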