Guangning Yu's Blog

Test PySpark max()/min() functions

2020-07-13 09:22:44  |  Spark

test.csv

  key,a,b,c
  a,1,,-1
  a,2,,
  a,3,,4

test.py

  from pyspark.sql import SparkSession
  from pyspark.sql import functions as F

  spark = SparkSession \
      .builder \
      .appName("spark-app") \
      .getOrCreate()
  spark.sparkContext.setLogLevel("WARN")

  # read the test data; header=True keeps the column names, every column is read as a string
  df = spark.read.csv("test.csv", header=True)

  # aggregate max/min of each value column per key
  res = df.groupBy(["key"]).agg(*[
      F.max("a"),
      F.max("b"),
      F.max("c"),
      F.min("a"),
      F.min("b"),
      F.min("c"),
  ])
  print(res.toPandas())

spark-submit test.py

    key max(a) max(b) max(c) min(a) min(b) min(c)
  0   a      3   None      4      1   None     -1
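
The all-null column b comes back as None for both max() and min(), while the partially-null columns a and c return their largest/smallest non-null values, i.e. F.max()/F.min() skip nulls. Note also that spark.read.csv with header=True and no schema reads every column as a string, so the comparisons above are lexicographic; for this particular data that happens to match numeric ordering. A minimal sketch that checks the same null-skipping behavior with an explicit integer schema and an in-memory DataFrame (the schema and rows below are illustrative, not part of the original test):

  from pyspark.sql import SparkSession
  from pyspark.sql import functions as F
  from pyspark.sql.types import StructType, StructField, StringType, IntegerType

  spark = SparkSession.builder.appName("spark-app").getOrCreate()

  # same shape as test.csv, but with integer columns and explicit nulls
  schema = StructType([
      StructField("key", StringType()),
      StructField("a", IntegerType()),
      StructField("b", IntegerType()),
      StructField("c", IntegerType()),
  ])
  df = spark.createDataFrame(
      [("a", 1, None, -1), ("a", 2, None, None), ("a", 3, None, 4)],
      schema,
  )

  # max()/min() ignore nulls; a group whose values are all null yields null
  df.groupBy("key").agg(F.max("b"), F.min("c")).show()

With this schema, max(b) should come back as null for the group and min(c) as -1, matching what the CSV-based test shows.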