Guangning Yu's Blog
Test PySpark max()/min() function
2020-07-13 09:22:44 | Spark
**test.csv**

```
key,a,b,c
a,1,,-1
a,2,,
a,3,,4
```

**test.py**

```
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession \
    .builder \
    .appName("spark-app") \
    .getOrCreate()
spark.sparkContext.setLogLevel("WARN")

df = spark.read.csv("test.csv", header=True)
res = df.groupBy(["key"]).agg(*[
    F.max("a"),
    F.max("b"),
    F.max("c"),
    F.min("a"),
    F.min("b"),
    F.min("c"),
])
print(res.toPandas())
```

**spark-submit test.py**

```
  key max(a) max(b) max(c) min(a) min(b) min(c)
0   a      3   None      4      1   None     -1
```
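The result above suggests that `max()`/`min()` skip nulls within a group and return null only when every value in the group is null (column `b`). Note also that `spark.read.csv` without a schema or `inferSchema=True` reads every column as a string, so the comparison is lexicographic rather than numeric. The following is a minimal plain-Python sketch that models both behaviors. It is not Spark's implementation; the helper names `column_values` and `null_skipping` are made up for illustration, and it assumes empty CSV fields count as null.

```python
import csv
import io

# Same content as test.csv above.
CSV_TEXT = "key,a,b,c\na,1,,-1\na,2,,\na,3,,4\n"


def column_values(text, column, cast=None):
    """Return the non-null values of one column (hypothetical helper).

    Empty fields are treated as null and dropped, which is the
    assumption this sketch makes about how the CSV is read.
    """
    rows = csv.DictReader(io.StringIO(text))
    values = [row[column] for row in rows if row[column] != ""]
    return [cast(v) for v in values] if cast else values


def null_skipping(func, values):
    """Apply max or min; return None when every value was null."""
    return func(values) if values else None


for col in ("a", "b", "c"):
    vals = column_values(CSV_TEXT, col)
    print(col, null_skipping(max, vals), null_skipping(min, vals))
# a 3 1
# b None None
# c 4 -1

# String comparison only happens to agree with numeric order here.
print(max(["9", "10"]))                    # 9  (lexicographic)
print(max(int(v) for v in ["9", "10"]))    # 10 (numeric)
```

For numeric results in the Spark job itself, one option is to read the file with `spark.read.csv("test.csv", header=True, inferSchema=True)` or to cast the columns before aggregating.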