crosstab {SparkR} (R Documentation)
Computes a pair-wise frequency table of the given columns, also known as a contingency table. Each column should have fewer than 1e4 distinct values, and at most 1e6 non-zero pair frequencies are returned.
## S4 method for signature 'SparkDataFrame,character,character'
crosstab(x, col1, col2)
| Argument | Description |
|---|---|
| x | a SparkDataFrame. |
| col1 | name of the first column. Its distinct values form the first column of the output, one row per value. |
| col2 | name of the second column. Its distinct values become the column names of the output. |
A local R data.frame representing the contingency table. The first column holds
the distinct values of col1, and the remaining columns are named after the distinct
values of col2. The first column is named "col1_col2", that is, the two column names
joined by an underscore. Pairs that never occur have a count of zero.
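To make the layout of the returned data.frame concrete, here is a minimal base-R sketch that builds a table of the same shape locally, without Spark. The helper `local_crosstab` and the `people` data are hypothetical illustrations, not part of SparkR; the sketch assumes only base R, and the row and column order of a real crosstab() result may differ from the sorted order that table() produces.

```r
# Hypothetical helper: a local, base-R imitation of the layout that
# SparkR's crosstab() returns. It is NOT part of SparkR and runs without Spark.
local_crosstab <- function(df, col1, col2) {
  # Count every (col1, col2) pair; table() fills pairs that never occur with 0.
  counts <- table(as.character(df[[col1]]), as.character(df[[col2]]))
  # One column per distinct col2 value, one row per distinct col1 value.
  wide <- as.data.frame.matrix(counts)
  # Move the col1 values out of the row names into a leading column.
  out <- data.frame(rownames(wide), wide, row.names = NULL,
                    check.names = FALSE, stringsAsFactors = FALSE)
  # Name the leading column "<col1>_<col2>", matching the documented naming.
  names(out)[1] <- paste(col1, col2, sep = "_")
  out
}

# Hypothetical sample data.
people <- data.frame(
  title  = c("Dr", "Mr", "Ms", "Dr", "Mr"),
  gender = c("F",  "M",  "F",  "M",  "M"),
  stringsAsFactors = FALSE
)

ct <- local_crosstab(people, "title", "gender")
print(ct)
```

Here `ct` has the columns `title_gender`, `F` and `M`; the pair (Mr, F) never occurs in the sample data, so its count is 0.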
crosstab since 1.5.0
Other stat functions:
approxQuantile, approxQuantile,SparkDataFrame,character,numeric,numeric-method;
corr, corr,Column-method, corr,SparkDataFrame-method;
cov, cov,SparkDataFrame-method, cov,characterOrColumn-method;
covar_samp, covar_samp,characterOrColumn,characterOrColumn-method;
freqItems, freqItems,SparkDataFrame,character-method;
sampleBy, sampleBy,SparkDataFrame,character,list,numeric-method
## Not run:
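# Assumes an active SparkR session, e.g. one started with sparkR.session().
df <- read.json("/path/to/file.json")
# Counts of each (title, gender) pair; the first column is named "title_gender".
ct <- crosstab(df, "title", "gender")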
## End(Not run)