Title: | Implementing Methods for Spatial Fuzzy Unsupervised Classification |
---|---|
Description: | Provides functions to apply spatial fuzzy unsupervised classification, visualize and interpret results. This method is well suited when the user wants to analyze data with a fuzzy clustering algorithm and to account for the spatial dimension of the dataset. In addition, indexes for estimating the spatial consistency and classification quality are proposed. The methods were originally proposed in the field of brain imagery (seed Cai and al. 2007 <doi:10.1016/j.patcog.2006.07.011> and Zaho and al. 2013 <doi:10.1016/j.dsp.2012.09.016>) and recently applied in geography (see Gelb and Apparicio <doi:10.4000/cybergeo.36414>). |
Authors: | Jeremy Gelb [aut, cre] , Philippe Apparicio [ctb] |
Maintainer: | Jeremy Gelb <[email protected]> |
License: | GPL-2 |
Version: | 0.3.4.1 |
Built: | 2024-11-17 01:58:55 UTC |
Source: | https://github.com/jeremygelb/geocmeans |
Function to adjust the spatial weights so that they represent semantic distances between neighbours
adjustSpatialWeights(data, listw, style, mindist = 1e-11)
adjustSpatialWeights(data, listw, style, mindist = 1e-11)
data |
A dataframe with numeric columns |
listw |
A nb object from spdep |
style |
A letter indicating the weighting scheme (see spdep doc) |
mindist |
A minimum value for distance between two observations. If two neighbours have exactly the same values, then the euclidean distance between them is 0, leading to an infinite spatial weight. In that case, the minimum distance is used instead of 0. |
A listw object (spdep like)
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") Wqueen2 <- adjustSpatialWeights(dataset,queen,style="C")
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") Wqueen2 <- adjustSpatialWeights(dataset,queen,style="C")
A Landsat 8 image of the bay of Arcachon (France), with a resolution of 30mx30m and 6 bands: blue, green, red, near infrared, shortwave infrared 1 and shortwave infrared 2. The dataset is saved as a Large RasterBrick with the package raster and has the following crs: EPSG:32630. It is provided as a tiff file.
load_arcachon()
load_arcachon()
A spaRast with 6 bands
wavelength: 0.45-0.51
wavelength: 0.53-0.59
wavelength: 0.64-0.67
wavelength: 0.85-0.88
wavelength: 1.57-1.65
wavelength: 2.11-2.29
https://earthexplorer.usgs.gov/
# loading directly from file Arcachon <- terra::rast(system.file("extdata/Littoral4_2154.tif", package = "geocmeans")) names(Arcachon) <- c("blue", "green", "red", "infrared", "SWIR1", "SWIR2") # loading with the provided function Arcachon <- load_arcachon()
# loading directly from file Arcachon <- terra::rast(system.file("extdata/Littoral4_2154.tif", package = "geocmeans")) names(Arcachon) <- c("blue", "green", "red", "infrared", "SWIR1", "SWIR2") # loading with the provided function Arcachon <- load_arcachon()
Return bar plots to compare groups
barPlots(data, belongmatrix, ncol = 3, what = "mean")
barPlots(data, belongmatrix, ncol = 3, what = "mean")
data |
A dataframe with numeric columns |
belongmatrix |
A membership matrix |
ncol |
An integer indicating the number of columns for the bar plot |
what |
Can be "mean" (default) or "median" |
a barplot created with ggplot2
## Not run: data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE) barPlots(dataset, result$Belongings) ## End(Not run)
## Not run: data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE) barPlots(dataset, result$Belongings) ## End(Not run)
Check that the obtained groups are stable by bootstrap
boot_group_validation( object, nsim = 1000, maxiter = 1000, tol = 0.01, init = "random", verbose = TRUE, seed = NULL )
boot_group_validation( object, nsim = 1000, maxiter = 1000, tol = 0.01, init = "random", verbose = TRUE, seed = NULL )
object |
A FCMres object, typically obtained from functions CMeans, GCMeans, SFCMeans, SGFCMeans |
nsim |
The number of replications to do for the bootstrap evaluation |
maxiter |
An integer for the maximum number of iterations |
tol |
The tolerance criterion used in the evaluateMatrices function for convergence assessment |
init |
A string indicating how the initial centres must be selected. "random" indicates that random observations are used as centres "kpp" use a distance-based method resulting in more dispersed centres at the beginning. Both of them are heuristic. |
verbose |
A boolean to specify if the progress bar should be displayed. |
seed |
An integer used for random number generation. It ensures that the starting centres will be the same if the same value is selected. |
Considering that the classification produced by a FCM like algorithm depends on its initial state, it is important to check if the groups obtained are stable. This function uses a bootstrap method to do so. During a selected number of iterations (at least 1000), a sample of size n (with replacement) is drawn from the original dataset. For each sample, the same classification algorithm is applied and the results are compared with the reference results. For each original group, the most similar group is identified by calculating the Jaccard similarity index between the columns of the two membership matrices. This index is comprised between 0 (exact difference) and 1 (perfect similarity) and a value is calculated for each group at each iteration. One can investigate the values obtained to determine if the groups are stable. Values under 0.5 are a concern and indicate that the group is dissolving. Values between 0.6 and 0.75 indicate a pattern in the data, but a significant uncertainty. Values above 0.8 indicate strong groups. The values of the centres obtained at each iteration are also returned, it is important to ensure that they approximately follow a normal distribution (or are at least unimodal).
A list of two values: group_consistency: a dataframe indicating the consistency across simulations each cluster ; group_centres: a list with a dataframe for each cluster. The values in the dataframes are the centres of the clusters at each simulation.
## Not run: data(LyonIris) #selecting the columns for the analysis AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14", "Pct_65","Pct_Img","TxChom1564","Pct_brevet","NivVieMed") #rescaling the columns Data <- sf::st_drop_geometry(LyonIris[AnalysisFields]) for (Col in names(Data)){ Data[[Col]] <- as.numeric(scale(Data[[Col]])) } Cmean <- CMeans(Data,4,1.5,500,standardize = FALSE, seed = 456, tol = 0.00001, verbose = FALSE) validation <- boot_group_validation(Cmean, nsim = 1000, maxiter = 1000, tol = 0.01, init = "random") ## End(Not run)
## Not run: data(LyonIris) #selecting the columns for the analysis AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14", "Pct_65","Pct_Img","TxChom1564","Pct_brevet","NivVieMed") #rescaling the columns Data <- sf::st_drop_geometry(LyonIris[AnalysisFields]) for (Col in names(Data)){ Data[[Col]] <- as.numeric(scale(Data[[Col]])) } Cmean <- CMeans(Data,4,1.5,500,standardize = FALSE, seed = 456, tol = 0.00001, verbose = FALSE) validation <- boot_group_validation(Cmean, nsim = 1000, maxiter = 1000, tol = 0.01, init = "random") ## End(Not run)
Check that the obtained groups are stable by bootstrap with multicore support
boot_group_validation.mc( object, nsim = 1000, maxiter = 1000, tol = 0.01, init = "random", verbose = TRUE, seed = NULL )
boot_group_validation.mc( object, nsim = 1000, maxiter = 1000, tol = 0.01, init = "random", verbose = TRUE, seed = NULL )
object |
A FCMres object, typically obtained from functions CMeans, GCMeans, SFCMeans, SGFCMeans |
nsim |
The number of replications to do for the bootstrap evaluation |
maxiter |
An integer for the maximum number of iterations |
tol |
The tolerance criterion used in the evaluateMatrices function for convergence assessment |
init |
A string indicating how the initial centres must be selected. "random" indicates that random observations are used as centres. "kpp" use a distance based method resulting in more dispersed centres at the beginning. Both of them are heuristic. |
verbose |
A boolean to specify if the progress bar should be displayed. |
seed |
An integer to control randomness, default is NULL |
For more details, see the documentation of the function boot_group_validation
A list of two values: group_consistency: a dataframe indicating the consistency across simulations each cluster ; group_centres: a list with a dataframe for each cluster. The values in the dataframes are the centres of the clusters at each simulation.
## Not run: data(LyonIris) #selecting the columns for the analysis AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14", "Pct_65","Pct_Img","TxChom1564","Pct_brevet","NivVieMed") #rescaling the columns Data <- sf::st_drop_geometry(LyonIris[AnalysisFields]) for (Col in names(Data)){ Data[[Col]] <- as.numeric(scale(Data[[Col]])) } Cmean <- CMeans(Data,4,1.5,500,standardize = FALSE, seed = 456, tol = 0.00001, verbose = FALSE) future::plan(future::multisession(workers=2)) validation <- boot_group_validation.mc(Cmean, nsim = 1000, maxiter = 1000, tol = 0.01, init = "random") ## make sure any open connections are closed afterward if (!inherits(future::plan(), "sequential")) future::plan(future::sequential) ## End(Not run)
## Not run: data(LyonIris) #selecting the columns for the analysis AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14", "Pct_65","Pct_Img","TxChom1564","Pct_brevet","NivVieMed") #rescaling the columns Data <- sf::st_drop_geometry(LyonIris[AnalysisFields]) for (Col in names(Data)){ Data[[Col]] <- as.numeric(scale(Data[[Col]])) } Cmean <- CMeans(Data,4,1.5,500,standardize = FALSE, seed = 456, tol = 0.00001, verbose = FALSE) future::plan(future::multisession(workers=2)) validation <- boot_group_validation.mc(Cmean, nsim = 1000, maxiter = 1000, tol = 0.01, init = "random") ## make sure any open connections are closed afterward if (!inherits(future::plan(), "sequential")) future::plan(future::sequential) ## End(Not run)
Calculate the Local Moran I for a numeric raster
calc_local_moran_raster(rast, window)
calc_local_moran_raster(rast, window)
rast |
A SpatRaster or a matrix |
window |
The window defining the neighbour weights |
A SpatRaster or a matrix depending on the input with the local Moran I values
Arcachon <- terra::rast(system.file("extdata/Littoral4_2154.tif", package = "geocmeans")) names(Arcachon) <- c("blue", "green", "red", "infrared", "SWIR1", "SWIR2") rast <- Arcachon[[1]] w <- matrix(1, nrow = 3, ncol = 3) calc_local_moran_raster(rast, w)
Arcachon <- terra::rast(system.file("extdata/Littoral4_2154.tif", package = "geocmeans")) names(Arcachon) <- c("blue", "green", "red", "infrared", "SWIR1", "SWIR2") rast <- Arcachon[[1]] w <- matrix(1, nrow = 3, ncol = 3) calc_local_moran_raster(rast, w)
Calculate the global Moran I for a numeric raster
calc_moran_raster(rast, window)
calc_moran_raster(rast, window)
rast |
A SpatRaster or a matrix |
window |
The window defining the neighbour weights |
A float: the global Moran I
Arcachon <- terra::rast(system.file("extdata/Littoral4_2154.tif", package = "geocmeans")) names(Arcachon) <- c("blue", "green", "red", "infrared", "SWIR1", "SWIR2") rast <- Arcachon[[1]] w <- matrix(1, nrow = 3, ncol = 3) calc_moran_raster(rast, w)
Arcachon <- terra::rast(system.file("extdata/Littoral4_2154.tif", package = "geocmeans")) names(Arcachon) <- c("blue", "green", "red", "infrared", "SWIR1", "SWIR2") rast <- Arcachon[[1]] w <- matrix(1, nrow = 3, ncol = 3) calc_moran_raster(rast, w)
Calculate the Calinski-Harabasz index of clustering quality.
calcCalinskiHarabasz(data, belongmatrix, centers)
calcCalinskiHarabasz(data, belongmatrix, centers)
data |
The original dataframe used for the clustering (n*p) |
belongmatrix |
A membership matrix (n*k) |
centers |
The centres of the clusters |
The Calinski-Harabasz index (Da Silva et al. 2020) is the ratio between the clusters separation (between groups sum of squares) and the clusters cohesion (within groups sum of squares). A greater value indicates either more separated clusters or more cohesive clusters.
A float: the Calinski-Harabasz index
Da Silva LEB, Melton NM, Wunsch DC (2020). “Incremental cluster validity indices for online learning of hard partitions: Extensions and comparative study.” IEEE Access, 8, 22025–22047.
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE) calcCalinskiHarabasz(result$Data, result$Belongings, result$Centers)
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE) calcCalinskiHarabasz(result$Data, result$Belongings, result$Centers)
Calculate the Davies-Bouldin index of clustering quality.
calcDaviesBouldin(data, belongmatrix, centers)
calcDaviesBouldin(data, belongmatrix, centers)
data |
The original dataframe used for the clustering (n*p) |
belongmatrix |
A membership matrix (n*k) |
centers |
The centres of the clusters |
The Davies-Bouldin index (Da Silva et al. 2020) can be seen as the ratio of the within cluster dispersion and the between cluster separation. A lower value indicates a higher cluster compacity or a higher cluster separation. The formula is:
with:
So, the value of the index is an average of values. For each cluster, they represent
its worst comparison with all the other clusters, calculated
as the ratio between the compactness of the two clusters and the separation
of the two clusters.
A float: the Davies-Bouldin index
Da Silva LEB, Melton NM, Wunsch DC (2020). “Incremental cluster validity indices for online learning of hard partitions: Extensions and comparative study.” IEEE Access, 8, 22025–22047.
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE) calcDaviesBouldin(result$Data, result$Belongings, result$Centers)
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE) calcDaviesBouldin(result$Data, result$Belongings, result$Centers)
Calculate ELSA statistic for a hard partition. This local indicator of spatial autocorrelation can be used to determine where observations belong to different clusters.
calcELSA(object, nblistw = NULL, window = NULL, matdist = NULL)
calcELSA(object, nblistw = NULL, window = NULL, matdist = NULL)
object |
A FCMres object, typically obtained from functions CMeans, GCMeans, SFCMeans, SGFCMeans. Can also be a vector of categories. This vector must be filled with integers starting from 1. -1 can be used to indicate missing categories. |
nblistw |
A list.w object describing the neighbours typically produced by the spdep package. Required if data is a dataframe, see the parameter window if you use a list of rasters as input. |
window |
A binary (0,1) matrix representing the neighbours spatial weights when working with rasters. The matrix must have odd dimensions. |
matdist |
A matrix representing the dissimilarity between the clusters. The matrix must be squared and the diagonal must be filled with zeros. |
The ELSA index (Naimi et al. 2019) can be used to measure local autocorrelation for a categorical variable. It varies between 0 and 1, 0 indicating a perfect positive spatial autocorrelation and 1 a perfect heterogeneity. It is based on the Shanon entropy index, and uses a measure of difference between categories. Thus it can reflect that proximity of two similar categories is still a form of positive autocorelation. The authors suggest to calculate the mean of the index at several lag distance to create an entrogram which quantifies global spatial structure and can be represented as a variogram-like graph.
A depending of the input, a vector of ELSA values or a raster with the ELSA values.
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE) elsa_valus <- calcELSA(result)
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE) elsa_valus <- calcELSA(result)
Calculate the explained inertia by a classification
calcexplainedInertia(data, belongmatrix)
calcexplainedInertia(data, belongmatrix)
data |
The original dataframe used for the classification (n*p) |
belongmatrix |
A membership matrix (n*k) |
A float: the percentage of the total inertia explained
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE) calcexplainedInertia(result$Data,result$Belongings)
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE) calcexplainedInertia(result$Data,result$Belongings)
Calculate Fukuyama and Sugeno index of clustering quality
calcFukuyamaSugeno(data, belongmatrix, centers, m)
calcFukuyamaSugeno(data, belongmatrix, centers, m)
data |
The original dataframe used for the clustering (n*p) |
belongmatrix |
A membership matrix (n*k) |
centers |
The centres of the clusters |
m |
The fuzziness parameter |
The Fukuyama and Sugeno index (Fukuyama 1989) is the difference between the compacity of clusters and the separation of clusters. A smaller value indicates a better clustering. The formula is:
with n the number of observations, k the number of clusters and the mean of the dataset.
A float: the Fukuyama and Sugeno index
Fukuyama Y (1989). “A new method of choosing the number of clusters for the fuzzy c-mean method.” In Proc. 5th Fuzzy Syst. Symp., 1989, 247–250.
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE) calcFukuyamaSugeno(result$Data,result$Belongings, result$Centers, 1.5)
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE) calcFukuyamaSugeno(result$Data,result$Belongings, result$Centers, 1.5)
Calculate ELSA statistic for a fuzzy partition. This local indicator of spatial autocorrelation can be used to identify areas where close observations tend to belong to different clusters.
calcFuzzyELSA(object, nblistw = NULL, window = NULL, matdist = NULL)
calcFuzzyELSA(object, nblistw = NULL, window = NULL, matdist = NULL)
object |
A FCMres object, typically obtained from functions CMeans, GCMeans, SFCMeans, SGFCMeans. Can also be a membership matrix. Each row of this matrix must sum up to 1. Can also be a list of rasters, in which case each raster must represent the membership values for one cluster and the sum of all the rasters must be a raster filled with ones. |
nblistw |
A list.w object describing the neighbours typically produced by the spdep package. Required if data is a dataframe, see the parameter window if you use a list of rasters as input. |
window |
A binary (0,1) matrix representing the neighbours spatial weights when working with rasters. The matrix must have odd dimensions. |
matdist |
A matrix representing the dissimilarity between the clusters. The matrix must be squared and the diagonal must be filled with zeros. |
The fuzzy ELSA index is a generalization of the ELSA index (Naimi et al. 2019). It can be used to measure local autocorrelation for a membership matrix. It varies between 0 and 1, 0 indicating a perfect positive spatial autocorrelation and 1 a perfect heterogeneity. It is based on the Shannon entropy index, and uses a measure of dissimilarity between categories.
either a vector or a raster with the ELSA values.
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE) elsa_valus <- calcFuzzyELSA(result)
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE) elsa_valus <- calcFuzzyELSA(result)
Calculate the Generalized Dunn’s index (v43) of clustering quality.
calcGD43(data, belongmatrix, centers)
calcGD43(data, belongmatrix, centers)
data |
The original dataframe used for the clustering (n*p) |
belongmatrix |
A membership matrix (n*k) |
centers |
The centres of the clusters |
The Generalized Dunn’s index (Da Silva et al. 2020) is a ratio of the worst pair-wise separation of clusters and the worst compactness of clusters. A higher value indicates a better clustering. The formula is:
The numerator is a measure of the minimal separation between all the clusters i and j given by the formula:
which is basically the Euclidean distance between the centres of clusters and
The denominator is a measure of the maximal dispersion of all clusters, given by the formula:
A float: the Generalized Dunn’s index (43)
Da Silva LEB, Melton NM, Wunsch DC (2020). “Incremental cluster validity indices for online learning of hard partitions: Extensions and comparative study.” IEEE Access, 8, 22025–22047.
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE) calcGD43(result$Data, result$Belongings, result$Centers)
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE) calcGD43(result$Data, result$Belongings, result$Centers)
Calculate the Generalized Dunn’s index (v53) of clustering quality.
calcGD53(data, belongmatrix, centers)
calcGD53(data, belongmatrix, centers)
data |
The original dataframe used for the clustering (n*p) |
belongmatrix |
A membership matrix (n*k) |
centers |
The centres of the clusters |
The Generalized Dunn’s index (Da Silva et al. 2020) is a ratio of the worst pair-wise separation of clusters and the worst compactness of clusters. A higher value indicates a better clustering. The formula is:
The numerator is a measure of the minimal separation between all the clusters i and j given by the formula:
where u is the membership matrix and is the column of
u describing the membership of the n observations to cluster
i.
is the center of the cluster i.
The denominator is a measure of the maximal dispersion of all clusters, given by the formula:
A float: the Generalized Dunn’s index (53)
Da Silva LEB, Melton NM, Wunsch DC (2020). “Incremental cluster validity indices for online learning of hard partitions: Extensions and comparative study.” IEEE Access, 8, 22025–22047.
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE) calcGD53(result$Data, result$Belongings, result$Centers)
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE) calcGD53(result$Data, result$Belongings, result$Centers)
Calculate the Negentropy Increment index of clustering quality.
calcNegentropyI(data, belongmatrix, centers)
calcNegentropyI(data, belongmatrix, centers)
data |
The original dataframe used for the clustering (n*p) |
belongmatrix |
A membership matrix (n*k) |
centers |
The centres of the clusters |
The Negentropy Increment index (Da Silva et al. 2020) is based on the assumption that a normally shaped cluster is more desirable. It uses the difference between the average negentropy of all the clusters in the partition, and that of the whole partition. A smaller value indicates a better partition. The formula is:
with a cluster, |.| the determinant of a matrix,
j a cluster
|.| the determinant of a matrix
the covariance matrix of the dataset weighted by the membership values to cluster j
the covariance matrix of the dataset
the sum of the membership values to cluster j divided by the number of observations.
A float: the Negentropy Increment index
Da Silva LEB, Melton NM, Wunsch DC (2020). “Incremental cluster validity indices for online learning of hard partitions: Extensions and comparative study.” IEEE Access, 8, 22025–22047.
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE) calcNegentropyI(result$Data, result$Belongings, result$Centers)
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE) calcNegentropyI(result$Data, result$Belongings, result$Centers)
calculate several clustering quality indexes (some of them come from fclust package)
calcqualityIndexes( data, belongmatrix, m, indices = c("Silhouette.index", "Partition.entropy", "Partition.coeff", "XieBeni.index", "FukuyamaSugeno.index", "Explained.inertia") )
calcqualityIndexes( data, belongmatrix, m, indices = c("Silhouette.index", "Partition.entropy", "Partition.coeff", "XieBeni.index", "FukuyamaSugeno.index", "Explained.inertia") )
data |
The original dataframe used for the classification (n*p) |
belongmatrix |
A membership matrix (n*k) |
m |
The fuzziness parameter used for the classification |
indices |
A character vector with the names of the indices to calculate, default is : c("Silhouette.index", "Partition.entropy", "Partition.coeff", "XieBeni.index", "FukuyamaSugeno.index", "Explained.inertia"). Other available indices are : "DaviesBoulin.index", "CalinskiHarabasz.index", "GD43.index", "GD53.index" and "Negentropy.index" |
A named list with with the values of the required indices
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE) calcqualityIndexes(result$Data,result$Belongings, m=1.5)
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE) calcqualityIndexes(result$Data,result$Belongings, m=1.5)
Calculate the Silhouette index of clustering quality.
calcSilhouetteIdx(data, belongings)
calcSilhouetteIdx(data, belongings)
data |
The original dataframe used for the clustering (n*p) |
belongings |
A membership matrix (n*k) |
The index is calculated with the function SIL.F from the package fclust. When the dataset is too large, an approach by subsampling is used to avoid crash.
A float, the fuzzy Silhouette index
Calculate the diversity (or entropy) index.
calcUncertaintyIndex(belongmatrix)
calcUncertaintyIndex(belongmatrix)
belongmatrix |
A membership matrix |
The diversity (or entropy) index (Theil 1972) is calculated for each observation an varies between 0 and 1. When the value is close to 0, the observation belong to only one cluster (as in hard clustering). When the value is close to 1, the observation is undecided and tends to belong to each cluster. Values above 0.9 should be investigated. The formula is:
with i and observation, j a cluster, k the number of clusters and u the membership matrix.
It is a simplified formula because the sum of each row of a membership matrix is 1.
A vector with the values of the diversity (entropy) index
Theil H (1972). Statistical decomposition analysis; with applications in the social and administrative sciences. North-Holland.
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE) calcUncertaintyIndex(result$Belongings)
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE) calcUncertaintyIndex(result$Belongings)
Function to convert a character vector to a membership matrix (binary matrix). The columns of the matrix are ordered with the order function.
cat_to_belongings(categories) catToBelongings(categories)
cat_to_belongings(categories) catToBelongings(categories)
categories |
A vector with the categories of each observation |
A binary matrix
Create a matrix that can be used as a window when working with rasters. It uses a radius to set to 0 the weights of pixels that are farther than this distance. This is helpful to create circular focals.
circular_window(radius, res)
circular_window(radius, res)
radius |
The size in metres of the radius of the circular focal |
res |
The width in metres of a pixel. It is assumed that pixels are squares. |
The original function comes from here: https://scrogster.wordpress.com/2012/10/05/applying-a-circular-moving-window-filter-to-raster-data-in-r/ but we reworked it to make it faster and to ensure that the result is a matrix with odd dimensions.
A binary weight matrix
# wide of 100 metres for pixels of 2 metres window <- circular_window(100, 2) # row standardisation window_row_std <- window / sum(window)
# wide of 100 metres for pixels of 2 metres window <- circular_window(100, 2) # row standardisation window_row_std <- window / sum(window)
The classical c-mean algorithm
CMeans( data, k, m, maxiter = 500, tol = 0.01, standardize = TRUE, robust = FALSE, noise_cluster = FALSE, delta = NULL, verbose = TRUE, init = "random", seed = NULL )
CMeans( data, k, m, maxiter = 500, tol = 0.01, standardize = TRUE, robust = FALSE, noise_cluster = FALSE, delta = NULL, verbose = TRUE, init = "random", seed = NULL )
data |
A dataframe with only numerical variables. Can also be a list of rasters (produced by the package raster). In that case, each raster is considered as a variable and each pixel is an observation. Pixels with NA values are not used during the classification. |
k |
An integer describing the number of cluster to find |
m |
A float for the fuzziness degree |
maxiter |
An integer for the maximum number of iterations |
tol |
The tolerance criterion used in the evaluateMatrices function for convergence assessment |
standardize |
A boolean to specify if the variables must be centred and reduced (default = True) |
robust |
A boolean indicating if the "robust" version of the algorithm must be used (see details) |
noise_cluster |
A boolean indicatong if a noise cluster must be added to the solution (see details) |
delta |
A float giving the distance of the noise cluster to each observation |
verbose |
A boolean to specify if the progress should be printed |
init |
A string indicating how the initial centres must be selected. "random" indicates that random observations are used as centres. "kpp" use a distance-based method resulting in more dispersed centres at the beginning. Both of them are heuristic. |
seed |
An integer used for random number generation. It ensures that the starting centres will be the same if the same value is selected. |
An S3 object of class FCMres with the following slots
Centers: a dataframe describing the final centers of the groups
Belongings: the final membership matrix
Groups: a vector with the names of the most likely group for each observation
Data: the dataset used to perform the clustering (might be standardized)
isRaster: TRUE if rasters were used as input data, FALSE otherwise
k: the number of groups
m: the fuzyness degree
alpha: the spatial weighting parameter (if SFCM or SGFCM)
beta: beta parameter for generalized version of FCM (GFCM or SGFCM)
algo: the name of the algorithm used
rasters: a list of rasters with membership values and the most likely group (if rasters were used)
missing: a boolean vector indicating raster cell with data (TRUE) and with NA (FALSE) (if rasters were used)
maxiter: the maximum number of iterations used
tol: the convergence criterio
lag_method: the lag function used (if SFCM or SGFCM)
nblistw: the neighbours list used (if vector data were used for SFCM or SGFCM)
window: the window used (if raster data were used for SFCM or SGFCM)
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) result <- CMeans(dataset,k = 5, m = 1.5, standardize = TRUE)
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) result <- CMeans(dataset,k = 5, m = 1.5, standardize = TRUE)
The generalized c-mean algorithm
GCMeans( data, k, m, beta, maxiter = 500, tol = 0.01, standardize = TRUE, robust = FALSE, noise_cluster = FALSE, delta = NULL, verbose = TRUE, init = "random", seed = NULL )
GCMeans( data, k, m, beta, maxiter = 500, tol = 0.01, standardize = TRUE, robust = FALSE, noise_cluster = FALSE, delta = NULL, verbose = TRUE, init = "random", seed = NULL )
data |
A dataframe with only numerical variables. Can also be a list of rasters (produced by the package raster). In that case, each raster is considered as a variable and each pixel is an observation. Pixels with NA values are not used during the classification. |
k |
An integer describing the number of cluster to find |
m |
A float for the fuzziness degree |
beta |
A float for the beta parameter (control speed convergence and classification crispness) |
maxiter |
An integer for the maximum number of iterations |
tol |
The tolerance criterion used in the evaluateMatrices function for convergence assessment |
standardize |
A boolean to specify if the variables must be centred and reduced (default = True) |
robust |
A boolean indicating if the "robust" version of the algorithm must be used (see details) |
noise_cluster |
A boolean indicatong if a noise cluster must be added to the solution (see details) |
delta |
A float giving the distance of the noise cluster to each observation |
verbose |
A boolean to specify if the progress should be printed |
init |
A string indicating how the initial centres must be selected. "random" indicates that random observations are used as centres. "kpp" use a distance-based method resulting in more dispersed centres at the beginning. Both of them are heuristic. |
seed |
An integer used for random number generation. It ensures that the starting centres will be the same if the same value is selected. |
An S3 object of class FCMres with the following slots
Centers: a dataframe describing the final centers of the groups
Belongings: the final membership matrix
Groups: a vector with the names of the most likely group for each observation
Data: the dataset used to perform the clustering (might be standardized)
isRaster: TRUE if rasters were used as input data, FALSE otherwise
k: the number of groups
m: the fuzyness degree
alpha: the spatial weighting parameter (if SFCM or SGFCM)
beta: beta parameter for generalized version of FCM (GFCM or SGFCM)
algo: the name of the algorithm used
rasters: a list of rasters with membership values and the most likely group (if rasters were used)
missing: a boolean vector indicating raster cell with data (TRUE) and with NA (FALSE) (if rasters were used)
maxiter: the maximum number of iterations used
tol: the convergence criterio
lag_method: the lag function used (if SFCM or SGFCM)
nblistw: the neighbours list used (if vector data were used for SFCM or SGFCM)
window: the window used (if raster data were used for SFCM or SGFCM)
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) result <- GCMeans(dataset,k = 5, m = 1.5, beta = 0.5, standardize = TRUE)
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) result <- GCMeans(dataset,k = 5, m = 1.5, beta = 0.5, standardize = TRUE)
Match the groups obtained from two classifications based on the Jaccard index calculated on the membership matrices.
groups_matching(object.x, object.y)
groups_matching(object.x, object.y)
object.x |
A FCMres object, or a simple membership matrix. It is used as the reference for the ordering of the groups |
object.y |
A FCMres object, or a simple membership matrix. The order of its groups will be updated to match with the groups of object.x |
We can not expect to obtain the groups in the same order in each run of a classification algorithm. This function can be used match the clusters of a first classification with the most similar clusters in a second classification. Thus it might be easier to compare the results of two algorithms or two runs of the same algorithm.
The FCMres object or the membership matrix provided for the parameter object.y with the order of the groups updated.
data(LyonIris) #selecting the columns for the analysis AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14", "Pct_65","Pct_Img","TxChom1564","Pct_brevet","NivVieMed") #rescaling the columns Data <- sf::st_drop_geometry(LyonIris[AnalysisFields]) for (Col in names(Data)){ Data[[Col]] <- as.numeric(scale(Data[[Col]])) } Cmean <- CMeans(Data,4,1.5,500,standardize = FALSE, seed = 456, tol = 0.00001, verbose = FALSE) Cmean2 <- CMeans(Data,4,1.5,500,standardize = FALSE, seed = 789, tol = 0.00001, verbose = FALSE) ordered_Cmean2 <- groups_matching(Cmean,Cmean2)
data(LyonIris) #selecting the columns for the analysis AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14", "Pct_65","Pct_Img","TxChom1564","Pct_brevet","NivVieMed") #rescaling the columns Data <- sf::st_drop_geometry(LyonIris[AnalysisFields]) for (Col in names(Data)){ Data[[Col]] <- as.numeric(scale(Data[[Col]])) } Cmean <- CMeans(Data,4,1.5,500,standardize = FALSE, seed = 456, tol = 0.00001, verbose = FALSE) Cmean2 <- CMeans(Data,4,1.5,500,standardize = FALSE, seed = 789, tol = 0.00001, verbose = FALSE) ordered_Cmean2 <- groups_matching(Cmean,Cmean2)
Check if an object can be considered as a FCMres object
## S3 method for class 'FCMres' is(object, class2 = "FCMres")
## S3 method for class 'FCMres' is(object, class2 = "FCMres")
object |
A FCMres object, typically obtained from functions CMeans, GCMeans, SFCMeans, SGFCMeans |
class2 |
Character string giving the names of the classe to test (usually "FCMres") |
A boolean, TRUE if x can be considered as a FCMres object, FALSE otherwise group
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE) is(result, "FCMres")
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE) is(result, "FCMres")
A dataset containing social and environmental data for the Iris of Lyon (France)
LyonIris
LyonIris
A SpatialPolygonsDataFrame with 506 rows and 32 variables:
a simple OID (integer)
the code of each commune (factor)
the code of each unit area : iris (factor)
the annual daily mean noise exposure values in dB (numeric)
the annual mean of NO2 concentration in ug/m3 (numeric)
the annual mean of PM25 concentration in ug/m3 (numeric)
the annual mean of PM25 concentration in ug/m3 (numeric)
the percentage of people that are 0 to 14 year old (numeric)
the percentage of people older than 64 (numeric)
the percentage immigrants (numeric)
the unemployment rate (numeric)
the percentage of people that obtained the college diploma (numeric)
the median standard of living in euros (numeric)
the percentage of the iris surface covered by trees (numeric)
the X coordinate of the center of the Iris (numeric)
the Y coordinate of the center of the Iris (numeric)
...
https://data.grandlyon.com/portail/fr/accueil
Build some maps to visualize the results of the clustering
mapClusters(geodata = NULL, object, undecided = NULL)
mapClusters(geodata = NULL, object, undecided = NULL)
geodata |
An object of class features collection from sf / ordered like the original data used for the clustering. Can be Null if object is a FCMres and has been created with rasters. |
object |
A FCMres object, typically obtained from functions CMeans, GCMeans, SFCMeans, SGFCMeans. Can also be a simple membership matrix. |
undecided |
A float between 0 and 1 giving the minimum value that an observation must get in the membership matrix to not be considered as uncertain (default = NULL) |
A named list with :
ProbaMaps : a list of tmap maps showing for each group the probability of the observations to belong to that group
ClusterMap : a tmap map showing the most likely group for observation
## Not run: data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE) MyMaps <- mapClusters(LyonIris, result$Belongings) ## End(Not run)
## Not run: data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE) MyMaps <- mapClusters(LyonIris, result$Belongings) ## End(Not run)
Method to plot the results of a FCM.res object
## S3 method for class 'FCMres' plot(x, type = "spider", ...)
## S3 method for class 'FCMres' plot(x, type = "spider", ...)
x |
A FCMres object, typically obtained from functions CMeans, GCMeans, SFCMeans, SGFCMeans. Can also be a simple membership matrix. |
type |
A string indicating the type of plot to show. Can be one of "bar", "violin", or "spider". Default is spider. |
... |
not used |
This S3 method is a simple dispatcher for the functions barPlots, violinPlots and spiderPlots. To be able to use all their specific parameters, one can use them directly.
a ggplot2 object, a list, or NULL, depending on the type of plot requested
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") # rescaling all the variables used in the analysis for (field in AnalysisFields) { LyonIris[[field]] <- scale(LyonIris[[field]]) } # doing the initial clustering dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SGFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, beta = 0.5, standardize = FALSE) plot(result, type = "spider")
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") # rescaling all the variables used in the analysis for (field in AnalysisFields) { LyonIris[[field]] <- scale(LyonIris[[field]]) } # doing the initial clustering dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SGFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, beta = 0.5, standardize = FALSE) plot(result, type = "spider")
Function to predict the membership matrix of a new set of observations
predict_membership( object, new_data, nblistw = NULL, window = NULL, standardize = TRUE, ... )
predict_membership( object, new_data, nblistw = NULL, window = NULL, standardize = TRUE, ... )
object |
A FCMres object, typically obtained from functions CMeans, GCMeans, SFCMeans, SGFCMeans. Can also be a simple membership matrix. |
new_data |
A DataFrame with the new observations or a list of rasters if object$isRaster is TRUE |
nblistw |
A list.w object describing the neighbours typically produced by the spdep package. Required if data is a dataframe, see the parameter window if you use a list of rasters as input. |
window |
If data is a list of rasters, then a window must be specified instead of a list.w object. It will be used to calculate a focal function on each raster. The window must be a square numeric matrix with odd dimensions (such 3x3). The values in the matrix indicate the weight to give to each pixel and the centre of the matrix is the centre of the focal function. |
standardize |
A boolean to specify if the variable must be centered and reduced (default = True) |
... |
not used |
A numeric matrix with the membership values for each new observation. If rasters were used, return a list of rasters with the membership values.
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") # rescaling all the variables used in the analysis for (field in AnalysisFields) { LyonIris[[field]] <- scale(LyonIris[[field]]) } # doing the initial clustering dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SGFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, beta = 0.5, standardize = FALSE) # using a subset of the original dataframe as "new data" new_data <- LyonIris[c(1, 27, 36, 44, 73),] new_dataset <- sf::st_drop_geometry(new_data[AnalysisFields]) new_nb <- spdep::poly2nb(new_data,queen=TRUE) new_Wqueen <- spdep::nb2listw(new_nb,style="W") # doing the prediction predictions <- predict_membership(result, new_dataset, new_Wqueen, standardize = FALSE)
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") # rescaling all the variables used in the analysis for (field in AnalysisFields) { LyonIris[[field]] <- scale(LyonIris[[field]]) } # doing the initial clustering dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SGFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, beta = 0.5, standardize = FALSE) # using a subset of the original dataframe as "new data" new_data <- LyonIris[c(1, 27, 36, 44, 73),] new_dataset <- sf::st_drop_geometry(new_data[AnalysisFields]) new_nb <- spdep::poly2nb(new_data,queen=TRUE) new_Wqueen <- spdep::nb2listw(new_nb,style="W") # doing the prediction predictions <- predict_membership(result, new_dataset, new_Wqueen, standardize = FALSE)
Function to predict the membership matrix of a new set of observations
## S3 method for class 'FCMres' predict( object, new_data, nblistw = NULL, window = NULL, standardize = TRUE, ... )
## S3 method for class 'FCMres' predict( object, new_data, nblistw = NULL, window = NULL, standardize = TRUE, ... )
object |
A FCMres object, typically obtained from functions CMeans, GCMeans, SFCMeans, SGFCMeans. Can also be a simple membership matrix. |
new_data |
A DataFrame with the new observations |
nblistw |
A list.w object describing the neighbours typically produced by the spdep package. Required if data is a dataframe, see the parameter window if you use a list of rasters as input. |
window |
If data is a list of rasters, then a window must be specified instead of a list.w object. It will be used to calculate a focal function on each raster. The window must be a square numeric matrix with odd dimensions (such 3x3). The values in the matrix indicate the weight to give to each pixel and the centre of the matrix is the centre of the focal function. |
standardize |
A boolean to specify if the variable must be centred and reduced (default = True) |
... |
not used |
A numeric matrix with the membership values for each new observation
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") # rescaling all the variables used in the analysis for (field in AnalysisFields) { LyonIris[[field]] <- scale(LyonIris[[field]]) } # doing the initial clustering dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SGFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, beta = 0.5, standardize = FALSE) # using a subset of the original dataframe as "new data" new_data <- LyonIris[c(1, 27, 36, 44, 73),] new_dataset <- sf::st_drop_geometry(new_data[AnalysisFields]) new_nb <- spdep::poly2nb(new_data,queen=TRUE) new_Wqueen <- spdep::nb2listw(new_nb,style="W") # doing the prediction predictions <- predict(result, new_dataset, new_Wqueen, standardize = FALSE)
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") # rescaling all the variables used in the analysis for (field in AnalysisFields) { LyonIris[[field]] <- scale(LyonIris[[field]]) } # doing the initial clustering dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SGFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, beta = 0.5, standardize = FALSE) # using a subset of the original dataframe as "new data" new_data <- LyonIris[c(1, 27, 36, 44, 73),] new_dataset <- sf::st_drop_geometry(new_data[AnalysisFields]) new_nb <- spdep::poly2nb(new_data,queen=TRUE) new_Wqueen <- spdep::nb2listw(new_nb,style="W") # doing the prediction predictions <- predict(result, new_dataset, new_Wqueen, standardize = FALSE)
print a FCMres object
## S3 method for class 'FCMres' print(x, ...)
## S3 method for class 'FCMres' print(x, ...)
x |
A FCMres object, typically obtained from functions CMeans, GCMeans, SFCMeans, SGFCMeans |
... |
not used |
A boolean, TRUE if x can be considered as a FCMres object, FALSE otherwise group
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) result <- CMeans(dataset, k = 5, m = 1.5, standardize = TRUE) print(result, "FCMres")
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) result <- CMeans(dataset, k = 5, m = 1.5, standardize = TRUE) print(result, "FCMres")
Function to select the parameters for a clustering algorithm.
select_parameters( algo, data, k, m, alpha = NA, beta = NA, nblistw = NULL, lag_method = "mean", window = NULL, spconsist = TRUE, classidx = TRUE, nrep = 30, indices = NULL, standardize = TRUE, robust = FALSE, noise_cluster = FALSE, delta = NA, maxiter = 500, tol = 0.01, seed = NULL, init = "random", verbose = TRUE ) selectParameters( algo, data, k, m, alpha = NA, beta = NA, nblistw = NULL, lag_method = "mean", window = NULL, spconsist = TRUE, classidx = TRUE, nrep = 30, indices = NULL, standardize = TRUE, robust = FALSE, noise_cluster = FALSE, delta = NA, maxiter = 500, tol = 0.01, seed = NULL, init = "random", verbose = TRUE )
select_parameters( algo, data, k, m, alpha = NA, beta = NA, nblistw = NULL, lag_method = "mean", window = NULL, spconsist = TRUE, classidx = TRUE, nrep = 30, indices = NULL, standardize = TRUE, robust = FALSE, noise_cluster = FALSE, delta = NA, maxiter = 500, tol = 0.01, seed = NULL, init = "random", verbose = TRUE ) selectParameters( algo, data, k, m, alpha = NA, beta = NA, nblistw = NULL, lag_method = "mean", window = NULL, spconsist = TRUE, classidx = TRUE, nrep = 30, indices = NULL, standardize = TRUE, robust = FALSE, noise_cluster = FALSE, delta = NA, maxiter = 500, tol = 0.01, seed = NULL, init = "random", verbose = TRUE )
algo |
A string indicating which method to use (FCM, GFCM, SFCM, SGFCM) |
data |
A dataframe with numeric columns or a list of rasters. |
k |
A sequence of values for k to test (>=2) |
m |
A sequence of values for m to test |
alpha |
A sequence of values for alpha to test (NULL if not required) |
beta |
A sequence of values for beta to test (NULL if not required) |
nblistw |
A list of list.w objects describing the neighbours typically produced by the spdep package (NULL if not required) |
lag_method |
A string indicating if a classical lag must be used ("mean") or if a weighted median must be used ("median"). Both can be tested by specifying a vector : c("mean","median"). When working with rasters, the string must be parsable to a function like mean, min, max, sum, etc. and will be applied to all the pixels values in the window designated by the parameter window and weighted according to the values of this matrix. |
window |
A list of windows to use to calculate neighbouring values if rasters are used. |
spconsist |
A boolean indicating if the spatial consistency must be calculated |
classidx |
A boolean indicating if the quality of classification indices must be calculated |
nrep |
An integer indicating the number of permutation to do to simulate the random distribution of the spatial inconsistency. Only used if spconsist is TRUE. |
indices |
A character vector with the names of the indices to calculate, to evaluate clustering quality. default is :c("Silhouette.index", "Partition.entropy", "Partition.coeff", "XieBeni.index", "FukuyamaSugeno.index", "Explained.inertia"). Other available indices are : "DaviesBoulin.index", "CalinskiHarabasz.index", "GD43.index", "GD53.index" and "Negentropy.index". |
standardize |
A boolean to specify if the variable must be centered and reduce (default = True) |
robust |
A boolean indicating if the "robust" version of the algorithm must be used (see details) |
noise_cluster |
A boolean indicatong if a noise cluster must be added to the solution (see details) |
delta |
A float giving the distance of the noise cluster to each observation |
maxiter |
An integer for the maximum number of iteration |
tol |
The tolerance criterion used in the evaluateMatrices function for convergence assessment |
seed |
An integer used for random number generation. It ensures that the start centers will be the same if the same integer is selected. |
init |
A string indicating how the initial centers must be selected. "random" indicates that random observations are used as centers. "kpp" use a distance based method resulting in more dispersed centers at the beginning. Both of them are heuristic. |
verbose |
A boolean indicating if a progressbar should be displayed |
A dataframe with indicators assessing the quality of classifications
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") #set spconsist to TRUE to calculate the spatial consistency indicator #FALSE here to reduce the time during package check values <- select_parameters(algo = "SFCM", dataset, k = 5, m = seq(2,3,0.1), alpha = seq(0,2,0.1), nblistw = Wqueen, spconsist=FALSE) data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") #set spconsist to TRUE to calculate the spatial consistency indicator #FALSE here to reduce the time during package check values <- selectParameters(algo = "SFCM", dataset, k = 5, m = seq(2,3,0.1), alpha = seq(0,2,0.1), nblistw = Wqueen, spconsist=FALSE)
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") #set spconsist to TRUE to calculate the spatial consistency indicator #FALSE here to reduce the time during package check values <- select_parameters(algo = "SFCM", dataset, k = 5, m = seq(2,3,0.1), alpha = seq(0,2,0.1), nblistw = Wqueen, spconsist=FALSE) data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") #set spconsist to TRUE to calculate the spatial consistency indicator #FALSE here to reduce the time during package check values <- selectParameters(algo = "SFCM", dataset, k = 5, m = seq(2,3,0.1), alpha = seq(0,2,0.1), nblistw = Wqueen, spconsist=FALSE)
Function to select the parameters for a clustering algorithm. This version of the function allows to use a plan defined with the package future to reduce calculation time.
select_parameters.mc( algo, data, k, m, alpha = NA, beta = NA, nblistw = NULL, lag_method = "mean", window = NULL, spconsist = TRUE, classidx = TRUE, nrep = 30, indices = NULL, standardize = TRUE, robust = FALSE, noise_cluster = FALSE, delta = NA, maxiter = 500, tol = 0.01, chunk_size = 5, seed = NULL, init = "random", verbose = TRUE ) selectParameters.mc( algo, data, k, m, alpha = NA, beta = NA, nblistw = NULL, lag_method = "mean", window = NULL, spconsist = TRUE, classidx = TRUE, nrep = 30, indices = NULL, standardize = TRUE, robust = FALSE, noise_cluster = FALSE, delta = NA, maxiter = 500, tol = 0.01, chunk_size = 5, seed = NULL, init = "random", verbose = TRUE )
select_parameters.mc( algo, data, k, m, alpha = NA, beta = NA, nblistw = NULL, lag_method = "mean", window = NULL, spconsist = TRUE, classidx = TRUE, nrep = 30, indices = NULL, standardize = TRUE, robust = FALSE, noise_cluster = FALSE, delta = NA, maxiter = 500, tol = 0.01, chunk_size = 5, seed = NULL, init = "random", verbose = TRUE ) selectParameters.mc( algo, data, k, m, alpha = NA, beta = NA, nblistw = NULL, lag_method = "mean", window = NULL, spconsist = TRUE, classidx = TRUE, nrep = 30, indices = NULL, standardize = TRUE, robust = FALSE, noise_cluster = FALSE, delta = NA, maxiter = 500, tol = 0.01, chunk_size = 5, seed = NULL, init = "random", verbose = TRUE )
algo |
A string indicating which method to use (FCM, GFCM, SFCM, SGFCM) |
data |
A dataframe with numeric columns |
k |
A sequence of values for k to test (>=2) |
m |
A sequence of values for m to test |
alpha |
A sequence of values for alpha to test (NULL if not required) |
beta |
A sequence of values for beta to test (NULL if not required) |
nblistw |
A list of list.w objects describing the neighbours typically produced by the spdep package (NULL if not required) |
lag_method |
A string indicating if a classical lag must be used ("mean") or if a weighted median must be used ("median"). Both can be tested by specifying a vector : c("mean","median"). When working with rasters, the string must be parsable to a function like mean, min, max, sum, etc. and will be applied to all the pixels values in the window designated by the parameter window and weighted according to the values of this matrix. |
window |
A list of windows to use to calculate neighbouring values if rasters are used. |
spconsist |
A boolean indicating if the spatial consistency must be calculated |
classidx |
A boolean indicating if the quality of classification indices must be calculated |
nrep |
An integer indicating the number of permutation to do to simulate the random distribution of the spatial inconsistency. Only used if spconsist is TRUE. |
indices |
A character vector with the names of the indices to calculate, to evaluate clustering quality. default is :c("Silhouette.index", "Partition.entropy", "Partition.coeff", "XieBeni.index", "FukuyamaSugeno.index", "Explained.inertia"). Other available indices are : "DaviesBoulin.index", "CalinskiHarabasz.index", "GD43.index", "GD53.index" and "Negentropy.index". |
standardize |
A boolean to specify if the variable must be centered and reduce (default = True) |
robust |
A boolean indicating if the "robust" version of the algorithm must be used (see details) |
noise_cluster |
A boolean indicatong if a noise cluster must be added to the solution (see details) |
delta |
A float giving the distance of the noise cluster to each observation |
maxiter |
An integer for the maximum number of iteration |
tol |
The tolerance criterion used in the evaluateMatrices function for convergence assessment |
chunk_size |
The size of a chunk used for multiprocessing. Default is 100. |
seed |
An integer used for random number generation. It ensures that the start centers will be the same if the same integer is selected. |
init |
A string indicating how the initial centers must be selected. "random" indicates that random observations are used as centers. "kpp" use a distance based method resulting in more dispersed centers at the beginning. Both of them are heuristic. |
verbose |
A boolean indicating if a progressbar should be displayed |
A dataframe with indicators assessing the quality of classifications
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") future::plan(future::multisession(workers=2)) #set spconsist to TRUE to calculate the spatial consistency indicator #FALSE here to reduce the time during package check values <- select_parameters.mc("SFCM", dataset, k = 5, m = seq(1,2.5,0.1), alpha = seq(0,2,0.1), nblistw = Wqueen, spconsist=FALSE) ## make sure any open connections are closed afterward if (!inherits(future::plan(), "sequential")) future::plan(future::sequential) data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") future::plan(future::multisession(workers=2)) #set spconsist to TRUE to calculate the spatial consistency indicator #FALSE here to reduce the time during package check values <- select_parameters.mc("SFCM", dataset, k = 5, m = seq(1,2.5,0.1), alpha = seq(0,2,0.1), nblistw = Wqueen, spconsist=FALSE)
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") future::plan(future::multisession(workers=2)) #set spconsist to TRUE to calculate the spatial consistency indicator #FALSE here to reduce the time during package check values <- select_parameters.mc("SFCM", dataset, k = 5, m = seq(1,2.5,0.1), alpha = seq(0,2,0.1), nblistw = Wqueen, spconsist=FALSE) ## make sure any open connections are closed afterward if (!inherits(future::plan(), "sequential")) future::plan(future::sequential) data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") future::plan(future::multisession(workers=2)) #set spconsist to TRUE to calculate the spatial consistency indicator #FALSE here to reduce the time during package check values <- select_parameters.mc("SFCM", dataset, k = 5, m = seq(1,2.5,0.1), alpha = seq(0,2,0.1), nblistw = Wqueen, spconsist=FALSE)
spatial version of the c-mean algorithm (SFCMeans, FCM_S1)
SFCMeans( data, nblistw = NULL, k, m, alpha, lag_method = "mean", window = NULL, noise_cluster = FALSE, delta = NULL, maxiter = 500, tol = 0.01, standardize = TRUE, robust = FALSE, verbose = TRUE, init = "random", seed = NULL )
SFCMeans( data, nblistw = NULL, k, m, alpha, lag_method = "mean", window = NULL, noise_cluster = FALSE, delta = NULL, maxiter = 500, tol = 0.01, standardize = TRUE, robust = FALSE, verbose = TRUE, init = "random", seed = NULL )
data |
A dataframe with only numerical variables. Can also be a list of rasters (produced by the package raster). In that case, each raster is considered as a variable and each pixel is an observation. Pixels with NA values are not used during the classification. |
nblistw |
A list.w object describing the neighbours typically produced by the spdep package. Required if data is a dataframe, see the parameter window if you use a list of rasters as input. |
k |
An integer describing the number of cluster to find |
m |
A float for the fuzziness degree |
alpha |
A float representing the weight of the space in the analysis (0 is a typical fuzzy-c-mean algorithm, 1 is balanced between the two dimensions, 2 is twice the weight for space) |
lag_method |
A string indicating if a classical lag must be used ("mean") or if a weighted median must be used ("median"). When working with rasters, a function can be given (or a string which will be parsed). It will be applied to all the pixels values in the matrix designated by the parameter window and weighted according to the values of this matrix. Typically, to obtain an average of the pixels in a 3x3 matrix one could use the function sum (or "sum") and set the window as: window <- matrix(1/9,nrow = 3, ncol = 3). There is one special case when working with rasters: one can specify "nl" (standing for non-local) which calculated a lagged version of the input rasters, using the inverse of the euclidean distance as spatial weights (see the section Advanced examples in the vignette introduction for more details). |
window |
If data is a list of rasters, then a window must be specified instead of a list.w object. It will be used to calculate a focal function on each raster. The window must be a square numeric matrix with odd dimensions (such 3x3). The values in the matrix indicate the weight to give to each pixel and the centre of the matrix is the centre of the focal function. |
noise_cluster |
A boolean indicatong if a noise cluster must be added to the solution (see details) |
delta |
A float giving the distance of the noise cluster to each observation |
maxiter |
An integer for the maximum number of iterations |
tol |
The tolerance criterion used in the evaluateMatrices function for convergence assessment |
standardize |
A boolean to specify if the variables must be centred and reduced (default = True) |
robust |
A boolean indicating if the "robust" version of the algorithm must be used (see details) |
verbose |
A boolean to specify if the progress should be printed |
init |
A string indicating how the initial centres must be selected. "random" indicates that random observations are used as centres. "kpp" use a distance-based method resulting in more dispersed centres at the beginning. Both of them are heuristic. |
seed |
An integer used for random number generation. It ensures that the starting centres will be the same if the same value is selected. |
The implementation is based on the following article : doi:10.1016/j.patcog.2006.07.011.
the membership matrix (u) is calculated as follow
the centers of the groups are updated with the following formula
with
vi the center of the group vi
xk the data point k
xk_bar the spatially lagged data point k
An S3 object of class FCMres with the following slots
Centers: a dataframe describing the final centers of the groups
Belongings: the final membership matrix
Groups: a vector with the names of the most likely group for each observation
Data: the dataset used to perform the clustering (might be standardized)
isRaster: TRUE if rasters were used as input data, FALSE otherwise
k: the number of groups
m: the fuzyness degree
alpha: the spatial weighting parameter (if SFCM or SGFCM)
beta: beta parameter for generalized version of FCM (GFCM or SGFCM)
algo: the name of the algorithm used
rasters: a list of rasters with membership values and the most likely group (if rasters were used)
missing: a boolean vector indicating raster cell with data (TRUE) and with NA (FALSE) (if rasters were used)
maxiter: the maximum number of iterations used
tol: the convergence criterio
lag_method: the lag function used (if SFCM or SGFCM)
nblistw: the neighbours list used (if vector data were used for SFCM or SGFCM)
window: the window used (if raster data were used for SFCM or SGFCM)
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE)
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE)
spatial version of the generalized c-mean algorithm (SGFCMeans)
SGFCMeans( data, nblistw = NULL, k, m, alpha, beta, lag_method = "mean", window = NULL, maxiter = 500, tol = 0.01, standardize = TRUE, robust = FALSE, noise_cluster = FALSE, delta = NULL, verbose = TRUE, init = "random", seed = NULL )
SGFCMeans( data, nblistw = NULL, k, m, alpha, beta, lag_method = "mean", window = NULL, maxiter = 500, tol = 0.01, standardize = TRUE, robust = FALSE, noise_cluster = FALSE, delta = NULL, verbose = TRUE, init = "random", seed = NULL )
data |
A dataframe with only numerical variables. Can also be a list of rasters (produced by the package raster). In that case, each raster is considered as a variable and each pixel is an observation. Pixels with NA values are not used during the classification. |
nblistw |
A list.w object describing the neighbours typically produced by the spdep package. Required if data is a dataframe, see the parameter window if you use a list of rasters as input. |
k |
An integer describing the number of cluster to find |
m |
A float for the fuzziness degree |
alpha |
A float representing the weight of the space in the analysis (0 is a typical fuzzy-c-mean algorithm, 1 is balanced between the two dimensions, 2 is twice the weight for space) |
beta |
A float for the beta parameter (control speed convergence and classification crispness) |
lag_method |
A string indicating if a classical lag must be used ("mean") or if a weighted median must be used ("median"). When working with rasters, a function can be given (or a string which will be parsed). It will be applied to all the pixels values in the matrix designated by the parameter window and weighted according to the values of this matrix. Typically, to obtain an average of the pixels in a 3x3 matrix one could use the function sum (or "sum") and set the window as: window <- matrix(1/9,nrow = 3, ncol = 3). There is one special case when working with rasters: one can specify "nl" (standing for non-local) which calculated a lagged version of the input rasters, using the inverse of the euclidean distance as spatial weights (see the section Advanced examples in the vignette introduction for more details). |
window |
If data is a list of rasters, then a window must be specified instead of a list.w object. It will be used to calculate a focal function on each raster. The window must be a square numeric matrix with odd dimensions (such 3x3). The values in the matrix indicate the weight to give to each pixel and the centre of the matrix is the centre of the focal function. |
maxiter |
An integer for the maximum number of iterations |
tol |
The tolerance criterion used in the evaluateMatrices function for convergence assessment |
standardize |
A boolean to specify if the variables must be centred and reduced (default = True) |
robust |
A boolean indicating if the "robust" version of the algorithm must be used (see details) |
noise_cluster |
A boolean indicatong if a noise cluster must be added to the solution (see details) |
delta |
A float giving the distance of the noise cluster to each observation |
verbose |
A boolean to specify if the progress should be printed |
init |
A string indicating how the initial centres must be selected. "random" indicates that random observations are used as centres. "kpp" use a distance-based method resulting in more dispersed centres at the beginning. Both of them are heuristic. |
seed |
An integer used for random number generation. It ensures that the starting centres will be the same if the same value is selected. |
The implementation is based on the following article : doi:10.1016/j.dsp.2012.09.016.
the membership matrix (u) is calculated as follow
the centers of the groups are updated with the following formula
with
vi the center of the group vi
xk the data point k
xk_bar the spatially lagged data point k
An S3 object of class FCMres with the following slots
Centers: a dataframe describing the final centers of the groups
Belongings: the final membership matrix
Groups: a vector with the names of the most likely group for each observation
Data: the dataset used to perform the clustering (might be standardized)
isRaster: TRUE if rasters were used as input data, FALSE otherwise
k: the number of groups
m: the fuzyness degree
alpha: the spatial weighting parameter (if SFCM or SGFCM)
beta: beta parameter for generalized version of FCM (GFCM or SGFCM)
algo: the name of the algorithm used
rasters: a list of rasters with membership values and the most likely group (if rasters were used)
missing: a boolean vector indicating raster cell with data (TRUE) and with NA (FALSE) (if rasters were used)
maxiter: the maximum number of iterations used
tol: the convergence criterio
lag_method: the lag function used (if SFCM or SGFCM)
nblistw: the neighbours list used (if vector data were used for SFCM or SGFCM)
window: the window used (if raster data were used for SFCM or SGFCM)
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SGFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, beta = 0.5, standardize = TRUE)
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SGFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, beta = 0.5, standardize = TRUE)
Start a local Shiny App to explore the results of a classification
sp_clust_explorer( object = NULL, spatial = NULL, membership = NULL, dataset = NULL, port = 8100, ... )
sp_clust_explorer( object = NULL, spatial = NULL, membership = NULL, dataset = NULL, port = 8100, ... )
object |
A FCMres object, typically obtained from functions CMeans, GCMeans, SFCMeans, SGFCMeans |
spatial |
A feature collection (sf) used to map the observations. Only needed if object was not created from rasters. |
membership |
A matrix or a dataframe representing the membership values obtained for each observation. If NULL, then the matrix is extracted from object. |
dataset |
A dataframe or matrix representing the data used for the classification. If NULL, then the matrix is extracted from object. |
port |
An integer of length 4 indicating the port on which to start the Shiny app. Default is 8100 |
... |
Other parameters passed to the function runApp |
## Not run: data(LyonIris) #selecting the columns for the analysis AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14", "Pct_65","Pct_Img","TxChom1564","Pct_brevet","NivVieMed") #rescaling the columns Data <- sf::st_drop_geometry(LyonIris[AnalysisFields]) for (Col in names(Data)){ Data[[Col]] <- as.numeric(scale(Data[[Col]])) } Cmean <- CMeans(Data,4,1.5,500,standardize = FALSE, seed = 456, tol = 0.00001, verbose = FALSE) sp_clust_explorer(Cmean, LyonIris) ## End(Not run)
## Not run: data(LyonIris) #selecting the columns for the analysis AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14", "Pct_65","Pct_Img","TxChom1564","Pct_brevet","NivVieMed") #rescaling the columns Data <- sf::st_drop_geometry(LyonIris[AnalysisFields]) for (Col in names(Data)){ Data[[Col]] <- as.numeric(scale(Data[[Col]])) } Cmean <- CMeans(Data,4,1.5,500,standardize = FALSE, seed = 456, tol = 0.00001, verbose = FALSE) sp_clust_explorer(Cmean, LyonIris) ## End(Not run)
Utility function to facilitate the spatial diagnostic of a classification
Calculate the following indicators: Moran I index (spdep::moranI) for each column of the membership matrix, Join count test (spdep::joincount.multi) for the most likely groups of each datapoint, Spatial consistency index (see function spConsistency) and the Elsa statistic (see function calcElsa). Note that if the FCMres object given was constructed with rasters, the joincount statistic is not calculated and no p-values are provided for the Moran I indices.
spatialDiag( object, nblistw = NULL, window = NULL, undecided = NULL, matdist = NULL, nrep = 50 )
spatialDiag( object, nblistw = NULL, window = NULL, undecided = NULL, matdist = NULL, nrep = 50 )
object |
A FCMres object, typically obtained from functions CMeans, GCMeans, SFCMeans, SGFCMeans. Can also be a simple membership matrix. |
nblistw |
A list.w object describing the neighbours typically produced by the spdep package. Required if data is a dataframe, see the parameter window if you use a list of rasters as input. Can also be NULL if object is a FCMres object. |
window |
If rasters were used for the classification, the window must be specified instead of a list.w object. Can also be NULL if object is a FCMres object. |
undecided |
A float giving the threslhod to detect undecided observations. An observation is undecided if its maximum membership value is bellow this float. If null, no observations are undecided. |
matdist |
A matrix representing the dissimilarity between the clusters. The matrix must be squared and the diagonal must be filled with zeros. |
nrep |
An integer indicating the number of permutation to do to simulate the random distribution of the spatial inconsistency |
A named list with :
MoranValues : the moran I values for each column of the membership matrix (spdep::MoranI)
JoinCounts : the result of the join count test calculated with the most likely group for each datapoint (spdep::joincount.multi)
SpConsist : the mean value of the spatial consistency index (the lower, the better, see ?spConsistency for details)
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE) spatialDiag(result, undecided=0.45, nrep=30)
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE) spatialDiag(result, undecided=0.45, nrep=30)
Calculate a spatial consistency index
spConsistency( object, nblistw = NULL, window = NULL, nrep = 999, adj = FALSE, mindist = 1e-11 )
spConsistency( object, nblistw = NULL, window = NULL, nrep = 999, adj = FALSE, mindist = 1e-11 )
object |
A FCMres object, typically obtained from functions CMeans, GCMeans, SFCMeans, SGFCMeans. Can also be a simple membership matrix. |
nblistw |
A list.w object describing the neighbours typically produced by the spdep package. Required if data is a dataframe, see the parameter window if you use a list of rasters as input. Can also be NULL if object is a FCMres object. |
window |
if rasters were used for the classification, the window must be specified instead of a list.w object. Can also be NULL if object is a FCMres object. |
nrep |
An integer indicating the number of permutation to do to simulate spatial randomness. Note that if rasters are used, each permutation can be very long. |
adj |
A boolean indicating if the adjusted version of the indicator must be calculated when working with rasters (globally standardized). When working with vectors, see the function adjustSpatialWeights to modify the list.w object. |
mindist |
When adj is true, a minimum value for distance between two observations. If two neighbours have exactly the same values, then the euclidean distance between them is 0, leading to an infinite spatial weight. In that case, the minimum distance is used instead of 0. |
This index is experimental, it aims to measure how much a clustering solution is spatially consistent. A classification is spatially inconsistent if neighbouring observation do not belong to the same group. See detail for a description of its calculation
The total spatial inconsistency (*Scr*) is calculated as follow
With U the membership matrix, i an observation, k the neighbours of i and W the spatial weight matrix This represents the total spatial inconsistency of the solution (true inconsistency) We propose to compare this total with simulated values obtained by permutations (simulated inconsistency). The values obtained by permutation are an approximation of the spatial inconsistency obtained in a random context Ratios between the true inconsistency and simulated inconsistencies are calculated A value of 0 depict a situation where all observations are identical to their neighbours A value of 1 depict a situation where all observations are as much different as their neighbours that what randomness can produce A classification solution able to reduce this index has a better spatial consistency
A named list with
Mean : the mean of the spatial consistency index
prt05 : the 5th percentile of the spatial consistency index
prt95 : the 95th percentile of the spatial consistency index
samples : all the value of the spatial consistency index
sum_diff : the total sum of squarred difference between observations and their neighbours
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE) # NOTE : more replications are needed for proper inference spConsistency(result$Belongings, nblistw = Wqueen, nrep=25)
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE) # NOTE : more replications are needed for proper inference spConsistency(result$Belongings, nblistw = Wqueen, nrep=25)
Display spider charts to quickly compare values between groups
spiderPlots(data, belongmatrix, chartcolors = NULL)
spiderPlots(data, belongmatrix, chartcolors = NULL)
data |
A dataframe with numeric columns |
belongmatrix |
A membership matrix |
chartcolors |
A vector of color names used for the spider plot |
For each group, the weighted mean of each variable in data is calculated based on the probability of belonging to this group of each observation. On the chart the exterior ring represents the maximum value obtained for all the groups and the interior ring the minimum. The groups are located between these two limits in a linear way.
NULL, the plots are displayed directly by the function (see fmsb::radarchart)
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE) spiderPlots(dataset,result$Belongings)
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE) spiderPlots(dataset,result$Belongings)
Create functions to standardize and unstandardize data
standardizer(x)
standardizer(x)
x |
a numeric vector or a data.frame with only numeric columns. Non numeric columns are dropped. |
If x was a vector, the function returns a list containing two functions : scale and unscale. The first one is an equivalent of the classical function scale(x, center = TRUE, scale = TRUE). The second can be used to reverse the scaling and get back original units. If x was a data.frame, the same pair of functions is returned inside of a list for each numeric column.
data(LyonIris) LyonScales <- standardizer(sf::st_drop_geometry(LyonIris))
data(LyonIris) LyonScales <- standardizer(sf::st_drop_geometry(LyonIris))
Calculate some descriptive statistics of each group
summarizeClusters(data, belongmatrix, weighted = TRUE, dec = 3, silent = TRUE)
summarizeClusters(data, belongmatrix, weighted = TRUE, dec = 3, silent = TRUE)
data |
The original dataframe used for the classification |
belongmatrix |
A membership matrix |
weighted |
A boolean indicating if the summary statistics must use the membership matrix columns as weights (TRUE) or simply assign each observation to its most likely cluster and compute the statistics on each subset (FALSE) |
dec |
An integer indicating the number of digits to keep when rounding (default is 3) |
silent |
A boolean indicating if the results must be printed or silently returned |
A list of length k (the number of group). Each element of the list is a dataframe with summary statistics for the variables of data for each group
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE) summarizeClusters(dataset, result$Belongings)
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE) summarizeClusters(dataset, result$Belongings)
Calculate some descriptive statistics of each group of a FCMres object
## S3 method for class 'FCMres' summary(object, data = NULL, weighted = TRUE, dec = 3, silent = TRUE, ...)
## S3 method for class 'FCMres' summary(object, data = NULL, weighted = TRUE, dec = 3, silent = TRUE, ...)
object |
A FCMres object, typically obtained from functions CMeans, GCMeans, SFCMeans, SGFCMeans |
data |
A dataframe to use for the summary statistics instead of obj$data |
weighted |
A boolean indicating if the summary statistics must use the membership matrix columns as weights (TRUE) or simply assign each observation to its most likely cluster and compute the statistics on each subset (FALSE) |
dec |
An integer indicating the number of digits to keep when rounding (default is 3) |
silent |
A boolean indicating if the results must be printed or silently returned |
... |
Not used |
A list of length k (the number of group). Each element of the list is a dataframe with summary statistics for the variables of data for each group
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE) summary(result)
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE) summary(result)
Return a map to visualize membership matrix
uncertaintyMap( geodata, belongmatrix, njit = 150, radius = NULL, colors = NULL, pt_size = 0.05 )
uncertaintyMap( geodata, belongmatrix, njit = 150, radius = NULL, colors = NULL, pt_size = 0.05 )
geodata |
An object of class feature collection from sf ordered like the original data used for the clustering. |
belongmatrix |
A membership matrix |
njit |
The number of points to map on each feature. |
radius |
When mapping points, the radius indicates how far random points will be plotted around the original features. |
colors |
A vector of colors to use for the groups. |
pt_size |
A float giving the size of the random points on the final map (default is 0.05) |
This function maps the membership matrix by plotting random points in polygons, along lines or around points representing the original observations. Each cluster is associated with a color and each random point has a probability to be of that color equal to the membership value of the feature it belongs itself. Thus, it is possible to visualize regions with uncertainty and to identify the strongest clusters.
a map created with tmap
## Not run: data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE) uncertaintyMap(LyonIris, result$Belongings) ## End(Not run)
## Not run: data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE) uncertaintyMap(LyonIris, result$Belongings) ## End(Not run)
Identify the observation for which the classification is uncertain
undecidedUnits(belongmatrix, tol = 0.1, out = "character")
undecidedUnits(belongmatrix, tol = 0.1, out = "character")
belongmatrix |
The membership matrix obtained at the end of the algorithm |
tol |
A float indicating the minimum required level of membership to be not considered as undecided |
out |
The format of the output vector. Default is "character". If "numeric", then the undecided units are set to -1. |
A vector indicating the most likely group for each observation or "Undecided" if the maximum probability for the observation does not reach the value of the tol parameter
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE) undecidedUnits(result$Belongings, tol = 0.45)
data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE) undecidedUnits(result$Belongings, tol = 0.45)
Return violin plots to compare the distribution of each variable for each group.
violinPlots(data, groups)
violinPlots(data, groups)
data |
A dataframe with numeric columns |
groups |
A vector indicating the group of each observation |
A list of plots created with ggplot2
## Not run: data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometrie(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE) violinPlots(dataset, result$Groups) ## End(Not run)
## Not run: data(LyonIris) AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img", "TxChom1564","Pct_brevet","NivVieMed") dataset <- sf::st_drop_geometrie(LyonIris[AnalysisFields]) queen <- spdep::poly2nb(LyonIris,queen=TRUE) Wqueen <- spdep::nb2listw(queen,style="W") result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE) violinPlots(dataset, result$Groups) ## End(Not run)