This function retrieves and cleans the data for the experiment and survey. It
uses several helper functions to filter and format the data, including
filter_random_accuracy_ids(), filter_manually_identified_ids(),
filter_suspicious_rt_ids(), factor_categories(), factor_groups(),
factor_chr_vars(), factor_strategies(), and compute_nieq_scores().
The cleaned data is returned as a list containing two data frames: df_expe,
which holds the cleaned experiment data, and df_survey, which holds the cleaned
survey data.
Usage
get_clean_data(
  n_groups = 2,
  exclude_no_vviq = TRUE,
  exclude_no_osivq = TRUE,
  exclude_no_raven = TRUE,
  exclude_cheated = TRUE,
  exclude_distracted = TRUE,
  exclude_treatment = FALSE,
  exclude_adhd = FALSE,
  exclude_asd = FALSE,
  exclude_dyslexia = FALSE,
  exclude_other = FALSE,
  sd_mult = 2.25,
  verbose = FALSE
)
Arguments
- n_groups
The number of groups used to factor the data. Must be 2, 3, or 4. With 2, the sample is divided into Aphants and Typical imagers using the VVIQ cutoff of 32. With 3, it is divided into Aphants (VVIQ = 16), Hypophants (VVIQ < 32), and Typical imagers. With 4, Hyperphants (VVIQ > 75) are also isolated.
- exclude_no_vviq
Logical, whether to exclude participants without VVIQ.
- exclude_no_osivq
Logical, whether to exclude participants without OSIVQ.
- exclude_no_raven
Logical, whether to exclude participants without Raven.
- exclude_cheated
Logical, whether to exclude participants who have cheated (based on self-report).
- exclude_distracted
Logical, whether to exclude participants who have been distracted (based on self-report).
- exclude_treatment
Logical, whether to exclude participants who are receiving treatment for a neurological or psychiatric disorder.
- exclude_adhd
Logical, whether to exclude participants who have ADHD.
- exclude_asd
Logical, whether to exclude participants who have ASD.
- exclude_dyslexia
Logical, whether to exclude participants who have dyslexia.
- exclude_other
Logical, whether to exclude participants who have other neurological conditions.
- sd_mult
A numeric value indicating how many standard deviations to use for identifying suspicious median RTs. The default is 2.25, which means that median RTs more than 2.25 standard deviations below the mean are considered suspiciously fast and flagged as potential "spamming".
- verbose
A logical value indicating whether to print verbose messages about the filtering process. Default is FALSE.
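The two threshold rules described in the arguments above (the VVIQ grouping controlled by n_groups and the fast-RT cutoff controlled by sd_mult) can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's actual implementation: classify_vviq() and flag_fast_rt() are hypothetical names, the group labels are made up for the example, scores strictly below 32 are assumed to fall under the cutoff in the two-group case, and the mean and standard deviation are assumed to be computed over the participants' median RTs.

```r
# Hypothetical helper (not part of the package): VVIQ-based grouping.
classify_vviq <- function(vviq, n_groups = 2) {
  stopifnot(n_groups %in% c(2, 3, 4))
  if (n_groups == 2) {
    # Assumption: scores strictly below 32 count as Aphant.
    group <- ifelse(vviq < 32, "Aphant", "Typical")
  } else {
    # Aphants score exactly 16; Hypophants score below 32.
    group <- ifelse(vviq == 16, "Aphant",
                    ifelse(vviq < 32, "Hypophant", "Typical"))
    if (n_groups == 4) {
      # which() drops missing scores so the assignment stays safe.
      group[which(vviq > 75)] <- "Hyperphant"
    }
  }
  factor(group)
}

# Hypothetical helper (not part of the package): flags median RTs that are
# more than sd_mult standard deviations below the sample mean.
flag_fast_rt <- function(median_rt, sd_mult = 2.25) {
  lower_bound <- mean(median_rt, na.rm = TRUE) -
    sd_mult * stats::sd(median_rt, na.rm = TRUE)
  median_rt < lower_bound
}

# Toy inputs, invented for illustration only.
classify_vviq(c(16, 25, 50, 80), n_groups = 4)
flag_fast_rt(c(rep(10, 20), 1))
```

Only the lower tail is flagged in this sketch, because the argument description refers to suspiciously fast responses.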
Value
A list containing two data frames:
- df_expe: The cleaned experiment data.
- df_survey: The cleaned survey data.
Examples
clean_data <- get_clean_data(verbose = TRUE)
#>
#> Sample size before accuracy analysis: 137
#> Participants below random accuracy (<= 50%): 8 (5.84%)
#>
#> Sample size before manual examination: 137
#> Manually identified participants:
#> - N without VVIQ: 3 -> Excluded
#> - N without OSIVQ: 6 -> Excluded
#> - N without Raven: 2 -> Excluded
#> - N who cheated: 3 -> Excluded
#> - N who were distracted: 12 -> Excluded
#> - N who had treatment: 4 -> Included
#> - N with ADHD: 7 -> Included
#> - N with ASD: 5 -> Included
#> - N with dyslexia: 2 -> Included
#> - N with other neuro troubles: 2 -> Included
#> Participants to exclude: 24 (17.52%)
#>
#> Sample size before median RTs analysis: 106
#> Participants with median RTs outside 2.25 SDs: 2 (1.89%)
head(clean_data$df_expe)
#> # A tibble: 6 × 19
#> id language group group_2 group_3 expe_phase trial_number problem category
#> <fct> <fct> <fct> <fct> <fct> <fct> <int> <int> <fct>
#> 1 acdn2… fr Typi… Typical Typical expe_bloc… 1 18 Spatial
#> 2 acdn2… fr Typi… Typical Typical expe_bloc… 2 25 Control
#> 3 acdn2… fr Typi… Typical Typical expe_bloc… 3 2 Visual
#> 4 acdn2… fr Typi… Typical Typical expe_bloc… 4 19 Control
#> 5 acdn2… fr Typi… Typical Typical expe_bloc… 5 1 Visual
#> 6 acdn2… fr Typi… Typical Typical expe_bloc… 6 10 Spatial
#> # ℹ 10 more variables: premise_1_rt <dbl>, premise_2_rt <dbl>,
#> # premise_3_rt <dbl>, conclusion_rt <dbl>, rt_total <dbl>, response <fct>,
#> # correct_response <fct>, accuracy <int>, acc_perc <dbl>, median_rt <dbl>
head(clean_data$df_survey)
#> # A tibble: 6 × 112
#> id language age gender group group_2 group_3 country language_native
#> <fct> <fct> <int> <fct> <fct> <fct> <fct> <fct> <fct>
#> 1 acdn24772… fr 24 f Typi… Typical Typical fra fr
#> 2 ahos20623… fr 26 f Apha… Aphant… Aphant… fra fr
#> 3 anoo20152… fr 23 m Typi… Typical Typical fra fr
#> 4 arje91119… fr 26 f Typi… Typical Typical fra fr
#> 5 auzb74885… fr 25 f Typi… Typical Typical fra fr
#> 6 azcj31777… fr 28 m Hypo… Aphant… Hypoph… fra fr
#> # ℹ 103 more variables: language_usual <fct>, job <fct>, education <fct>,
#> # field <fct>, vviq_is_complete <lgl>, vviq_total_score <int>,
#> # vviq_q01 <int>, vviq_q02 <int>, vviq_q03 <int>, vviq_q04 <int>,
#> # vviq_q05 <int>, vviq_q06 <int>, vviq_q07 <int>, vviq_q08 <int>,
#> # vviq_q09 <int>, vviq_q10 <int>, vviq_q11 <int>, vviq_q12 <int>,
#> # vviq_q13 <int>, vviq_q14 <int>, vviq_q15 <int>, vviq_q16 <int>,
#> # osivq_is_complete <lgl>, osivq_object <dbl>, osivq_spatial <dbl>, …