
This function retrieves and cleans the experiment and survey data. It relies on several helper functions to filter and format the data: filter_random_accuracy_ids(), filter_manually_identified_ids(), filter_suspicious_rt_ids(), factor_categories(), factor_groups(), factor_chr_vars(), factor_strategies(), and compute_nieq_scores(). The cleaned data is returned as a list of two data frames: df_expe, which contains the cleaned experiment data, and df_survey, which contains the cleaned survey data.

Usage

get_clean_data(
  n_groups = 2,
  exclude_no_vviq = TRUE,
  exclude_no_osivq = TRUE,
  exclude_no_raven = TRUE,
  exclude_cheated = TRUE,
  exclude_distracted = TRUE,
  exclude_treatment = FALSE,
  exclude_adhd = FALSE,
  exclude_asd = FALSE,
  exclude_dyslexia = FALSE,
  exclude_other = FALSE,
  sd_mult = 2.25,
  verbose = FALSE
)

Arguments

n_groups

The number of groups into which the sample is factored. Must be 2, 3 or 4. With 2, the sample is divided into Aphants and Typical imagers using the VVIQ criterion of 32. With 3, it is divided into Aphants (VVIQ = 16), Hypophants (VVIQ < 32) and Typical imagers. With 4, Hyperphants (VVIQ > 75) are also isolated.
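
The cutoffs above can be sketched in base R as follows. This is only an illustration, not the package's factor_groups() implementation: classify_vviq() and its group labels are hypothetical, and treating a score of exactly 32 as Typical is an assumption taken from the "VVIQ < 32" wording.

```r
# Hypothetical helper (not part of the package): assigns an imagery group
# to each VVIQ total score using the cutoffs described for n_groups.
classify_vviq <- function(vviq, n_groups = 2) {
  stopifnot(n_groups %in% c(2, 3, 4))
  # Missing scores stay missing; everyone else starts as "Typical"
  group <- ifelse(is.na(vviq), NA_character_, "Typical")
  if (n_groups == 2) {
    # Assumption: scores strictly below 32 count as Aphants
    group[which(vviq < 32)] <- "Aphant"
  } else {
    group[which(vviq < 32)] <- "Hypophant"
    group[which(vviq == 16)] <- "Aphant"
    if (n_groups == 4) {
      group[which(vviq > 75)] <- "Hyperphant"
    }
  }
  factor(group)
}

scores <- c(16, 25, 32, 60, 78)  # illustrative scores, not real data
classify_vviq(scores, n_groups = 2)
classify_vviq(scores, n_groups = 4)
```

With n_groups = 2 the two lowest scores fall into the same group, while n_groups = 4 separates the score of 16 from the score of 25 and isolates the score of 78.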

exclude_no_vviq

Logical, whether to exclude participants without VVIQ.

exclude_no_osivq

Logical, whether to exclude participants without OSIVQ.

exclude_no_raven

Logical, whether to exclude participants without Raven.

exclude_cheated

Logical, whether to exclude participants who have cheated (based on self-report).

exclude_distracted

Logical, whether to exclude participants who have been distracted (based on self-report).

exclude_treatment

Logical, whether to exclude participants who are receiving treatment for a neurological or psychiatric disorder.

exclude_adhd

Logical, whether to exclude participants who have ADHD.

exclude_asd

Logical, whether to exclude participants who have ASD.

exclude_dyslexia

Logical, whether to exclude participants who have dyslexia.

exclude_other

Logical, whether to exclude participants who have other neurological disorders.

sd_mult

A numeric value indicating how many standard deviations to use when identifying suspicious median RTs. The default is 2.25, which means that median RTs more than 2.25 standard deviations below the mean are considered suspiciously fast and flagged as potential "spamming".
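
As a rough sketch of this rule, assuming the mean and standard deviation are computed over the participants' median RTs and that only the fast side is flagged, the threshold could be applied as below. flag_fast_rts() is a hypothetical name; the package's own filter_suspicious_rt_ids() may work differently.

```r
# Hypothetical helper (not part of the package): returns TRUE for each
# participant whose median RT lies more than sd_mult standard deviations
# below the sample mean of median RTs.
flag_fast_rts <- function(median_rt, sd_mult = 2.25) {
  lower_bound <- mean(median_rt, na.rm = TRUE) -
    sd_mult * sd(median_rt, na.rm = TRUE)
  median_rt < lower_bound
}

# Illustrative values only: nine participants at 10 and one at 1
median_rts <- c(rep(10, 9), 1)
flag_fast_rts(median_rts)
```

Lowering sd_mult makes the filter stricter, since the lower bound moves closer to the mean and more participants fall below it.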

verbose

A logical value indicating whether to print verbose messages about the filtering process. Default is FALSE.

Value

A list containing two data frames:

  • df_expe: The cleaned experiment data.

  • df_survey: The cleaned survey data.

Examples

clean_data <- get_clean_data(verbose = TRUE)
#> 
#> Sample size before accuracy analysis: 137
#> Participants below random accuracy (<= 50%): 8 (5.84%)
#> 
#> Sample size before manual examination: 137
#> Manually identified participants:
#> - N without VVIQ: 3 -> Excluded
#> - N without OSIVQ: 6 -> Excluded
#> - N without Raven: 2 -> Excluded
#> - N who cheated: 3 -> Excluded
#> - N who were distracted: 12 -> Excluded
#> - N who had treatment: 4 -> Included
#> - N with ADHD: 7 -> Included
#> - N with ASD: 5 -> Included
#> - N with dyslexia: 2 -> Included
#> - N with other neuro troubles: 2 -> Included
#> Participants to exclude: 24 (17.52%)
#> 
#> Sample size before median RTs analysis: 106
#> Participants with median RTs outside 2.25 SDs: 2 (1.89%)
head(clean_data$df_expe)
#> # A tibble: 6 × 19
#>   id     language group group_2 group_3 expe_phase trial_number problem category
#>   <fct>  <fct>    <fct> <fct>   <fct>   <fct>             <int>   <int> <fct>   
#> 1 acdn2… fr       Typi… Typical Typical expe_bloc…            1      18 Spatial 
#> 2 acdn2… fr       Typi… Typical Typical expe_bloc…            2      25 Control 
#> 3 acdn2… fr       Typi… Typical Typical expe_bloc…            3       2 Visual  
#> 4 acdn2… fr       Typi… Typical Typical expe_bloc…            4      19 Control 
#> 5 acdn2… fr       Typi… Typical Typical expe_bloc…            5       1 Visual  
#> 6 acdn2… fr       Typi… Typical Typical expe_bloc…            6      10 Spatial 
#> # ℹ 10 more variables: premise_1_rt <dbl>, premise_2_rt <dbl>,
#> #   premise_3_rt <dbl>, conclusion_rt <dbl>, rt_total <dbl>, response <fct>,
#> #   correct_response <fct>, accuracy <int>, acc_perc <dbl>, median_rt <dbl>
head(clean_data$df_survey)
#> # A tibble: 6 × 112
#>   id         language   age gender group group_2 group_3 country language_native
#>   <fct>      <fct>    <int> <fct>  <fct> <fct>   <fct>   <fct>   <fct>          
#> 1 acdn24772… fr          24 f      Typi… Typical Typical fra     fr             
#> 2 ahos20623… fr          26 f      Apha… Aphant… Aphant… fra     fr             
#> 3 anoo20152… fr          23 m      Typi… Typical Typical fra     fr             
#> 4 arje91119… fr          26 f      Typi… Typical Typical fra     fr             
#> 5 auzb74885… fr          25 f      Typi… Typical Typical fra     fr             
#> 6 azcj31777… fr          28 m      Hypo… Aphant… Hypoph… fra     fr             
#> # ℹ 103 more variables: language_usual <fct>, job <fct>, education <fct>,
#> #   field <fct>, vviq_is_complete <lgl>, vviq_total_score <int>,
#> #   vviq_q01 <int>, vviq_q02 <int>, vviq_q03 <int>, vviq_q04 <int>,
#> #   vviq_q05 <int>, vviq_q06 <int>, vviq_q07 <int>, vviq_q08 <int>,
#> #   vviq_q09 <int>, vviq_q10 <int>, vviq_q11 <int>, vviq_q12 <int>,
#> #   vviq_q13 <int>, vviq_q14 <int>, vviq_q15 <int>, vviq_q16 <int>,
#> #   osivq_is_complete <lgl>, osivq_object <dbl>, osivq_spatial <dbl>, …