MVPA Meanderings
musings on fMRI analysis, quality control, science, ...
Monday, March 10, 2025
"fun" with Box AI information leakage
Our university IT folks encourage employees to use their Box account for data storage, including for sensitive files (human subjects research data, medical records, etc.). I wasn't pleased to see the Box AI button appear, and asked our IT what exactly it does and how it impacts file privacy.
We went through several rounds of messages, including these responses: "Yes, the HIPAA protections are still in place with the BOX-AI application. Box AI securely engages an external AI provider to compute embeddings from these text chunks and the user’s question. Advanced embeddings models, such as Azure OpenAI’s ada-02, are utilized for this purpose." and "Box does not allow any access to data to outside vendors other than the isolated environment used to process the data. No retained data is being allowed. The data is processed in a bubble, then the bubble is destroyed when completed essentially."
Friday, January 10, 2025
intro to working with volume and surface brain data
When preparing to update my surface (gifti) tutorial knitr, I realized that some of the confusion I was trying to address wasn't due to the code so much as to general differences between volume and surface datasets. This post tries to fill in some of those gaps for people using my R scripts. Much of the information here is general, but there are many topics I won't cover at all: what I describe is not the only way to work with fMRI brain data, and there are many other file formats and software packages.
first, volumes.
library(RNifti); # package for NIfTI image reading https://github.com/jonclayden/RNifti
in.path <- "d:/temp/"; # placeholder: set to the directory holding your copies of the input files
fname <- paste0(in.path, "S1200_AverageT1w_81x96x81.nii.gz"); # https://osf.io/6hdxv/
if (file.exists(fname)) { under.img <- readNifti(fname); } else { stop(paste("missing:", fname)); }
str(under.img)
# 'niftiImage' num [1:81, 1:96, 1:81] 0 0 0 0 0 0 0 0 0 0 ...
# - attr(*, "pixdim")= num [1:3] 2.4 2.4 2.4
# - attr(*, "pixunits")= chr [1:2] "mm" "Unknown"
# - attr(*, ".nifti_image_ptr")=<externalptr>
# - attr(*, ".nifti_image_ver")= int 2
under.img[25,56,37]
# [1] 605.887
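Since the niftiImage behaves like a plain 3d array, base R graphics can display a slice directly. A minimal sketch (the grayscale palette is an arbitrary choice, and the orientation may not match radiological or neurological convention):
image(under.img[,,37], col=gray(0:64/64), xlab="", ylab="", axes=FALSE, useRaster=TRUE); # axial slice z=37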
why surfaces?
introduction to gifti files
library(gifti); # for readGIfTI https://github.com/muschellij2/gifti
# surface "underlay anatomy"
fname <- paste0(in.path, "fsaverage5_lh.inflated_avg.surf.gii"); # https://osf.io/nxpa7/
if (file.exists(fname)) { surf.L <- readGIfTI(fname); } else { stop(paste("missing:", fname)); }
fname <- paste0(in.path, "fsaverage5_rh.inflated_avg.surf.gii"); # https://osf.io/24y9q/
if (file.exists(fname)) { surf.R <- readGIfTI(fname); } else { stop(paste("missing:", fname)); }
# Schaefer parcellation "overlay" surfaces for each hemisphere
fname <- paste0(in.path, "Schaefer2018_400Parcels_7Networks_order_10K_L.label.gii"); # https://osf.io/7wk8v/
if (file.exists(fname)) { parcels.L <- readGIfTI(fname); } else { stop(paste("missing:", fname)); }
fname <- paste0(in.path, "Schaefer2018_400Parcels_7Networks_order_10K_R.label.gii"); # https://osf.io/7tms8/
if (file.exists(fname)) { parcels.R <- readGIfTI(fname); } else { stop(paste("missing:", fname)); }
str(surf.L$data);
# List of 2
# $ pointset: num [1:10242, 1:3] -0.685 22.359 41.099 9.658 -36.773 ...
# $ triangle: int [1:20480, 1:3] 0 0 0 0 0 3 3 3 4 4 ...
str(surf.R$data)
# List of 2
# $ pointset: num [1:10242, 1:3] -5.74 8.43 43.5 12.6 -40.33 ...
# $ triangle: int [1:20480, 1:3] 0 0 0 0 0 3 3 3 4 4 ...
str(parcels.L$data)
# List of 1
# $ labels: int [1:10242, 1] 57 81 49 180 40 132 14 22 0 143 ...
str(parcels.R$data)
# List of 1
# $ labels: int [1:10242, 1] 61 81 46 110 111 200 21 95 104 119 ...
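The label arrays are ordinary integer vectors, so base R can summarize them; for example, counting vertices per parcel (label 0 should be the vertices outside all parcels, i.e., the medial wall):
head(sort(table(parcels.L$data[[1]]), decreasing=TRUE)); # the six left-hemisphere labels with the most vertices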
using and plotting giftis
vals.L <- rep(0, 10242); # start with all vertices 0
map.L <- gifti.map(surf.L$data$pointset, surf.L$data$triangle, vals.L); # map vals.L onto the left surface
vals.R <- rep(0, 10242); # start with all vertices 0
vals.R[1800] <- 1; # change vertex 1800 to 1
map.R <- gifti.map(surf.R$data$pointset, surf.R$data$triangle, vals.R); # map vals.R onto the right surface
# plot lateral and medial views of both hemispheres
plot.surface(map.L, ttl="left hemisphere, blank (all vertices 0)", pos.lims=c(0.5,5.5), which.surface='fsaverage5');
plot.surface(map.R, ttl="right hemisphere, vertex 1800 in red", pos.lims=c(0.5,5.5), which.surface='fsaverage5');
vals.L <- rep(0, 10242); # start with all vertices 0
inds <- which(parcels.L$data[[1]] == 60); # indices of the 60s in the 10,242-element vector parcels.L$data[[1]]
vals.L[inds] <- 4; # set the parcel 60 vertices in vals.L to 4
map.L <- gifti.map(surf.L$data$pointset, surf.L$data$triangle, vals.L); # map vals.L onto the left surface
map.R <- gifti.map(surf.R$data$pointset, surf.R$data$triangle, parcels.R$data[[1]]); # map all parcels.R labels onto the right surface
# plot lateral and medial views of both hemispheres
plot.surface(map.L, ttl="left hemisphere, parcel 60 in orange", pos.lims=c(0.5,5.5), which.surface='fsaverage5');
plot.surface(map.R, ttl="right hemisphere, parcels 1 to 200", pos.lims=c(0.5,200.5), which.surface='fsaverage5');
Friday, December 6, 2024
"crescent" artifacts revisited
link to cortex volume?
only with DMCC or its MB4 acquisition?
but does it matter?
thoughts?
Tuesday, June 25, 2024
tutorial: making new versions of NIfTI parcellation files
new NIfTI with one parcel only
library(RNifti); # package for NIfTI image reading https://github.com/jonclayden/RNifti
rm(list=ls()); # clear memory
path <- "//storage1.ris.wustl.edu/tbraver/Active/MURI/BraverLab/JoWorking/tutorial/"; # path to input images
p.img <- readNifti(paste0(path, "Schaefer2018_400x7_81x96x81.nii.gz")); # read the parcellation image
# make a new nifti with only parcel 24
new.img <- array(0, dim(p.img)); # blank "brain" of zeros, same size as p.img
new.img[which(p.img == 24)] <- 1; # voxels with value 24 in p.img set to 1
writeNifti(new.img, paste0(path, "p24.nii.gz"), template=paste0(path, "Schaefer2018_400x7_81x96x81.nii.gz"));
# note the template= in the writeNifti call: it gives the new file the same header settings as the
# original parcellation image, which is important so the two images stay properly aligned.
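One way to confirm the headers match is to read the new file back in and compare a few fields; pixdim() is an RNifti accessor:
p24.img <- readNifti(paste0(path, "p24.nii.gz")); # read the new image back in
pixdim(p24.img); # voxel sizes; should match pixdim(p.img)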
new NIfTI with several parcels given the same value
# make a new nifti with parcels 24, 26, 231, and 360 all given the value 1.
p.ids <- c(24, 26, 231, 360); # parcel ids we want to combine
new.img <- array(0, dim(p.img)); # blank "brain" of zeros, same size as p.img
for (i in 1:length(p.ids)) { new.img[which(p.img == p.ids[i])] <- 1; } # set voxels in p.ids to 1
writeNifti(new.img, paste0(path, "four.nii.gz"), template=paste0(path, "Schaefer2018_400x7_81x96x81.nii.gz"));
new NIfTI with unique numbers for each parcel
# assign a value to each parcel, such as from the results of a statistical test performed on each parcel individually.
# note: 400 is the total number of parcels in the example parcellation. It is usually best not to hard-code
# the number of parcels, but it is here to keep the code as short as possible.
stat.vals <- rnorm(400); # random numbers to plot, one for each parcel
new.img <- array(0, dim(p.img)); # blank "brain" of zeros, same size as p.img
for (i in 1:400) { new.img[which(p.img == i)] <- stat.vals[i]; } # set the voxels in parcel i to stat.vals[i]
writeNifti(new.img, paste0(path, "stat.nii.gz"), template=paste0(path, "Schaefer2018_400x7_81x96x81.nii.gz"));
Thursday, April 4, 2024
Corresponding Schaefer2018 400x7 and 400x17 atlas parcels by number
My "default" cortical parcellation is the 400 parcels by 7 networks version of Schaefer2018. I like these parcellations because they're available in all the spaces we use (volume, fsaverage, fsLR (HCP)) and are independent from our analyses; "all parcellations are wrong, but some are useful".
The Schaefer parcellations come in several versions for each resolution, however, and a parcel described by its 7Networks number likely has a different 17Network number. Note that the parcel boundaries are the same for all same-resolution versions: there is only one set of 400 parcels, but which of those 400 parcels is #77 varies between the 7 and 17 network versions. This post describes how to translate parcel numbers between network versions, using a bit of (base) R code.
Logic: since there is only one set of parcels at each resolution, there is only one set of centroid coordinates at each resolution. Thus, we can match parcels across network orderings by centroids.
First, set sch.path to the location of your copy of the Schaefer2018 Parcellations directory and load the 7 and 17 network files: (apologies for the wonky code formatting)
cen7.tbl <- read.csv(paste0(sch.path, "MNI/Centroid_coordinates/Schaefer2018_400Parcels_7Networks_order_FSLMNI152_1mm.Centroid_RAS.csv"));
cen17.tbl <- read.csv(paste0(sch.path, "MNI/Centroid_coordinates/Schaefer2018_400Parcels_17Networks_order_FSLMNI152_1mm.Centroid_RAS.csv"));
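Each centroid file should have one row per parcel; a quick check that both loaded as expected:
nrow(cen7.tbl); # should print 400; nrow(cen17.tbl) should match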
Next, make vectors for translating the 7Network number to the 17Network number (and the reverse):
x7to17 <- rep(NA, nrow(cen7.tbl)); # x7to17[i]: the 17Networks number of 7Networks parcel i
x17to7 <- rep(NA, nrow(cen7.tbl)); # x17to7[i]: the 7Networks number of 17Networks parcel i
for (i in 1:nrow(cen7.tbl)) {
  x7to17[i] <- which(cen17.tbl$R == cen7.tbl$R[i] & cen17.tbl$A == cen7.tbl$A[i] & cen17.tbl$S == cen7.tbl$S[i]);
  x17to7[i] <- which(cen7.tbl$R == cen17.tbl$R[i] & cen7.tbl$A == cen17.tbl$A[i] & cen7.tbl$S == cen17.tbl$S[i]);
}
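A quick consistency check: the two vectors should be inverses of each other, so composing them should return every parcel number unchanged:
all(x17to7[x7to17] == 1:nrow(cen7.tbl)); # should print TRUE if every centroid matched exactly once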
Now the vectors can be used to translate parcel numbers: parcel #77 in the 7Networks ordering is parcel #126 in the 17Networks ordering.
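These rows can be pulled from the centroid tables directly (their printed output follows):
cen7.tbl[77,]; # the 7Networks parcel
cen17.tbl[x7to17[77],]; # the corresponding 17Networks parcel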
#    ROI.Label                     ROI.Name   R   A  S
# 77        77 7Networks_LH_DorsAttn_Post_9 -33 -46 41
#     ROI.Label                  ROI.Name   R   A  S
# 126       126 17Networks_LH_ContA_IPS_5 -33 -46 41
Note that the parcel's R A S coordinates are the same, but its name and label (number) differ between the two network versions.
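The translation works in both directions; for example, looking up the 7Networks number of 17Networks parcel 126:
x17to7[126]; # should print 77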
Friday, February 23, 2024
OHBM 2023 links
- Brendan Williams, "Reproducible Decision Making for fMRI Quality Control"
- Céline Provins, "Quality control in functional MRI studies with MRIQC and fMRIPrep"
- Daniel Glen, "Quality control practices in FMRI analysis: philosophy, methods and examples using AFNI"
- Dan Handwerker, "The art and science of using quality control to understand and improve fMRI data" slides
- Chao-Gan Yan, "Quality Control Procedures for fMRI in DPABI"
- Rasmus Birn, "Quality control in resting-state functional connectivity: qualitative and quantitative measures"
- Xin Di, "QC for resting-state and task fMRI in SPM"
- Jo Etzel, "Efficient evaluation of the Open QC task fMRI dataset" video
- Rebecca Lepping, "Quality Control in Resting-State fMRI: The Benefits of Visual Inspection"
- Francesca Morfini, "Functional Connectivity MRI Quality Control Procedures in CONN"
I also presented a poster, "Which Acquisition? Choosing protocols for task fMRI studies", #700. Here's some of the introduction and conclusion from its abstract. The test data is already public; the code isn't written up properly, but I could share it if anyone is interested.
When planning a task fMRI study, one necessary choice is the acquisition sequence. Many are available, with recommendations varying with hardware, study population, brain areas of interest, task requirements, etc.; it is rare to have only one suitable option. Acquisition protocols for task studies can be difficult to evaluate, since metrics like tSNR are not specific to task-related activity. But task designs can make choosing easier, since there is a known effect against which to compare the candidate acquisition protocols. The procedure illustrated here will rarely make the choice of acquisition completely unambiguous, but it can indicate which protocols to avoid, and give the experimenters confidence that the chosen sequence will produce usable data. After choosing the acquisition, more piloting should be performed with the study tasks to confirm that image quality and response clarity are sufficient and as expected.
... it's taken me so long to finish this post (started August 2023!) that I'm publishing it without adding a proper explanation of the protocol-choosing logic. Take a look at the poster pdf, and please ask questions or otherwise nudge me for more information if you're interested.
Thursday, June 8, 2023
US researchers: use extra caution with participant gender-related info
Last summer I wrote several times about pregnancy-related data suddenly becoming much more sensitive and potentially damaging to the participant if released. Unfortunately, now we must add transgender status, biological sex at birth (required, e.g., for GUID creation), gender identity, relationship status, and more to the list of data requiring extra care.
The NIH Certificate of Confidentiality thankfully means that researchers can't be required to release data on something like a participant's abortion or transgender history for criminal or other proceedings. But we are responsible for ensuring that our data is handled and stored securely, so that it isn't accidentally (or purposely) shared in a way that could harm the participant. I suggest researchers review their data collection with an eye towards information that may be more sensitive now than it was in the past: is this information required? If so, consider how to store and use it securely and safely.
Also consider what you ask potential participants during screening (and how screening is done in general): people may not wish to answer screening questions about things like pregnancy or hormone treatments. If these possibly-sensitive questions must be asked, consider how to do so while minimizing potential discomfort or risk.
For example, one of our studies can't include pregnant people, so we must ask about pregnancy during the initial phone screenings. We used to ask potential participants about pregnancy separately, but then changed the script so that this sensitive question is part of a list, and the participant is asked whether any of the items apply. This way, the participant doesn't have to state explicitly that they are pregnant, and the researcher doesn't record which criterion applied or respond in a way that reveals it (e.g., we don't want them to say something like "sorry Jane Doe, but you can't be in our study now because you're pregnant").
Here's the relevant part of the new screening script:
In order to determine your eligibility for our study, I need to ask you some questions. This will take about 15 minutes. Before we collect your demographic information, I will read you a list of four exclusionary criteria. If any of these describe you, please answer yes after I read them all; if none apply, please answer no. You do not need to say which ones apply.
- You are a non-native English speaker (you learned to speak English as an adult);
- You are over the age of 45 or under the age of 18;
- You are pregnant or breastfeeding;
- You were born prematurely (before 37 weeks, or, if a twin, before 34 weeks).
Do any of these describe you? Yes (I am sorry, you do not qualify to be in our study) or No (continue with questions)