- Python 3.8
We'll describe our setup using Python virtual environments:
- Create and activate your Python virtual environment. For the commands to do this, see: https://docs.python.org/3/library/venv.html (a minimal example follows this list).
- Install all package dependencies using pip install -r requirements.txt
- If you plan on committing code, please update the .gitignore so that it includes your virtual environment folder.
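For example, on Linux/macOS the setup looks like the following (Windows activation differs; see the venv docs linked above). The environment folder name venv is just a choice here; whatever you pick, add it to the .gitignore as noted above.

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt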
You can download tray images through the ownCloud interface, but we have also written a script to do so.
To download the last 10 trays from ownCloud, simply run:
python download_owncloud_tray_imgs.py --num_trays 10
If you want to download just one tray, run:
python download_owncloud_tray_imgs.py --folder1 20210603_11_58
Let's say our image folder is at: ./_full_tray_imgs/20210603_11_58
We can run the following command:
python experiment_runner.py --config_file ./configs/ml_test_config.yml --experiment_dir ./_full_tray_imgs/20210603_11_58/experiments --img_folder ./_full_tray_imgs/20210603_11_58 --ml_model_type ngboost
Don't worry if you see warning messages -- those are normal.
Let's say all of our image folders are located within ./_full_tray_imgs.
To run our analysis on every image folder in a directory, we can use the following command:
bash analyze_experiments.sh ./_full_tray_imgs ./configs/ml_test_config.yml
This bash script searches a local ./history/processed_folders.txt file, so it won't process folders that have already been analyzed. If you want to re-analyze a folder, simply remove its name from the ./history/processed_folders.txt file.
If you want to plot the tray at different processing steps (e.g., after adaptive thresholding), you can modify the steps attribute in the plot_well_matrix dictionary. See the in-line comments there for more details.
For each processed image folder, we save its Python data structure in ./_saved_well_matrices. If you are done processing an image folder and don't plan on processing it again, you can delete its .h5 file inside this folder.
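If you ever want to inspect one of these cached files by hand, a generic h5py listing is enough. The path below is an assumption (the internal layout of the WellMatrix .h5 files is not documented here):

```python
# Generic peek inside a saved WellMatrix .h5 file; the path is hypothetical,
# and we simply list whatever groups/datasets the file happens to contain.
import h5py

path = "./_saved_well_matrices/20210603_11_58.h5"  # assumed filename pattern
with h5py.File(path, "r") as f:
    f.visititems(lambda name, obj: print(name, obj))
```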
NOTE: If you are only interested in running our algorithm, you can disregard the rest of this document. However, if you want to understand how the pipeline works, from updating the website and pulling labels to training ML models, see the sections below.
The project starts with experiment_runner.py. This file reads the provided ml_test_config.yml config and generates experiment results in an experiments folder. The code is modularized across the following folders (a rough layout sketch follows the list):

- feature_extraction: contains the code that does all image processing and creates feature vectors for machine learning
- plotting_and_visuals: code for plotting metrics of the tray over time (rsd, blob area, agglutination score), as well as visualizing the wells
- training_ml: code which trains the best-performing ML models and saves them as a .pkl in the main directory (explained more below)
- testing_ml: contains helper functions called when using the ML model for evaluation
- well_matrix_creation: all data structure code regarding how we store individual wells and well metadata
- configs: contains the ml_test_config.yml
- history: contains .txt files which are read by analyze_experiments.sh to avoid duplicate work
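Putting these together, the repository layout referenced in this README looks roughly like the sketch below. Only items mentioned in this document are shown, and the exact locations of the data folders may differ on your machine:

```
.
├── experiment_runner.py
├── analyze_experiments.sh
├── download_owncloud_tray_imgs.py
├── download_process_upload.sh
├── requirements.txt
├── configs/
│   └── ml_test_config.yml
├── history/
│   └── processed_folders.txt
├── feature_extraction/
├── plotting_and_visuals/
├── training_ml/
│   ├── pull_labels.py
│   ├── create_dataset.py
│   └── train.py
├── testing_ml/
├── well_matrix_creation/
├── _full_tray_imgs/          (downloaded tray image folders, e.g. 20210603_11_58/)
└── _saved_well_matrices/     (cached WellMatrix .h5 files)
```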
To update the website, we use download_owncloud_tray_imgs.py and download_process_upload.sh. We do the following:

- Pick the trays you want to download from ownCloud (e.g., 20210603_11_58), and download them using download_owncloud_tray_imgs.py
- Clone the agglutination-data-storage repo from GitLab (https://gitlab.com/wheeler-microfluidics/agglutination-image-db-indiv-well-images)
- Run download_process_upload.sh, which partitions the downloaded trays into individual well images, saves each image as a .png in the agglutination-data-storage repo, pushes the new images, and updates the spreadsheet
- The following is how we run the command:
bash download_process_upload.sh ../_full_tray_imgs ../../agglutination-data-storage/indiv_well_imgs
We'll mostly focus on the code inside the training_ml folder.

pull_labels.py will pull all labels from the spreadsheet and analyze them. It gets confusion matrices from all annotators, and automatically marks points of disagreement in a .txt with their URLs.

create_dataset.py creates a dataset from the labels, doing the following:

- Downloads any missing full_tray_images that correspond to the labelled images.
- Processes the labelled trays into WellMatrix objects, and assigns labels to the interior Well objects.
- Saves the labelled WellMatrix objects as .h5, with some metadata that denotes they're labelled.
- Computes feature vectors and matches them with the labels. Saves the raw data as .npz (see the sketch after this list).
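The exact contents of that .npz depend on create_dataset.py, so the path below is an assumption; numpy will simply report whatever arrays are stored:

```python
# Hypothetical inspection of the feature/label archive written by create_dataset.py.
import numpy as np

data = np.load("./training_ml/dataset.npz")  # assumed path
print(data.files)  # names of the stored arrays (e.g. feature vectors and labels)
```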
train.py is used to train various ML models, and it saves the best-performing one as a .pkl to be used by experiment_runner.py.
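If you want to load that saved model yourself, a standard pickle load is enough. The filename here is an assumption, and how experiment_runner.py consumes the model may differ:

```python
# Hypothetical example of loading the best-performing model saved by train.py.
import pickle

with open("best_model.pkl", "rb") as f:  # assumed filename in the main directory
    model = pickle.load(f)

# scikit-learn-style estimators (ngboost included) expose predict();
# X would be a 2D array of feature vectors from feature_extraction.
# predictions = model.predict(X)
```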
- React, version 17 and above
- You can download this at the following link: https://react-cn.github.io/react/downloads.html
To set up your Google Spreadsheets API project, perform the following steps. Note: these steps are taken from the following YouTube video, from minute 8 to minute 20: https://www.youtube.com/watch?v=yhCJU4aqMb4
- Create a google sheets file in your google drive. Share this file with all associated collaborators.
- Go to console.cloud.google.com, and sign in with your Gmail account.
- Go to the drop-down selection, and click "New Project"
- Give the project a name, and create it
- Click "Enable API services"
- Type in 'sheets', select the "Google sheets API", and enable it
- Go back to your project, and select "Create credentials"
- Select the 'Google sheets API'
- For "What data will you be accessing", select "User data"
- Fill out the form for "OAuth Consent Screen" as normal
- Skip "Scopes" section
- For "OAuth Client ID", select "Web Application"
- For "Authorized JavaScript Origins", enter the URL of the domain you are hosting this web application on. For example, if you're hosting through GitHub/GitLab, it should be something like ______.github.io. Also add http://localhost:3000 for testing. Add the same domains for your redirect URIs.
- Now you can generate a CLIENT_ID. Copy this, and paste it in the config.js file.
- Click "Create Credentials" again, and this time select "API key"
- Copy the API key, and paste it in the config.js file.
- On the Google Cloud Platform home screen, click on "OAuth consent screen"
- Go to "Test users"
- Add all annotators to this list
Note: in general, you should not have these secrets on your frontend (they should be on a server). However, you can specify configurations for your Google API project that make it safe to do so.
You have now set up your web app to communicate with your google sheets!
After executing the steps above, type npm i in your terminal (this installs all dependencies). Then, you can start the app anytime by entering npm start.
You can use any service you want to deploy your image annotation app. The simplest and easiest setup is to deploy using GitHub or GitLab Pages. A guide on that can be found at the following link: https://www.youtube.com/watch?v=2hM5viLMJpA
This is where you can configure your specific application. In the config dictionary:
"appTitle": The title of the application shown at the top of the page,
"adminEmails": A list of emails who are admins of the app,
"sheetsConfig": {
"API_KEY": The API key defined above,
"CLIENT_ID": The client ID defined above,
"SCOPE": The scope of the app (should be left as default value)
},
"users": {
"[email protected]": {
"SPREADSHEET_ID": The ID of the spreadsheet to be mutated,
"SPREADSHEET_RANGE": The page in that spreadsheet
},
"default": default settings of the object described above
},
"annotationRules": An object of annotation rules which describe how things should be annotated.,
"referenceList": A list of img URLs that you can use for reference,
"imgHeight": The height of the annotated image,
"imgWidth": The width of the annotated image,
- All the admins can execute a "Flag as uncertain" action on any annotated image. All uncertain images will be shown to the other annotators first.
- Side-by-side annotation rules
- Side-by-side reference images
- Ability to go back and re-annotate images
- Integration with Google Sheets for easy CSV processing