PrepExternalSchema command stuck at 98% completeness #212

Open
AnyaKovalenko opened this issue Jan 23, 2025 · 5 comments

Comments

@AnyaKovalenko

AnyaKovalenko commented Jan 23, 2025

Dear Community,

I’m encountering an issue with the PrepExternalSchema command. It initially worked very quickly, but after reaching 98% completeness, it got stuck and hasn’t progressed for over 24 hours.

Here’s the command I used:
chewBBACA.py PrepExternalSchema -g /mnt/d/Bacteria/cgMLST/schema-v1/ -o validated_schema/ --ptf reference_training.trn --cpu 16

I downloaded the N. gonorrhoeae cgMLST v1.0 schema (FASTA file) from PubMLST as part of this process. Currently, the output contains 1641 FASTA files, one per locus, and three folders: <dummy_dir>, <NEIS1452_temp>, and short. The short folder also contains 1641 FASTA files, one per locus. Could the process be stuck while analysing the remaining eight loci, since the total is 1649 loci?

Could there be something I did wrong? I would greatly appreciate any help in resolving this issue. I very much look forward to your feedback. Thank you!

This is how it looks now:

==================================
  chewBBACA - PrepExternalSchema
==================================
Started at: 2025-01-22T21:38:32

Could not remove validated_schema/dummy_dir
Number of cores: 16
BLAST Score Ratio: 0.6
Translation table: 11
Using a minimum length value of 0 for schema adaptation and 0 to store in the schema config file.
Using a size threshold value of None for schema adaptation and 0.2 to store in the schema config file.
Number of loci to adapt: 1649

Determining the total number of alleles and allele mean length per gene...

Adapting 1649 loci...

 [=================== ] 98%
@AnyaKovalenko
Author

Dear Community,

Update on the issue: It is now showing the following error:

Error while running BLASTp for /mnt/d/Gonorrhea/cgMLST/validated_schema/NEIS1452_temp/NEIS1452_rep_protein.fasta
/home/annako/miniconda3/envs/chewie/bin/blastp returned the following error:
b"terminate called after throwing an instance of 'std::__ios_failure'\n  what():  basic_ios::clear: iostream error\n"
 [=================== ] 99%

Thank you!

Ah, now I see: it happened because the external SSD was disconnected or unmounted during the process. I do not understand why.

@rfm-targa rfm-targa self-assigned this Jan 23, 2025
@rfm-targa rfm-targa added the Status: In Progress Has been assigned and is being worked on. label Jan 23, 2025
@AnyaKovalenko
Author

AnyaKovalenko commented Jan 24, 2025

Hello Community,
One more update: I removed the FASTA file for the NEIS1452 locus from the initial schema before running the command, and it seems to have completed successfully. However, I encountered a permission error afterward. Please see the details below.

Why do you think this happened? Is it okay to simply remove this locus from the analysis? Are there any steps I need to take to resolve this? Thank you so much for your help!

Number of invalid loci: 0
Wrote list of invalid loci to /mnt/d/Gonorrhea/cgMLST/validated_schema_invalid_loci.txt
Number of invalid alleles: 4232
Wrote list of invalid alleles to /mnt/d/Gonorrhea/cgMLST/validated_schema_invalid_alleles.txt
Wrote file with summary statistics to /mnt/d/Gonorrhea/cgMLST/validated_schema_summary_stats.tsv

Successfully adapted 1648/1648 loci present in the input schema.
Traceback (most recent call last):
  File "/home/annako/miniconda3/envs/chewie/bin/chewBBACA.py", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/annako/miniconda3/envs/chewie/lib/python3.11/site-packages/CHEWBBACA/chewBBACA.py", line 1543, in main
    functions_info[process][1]()
  File "/home/annako/miniconda3/envs/chewie/lib/python3.11/site-packages/CHEWBBACA/utils/process_datetime.py", line 146, in wrapper
    func(*args, **kwargs)
  File "/home/annako/miniconda3/envs/chewie/lib/python3.11/site-packages/CHEWBBACA/chewBBACA.py", line 1070, in run_adapt_schema
    shutil.copy(args.ptf_path, schema_path)
  File "/home/annako/miniconda3/envs/chewie/lib/python3.11/shutil.py", line 432, in copy
    copymode(src, dst, follow_symlinks=follow_symlinks)
  File "/home/annako/miniconda3/envs/chewie/lib/python3.11/shutil.py", line 313, in copymode
    chmod_func(dst, stat.S_IMODE(st.st_mode))
PermissionError: [Errno 1] Operation not permitted: '/mnt/d/Gonorrhea/cgMLST/validated_schema/NCCP11945_training.trn'
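
A note on the traceback above: the failure is raised inside `copymode`, which is the step where `shutil.copy` applies the source file's permission bits to the destination with `chmod`, after the file content has already been copied. The sketch below is not chewBBACA's code. It only illustrates, under the assumption that the destination filesystem refuses `chmod`, how a content-only copy with `shutil.copyfile` avoids that step. The function name and the paths in the usage note are hypothetical.

```python
import errno
import shutil
from pathlib import Path


def copy_tolerating_chmod_failure(src, dst_dir):
    """Copy src into dst_dir, falling back to a content-only copy on EPERM.

    Hypothetical helper for illustration; it is not part of chewBBACA.
    """
    dst = Path(dst_dir) / Path(src).name
    try:
        # shutil.copy copies the content, then the permission bits (chmod).
        shutil.copy(src, dst)
    except PermissionError as exc:
        # Only handle "Operation not permitted" (errno 1), as in the traceback.
        if exc.errno != errno.EPERM:
            raise
        # shutil.copyfile copies the content only and never calls chmod.
        shutil.copyfile(src, dst)
    return dst
```

For example, `copy_tolerating_chmod_failure("reference_training.trn", "validated_schema")` would leave a copy of the training file in the schema directory even when the permission bits cannot be set. Whether the mounted drive is what rejects `chmod` here is an assumption that would need to be checked on the affected system.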

@rfm-targa
Contributor

Hello @AnyaKovalenko,

Thank you for reporting this issue and providing all the details. We tried to adapt the N. gonorrhoeae cgMLST schema from PubMLST and found the same issue. Two loci are challenging to adapt: NEIS1452 and NEIS2608. These loci are small, and their sequences contain a lot of variation or repeats, which forces the PrepExternalSchema module to run many iterations to find a set of representative alleles that adequately captures the diversity of each locus. A few differences or a variable number of repeats in small sequences can cause a significant variation in the value used to compare sequences and select representative alleles. The NEIS1452 locus is the most problematic due to its sequence size and content variation: PrepExternalSchema has to compare many potential representative alleles because the alignment score for most alleles is below the threshold.
As you did, we suggest removing locus NEIS1452 from the schema before adapting it with PrepExternalSchema. The IO and permissions issues while copying the training file to the schema's directory are probably related to your system. I recommend checking whether the user account you are using has permission to copy the training file. After removing locus NEIS1452 and solving the permissions issue, you should be able to adapt the schema without problems.
Let us know how it goes.

Best regards,

Rafael
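
For readers who want to apply the suggestion above without deleting files from the downloaded schema, here is a minimal sketch that copies the schema to a new directory while leaving out selected loci. It assumes one FASTA file per locus named `<locus>.fasta`. That naming, the function name, and the directory names in the usage note are assumptions for illustration, not something chewBBACA or PubMLST guarantees.

```python
import shutil
from pathlib import Path


def copy_schema_without(schema_dir, out_dir, excluded_loci):
    """Copy every .fasta file from schema_dir to out_dir, skipping excluded loci.

    Assumes one file per locus named <locus>.fasta (hypothetical layout).
    Returns the sorted list of locus names that were kept.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    excluded = set(excluded_loci)
    kept = []
    for fasta in sorted(Path(schema_dir).glob("*.fasta")):
        if fasta.stem in excluded:
            continue  # leave the problematic locus out of the copy
        shutil.copyfile(fasta, out / fasta.name)
        kept.append(fasta.stem)
    return kept
```

Something like `copy_schema_without("schema-v1", "schema-v1-filtered", {"NEIS1452"})` would produce a directory that can then be passed to PrepExternalSchema with `-g`, keeping the original download untouched.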

@AnyaKovalenko
Author

Hello Rafael,

Thank you so much for your quick feedback and support. It’s truly very helpful.

Kind regards,
Anya
