Muscato is a software tool for matching a collection of read sequences against a collection of target sequences (e.g. gene sequences). The approach scales efficiently to hundreds of millions of reads and target sequences. A major goal of Muscato is to perform exhaustive multi-mapping, meaning that each read is mapped to as many gene sequences as possible, subject to specified match quality constraints.
Installation
Muscato is written in Go and uses several of the GNU core utilities. It should run on any Unix-like system on which the Go tool and GNU utilities are available.
In most cases, installation of Muscato should only require running the following commands in the shell:
go get github.com/kshedden/muscato/...
go get github.com/kshedden/sztool/...
The executables for Muscato and its auxiliary scripts should appear in your GOBIN directory (usually ${HOME}/go/bin if installed in a user account). You will need to add GOBIN to your PATH environment variable when using Muscato. If you are using the Bash shell, enter the following lines at the shell prompt, or add them to your .bashrc file to make the changes permanent:
export GOPATH=${HOME}/go
export GOBIN=${GOPATH}/bin
export PATH=${GOBIN}:${PATH}
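To confirm that the PATH change took effect, a quick check along the following lines may help. This is a sketch only; check_installed is a hypothetical helper, not something Muscato provides.

```shell
# check_installed: succeed only if every named command is found on PATH.
# Hypothetical helper for illustration; not part of Muscato.
check_installed() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "not found on PATH: $tool" >&2
            return 1
        fi
    done
}
```

Running check_installed muscato muscato_prep_targets should return silently once both executables named in this document are on your PATH.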
The easiest way to update Muscato is to run rm -r on your muscato source directory (in go/src/github.com/kshedden), then reinstall as above.
Basic usage
Before running Muscato, you should prepare a version of your target sequence file using the muscato_prep_targets program. If your targets are in a fasta format file, you can simply run:
muscato_prep_targets genes.fasta
Instead of using a fasta input file, it is also possible to use a plain text file with the format id<tab>sequence<newline> for each target sequence. The sequence should consist of the upper-case characters A, T, G, and C. Any other letters are replaced with 'X'.
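As an illustration of the plain-text format described above, the following is a minimal shell sketch that converts a fasta file into id<tab>sequence lines. It is not part of Muscato: the function name fasta_to_tab is hypothetical, and taking the id from the first word of the header line, upper-casing lower-case bases, and dropping whitespace inside sequence lines are assumptions rather than documented behavior of muscato_prep_targets.

```shell
# fasta_to_tab: read FASTA on stdin, write "id<TAB>sequence" lines on stdout.
# Assumptions (not taken from Muscato itself): the id is the first
# whitespace-delimited word of the header, whitespace inside sequence
# lines is dropped, and lower-case bases are upper-cased before anything
# outside A, T, G, C is replaced with X.
fasta_to_tab() {
    awk '
        /^>/ {
            if (have) print id "\t" seq    # flush the previous record
            have = 1
            id = substr($1, 2)             # header word without the ">"
            seq = ""
            next
        }
        have {
            line = toupper($0)
            gsub(/[ \t\r]/, "", line)      # drop stray whitespace
            gsub(/[^ATGC]/, "X", line)     # replace every other character
            seq = seq line                 # join multi-line sequences
        }
        END { if (have) print id "\t" seq }
    '
}
```

For example, fasta_to_tab < genes.fasta > genes.txt would write one line per target sequence; genes.txt is a placeholder name.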
The muscato_prep_targets script accepts a -rev flag, which adds the reverse complement of each target sequence to the database along with the original sequences.
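For readers unfamiliar with the term, the reverse complement of a sequence is obtained by reversing it and swapping A with T and G with C. The sketch below illustrates that operation only; the function name revcomp is hypothetical, and leaving X unchanged is an assumption, not a description of how muscato_prep_targets implements -rev.

```shell
# revcomp: print the reverse complement of each sequence read from stdin,
# one sequence per line. Characters other than A, T, G, C (such as X) are
# passed through unchanged; that choice is an assumption for illustration.
revcomp() {
    awk '
        BEGIN { comp["A"] = "T"; comp["T"] = "A"; comp["G"] = "C"; comp["C"] = "G" }
        {
            out = ""
            for (i = length($0); i >= 1; i--) {    # walk the sequence backwards
                c = substr($0, i, 1)
                out = out ((c in comp) ? comp[c] : c)
            }
            print out
        }
    '
}
```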
After building the target datafile, you can run muscato. A basic invocation is:
muscato --ReadFileName=reads.fastq --GeneFileName=genes.fasta.sz --GeneIdFileName=genes_ids.txt.sz \
    --Windows=0,20 --WindowWidth=15 --MaxReadLength=100
Note that the target files genes.fasta.sz and genes_ids.txt.sz were produced by the muscato_prep_targets script, run as shown above.
Many other command-line flags are available; run muscato --help for more information.
By default, the results are written to a file named results.txt, a tab-delimited file with the following columns:
- Read sequence
- Matching subsequence of a target sequence
- Position within the target where the read matches (counting from 0)
- Number of mismatches
- Target sequence identifier
- Target sequence length
- Number of copies of the read in the read pool
- Read identifier
The tool also generates a fastq file containing all non-matching reads.
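Because the output is plain tab-delimited text, it can be summarized with standard tools. The sketch below, which assumes the column order listed above, counts how many distinct targets each read sequence matched at or below a chosen number of mismatches. The function name targets_per_read is hypothetical and is not part of Muscato.

```shell
# targets_per_read MAX: read a results file on stdin and print, for each
# read sequence (column 1), the number of distinct target identifiers
# (column 5) matched with at most MAX mismatches (column 4).
# MAX defaults to 0 when omitted.
targets_per_read() {
    awk -F '\t' -v max="${1:-0}" '
        # Count each (read, target) pair once, even if it matches at
        # several positions within the same target.
        $4 <= max && !seen[$1 FS $5]++ { count[$1]++ }
        END { for (r in count) print r "\t" count[r] }
    ' | sort
}
```

For example, targets_per_read 1 < results.txt reports, for each read, the number of distinct targets matched with at most one mismatch.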
Logging
Several log files are written to the directory muscato_logs/#####, where ##### is the same unique id used for the temporary files. High-level logging messages are written to 'muscato.log'. More detailed logging information is written to logs specific to each component of the tool, e.g. 'muscato_screen.log'.
Temporary workspace
Muscato uses a temporary directory for intermediate and logging files, by default named muscato_tmp/######, where ###### is a unique id generated by Muscato. If NoCleanTemp is set to false (the default), this directory is automatically deleted after the muscato run completes; otherwise it is retained. A retained temporary directory can be safely deleted once it is no longer needed.
Testing
There is currently a small collection of unit tests in the tests directory. To run the tests, enter that directory and type:
go run test.go
Any errors will be printed to the terminal. Detailed results of the tests are written to the file test.log.
Dependencies
Muscato has the following dependencies. The sztool package must be installed manually with go get, as shown above. All other dependencies should be automatically installed by go get when installing muscato.
github.com/kshedden/sztool
github.com/chmduquesne/rollinghash
github.com/golang-collections/go-datastructures/bitarray
github.com/golang/snappy
github.com/willf/bloom
Issues and feedback
Please file an issue if you encounter any difficulties.