EMBnet Norway presents: Biological sequence mining with MRS and EMBOSS
Course information
Duration: May 23rd - May 27th 2011 (week 21)
Daily schedule: 9:45 - 15:30. Each day starts with a one-hour lecture followed by hands-on tutorial exercises, with a midday break from 11:45 to 12:30.
Location: University of Oslo, Blindern, IMBV building. Further details will be emailed to the course participants.
Course Materials:
Course structure:
- Day 1: The lecture will cover: major sequence databases, sequence formats and pipeline construction approaches. In the tutorial we will examine basic concepts and log in to the command-line environment.
- Day 2: The lecture will cover: a basic introduction to EMBOSS, how EMBOSS can be accessed via the command line and the web, what the EMBOSS 'USA' (Uniform Sequence Address) identifier is, how you can access EMBOSS indices, the EMBOSS 'seqret' command, and the limitations of EMBOSS sequence searching. In the tutorial we will demonstrate: how to log in to the command-line and web-based EMBOSS environments, how to use 'seqret' to convert sequences from one format to another, how to interface between EMBOSS and BLAST to create sequence databases, and how to construct simple pipelines.
- Day 3: The lecture will cover: sequence retrieval systems (options, pros and cons), an introduction to MRS, using MRS, searching the MRS indices, and accessing MRS programmatically. In the tutorial we will cover: exploring MRS sequence searches in detail, using MRS to examine disease genes, pipeline construction and programmatic MRS access.
- Day 4: This day will be used to solve questions/problems/issues with the tutorials and demonstrate some new EMBOSS features in relation to High Throughput Sequence datasets.
- Day 5: This day will be used by the course participants to present and discuss cases related to their own work, using EMBOSS and MRS.
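As a rough illustration of the Day 2 material, the sketch below assembles a 'seqret' command line that reads one entry through a USA and writes it out in another format. It is not taken from the course materials: the helper name `build_seqret_command`, the database name `embl`, the accession and the output file name are all assumptions chosen for illustration, and the command only works on a system where EMBOSS is installed and a database of that name is configured.

```python
def build_seqret_command(usa, out_file, out_format="fasta"):
    """Return the argument list for an EMBOSS seqret call.

    Hypothetical helper for illustration only.
    usa        -- a Uniform Sequence Address, e.g. "database:entry"
    out_file   -- where seqret should write the converted sequence
    out_format -- name of the output sequence format
    """
    return [
        "seqret",
        "-sequence", usa,          # input, given as a USA
        "-outseq", out_file,       # output file
        "-osformat", out_format,   # output sequence format
        "-auto",                   # do not prompt for missing values
    ]


if __name__ == "__main__":
    # Database name and accession are assumptions; substitute whatever
    # is configured on your own EMBOSS installation.
    cmd = build_seqret_command("embl:X65923", "x65923.fasta")
    print(" ".join(cmd))
```

The script only prints the command. On a machine with EMBOSS available, the same list could be passed to `subprocess.run` from the Python standard library, which is one simple way to start chaining such calls into a pipeline.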
Aims of the course:
- To familiarize the participants with the format of most sequence databanks (flat file databases).
- To familiarize the participants with the most commonly used sequence databases.
- To give an introduction to commonly used sequence formats.
- To show various ways of accessing sequence databanks, both command line and web interface.
- To teach participants powerful ways of mining sequences using command line EMBOSS and MRS with the goal of constructing their own pipelines.
- To work on real case studies.
Prerequisites
- Intermediate knowledge of the UNIX shell.
- Some basic scripting ability, enough to modify Perl scripts (Perl is preferred, but other scripting languages may be applicable).
Summary
- There are three major nucleotide sequence databases available: EMBL, GenBank and DDBJ. In addition, there are several protein sequence databanks, such as UniProt, Swiss-Prot, TrEMBL and PIR. While these databanks often contain much of the same information, their formats can differ. They are all offered as flat-file databases, which are human-readable text files rather than entries in a relational database. For usability and ease of access, these text files are indexed using different tools, so that you can more easily retrieve the sequences you require. In this course, we will look at this in more detail, covering the different indexing services and the tools that are available for retrieving sequences.
- There are many services which offer web interfaces for retrieving sequences, such as SRS, EMBOSS (EMBOSS Explorer/wEMBOSS) and MRS. We will look at these and compare their functionality. However, if you wish to retrieve many sequences, it is very cumbersome to download them one at a time from a web interface. For those cases, we will also look at the command-line interfaces for these services, and show how they can be used to automate both the retrieval of the information and the processing of the results after they have been retrieved.
- The course will have exercises in utilizing the various tools, and participants are encouraged to use the tools for their own projects on the last day, when they can get help with the use of the systems if needed.
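To make the idea of indexing a flat file more concrete, here is a minimal sketch that records the byte offset of every entry in a small FASTA-style file and then uses that offset to jump straight to one entry. It illustrates the general principle only and is not how EMBOSS or MRS build their indices; the function names `build_index` and `fetch` and the toy data are invented for this example.

```python
import io


def build_index(handle):
    """Map each entry identifier to the byte offset of its header line."""
    index = {}
    while True:
        offset = handle.tell()
        line = handle.readline()
        if not line:
            break  # end of file
        if line.startswith(b">"):
            fields = line[1:].split()
            if fields:  # skip header lines that carry no identifier
                index[fields[0].decode("ascii")] = offset
    return index


def fetch(handle, index, entry_id):
    """Seek to an indexed entry and return it as text, header included."""
    handle.seek(index[entry_id])
    lines = [handle.readline()]  # the header line itself
    for line in iter(handle.readline, b""):
        if line.startswith(b">"):
            break  # next entry starts here
        lines.append(line)
    return b"".join(lines).decode("ascii")


if __name__ == "__main__":
    # Toy in-memory flat file; a real databank would be opened with
    # open(path, "rb") so that offsets are counted in bytes.
    flat_file = io.BytesIO(
        b">seq1 first toy entry\nACGT\nGGCC\n"
        b">seq2 second toy entry\nTTAA\n"
    )
    index = build_index(flat_file)
    print(index)
    print(fetch(flat_file, index, "seq2"))
```

The file is scanned once to build the index. After that, each lookup needs only a seek and a short read, which is why indexed flat files remain practical even when the databank itself is very large.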
If you have any questions, or wish to sign up, please contact us here.