This is an introduction to the PHYLIP package of phylogeny inference programs. PHYLIP is a public domain package written by Joe Felsenstein and composed of a large number of programs, which makes it versatile and very powerful.
This Guide is only a simple roadmap to PHYLIP. Useful tips and reminders will be dropped along the way. A basic understanding of phylogeny, evolutive mechanisms and the different theoretical approaches employed in evolutionary analysis is assumed.
PHYLIP programs read their data from a file that should be named "infile". The general format of this file is:
4 40 W  W 0101001111 0101110101 0101110011  1101010110 dmras1 GTCGTCGTTG GACCTGGAGG CGTGGGCAAG  spras GTAGTTGTAG GAGATGGTGG TGTTGGTAAA scras1 GTAGTTGTCG GTGGAGGTGG CGTTGGTAAA scras2 GTCGTCGTTG GTGGTGGTGG TGTTGGTAAA TCCGCGCTCA AGTGCTTTGA TCTGCTTTAA TCTGCTTTGA 1  ((dmras1,ddrasa),((hschras,spras),(scras1,scras2)));
The first line contains the number of species, the number of characters and, possibly, one or more program options. Check the manual of each program to see which options are available.
If any option requires extra information, add it using lines that start with the option's letter followed by all the data.
Next comes the species and character data in separate lines. Each line starts with 10 letters or symbols reserved for the species name and is followed by the characters to analyze. If the characters require more than one line you may use either a sequential or interleaved (like the above example) format.
You may provide one or more user trees to guide the analysis process (option "U" in the program menu): put the number of trees in one line and each subsequent tree in a separate line. Note that some programs (e.g. fitch) require rooted user trees whereas others (e.g. dnapars) need unrooted user trees instead. See the manuals for details.
PHYLIP does not provide programs to generate multiple alignments or distance matrices. These must be built by hand (remember to save the file in "Text Only" or ASCII format) or through the use of other programs like, e.g. Clustal or TreeAlign (be sure to select PHYLIP output format for the data file, inspect it for correctness and rename it to infile).
Many PHYLIP programs use the output of other programs in the package to further process the results. You should rename the appropriate output file (outfile or treefile) to infile before continuing.
A few programs are intended exclusively for converting or preprocessing PHYLIP input data for use with other programs:
|factor||recodes multistate characters into binary datasets.|
|seqboot||bootstraps input data sets (molecular sequences, binary characters, restriction sites or gene frequencies).|
When run, each program will display on your terminal its role, its version and a menu. Your screen will usually look like this:
Nucleic acid sequence Maximum Likelihood method, version 3.572c Settings for this run: U Search for best tree? Yes T Transition/transversion ratio: 2.0000 F Use empirical base frequencies? Yes C One category of substitution rates? Yes G Global rearrangements? No J Randomize input order of sequences? No. Use input order O Outgroup root? Yes, at seq. Number 1 M Analyze multiple data sets? No I Input sequences interleaved? Yes 0 Terminal type (IBM PC, VT52, ANSI)? ANSI 1 Print out the data at start of run No 2 Print indications of progress of run Yes 3 Print out tree Yes 4 Write out trees onto tree file? Yes Are these settings correct? (type Y or the letter for one to change)
To change any option type the corresponding letter from the left column in the menu. You will be prompted for additional information as needed. Numbered options control what results the program will show on your screen and/or the output files and how they will be shown.
Once you are happy with the options selected press "Y" to go on.
Next you have to select the program to use, depending on the type of characters with which you are working, what you want to find out and your preferred evolutionary theory/analysis method:
|dnapars||unrooted parsimony counting base changes by the method of Fitch.|
|dnapenny||finds most parsimonious trees by branch and bound.|
|dnainvar||computes phylogenetic invariants (evolutionary parsimony) for four species using Cavender's method.|
|dnaml||implements the maximum likelihood method|
|dnamlk||applies ML assuming trees must be consistent with a molecular clock, i.e. the leaves are all equidistant from the root.|
|dnacomp||chooses the tree and topology to maximize the number of sites with minimum number of substitutions.|
|dnadist||builds a matrix using either of Jukes-Cantor, Kimura, Jin-Nei or maximum likelihood methods.|
|protpars||unrooted parsimony using a mixed approach of the methods of Fitch and Eck-Dayhoff.|
|protml||unsupported program developed by © Jun Adachi and Masami Hasegawa for ML analysis of amino acid data.|
|protdist||builds a distance matrix by using either Dayhoff's PAM matrix, Kimura's distance or a categories distance.|
|restml||phylogeny reconstruction from restriction sites (not restriction fragments) data by the method of Smouse and Li.|
|contml||uses a Brownian motion model to reconstruct phylogeny by a restricted maximum likelihood method.|
|gendist||computes a distance matrix using Nei's genetic distance, Cavalli-Sforza's chord measure or Reynolds, Weir, and Cockerham's genetic distance.|
|mix||general program mixing Wagner and Camin-Sokal criteria for each character separately.|
|penny||finds all most parsimonious trees applying a branch and bound search strategy.|
|dollop||applies the Dollo and Polymorphism analysis methods.|
|dolpenny||finds all most parsimonious trees of your data using the Dollo and polymorphism parsimony methods applying a branch and bound search.|
|clique||compatibility method for unrooted 2-state characters.|
|fitch||infers phylogenies using Fitch-Margoliash, least-squares of Cavalli-Sforza and Edwards or other similar methods of the same family.|
|kitsch||builds phylogenies using same methods as fitch but assuming an evolutionary clock and that all leaves of the tree are contemporaneous.|
|neighbor||uses the neighbor-joining method of Nei and Saitou and the UPGMA methods to successively cluster lineages.|
Most programs write a report into a file named "outfile" and a representation of the trees found in a file named "treefile". You should rename these files if you want to preserve the results, otherwise they will be overwritten by the next program you run.
Contents of outfile vary from program to program and depending on the output options selected. Usually it contains the name of the program, the input data and the phylogenies plus associated information. Note that in most cases, the produced trees are unrooted.
The trees contained in the treefile are represented in a standard format that uses parentheses to indicate grouping of species. PHYLIP provides programs to construct, manipulate and print trees in graphic format:
|retree||allows interactive construction and manipulation of trees (topology, branch lengths, labels, etc..).|
|drawgram||draw a cladogram or phenogram of a rooted tree.|
|drawtree||draws unrooted phylogenies on a variety of output devices.|
After you select a text font, you will be able to play with the tree interactively using these programs; once you are satisfied with the tree, write it out to a "plotfile" in the format of your choice. This file can then be printed on the appropriate device.
|dnamove||interactive reconstruction of DNA evolution using parsimony.|
|move||interactive reconstruction of 0-1 discrete data evolution using Wagner and Camin-Sokal mixed parsimony (as in mix).|
|dolmove||interactive phylogeny reconstruction using the Dollo and Polymorphism parsimony criteria.|
|contrast||using quantitative characters and a tree, produce independent contrasts, correlations and statistics for those characters; results can be further analyzed in any multivariate statistics package.|
In general, you should do a bootstrap analysis to assert the significance of the results of your phylogenetic analysis. This can easily be done using the following programs:
|seqboot||allows to resample data sets by the bootstrap, jackknife or permutation methods.|
|consense||finds the consensus of a number of trees using Ml methods (including majority-rule and strict consensus).|
To do a bootstrap analysis follow these steps:
Run seqboot on the input dataset, select a shuffling method and specify at least 100 (better 1000 or more) replicates. Do not select option "1" to avoid interference with programs run at the next step. Rename outfile to infile.
Run the phylogeny analysis program of your choice depending on your data type, desired analysis and preferred method with your desired options. Select option "M" (multiple data sets) and enter the number of replicates generated with seqboot.
If you generated distance matrices in the previous step you need to rename outfile to infile and run one of the Distance Matrix programs (fitch, kitsch or neighbor) now to generate the trees.
Rename the treefile to infile and run consense to evaluate the significance of your analysis.
Note that consense gives the bootstrap support for branching order at each branch of the produced tree and not actual branch lengths.
PHYLIP © 1986-1995 by Joseph Felsenstein and the University of Washington (http://evolution.genetics.washington.edu/).