python - Algorithm to decide cut-off for collapsing this tree?


I have a tree that was built by comparing the similarity (Euclidean distance) of position weight matrices (PWMs or PSSMs) of putative DNA regulatory motifs, which are 4-9 bp long DNA sequences.

An interactive version of the tree is up on iTOL, which you can freely play with: after setting your parameters, just press "Update tree":


My specific goal: to collapse the motifs (tips / terminal nodes / leaves) together if their average distance to the nearest parent clade is < X. This is biologically interesting because some of the gene regulatory DNA motifs may be homologous (paralogous or orthologous) to one another. The collapsing can be done through the iTOL GUI linked above; e.g. if you choose X = 0.001, some motifs collapse into triangles (motif families).
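To make the collapsing rule concrete, here is a minimal sketch in plain Python. It assumes one reading of the rule: walking down from the root, a clade is collapsed as soon as the mean distance from its root to its leaves is below X. The Node class, the function names and the toy tree are all hypothetical and only for illustration; they are not taken from the script or from ete2.

```python
class Node:
    """Minimal tree node: a name, a branch length to its parent, and children."""

    def __init__(self, name=None, dist=0.0, children=None):
        self.name = name
        self.dist = dist
        self.children = children or []

    def is_leaf(self):
        return not self.children


def leaf_distances(node):
    """Return (leaf_name, distance from `node`) for every leaf below `node`."""
    if node.is_leaf():
        return [(node.name, 0.0)]
    out = []
    for child in node.children:
        for name, d in leaf_distances(child):
            out.append((name, d + child.dist))
    return out


def collapse(node, x):
    """Return a list of clusters (sorted lists of leaf names).

    Assumed rule: a clade becomes one cluster if the mean distance from
    its root to its leaves is below x; otherwise its children are examined.
    """
    leaves = leaf_distances(node)
    mean_dist = sum(d for _, d in leaves) / len(leaves)
    if node.is_leaf() or mean_dist < x:
        return [sorted(name for name, _ in leaves)]
    clusters = []
    for child in node.children:
        clusters.extend(collapse(child, x))
    return clusters


# Toy tree, in Newick: ((m1:0.0005,m2:0.0007):0.2,(m3:0.3,m4:0.4):0.1);
toy = Node(children=[
    Node(dist=0.2, children=[Node("m1", 0.0005), Node("m2", 0.0007)]),
    Node(dist=0.1, children=[Node("m3", 0.3), Node("m4", 0.4)]),
])

print(collapse(toy, 0.001))
```

With X = 0.001 only the tight pair m1/m2 is merged into one family, while m3 and m4 stay separate; a large X merges everything into a single cluster.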

My question: Can anyone suggest an algorithm that would either output or help visualise which value of X maximises the "biological or statistical relevance" of the collapsed motifs? Ideally, some property of the tree would show an obvious step change when plotted against X, suggesting a sensible X to the algorithm. Are there any known algorithms / scripts / packages for this? Perhaps code that plots some statistic against the value of X? I have tried plotting X versus the cluster size, but I do not see a clear "step increase" to tell me which value of X to use:

[Plot: X versus cluster size] https://i.stack.imgur.com/NoHqp.png
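One simple heuristic for finding a "step" is to look for the widest gap between the per-clade distances: any X inside that gap gives the same collapsed tree, so the result is least sensitive to the exact choice. The sketch below assumes the mean leaf distance of each internal clade has already been computed; the function names and the example numbers are hypothetical. This is a generic largest-gap heuristic, not a method with any guaranteed biological meaning.

```python
def count_collapsible(clade_dists, x):
    """Number of clades whose mean leaf distance is below the cut-off x."""
    return sum(1 for d in clade_dists if d < x)


def largest_gap_cutoff(clade_dists):
    """Return (cutoff, gap_width) for the widest gap between sorted distances.

    The cutoff is the midpoint of the widest gap, so small changes in X
    around it do not change which clades are collapsed.
    """
    ds = sorted(set(clade_dists))
    if len(ds) < 2:
        raise ValueError("need at least two distinct clade distances")
    gaps = [(b - a, a, b) for a, b in zip(ds, ds[1:])]
    width, lo, hi = max(gaps)
    return (lo + hi) / 2.0, width


# Hypothetical mean leaf distances, one per internal clade of the tree.
example_dists = [0.0006, 0.0008, 0.001, 0.325, 0.35, 0.4]

cutoff, width = largest_gap_cutoff(example_dists)
print(round(cutoff, 4), round(width, 4))
print(count_collapsible(example_dists, cutoff))
```

Plotting count_collapsible against a range of X values gives a step curve; the widest flat stretch of that curve corresponds to the gap found here. If the distances are spread evenly, there is no dominant gap and this heuristic gives no clear answer, which may be what the plot above is showing.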

My code and data: A link to my Python script [is here][8]. I have commented it heavily, and it will generate the tree data and the plot above for you (use the arguments d_from, d_to and d_step to explore the distance cut-offs, X). If you have easy_install and Python, you just need to install ete2 by executing these two bash commands:

  apt-get install python-numpy python-qt4 python-scipy python-mysqldb python-lxml
  easy_install -U ete2

I think I would need to know more before I could give specific suggestions, but maybe this will help. I am assuming that each terminal node is a sequence, and each internal node is a PSSM.

The calculation of X will be specific to your application. For example, you would expect a different X if you want to collapse ultraparalogs than if you want to collapse all homologs.

Since genes are continuously being created through duplication and speciation, there is no single value of X that will discriminate sequences by evolutionary relationship. Therefore, I would not expect you to find a satisfactory proxy for the evolutionary relationship between sequences by looking at cluster statistics alone.

A more rigorous approach would be to build a gene tree from the genes regulated by each regulatory motif and reconcile it with a species tree. Software is available for ortholog / paralog identification.

If you do this, the internal nodes of your tree will be decorated with the estimated evolutionary event (e.g., duplication, speciation). You can then walk the tree to collapse the nodes you do not care about.
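As a rough illustration of that last step, the sketch below walks an event-annotated tree and collapses every clade whose internal nodes all carry an allowed event. It assumes the event labels have already been produced by some reconciliation software; the dictionary layout, the function names and the toy gene names are hypothetical.

```python
def leaf_names(node):
    """All leaf names below a node (a leaf has no 'children' entry)."""
    if not node.get("children"):
        return [node["name"]]
    names = []
    for child in node["children"]:
        names.extend(leaf_names(child))
    return names


def only_events(node, allowed):
    """True if every internal node in this clade has an event in `allowed`."""
    if not node.get("children"):
        return True
    return node["event"] in allowed and all(
        only_events(child, allowed) for child in node["children"]
    )


def collapse_by_event(node, allowed):
    """Collapse the largest clades made up only of `allowed` events."""
    if only_events(node, allowed):
        return [sorted(leaf_names(node))]
    groups = []
    for child in node["children"]:
        groups.extend(collapse_by_event(child, allowed))
    return groups


# Toy gene tree: one duplication followed by a speciation in each copy.
gene_tree = {
    "event": "duplication",
    "children": [
        {"event": "speciation",
         "children": [{"name": "g1_human"}, {"name": "g1_mouse"}]},
        {"event": "speciation",
         "children": [{"name": "g2_human"}, {"name": "g2_mouse"}]},
    ],
}

# Group genes related only through speciation events.
print(collapse_by_event(gene_tree, {"speciation"}))
```

Passing {"duplication"} instead would group only sequences related purely by duplication (ultraparalogs), and passing both events collapses all homologs, which is why the appropriate cut depends on the application rather than on a single distance X.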

