ATITD Wiki: Users/Calen/Genetics/Shelyak Discussion

Overlapping Genes and the 'K' Terminal Base Question

This problem hybrid (Calixes#25) is noted on my Genetics page and in my attached spreadsheet. I believe it also poses a problem for your idea that the 'K' base serves as a terminal marker for every vine sequence (this idea is attributed to commonly held knowledge from prior tales - see below). Here is my reasoning: 1) Calixes reports Calixes#25 as a cross between Distraction and Balance; 2) If we accept that Balance was the right-hand parent then, in order to retain the Sugar gene from Balance and create a new Sugar gene at the join, the break and recombination of the two sequences would have to be at the very start, where you have 'KYYY' in your Balance sequence -- or, in other words, the break occurred just after the 'K' and before the 'YYY' (alternatively, there could be a 'YY' joined to a 'YY' from the other vine, but let's put that aside for the moment); 3) OK so far, but if all the vines start and end with a 'K', then the only possible contribution from the Distraction sequence would be that initial 'K', which rules out a 'YYYY' sequence of overlapping Sugar genes (and yet it must exist either as a 'YYYY' overlapping sequence or two distinct 'YYY' sequences); 4) Lastly, my solvent results have so far not turned up any Y elements in the Distraction data at all, which is obviously a more serious problem -- this doesn't mean they are not there, but simply that I have not detected them yet. I think it is possible (especially considering that the parents of Calixes#22 and Calixes#23 seem to be reversed as reported by Calixes) that the left-hand parent of Calixes#25 was not Distraction at all. Furthermore, I am predisposed to leave a bit more space on the left-hand side of the overall Balance sequence to allow for some combination of 'Y' with the 'YYY' sequence in Balance, producing the 'YYYY' sequence in Calixes#25 where the break and recombination occurred. ~Calen

This is a good test of the proposed model, but I think we need more data to use Calixes#25 as a test. First, we haven't seen the Knnn left side of Distraction; if it had a Y within a few bases of the start, that would supply another Y for the mysteriously doubling Sugar. Second, we have a limited understanding of how crossbreeding actually snips genomes and concatenates them. Calixes#22 is the only genome we've looked at closely, and we've already seen irregularities there versus what we'd expect given its supposed parents. Third, we haven't seen Calixes#25's genome, so (as you note) it could have different parents than we expect. It's mind-bending to contemplate, but perhaps all of the non-matching parentages we're seeing are the result of the _start_ of the genome coming from the right side splint, while the _end_ comes from the left side splint. I can't quite wrap my mind around that, so I'm hoping we can derive a way to disprove it. ~Shelyak

A final note on the terminal 'K' base marker idea... it seems entirely possible to me that Teppy (or whoever) started out to set up the sequences this way, but that they then got scrambled a bit in their final form as adjustments were made to eliminate certain genes (as you suggest above for the possibility that there was initially a Grapes gene in Balance) and/or to order the sequences to provide better recombinant possibilities. ~Calen

That hypothesis will hold up better if you can produce a fragment with a K in the middle and other bases on both sides. I haven't seen one yet in your data or anyone else's. ~Shelyak
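On the overlap point raised at the start of this section: the argument depends on 'YYYY' counting as two overlapping 'YYY' Sugar genes rather than one. The short sketch below only illustrates that counting rule. The function name is hypothetical, and whether the game itself counts overlapping matches this way is an assumption taken from the discussion, not something confirmed here.

```python
def count_overlapping(sequence: str, gene: str) -> int:
    """Count occurrences of `gene` in `sequence`, allowing matches to overlap."""
    count = 0
    start = sequence.find(gene)
    while start != -1:
        count += 1
        # Advance by one base only, so a match may share bases with the last one.
        start = sequence.find(gene, start + 1)
    return count


# 'YYY' is treated as the Sugar gene, as in the discussion above.
print(count_overlapping("KYYY", "YYY"))   # one Sugar gene
print(count_overlapping("YYYY", "YYY"))   # two overlapping Sugar genes
```

Note that Python's built-in `str.count` skips overlapping matches, so it would report only one 'YYY' in 'YYYY'; that is why the loop steps forward one base at a time.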

The 'K' Skin Gene Question and the Overall Sequence Length Question

It seems likely to me that the 'YRO' sequence appears at least 3 times in Contemplation (the QQKK vine), which probably rules it out as the Skin gene. There are two distinct revelation solvent findings with the YRO sequence: YYROK and GYROG. Even if you use one of these as a bridge between the two halves of the overall sequence (as I did in my proposed solution), that still leaves the other sequence duplicated on both sides, giving 3 occurrences total. If you do not use it as a bridge, then I would expect to see 4 occurrences total, since this genome seems to be more or less identical on the left and right sides... which could perhaps make it another double-gene like Quality, but right now I tend to doubt this because of how tightly Contemplation seems to fit together in my proposed sequence for that vine. If we expand the total number of elements to 42 or some other number, other possibilities exist. ~Calen

I come up with the following sequence for Contemplation: KRYYRGYROGGG..GRYYROK. The .. indicates there's no way of counting G's in that region with the existing data. Let me know if you see a fragment in your data that doesn't get used in that sequencing. Occam's Razor makes me believe it's more likely genomes are variable length (as they seem to be for wheat) than that there's a great deal of duplication necessary to get to 36 bases. ~Shelyak

I do not see the 'OGGG' or 'GGGGR' sequences used in your sequence. Admitting for the moment that we cannot be sure of the number of 'G' bases between those two sections, it is still important to note that we know there are at least 4 on either side (probably overlapping - my own application of Occam's Razor). I have a variety of reasons for thinking that the genomes are not variable length, so this does not seem like a clear-cut application of Occam's Razor to me. But, without going into great length about that, without a complete genome that is duplicated on the left and right sides, you do not wind up with 2 Quality genes. I strongly believe that the portion of 'RYYR' responsible for the Quality gene is a doubled gene (requires two occurrences to produce one Quality mutation). In your suggested genome there are only two occurrences, which would produce just one Quality gene. I believe the overall sequence must be doubled on the left and right sides. Again, Occam's Razor could apply: Teppy wanted a double-Q, double-K gene and worked out the sequence for one of each and then just doubled it. The frequency with which the same solvent results occur for vines like this suggests statistically that these sequences are simply repeated twice or three times within the overall genome (e.g., Appreciation and Wisdom). ~Calen

OGGG -> KRYYRGYROGGG..GRYYROK; GGGGR -> KRYYRGYROGGG..GRYYROK, assuming 0 or more G's in the .. section. I am also implicitly arguing that it takes only one RYYR sequence (or whatever fragment thereof codes for Q) to produce a Quality mutation. ~Shelyak

Sorry, that should have been 'OGGGG' above. ~Calen

That's still in there: OGGGG -> KRYYRGYROGGG..GRYYROK, assuming 0 or more G's in the .. section. ~Shelyak
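The fragment checks in this exchange can be reproduced mechanically. The sketch below is only a minimal illustration: it treats the '..' in Shelyak's proposed Contemplation sequence as zero or more extra G bases and reports the smallest number of extra G's at which a given solvent fragment fits. The names `candidate` and `min_gap_for`, and the search cap of 10 extra G's, are assumptions made for this example.

```python
LEFT = "KRYYRGYROGGG"   # bases before the '..' gap in the proposed sequence
RIGHT = "GRYYROK"       # bases after the '..' gap


def candidate(extra_g: int) -> str:
    """Build the proposed sequence with `extra_g` G bases filling the gap."""
    return LEFT + "G" * extra_g + RIGHT


def min_gap_for(fragment: str, max_extra_g: int = 10):
    """Return the fewest extra G's at which `fragment` appears, else None.

    max_extra_g is an arbitrary cap chosen for this sketch.
    """
    for extra_g in range(max_extra_g + 1):
        if fragment in candidate(extra_g):
            return extra_g
    return None


# Fragments mentioned in this section of the discussion.
for frag in ("OGGG", "GGGGR", "OGGGG", "YYROK", "GYROG"):
    print(frag, min_gap_for(frag))

# Non-overlapping count of RYYR in the zero-gap candidate.
print(candidate(0).count("RYYR"))
```

A fragment that returns None cannot be placed anywhere in the proposed sequence within the cap, which is the kind of unused fragment Shelyak asks about.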

With respect to the question over variable length genomes or genomes with repeating sub-sections, here is one piece of my reasoning: The frequency of solvent results from Wisdom like 'GGGGG' seems to suggest that it appears far more frequently than the significant genes. In more detail: with Wisdom we have 'GGGGG' (seen 5 times) and 'GGG' (seen 16 times) while sequences such as 'OGYRO' and 'GYROG' were obtained only once. This is even more striking in Distraction with 'GGGGG' (seen 6 times) and 'GGGG' (seen 10 times) and 'GGG' (seen 13 times) vs. 'GGGGO' (seen 1 time) and 'OORGG' (seen 1 time). This suggests to me that these 'GGG' sequences are used repeatedly and often as filler between the meaningful gene sequences, which are most likely duplicates on the left and right sides of the overall sequence, or duplicated 3 times overall for vines like Appreciation (QQQ) and Wisdom (KKK). I do not have the mathematical expertise to do a statistical analysis of the frequency of these results, but having watched this data emerge and lived with it for many weeks, the conclusion that these 'GGG' sequences are filler looks and feels right to me. If we assume that this is true, then why have filler if not to pad out the overall sequence to make all of the vine sequences the same length? I also believe that most vines (except Balance and Frivolity perhaps) have repeating sequences to achieve this same end. I would love to see new data to refute this or clarify the issue in any other direction, but I got tired of wasting solvents on these vines and getting so many repeats of the 'GGG' sequences. ~Calen

I agree with you. I think G's are filler. ~Shelyak
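The counts quoted above for Wisdom and Distraction can be tallied directly. This is only a rough sketch of the arithmetic, not a statistical test: it uses just the fragments named in the comment (reading 'OGYRO' and 'GYROG' as seen once each), so it does not represent the full solvent data set. The function name `filler_share` is hypothetical.

```python
# Solvent result counts exactly as quoted in the comment above.
results = {
    "Wisdom": {"GGGGG": 5, "GGG": 16, "OGYRO": 1, "GYROG": 1},
    "Distraction": {"GGGGG": 6, "GGGG": 10, "GGG": 13, "GGGGO": 1, "OORGG": 1},
}


def filler_share(counts: dict) -> float:
    """Fraction of the quoted results that consist only of G bases."""
    total = sum(counts.values())
    all_g = sum(n for frag, n in counts.items() if set(frag) == {"G"})
    return all_g / total


for vine, counts in results.items():
    print(f"{vine}: {filler_share(counts):.0%} of quoted results are all-G")
```

Among the quoted results alone, all-G fragments account for 21 of 23 for Wisdom and 29 of 31 for Distraction, which is consistent with the impression that G runs dominate, though the unquoted results could change those proportions.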

Discussion on the Two "K-" Base Sequences in Frivolity

I should state here, to avoid further confusion, that I was confused in my first response to Shelyak. He wasn't talking about the Skin (K) sequence. Instead he was talking about the 'K' base that he states was assumed to be a terminating cap on each vine genome in prior tales... so as you read this section you will see my initial confusion and his clarification. ~Calen

I am not seeing two K- sequences in my Frivolity data, whether your proposed 'YRO' or my choice of 'OGG'. Can you elaborate? Also, you seem to be ignoring the 'KROO' sequence, which seems to cause some problems with the beginning and ending 'K' idea, since that provides two beginning 'K' sequences along with 'KRRO'. I admit it is possible that I mis-recorded this; however, my memory of it is that I was particularly on the lookout for the rare sequences involving 'K', since I already suspected they served as dividers or markers of some sort, and that I double-checked this second, distinct occurrence when I recorded it. ~Calen

KROO and KRRO are the two K- sequences I was referring to (in that they're both sequences starting with the K base -- not Skin mutations). I'm not questioning your data so much as suggesting Frivolity would be a good candidate to re-run to see if the KRO and KRR sequences are reproducible. It'd be a good test for the three-base rev solvent. ~Shelyak
