UVA1368 DNA Consensus String
  VW0ZAOA6bLNz 2023年12月10日 14 0


DNA Consensus String

The Hamming distance is the number of different characters at each position from two strings of equal length. For example, assume we are given the two strings “AGCAT” and “GGAAT.” The Hamming distance of these two strings is 2 because the 1st and the 3rd characters of the two strings are different. Using the Hamming distance, we can define a representative string for a set of multiple strings of equal length. Given a set of strings S = {s1, …, sm} of length n, the consensus error between a string y of length n and the set S is the sum of the Hamming distances between y and each si in S. If the consensus error between y and S is the minimum among all possible strings y of length n, y is called a consensus string of S. For example, given the three strings “AGCAT” 、“AGACT” and “GGAAT” the consensus string of the given strings is “AGAAT” because the sum of the Hamming distances between “AGAAT” and the three strings is 3 which is minimal. (In this case, the consensus string is unique, but in general, there can be more than one consensus string.) We use the consensus string as a representative of the DNA sequence. For the example of Figure 1 above, a consensus string of gene X is “GCAAATGGCTGTGCA” and the consensus error is 7.

UVA1368 DNA Consensus String_#include

Input: Your program is to read from standard input. The input consists of T test cases. The number of test cases T is given in the first line of the input. Each test case starts with a line containing two integers m and n which are separated by a single space. The integer m(4 ≤ m ≤ 50) represents the number of DNA sequences and n(4 ≤ n ≤ 1000) represents the length of the DNA sequences, respectively. In each of the next m lines, each DNA sequence is given.

Output: Your program is to write to standard output. Print the consensus string in the first line of each case and the consensus error in the second line of each case. If there exists more than one consensus string, print the lexicographically smallest consensus string.

DNA序列 DNA Consensus String

题面翻译

输入UVA1368 DNA Consensus String_#include_02个长度均为UVA1368 DNA Consensus String_字符串_03UVA1368 DNA Consensus String_算法_04序列,求一个UVA1368 DNA Consensus String_算法_04序列,到所有序列的总UVA1368 DNA Consensus String_#include_06距离尽量小。两个等长字符串的UVA1368 DNA Consensus String_#include_06距离等于字符不同的位置个数,如UVA1368 DNA Consensus String_ci_08UVA1368 DNA Consensus String_算法_09UVA1368 DNA Consensus String_#include_06距离为UVA1368 DNA Consensus String_ci_11(左数第UVA1368 DNA Consensus String_算法_12UVA1368 DNA Consensus String_字符串_13个字符不同)。

输入整数UVA1368 DNA Consensus String_#include_02UVA1368 DNA Consensus String_字符串_03UVA1368 DNA Consensus String_ci_16UVA1368 DNA Consensus String_字符串_17),以及UVA1368 DNA Consensus String_#include_02个长度为UVA1368 DNA Consensus String_字符串_03UVA1368 DNA Consensus String_算法_04序列,(只包含字母UVA1368 DNA Consensus String_算法_21UVA1368 DNA Consensus String_#include_22UVA1368 DNA Consensus String_ci_23UVA1368 DNA Consensus String_字符串_24),输出到UVA1368 DNA Consensus String_#include_02个序列的UVA1368 DNA Consensus String_ci_26距离和最小的UVA1368 DNA Consensus String_算法_04序列和对应的距离。如有多解,要求字典序最小的解。

题目描述

PDF

输入格式

输出格式

Solution

采用贪心策略

str[i][j]为第i个字符串的第j个元素,因为字符串序列是等长的首先比较相同位置的字符,并记录其A、G、C、T的个数,当所有m个字符串的第i个位置遍历完毕,计算A、G、C、T个数最大,当数目相同时使用字典序最小的的并作为std_s[i]

//
// Created by Gowi on 2023/12/2.
//

#include <iostream>
#include <string>

#define N 55

using namespace std;

int main() {
    string str[N];
    int T;
    cin >> T;
    while (T--) {
        int m, n;
        string std_s;
        cin >> m >> n;
        for (int i = 0; i < m; ++i) {
            cin >> str[i];
        }
        int sum = 0;
        for (int i = 0; i < n; ++i) {
            int a = 0, c = 0, g = 0, t = 0, index_n = 0;
            char index_c = 'Z';
            for (int j = 0; j < m; ++j) {
                if (str[j][i] == 'A') {
                    a++;
                } else if (str[j][i] == 'C') {
                    c++;
                } else if (str[j][i] == 'G') {
                    g++;
                } else {
                    t++;
                }
                index_n = max(a, max(c, max(g, t)));
            }
            if (t == index_n && 'T' < index_c) {
                index_c = 'T';
            }
            if (g == index_n && 'G' < index_c) {
                index_c = 'G';
            }
            if (c == index_n && 'C' < index_c) {
                index_c = 'C';
            }
            if (a == index_n && 'A' < index_c) {
                index_c = 'A';
            }
            std_s += index_c;
            if (index_c != 'A') {
                sum += a;
            }
            if (index_c != 'G') {
                sum += g;
            }
            if (index_c != 'C') {
                sum += c;
            }
            if (index_c != 'T') {
                sum += t;
            }
        }
        std_s[n] = '\0';
        cout << std_s << endl;
        cout << sum << endl;
    }
}


【版权声明】本文内容来自摩杜云社区用户原创、第三方投稿、转载,内容版权归原作者所有。本网站的目的在于传递更多信息,不拥有版权,亦不承担相应法律责任。如果您发现本社区中有涉嫌抄袭的内容,欢迎发送邮件进行举报,并提供相关证据,一经查实,本社区将立刻删除涉嫌侵权内容,举报邮箱: cloudbbs@moduyun.com

  1. 分享:
最后一次编辑于 2023年12月10日 0

暂无评论

推荐阅读
VW0ZAOA6bLNz
最新推荐 更多

2024-05-17