TY - GEN
T1 - HAMFAST
T2 - 2009 WRI World Congress on Computer Science and Information Engineering, CSIE 2009
AU - Pappalardo, Francesco
AU - Calonaci, Cristiano
AU - Pennisi, Marzio
AU - Mastriani, Emilio
AU - Motta, Santo
PY - 2009
Y1 - 2009
N2 - Similarity is a vague concept which can be treated in a quantitative manner only using appropriate mathematical representation of the objects to compare and a metric on the space representation. In biology the mathematical representation of structure relies on strings taken from an alphabet of m symbols. Very often binary strings, m = 2, are used. The size of the binary string depends on the complexity of the structure to represent, so the string can be quite long. The Hamming distance is the most used metric with binary strings. The computational effort required to compute the Hamming distance linearly depends on the size of the string. However even a linear effort case may be computational heavy if many computations are required. One of the fastest computational approach to evaluate Hamming distances relies on look-up tables. The computational performance, however, rapidly deteriorates with the size of binary string length, due to cache misses. We present a computational strategy and implementation which can handle huge number of Hamming distance evaluation between binary strings of arbitrary length keeping computational performance competitive.
AB - Similarity is a vague concept which can be treated in a quantitative manner only using appropriate mathematical representation of the objects to compare and a metric on the space representation. In biology the mathematical representation of structure relies on strings taken from an alphabet of m symbols. Very often binary strings, m = 2, are used. The size of the binary string depends on the complexity of the structure to represent, so the string can be quite long. The Hamming distance is the most used metric with binary strings. The computational effort required to compute the Hamming distance linearly depends on the size of the string. However even a linear effort case may be computational heavy if many computations are required. One of the fastest computational approach to evaluate Hamming distances relies on look-up tables. The computational performance, however, rapidly deteriorates with the size of binary string length, due to cache misses. We present a computational strategy and implementation which can handle huge number of Hamming distance evaluation between binary strings of arbitrary length keeping computational performance competitive.
UR - http://www.scopus.com/inward/record.url?scp=70449116216&partnerID=8YFLogxK
U2 - 10.1109/CSIE.2009.223
DO - 10.1109/CSIE.2009.223
M3 - Conference contribution
AN - SCOPUS:70449116216
SN - 9780769535074
T3 - 2009 WRI World Congress on Computer Science and Information Engineering, CSIE 2009
SP - 569
EP - 572
BT - 2009 WRI World Congress on Computer Science and Information Engineering, CSIE 2009
Y2 - 31 March 2009 through 2 April 2009
ER -