c# - Find values that appear in all lists (or arrays or collections) -
given following:
list<list<int>> lists = new list<list<int>>(); lists.add(new list<int>() { 1,2,3,4,5,6,7 }); lists.add(new list<int>() { 1,2 }); lists.add(new list<int>() { 1,2,3,4 }); lists.add(new list<int>() { 1,2,5,6,7 });
what best/fastest way of identifying numbers appear in lists?
to 2 lists 1 use x.intersect(y)
.
to several want like:
var intersection = lists.aggregate((x, y) => x.intersect(y));
but won't work because result of lambda isn't list<int>
, can't fed in. might tempt try:
var intersection = lists.aggregate((x, y) => x.intersect(y).tolist());
but makes n-1 needless calls tolist()
relatively expensive. can around with:
var intersection = lists.aggregate( (ienumerable<int> x, ienumerable<int> y) => x.intersect(y));
which applies same logic, in using explicit types in lambda, can feed result of intersect()
in without wasting time , memory creating list each time, , gives faster results.
if came lot can further (slight) performance improvements rolling our own rather using linq:
public static ienumerable<t> intersectall<t>(this ienumerable<ienumerable<t>> source) { using(var en = source.getenumerator()) { if(!en.movenext()) return enumerable.empty<t>(); var set = new hashset<t>(en.current); while(en.movenext()) { var newset = new hashset<t>(); foreach(t item in en.current) if(set.remove(item)) newset.add(item); set = newset; } return set; } }
this assumes internal use only. if called assembly should have error checks, , perhaps should defined perform intersect operations on first movenext()
of calling code:
public static ienumerable<t> intersectall<t>(this ienumerable<ienumerable<t>> source) { if(source == null) throw new argumentnullexception("source"); return intersectalliterator(source); } public static ienumerable<t> intersectalliterator<t>(ienumerable<ienumerable<t>> source) { using(var en = source.getenumerator()) { if(en.movenext()) { var set = new hashset<t>(en.current); while(en.movenext()) { var newset = new hashset<t>(); foreach(t item in en.current) if(set.remove(item)) newset.add(item); set = newset; } foreach(t item in set) yield return item; } } }
(in these final 2 versions there's opportunity short-circuit if end emptying set, pays off if happens relatively often, otherwise it's nett loss).
conversely, if these aren't concerns, , if know we're ever going want lists, can optimise bit further use of count
, indices:
public static ienumerable<t> intersectall<t>(this list<list<t>> source) { if (source.count == 0) return enumerable.empty<t>(); if (source.count == 1) return source[0]; var set = new hashset<t>(source[0]); for(int = 1; != source.count; ++i) { var newset = new hashset<t>(); var list = source[i]; for(int j = 0; j != list.count; ++j) { t item = list[j]; if(set.remove(item)) newset.add(item); } set = newset; } return set; }
and further if know we're going want results in list, , know either won't mutate list, or won't matter if input list got mutated, can optimise case of there being 0 or 1 lists (but costs more if might ever not need output in list):
public static list<t> intersectall<t>(this list<list<t>> source) { if (source.count == 0) return new list<t>(0); if (source.count == 1) return source[0]; var set = new hashset<t>(source[0]); for(int = 1; != source.count; ++i) { var newset = new hashset<t>(); var list = source[i]; for(int j = 0; j != list.count; ++j) { t item = list[j]; if(set.remove(item)) newset.add(item); } set = newset; } return new list<t>(set); }
again though, making method less widely-applicable, has risks in terms of how used, appropriate internal code can know either won't change either input or output after fact, or won't matter.
Comments
Post a Comment