0 points
Submitted by Ivan Chernikov
8 years ago

Effective way to remove duplicates from an original list while (backward) iterating

With help of the link I found rather effective solution (by both performance and memory utilization):

``````def my4_remove_duplicates(lst):
lst.sort()
i = len(lst) - 1
while i > 0:
if lst[i] == lst[i - 1]:
lst.pop(i)
i -= 1
return lst``````

``````Here is my answer, I ended up using "not in"

def remove_duplicates(numbers):
newlist = []
for number in numbers:
if number not in newlist:
newlist.append(number)
return newlist
remove_duplicates([1,2,3,4,5,5,5,6,3,2])``````
501 points
7 years ago

Alternatively, use sets. Sets can’t contain duplicate elements. Convert the original list to a set and back to a list ;)

``````def remove_duplicate(alist):
return list(set(alist))``````
2333 points
Submitted by Klaus Warzecha
8 years ago

Ivan Chernikov 8 years ago

Thanks, Klaus. But it is too easy to use a built-in Python function/constructor set(). The more, the current Python course of CA does not teaches us any set types. I think we should find a solution ourselves as much as possible to get our brains switching on ;-). One of the aim of my example is to show a way modifying iterable object/list on the fly, not just to get one line solution. Look at the thread http://www.codecademy.com/ru/forum_questions/5055ac0237e190000201c873

1 vote

This is the refined (and wrong, as @jwoody has noticed :-) version of @jwoody

``````def refined_remove_duplicates(yourList):
y = len(yourList) - 1
while y > 0:
num_of_duplicates = yourList.count(yourList[y])
if num_of_duplicates > 1:
yourList.remove(yourList[y]) """ this line would be correct if there was a removeall() function in the Python library, or if a programmer realized it  :-) """
y -= num_of_duplicates
else:
y -= 1
return yourList``````

To eliminate exhaustive computing with `count()` function (in case of presence of duplicates) we may do as follows (but there may be a memoization in the original @jwoody’s version, by now I am not a competent though):

``````def refined2_remove_duplicates(yourList):
y = len(yourList) - 1
while y > 0:
num_of_duplicates = yourList.count(yourList[y])
while num_of_duplicates > 1:
yourList.remove(yourList[y])
num_of_duplicates -= 1
y -= 1
y -= 1
return yourList``````

Below is the exact @jwoody version in that here it calls `count()` function in each while loop’s iteration but without extra/inner while loop:

``````def refined_remove_duplicates(yourList):
y = len(yourList) - 1
while y > 0:
if yourList.count(yourList[y]) > 1:
yourList.remove(yourList[y])
y -= 1
return yourList ``````
1595 points
Submitted by Ivan Chernikov
8 years ago

jwoody 8 years ago

I don’t think this will work. With a list like this, myList = [1, 1, 2, 1, 3, 4, 1, 3, 2, 6] I get this [1, 1, 4, 1, 3, 2, 6]

Ivan Chernikov 8 years ago

Oh my God! :-) You are absolutely right! I have forgotten that method remove(x) removes only the first occurrence of x, not all of them. But even if it was so, my version would not work correctly, it would simply delete all uniques of amount > 1 :-) In any case there is no sense in inner while loop, that was that I wanted to accent to. Here is the correct example http://labs.codecademy.com/9zd/2#:workspace

jwoody 8 years ago

ok! That works. Thanks for the correction. I guess I’m over thinking the assignments. Not the first time I’ve done that.

Ivan Chernikov 8 years ago

If you don’t mind, I’ve made some correction to my post and to the example http://labs.codecademy.com/9zd/3#:workspace

this is my version:

def remove_duplicates(yourList): y = len(yourList) - 1 while y > 0:
while yourList.count(yourList[y]) > 1: yourList.remove(yourList[y]) y -= 1 y -= 1 return yourList

309 points
Submitted by jwoody
8 years ago

As for calling `count()` in each iteration, perhaps there is no extra computing, because in each next iteration `count()` “looks through” not the whole `yourList`, but through the result of `count()` function of previous iteration. It is so called “memoization” http://en.wikipedia.org/wiki/Memoization. But I am not sure that this is the case.