4

I've got 2 arrays

    a=(1,2,3,4,5)
    b=(2,4)

the output should be

    c=(1,3,5) 

(which should be the result of a-b)

I've tried using

    unset a[${b}] 

Any ideas?

What I have working now is a loop that runs through 700,000 iterations.

Deyan
  • 53

5 Answers

5

First of all, if you want to do this for 700,000 iterations, you really should look into something other than bash. Also, what you show is not an array in bash, it's a string [1]. Array elements are separated by spaces, not commas.

That said, here's a bash way, assuming true arrays:

a=(1 2 3 4 5)
b=(2 4)
# Keep only the values that occur exactly once across a and b combined
c=( $(printf "%s\n" "${a[@]}" "${b[@]}" | sort | uniq -u) )

If you actually have comma-separated strings and not arrays, use this instead:

a=(1,2,3,4,5)
b=(2,4)
c=( $(sed 's/,/\n/g' <(printf "%s\n" "${a[@]}" "${b[@]}") | sort | uniq -u) )

Alternatively, you could use perl instead:

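#!/usr/bin/perl
use strict;
use warnings;
my @A=(1,2,3,4,5);
my @B=(2,4);
my %k;
# Count how often each value occurs across both lists
$k{$_}++ for (@A,@B);
# Keep the values seen exactly once; sort because hash key order is not stable
my @C=sort { $a <=> $b } grep { $k{$_}==1 } keys(%k);
print "@C\n";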

[1] Strictly speaking, it's an array with one element, but that is essentially the same as a string as far as bash is concerned.
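
The footnote's point can be checked directly. The following is a minimal sketch, not part of the original answer: it counts the elements of the comma-separated form and then splits that single element on commas into a real array. The name a_split is hypothetical and chosen only for illustration.

```shell
#!/usr/bin/env bash
a=(1,2,3,4,5)

# The comma form is a single word, so the array has exactly one element.
echo "${#a[@]}"

# Split that one element on commas into a true array.
# a_split is a hypothetical name used only in this sketch.
IFS=, read -r -a a_split <<< "${a[0]}"
echo "${#a_split[@]}"
```

The first echo prints 1 and the second prints 5, so the split version is what the array-based commands in this answer expect.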

terdon
  • 104,119
1

Without extra data structures, the complexity of the task (O(N^2)) cannot be reduced. Assuming only what has been explicitly given (i.e. no hypothesis about the values in arrays a and b), the code below checks whether each value present in a is also present in b. If it is, the inner for loop breaks early (the only optimization possible under these assumptions) and the value is skipped; otherwise the value is added to c. With more clues about the contents of a and b (e.g. whether the values in each array might be repeated and whether the arrays are sorted), there might be some more room for improvement:

#!/bin/bash

a=(1 2 3 4 5)
b=(2 4)
c=()
for i in "${a[@]}"
do
    match=0
    for j in "${b[@]}"
    do
        if [ "${i}" == "${j}" ]
        then
            match=1
            break
        fi
    done
    # Keep the value only if nothing in b matched it
    if [ "${match}" == 0 ]
    then
        c+=("$i")
    fi
done
echo "${c[@]}"
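
For comparison, here is a sketch of a different technique that is not part of the answer above: a lookup table built from b, so the inner loop is replaced by a single key test. It assumes bash 4.0 or newer (associative arrays via declare -A) and that no element is an empty string, since an empty key is not a valid subscript. The name in_b is hypothetical.

```shell
#!/usr/bin/env bash
# Assumes bash >= 4.0 (declare -A) and non-empty array elements.
a=(1 2 3 4 5)
b=(2 4)

# in_b is a hypothetical lookup table whose keys are the elements of b.
declare -A in_b=()
for j in "${b[@]}"; do
    in_b["$j"]=1
done

c=()
for i in "${a[@]}"; do
    # Append i to c only when it is not a key of the lookup table.
    [[ -n "${in_b[$i]:-}" ]] || c+=("$i")
done

echo "${c[@]}"
```

This prints 1 3 5. Each element of a and b is visited once, and duplicates in a are kept in their original order.
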
kos
  • 41,268
1

In python, entries like:

a=(1,2,3,4,5)
b=(2,4)

are iterables and known as tuples.

Your task can be easily done in python:

#!/usr/bin/env python2
a = (1, 2, 3, 4, 5)
b = (2, 4)
c = tuple(i for i in a if i not in b)
print c

Output :

(1, 3, 5)

Here we have found the values of tuple a that do not exist in tuple b and put them in another tuple c. The generator expression avoids building an intermediate list, but note that every "i not in b" test scans b, so for larger data sets it is worth converting b to a set first.

heemayl
  • 93,925
0

@terdon's accepted answer doesn't actually remove items from a that are also in b, but rather concatenates both lists and removes non-unique values. This is a huge difference when b includes items that are not in a: c will contain values that are in b, but not in a.

Here's a pure Bash solution for actually removing items from a that are also in b (note the additional 6 in b):

a=(1 2 3 4 5)
b=(2 4 6)
c=( $(printf "%s\n" "${a[@]}" "${b[@]}" "${b[@]}" | sort | uniq -u) )

c will consist of 1 3 5 and exclude the unexpected 6.
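
To make the difference concrete, here is a small sketch that runs both variants on the same input. The names sym and diff are hypothetical and used only for this comparison.

```shell
#!/usr/bin/env bash
a=(1 2 3 4 5)
b=(2 4 6)

# b listed once: every value that occurs exactly once overall survives,
# including values that are only in b.
sym=( $(printf "%s\n" "${a[@]}" "${b[@]}" | sort | uniq -u) )

# b listed twice: every value of b now occurs at least twice,
# so uniq -u can never emit it.
diff=( $(printf "%s\n" "${a[@]}" "${b[@]}" "${b[@]}" | sort | uniq -u) )

echo "b once:  ${sym[*]}"
echo "b twice: ${diff[*]}"
```

The first line of output is "b once:  1 3 5 6" and the second is "b twice: 1 3 5".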

If your array items might contain whitespace, use mapfile to construct c instead, as suggested by @David on Stack Overflow:

a=(1 2 3 4 5)
b=(2 4 6)
mapfile -t c < <(printf "%s\n" "${a[@]}" "${b[@]}" "${b[@]}" | sort | uniq -u)

Please note: This assumes that all values in a are unique. Otherwise the duplicated values won't show up in c. If a contains duplicate values, you must remove the duplicates first (note the duplicate 1 in a; possibly use mapfile if your items contain whitespace):

a=(1 1 2 3 4 5)
b=(2 4 6)
c=( $({ printf "%s\n" "${a[@]}" | sort -u; printf "%s\n" "${b[@]}" "${b[@]}"; } | sort | uniq -u) )

If you want to replicate the duplicates in a to c, go with @kos' answer. The same is true if a and b are huge: this is a very inefficient solution for a lot of items, even though the cost is negligible for a few items (<100). If you need to process huge arrays, don't use Bash.
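
As another option for keeping duplicates, here is a sketch that swaps in a different tool, grep, rather than the sort and uniq pipeline above. It uses the standard options -v (invert), -x (whole-line match), -F (fixed strings) and -f (read patterns from a file) to drop every line of a that equals some element of b. It assumes bash 4 or newer for mapfile and that no item contains a newline.

```shell
#!/usr/bin/env bash
# Assumes bash >= 4.0 (mapfile) and items without embedded newlines.
a=(1 1 2 3 4 5)
b=(2 4 6)

# Print a one item per line, then remove the lines that exactly match
# any line of b. Order and duplicates of a are preserved.
mapfile -t c < <(printf '%s\n' "${a[@]}" | grep -vxFf <(printf '%s\n' "${b[@]}"))

echo "${c[@]}"
```

This prints 1 1 3 5: the duplicate 1 from a is kept, and the 6 that exists only in b never appears.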

0

The perl way. This script displays all elements in array a that are not contained in array b.

#!/usr/bin/perl
my @a = (1,2,3,4,5);
my @b = (2,4,7,8,9,10);

# Create a hashmap with the entries in b as keys,
# but without values for the keys for a better lookup
# (exists ($hash{$element}))
my %hash;
@hash{@b}=();

foreach my $element (@a) {print "$element " unless exists($hash{$element})}
print "\n";

Output:

1 3 5
A.B.
  • 92,125