12

I am looking for an application that can compare two C++ source files and report only the code-meaningful differences (to compare versions which may have been formatted differently). At the very minimum, it should be able to ignore changes in spaces, tabs and newlines which do not affect the functionality of the source (note that whether a newline counts as whitespace is language-dependent; in C and C++ it does). Ideally, it would identify exactly the code-meaningful differences. I am on Ubuntu.

Judging by diff --help | grep ignore, I expected diff -bBwZ to do the job reasonably well (I expected to get some false negatives, to be dealt with later). It doesn't, though.

If I have the following files with snippets

test_diff1.txt

    else if (prop == "P1") { return 0; }

and test_diff2.txt

    else if (prop == "P1") {
        return 0;
    }

then

$ diff -bBwZ test_diff1.txt test_diff2.txt
1c1,3
<     else if (prop == "P1") { return 0; }
---
>     else if (prop == "P1") {
>         return 0;
>     }

instead of empty results.

Using a code formatter as a "filter" on both inputs may filter out these differences, but the resulting output would then have to be mapped back to the original inputs, so that the final report of differences keeps the actual text and line numbers. So the objective should be attainable without needing a full compiler... I do not know if something like that is available, though.

Can the objective be attained with diff? Otherwise, is there an alternative (preferably, for command line)?

muru
  • 207,228

3 Answers

10

You can use dwdiff. From man dwdiff:

dwdiff - a delimited word diff program

The program is quite capable; see dwdiff --help:

$ dwdiff --help
Usage: dwdiff [OPTIONS] <OLD FILE> <NEW FILE>
-h, --help                             Print this help message
-v, --version                          Print version and copyright information
-d <delim>, --delimiters=<delim>       Specify delimiters
-P, --punctuation                      Use punctuation characters as delimiters
-W <ws>, --white-space=<ws>            Specify whitespace characters
-u, --diff-input                       Read the input as the output from diff
-S[<marker>], --paragraph-separator[=<marker>]  Show inserted or deleted blocks
                               of empty lines, optionally overriding the marker
-1, --no-deleted                       Do not print deleted words
-2, --no-inserted                      Do not print inserted words
-3, --no-common                        Do not print common words
-L[<width>], --line-numbers[<width>]   Prepend line numbers
-C<num>, --context=<num>               Show <num> lines of context
-s, --statistics                       Print statistics when done
--wdiff-output                         Produce wdiff compatible output
-i, --ignore-case                      Ignore differences in case
-I, --ignore-formatting                Ignore formatting differences
-m <num>, --match-context=<num>        Use <num> words of context for matching
--aggregate-changes                    Allow close changes to aggregate
-A <alg>, --algorithm=<alg>            Choose algorithm: best, normal, fast
-c[<spec>], --color[=<spec>]           Color mode
-l, --less-mode                        As -p but also overstrike whitespace
-p, --printer                          Use overstriking and bold text
-w <string>, --start-delete=<string>   String to mark begin of deleted text
-x <string>, --stop-delete=<string>    String to mark end of deleted text
-y <string>, --start-insert=<string>   String to mark begin of inserted text
-z <string>, --stop-insert=<string>    String to mark end of inserted text
-R, --repeat-markers                   Repeat markers at newlines
--profile=<name>                       Use profile <name>
--no-profile                           Disable profile reading

Test it with:

cat << EOF > test_diff1.txt
    else if (prop == "P1") { return 0; }
EOF

cat << EOF > test_diff2.txt
    else if (prop == "P1") {
        return 0;
    }
EOF

Then launch comparison:

$ dwdiff test_diff1.txt test_diff2.txt --statistics
    else if (prop == "P1") {
        return 0;
    }
old: 9 words  9 100% common  0 0% deleted  0 0% changed
new: 9 words  9 100% common  0 0% inserted  0 0% changed

Note the 100% common in the statistics above.
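For a quick check without installing anything, the same word-level idea can be roughly approximated with standard tools. This is only a sketch and not how dwdiff works internally: the `words` helper and the `words1.txt`/`words2.txt` file names are made up for illustration, and a "word" here is simply any run of non-whitespace characters, so whitespace inside string literals is ignored too.

```shell
# Hypothetical helper: print one whitespace-delimited word per line.
words() {
    tr -s '[:space:]' '\n' < "$1" | sed '/^$/d'
}

# Recreate the two snippets from the question.
printf '%s\n' '    else if (prop == "P1") { return 0; }' > test_diff1.txt
printf '%s\n' '    else if (prop == "P1") {' '        return 0;' '    }' > test_diff2.txt

words test_diff1.txt > words1.txt
words test_diff2.txt > words2.txt

# diff exits with status 0 when both word streams are identical.
if diff words1.txt words2.txt > /dev/null; then
    result="same words"
else
    result="words differ"
fi
echo "$result"
```

Unlike dwdiff, this only tells you whether the word streams differ (or which words, if you look at the diff output); it does not point back to the original lines.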

N0rbert
  • 103,263
1

I doubt this is something that diff can do. If the whitespace changes are within a line, then it will work (as will similar programs like kompare). At worst, you can do a search-and-replace and collapse tab characters, etc. But what you're asking for is handling of whitespace changes that span more than one line...

You would need a program that understands the C++ language. Note that all languages are different and Python, in particular, uses whitespace to define code blocks. As such, I doubt any general diff-like program would work with "any" (or a specific) programming language.

You might consider some kind of parser to go through the two source files and then compare the outputs of this parser.

This is beyond my background, but I suggest you look into Lex and Yacc. These are Wikipedia pages; you might want to take a look at this page which gives a concise explanation and an example.
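As a very rough stand-in for a real lexer, the "compare the outputs of a parser" idea can be sketched with grep. Everything here is an assumption for illustration: the `tokens` and `same_tokens` helpers, the file names, and above all the token pattern, which only knows simple string literals, identifiers, integers and single punctuation characters. It does not handle comments, escaped quotes, character literals or the preprocessor.

```shell
# Hypothetical helper: print one C-like token per line.
# A token is a simple string literal, an identifier/keyword, an integer,
# or any other single non-whitespace character.
tokens() {
    grep -oE '"[^"]*"|[A-Za-z_][A-Za-z0-9_]*|[0-9]+|[^[:space:]]' "$1"
}

# Hypothetical helper: succeeds when two files yield the same token stream.
same_tokens() {
    tokens "$1" > tokens_a.txt
    tokens "$2" > tokens_b.txt
    diff tokens_a.txt tokens_b.txt > /dev/null
}

# Three sample files: two layouts of the same code, and one real change.
printf '%s\n' 'else if(prop=="P1"){return 0;}' > compact.cpp
printf '%s\n' '    else if (prop == "P1") {' '        return 0;' '    }' > spaced.cpp
printf '%s\n' 'else if(prop=="P1"){return 1;}' > changed.cpp

if same_tokens compact.cpp spaced.cpp; then
    echo "compact vs spaced: same tokens"
fi
if ! same_tokens compact.cpp changed.cpp; then
    echo "compact vs changed: tokens differ"
fi
```

Because punctuation is split off, `if(x)` and `if (x)` compare equal here, which a purely whitespace-based comparison would miss. A proper lexer would be needed for anything beyond simple cases.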

Ray
  • 2,200
0

In a similar situation, when I needed to compare two git branches in a formatting-agnostic way, I did this:

  1. created temporary branches:

    $ git checkout feature-a
    $ git checkout -b 1
    $ git checkout feature-b
    $ git checkout -b 2
    
  2. formatted both branches using clang-format:

    $ git checkout 1
    $ find . -name '*.cpp' -print0 | parallel -0 -n 1 clang-format -i -style=google
    $ git commit -a -m1 --no-verify
    $ git checkout 2
    $ find . -name '*.cpp' -print0 | parallel -0 -n 1 clang-format -i -style=google
    $ git commit -a -m2 --no-verify
    
  3. did actual comparison:

    $ git diff -w -b 1 2
    

    (-w and -b ignore whitespace differences, just in case).

You may prefer uncrustify over clang-format (uncrustify's mod_full_brace_if may be used to enforce insertion/removal of curly braces around single-line if's body).

Also, if GNU parallel isn't installed, use xargs: it does the same, it just takes a little longer.
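A sketch of the xargs variant of the formatting step, assuming the same file pattern and style as in the steps above. The `echo` is added here so that the commands are only printed rather than run:

```shell
# Dry run: "echo" makes xargs print each command instead of running it.
# -0 pairs with find's -print0; -n 1 passes one file per invocation.
find . -name '*.cpp' -print0 | xargs -0 -n 1 echo clang-format -i -style=google
```

Remove `echo` once the printed commands look right; the files are then reformatted in place.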

wjandrea
  • 14,504