3

I have a JSON file in which I need to remove only the trailing forward slash from each URL. See the example:

{"url":"http://example.com/vary/file/","originalUrl":"http://example.com/vary/file/","applications":[{.........}]}

I just want the data to look like:

{"url":"example.com/vary/file","originalUrl":"example.com/vary/file","applications":[{.........}]}

How can I do this with sed?

4 Answers

6

If you insist on using sed, you could just match the /" combination to remove the last / in every field, assuming it does not occur anywhere you want to keep it (which should be fairly reliable in this case).

$ sed 's|/"|"|g' file
{"url":"http://example.com/vary/file","originalUrl":"http://example.com/vary/file","applications":[{.........}]}

I used | to delimit instead of / to save a backslash. You need g for multiple matches on the same line.

Here's a way to take out the http:// as well in the same call:

$ sed -r 's|"http://([^"]+)/"|"\1"|g' file
{"url":"example.com/vary/file","originalUrl":"example.com/vary/file","applications":[{.........}]}

([^"]+) will match anything between "http:// and /" that isn't a ". We save this part with () and reference with \1.
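For comparison, the same two substitutions can be sketched in Python. This is only an illustration of what the sed expressions match, not part of the original answer; the function names strip_trailing_slash and strip_scheme_and_slash and the sample line are made up for this example.

```python
import re

def strip_trailing_slash(line):
    # Counterpart of sed 's|/"|"|g': every /" pair becomes "
    return line.replace('/"', '"')

def strip_scheme_and_slash(line):
    # Counterpart of sed -r 's|"http://([^"]+)/"|"\1"|g':
    # capture whatever sits between "http:// and /" without crossing a quote
    return re.sub(r'"http://([^"]+)/"', r'"\1"', line)

line = '{"url":"http://example.com/vary/file/","originalUrl":"http://example.com/vary/file/"}'
print(strip_trailing_slash(line))
print(strip_scheme_and_slash(line))
```

Because [^"]+ cannot cross a quote, each match stays inside a single JSON string, which is why the pattern is safe to apply with g across the whole line.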

Zanna
  • 72,312
6

I took the liberty of modifying OP's input slightly because, as it stands, it is not properly structured JSON (due to the {.........} part). I implemented a small Python script that works with multiple dictionaries, assuming one dictionary per line. Additionally, as discussed in the comments on the question, OP also wanted to remove the http:// part.

The script below implements everything discussed above.

#!/usr/bin/env python
import json
import sys

with open(sys.argv[1]) as f:
    for line in f:
        data = json.loads(line)
        for key in ("url", "originalUrl"):
            # Remove the scheme whether or not a trailing slash is present
            value = data[key].replace('http://', '')
            # endswith() is safe on an empty string, unlike value[-1]
            if value.endswith('/'):
                value = value[:-1]
            data[key] = value
        json.dump(data, sys.stdout)
        print("")

Test run:

$ cat input.txt                                                                                 
{"url":"http://example.com/vary/file/","originalUrl":"http://example.com/vary/file/","applications":[{"somedata": "blah"}]}
{"url":"http://another-example.com/vary/file/","originalUrl":"http://example.com/vary/file/","applications":[{"somedata": "blah"}]}
$ ./remove_slash.py input.txt                                                                   
{"url": "example.com/vary/file", "applications": [{"somedata": "blah"}], "originalUrl": "example.com/vary/file"}
{"url": "another-example.com/vary/file", "applications": [{"somedata": "blah"}], "originalUrl": "example.com/vary/file"}
5

A late one: a simple, purely text-based Python option:

#!/usr/bin/env python3
import sys

with open(sys.argv[1]) as data:
    for l in data:
        print(("").join(l.strip().replace("http://", "").rsplit("/", 1)))

Or, just for fun, another way of saying it:

#!/usr/bin/env python3
import sys

[print(("").join(l.strip().replace("http://", "").rsplit("/", 1))) for l in open(sys.argv[1])]

This does both the string removal (http://) and the slash removal in approximately 47 seconds on 14,000,000 lines, on my ancient system.

To use:

python3 /path/to/script.py /path/to/inputfile > outputfile

Explanation

As usual, python is quite readable, but in detail:

  • rsplit("/", 1) splits the line from the right (hence the r) at the delimiter /, only once (hence the 1), so only the last / on each line is removed
  • l.replace("http://", "") replaces http:// with an empty string
  • ("").join() joins the list created by rsplit() back into a single line
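As a small worked example of the three steps, here is what each intermediate value looks like for one line. The sample line is shortened and is an assumption for illustration. Note that the first field keeps its trailing slash, because rsplit with a limit of 1 splits only at the rightmost / on the line.

```python
line = '{"url":"http://example.com/a/","originalUrl":"http://example.com/a/"}'

# Step 1: drop every occurrence of the scheme
no_scheme = line.replace("http://", "")

# Step 2: split once, starting from the right, at the last "/"
parts = no_scheme.rsplit("/", 1)

# Step 3: glue the two pieces back together without the slash
result = "".join(parts)

print(parts)
print(result)
```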
Jacob Vlijm
  • 85,475
0

Input JSON file (test.json):

{"url":"http://example.com/vary/file/","originalUrl":"http://example.com/vary/file/"}
  • Code to modify as per requirement and re-write to same file:

    import json

    with open("test.json") as fh:
        data = json.load(fh)

    for k, v in data.items():
        data[k] = v.replace("http://", "").strip("/")

    with open("test.json", "w") as fh:
        json.dump(data, fh)

Output:

{"url": "example.com/vary/file", "originalUrl": "example.com/vary/file"}

Both operations happen in one expression: replace() swaps http:// for an empty string, and strip("/") removes any / at the start or end of the string.

replace("http://","").strip("/")
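The loop in this answer assumes every value in the file is a string. On the question's full data, the applications list would raise an AttributeError, since lists have no replace method. A hedged variant, using a hypothetical helper named clean and a made-up sample record, might skip non-string values and use rstrip so that only the right-hand end is touched:

```python
import json

def clean(value):
    # Only strings are rewritten; lists, numbers and so on pass through unchanged
    if isinstance(value, str):
        # rstrip("/") removes every trailing slash, but never a leading one
        return value.replace("http://", "").rstrip("/")
    return value

raw = '{"url":"http://example.com/vary/file/","applications":[{"somedata":"blah"}]}'
data = {key: clean(value) for key, value in json.loads(raw).items()}
print(json.dumps(data))
```

This only cleans top-level values; strings nested inside applications are left as they are.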
StackGuru
  • 101