6

I need to use tar in a pipeline inside a shell script to archive and compress some files.

After having read the manpage for tar I was able to make tar output to stdout by running it along with the -O argument, but I wasn't able to find out how to make it input from stdin. Since many commands read from stdin if no other input it's specified, I tried:

pv ~/foo | tar c -O ~/foo.tar

but that didn't work:

tar: Cowardly refusing to create an empty archive

Try 'tar --help' or 'tar --usage' for more information.

How can I make tar read input from stdin?

kos
  • 41,268

2 Answers2

4

Using -O with c is ignored, since no files are extracted. You'll get the same result either way:

$ tar c -O foo | tar t
foo
$ tar c foo | tar t  
foo

Which is why I find your error surprising, since tar is no coward if you specify a path.

tar cannot read in file data from stdin for creating an archive, since then tar will have no way of knowing what a file is - where one begins or ends, what its path and metadata is.

muru
  • 207,228
0

This Python script from my Unix.StackExchange answer appends stdin to a tar file, giving it an arbitrary name, using Python's tarfile library. It seeks back in the tar to rewrite the header with the right size at eof. The usage would be:

cat foo | tarappend -t mytar -f foo
tar tvf mytar

Here's the tarappend python script:

#!/usr/bin/python
# concat stdin to end of tar file, with given name. meuh on stackexchange
# $Id: tarappend,v 1.3 2015/07/08 11:31:18 meuh $

import sys, os, tarfile, time, copy
from optparse import OptionParser
try:
    import grp, pwd
except ImportError:
    grp = pwd = None

usage = """%prog: ... | %prog -t tarfile -f filename
Appends stdin to tarfile under the given arbitrary filename.
tarfile is created if it does not exist.\
"""

def doargs():
    parser = OptionParser(usage=usage)
    parser.add_option("-f", "--filename", help="filename to use")
    parser.add_option("-t", "--tarfile", help="existing tar archive")
    (options, args) = parser.parse_args()
    if options.filename is None or options.tarfile is None:
        parser.error("need filename and tarfile")
    if len(args):
        parser.error("unknown args: "+" ".join(args))
    return options

def copygetlen(fsrc, fdst):
    """copy data from file-like object fsrc to file-like object fdst. return len"""
    totlen = 0
    while 1:
        buf = fsrc.read(16*1024)
        if not buf:
            return totlen
        fdst.write(buf)
        totlen += len(buf)

class TarFileStdin(tarfile.TarFile):
    def addstdin(self, tarinfo, fileobj):
        """Add stdin to archive. based on addfile() """
        self._check("aw")
        tarinfo = copy.copy(tarinfo)
        buf = tarinfo.tobuf(self.format, self.encoding, self.errors)
        bufoffset = self.offset
        self.fileobj.write(buf)
        self.offset += len(buf)

        tarinfo.size = copygetlen(fileobj, self.fileobj)
        blocks, remainder = divmod(tarinfo.size, tarfile.BLOCKSIZE)
        if remainder > 0:
            self.fileobj.write(tarfile.NUL * (tarfile.BLOCKSIZE - remainder))
            blocks += 1
        self.offset += blocks * tarfile.BLOCKSIZE
        # rewrite header with correct size
        buf = tarinfo.tobuf(self.format, self.encoding, self.errors)
        self.fileobj.seek(bufoffset)
        self.fileobj.write(buf)
        self.fileobj.seek(self.offset)
        self.members.append(tarinfo)

class TarInfoStdin(tarfile.TarInfo):
    def __init__(self, name):
        if len(name)>100:
            raise ValueError(name+": filename too long")
        if name.endswith("/"):
            raise ValueError(name+": is a directory name")
        tarfile.TarInfo.__init__(self, name)
        self.size = 99
        self.uid = os.getuid()
        self.gid = os.getgid()
        self.mtime = time.time()
        if pwd:
            self.uname = pwd.getpwuid(self.uid)[0]
            self.gname = grp.getgrgid(self.gid)[0]

def run(tarfilename, newfilename):
    tar = TarFileStdin.open(tarfilename, 'a')
    tarinfo = TarInfoStdin(newfilename)
    tar.addstdin(tarinfo, sys.stdin)
    tar.close()

if __name__ == '__main__':
    options = doargs()
    run(options.tarfile, options.filename)
Melebius
  • 11,750
meuh
  • 3,444