2

I have multiple excel files in this format.
They are clock in and clock out date and time.

---------------------------------
| Name     | Time               |
---------------------------------
| Person A | 03-Jul-17 8:15 AM  |
| Person A | 03-Jul-17 10:32 AM |
| Person A | 03-Jul-17 1:56 PM  |
| Person A | 03-Jul-17 6:15 PM  |
| Person A | 04-Jul-17 8:29 AM  |
| Person A | 04-Jul-17 8:58 AM  |
| Person A | 04-Jul-17 9:43 AM  |
| Person A | 04-Jul-17 1:03 PM  |
| Person A | 04-Jul-17 2:17 PM  |
| Person A | 04-Jul-17 5:58 PM  |
.
.
.
| Person A | 31-Jul-17 7:45 AM  |
| Person A | 31-Jul-17 8:10 AM  |
| Person A | 31-Jul-17 3:26 PM  |
| Person A | 31-Jul-17 7:29 PM  |
---------------------------------

I would want to extract this data and save it as a new excel file in this format:

---------------------------------------------
| Name     | Date      | Time In | Time Out |
---------------------------------------------
| Person A | 03-Jul-17 | 8:15 AM | 6:15PM   |
| Person A | 04-Jul-17 | 8:29 AM | 5:58PM   |
.
.
.
| Person A | 31-Jul-17 | 7:45 AM | 7:29PM   |
---------------------------------------------

Basically it's to arrange the data one entry per date with the earliest time for that date as the Time In and the latest time for that date as the Time Out.

There are multiple excel files in this format, and doing it manually is going to take too long.

If you want to convert them to .csv first edit then convert them back to .xlsx, it's cool.

PS: Bounty of 200 rep up for grabs.

Parto
  • 15,647

3 Answers3

2

There are 2 scripts you need. The one to convert from XLS to CSV is a command xls2csv and the other one a script from github: csv2xls (another csv2xls). There are also csv2xlsx (and another csv2xlsx).

In between the 2 conversions you can edit the files using your favorite tool.

If you want to do it yourself: xlsx files (and the same applies ODT (open/libeoffice) are zipped archives and contain an XML with the data. You can unzip then and the data is in an XML. Manipulating the XML is a bit more difficult that a CSV sure but when the manipulating is automated it becomes rather efficient.

Rinzwind
  • 309,379
1

I converted the files to csv and used a PHP script to parse the content creating it into how I actually wanted. The result was then saved in a new file then those files were converted back to xls then merged into one notebook.

The conversion and merging part was done manually. It's not the best solution but it's working for now.

Here's the script:

// get list of files from data directory
$files = array_diff(scandir('./data'), array('.', '..'));
foreach($files as $file):

    // get all data from the csv file and save in the $data array
    $csvFile = file('data/'.$file);
    $data = $list = [];
    foreach($csvFile as $line) {
        $data[] = str_getcsv($line);
    }
    unset($data[0]);

    // parse the data array and get the different sections: name. date and time
    foreach($data as $v) {
        $date = strtotime($v[1]);
        $list[date('d-m-Y',$date)][] = array(
            'name'=>$v[0],
            'date'=>date('d/m/Y',$date),
            'in'=>$date
        );
    }

    // create a new array and save parsed data in it with header columns
    $new = array(array('Name','Date','Time In','Time Out'));
    foreach($list as $k => $v) {
        $out = max(array_column($v, 'in'));
        $name = $v[0]['name'];
        $new[] = array(
            'name'=>ucwords(strtolower($name)),
            'date'=>$v[0]['date'],
            'in'=>date('h:i A', $v[0]['in']),
            'out'=>date('h:i A', $out)
        );
    }

    // The name of a new file in the new directory using this file name
    $filename = str_replace('.csv', '', basename($file));
    $fn = strtolower($filename.'-log.csv');

    // open the file and output the new array as CSV
    $out = fopen('new/'.$fn, 'w');
    foreach($new as $l) {
        fputcsv($out, $l, ",",'"');
    }
    fclose($out);

endforeach;
Parto
  • 15,647
0

Thanks to @rinzwind I managed to create the following bash script that unzips the xlsx, uses sed to replace certain strings to something else, and then re-zip it again. The script sets autocalculate on, so that if you have any formulas, they are also updated (by default not).

#!/bin/bash
if ( [ -z $1 ] || [ -z $2 ] ); then
    echo "Usage: create-xlsx.sh <PLACEHOLDER_A> <PLACEHOLDER_B>"
    exit 1
fi

DIR=/tmp/mydir rm -rf $DIR mkdir -p $DIR cd $DIR #Note! xlsx is a zip-file #Note! Original xlsx needs to have autocalculate set #See https://stackoverflow.com/questions/18355691/set-xlsx-to-recalculate-formulae-on-open unzip -d $DIR /path/to/original.xlsx sed -i "s/calcPr iterateCount=&quot;100&quot;/calcPr calcMode=&quot;auto&quot; fullCalcOnLoad=&quot;1&quot; iterateCount=&quot;100&quot;/g" $DIR/xl/workbook.xml sed -i "s/PLACEHOLDER_A/$1/g" $DIR/xl/worksheets/sheet1.xml sed -i "s/PLACEHOLDER_B/$2/g" $DIR/xl/worksheets/sheet1.xml zip -r /tmp/output.xlsx *