Add YAZ's line mode MARC as another output format
Right now:
-format string    Output format. Accepted values: mrk, mrc, xml, json, or solr. (default "mrk")
Please add the yaz-marcdump native format also, called line. From the man page:
-o format    Specifies output format. Must be one of marcxml, marc (ISO2709), marcxchange (ISO25577), line (line mode MARC), turbomarc (Turbo MARC), or json (MARC-in-JSON).
Could you provide an example of what the format you are interested in looks like?
I am not sure how different that format is from the default format in marcli (mrk, also known as Mnemonic MARC).
$ yaz-marcdump ./pkg/marc/testdata/test_1a.mrc
01805nam a2200385 i 4500
001 ocm57175940
005 20041206161421.0
006 m d f
007 cr cn-
008 041206s1976 dcua sb f000 0 eng c
040 $a GPO $c GPO $d MvI $d MvI
042 $a pcc
043 $a n-us---
074 $a 0620-A (online)
086 0 $a I 19.4/2:735
100 1 $a Swanson, Vernon E. $q (Vernon Emmanuel), $d 1922-1992.
245 10 $a Guidelines for sample collecting and analytical methods used in the U.S. Geological Survey for determining chemical composition of coal $h [electronic resource] / $c by Vernon E. Swanson and Claude Huffman, Jr.
260 $a [Washington, D.C.] : $b U.S. Dept. of the Interior, U.S. Geological Survey, $c 1976.
336 $a text $2 rdacontent.
337 $a computer $2 rdamedia.
338 $a online resource $2 rdacarrier.
440 0 $a Geological Survey circular ; $v 735.
500 $a Title from title screen (viewed on Dec. 06, 2004)
504 $a Includes bibliographical references.
538 $a Mode of access: Internet from the USGS Web site. Address as of 12/06/04: http://pubs.usgs.gov/circ/c735/index.htm; current access is available via PURL.
650 0 $a Coal $x Analysis.
650 0 $a Coal $x Sampling.
700 1 $a Huffman, Claude.
776 1 $a Swanson, Vernon Emanuel, $d 1922- $t Guidelines for sample collecting and analytical methods used in the U.S. Geological Survey for determining chemical composition of coal $h iv, 11 p. $w (OCoLC)2331861.
856 40 $u http://purl.access.gpo.gov/GPO/LPS56007 $z View online version
907 $a .b37991760 $b 04-08-17 $c 07-26-05
998 $a es001 $b 07-26-05 $c m $d a $e - $f eng $g dcu $h 0 $i 1
910 $a MARCIVE
910 $a Hathi Trust report None
945 $g 0 $j 0 $l esb $o n $p $0.00 $q $r $s - $t 255 $u 0 $v 0 $w 0 $x 0 $y .i138993579 $z 07-26-05
It's a pretty standard suite of MARC tools from Index Data, creators of Zebra, used on most Koha deployments. Unlike MarcEdit, it is FOSS.
Also, if you use bat, you could try this syntax highlighting.
Oh, I see. It's pretty close to the format that I use by default (as shown below) but not identical. It looks easy to implement, so I'll take a look at it in the next few weeks.
$ ./marcli -file test_1a.mrc
=LDR 01805nam a2200385 i 4500
=001 ocm57175940
=005 20041206161421.0
=006 m d f
=007 cr cn-
=008 041206s1976 dcua sb f000 0 eng c
=040 \\$aGPO$cGPO$dMvI$dMvI
=042 \\$apcc
=043 \\$an-us---
=074 \\$a0620-A (online)
=086 0\$aI 19.4/2:735
=100 1\$aSwanson, Vernon E.$q(Vernon Emmanuel),$d1922-1992.
=245 10$aGuidelines for sample collecting and analytical methods used in the U.S. Geological Survey for determining chemical composition of coal$h[electronic resource] /$cby Vernon E. Swanson and Claude Huffman, Jr.
=260 \\$a[Washington, D.C.] :$bU.S. Dept. of the Interior, U.S. Geological Survey,$c1976.
=336 \\$atext$2rdacontent.
=337 \\$acomputer$2rdamedia.
=338 \\$aonline resource$2rdacarrier.
=440 \0$aGeological Survey circular ;$v735.
=500 \\$aTitle from title screen (viewed on Dec. 06, 2004)
=504 \\$aIncludes bibliographical references.
=538 \\$aMode of access: Internet from the USGS Web site. Address as of 12/06/04: http://pubs.usgs.gov/circ/c735/index.htm; current access is available via PURL.
=650 \0$aCoal$xAnalysis.
=650 \0$aCoal$xSampling.
=700 1\$aHuffman, Claude.
=776 1\$aSwanson, Vernon Emanuel,$d1922-$tGuidelines for sample collecting and analytical methods used in the U.S. Geological Survey for determining chemical composition of coal$hiv, 11 p.$w(OCoLC)2331861.
=856 40$uhttp://purl.access.gpo.gov/GPO/LPS56007$zView online version
=907 \\$a.b37991760$b04-08-17$c07-26-05
=998 \\$aes001$b07-26-05$cm$da$e-$feng$gdcu$h0$i1
=910 \\$aMARCIVE
=910 \\$aHathi Trust report None
=945 \\$g0$j0$lesb $on$p$0.00$q $r $s-$t255$u0$v0$w0$x0$y.i138993579$z07-26-05
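Comparing the two listings above, the mapping for data fields looks mechanical: drop the leading `=`, turn backslash indicators into spaces, and put a space on both sides of each subfield code. A minimal bash sketch of that idea follows. `mrk_to_line` is a hypothetical helper, not part of marcli or YAZ, and the exact spacing of the line format is an assumption read off the sample output. It also assumes `$` never occurs inside subfield data, which the 945 field above (`$p$0.00`) violates, so a real implementation would have to work from the parsed record rather than from mrk text.

```shell
#!/bin/bash
# Hypothetical helper (not part of marcli or YAZ): converts one mrk data-field
# line, e.g. '=245  10$aTitle', into a line-mode style line.
mrk_to_line() {
    local line=$1
    local tag=${line:1:3}              # skip the leading '='
    local rest=${line:4}
    rest=${rest#"${rest%%[! ]*}"}      # drop the spaces between tag and indicators
    local ind=${rest:0:2}
    ind=${ind//\\/ }                   # mrk writes a blank indicator as a backslash
    local data=${rest:2}
    local out="$tag $ind" part
    local -a parts
    # Split on '$'. Assumption: '$' never occurs inside subfield data.
    IFS='$' read -r -a parts <<< "$data"
    for part in "${parts[@]}"; do
        [ -n "$part" ] || continue     # skip the empty piece before the first '$'
        out+=" \$${part:0:1} ${part:1}"
    done
    printf '%s\n' "$out"
}

mrk_to_line '=245  10$aTitle /$cAuthor.'
# prints: 245 10 $a Title / $c Author.
```

The leader and control fields (001 to 008) have no indicators or subfields, so they would need a separate branch that only strips the `=` prefix.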
marcli_linux -file "test_10.mrc" -format yaz > "with_marcli_yaz.txt"
yaz-marcdump "test_10.mrc" > "with-yaz-marcdump.txt"
diff "with_marcli_yaz.txt" "with-yaz-marcdump.txt"
There are several differences. At least: there is no newline between records (or at the end), and there is an extra space at the end of each data field.
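To check whether anything differs beyond these two whitespace issues, both outputs could be normalized before diffing. The sketch below is only a triage aid under that assumption: `normalize` is a hypothetical helper that strips trailing whitespace and drops blank lines, so any remaining diff output points at differences in field content.

```shell
#!/bin/bash
# Hypothetical helper: strip trailing whitespace and drop blank lines so that
# only differences in field content remain.
normalize() {
    sed -e 's/[[:space:]]*$//' -e '/^$/d'
}

# Usage sketch with the commands from this thread (not run here):
#   diff <(yaz-marcdump test_10.mrc | normalize) \
#        <(marcli_linux -file test_10.mrc -format yaz | normalize)

# Small demonstration: the trailing space and the blank line are removed.
printf '245 10 $a Title \n\n650  0 $a Coal\n' | normalize
```

Once the whitespace issues are fixed, the plain `diff` without `normalize` remains the stricter check.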
Ah! I didn't think about testing with multiple records in the file. Let me take a closer look. Thank you for testing it so quickly and reporting the errors!
Version 1.3.1 fixes this: the output that I get with marcli is now identical to the one from yaz-marcdump. Let me know if you see other issues. Thank you!
I tested with more files. Some issues found:
- Removed 001 if 100% numeric
- Removed spaces at the end of last subfield
- Removed subfields present but empty
- Problems with some Unicode characters (Greek letters, diacritics). For example: México turns to México
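One common cause of corrupted diacritics is UTF-8 bytes being reinterpreted as a single-byte encoding such as ISO-8859-1. Whether that is what happens here is not established in this thread, so the sketch below is only a way to reproduce that particular failure for comparison. `simulate_mojibake` is a hypothetical helper, and it assumes `iconv` is installed. For "México" it yields "MÃ©xico".

```shell
#!/bin/bash
# Simulate one suspected failure mode: take the UTF-8 bytes of a string and
# reinterpret them as ISO-8859-1 (an assumption about the cause, not a diagnosis).
simulate_mojibake() {
    printf '%s' "$1" | iconv -f ISO-8859-1 -t UTF-8
}

# \xc3\xa9 is the UTF-8 encoding of "é", so the argument is "México".
simulate_mojibake "$(printf 'M\xc3\xa9xico')"
echo
```

If the corrupted output from a real record matches what this produces for the same input, the bytes are probably being decoded with the wrong character set somewhere along the way.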
These lines might be handy:
#!/bin/bash
# Compare yaz-marcdump and marcli output for every .mrc file under the current directory.
find . -type f -name "*.mrc" | while IFS= read -r file; do
    echo "Processing $file"
    tmp1=$(mktemp)
    tmp2=$(mktemp)
    # Write straight to the temp files: capturing with $(...) would strip
    # trailing newlines and hide differences at the end of the output.
    yaz-marcdump "$file" > "$tmp1"
    marcli_linux -file "$file" -format yaz > "$tmp2"
    if ! diff -q "$tmp1" "$tmp2" > /dev/null; then
        echo "Differences found in $file (first 5 differences side-by-side):"
        diff --color=always -y --suppress-common-lines "$tmp1" "$tmp2" | head -n 5
    else
        echo "Outputs are identical for $file"
    fi
    echo
    rm "$tmp1" "$tmp2"
done