Flush support while creating large images
Hi, I would like to ask if there is any way in pyvips to flush a pipeline during writing, for two reasons:
- write with streaming input
- update image based on already written parts.
(From what I found, it seems that only a single write at the end of the pipeline is supported.)
What I'm trying to do is create a very large TIFF image (about 165000 x 465000) made of several smaller images. The problem is that the images arrive as a stream, and each image has to be deleted before the next one is available. Another problem is that to write the next part of the image, I need to use data from what should already have been written (which is unavailable, since the pipeline hasn't started yet), but it appears that once a pipeline in pyvips is written it cannot be read again.
So I'd like to ask: is there any way to flush the image, or to run the pipeline step by step? And if so, is there also a way to read from the flushed data?
Many thanks for this library and for your support.
Hi @BLooperZ,
This is supposed to be automatic, I think. Could you give some more details? How many images are you trying to join? What resource is being exhausted?
If you can make a small sample program that shows the problem, that would be great.
Thank you so much for the quick response. There are about 3500 images of size 3600 x 2400.
Here is the sample code I came up with for demonstration:
import os
import pyvips
import numpy as np

INPUT = 'input.jpg'

# from: https://github.com/libvips/pyvips/blob/master/examples/pil-numpy-pyvips.py
from npvips import vips2numpy, numpy2vips


def generate_image(x, y):
    im = pyvips.Image.black(100, 100)
    im = im.colourspace(pyvips.Interpretation.SRGB)
    im = im.draw_circle([(x * 129 + y * 31 + 190) % 256, (x * 39) % 256, (y * 12) % 256], 50, 50, 50, fill=True)
    # for demonstration: the file is overwritten so the data is not available on the next iteration
    im.write_to_file(INPUT)
    return INPUT, 100, 100


def transform_tile(im, tile, x, y):
    # assume this was an external computation which uses numpy
    if x == 0:
        return numpy2vips(vips2numpy(tile))
    left = vips2numpy(im.crop(x - 50, y, 50, 100))
    right = vips2numpy(tile.crop(0, 0, 50, 100))
    arr = np.hstack([right, left])
    return numpy2vips(arr)


def generate_images(cols, rows):
    for row in range(rows):
        for col in range(cols):
            im_file, width, height = generate_image(row, col)
            # tiles are not actually aligned in the general use case; this just simplifies things in this example
            yield im_file, (col * width, row * height)


# is this how to create a new image?
im = pyvips.Image.black(1000, 1000)
im = im.colourspace(pyvips.Interpretation.SRGB)

for im_file, (x, y) in generate_images(10, 10):
    tile = pyvips.Image.new_from_file(im_file, access=pyvips.Access.SEQUENTIAL)

    # this and the next block are mutually exclusive; if both are executed the image is completely black
    transformed = transform_tile(im, tile, x, y)
    im = im.insert(transformed, x, y, expand=True, background=[255.0, 255.0, 255.0])

    # EDIT: this line is what I meant by "next block"
    # im = im.insert(tile, x, y, expand=True, background=[255.0, 255.0, 255.0])

    # # if uncommenting this, the circles will also become completely black
    # # console prints: vips warning: linecache: error reading tile 0x88: VipsJpeg: out of order read at line 100
    # im.write_to_file(
    #     'im.tif',
    #     compression='jpeg',
    #     tile=True, tile_width=512, tile_height=512,
    #     pyramid=True, bigtiff=True
    # )

im.write_to_file(
    'im.tif',
    compression='jpeg',
    tile=True, tile_width=512, tile_height=512,
    pyramid=True, bigtiff=True
)
While writing the example I also made a related discovery: if the image is read (vips2numpy reads a buffer), the pipeline does make progress (resulting in circles of different colours), but then the image must be created again or it becomes unusable (doing an insert makes the image black).
If the image has not been read, all circles have the same (last) colour.
I think the solution is therefore to read a cropped image to make the pipeline flush, but I can't say I understand why it works.
If you have images coming from numpy, they will all need to be in RAM. That's a lot of pixels :( 3600 x 2400 x 3500 is about 30 billion pixels, so roughly 30GB of memory per band.
You'll probably get a stack overflow as well if you do 3500 x = x.insert(y ..) in a row. That's making a very deep pipeline.
I'd guess you'll need to go via the filesystem -- write your 3500 images to a set of files and then assemble them, perhaps in sections.
Thank you. The input itself is indeed on the filesystem, but it gets overwritten; do you mean it has to be copied? (I would like to avoid that.) My question is: is it possible to use the sink itself, or some other temporary file(s), as a "working image" (to read the current image state and to write the changes incrementally), and then write the final state of the image to the target file if it wasn't written already?
(Or perhaps that is what you meant by going through the filesystem?)
If you're using numpy, you will need all the pixels in memory, there's no way around that.
You could perhaps swap numpy for pyvips, that should help a bit. What kind of processing are you doing?
Sorry, but I don't know exactly what is happening during processing, might be some kind of blending. I'd rather ignore the processing itself for now. Perhaps it is possible to save the results to temporary files (as sections?) and join them afterwards? I see there is support for temporary files, but do not understand when and how to use them, and could not find an example. Thank you.
Yes, I would sort your tiles by y and write the image in a series of sections. You'll need to add an alpha so you know which parts are unused. Save the intermediates in vips format (.v), then use composite to join them up.
I made you a tiny demo:
#!/usr/bin/python3

import sys
import random
import pyvips

# we generate the image in a series of sections, then join the sections up in a
# second pass

image_width = 150000
image_height = 500000
n_images = 10000
tile_height = 2048

# the top and height of this section come from argv
section_top = int(sys.argv[3])
section_height = int(sys.argv[4])

# the [x, y] position of each tile
# you'll have a fixed set of positions you reuse, we just use a fixed seed here
random.seed(0)
positions = [[random.randint(0, image_width), random.randint(0, image_height)]
             for _ in range(n_images)]

# find the set of tiles we paint for this section
clipped_tiles = list(filter(lambda p:
                            p[1] > section_top - tile_height and
                            p[1] < section_top + section_height,
                            positions))

print(f'generating {len(clipped_tiles)} tiles ...')

# now you know the tiles that need painting for this section, use numpy to
# generate just that subset

# we reuse a single tile for simplicity
tile = pyvips.Image.new_from_file(sys.argv[1])

print('building pipeline ...')

# the background we paint into
image = pyvips.Image.black(image_width, section_height, bands=3)
image = image.copy(interpretation='srgb')
for x, y in clipped_tiles:
    image = image.insert(tile, x, y - section_top)

previous_progress = -1

def eval_handler(image, progress):
    global previous_progress
    if progress.percent > previous_progress:
        print(f'{progress.percent}%, {progress.eta}s to go')
        previous_progress = progress.percent

# feedback on eval progress
image.set_progress(True)
image.signal_connect('eval', eval_handler)

print(f'writing {sys.argv[2]} ...')
image.write_to_file(sys.argv[2])
Run this repeatedly to generate a series of sections:
./sectionjoin.py ~/pics/k2.jpg s01.v 0 10000
./sectionjoin.py ~/pics/k2.jpg s02.v 10000 10000
./sectionjoin.py ~/pics/k2.jpg s03.v 20000 10000
...
Then join the sections up:
vips arrayjoin "$(echo s*.v)" output.tif
You could do it all in one python file, but it's probably safer to just run it several times. You'll need a lot of disc space. You can probably make fewer and taller sections, it depends how much RAM you have. I'd use a bash loop (obviously) for section generation.
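The loop might look something like this (a sketch; it just prints the commands so you can check the section arithmetic before running anything):

```shell
#!/bin/bash

# print one sectionjoin.py invocation per 10000-line section of the
# 500000-line image; drop the echo to actually run them
section_height=10000
image_height=500000

i=1
for ((top = 0; top < image_height; top += section_height)); do
    echo "./sectionjoin.py ~/pics/k2.jpg $(printf 's%02d.v' "$i") $top $section_height"
    i=$((i + 1))
done
```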
Thank you, I'll try it