blksnap icon indicating copy to clipboard operation
blksnap copied to clipboard

Upstream status

Open Fantu opened this issue 3 years ago • 116 comments

@SergeiShtepa Hi, thanks for your works on blksnap kernel module, I think will be very useful for improve backup on linux. I'm curious about the status of its possible upstream integration, from https://github.com/veeam/blksnap/commit/85bad8f010d08fb0d08fc2c33e5ca3630dfe2a82 seems you prepared a patches serie for upstream (for send a mail with it) but I not found it posted searching on lkml.org and google. I saw also other series of months ago on other branches but I not found them on upstream mailing list and found only older your works of 1 year ago or more. Thanks for any reply and sorry for my bad english.

Fantu avatar Jun 08 '22 17:06 Fantu

Thank you for creating the issue. So it's important to someone.

Yes, I haven't tried to offer blksnap in the upstream yet. The problem is that I can't find anyone who could do a code review. And I don 't have enough experience with upstream. In addition, the module is not a regular driver for some device. I suggest adding a filter to the block layer. Mantainers may not like it. It would be great if I could attract other developers of backup tools to work on the module. But I think that while the module is not in the upstream, no one will want to invest their time in this project. Therefore, I doubt the success of this venture.

But I intend to try to do it soon. It remains to double-check something. I hope that someone will give good advice. I will add a link to the patch discussion in upstream to this issue.

Any feedback is welcome.

SergeiShtepa avatar Jun 08 '22 20:06 SergeiShtepa

Thanks for reply, I did/do small contributions on some open source projects but I never did on linux kernel, if I remember good I gave only an help in testing one xen part preparing to upstream many years ago. To know if can be acceptable your work and receive review of expert kernel developers (who can give you their opinions on it and their advices to improve where needed) post the patches is really important, from a fast look I found this doc that explain how to do it: https://www.kernel.org/doc/html/latest/process/submitting-patches.html (probably you already saw it) and from a fast look to your latest patches serie ( https://github.com/veeam/blksnap/commit/85bad8f010d08fb0d08fc2c33e5ca3630dfe2a82 ) seems you already did a good work on major requirements. I found this project in a search done after I found one blk-snap patches serie you posted 2 year ago so I think posting patches is also a way let know it to any other interested developers and users (as well as receiving upstream developers review)

Fantu avatar Jun 08 '22 21:06 Fantu

seems you already did a good work on major requirements.

Thanks. Indeed, it is the good quality of the code that is an important component of success.

a way let know it to any other interested developers and users

Of course, that's why I'll be offering a patch.

SergeiShtepa avatar Jun 09 '22 07:06 SergeiShtepa

I agree that your code represents a potentially important improvement for backup of linux systems, especially as new file systems proliferate. Blksnap is a distinct improvement over earlier such attempts like dattobd and elastio-snap. I have been lightly testing blksnap on a computer I routinely use where there are multiple file systems being used. Yours is a worthwhile project that will hopefully get merged into the mainstream kernel at some point.

andymend avatar Jun 12 '22 14:06 andymend

Thank you @andymend . Thank you all for your support. I just sent a patch. If it passes the robot check, we will see it in the list soon.

SergeiShtepa avatar Jun 13 '22 16:06 SergeiShtepa

for what I see here 2 kernel developers reviewed and replied and there is also 2 replied from an automatic test bot (reporting some warnings), for now, no one has objected or criticized the implementation method but only recommended improvements for a v2.

from a quick search I did not find other proposals for implementing "temporary" snapshots for better backup integrity and that supports many filesystems

unfortunately I have not yet had time to try blksnap but from what I have seen from experience this functionality integrated into the kernel is very useful and as I have already seen several backup solutions have made their own modules for similar things but no one who has proposed its implementation in the kernel

I have used several backup solutions over the years, most with file level backup using rsync, however even in such cases I think it would be useful because either the services stop during the whole backup or you risk not having it integrated and the larger the part to back up and the more time it takes, the greater the risk of integrity being compromised. For example, if you have to back up a server of hundreds of gb or over 1 tb (in most cases via the network the times increase), it takes hours and during the operations to numerous files in various interconnected parts and rsync buckups of the parts before such them, others during and others after would be problematic for its integrity on restore while with a snapshot and rsync backup on it you would have a backup with minimal integrity risk (reduced to the fraction of snapshot creation time FWIK) and restore you would know that you have a fairly healthy situation at the time of the snapshot and not a high risk of integrity problems for the time it took to make the backup. And without such a snapshot, the risks can only be mitigated by stopping the services (as mentioned) or doing it outside working hours.

Another thing I tried at works in latest years is to use btrfs with integrated snapshot for backups (I also tried zfs years ago) but I use only in some cases because for example on virtualization systems, the COW it has a significant impact on performance and there are also some space issues keeping snapshots on the same production filesystem. this could be a solution to make a backup with minimum downtime stopping services to take the snapshot (with blksnap) and maximum integrity probability, or at least making the snapshot to active services like what a COW filesystem like btrfs and zfs can do but without having to use such filesystems.

I hope these examples based on my experience give an idea of how useful a system like blksnap is When there is a calm period where I am not too busy I will start testing it in the meantime thanks again for your work SergeiShtepa

sorry for my bad english

Fantu avatar Jun 20 '22 12:06 Fantu

Thanks @Fantu.

I expected a very calm reaction to the patch. I plan to:

  • correct a flaws, indicated by the reviewers
  • move some of the logic into the kernel module, simplifying the main kernel code even more
  • fix bugs (QA found something).

So, the next patch will be better.

Alas, the performance of the blksnap module should not be better than DM snapshots (LVM2). When handling a write request, it is required to read a chunk of data, and then write this chunk of data to a new location. There are no magic here. When writing, the load on the drive increases threefold.

Theoretically, snapshots on Btrfs should be better. But for BTRFS, we cannot read the entire block device. We have to synchronize the source and target file systems. The Veeam Agent for Linux implements btrfs backup support by its native means. The performance was not the best.

The presence of a change tracker and the ease of allocating space for the difference storage are the main advantages of blksnap, compared to "classic" snapshots.

SergeiShtepa avatar Jun 20 '22 15:06 SergeiShtepa

thanks for reply and your works about btrfs I know that is better where there is no "limit" or performance issue (for example I use on some backup storage and fileservers), but for example in "low cost" virtualization servers there was significant performance issue, disabling COW partially solves the issue because using snapshot will require it and ignore the previous options to disable cow, so still using ext4 and do temp. snapshot only on backup time, avoid the downtime or having it shortly, having high integrity probability and lower performance only when backup with snapshot is running still seems to me a good idea. however I can only be helpful enough when I start using it

Fantu avatar Jun 20 '22 15:06 Fantu

Hello everyone. I am very glad that the first patch was noticed and I received quite useful feedback. A lot will have to be rewritten. And that's great.

SergeiShtepa avatar Jul 14 '22 09:07 SergeiShtepa

hi, is there any news about v2 patch serie for upstream? I suppose anyway it would not be reviewed in time for the kernel 6.0 :( edit: merge window for 6.0 is already closed, I hope for the 6.1

Fantu avatar Aug 18 '22 16:08 Fantu

Hi. That's what I really want to do. But I had to switch to more urgent tasks. I hope to be able to reduce my task list soon.

SergeiShtepa avatar Aug 24 '22 09:08 SergeiShtepa

Hi!

This looks like a workable version of the patch for linux kernel 6.1-rc1 . It contains corrections in accordance with the comments that were made by the maintainers to the first version of the patch. I plan to test this for two or three days. I invite everyone to join the testing.


Feedback is welcome.

SergeiShtepa avatar Oct 20 '22 15:10 SergeiShtepa

This patch has been sent.

SergeiShtepa avatar Nov 02 '22 15:11 SergeiShtepa

good, I not checked it in detail but I noted a small mistake on the version of the patch serie posted, v1 instead of v2, but if you have sent it by now there is nothing else to do (or they would receive it double), I saw there is at least a list of changes from v1 in the cover so it should be "ok" anyway

edit: found the link of the serie posted, I put here if can be useful: https://lore.kernel.org/lkml/[email protected]/

Fantu avatar Nov 02 '22 16:11 Fantu

Link to patchwork.

SergeiShtepa avatar Nov 03 '22 11:11 SergeiShtepa

@SergeiShtepa with a search I found this: https://lwn.net/Articles/914852/ and seems that wrote about a lack of documentation to be accepted in addition to the documentation in code that probably need improvements I suppose they meaning to add another "generic" in Documentation/, I suppose in Documentation/block creating Documentation/block/blksnap.rst and adding in the list of Documentation/block/index.rst I don't saw reply to lack of documentation in the replies of the patch serie posted, you received other reply directly and not in mailing list?

Fantu avatar Nov 15 '22 20:11 Fantu

Thanks. I don't know anything about it. I received one email about an extra space and 3 emails from the robot. I'll try to add a couple of comments to the article.

SergeiShtepa avatar Nov 15 '22 21:11 SergeiShtepa

@SergeiShtepa I saw your replies on lwn, unfortunately even as you explained you want to avoid additional time being wasted reviewing documentation before it is accepted upstream I suppose without more documentation some people won't start reviewing it. For now on mailing list of patch posted I didn't see new replies :( I did a fast check with kernel-doc script of the kernel-doc comments actually present and spotted some warning: one I suppose solved with https://github.com/Fantu/linux-blksnap/commit/4f7c2365bcd5acc27a7bd546dc20b927fbf1b576 others spotted:

drivers/block/blksnap/tracker.h:47: warning: Function parameter or member 'flt' not described in 'tracker'
drivers/block/blksnap/tracker.h:47: warning: Function parameter or member 'submit_lock' not described in 'tracker'

drivers/block/blksnap/diff_storage.c:21: warning: Function parameter or member 'link' not described in 'storage_bdev'
drivers/block/blksnap/diff_storage.c:21: warning: Function parameter or member 'dev_id' not described in 'storage_bdev'
drivers/block/blksnap/diff_storage.c:21: warning: Function parameter or member 'bdev' not described in 'storage_bdev'

drivers/block/blksnap/diff_storage.c:33: warning: Function parameter or member 'link' not described in 'storage_block'
drivers/block/blksnap/diff_storage.c:33: warning: Function parameter or member 'bdev' not described in 'storage_block'
drivers/block/blksnap/diff_storage.c:33: warning: Function parameter or member 'sector' not described in 'storage_block'
drivers/block/blksnap/diff_storage.c:33: warning: Function parameter or member 'count' not described in 'storage_block'
drivers/block/blksnap/diff_storage.c:33: warning: Function parameter or member 'used' not described in 'storage_block'

Fantu avatar Nov 22 '22 19:11 Fantu

To read the entire article https://lwn.net/Articles/914031/ and comments before December 1, you need to be a registered user. After December 1st, access will be unlimited. I tried to answer all the questions. I plan to work on the documentation and try to write an article for LWN. Maybe it will help.

spotted some warning:

Thanks, I'll have to check it out too for next patch.

Perhaps the next v2 patch with documentation and accompanied by an interesting article will attract more attention.

SergeiShtepa avatar Nov 23 '22 15:11 SergeiShtepa

@SergeiShtepa thanks for replies and your work in practice v2 would be the last one posted and v3 the next even if it was written v1, I wouldn't want that perhaps having written v1 had led some people to suppose in a resend without changes, for example it seems strange to me that Christoph Hellwig who had participated a lot in the the first did not give any answer to the second (probably can be good add it in cc of the next) looking also on other sites the article with write about "the lack of doc." seems to have left its mark :( https://www.reddit.com/r/linux/comments/yvd14a/blockdevice_snapshots_with_blksnap_lwnnet_good/ probably a v3 with more documentation and even just with the bot's warning fixes and Randy's reporting will push sites to update about it and attract more people to review

Fantu avatar Nov 23 '22 15:11 Fantu

@SergeiShtepa in the patch serie is missed the add of entry in MAINTAINERS file

Fantu avatar Nov 26 '22 21:11 Fantu

@SergeiShtepa in the patch serie is missed the add of entry in MAINTAINERS file

Maybe... I'll add this in the next patch.

SergeiShtepa avatar Nov 28 '22 10:11 SergeiShtepa

Add docs for /linux/Documentation/block/ : blkfilter and blksnap . I really hope that I managed to find the right words.

Any feedback is welcome.

P.S.: Since yesterday article is available without registration.

SergeiShtepa avatar Dec 02 '22 11:12 SergeiShtepa

@SergeiShtepa thanks for the creation of documentation to include in the patch for upstream my english isn't great but i read them, blkfilter it seems to be ok, about blksnap there is a non-english sentence in "Snapshot overflow resistance" and the part of defects of the alternatives and merits of blksnap leaves me doubtful, I'm afraid that someone who will review or want to watch/try blksnap will get lost in counterproductive discussions.

about btrfs "Obviously cannot be applied to other file systems.", I suppose you mean the fact that the main advantages of transferring to another local or remote destination are lost from or to a different filesystem, and it is, but written like this is wrong, in practice it is possible to use it even if most of the advantages are lost. You can use it both for "temporary" snapshots during which you make a backup (for example with rsync) to another destination; and from a source without btrfs, you make backups with another (for example rsync) and on the destination you use the btrfs snapshots to manage the "history" with some advantages (I've been using it on some systems for years).

the rest of the disadvantages I saw in practice during a long testing some years ago thinking about using btrfs instead of ext4 on several virtualization systems to manage better and faster backups but the performance impact was huge (partly due to the use of hdd) and even if only the last corresponding snapshot was kept for send/receive with the destination there were disadvantages, reduced in part with a single destination and daily backup but considerable with multiple destinations (in my case 2) in "rotation" and having to keep a week or even more the snapshots. I've seen what the arrival of the btrfs fs to full space caused by snapshot, the damage it causes especially with the virtual machine (which do not have direct access to the fs and "amplify" the problem) and also the fact that it is not "automatically managed/avoided" and the workaround of scripting to see if it's about to fill up and deleting the snapshot(s) caused most of the "PROs" to decay when that happens, which is why we didn't use it in production for that use case

however without listing all the possibilities I think it is wrong to write things like "the blksnap module is the best way to create snapshots for backup tools", it is in good part of cases but not all. I'm not good at writing and I don't quite know how to describe such parts properly, objectively and correctly but I think that the documentation it should be done that way people should just understand the need for this method, its importance for a good chunk of cases and the need to get it good and integrated upstream

sorry if I didn't explain myself well and for my bad english

Fantu avatar Dec 02 '22 16:12 Fantu

@Fantu - Thanks for the feedback!

I will try to improve those sections that you have noticed. After the weekend, I will review the documentation again. Maybe my colleagues from the technical documentation department will help me.

My English is also not as good as I would like. So I'm trying to do what I'm capable of. :) Creating documentation is hard work, and now it looks really important.

I can suggest you try to make the document better. Why not try? Even if I don't accept your merge request, it will allow me to take a fresh look at some sections.

SergeiShtepa avatar Dec 02 '22 17:12 SergeiShtepa

@SergeiShtepa saw the changes, now is better, you missed to fix non-english sentence in "Snapshot overflow resistance"

Fantu avatar Dec 05 '22 11:12 Fantu

Prepared a patch blksnap_lk6.1-rc8. There are problems with kernel-docs. I'll fix it. Everyone can try to build a patch, launch it, criticize it, suggest changes. I will test the patch on different architectures and in different test labs.

SergeiShtepa avatar Dec 06 '22 17:12 SergeiShtepa

@SergeiShtepa thanks for your works before I posted a comment linking doc files in the article: https://lwn.net/Articles/917121/ about documentation of next patch serie I did fast test (make htmldocs), link is missed, tried fast to add it and generate again: https://github.com/Fantu/linux-blksnap/commit/0ee9d76019867bdde70e513e10e57398f75991f9

in maintainers file there is a small typo (reported in commit) I suppose will be good have also a git repo (like linux-blksnap to add in https://github.com/veeam) and add in it for make easier keep patch serie for user/dev use, test, rebase etc; for report specific issue, propose PR, after you (as blksnap maintainer) can review/add/improve them before post to kernel mailing list etc

Fantu avatar Dec 06 '22 19:12 Fantu

Thanks. make htmldocs - i`ll check it, and fix.

will be good have also a git repo (like linux-blksnap

I have a local version. Maybe it makes sense to synchronize it with the server.

SergeiShtepa avatar Dec 07 '22 10:12 SergeiShtepa