climbing-data Please remove plagiarized content

This repo appears to consist entirely of data scraped from MountainProject user contributions. Putting an open source license on it after scraping without permission doesn't make it open source.

Sep 27 '24 00:09 flynn-d

Duplicate/related:

https://github.com/OpenBeta/climbing-data/issues/6

See also:

https://github.com/OpenBeta/climbing-data/blob/main/CREDITS

Sep 27 '24 04:09 0xdevalias

See https://community.openbeta.io/topic/1247/letter-to-the-climbing-community

Oct 21 '24 19:10 musoke

hi @musoke, yes I'm aware of this letter from Viet and happy with the resolution on OpenBeta.io. I'm looking for a similar resolution here in this repo.

Oct 22 '24 15:10 flynn-d

My comment is intended to provide context for other readers.

Oct 22 '24 16:10 musoke

@flynn-d is correct at least about the LICENSE file issue. Including the CC0-1.0 license file alongside the data is at minimum a tad confusing to mechanical processes.

To be clear: in my opinion, no reasonable human would make the mistake of thinking that the scraped data is being released under CC0-1.0. But, unfortunately, one should assume that other code is making and then propagating load-bearing assumptions about LICENSE files in GitHub repos. To wit, this data is now already widely distributed under the CC0-1.0 license in many places other than this repo:

- The repo has been forked several times.
- The data is in the history for this repo and all forks of this repo. (People crawl, store, and distribute these histories.)
- The data and associated license are likely in some GitHub crawls.
- The underlying raw data from MP is certainly distributed in other open datasets, such as CommonCrawl. There is therefore risk that imperfect data processing pipelines might have cross-applied the license or attribution in this repo to that data because of content similarity.

So even if all of this data including route descriptions can be legally distributed here, by the maintainer, under fair use claims, it's still the case that attaching the CC0-1.0 license to a repo containing the scraped data (rather than just the scraping code) risks confusion. Even if the maintainer isn't ultimately responsible for that confusion, it'd still be a good move to clarify things by separating code+LICENSE from data+LICENSE.
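
As a rough illustration of the failure mode described above, here is a minimal sketch of how a naive crawler might attribute a license to each file by walking up to the nearest LICENSE file. Everything in it is hypothetical: the function name, the file paths, the directory layout, and the use of the SPDX `NOASSERTION` label for the data are assumptions made for illustration, not a description of any real tool or of this repo's actual layout.

```python
from pathlib import PurePosixPath

# Hypothetical repo layouts: LICENSE file path -> license identifier.
# The paths and the "NOASSERTION" label for the data are illustrative assumptions.
SINGLE_LICENSE = {"LICENSE": "CC0-1.0"}
SPLIT_LICENSES = {"code/LICENSE": "CC0-1.0", "data/LICENSE": "NOASSERTION"}


def nearest_license(file_path, licenses):
    """Return the license of the closest LICENSE file at or above the file's directory."""
    directory = PurePosixPath(file_path).parent
    # Walk from the file's own directory up to the repo root.
    for candidate_dir in [directory, *directory.parents]:
        key = str(candidate_dir / "LICENSE")
        if key in licenses:
            return licenses[key]
    # No LICENSE file applies to this path.
    return None


# With one top-level LICENSE, the scraped data inherits CC0-1.0.
print(nearest_license("data/routes.zip", SINGLE_LICENSE))
# With code and data separated, each picks up its own LICENSE.
print(nearest_license("code/scraper.py", SPLIT_LICENSES))
print(nearest_license("data/routes.zip", SPLIT_LICENSES))
```

Under the single-LICENSE layout the data file is labeled CC0-1.0 purely because of where it sits; under the split layout the same heuristic no longer attaches the code's license to the data.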

This is a note on technical issues related to data scraping and license attribution to scraped content. I'm not a lawyer.

Oct 22 '24 17:10 nrfulton

I'll respond / make corrective actions by next week. I appreciate everyone's patience.

Oct 24 '24 14:10 vnugent

@vnugent Awaiting your resolution on removing unauthorized content from this repo (and ideally from the git history, too).

Nov 15 '24 16:11 flynn-d

I have removed all the raw zip files. We will publish data files extracted from our own database in the near future.

@flynn-d For context, I am currently represented by the Electronic Frontier Foundation (EFF). If you are associated with Mountain Project/onX, I recommend directing all communication through their legal representatives. That said, I have always hoped to resolve conflicts amicably as members of the climbing community. However, due to legal threats from them in recent communications, I felt it necessary to seek representation.

@nrfulton I appreciate your perspective, but I'll defer interpretation of the law to qualified legal experts.

Nov 25 '24 18:11 vnugent

Thank you, Viet. I appreciate the update. This completes my request, and I'll consider this issue closed. I represent only myself! I do communicate with other MP contributors and admins, but I'm here speaking for myself only.

Nov 25 '24 18:11 flynn-d