
Create dataset hal_archives_ouvertes

Open • albertvillanova opened this issue 4 years ago • 3 comments

  • uid: hal_archives_ouvertes
  • type: primary
  • description:
    • name: HAL archives ouvertes

    • description: HAL is an open archive where authors can deposit scholarly documents from all academic fields.

      For the attention of the authors: the deposit of the full text should be made in agreement with the co-authors and in compliance with the publishers' policies. Deposits are subject to review, and HAL reserves the right to refuse items that do not meet the criteria of the archive. Any deposit is definitive; no withdrawals will be made after the online posting of the publication. Text files in PDF format or image files are sent to CINES for long-term archiving.

      For the attention of the readers: in the context of electronic distribution, each author retains their intellectual property rights.

    • homepage: https://hal.archives-ouvertes.fr/

    • validated: True

  • languages:
    • language_names:
      • English
      • French
      • Portuguese
      • Spanish
    • language_comments: also has (much smaller amounts of) data in other BigScience languages
    • language_locations:
      • Europe
    • validated: False
  • custodian:
    • name: CNRS
    • in_catalogue:
    • type: A university or research institution
    • location: France
    • contact_name: HAL support
    • contact_email: [email protected]
    • contact_submitter: False
    • additional: https://en.wikipedia.org/wiki/HAL_(open_archive)
    • validated: False
  • availability:
    • procurement:
      • for_download: No - but the current owners/custodians have contact information for data queries
      • download_url:
      • download_email: [email protected]
    • licensing:
      • has_licenses: Yes

      • license_text: Harvesting: conditions for using the data

        HAL metadata may be harvested in whole or in part, in compliance with the (French) intellectual property code. No commercial use of the extracted data. The source must be cited (example: hal.archives-ouvertes.fr/hal-00000001).

      • license_properties:

        • multiple licenses
        • copyright - all rights reserved
        • open license
        • research use
      • license_list:

    • pii:
      • has_pii: Yes
      • generic_pii_likely: somewhat likely
      • generic_pii_list:
        • names
      • numeric_pii_likely: unlikely
      • numeric_pii_list:
      • sensitive_pii_likely: very likely
      • sensitive_pii_list:
        • racial or ethnic origin
        • political opinions
        • religious or philosophical beliefs
      • no_pii_justification_class:
      • no_pii_justification_text:
    • validated: False
  • source_category:
    • category_type: collection
    • category_web:
    • category_media: scientific articles/journal
    • validated: False
  • media:
    • category:
      • text
    • text_format:
      • .PDF
    • audiovisual_format:
    • image_format:
    • database_format:
    • text_is_transcribed: No
    • instance_type: article
    • instance_count: 100K<n<1M
    • instance_size: 100<n<10,000
    • validated: False
  • fname: hal_archives_ouvertes.json
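Since the availability section above says the corpus is not directly downloadable, the metadata has to be harvested. A minimal sketch of building a Solr-style query URL for HAL's public search API; the endpoint and parameter names here are assumptions inferred from the Solr-flavoured field names in the records (e.g. `halId_s`, `language_s`) and should be checked against the current HAL API documentation:

```python
from urllib.parse import urlencode

# Assumed HAL search endpoint (Solr-backed); verify against the HAL API docs.
HAL_SEARCH = "https://api.archives-ouvertes.fr/search/"

def build_hal_query(query="*:*", fields=("halId_s", "title_s", "language_s"),
                    rows=100, start=0):
    """Build a harvesting URL that requests only the listed metadata fields."""
    params = {
        "q": query,        # Solr query string; *:* matches every record
        "fl": ",".join(fields),  # restrict the returned fields
        "rows": rows,      # page size
        "start": start,    # paging offset
        "wt": "json",      # response format
    }
    return HAL_SEARCH + "?" + urlencode(params)

print(build_hal_query(rows=2))
```

Paging with `rows`/`start` keeps each request small, which matters for an archive with hundreds of thousands of records.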

albertvillanova avatar Nov 23 '21 11:11 albertvillanova

#self-assign

cakiki avatar Feb 13 '22 07:02 cakiki

Metadata: https://huggingface.co/datasets/bigscience-catalogue-data/hal_archives_ouvertes/blob/main/hal_archives_ouvertes_metadata.jsonl.gz

Metadata sample:

{'openAccess_bool': True,
 'domainAllCode_s': ['scco.psyc'],
 'en_title_s': ['Self-motion and the perception of stationary objects'],
 'title_s': ['Self-motion and the perception of stationary objects'],
 'abstract_s': ["One of the ways we perceive shape is through seeing motion. Visual motion may be actively generated (for example, in locomotion), or passively observed. In the study of the perception of 3D structure from motion (SfM), the non-moving, passive observer in an environment of moving rigid objects has been used as a substitute for an active observer moving in an environment of stationary objects; the 'rigidity hypothesis' has played a central role in computational and experimental studies of SfM. Here we demonstrate that this substitution is not fully adequate, because active observers perceive 3D structure differently from passive observers, despite experiencing the same visual stimulus: active observers' perception of 3D structure depends on extraretinal self-motion information. Moreover, the visual system, making use of the self-motion information treats objects that are stationary (in an allocentric, earth-fixed reference frame) differently from objects that are merely rigid. These results show that action plays a central role in depth perception, and argue for a revision of the rigidity hypothesis to incorporate the special case of stationary objects."],
 'journalTitle_s': 'Nature',
 'journalIssn_s': '0028-0836',
 'journalEissn_s': '1476-4679',
 'authLastName_s': ['Wexler', 'Panerai', 'Lamouret', 'Droulez'],
 'authFirstName_s': ['Mark', 'Francesco', 'Ivan', 'Jacques'],
 'language_s': 'en',
 'halId_s': 'hal-00000019',
 'uri_s': 'https://hal.archives-ouvertes.fr/hal-00000019',
 'docType_s': 'ART',
 'publicationDate_tdate': '2001-01-01T00:00:00Z',
 'fileMain_s': 'https://hal.archives-ouvertes.fr/hal-00000019/document',
 'files_s': ['https://hal.archives-ouvertes.fr/hal-00000019/file/nature.pdf']}
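Records like the sample above can be streamed straight out of the gzipped JSONL dump without decompressing it to disk. A minimal sketch (the local filename is an assumption; it mirrors the file linked above):

```python
import gzip
import json

def iter_jsonl_gz(path):
    """Yield one metadata record (a dict) per non-empty line of a gzipped JSONL file."""
    with gzip.open(path, mode="rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)

# Usage (assumes the dump was downloaded locally):
# for record in iter_jsonl_gz("hal_archives_ouvertes_metadata.jsonl.gz"):
#     print(record["halId_s"], record.get("language_s"))
```

Streaming record by record avoids loading the whole corpus metadata into memory at once.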

cakiki avatar Feb 13 '22 08:02 cakiki

language files
en 614,053
fr ✔️ 402,232
undetermined 54,033
es 6,067
it 2,024
pt 1,794
de 1,408
ru 557
eu 213
uk 205
zh 201
ja 130
ar 109
pl 109
el 106
hy 95
cs 93
ro 67
oc 56
ca 55
da 54
mr 39
tr 38
vi 34
ko 34
sq 33
nl 33
bg 28
br 21
fa 21
eo 20
id 16
mg 15
hu 15
sv 10
te 9
hr 8
fi 8
no 8
sr 7
he 7
et 7
qu 7
sk 6
lt 6
hi 5
la 5
ms 4
sw 4
ta 3
kk 3
gl 3
co 2
tl 2
mn 2
az 2
ne 2
so 2
mk 2
iu 2
sl 2
be 2
th 2
fl 1
km 1
gn 1
ie 1
bm 1
is 1
ba 1
se 1
bs 1
fo 1
af 1
tk 1
lv 1
sa 1
zu 1
bo 1
0 1
ur 1
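A table like the one above can be produced by tallying the `language_s` field of the metadata records (the field name is taken from the sample record; the inline sample data here is illustrative only):

```python
from collections import Counter

def count_languages(records):
    """Tally records by detected language code; records without one count as 'undetermined'."""
    counts = Counter()
    for record in records:
        counts[record.get("language_s") or "undetermined"] += 1
    return counts

# Illustrative records; real input would come from the harvested metadata.
sample = [
    {"halId_s": "hal-00000019", "language_s": "en"},
    {"halId_s": "hal-00000020", "language_s": "fr"},
    {"halId_s": "hal-00000021"},  # no detected language
]
for code, n in count_languages(sample).most_common():
    print(code, n)
```

`most_common()` sorts by count, which is how the table above is ordered.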

cakiki avatar Feb 13 '22 16:02 cakiki