Timeline summarization for large-scale past-web events with Python

Open hawc2 opened this issue 1 year ago • 9 comments

Programming Historian in English has received a proposal for a translation from Portuguese, with the provisional title 'Timeline summarization for large-scale past-web events with Python: the case of Arquivo.pt' by @dcgomes and @rncampos.

I have circulated this proposal for feedback within the English team. We have considered this proposal for:

  • Openness: we advocate for use of open source software, open programming languages and open datasets
  • Global access: we serve a readership working with different operating systems and varying computational resources
  • Multilingualism: we celebrate methodologies and tools that can be applied or adapted for use in multilingual research-contexts
  • Sustainability: we're committed to publishing learning resources that can remain useful beyond present-day graphical user interfaces and current software versions

We are pleased that @dcgomes and @rncampos have developed this Proposal into a Submission to be developed under the guidance of @caiocmello as editor.

@dcgomes and @rncampos have shared their Submission package with our Publishing team by email. Our Publishing team will now process the new translation materials, and prepare a Preview of the initial draft. They will post a comment in this Issue to provide the locations of all key files, as well as a link to the Preview where contributors can read the lesson as the draft progresses.

Our dedicated Ombudspersons are Ian Milligan (English), Silvia Gutiérrez De la Torre (español), Hélène Huet (français), and Luis Ferla (português). Please feel free to contact them at any time if you have concerns that you would like addressed by an impartial observer. Contacting the ombudspersons will have no impact on the outcome of any peer review.

hawc2 avatar May 30 '24 23:05 hawc2

Hello @caiocmello, @dcgomes and @rncampos

You can find the key files here:

You can review a preview of the lesson here:


There are a couple small things which I noticed when processing this submission, outlined below. @dcgomes and @rncampos, I would be grateful if you could address them at this stage of publication (Phase 1: Submission).

  • [ ] You'll notice that I transferred the Google Colab notebook you provided us into a markdown file instead, with cells formatted as code blocks. If you'd like to add a notebook as an additional, optional way for readers to run the code in a cloud environment, I can help set that up for you. We ask that notebooks only contain lesson headings and code, so I would remove all lesson text + images and videos.
  • [ ] When you referred to YouTube videos, I changed the wording to embed them as hyperlinks in the text, rather than video thumbnails. We do have a way to embed videos so they can be played from thumbnails directly in the lesson text, but we are still looking into whether this adheres to our commitment to site lightness. Are you happy to keep to hyperlinks for now?
  • [ ] I think beginning with a 'video summarizing the tutorial' is a little unusual for our lessons. Perhaps this would work better at the end of the lesson, in a 'further resources' section instead? I'm sure @caiocmello will be able to guide on this decision.
  • [ ] I noticed that figures 2 and 3 were still in Portuguese. Are you able to provide alternatives using the English language interface? You also left a note (line 309 of the markdown file) to say you wanted to replace figure 1 with a query "Jorge Sampaio" instead. Could you please send me these images as .png or .jpg to publishing.assistant[@]programminghistorian.org, ideally no more than 840 pixels on the longest side? Thank you.
  • [ ] There is a second author's note just below (line 311) – could you please check whether this is resolved?
  • [ ] In the section ### Where can I find ContaMeHistorias.pt (Tell me Stories)?, you linked to the Conta-me Historias app using https://play.google.com/store/apps/details?id=com.app.projetofinal. We've just closed an Issue removing this link from the original PT version, because it seemed broken. Do you have a working link for the app yourselves? I've currently removed the link from your translation, but we can always add another one back in.

Thank you very much to all ✨

Charlotte

charlottejmc avatar May 31 '24 07:05 charlottejmc

Thank you for processing the files and setting up the preview, @charlottejmc! ✨

--

Olá Daniel @dcgomes and Ricardo @rncampos,

Thank you for your work on this translation!

One further note is that we don't yet have English translations of the alt-text or captions for the figure images. The Portuguese original text is as follows:

  1. alt="Pesquisa por Jorge Sampaio através do componente narrativa do Arquivo.pt" caption="Figura 1: Pesquisa por 'Jorge Sampaio' através da componente narrativa do Arquivo.pt."
  2. alt="Resultados da pesquisa por Jorge Sampaio no Conta-me Histórias para o periodo compreendido entre 07/04/2016 e 17/11/2016" caption="Figura 2: Resultados da pesquisa por 'Jorge Sampaio' no *Conta-me Histórias* para o periodo compreendido entre 2016-04-07 e 2016-11-17."
  3. alt="Jorge Sampaio formaliza apoio a Sampaio da Nóvoa" caption="Figura 3: Jorge Sampaio formaliza apoio a Sampaio da Nóvoa."
  4. alt="Nuvem de palavras com os termos relacionados com a pesquisa Jorge Sampaio ao longo de 10 anos" caption="Figura 4: Nuvem de palavras com os termos relacionados com a pesquisa por 'Jorge Sampaio' ao longo de 10 anos."

Could you share English translations of these with Charlotte and me (either as a comment here in the Issue, or by email)?

Thank you, Anisa

anisa-hawes avatar May 31 '24 16:05 anisa-hawes

Hi, here are my answers. Please let me know if anything else is missing on our side. Thanks.

  • [x] You'll notice that I transferred the Google Colab notebook you provided us into a markdown file instead, with cells formatted as code blocks. If you'd like to add a notebook as an additional, optional way for readers to run the code in a cloud environment, I can help set that up for you. We ask that notebooks only contain lesson headings and code, so I would remove all lesson text + images and videos.

OK, I agree. Please set up the Colab notebook in whatever way you feel is most suitable for the target audience.

  • [x] When you referred to YouTube videos, I changed the wording to embed them as hyperlinks in the text, rather than video thumbnails. We do have a way to embed videos so they can be played from thumbnails directly in the lesson text, but we are still looking into whether this adheres to our commitment to site lightness. Are you happy to keep to hyperlinks for now?

Yes.

  • [x] I think beginning with a 'video summarizing the tutorial' is a little unusual for our lessons. Perhaps this would work better at the end of the lesson, in a 'further resources' section instead? I'm sure @caiocmello will be able to guide on this decision.

OK, I agree.

  • [x] I noticed that figures 2 and 3 were still in Portuguese. Are you able to provide alternatives using the English language interface? You also left a note (line 309 of the markdown file) to say you wanted to replace figure 1 with a query "Jorge Sampaio" instead. Could you please send me these images as .png or .jpg to publishing.assistant[@]programminghistorian.org, ideally no more than 840 pixels on the longest side? Thank you.

Done. 3 new images sent to publishing.assistant[@]programminghistorian.org.

  • [x] There is a second author's note just below (line 311) – could you please check whether this is resolved?

Fixed.

  • [ ] In the section ### Where can I find ContaMeHistorias.pt (Tell me Stories)?, you linked to the Conta-me Historias app using https://play.google.com/store/apps/details?id=com.app.projetofinal. We've just closed an Issue removing this link from the original PT version, because it seemed broken. Do you have a working link for the app yourselves? I've currently removed the link from your translation, but we can always add another one back in.

I am waiting for @rcampos's answer on this matter.

One further note is that we don't yet have English translations of the alt-text or captions for the figure images. The Portuguese original text is as follows:

  • alt="Pesquisa por Jorge Sampaio através do componente narrativa do Arquivo.pt" caption="Figura 1: Pesquisa por 'Jorge Sampaio' através da componente narrativa do Arquivo.pt."

alt="Search for 'Jorge Sampaio' using the Narrative component of Arquivo.pt" caption="Figure 1: Search for 'Jorge Sampaio' using the Narrative component of Arquivo.pt."

  • alt="Resultados da pesquisa por Jorge Sampaio no Conta-me Histórias para o periodo compreendido entre 07/04/2016 e 17/11/2016" caption="Figura 2: Resultados da pesquisa por 'Jorge Sampaio' no Conta-me Histórias para o periodo compreendido entre 2016-04-07 e 2016-11-17."

alt="Search results for 'Jorge Sampaio' on Conta-me Histórias (Tell me Stories)" caption="Figure 2: Search results for 'Jorge Sampaio' on Conta-me Histórias (Tell me Stories)."

  • alt="Jorge Sampaio formaliza apoio a Sampaio da Nóvoa" caption="Figura 3: Jorge Sampaio formaliza apoio a Sampaio da Nóvoa."

alt="Web-archived news page linked from the Conta-me Histórias search results." caption="Figure 3: Web-archived news page linked from the Conta-me Histórias search results."

  • alt="Nuvem de palavras com os termos relacionados com a pesquisa Jorge Sampaio ao longo de 10 anos" caption="Figura 4: Nuvem de palavras com os termos relacionados com a pesquisa por 'Jorge Sampaio' ao longo de 10 anos."

alt="Word cloud of terms related to the search for Jorge Sampaio over 10 years" caption="Figure 4: Word cloud of terms related to the search for 'Jorge Sampaio' over 10 years."

dcgomes avatar Jun 20 '24 15:06 dcgomes

Hello @dcgomes,

Thank you for sending over the replacement images, and translating the alt text and captions. I've now made these updates for your lesson.

charlottejmc avatar Jun 26 '24 09:06 charlottejmc

Hello Daniel @dcgomes and Ricardo @rncampos (So lovely to see you in Lisboa last week, @rncampos!),

What's happening now?

Your lesson has been moved to the next phase of our workflow which is Phase 2: Initial Edit.

In this phase, your editor Caio @caiocmello will read your lesson, and provide some initial feedback. Caio will post feedback and suggestions as a comment in this issue, so that you can revise your draft in the following phase (Phase 3: Revision 1).

%%{init: { 'logLevel': 'debug', 'theme': 'dark', 'themeVariables': {
              'cScale0': '#444444', 'cScaleLabel0': '#ffffff',
              'cScale1': '#882b4f', 'cScaleLabel1': '#ffffff',
              'cScale2': '#444444', 'cScaleLabel2': '#ffffff'
       } } }%%
timeline
Section Phase 1 <br> Submission
Who worked on this? : Publishing Assistant (@charlottejmc) 
All  Phase 1 tasks completed? : Yes
Section Phase 2 <br> Initial Edit
Who's working on this? : Editor (@caiocmello)  
Expected completion date? : July 26
Section Phase 3 <br> Revision 1
Who's responsible? : Author (@author) 
Expected timeframe? : ~30 days after feedback is received

Note: The Mermaid diagram above may not render on GitHub mobile. Please check in via desktop when you have a moment.

anisa-hawes avatar Jun 26 '24 16:06 anisa-hawes

Dear authors @rncampos and @dcgomes,

Thanks very much for your interest in publishing a translation of your lesson with the Programming Historian in English and for your work on this piece. As a first step in the editorial process, I have read both your original lesson published in Portuguese and its English translation, with a view to suggesting improvements to readability and accessibility. While reading the lesson, I encountered some issues that make it difficult to understand and follow. As we believe translations can also be an opportunity for refinement, I would like to share with you some of the questions that came up during this initial phase. I understand that addressing them represents substantial work. Is this something you feel you have capacity to address within the coming months? We can be flexible with our timeline for author revisions, which is usually ~1 month. If you do have the interest and capacity, I'd be happy to provide you with more focused line edits to support that work.

Suggestions:

  1. Title

I have the impression that the title could better match what the lesson does. The current title, ‘Timeline summarisation for large-scale past-web events with Python’, gives the impression that the lesson will focus on how to create a timeline for summarising events. However, the project ‘Tell me Stories’, which is about summarisation, is only addressed from Paragraph 42 onwards. Until then, the lesson focuses on the web archive API and its use for data retrieval. My sense is that a title such as ‘An introduction to the Portuguese Web Archive’s API for data retrieval’ would be clearer to readers who are using this lesson.

  2. Concepts definition

I’m not sure I have understood correctly what you mean by timeline summarisation (or sumarização de narrativas, in Portuguese). Please correct me if I am wrong, but it seems that by ‘summarisation’ you mean selecting the relevant news articles on a given topic. Is this correct? If so, I think it would be important to define some key concepts used in this lesson (especially considering an interdisciplinary audience). Besides 'summarisation', another such concept is ‘timeline’. Paragraph 3 (line 4) says: ‘In this context, timelines (automatic temporal summarisation systems)’. Is ‘automatic temporal summarisation systems’ the definition of timeline? It would be helpful if you could clarify the origin of this term (if it is used in some specific context) and what it means.

  3. Tell Me Stories

I feel the lesson could benefit from more detailed information about this tool. In Paragraph 52, you mention using a tool called YAKE! to determine the relevance of a news article to a given topic. What are the main mechanisms the tool uses to determine relevance? Just a very brief comment on that would be helpful. I'm also unclear about how ‘related terms’ are defined in this context. Are they the result of NER (meaning entities mentioned in the articles considered relevant)?

  4. Videos

I noticed that you've included a sequence of links to videos. Although these are interesting further resources, they are not essential for understanding the content of the lesson. I think they work well as extra content and I would suggest placing them as a list of ‘references’ or 'further resources' at the end of the lesson.

I look forward to hearing your reflections on this feedback. Please have a think about whether you feel you have capacity to work through these adaptations for the English version of your lesson in the coming months. To reiterate, I'd be happy to share some more detailed feedback and line edits to support your work to make this translation accessible to a broader (multilingual and interdisciplinary) audience.

caiocmello avatar Jun 29 '24 18:06 caiocmello

Hello Daniel @dcgomes and Ricardo @rncampos ,

What's happening now?

Your lesson has been moved to the next phase of our workflow which is Phase 3: Revision 1.

This phase is an opportunity for you to revise your draft in response to @caiocmello's initial feedback.

Ricardo @rncampos, I've sent you an invitation to join us as an Outside Collaborator here on GitHub. This will give you the 'write access' you'll need to edit your lesson directly. Daniel @dcgomes, I've checked to ensure that you have the 'write access' you'll need to edit the draft directly.

We ask authors to work on their own files with direct commits: we prefer you don't fork our repo, or use the Pull Request system to edit in ph-submissions. You can make direct commits to your file here: /en/drafts/translations/timeline-summarization-web-python.md. @charlottejmc and I can help if you encounter any practical problems!

When you and Caio are all happy with the revised draft, we will move forward to Phase 4: Open Peer Review.

%%{init: { 'logLevel': 'debug', 'theme': 'dark', 'themeVariables': {
              'cScale0': '#444444', 'cScaleLabel0': '#ffffff',
              'cScale1': '#882b4f', 'cScaleLabel1': '#ffffff',
              'cScale2': '#444444', 'cScaleLabel2': '#ffffff'
       } } }%%
timeline
Section Phase 2 <br> Initial Edit
Who worked on this? : Editor (@caiocmello) 
All  Phase 2 tasks completed? : Yes
Section Phase 3 <br> Revision 1
Who's working on this? : Authors (@dcgomes + @rncampos)  
Expected completion date? : Aug 3
Section Phase 4 <br> Open Peer Review
Who's responsible? : Reviewers (TBC) 
Expected timeframe? : ~60 days after request is accepted

Note: The Mermaid diagram above may not render on GitHub mobile. Please check in via desktop when you have a moment.

anisa-hawes avatar Jul 03 '24 12:07 anisa-hawes

Hello Daniel @dcgomes and Ricardo @rncampos,

Have you had a chance to consider my initial feedback on your translation?

Please let us know your thoughts, and whether you feel you have capacity to work on these adjustments for the English version of your lesson in the coming months.

Thank you.

caiocmello avatar Jul 24 '24 17:07 caiocmello

Hi @dcgomes and @rncampos, as Managing Editor, I'm stepping in at this point to help progress this ticket. Since we haven't heard from you since June, I believe the best course of action would be for this ticket to be closed. This lesson was accepted outside of our normal submission process because we hoped it would move forward quickly, but that doesn't seem to be the case.

There are some substantial revisions for you to consider. If you can address those changes, we'd encourage you to resubmit this lesson in our future call that will be going out in the next few weeks, with a deadline of early next year.

If the revisions we've requested are taken into consideration and fully implemented, we can aim to prioritize editing and hopefully publishing your lesson next year. Please let us know if you have any thoughts on next steps, and if we don't hear from you in the next couple weeks, we will be closing this ticket. If you have further questions or thoughts, please feel free to reach out to me by email.

hawc2 avatar Oct 04 '24 15:10 hawc2

Hello Daniel @dcgomes and Ricardo @rncampos,

We are closing this Issue due to inactivity. Please write to our Managing Editor @hawc2 if you decide that you'd like to return to work on this translation in 2025.

With our thanks, Anisa

anisa-hawes avatar Dec 05 '24 16:12 anisa-hawes