[BUG]
Describe the bug
The get_playlist_info() function returns only the first 100 tracks of any playlist that contains more than 100. The scraper does not appear to handle the dynamic loading (infinite scroll) that Spotify's web player uses to display long playlists.
To Reproduce
Steps to reproduce the behavior:
# Code that causes the issue
import pprint

from spotify_scraper import SpotifyClient

# A public playlist with over 100 songs, perfect for testing the limit.
# "This Is Bad Bunny"
playlist_url = "https://open.spotify.com/playlist/37i9dQZF1DX2apWzyECwyZ"

# Initialize the client. Using selenium as it's often needed for playlists.
client = SpotifyClient(browser_type="selenium")

try:
    print(f"Fetching playlist: {playlist_url}")
    playlist_info = client.get_playlist_info(playlist_url)

    # Check the number of tracks returned by the scraper
    track_count = len(playlist_info.get('tracks', []))
    print(f"Playlist Name: {playlist_info.get('name')}")
    print("Expected tracks: > 100 (actually 10,000+)")
    print(f"Tracks returned by scraper: {track_count}")

    # You can also print the last track to see where it stops
    if track_count > 0:
        pprint.pprint(playlist_info['tracks'][-1])
except Exception as e:
    print(f"An error occurred: {e}")
finally:
    # Always release the browser, even if the fetch failed
    client.close()
Expected behavior
I expected get_playlist_info() to return a list containing every track in the specified playlist. For the example URL, that should be more than 100 tracks.
Actual behavior
The function runs without errors but returns a list of exactly 100 tracks. len(playlist_info['tracks']) is always 100 for any playlist longer than that.
Error messages
No error messages are generated. The function fails silently by returning incomplete data.
Environment:
- OS: Windows 11 26100.4202
- Python version: 3.11
- SpotifyScraper version: 2.1.5
- Installation method: pip
Additional context
The likely cause is that the Spotify web player loads tracks dynamically as the user scrolls down the page. The current scraper implementation seems to parse only the tracks present in the initial DOM, which is limited to the first 100 items. To get the full playlist, the scraper would need to simulate scrolling.
Possible solution
The fix would likely require modifying the scraping logic within the get_playlist_info method to handle dynamic content. When using the Selenium backend, a possible implementation could be:
- Load the playlist page.
- Enter a loop that:
  a. Scrapes the currently visible tracks.
  b. Programmatically scrolls the page down (e.g., driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")).
  c. Waits for a brief moment for new content to load.
  d. Checks if new track elements have appeared in the DOM.
- Exit the loop when scrolling no longer loads new tracks.
- Consolidate and return the full list of scraped tracks.
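
The loop described above could be sketched roughly as follows. This is only an illustration under assumptions, not the library's actual code: collect_all_tracks, scrape_visible, and scroll_down are hypothetical names, and I'm assuming each scraped track is a dict with a unique "uri" key. The scraping and scrolling steps are passed in as callables so the loop does not depend on any particular selector or scroll container, which I have not verified against the current web player.

```python
import time
from typing import Callable, Dict, Iterable, List


def collect_all_tracks(
    scrape_visible: Callable[[], Iterable[dict]],
    scroll_down: Callable[[], None],
    pause_seconds: float = 1.0,
    max_idle_rounds: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> List[dict]:
    """Scroll until no new tracks appear, then return each track seen once.

    Hypothetical helper: `scrape_visible` returns the tracks currently in
    the DOM, and `scroll_down` triggers loading of the next batch.
    """
    # Dicts preserve insertion order, so tracks stay in playlist order.
    # Assumption: every track dict has a unique "uri" key. A playlist that
    # contains the same track twice would need a row index as key instead.
    seen: Dict[str, dict] = {}
    idle_rounds = 0

    while idle_rounds < max_idle_rounds:
        count_before = len(seen)

        # (a) Scrape whatever is currently rendered and merge it in.
        for track in scrape_visible():
            seen.setdefault(track["uri"], track)

        # (d) Count consecutive rounds that produced nothing new, so one
        # slow network response does not end the loop too early.
        if len(seen) == count_before:
            idle_rounds += 1
        else:
            idle_rounds = 0

        # (b) Scroll, then (c) give the page a moment to load more rows.
        scroll_down()
        sleep(pause_seconds)

    return list(seen.values())
```

With the Selenium backend, scroll_down could be something like lambda: driver.execute_script("window.scrollTo(0, document.body.scrollHeight);"), and scrape_visible would wrap whatever row parsing the scraper already does. Merging on every iteration, rather than scraping once at the end, also covers the case where the page removes rows from the DOM once they scroll out of view.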