UTF-8 Title to Slug Conversion

Open ahmozkya opened this issue 15 years ago • 3 comments

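This code does not convert UTF-8 titles correctly: the final gsub strips every character outside a-z, 0-9 and "-", so non-ASCII letters are dropped from the slug.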

#...

def slugize
    self.downcase.gsub(/&/, 'and').gsub(/\s+/, '-').gsub(/[^a-z0-9-]/, '')
end

#...

The following link will probably help. http://code.djangoproject.com/browser/django/tags/releases/1.1.1/django/contrib/admin/media/js/urlify.js
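To make the problem concrete, here is a minimal sketch that contrasts the slugize above with a transliterating variant, in the spirit of the linked urlify.js. The names ascii_slugize, translit_slugize and TRANSLIT are hypothetical and not part of toto, and the table only covers the handful of Cyrillic letters needed for the example; a real table would be much larger.

```ruby
# encoding: utf-8

# Hypothetical, deliberately tiny transliteration table (assumption:
# a real implementation would cover whole alphabets, as urlify.js does).
TRANSLIT = {
  'п' => 'p', 'р' => 'r', 'о' => 'o', 'в' => 'v',
  'е' => 'e', 'к' => 'k', 'а' => 'a'
}.freeze

# The slugize from the issue, written as a plain function for illustration.
def ascii_slugize(title)
  title.downcase.gsub(/&/, 'and').gsub(/\s+/, '-').gsub(/[^a-z0-9-]/, '')
end

# Same steps, but characters found in TRANSLIT are replaced before the
# ASCII-only filter runs. Unmapped characters are still stripped.
def translit_slugize(title)
  mapped = title.each_char.map { |ch| TRANSLIT.fetch(ch, ch) }.join
  ascii_slugize(mapped)
end

ascii_slugize('Hello & World')  # => "hello-and-world"
ascii_slugize('проверка')       # => "" (every letter is stripped)
translit_slugize('проверка')    # => "proverka"
```

With the original method a fully non-ASCII title collapses to an empty slug, which is why such articles cannot be addressed by URL.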

ahmozkya avatar Mar 05 '10 15:03 ahmozkya

As this feature will involve expensive RegExp work, I believe it should be implemented as a "pluggable" option, e.g.:

set :utf8, true

That way the extra work is enabled only for those who write non-ASCII blogs :))

The second thing I guess we need to agree on is how we are going to work with such titles. The easiest way is to implement a one-way conversion: file names stay ASCII, like 2011-08-13-proverka.txt, while titles may contain non-ASCII characters without any problems, e.g. проверка. This is easy, but it looks more like a workaround, so the best behavior (I believe) is to allow a two-way conversion: the URL /2011/08/13/proverka would first try the file 2011-08-13-проверка.txt and then fall back to 2011-08-13-proverka.txt.
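The two-way lookup could be sketched roughly as follows. This is only an illustration under stated assumptions: find_article_file is a hypothetical helper, not toto's API, the .txt extension is hard-coded, and slugizer stands in for whatever transliterating slug function ends up being used. Instead of reversing the transliteration (which is ambiguous), it slugizes each existing file name and compares the result with the requested slug, preferring a non-ASCII file name over the ASCII fallback.

```ruby
# encoding: utf-8

# Hypothetical lookup: returns the file name whose slug matches the
# requested one, or nil when nothing matches.
#   date      - "YYYY-MM-DD" part taken from the URL
#   slug      - ASCII slug taken from the URL
#   filenames - list of existing article file names
#   slugizer  - callable that turns a title into an ASCII slug (assumed)
def find_article_file(date, slug, filenames, slugizer)
  prefix = "#{date}-"
  matches = filenames.select do |name|
    base = File.basename(name, '.txt')
    base.start_with?(prefix) && slugizer.call(base[prefix.length..-1]) == slug
  end
  # Try the non-ASCII original first, then fall back to the ASCII name.
  matches.find { |name| !name.ascii_only? } || matches.first
end

# Stub slugizer for the example only: it knows a single word.
stub = ->(title) { title == 'проверка' ? 'proverka' : title }
files = ['2011-08-13-proverka.txt', '2011-08-13-проверка.txt']
find_article_file('2011-08-13', 'proverka', files, stub)
# => "2011-08-13-проверка.txt"
```

Scanning every file on each request is linear in the number of articles; caching a slug-to-file map would avoid that, at the cost of invalidating it when articles change.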

ixti avatar Aug 13 '11 14:08 ixti

I also have this problem. It returns a 404 because it cannot find articles with non-ASCII characters in the title: the slugize method removes them, and def path at line 274 returns the slug. We have to deslug the path in def go.

sorin-ionescu avatar Feb 15 '12 02:02 sorin-ionescu

Something like the code below should work.

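def article route
  # Pick the article whose slugized basename matches the requested route;
  # otherwise fall back to the literal path built from the route.
  path = self.articles.select do |article|
    File.basename(article, ".#{self[:ext]}").slugize.eql? route.join('-')
  end.last || File.join(Paths[:articles], "#{route.join('-')}.#{self[:ext]}")
  Article.new(path, @config).load
end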

Unfortunately, there is an encoding issue. File.basename(article, ".#{self[:ext]}").slugize fails to equal route.join('-') in certain cases. Even though the strings can look identical to the eye, slugize can generate different slugs for them.
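One possible cause, which the thread does not confirm, is Unicode normalization: the same visible character can be stored precomposed (NFC) or as a base letter plus a combining mark (NFD), and some file systems hand back names in decomposed form. The sketch below shows how the ASCII-only filter then produces two different slugs for strings that look the same, and how normalizing first removes the difference. It assumes Ruby 2.2 or later for String#unicode_normalize; slugize is redefined here as a plain function, and normalized_slugize is a hypothetical name.

```ruby
# encoding: utf-8

# The slugize from the issue, as a plain function for illustration.
def slugize(title)
  title.downcase.gsub(/&/, 'and').gsub(/\s+/, '-').gsub(/[^a-z0-9-]/, '')
end

# Hypothetical variant: bring the input to NFC before slugizing, so that
# composed and decomposed spellings are treated alike.
def normalized_slugize(title)
  slugize(title.unicode_normalize(:nfc))
end

composed   = "caf\u00E9"   # "é" as one code point (NFC)
decomposed = "cafe\u0301"  # "e" followed by a combining acute accent (NFD)

composed == decomposed     # => false, although both render as "café"
slugize(composed)          # => "caf"  (the single "é" is stripped)
slugize(decomposed)        # => "cafe" (only the combining mark is stripped)
normalized_slugize(composed) == normalized_slugize(decomposed)  # => true
```

If this is indeed the cause, normalizing both the file basename and the route before comparing them should make the lookup consistent.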

sorin-ionescu avatar Feb 15 '12 19:02 sorin-ionescu