Plugin search - Diacritical marks, accents

Open zonkmachine opened this issue 3 years ago • 20 comments

Make plugin search aware of diacritical marks, accents.

Enhancement Summary

Make letters with diacritical marks turn up in search results, and sort them as if they were the base letter without the mark. Possibly make this sorting optional (a setting), since in some languages diacritics affect sorting and in others they don't. We would probably need to roll our own, as QString::localeAwareCompare() may not work: https://doc.qt.io/qt-5/qstring.html#localeAwareCompare-1 https://stackoverflow.com/questions/286921/efficiently-replace-all-accented-characters-in-a-string/9667817#9667817

Justification

Not many plugins use letters in their names other than those of the English alphabet, but when they do the result is quite confusing. The only example I've come across so far is Rézonateur (https://github.com/jpcima/rezonateur).

LV2 plugins support localization of the name, but in case this is not implemented or has been missed it would be good to catch the odd letters. The plugin above will not turn up in a search for 'rezonateur', and if you scan the list of plugins it will not turn up around re* either, but later. You may miss that the plugin installation succeeded.

reznateur

zonkmachine avatar Aug 27 '22 22:08 zonkmachine

We would probably need to roll our own as QString::localeAwareCompare() may not work

What makes you say that?

allejok96 avatar Aug 28 '22 19:08 allejok96

What makes you say that?

This comment by @tresf in discord/dev-only

I was hoping QString's localCompare would work, but from what I'm reading it doesn't.... Instead, I think we need to roll our own... Here's a decent thread: https://stackoverflow.com/a/9667817/3196753

zonkmachine avatar Aug 28 '22 19:08 zonkmachine

https://stackoverflow.com/questions/14009522/how-to-remove-accents-diacritic-marks-from-a-string-in-qt https://stackoverflow.com/questions/12278448/removing-accents-from-a-qstring

Seems to be possible to convert é to e but not œ to oe for example
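To make the é versus œ distinction concrete, here is a minimal sketch of the "roll our own" route in standard C++, independent of Qt and not code from this thread. The function name `foldDiacritics` and its tiny table are hypothetical and cover only a few characters for illustration. Accented letters such as é decompose into a base letter plus a combining mark, so normalization can reach them; ligatures such as œ and æ have no such decomposition and need an explicit multi-letter entry.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical helper, not part of LMMS or Qt: replaces a few accented
// letters and ligatures with plain ASCII equivalents.
std::u32string foldDiacritics(const std::u32string& input)
{
    // Illustrative table only; the entries are assumptions for this example.
    static const std::unordered_map<char32_t, std::u32string> table = {
        {U'\u00E9', U"e"},   // é: base letter e plus a combining acute accent
        {U'\u00F6', U"o"},   // ö: base letter o plus a combining diaeresis
        {U'\u00E6', U"ae"},  // æ: no Unicode decomposition, needs its own entry
        {U'\u0153', U"oe"},  // œ: no Unicode decomposition, needs its own entry
    };

    std::u32string output;
    for (char32_t c : input) {
        const auto it = table.find(c);
        if (it != table.end()) {
            output += it->second;  // replace with the mapped ASCII text
        } else {
            output += c;           // leave every other character untouched
        }
    }
    return output;
}

// Usage example; the results follow directly from the table above.
void demoFoldDiacritics()
{
    assert(foldDiacritics(U"R\u00E9zonateur") == U"Rezonateur");
    assert(foldDiacritics(U"\u0153uvre") == U"oeuvre");
}
```

A table like this is easy to reason about but has to be maintained by hand, which is the trade-off against relying on a library's normalization.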

allejok96 avatar Aug 28 '22 20:08 allejok96

OK. I'm all for a partial/pragmatic fix that is easier to implement.

zonkmachine avatar Aug 28 '22 20:08 zonkmachine

I fooled around with it a bit and the sorting part was easy. Add a hidden column with the name stripped of accents and other nonsense. I don't know about the search though. Grab da code...

rezonateur2

diff --git a/src/gui/modals/EffectSelectDialog.cpp b/src/gui/modals/EffectSelectDialog.cpp
index 2b5c0c58d..324d85459 100644
--- a/src/gui/modals/EffectSelectDialog.cpp
+++ b/src/gui/modals/EffectSelectDialog.cpp
@@ -73,30 +73,38 @@ EffectSelectDialog::EffectSelectDialog( QWidget * _parent ) :
        // and fill our source model
        m_sourceModel.setHorizontalHeaderItem( 0, new QStandardItem( tr( "Name" ) ) );
        m_sourceModel.setHorizontalHeaderItem( 1, new QStandardItem( tr( "Type" ) ) );
+       m_sourceModel.setHorizontalHeaderItem( 2, new QStandardItem( tr( "namestripped" ) ) );
        int row = 0;
        for( EffectKeyList::ConstIterator it = m_effectKeys.begin();
                                                it != m_effectKeys.end(); ++it )
        {
                QString name;
                QString type;
+               QString namestripped;
                if( it->desc->subPluginFeatures )
                {
                        name = it->displayName();
                        type = it->desc->displayName;
+                       namestripped = name.normalized(QString::NormalizationForm_KD);
                }
                else
                {
                        name = it->desc->displayName;
                        type = "LMMS";
+                       namestripped = name;
                }
                m_sourceModel.setItem( row, 0, new QStandardItem( name ) );
                m_sourceModel.setItem( row, 1, new QStandardItem( type ) );
+               m_sourceModel.setItem( row, 2, new QStandardItem( namestripped ) );
                ++row;
        }
 
        // setup filtering
        m_model.setSourceModel( &m_sourceModel );
        m_model.setFilterCaseSensitivity( Qt::CaseInsensitive );
 
        connect( ui->filterEdit, SIGNAL( textChanged( const QString& ) ),
                                &m_model, SLOT( setFilterFixedString( const QString& ) ) );
@@ -128,7 +136,8 @@ EffectSelectDialog::EffectSelectDialog( QWidget * _parent ) :
                                                        QHeaderView::Stretch );
        ui->pluginList->horizontalHeader()->setSectionResizeMode( 1,
                                                QHeaderView::ResizeToContents );
-       ui->pluginList->sortByColumn( 0, Qt::AscendingOrder );
+       ui->pluginList->sortByColumn( 2, Qt::AscendingOrder );
+       ui->pluginList->setColumnHidden(2, true);
 
        updateSelection();
        show();

zonkmachine avatar Aug 29 '22 00:08 zonkmachine

Would we need characters like æ to be replaced by ae, a, or e, rather than just one of them? When thinking of æ, a person might type any of them.

Monospace-V avatar Aug 30 '22 10:08 Monospace-V

We should really use QString::localeAwareCompare() for sorting, as characters like ñ ø ä are actual characters in foreign alphabets, in contrast to é, which is just an e. Normalizing can be used for search though.

I don't know if we should even bother to make a custom function to convert double letters like æ. First find a plugin with æ in its name.

If we decide to do it anyway, just go with ae. For better matching we should have fuzzy search to correct spelling errors, that would make a way better user experience than just æ.

allejok96 avatar Aug 30 '22 12:08 allejok96

Would we need characters like æ to be replaced by ae, a, or e, rather than just one of them? When thinking of æ, a person might type any of them.

æ is usually substituted with ae.

I don't know if we should even bother to make a custom function to convert double letters like æ. First find a plugin with æ in its name.

I wouldn't go to great lengths to fix that. I've found two plugins that stick out in the crowd: Rézonateur, with its accent, and μ-law compressor. I'm pretty sure whoever named μ-law knew it would turn up last in the list. If someone from Scandinavia sticks an æ in there, they did it just for the hell of it and expect trouble like this.

We should really use QString::localeAwareCompare() for sorting.

I'm pretty sure I started fiddling with localeAwareCompare() the other day, but I couldn't get it to work. What does work, however, is QSortFilterProxyModel::setSortLocaleAware().

This line achieves what the patch above did but in one line. It still doesn't affect search.

        m_model.setFilterCaseSensitivity( Qt::CaseInsensitive );
+       m_model.setSortLocaleAware(true);

zonkmachine avatar Aug 30 '22 16:08 zonkmachine

æ is usually substituted with ae.

And if you substitute it like that it will pop up in the wrong place in the sort from a Scandinavian point of view. The letters Å, Ä, Æ, Ö and Ø are all sorted at the end of their respective alphabets, so they are more correctly sorted by not touching them. From a Scandinavian point of view, that is. If you're not from Scandinavia and want to test your new plugin 'Ångest', you may think of Å as sorting like A, and this is what setSortLocaleAware() does. I don't think there is a good way to solve this from our side. Fixing only accents is reasonable. Ligatures? Not so much.

This problem, if it arises, is probably more upstream than anything else. Give plugins a proper, English-sortable working title and call them whatever you want in the GUI, in the manual, and on your homepage. That should work for most people.
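The two sorting expectations can be compared in a small sketch, written in standard C++ rather than Qt and not taken from this thread. The names `foldForSort`, `sortByCodePoint` and `sortByFoldedKey` are hypothetical, and the fold handles only Å and é. Comparing raw code points places Å (U+00C5) after Z and é (U+00E9) after z, which is why an accented name can land far from where an English-speaking user looks for it; comparing on a folded key places each next to its base letter.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical fold covering only Å and é, enough for this illustration.
std::u32string foldForSort(std::u32string s)
{
    for (char32_t& c : s) {
        if (c == U'\u00C5') { c = U'A'; }       // Å -> A
        else if (c == U'\u00E9') { c = U'e'; }  // é -> e
    }
    return s;
}

// Order by raw code point: accented letters land after 'Z' and 'z'.
std::vector<std::u32string> sortByCodePoint(std::vector<std::u32string> names)
{
    std::sort(names.begin(), names.end());
    return names;
}

// Order by the folded key: accented letters sort as their base letter.
std::vector<std::u32string> sortByFoldedKey(std::vector<std::u32string> names)
{
    std::sort(names.begin(), names.end(),
              [](const std::u32string& a, const std::u32string& b) {
                  return foldForSort(a) < foldForSort(b);
              });
    return names;
}

// Usage example with one accented name among plain ones.
void demoSortOrders()
{
    const std::vector<std::u32string> names = {
        U"Ringmod", U"R\u00E9zonateur", U"Reverb"};

    // Raw order: "Ré..." comes last because é (U+00E9) is greater than 'i'.
    assert((sortByCodePoint(names) == std::vector<std::u32string>{
        U"Reverb", U"Ringmod", U"R\u00E9zonateur"}));

    // Folded order: "Rézonateur" sorts as "Rezonateur", between the others.
    assert((sortByFoldedKey(names) == std::vector<std::u32string>{
        U"Reverb", U"R\u00E9zonateur", U"Ringmod"}));
}
```

Neither order is universally right: the folded order suits the Rézonateur case, while the raw order happens to be closer to what a Scandinavian reader expects for Å.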

zonkmachine avatar Aug 31 '22 07:08 zonkmachine

And if you substitute it like that it will pop up in the wrong place in the sort from a Scandinavian point of view.

Sort != Search. Quoting the Discord conversation:

Naturally, you'd need to support the original/diacritical versions as well

Sorting is weight-based and UIs have ways of rolling our own. AFAIR, this topic is more about searching than it is about sorting.

tresf avatar Aug 31 '22 16:08 tresf

AFAIR, this topic is more about searching than it is about sorting.

Well, when I made the issue I thought the code behind the sort and search was more or less the same. I no longer hold that belief.

zonkmachine avatar Aug 31 '22 18:08 zonkmachine

This line achieves what the patch above did but in one line. It still doesn't affect search.

Since zonkmachine has found the solution I don't see why we're still discussing that.

Here's the other missing part:

m_sourceModel.setItem(row, 2, new QStandardItem(name + name.normalized(QString::NormalizationForm_KD)));
m_model.setFilterKeyColumn(2);

allejok96 avatar Aug 31 '22 18:08 allejok96

Here's the other missing part:

m_sourceModel.setItem(row, 2, new QStandardItem(name + name.normalized(QString::NormalizationForm_KD)));
m_model.setFilterKeyColumn(2);

Cool! But I couldn't make that work over here. Do you think you could pull something and add me as a reviewer?

zonkmachine avatar Aug 31 '22 18:08 zonkmachine

Ah, according to the Stack Overflow link you also need to remove special characters. Tested:

	for( EffectKeyList::ConstIterator it = m_effectKeys.begin();
						it != m_effectKeys.end(); ++it )
	{
		QString name;
		QString type;
		if( it->desc->subPluginFeatures )
		{
			name = it->displayName();
			type = it->desc->displayName;
		}
		else
		{
			name = it->desc->displayName;
			type = "LMMS";
		}
		m_sourceModel.setItem( row, 0, new QStandardItem( name ) );
		m_sourceModel.setItem( row, 1, new QStandardItem( type ) );
+		QString normalized = name.normalized(QString::NormalizationForm_KD);
+		normalized.remove(QRegExp("[^a-zA-Z\\s]"));
+		m_sourceModel.setItem( row, 2, new QStandardItem(normalized));
		++row;
	}

	// setup filtering
	m_model.setSourceModel( &m_sourceModel );
	m_model.setFilterCaseSensitivity( Qt::CaseInsensitive );
+	m_model.setFilterKeyColumn(2);


	connect( ui->filterEdit, SIGNAL( textChanged( const QString& ) ),
				&m_model, SLOT( setFilterFixedString( const QString& ) ) );
	connect( ui->filterEdit, SIGNAL( textChanged( const QString& ) ),
					this, SLOT(updateSelection()));
	connect( ui->filterEdit, SIGNAL( textChanged( const QString& ) ),
							SLOT(sortAgain()));

	ui->pluginList->setModel( &m_model );
+	ui->pluginList->setColumnHidden(2, true);

allejok96 avatar Aug 31 '22 19:08 allejok96

Getting there, but now you can't search for Rézonateur with an accent.

zonkmachine avatar Aug 31 '22 21:08 zonkmachine

I guess you need to also normalize the search string.

zonkmachine avatar Sep 03 '22 08:09 zonkmachine

but now you can't search for Rézonateur with an accent

If you combined both the normalized and original string in the hidden column you could search for both

I guess you need to also normalize the search string

You could absolutely, but personally, if I typed an Ö I wouldn't expect it to match O
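A minimal sketch of the combined hidden-column idea, in standard C++ rather than Qt. The names `makeSearchKey` and `matchesFilter` are hypothetical, the normalized form is passed in rather than computed, and the newline separator is an added assumption so that a query cannot match across the boundary between the two forms. Matching is case-sensitive here for brevity, unlike the Qt::CaseInsensitive filter used in the dialog.

```cpp
#include <cassert>
#include <string>

// Build the text for the hidden filter column: stripped form plus original.
// The '\n' separator is an assumption of this sketch, not from the thread.
std::u32string makeSearchKey(const std::u32string& original,
                             const std::u32string& normalized)
{
    return normalized + U'\n' + original;
}

// Plain fixed-substring match against the combined key.
bool matchesFilter(const std::u32string& key, const std::u32string& query)
{
    return key.find(query) != std::u32string::npos;
}

// Usage example: both spellings match, a different letter does not.
void demoSearchKey()
{
    const std::u32string key =
        makeSearchKey(U"Bj\u00F6rn Collective", U"Bjorn Collective");

    assert(matchesFilter(key, U"Bjorn"));        // hits the stripped form
    assert(matchesFilter(key, U"Bj\u00F6rn"));   // hits the original (ö)
    assert(!matchesFilter(key, U"Bj\u00F8rn"));  // ø is in neither form
}
```

With this layout an unaccented query finds accented names, while an accented query only finds names that really contain that letter. Normalizing the query as well would instead make an accented query match unaccented names too.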

allejok96 avatar Sep 03 '22 18:09 allejok96

If you combined both the normalized and original string in the hidden column you could search for both

m_sourceModel.setItem( row, 2, new QStandardItem(normalized + name));

Oh, it's basic string concatenation. Got it!

You could absolutely, but personally, if I typed an Ö I wouldn't expect it to match O

No, but you wouldn't expect O to match Ö either. The things around French vowels you can (probably) just discard in the search but not the Swedish ones. It's a pretty tricky problem. I find it tempting to just let it be and forward it up-stream. Change the name of the plugins in the LV2 .ttl file, should a problem of this type arise.

zonkmachine avatar Sep 03 '22 22:09 zonkmachine

No, but you wouldn't expect O to match Ö either.

Please allow me to humbly disagree. If there's a plugin collection called "björn collective", I'd expect this to match "bjorn". I'm not sure how much sense this makes in the native language, but for languages that don't use umlauts, it's quite common, e.g. https://www.google.com/search?q=bjork

tresf avatar Sep 04 '22 15:09 tresf

Please allow me to humbly disagree. If there's a plugin collection called "björn collective", I'd expect this to match "bjorn". I'm not sure how much sense this makes in the native language,

Yeah, maybe. The more I've looked into this, the more unsure I've become about what needs to be done. PS. I'm not going to pull anything from here because I don't feel on top of the issue.

zonkmachine avatar Sep 14 '22 18:09 zonkmachine