[Feature] Add a reverse interface

Open Nigh opened this issue 2 years ago • 3 comments

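Restore i18n-style abbreviated words to their original words.

Correctness doesn't matter: just pick any word that fits the abbreviation and restore to that. The result should be quite amusing.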

Nigh, Apr 10 '24 12:04

h!

RimoChan, Apr 10 '24 12:04

If this only needs to handle English, you can iterate over a small dictionary. Handling CJK as well would probably be more troublesome.

import nltk
import re
from typing import Optional

# Ensure WordNet resource is downloaded
try:
    nltk.data.find('corpora/wordnet.zip')
except LookupError:
    print("WordNet not found; downloading...")
    nltk.download('wordnet')


def to_i18n(word: str) -> str:
    """
    Convert a word to its i18n form (internationalization -> i18n).

    Parameters:
    word (str): The word to convert to i18n form.

    Returns:
    str: The i18n form of the word.
    """
    if len(word) > 2:
        return f"{word[0]}{len(word) - 2}{word[-1]}"
    else:
        return word


def from_i18n(i18n_word: str) -> Optional[str]:
    """
    Attempt to restore an i18n word to its original form using nltk's wordnet.

    Parameters:
    i18n_word (str): The i18n word to restore.

    Returns:
    str: The possible restored word if found, otherwise None.

    Raises:
    ValueError: If the i18n_word does not match the expected i18n format.
    """
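    # Match the whole string against the i18n pattern; fullmatch rejects
    # trailing garbage such as "i18nxyz", which re.match would accept
    match = re.fullmatch(r'([a-zA-Z])(\d+)([a-zA-Z])', i18n_word)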
    if not match:
        raise ValueError('Invalid i18n format')

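    start, length, end = match.groups()
    # WordNet lemma names are lowercase, so normalize the letters before comparing
    start, end = start.lower(), end.lower()
    length = int(length) + 2  # Account for the first and last letter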

    # Search for a matching word in the WordNet dictionary
    for word in nltk.corpus.wordnet.words():
        if (len(word) == length and word.startswith(start) and
                word.endswith(end) and '_' not in word):
            return word

    return None


if __name__ == "__main__":
    # Example usage
    original_word = "internationalization"
    i18n_word = to_i18n(original_word)
    try:
        restored_word = from_i18n(i18n_word)
        print(f"Original word: {original_word}")
        print(f"I18n word: {i18n_word}")
        print(f"Possible restored word: {restored_word}")
    except ValueError as e:
        print(f"Error: {e}")

Output (landing on the original word, here in its British spelling, is just a coincidence):

$ python3 foo.py

Original word: internationalization
I18n word: i18n
Possible restored word: internationalisation

corenel, Apr 11 '24 13:04

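@corenel Go ahead and open a PR. It would work better to enumerate all the possible words and then pick one at random. For a whole article written in i18n form, every run would then produce a completely different article that has exactly the same length as the original.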

Nigh, Apr 11 '24 15:04
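
The last suggestion in the thread (enumerate every matching word, then choose one at random, and apply that to a whole text) might look roughly like the sketch below. It is not code from the i7h project: `build_index`, `candidates`, and `restore_text` are hypothetical names, and the word list is passed in as a plain iterable so the example runs without any dictionary download.

```python
import random
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Lookup key: (first letter, number of omitted letters, last letter),
# e.g. ("i", 18, "n") for "i18n".
Key = Tuple[str, int, str]

# A standalone token made of one letter, one or more digits, one letter.
I18N_TOKEN = re.compile(r"\b([A-Za-z])(\d+)([A-Za-z])\b")


def build_index(words: Iterable[str]) -> Dict[Key, List[str]]:
    """Group words by their i18n key so each lookup avoids a full scan."""
    index: Dict[Key, List[str]] = defaultdict(list)
    for word in words:
        word = word.lower()
        # Skip words too short to abbreviate and entries such as "new_york".
        if len(word) > 2 and word.isalpha():
            index[(word[0], len(word) - 2, word[-1])].append(word)
    return index


def candidates(i18n_word: str, index: Dict[Key, List[str]]) -> List[str]:
    """Return every indexed word that abbreviates to i18n_word."""
    match = I18N_TOKEN.fullmatch(i18n_word)
    if not match:
        raise ValueError(f"Invalid i18n format: {i18n_word!r}")
    start, count, end = match.groups()
    return list(index.get((start.lower(), int(count), end.lower()), []))


def restore_text(text: str, index: Dict[Key, List[str]],
                 rng: Optional[random.Random] = None) -> str:
    """Replace each i18n token in text with a randomly chosen candidate."""
    rng = rng or random.Random()

    def replace(match: "re.Match[str]") -> str:
        options = candidates(match.group(0), index)
        # Leave the token unchanged when no word fits.
        return rng.choice(options) if options else match.group(0)

    return I18N_TOKEN.sub(replace, text)


if __name__ == "__main__":
    # A tiny stand-in word list; a real dictionary would give more variety.
    demo_words = ["internationalization", "internationalisation",
                  "localization", "hello"]
    demo_index = build_index(demo_words)
    print(candidates("i18n", demo_index))
    print(restore_text("i18n and l10n", demo_index))
```

With NLTK installed as in the earlier comment, `build_index(nltk.corpus.wordnet.words())` should be able to supply the word list instead of `demo_words`. Because every candidate has the same first letter, last letter, and length as the abbreviation implies, the restored text always has the length of the original text, whichever words are drawn.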