i7h
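[Feature] Add a reverse interface
Restore i18n-style words back to their original words.
Whether the result is correct doesn't matter; just pick any word that fits and restore to that. The effect should be quite amusing.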
h!
For English only, you could iterate over a small dictionary. Handling CJK as well would probably be more troublesome.
import nltk
import re
from typing import Optional

# Ensure WordNet resource is downloaded
try:
    nltk.data.find('corpora/wordnet.zip')
except LookupError:
    print("WordNet not found; downloading...")
    nltk.download('wordnet')
def to_i18n(word: str) -> str:
    """
    Convert a word to its i18n form (internationalization -> i18n).
    Parameters:
        word (str): The word to convert to i18n form.
    Returns:
        str: The i18n form of the word.
    """
    if len(word) > 2:
        return f"{word[0]}{len(word) - 2}{word[-1]}"
    else:
        return word
def from_i18n(i18n_word: str) -> Optional[str]:
    """
    Attempt to restore an i18n word to its original form using nltk's wordnet.
    Parameters:
        i18n_word (str): The i18n word to restore.
    Returns:
        Optional[str]: The first matching word if found, otherwise None.
    Raises:
        ValueError: If the i18n_word does not match the expected i18n format.
    """
    # Match the whole string against the i18n pattern (letter, digits, letter);
    # fullmatch rejects trailing garbage such as "i18nxyz"
    match = re.fullmatch(r'([a-zA-Z])(\d+)([a-zA-Z])', i18n_word)
    if not match:
        raise ValueError('Invalid i18n format')
    start, middle, end = match.groups()
    length = int(middle) + 2  # Account for the first and last letter
    # Return the first single-word WordNet lemma that fits the pattern
    for word in nltk.corpus.wordnet.words():
        if (len(word) == length and word.startswith(start) and
                word.endswith(end) and '_' not in word):
            return word
    return None
if __name__ == "__main__":
    # Example usage
    original_word = "internationalization"
    i18n_word = to_i18n(original_word)
    try:
        restored_word = from_i18n(i18n_word)
        print(f"Original word: {original_word}")
        print(f"I18n word: {i18n_word}")
        print(f"Possible restored word: {restored_word}")
    except ValueError as e:
        print(f"Error: {e}")
Output (landing this close to the original word is just a coincidence):
$ python3 foo.py
Original word: internationalization
I18n word: i18n
Possible restored word: internationalisation
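@corenel Please open a PR. It should enumerate all the candidate words and then pick one at random; that would work better. For an entire article written in i18n form, every run could then produce a completely different article of exactly the same length as the original.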
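A rough sketch of the "enumerate all candidates, then pick one at random" idea, in case it helps. This is not the project's API: `build_index`, `candidates`, and `restore_text` are hypothetical names, and the word list is whatever iterable of strings you pass in (a small hard-coded list here, so the example runs without NLTK).

```python
import random
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# (first letter, total length, last letter)
Key = Tuple[str, int, str]

I18N_WORD = re.compile(r'([a-zA-Z])(\d+)([a-zA-Z])')


def build_index(words: Iterable[str]) -> Dict[Key, List[str]]:
    """Group words by (first letter, length, last letter) for fast lookup."""
    index: Dict[Key, List[str]] = defaultdict(list)
    for word in words:
        word = word.lower()
        # isalpha() also skips multi-word entries such as "ice_cream"
        if len(word) > 2 and word.isalpha():
            index[(word[0], len(word), word[-1])].append(word)
    return index


def candidates(i18n_word: str, index: Dict[Key, List[str]]) -> List[str]:
    """Return every indexed word that abbreviates to i18n_word."""
    match = I18N_WORD.fullmatch(i18n_word)
    if not match:
        return []
    start, middle, end = match.groups()
    key = (start.lower(), int(middle) + 2, end.lower())
    return list(index.get(key, []))


def restore_text(text: str, index: Dict[Key, List[str]],
                 rng: Optional[random.Random] = None) -> str:
    """Replace each i18n-style token in text with a random candidate."""
    rng = rng or random.Random()

    def replace(match: "re.Match[str]") -> str:
        options = candidates(match.group(0), index)
        # Leave the token untouched when nothing in the index fits
        return rng.choice(options) if options else match.group(0)

    return re.sub(r'\b[a-zA-Z]\d+[a-zA-Z]\b', replace, text)


if __name__ == "__main__":
    sample_words = ["internationalization", "internationalisation",
                    "localization", "kubernetes"]
    index = build_index(sample_words)
    print(candidates("i18n", index))
    print(restore_text("i18n and l10n on k8s", index))
```

Building the index once avoids rescanning the whole dictionary for every token, which matters when restoring a full article. Assuming WordNet is installed, passing `nltk.corpus.wordnet.words()` to `build_index` should work the same way as the sample list.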