French morphologizer mislabeling future, conditional, imperative #13717
Replies: 1 comment
You could try the pipeline I've trained, which is available here: https://github.com/thjbdvlt/solipCysme. I made it because I had the same issue as you. It's trained mostly on novels (19th to 21st centuries) and on texts with a lot of interaction, personal pronouns and different moods. The morphologizer uses HunSpell output (… It has some limits. First, there is no … Of course, I can provide more information if you need it. Let me know!
Hello,
I'm using spaCy to model French conversations, and I see that the morphologizer is not performing as well as I'd expect for unambiguous irrealis forms (specifically future tense, conditional mood, imperatives). I understand the underlying reason is probably that these forms aren't frequent in the training data, but are there any potential updates or recommended workarounds?
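As a possible stopgap for one of these confusions (conditional forms tagged as future), a post-processing rule may help. The sketch below is a hypothetical helper, not part of spaCy. It assumes UD French conventions (conditional as `Mood=Cnd|Tense=Pres`, future as `Mood=Ind|Tense=Fut`). It relies on the fact that conditional endings (-rais, -rait, -rions, -riez, -raient) never end a future form (-rai, -ras, -ra, -rons, -rez, -ront). It only acts when the model has already predicted `Tense=Fut`.

```python
# Conditional endings: -rais (1sg/2sg), -rait (3sg), -rions (1pl),
# -riez (2pl), -raient (3pl). Future forms end in -rai/-ras/-ra/-rons/
# -rez/-ront, and none of those end with one of these suffixes.
COND_SUFFIXES = ("rais", "rait", "rions", "riez", "raient")
VERBAL_POS = {"VERB", "AUX"}


def parse_feats(feats: str) -> dict[str, str]:
    """Turn a UD FEATS string like 'Mood=Ind|Tense=Fut' into a dict."""
    if not feats or feats == "_":
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


def format_feats(feats: dict[str, str]) -> str:
    """Serialize a feature dict back to FEATS, sorted by feature name as in UD."""
    return "|".join(f"{k}={v}" for k, v in sorted(feats.items()))


def fix_future_conditional(form: str, pos: str, feats: str) -> str:
    """Hypothetical fix: relabel a predicted future as conditional when the
    surface form carries a conditional ending. Otherwise return feats unchanged."""
    pred = parse_feats(feats)
    if (
        pos in VERBAL_POS
        and pred.get("Tense") == "Fut"
        and form.lower().endswith(COND_SUFFIXES)
    ):
        pred["Mood"] = "Cnd"
        pred["Tense"] = "Pres"
        return format_feats(pred)
    return feats
```

In a spaCy v3 pipeline, this could be applied in a small custom component after the morphologizer. To my knowledge, `Token.set_morph` accepts a FEATS string, so the call would be `token.set_morph(fix_future_conditional(token.text, token.pos_, str(token.morph)))`. Check this against your installed version. The rule does nothing for futures tagged as imperatives or for imperatives with a wrong POS, since those cannot be separated by ending alone.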
For example:
- … POS/TAG). This is similar to [French morphologizer] Mislabelisation of Mood=Imp|Number=Sing|Tense=Present #8147, but broader, in that "Remplacez" is unambiguously a verb.
- … MORPH contains Mood=Imp, Tense=Pres), even though it is unambiguously the future.
- … MORPH contains Mood=Ind, Tense=Fut), even though it is unambiguously the conditional.

I'm working with a set of ~160 common French verbs and tested their whole paradigms in this way. 98% of infinitives are recognized correctly, but only 13% of second-person plural imperatives (34% even had an incorrect POS, as in the example above), 37% of future-tense forms, and 7% of conditional forms are. Sure enough, these three categories are uncommon in the UD French Sequoia data.
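A paradigm test like the one described above could be scored per category with a helper along these lines. It is a sketch under assumptions, not the original test script. The category names, the expected feature sets (UD French conventions assumed) and the `(category, sentence, target_form)` case format are all hypothetical. A form counts as correct only if its POS is VERB or AUX and all expected features match. `rows_from_pipeline` only touches `token.text`, `token.pos_` and `str(token.morph)`, so it works with a pipeline such as `spacy.load("fr_core_news_sm")`.

```python
from collections import defaultdict

# Hypothetical target features per category (UD French conventions assumed).
EXPECTED = {
    "imperative_2pl": {"Mood": "Imp", "Number": "Plur", "Person": "2"},
    "future": {"Mood": "Ind", "Tense": "Fut"},
    "conditional": {"Mood": "Cnd"},
}
VERBAL_POS = {"VERB", "AUX"}


def parse_feats(feats: str) -> dict[str, str]:
    """Turn a UD FEATS string like 'Mood=Ind|Tense=Fut' into a dict."""
    if not feats or feats == "_":
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


def is_correct(category: str, pos: str, feats: str) -> bool:
    """Correct = verbal POS and every expected feature present with the right value."""
    pred = parse_feats(feats)
    expected = EXPECTED[category]
    return pos in VERBAL_POS and all(pred.get(k) == v for k, v in expected.items())


def accuracy_by_category(rows) -> dict[str, float]:
    """rows: iterable of (category, predicted_pos, predicted_feats) triples."""
    hits: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for category, pos, feats in rows:
        totals[category] += 1
        hits[category] += int(is_correct(category, pos, feats))
    return {cat: hits[cat] / totals[cat] for cat in totals}


def rows_from_pipeline(nlp, cases):
    """Yield (category, pos, feats) for the target form of each case."""
    for category, text, target in cases:
        doc = nlp(text)
        token = next((t for t in doc if t.text == target), None)
        if token is None:
            # The tokenizer did not produce the target form; count it as wrong.
            yield category, "", ""
        else:
            yield category, token.pos_, str(token.morph)
```

Usage would look like `accuracy_by_category(rows_from_pipeline(nlp, [("future", "Vous remplacerez le filtre.", "remplacerez")]))`. Scoring features individually, instead of comparing whole FEATS strings, keeps unrelated features such as `Number` from hiding a correct `Mood`/`Tense` prediction.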
How to reproduce the behaviour
Info about spaCy
spaCy version: 3.8.2
Python version: 3.11.9
Pipelines: fr_core_news_sm (3.8.0)