Want to train a detector for the Shan language. #33

Open
saitawngpha opened this issue Sep 4, 2019 · 3 comments

Comments

@saitawngpha

Dear maintainers,
I am interested in Myanmar Tools, which can detect Myanmar fonts with ML. I would like to build the same kind of detector for the Shan language.

Could you tell me where I should start?
Best,
STP

@sffc
Collaborator

sffc commented Sep 4, 2019

Dear STP,

Yes, this would be a good feature to add. I would suggest the following two classifiers:

  1. Zawgyi versus Unicode (any language) -- what already exists.
  2. Unicode Burmese versus Unicode Shan

To add the Unicode Burmese versus Unicode Shan classifier to Myanmar Tools, you can:

  1. Download the training data as explained in the README
  2. Add methods to BurmeseData.java to read the my.txt (Burmese Unicode) and shn.txt (Shan Unicode) separately
  3. Remove the Category enum from ZawgyiUnicodeMarkovModelBuilder.java and replace it with a boolean at the call sites of trainOnString
  4. Make a copy of GenerateZawgyiUnicodeModelDAT.java named something like GenerateUnicodeBurmeseShanModelDAT.java, and have it load and train on the new data sets
  5. Add a target to Makefile that invokes your new Java function and saves the output to a new dat file named something like unicodeBurmeseShanModel.dat, and add logic to the copy-resources target to copy that file to the client implementations
  6. Add API to read from the new file in the various client implementations. You can start with just one client implementation, like Java. For example, copy ZawgyiDetector.java into a new file named something like ShanDetector.java, pointing it to your new unicodeBurmeseShanModel.dat file
  7. Add tests
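
To make steps 2 to 4 more concrete, here is a minimal, self-contained sketch of a two-class character-transition model trained with a boolean flag. It is only an illustration under stated assumptions: the class name `BurmeseShanSketch`, the `trainOnFile` and `getShanProbability` methods, the add-one smoothing, and the choice of Unicode ranges are all hypothetical, and none of it reflects the actual Myanmar Tools builder or its dat file format.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Stream;

// Hypothetical sketch: NOT the Myanmar Tools implementation or its .dat format.
public class BurmeseShanSketch {
    // Assumed state space: 1 "other" state + Myanmar block (160)
    // + Myanmar Extended-A (32) + Myanmar Extended-B (32).
    private static final int NUM_STATES = 225;

    // Transition counts for one language.
    private static final class Model {
        final Map<Long, Integer> pairCounts = new HashMap<>();
        final Map<Integer, Integer> prevCounts = new HashMap<>();

        void observe(int prev, int cur) {
            pairCounts.merge(key(prev, cur), 1, Integer::sum);
            prevCounts.merge(prev, 1, Integer::sum);
        }

        // log P(cur | prev) with add-one smoothing (an assumption of this sketch).
        double logProb(int prev, int cur) {
            int pair = pairCounts.getOrDefault(key(prev, cur), 0);
            int total = prevCounts.getOrDefault(prev, 0);
            return Math.log((pair + 1.0) / (total + NUM_STATES));
        }
    }

    private final Model burmese = new Model();
    private final Model shan = new Model();

    private static long key(int prev, int cur) {
        return ((long) prev << 32) | cur;
    }

    // Map every code point outside the Myanmar blocks to a single "other" state (0).
    private static int state(int cp) {
        boolean myanmar = (cp >= 0x1000 && cp <= 0x109F)
                || (cp >= 0xAA60 && cp <= 0xAA7F)
                || (cp >= 0xA9E0 && cp <= 0xA9FF);
        return myanmar ? cp : 0;
    }

    // The boolean replaces a category enum: true = Shan, false = Burmese.
    public void trainOnString(String text, boolean isShan) {
        Model model = isShan ? shan : burmese;
        int prev = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            int cur = state(cp);
            if (prev != 0 || cur != 0) {
                model.observe(prev, cur);
            }
            prev = cur;
        }
    }

    // Train on a UTF-8 file with one sample per line (for example my.txt or shn.txt).
    public void trainOnFile(Path path, boolean isShan) throws IOException {
        try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
            lines.forEach(line -> trainOnString(line, isShan));
        }
    }

    // Logistic of the log-likelihood ratio; 0.5 means "no evidence either way".
    public double getShanProbability(String text) {
        double delta = 0.0; // log P(text | Shan) - log P(text | Burmese)
        int prev = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            int cur = state(cp);
            if (prev != 0 || cur != 0) {
                delta += shan.logProb(prev, cur) - burmese.logProb(prev, cur);
            }
            prev = cur;
        }
        return 1.0 / (1.0 + Math.exp(-delta));
    }

    public static void main(String[] args) {
        BurmeseShanSketch model = new BurmeseShanSketch();
        String my = "\u1019\u103C\u1014\u103A\u1019\u102C"; // short Burmese sample
        String shn = "\u101C\u102D\u1075\u103A\u1088\u1010\u1086\u1038"; // short Shan sample
        model.trainOnString(my, false);
        model.trainOnString(shn, true);
        System.out.println("P(Shan | Burmese sample) = " + model.getShanProbability(my));
        System.out.println("P(Shan | Shan sample)    = " + model.getShanProbability(shn));
    }
}
```

In a real build you would call something like `trainOnFile` on the downloaded my.txt and shn.txt instead of the inline samples, and serialize the resulting model in whatever format the clients expect. The tiny samples here only show the direction of the score, not realistic accuracy.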

Hope that helps!

@saitawngpha
Author

Dear Shane F,

Thanks for your help. I will try it, and if I run into any problems, I will ask for your help.


2 participants