Create an application that recognizes images captured by the camera on Android in real time, running a trained model on the device with PyTorch Mobile.

The sample app I made is listed at the bottom, so please take a look if you like.
First, add the dependencies for CameraX and PyTorch Mobile (versions as of February 2020).
build.gradle
  def camerax_version = '1.0.0-alpha06'
    implementation "androidx.camera:camera-core:${camerax_version}"
    implementation "androidx.camera:camera-camera2:${camerax_version}"
    implementation 'org.pytorch:pytorch_android:1.4.0'
    implementation 'org.pytorch:pytorch_android_torchvision:1.4.0'
Also add the following at the end of the **android {}** block in the same build.gradle:
build.gradle
    compileOptions {
        sourceCompatibility JavaVersion.VERSION_1_8
        targetCompatibility JavaVersion.VERSION_1_8
    }
With the dependencies in place, we will implement camera capture using **CameraX**, a library that makes the camera easy to work with on Android.
Below we follow the official CameraX tutorial. The details are covered in other articles, so I will omit them and show just the code.
Add the camera permission to AndroidManifest.xml:
<uses-permission android:name="android.permission.CAMERA" />
Place a button to start the camera and a TextureView for the preview display.

activity_main.xml
<?xml version="1.0" encoding="utf-8"?>
<androidx.constraintlayout.widget.ConstraintLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    tools:context=".MainActivity">
    <TextureView
        android:id="@+id/view_finder"
        android:layout_width="0dp"
        android:layout_height="0dp"
        android:layout_marginBottom="16dp"
        app:layout_constraintBottom_toTopOf="@+id/activateCameraBtn"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toTopOf="parent" />
    <androidx.constraintlayout.widget.ConstraintLayout
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:alpha="0.7"
        android:animateLayoutChanges="true"
        android:background="@android:color/white"
        app:layout_constraintEnd_toEndOf="@+id/view_finder"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toTopOf="@+id/view_finder">
        <TextView
            android:id="@+id/inferredCategoryText"
            android:layout_width="0dp"
            android:layout_height="wrap_content"
            android:layout_marginStart="8dp"
            android:layout_marginTop="16dp"
            android:layout_marginEnd="8dp"
            android:text="Inference result"
            android:textSize="18sp"
            app:layout_constraintEnd_toEndOf="parent"
            app:layout_constraintStart_toStartOf="parent"
            app:layout_constraintTop_toTopOf="parent" />
        <TextView
            android:id="@+id/inferredScoreText"
            android:layout_width="wrap_content"
            android:layout_height="wrap_content"
            android:layout_marginStart="24dp"
            android:layout_marginTop="16dp"
            android:text="Score"
            android:textSize="18sp"
            app:layout_constraintStart_toStartOf="parent"
            app:layout_constraintTop_toBottomOf="@+id/inferredCategoryText" />
    </androidx.constraintlayout.widget.ConstraintLayout>
    <Button
        android:id="@+id/activateCameraBtn"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_marginBottom="16dp"
        android:text="Camera activation"
        app:layout_constraintBottom_toBottomOf="parent"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintStart_toStartOf="parent" />
</androidx.constraintlayout.widget.ConstraintLayout>
CameraX offers three use cases: **preview, image capture, and image analysis**. This time we will use preview and image analysis. Organizing the code around these use cases makes it easier to keep things tidy. (See the official documentation for which combinations of use cases can be bound together.)
First we implement the preview use case. The content is almost the same as the tutorial.
MainActivity.kt
private const val REQUEST_CODE_PERMISSIONS = 10
private val REQUIRED_PERMISSIONS = arrayOf(Manifest.permission.CAMERA)
class MainActivity : AppCompatActivity(), LifecycleOwner {
    private val executor = Executors.newSingleThreadExecutor()
    private lateinit var viewFinder: TextureView
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main)
        viewFinder = findViewById(R.id.view_finder)
        //Camera activation
        activateCameraBtn.setOnClickListener {
            if (allPermissionsGranted()) {
                viewFinder.post { startCamera() }
            } else {
                ActivityCompat.requestPermissions(
                    this, REQUIRED_PERMISSIONS, REQUEST_CODE_PERMISSIONS
                )
            }
        }
        viewFinder.addOnLayoutChangeListener { _, _, _, _, _, _, _, _, _ ->
            updateTransform()
        }
    }
    private fun startCamera() {
        //Implementation of preview useCase
        val previewConfig = PreviewConfig.Builder().apply {
            setTargetResolution(Size(viewFinder.width, viewFinder.height))
        }.build()
        val preview = Preview(previewConfig)
        preview.setOnPreviewOutputUpdateListener {
            val parent = viewFinder.parent as ViewGroup
            parent.removeView(viewFinder)
            parent.addView(viewFinder, 0)
            viewFinder.surfaceTexture = it.surfaceTexture
            updateTransform()
        }
        /**We will implement the image analysis useCase here later.**/ 
        CameraX.bindToLifecycle(this, preview)
    }
    private fun updateTransform() {
        val matrix = Matrix()
        val centerX = viewFinder.width / 2f
        val centerY = viewFinder.height / 2f
        val rotationDegrees = when (viewFinder.display.rotation) {
            Surface.ROTATION_0 -> 0
            Surface.ROTATION_90 -> 90
            Surface.ROTATION_180 -> 180
            Surface.ROTATION_270 -> 270
            else -> return
        }
        matrix.postRotate(-rotationDegrees.toFloat(), centerX, centerY)
        //Reflected in textureView
        viewFinder.setTransform(matrix)
    }
    override fun onRequestPermissionsResult(
        requestCode: Int, permissions: Array<String>, grantResults: IntArray
    ) {
        if (requestCode == REQUEST_CODE_PERMISSIONS) {
            if (allPermissionsGranted()) {
                viewFinder.post { startCamera() }
            } else {
                Toast.makeText(
                    this,
                    "Permissions not granted by the user.",
                    Toast.LENGTH_SHORT
                ).show()
                finish()
            }
        }
    }
    private fun allPermissionsGranted() = REQUIRED_PERMISSIONS.all {
        ContextCompat.checkSelfPermission(
            baseContext, it
        ) == PackageManager.PERMISSION_GRANTED
    }
}
This time we will use a pretrained ResNet-18. Trace it to TorchScript with the following Python script:
import torch
import torchvision
model = torchvision.models.resnet18(pretrained=True)
model.eval()
example = torch.rand(1, 3, 224, 224)
traced_script_module = torch.jit.trace(model, example)
traced_script_module.save("resnet.pt")
If the script runs successfully, a file called resnet.pt is generated in the same directory. We will perform image recognition with this trained ResNet-18.
Put the generated model into the **assets folder** of Android Studio. (It does not exist by default; you can create it by right-clicking the res folder -> New -> Folder -> Assets Folder.)
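As a quick sanity check, you can load the model on the device and push a dummy input through it. This is my own sketch, not part of the original article: getAssetFilePath() is the helper defined later in ImageAnalyze.kt, and the tag and variable names are only illustrative.
//Load the traced model from assets and run a dummy 1x3x224x224 input
val module = Module.load(getAssetFilePath(this, "resnet.pt"))
val input = Tensor.fromBlob(FloatArray(1 * 3 * 224 * 224), longArrayOf(1, 3, 224, 224))
val output = module.forward(IValue.from(input)).toTensor()
//Expect shape [1, 1000], one score per ImageNet class
Log.d("ModelCheck", "output shape: ${output.shape().contentToString()}")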
To turn an inferred index into a class name, we need the ImageNet labels. Create a new file **ImageNetClasses.kt** and put the 1,000 ImageNet class names in it. The list is too long to include here, so copy it from GitHub.
ImageNetClasses.kt
class ImageNetClasses {
    var IMAGENET_CLASSES = arrayOf(
        "tench, Tinca tinca",
        "goldfish, Carassius auratus",
        //Omitted (copy the full list from GitHub)
        "ear, spike, capitulum",
        "toilet tissue, toilet paper, bathroom tissue"
    )
}
Next, we implement the image analysis use case of CameraX. Create a new file called ImageAnalyze.kt to hold the image recognition processing.
The flow is: load the model from the assets folder, convert each preview frame into a tensor that PyTorch Mobile can handle, pass it through the model, and read out the result, all inside the image analysis use case.
I also wrote an interface and a custom listener to reflect the inference result in the view. (I am not sure this is the correct way to write it, so please let me know if there is a smarter approach; one lambda-based alternative is sketched after the code below.)
ImageAnalyze.kt
class ImageAnalyze(context: Context) : ImageAnalysis.Analyzer {
    private lateinit var listener: OnAnalyzeListener    //Custom listener for updating View
    private var lastAnalyzedTimestamp = 0L
    //Load the network model
    private val resnet = Module.load(getAssetFilePath(context, "resnet.pt"))
    interface OnAnalyzeListener {
        fun getAnalyzeResult(inferredCategory: String, score: Float)
    }
    override fun analyze(image: ImageProxy, rotationDegrees: Int) {
        val currentTimestamp = System.currentTimeMillis()
        if (currentTimestamp - lastAnalyzedTimestamp >= 500) {  //Infer every 0.5 seconds (timestamps are in milliseconds)
            lastAnalyzedTimestamp = currentTimestamp
            //Convert to a tensor (checking the image format showed it is YUV_420_888)
            val inputTensor = TensorImageUtils.imageYUV420CenterCropToFloat32Tensor(
                image.image,
                rotationDegrees,
                224,
                224,
                TensorImageUtils.TORCHVISION_NORM_MEAN_RGB,
                TensorImageUtils.TORCHVISION_NORM_STD_RGB
            )
            //Infer with a trained model
            val outputTensor = resnet.forward(IValue.from(inputTensor)).toTensor()
            val scores = outputTensor.dataAsFloatArray
            var maxScore = 0F
            var maxScoreIdx = 0
            for (i in scores.indices) { //Get the index with the highest score
                if (scores[i] > maxScore) {
                    maxScore = scores[i]
                    maxScoreIdx = i
                }
            }
            //Get the category name from the score
            val inferredCategory = ImageNetClasses().IMAGENET_CLASSES[maxScoreIdx]
            listener.getAnalyzeResult(inferredCategory, maxScore)  //Update View
        }
    }
    //Gets a file path for an asset by copying it to internal storage on first use
    private fun getAssetFilePath(context: Context, assetName: String): String {
        val file = File(context.filesDir, assetName)
        if (file.exists() && file.length() > 0) {
            return file.absolutePath
        }
        context.assets.open(assetName).use { inputStream ->
            FileOutputStream(file).use { outputStream ->
                val buffer = ByteArray(4 * 1024)
                var read: Int
                while (inputStream.read(buffer).also { read = it } != -1) {
                    outputStream.write(buffer, 0, read)
                }
                outputStream.flush()
            }
            return file.absolutePath
        }
    }
    fun setOnAnalyzeListener(listener: OnAnalyzeListener) {
        this.listener = listener
    }
}
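On the listener question above: one possibly smarter alternative, my own suggestion under the same alpha06 Analyzer signature, is to pass a lambda into the constructor instead of defining an interface and a setter. A minimal sketch:
class LambdaAnalyzer(
    private val onResult: (category: String, score: Float) -> Unit
) : ImageAnalysis.Analyzer {
    override fun analyze(image: ImageProxy, rotationDegrees: Int) {
        //...run the same inference as in ImageAnalyze.analyze()...
        //then deliver the result directly, no interface or setter needed
        onResult("inferred category", 0f)  //placeholder values for this sketch
    }
}
In MainActivity you would then write LambdaAnalyzer { category, score -> viewFinder.post { ... } } and drop setOnAnalyzeListener entirely.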
I was initially confused because the frame arrives as an unfamiliar type called ImageProxy. Checking the format showed it is YUV_420_888, and I assumed I would have to convert it to a Bitmap first, but PyTorch Mobile provides a method that converts YUV_420 directly to a tensor, so you can run inference just by throwing the frame in.
By the way, looking at the code you may have noticed that it is not quite real time: inference only runs once every 0.5 seconds.
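One more aside, my own addition: the score shown in the view is a raw logit from the final layer, so it is not bounded between 0 and 1. If you would rather display a probability, a numerically stable softmax over the 1000 scores would look something like this:
fun softmax(logits: FloatArray): FloatArray {
    //Subtract the maximum before exponentiating for numerical stability
    var max = Float.NEGATIVE_INFINITY
    for (v in logits) if (v > max) max = v
    val exps = FloatArray(logits.size) { i -> kotlin.math.exp(logits[i] - max) }
    val sum = exps.sum()
    return FloatArray(logits.size) { i -> exps[i] / sum }
}
Calling softmax(scores)[maxScoreIdx] inside analyze() would give a value in [0, 1] to display instead of maxScore.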
Now we register the ImageAnalyze class with CameraX as a use case, and finally implement its interface in MainActivity with an anonymous object so that the view gets updated.
Add the following code inside startCamera(), at the spot marked earlier with the comment /** We will implement the image analysis useCase here later. **/ (its last line replaces the existing CameraX.bindToLifecycle(this, preview) call).
MainActivity.kt
        //Implementation of image analysis useCase
        val analyzerConfig = ImageAnalysisConfig.Builder().apply {
            setImageReaderMode(
                ImageAnalysis.ImageReaderMode.ACQUIRE_LATEST_IMAGE
            )
        }.build()
        //instance
        val imageAnalyzer = ImageAnalyze(applicationContext)
        //Display inference results
        imageAnalyzer.setOnAnalyzeListener(object : ImageAnalyze.OnAnalyzeListener {
            override fun getAnalyzeResult(inferredCategory: String, score: Float) {
                //Change the view from other than the main thread
                viewFinder.post {
                    inferredCategoryText.text = "Inference result: $inferredCategory"
                    inferredScoreText.text = "Score: $score"
                }
            }
        })
        val analyzerUseCase = ImageAnalysis(analyzerConfig).apply {
            setAnalyzer(executor, imageAnalyzer)
        }
        //Bind both use cases: preview and image analysis
        CameraX.bindToLifecycle(this, preview, analyzerUseCase)  //Replaces the earlier preview-only binding
Complete!! If you have implemented everything up to this point, the app described at the beginning should be working. Please play around with it.
The full code is on GitHub, so please refer to it as needed.
CameraX is really convenient! Combined with PyTorch Mobile, you can do image analysis easily. The extra processing does make things heavier, but that cannot be helped. If you can prepare a model, it is easy to build all kinds of camera-based image recognition apps. In the end, the quick path is probably to build an app around your own model, for example one trained with transfer learning.
I want to make and release a machine learning application... ~~We plan to release a sample app in the near future. (Currently under review)~~
Update: the app passed review, so I am adding this note. I built the content of this article into the app, and it is now published on the Play Store.
If you want to try it out quickly, I would appreciate it if you downloaded it.
Play Store: Object Analyzer
Supports English and Japanese.
To be honest, there is a big gap between the objects it can recognize and the ones it cannot...